HIVE-29465:Prevent excessive query results cache usage at runtime by ramitg254 · Pull Request #6376 · apache/hive

ramitg254 · 2026-03-18T07:16:24Z

What changes were proposed in this pull request?

Introducing a safe cache write conf. When it is enabled, query results are not written directly to the cache directory; only if the cache entry is valid are its files copied into the cache directory.

Why are the changes needed?

The cache directory could overspill at runtime, because invalid (oversized) cache entries are only cleaned up in post-execution.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Locally and via CI tests.

sonarqubecloud · 2026-03-23T08:41:11Z

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
1.0% Duplication on New Code

See analysis details on SonarQube Cloud

abstractdog · 2026-03-23T14:33:58Z

ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java

        return false;
      }

+      if (isSafeCacheWriteEnabled) {


@ramitg254: thanks for working on this so far.
I'm not sure the approach fully addresses what has been reported. As far as I can understand, there is a safe buffer directory where the files are placed, and this safe folder is on the same storage. That in itself is not the only issue; the point is rather that this doesn't prevent big files from actually landing on the filesystem that holds the cache.

the original report showed something like this:

du -h -d 1 /efs/tmp/hive/_resultscache_/results-9d89cc59-c99d-46a5-9d93-2b5505765320
12.0K ./66356edb-57a6-4f0a-90cd-7d14d9e2b739
...
1.1T ./0fe343fb-6a89-4d28-b2fd-caed2f2e42f6
...
1.1T .

I missed something to double-check before creating the jira: does the "0fe343fb-6a89-4d28-b2fd-caed2f2e42f6" folder belong to a finished query result? If so, and given that it clearly exceeded the configured 2G max cache size, the query results cache should have taken care of that. So I think the original problem/use case should be investigated thoroughly first.

@abstractdog, I have written down my understanding of the problem below; please validate:

Problem statement -> when the cache is enabled, cache entry validation takes place in post-execution, so the cache entry has already been written to the cache directory at runtime and can exceed the max cache size, which should not happen.

Replication of the problem statement -> I have added unit tests in TestCachedResults.java that replicate this problem for the same set of queries. testUnsafeCacheWrite passes when the cache directory size grows beyond the allowed max cache size at runtime, which is the current behavior. testSafeCacheWrite runs the same queries with the fix and never exceeds the allowed max cache size at any point at runtime.

Solution: we are introducing a safe cache write conf. When it is enabled, the query files for the fetch work are not written directly to the cache directory; instead the query proceeds as a normal query execution. Normal query execution already stores these files somewhere at runtime (for example in the local scratch dir in these unit tests), so we are not maintaining any extra temporary storage buffer. If the query fails, it fails like a normal query. If it succeeds, we perform the validation checks on those files, and if the entry is valid we copy the files to the cache dir in post-execution. If the copy fails partway, I also perform cleanup.
I made this configurable because copying files from the normal query execution location to the cache directory adds overhead.
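A minimal sketch of the flow described above, assuming a local filesystem and plain `java.nio` in place of Hadoop's `FileSystem`/`FileUtil`, and a single size limit standing in for the cache's real validity checks. `SafeCachePromotion` and `promoteIfValid` are hypothetical names, not code from this PR: results stay in the query's normal output directory, are measured first, and are only then copied into the cache entry directory, with a partial copy rolled back.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical illustration of "validate first, then copy into the cache". */
final class SafeCachePromotion {
  private SafeCachePromotion() {}

  /**
   * Copies every regular file under stagingDir into cacheEntryDir (assumed to be a fresh,
   * entry-specific directory), keeping relative paths, but only if the total size is at most
   * maxEntryBytes. Returns false if the entry is rejected or the copy fails; on a failed copy
   * the partially written cacheEntryDir is deleted again.
   */
  static boolean promoteIfValid(Path stagingDir, Path cacheEntryDir, long maxEntryBytes)
      throws IOException {
    List<Path> files;
    try (Stream<Path> walk = Files.walk(stagingDir)) {
      files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
    }
    long total = 0;
    for (Path f : files) {
      total += Files.size(f);
    }
    // The size check runs before anything is written to the cache directory.
    if (total > maxEntryBytes) {
      return false;
    }
    try {
      for (Path src : files) {
        Path dest = cacheEntryDir.resolve(stagingDir.relativize(src));
        Files.createDirectories(dest.getParent());
        Files.copy(src, dest);
      }
      return true;
    } catch (IOException copyFailure) {
      deleteRecursively(cacheEntryDir); // roll back the partial entry
      return false;
    }
  }

  private static void deleteRecursively(Path root) throws IOException {
    if (!Files.exists(root)) {
      return;
    }
    List<Path> all;
    try (Stream<Path> walk = Files.walk(root)) {
      // Reverse natural order lists children before their parent directories.
      all = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : all) {
      Files.delete(p);
    }
  }
}
```

The cost mentioned above is visible here: every accepted entry is read and written a second time, which is why keeping it behind a flag makes sense.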

abstractdog · 2026-04-14T12:55:42Z

ql/src/test/org/apache/hadoop/hive/ql/TestCachedResults.java

+  private static final Logger LOG = LoggerFactory.getLogger(TestCachedResults.class);
+
+  @ClassRule
+  public static HiveTestEnvSetup envSetup = new HiveTestEnvSetup();


we don't need this; HiveConfForTest is sufficient in most cases

abstractdog · 2026-04-14T12:55:56Z

ql/src/test/org/apache/hadoop/hive/ql/TestCachedResults.java

+  public static HiveTestEnvSetup envSetup = new HiveTestEnvSetup();
+
+  @Rule
+  public TestRule methodRule = envSetup.getMethodRule();


I don't think we need this either

abstractdog · 2026-04-14T12:56:30Z

ql/src/test/org/apache/hadoop/hive/ql/TestCachedResults.java

+
+  @BeforeClass
+  public static void setUp() throws Exception {
+    conf = envSetup.getTestCtx().hiveConf;


new HiveConfForTest(...)

abstractdog · 2026-04-14T12:59:20Z

ql/src/test/org/apache/hadoop/hive/ql/TestCachedResults.java

+    driver.run(sql);
+  }
+
+  private static long getFolderSize(File folder) {


this looks like reinventing the wheel, is there a chance we already have this implemented somewhere else?
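Existing utilities worth checking before keeping a hand-written helper include Hadoop's `FileSystem#getContentSummary(Path).getLength()` and, if Apache Commons IO is on the test classpath, `FileUtils.sizeOfDirectory(File)`. If a plain-JDK helper does stay, one thing to keep in mind is that the monitor walks the cache directory while queries write into it, so entries can appear or disappear mid-walk. A sketch of such a helper, with the hypothetical name `DirSize.sizeOf`, that counts what it can read and skips entries that vanish:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical helper: total size of regular files under a directory. */
final class DirSize {
  private DirSize() {}

  /** Sums regular file sizes under root, skipping entries that cannot be read (e.g. deleted). */
  static long sizeOf(Path root) {
    long[] total = {0};
    try {
      Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
        @Override
        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
          if (attrs.isRegularFile()) {
            total[0] += attrs.size();
          }
          return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult visitFileFailed(Path file, IOException exc) {
          return FileVisitResult.CONTINUE; // vanished or unreadable, including a missing root
        }

        @Override
        public FileVisitResult postVisitDirectory(Path dir, IOException exc) {
          return FileVisitResult.CONTINUE; // ignore errors raised while listing a directory
        }
      });
    } catch (IOException e) {
      // Not expected: the visitor above never throws. Report what was counted so far.
    }
    return total[0];
  }
}
```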

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a “safe cache write” mode for Hive query results cache to prevent oversized/invalid entries from being written directly into the cache directory (reducing runtime cache overspill), with accompanying tests.

Changes:

Introduces hive.query.results.safe.cache.write.enabled configuration flag.
Updates results destination selection so safe mode writes outside the cache and later conditionally copies into the cache.
Adds runtime copy logic in QueryResultsCache#setEntryValid and a new test suite validating safe vs unsafe behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 12 comments.

File	Description
ql/src/test/org/apache/hadoop/hive/ql/TestCachedResults.java	Adds tests and a cache-size monitor to verify safe vs unsafe cache write behavior.
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java	Routes output to default destination in safe mode and records a “safe” source directory.
ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java	Implements safe-mode copy into cache on entry validation and cleanup helpers.
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java	Adds the new configuration variable and description text.

Copilot · 2026-04-14T13:03:50Z

ql/src/test/org/apache/hadoop/hive/ql/TestCachedResults.java

+
+  @Test
+  public void testSafeCacheWrite() throws Exception {
+    HiveConf.setBoolVar(conf, HiveConf.ConfVars.HIVE_QUERY_RESULTS_SAFE_CACHE_WRITE_ENABLED, true);


The tests mutate a shared static conf but testUnsafeCacheWrite never explicitly sets HIVE_QUERY_RESULTS_SAFE_CACHE_WRITE_ENABLED back to false. If testSafeCacheWrite runs first (test order is not guaranteed), testUnsafeCacheWrite will execute in safe mode and assert the opposite behavior. Make each test explicitly set the flag to the intended value (or reset it in @Before / @After).

Copilot · 2026-04-14T13:03:50Z

ql/src/test/org/apache/hadoop/hive/ql/TestCachedResults.java

+  }
+
+  @Test
+  public void testUnsafeCacheWrite() throws Exception {


The tests mutate a shared static conf but testUnsafeCacheWrite never explicitly sets HIVE_QUERY_RESULTS_SAFE_CACHE_WRITE_ENABLED back to false. If testSafeCacheWrite runs first (test order is not guaranteed), testUnsafeCacheWrite will execute in safe mode and assert the opposite behavior. Make each test explicitly set the flag to the intended value (or reset it in @Before / @After).

Suggested change

public void testUnsafeCacheWrite() throws Exception {

public void testUnsafeCacheWrite() throws Exception {

HiveConf.setBoolVar(conf, HiveConf.ConfVars.HIVE_QUERY_RESULTS_SAFE_CACHE_WRITE_ENABLED, false);

Copilot · 2026-04-14T13:03:51Z

ql/src/test/org/apache/hadoop/hive/ql/TestCachedResults.java

+  private static String cacheDir;
+
+  private ScheduledExecutorService scheduler;
+  private long maxCacheSize = 0;


maxCacheSize is written from the scheduler thread and read from the test thread without any memory-visibility guarantees, which can make assertions flaky (e.g., reading a stale 0). Use an AtomicLong (or make it volatile and update carefully) to ensure cross-thread visibility.

Suggested change

private long maxCacheSize = 0;

private volatile long maxCacheSize = 0;

Copilot · 2026-04-14T13:03:51Z

ql/src/test/org/apache/hadoop/hive/ql/TestCachedResults.java

+  private void startCacheMonitor(long intervalMs) {
+    scheduler.scheduleAtFixedRate(() -> {
+      long size = getFolderSize(new File(cacheDir));
+      maxCacheSize = Math.max(maxCacheSize, size);
+    }, 0, intervalMs, TimeUnit.MILLISECONDS);
+  }


maxCacheSize is written from the scheduler thread and read from the test thread without any memory-visibility guarantees, which can make assertions flaky (e.g., reading a stale 0). Use an AtomicLong (or make it volatile and update carefully) to ensure cross-thread visibility.
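`volatile` fixes visibility, but `maxCacheSize = Math.max(maxCacheSize, size)` is still a separate read and write. Because `scheduleAtFixedRate` never runs executions of the same task concurrently, a single writer with `volatile` is probably enough here; `AtomicLong#accumulateAndGet` makes the update atomic without relying on that. A minimal sketch with a hypothetical `CacheSizeMonitor`, where the size probe is passed in as a `LongSupplier` (in the test it would wrap the folder-size helper):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical test helper: samples a size periodically and remembers the peak. */
final class CacheSizeMonitor implements AutoCloseable {
  private final AtomicLong maxObserved = new AtomicLong();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void start(LongSupplier sizeProbe, long intervalMs) {
    // accumulateAndGet performs the read-compare-write atomically. Note that an exception
    // thrown by sizeProbe silently cancels all further sampling.
    scheduler.scheduleAtFixedRate(
        () -> maxObserved.accumulateAndGet(sizeProbe.getAsLong(), Math::max),
        0, intervalMs, TimeUnit.MILLISECONDS);
  }

  /** Safe to call from the test thread: AtomicLong reads see the latest published value. */
  long maxObserved() {
    return maxObserved.get();
  }

  @Override
  public void close() {
    scheduler.shutdown();
    try {
      scheduler.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```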

Copilot · 2026-04-14T13:03:51Z

ql/src/test/org/apache/hadoop/hive/ql/TestCachedResults.java

+    conf = envSetup.getTestCtx().hiveConf;
+
+    HiveConf.setBoolVar(conf, HiveConf.ConfVars.HIVE_QUERY_RESULTS_CACHE_ENABLED, true);
+    HiveConf.setVar(conf, HiveConf.ConfVars.HIVE_QUERY_RESULTS_CACHE_DIRECTORY, "/tmp/hive/cache");


Hard-coding /tmp/hive/cache in a test can cause collisions across concurrent test runs and may fail on environments without a writable /tmp (or on non-Unix setups). Prefer using a per-test temporary directory (e.g., test framework temp dirs / java.nio.file.Files#createTempDirectory) and set the conf to that path.
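With JUnit 4, a `TemporaryFolder` rule covers this; with plain JDK APIs, `Files.createTempDirectory` plus a recursive delete is enough. A sketch with a hypothetical `TempCacheDir` helper; the `HiveConf.setVar` line in the comment mirrors the snippet above and is not executed here:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical helper: a unique, self-cleaning results-cache directory per test. */
final class TempCacheDir implements AutoCloseable {
  final Path path;

  TempCacheDir() throws IOException {
    // Created under java.io.tmpdir with a unique suffix, so concurrent runs do not collide.
    path = Files.createTempDirectory("hive-results-cache-");
    // In the test this would then be wired in, e.g.:
    // HiveConf.setVar(conf, HiveConf.ConfVars.HIVE_QUERY_RESULTS_CACHE_DIRECTORY, path.toString());
  }

  @Override
  public void close() throws IOException {
    if (!Files.exists(path)) {
      return;
    }
    // Delete files first, then each directory once its contents are gone.
    Files.walkFileTree(path, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
        Files.delete(file);
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
        if (exc != null) {
          throw exc;
        }
        Files.delete(dir);
        return FileVisitResult.CONTINUE;
      }
    });
  }
}
```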

Copilot · 2026-04-14T13:03:52Z

ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java

+            Path destFile = new Path(resultDir,
+                new Path(fs.getPath().toString().substring(safeDir.length() + 1)));


Building relative paths via substring(safeDir.length() + 1) is fragile: it assumes the string form of each path always starts with safeDir (same scheme/authority, same qualification, no trailing slash differences). If that assumption breaks, it can produce incorrect paths or throw StringIndexOutOfBoundsException. Prefer computing a proper relative path using qualified Path/URI comparison (e.g., qualify both paths and derive the relative suffix from URI paths) and validate that the source is inside safeDir before deriving the destination.

Copilot · 2026-04-14T13:03:52Z

ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java

+          return false;
+        }
+        fetchWork.setFilesToFetch(cacheFilesToFetch);
+        fetchWork.setTblDir(new Path(resultDir, fetchWork.getTblDir().toString().substring(safeDir.length() + 1)));


Building relative paths via substring(safeDir.length() + 1) is fragile: it assumes the string form of each path always starts with safeDir (same scheme/authority, same qualification, no trailing slash differences). If that assumption breaks, it can produce incorrect paths or throw StringIndexOutOfBoundsException. Prefer computing a proper relative path using qualified Path/URI comparison (e.g., qualify both paths and derive the relative suffix from URI paths) and validate that the source is inside safeDir before deriving the destination.
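One way to follow this suggestion, sketched with `java.net.URI` so it runs without Hadoop. With Hadoop, the inputs would be the `toUri()` of paths qualified via `FileSystem#makeQualified(Path)`, and the result would feed `new Path(resultDir, suffix)`. `SafeDirPaths.relativeSuffix` is a hypothetical helper: it requires the same scheme and authority, checks containment on a path-segment boundary so `/tmp/safeX/f` is not treated as inside `/tmp/safe`, and fails loudly instead of producing a wrong destination.

```java
import java.net.URI;

/** Hypothetical helper for deriving a destination suffix below a source directory. */
final class SafeDirPaths {
  private SafeDirPaths() {}

  /**
   * Returns the part of file's path below safeDir, e.g. "q1/000000_0".
   * Assumes hierarchical URIs; throws IllegalArgumentException if file is not strictly inside safeDir.
   */
  static String relativeSuffix(URI safeDir, URI file) {
    URI base = safeDir.normalize();
    URI child = file.normalize();
    // Scheme comparison is case-insensitive; the authority (host, port, user info) must match exactly.
    boolean sameScheme = base.getScheme() == null
        ? child.getScheme() == null
        : base.getScheme().equalsIgnoreCase(child.getScheme());
    boolean sameAuthority = base.getAuthority() == null
        ? child.getAuthority() == null
        : base.getAuthority().equals(child.getAuthority());
    if (!sameScheme || !sameAuthority) {
      throw new IllegalArgumentException(file + " is not on the same filesystem as " + safeDir);
    }
    String basePath = base.getPath().endsWith("/") ? base.getPath() : base.getPath() + "/";
    String childPath = child.getPath();
    // Compare on a segment boundary and reject safeDir itself.
    if (!childPath.startsWith(basePath) || childPath.length() == basePath.length()) {
      throw new IllegalArgumentException(file + " is not inside " + safeDir);
    }
    return childPath.substring(basePath.length());
  }
}
```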

Copilot · 2026-04-14T13:03:53Z

ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java

+        rwLock.writeLock().lock();
+        boolean succeeded = true;
+        try {
+          for (FileStatus fs : fetchWork.getFilesToFetch()) {


The cache write lock is held while performing potentially slow filesystem copies. This can block unrelated cache operations for the duration of the copy and hurt concurrency under load. Consider copying files outside the cache rwLock, and only acquiring the lock for the minimal critical section where shared cache state is updated.

Copilot · 2026-04-14T13:03:53Z

ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java

+            Path srcFile = fs.getPath();
+            Path destFile = new Path(resultDir,
+                new Path(fs.getPath().toString().substring(safeDir.length() + 1)));
+            succeeded = FileUtil.copy(srcFs, srcFile, cacheFs, destFile, false, conf);


The cache write lock is held while performing potentially slow filesystem copies. This can block unrelated cache operations for the duration of the copy and hurt concurrency under load. Consider copying files outside the cache rwLock, and only acquiring the lock for the minimal critical section where shared cache state is updated.
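A minimal sketch of that suggestion, assuming the copy itself touches no shared cache state (whether that holds for `QueryResultsCache` would need checking): the slow copy runs without the lock, and the write lock only covers publishing the entry. `CacheRegistrySketch`, `publish`, `lookup` and the `Callable` copy step are hypothetical stand-ins, not Hive APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical registry showing a narrow write-lock section around shared state only. */
final class CacheRegistrySketch {
  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
  private final Map<String, String> validEntries = new HashMap<>(); // key -> cache location

  /**
   * Runs copyStep (potentially slow I/O) without holding the lock, then takes the write lock
   * only to record the entry. Returns false if the copy reported failure.
   */
  boolean publish(String key, String cacheLocation, Callable<Boolean> copyStep) throws Exception {
    if (!copyStep.call()) {
      return false; // the caller is responsible for cleaning up a partial copy
    }
    Lock writeLock = rwLock.writeLock();
    writeLock.lock();
    try {
      validEntries.put(key, cacheLocation);
      return true;
    } finally {
      writeLock.unlock();
    }
  }

  String lookup(String key) {
    Lock readLock = rwLock.readLock();
    readLock.lock();
    try {
      return validEntries.get(key);
    } finally {
      readLock.unlock();
    }
  }
}
```

The trade-off of moving the copy out of the lock is that the entry's state may change while copying (for example, it could be invalidated), so the locked section would likely need to re-check that before publishing.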

Copilot · 2026-04-14T13:03:53Z

common/src/java/org/apache/hadoop/hive/conf/HiveConf.java

+        "If the query results safe cache is enabled. This will safely write to cache directory by first evaluating " +
+            "the cache entry is not overspilling the the cache directory before writing it to cache directory "),


The new conf var description has grammatical issues (e.g., duplicated 'the', awkward phrasing) and it doesn’t clearly describe the actual behavior implemented (write results to a non-cache destination first, then copy into cache only after the entry is deemed valid/acceptable). Please reword to be precise and user-facing.

Suggested change

"If the query results safe cache is enabled. This will safely write to cache directory by first evaluating " +

"the cache entry is not overspilling the the cache directory before writing it to cache directory "),

"If enabled, query results are written to a temporary non-cache location first and copied into the " +

"cache directory only after the cache entry is accepted as valid, for example after size checks."),

agreed, I was about to comment the same: the configuration description should be clear about the tradeoff (the extra copy), which is paid even under normal circumstances

abstractdog · 2026-04-14T13:05:37Z

common/src/java/org/apache/hadoop/hive/conf/HiveConf.java

+        "If the query results safe cache is enabled. This will safely write to cache directory by first evaluating " +
+            "the cache entry is not overspilling the the cache directory before writing it to cache directory "),


duplicated "the the"

abstractdog · 2026-04-14T13:22:08Z

ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java

+        cacheFs.mkdirs(resultDir);
+
+        Set<FileStatus> cacheFilesToFetch = new HashSet<>();
+        rwLock.writeLock().lock();


I can see that in other places in this class, this is handled via a separate variable:

Lock writeLock = rwLock.writeLock(); try{ writeLock.lock(); }

but that usage pattern is not consistent: feel free to pick one, and unify all the lock usages
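If the separate-variable style is the one kept, the shape shown in the `java.util.concurrent.locks.Lock` javadoc calls `lock()` immediately before `try` and `unlock()` in `finally`. Acquiring inside `try`, as in the quoted snippet, means a failed `lock()` would reach `unlock()` on a lock the thread does not hold. Routing every use through one small helper keeps that shape uniform; `Locks.withLock` is a hypothetical helper, not an existing Hive utility:

```java
import java.util.concurrent.locks.Lock;
import java.util.function.Supplier;

/** Hypothetical helper that enforces one lock/unlock shape everywhere. */
final class Locks {
  private Locks() {}

  /** Runs action while holding lock, using the lock(); try { ... } finally { unlock(); } shape. */
  static <T> T withLock(Lock lock, Supplier<T> action) {
    lock.lock(); // acquire outside try: if this fails, there is nothing to unlock
    try {
      return action.get();
    } finally {
      lock.unlock();
    }
  }
}
```

A call site would then read, for example, `Locks.withLock(rwLock.writeLock(), () -> entries.put(key, value))`, where `entries`, `key` and `value` are placeholders.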

abstractdog · 2026-04-14T13:28:43Z

ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java

        return false;
      }

+      if (isSafeCacheWriteEnabled) {


maybe refactor this whole block to a separate method

abstractdog · 2026-04-14T13:37:44Z

common/src/java/org/apache/hadoop/hive/conf/HiveConf.java

        "If the query results cache is enabled. This will keep results of previously executed queries " +
        "to be reused if the same query is executed again."),
-
+    HIVE_QUERY_RESULTS_SAFE_CACHE_WRITE_ENABLED("hive.query.results.safe.cache.write.enabled", false,


for correct alphabetical order this should rather be hive.query.results.cache.safe.write.enabled

ramitg254 changed the title ~~[WIP]HIVE-29465:Prevent excessive query results cache usage at runtime~~ [WIP] HIVE-29465:Prevent excessive query results cache usage at runtime Mar 18, 2026

asf-ci-hive added tests pending tests unstable and removed tests pending labels Mar 18, 2026

ramitg254 force-pushed the HIVE-29465 branch from 0cd2720 to 2a14d47 Compare March 19, 2026 07:35

asf-ci-hive added tests pending tests passed and removed tests unstable tests pending labels Mar 19, 2026

ramitg254 force-pushed the HIVE-29465 branch from 2a14d47 to 8a94cba Compare March 23, 2026 04:47

asf-ci-hive added tests pending and removed tests passed labels Mar 23, 2026

ramitg254 changed the title ~~[WIP] HIVE-29465:Prevent excessive query results cache usage at runtime~~ HIVE-29465:Prevent excessive query results cache usage at runtime Mar 23, 2026

ramitg254 changed the title ~~HIVE-29465:Prevent excessive query results cache usage at runtime~~ [WIP]HIVE-29465:Prevent excessive query results cache usage at runtime Mar 23, 2026

ramitg254 force-pushed the HIVE-29465 branch from 8a94cba to c7b6adc Compare March 23, 2026 06:26

ramitg254 changed the title ~~[WIP]HIVE-29465:Prevent excessive query results cache usage at runtime~~ HIVE-29465:Prevent excessive query results cache usage at runtime Mar 23, 2026

asf-ci-hive added tests failed tests pending and removed tests pending tests failed labels Mar 23, 2026

ramitg254 force-pushed the HIVE-29465 branch from c7b6adc to bbb5c30 Compare March 23, 2026 06:56

asf-ci-hive added tests failed tests pending and removed tests pending tests failed labels Mar 23, 2026

HIVE-29465:Prevent excessive query results cache usage at runtime

f841c0e

ramitg254 force-pushed the HIVE-29465 branch from bbb5c30 to f841c0e Compare March 23, 2026 07:32

asf-ci-hive added tests failed and removed tests pending tests failed labels Mar 23, 2026

asf-ci-hive added the tests pending label Mar 23, 2026

asf-ci-hive added tests passed and removed tests pending labels Mar 23, 2026

abstractdog reviewed Mar 23, 2026

ramitg254 requested a review from abstractdog March 23, 2026 15:47

abstractdog requested a review from Copilot April 14, 2026 12:54

abstractdog reviewed Apr 14, 2026

Copilot AI reviewed Apr 14, 2026

abstractdog reviewed Apr 14, 2026

Copilot started reviewing on behalf of abstractdog April 14, 2026 13:15

abstractdog reviewed Apr 14, 2026

abstractdog requested changes Apr 14, 2026

abstractdog reviewed Apr 14, 2026

abstractdog requested changes Apr 14, 2026


4 participants
