
[SPARK-56853] Improve PATH Tests #55866

Open
srielau wants to merge 3 commits into apache:master from srielau:SPARK-56853-path-qa-gaps

Conversation

@srielau (Contributor) commented May 13, 2026

What changes were proposed in this pull request?

This is a test-only PR that closes coverage gaps identified by a follow-up QA audit of the SQL Standard PATH feature delivered in SPARK-56489, SPARK-56501, SPARK-56518, SPARK-56520, SPARK-56605, SPARK-56639, SPARK-56681, and SPARK-56750. No product code is changed.

New test surfaces:

  • Spark Connect E2E (sql/connect/.../SqlPathE2ETestSuite.scala, new): SET PATH and current_path() round-trip over the gRPC client; a persisted view created under one path resolves its body under the frozen path even when the invoker switches PATH; SET PATH is rejected with UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED over Connect when the feature flag is off. Previously, ProtoToParsedPlanTestSuite pinned PATH_ENABLED=false for analyzer isolation and nothing exercised the feature over the wire.
  • PySpark API (python/pyspark/sql/tests/test_catalog.py): three new methods added to CatalogTestsMixin (test_path_current_path_disabled, test_path_set_path_and_current_path, test_path_set_path_rejected_when_disabled). Because the mixin is shared with test_parity_catalog.py, the same tests run under classic PySpark and under Connect parity. Previously pyspark.sql.functions.current_path had no test calling it, and SET PATH was never exercised from Python.
  • Golden SQL (sql-tests/inputs/sql-path.sql plus matching results/ and analyzer-results/ outputs): the file has a TOC and ten labeled sections: default path observability; SET PATH grammar (literal, DEFAULT_PATH, SYSTEM_PATH, PATH append, current_schema/current_database, backticks/case); CURRENT_PATH() and the ANSI no-parens form; the full set of static error conditions; routine resolution via PATH for scalar AND table functions; relation resolution via PATH; persisted view frozen-path behavior incl. invoker-context current_schema/current_path; SQL function frozen-path for scalar and table functions incl. the invoker-context guard; the spark.sql.defaultPath conf (override, expansion to conf, invalid-value rejection); and PATH-disabled behaviors. Previously zero .sql input files referenced PATH.
  • COUNT(*) → COUNT(1) rewrite gate — three tests in SetPathSuite exercise the path-driven gate: (1) the fixed-point analyzer (Analyzer.matchesFunctionName / FunctionResolution.isSessionBeforeBuiltinInPath); (2) the gate fires ONLY when a temp count exists (an unrelated temp must not affect the rewrite); (3) the single-pass-resolver counterpart (FunctionResolverUtils.isUnqualifiedCountShadowedByTemp via isNonDistinctCount / handleStarInArguments), gated under spark.sql.analyzer.singlePassResolver.enabled = true and asserted via the analyzed-plan output type (BIGINT vs INT) to avoid the DeserializeToObject / CreateSQLFunctionCommand operators the single-pass analyzer does not yet support.
  • ALTER VIEW ... WITH SCHEMA preserves the frozen path — new test in v1.AlterViewSchemaBindingSuite asserts that generateViewProperties(captureNewPath=false) keeps the persisted VIEW_RESOLUTION_PATH intact even when the caller's session PATH differs from the create-time path.
  • Catalyst unit tests — new SqlPathFormatSuite (toDescribeJson valid / malformed payloads, formatForDisplay quoting behavior, multi-level namespaces). CatalogManagerSuite is extended with direct PathElement.validateNoStaticDuplicates cases (case sensitivity, repeated CurrentSchemaEntry, literal-vs-CurrentSchemaEntry toleration, multi-part and dot-containing identifier error formatting) plus serializePathEntries round-tripping (incl. multi-level / quoted / space-containing identifiers) and deserializePathEntriesOrFail error messaging. The path-element tests live in CatalogManagerSuite rather than a separate suite because they are pure-data tests on a single-purpose helper that CatalogManager is the only consumer of; merging keeps the catalyst test-file count down.
  • PATH-disabled compat for persisted views — two new tests in SQLViewSuite pin the documented behavior of CatalogManager.resolutionPathEntriesForAnalysis when spark.sql.path.enabled=false: the pinned frozen path is dropped and analysis falls back to the recorded viewCatalogAndNamespace. The fully-qualified body path keeps working; the unqualified body resolves through the recorded namespace (or raises TABLE_OR_VIEW_NOT_FOUND when nothing in scope matches).
  • Concurrency smoke test — new SetPathSuite case runs SET PATH (alternating between two paths) and unqualified count(*) lookups concurrently for 200 iterations each, validating the in-source claim that SessionCatalog.lookupBuiltinOrTempFunction is intentionally non-synchronized to avoid lock-order inversion with CatalogManager.synchronized.
  • cloneSession() propagation matrix — replaces the in-source TODO in SetPathSuite ("audit and pin down clone semantics in a follow-up") with six tests pinning what does and does not survive spark.cloneSession(). Confirmed: stored SET PATH, USE SCHEMA, temp views, and temp functions (because functionRegistry.clone() deep-copies them in BaseSessionStateBuilder) propagate; temp variables do not; a child's SET PATH does not leak back to the parent.
  • V2 catalogs in SET PATH — new SqlPathV2CatalogSuite registers two InMemoryCatalog instances and verifies first-match resolution for unqualified tables (id=10 vs id=20 distinguishes which catalog supplied the row) and for unqualified V2 functions (a tiny StrLenTimes100 ScalarFunction[Int] returning s.length * 100 is registered on pathcat2, so strlen('abc') returns 3 or 300 depending on path order), plus the negative case where the unqualified name only lives in a catalog that is not on the path. Previously ProcedureSuite had a single V2-catalog test for the procedure side; tables and functions through real V2 catalogs were uncovered.
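The first-match semantics that SqlPathV2CatalogSuite pins down can be sketched in plain Python. This is an illustrative model only, not Spark's implementation: the catalog names (pathcat2) and the 3-vs-300 distinguishing values come from the PR description, while the Catalog class and resolve helper are hypothetical simplifications.

```python
# Minimal model of SET PATH first-match resolution for unqualified
# function names. The first catalog on the path that defines the name
# supplies the implementation, so path order changes the result.

class Catalog:
    def __init__(self, name, functions):
        self.name = name
        self.functions = functions  # function name -> callable

# pathcat1 exposes a plain strlen; pathcat2 exposes the distinguishable
# StrLenTimes100 variant described in the PR.
pathcat1 = Catalog("pathcat1", {"strlen": lambda s: len(s)})
pathcat2 = Catalog("pathcat2", {"strlen": lambda s: len(s) * 100})

def resolve(path, func_name):
    """Return the first implementation of func_name found on the path."""
    for catalog in path:
        if func_name in catalog.functions:
            return catalog.functions[func_name]
    # Negative case: the name lives in no catalog on the path.
    raise LookupError(f"unresolved routine: {func_name}")

# First-match order decides which catalog wins: 3 vs 300.
print(resolve([pathcat1, pathcat2], "strlen")("abc"))  # 3
print(resolve([pathcat2, pathcat1], "strlen")("abc"))  # 300
```

The same shape applies to the table test in the suite, where id=10 vs id=20 distinguishes which catalog supplied the row.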

Explicitly deferred:

  • DESCRIBE FUNCTION rendering of the frozen path (DescribeFunctionCommandUtils.storedResolutionPathString). DESCRIBE FUNCTION needs broader improvements first; will land in a follow-up.

Why are the changes needed?

A QA audit of the PATH feature found several substantive gaps:

  • Spark Connect serialization of AnalysisContext.resolutionPathEntries was entirely untested; a regression dropping pinned entries from the proto plan would not have been caught.
  • The public current_path() PySpark function was documented (with a +SKIP doctest) but never called from any test.
  • The PATH feature had no SQLQueryTestSuite golden file, despite golden coverage being the project convention for SQL-level features of similar surface (CURRENT_SCHEMA, variables, identifier clauses).
  • The path-driven gate for COUNT(*) → COUNT(1) was a behavior change called out in the SPARK-56750 description but had no test that flipped PATH and verified the suppression -- in either the fixed-point or the single-pass resolver.
  • ALTER VIEW ... WITH SCHEMA preserving the frozen path was undocumented in tests.
  • PathElement, SqlPathFormat, and CatalogManager.serializePathEntries were only exercised end-to-end via SQL.
  • The PATH-disabled read path through a view that already persisted a frozen path was untested (a forward-compat scenario likely to occur during rolling upgrades).
  • The deadlock-safety claim in SessionCatalog about non-synchronized lookups was unvalidated.
  • The cloneSession() propagation matrix was flagged as "incidental" with an in-source TODO.
  • Real V2 catalogs in SET PATH were not exercised end-to-end (only stub cat/cat2 for procedures).

These tests pin the documented intent so future changes have to update the assertions deliberately.
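The concurrency smoke test's shape can be illustrated with a small stdlib-only sketch. This is a hypothetical analogue, not the Spark test: a writer thread alternates a shared "path" under a lock (mirroring CatalogManager.synchronized) while a reader performs deliberately lock-free lookups (mirroring the non-synchronized lookupBuiltinOrTempFunction), and the assertion is only that readers always observe one of the two legal states.

```python
import threading

PATH_A = ("a",)
PATH_B = ("b",)
state = {"path": PATH_A}
lock = threading.Lock()
errors = []

def setter(iterations=200):
    # Writers synchronize among themselves.
    for i in range(iterations):
        with lock:
            state["path"] = PATH_A if i % 2 == 0 else PATH_B

def reader(iterations=200):
    # Intentionally lock-free reads: taking the lock here would recreate
    # the lock-order-inversion risk the real code avoids.
    for _ in range(iterations):
        seen = state["path"]
        if seen not in (PATH_A, PATH_B):
            errors.append(seen)

threads = [threading.Thread(target=setter), threading.Thread(target=reader)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert not errors  # readers never saw an invalid path
print("ok")
```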

Does this PR introduce any user-facing change?

No. Test-only changes.

How was this patch tested?

The new tests are themselves the patch. They all pass locally:

  • build/sbt "connect-client-jvm/testOnly *SqlPathE2ETestSuite"
  • build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z sql-path.sql" (and regenerated via SPARK_GENERATE_GOLDEN_FILES=1)
  • build/sbt "sql/testOnly org.apache.spark.sql.SetPathSuite"
  • build/sbt "sql/testOnly org.apache.spark.sql.execution.command.v1.AlterViewSchemaBindingSuite"
  • build/sbt "sql/testOnly *SimpleSQLViewSuite -- -z SPARK-56853"
  • build/sbt "sql/testOnly *SqlPathV2CatalogSuite"
  • build/sbt "catalyst/testOnly *SqlPathFormatSuite *CatalogManagerSuite"
  • ./dev/lint-scala passes (Scalastyle and Scalafmt).

PySpark tests (test_path_* in CatalogTestsMixin and the Connect parity variant) are syntactically validated; they are picked up automatically by the existing pyspark.sql.tests.test_catalog and pyspark.sql.tests.connect.test_parity_catalog modules registered in dev/sparktestsupport/modules.py and will run in CI.

Was this patch authored or co-authored using generative AI tooling?

Yes. Test design and implementation were iterated with coding-assistant tooling; the author reviewed and owns the final patch.

Generated-by: Cursor with Claude Opus 4.7

@cloud-fan (Contributor) left a comment


This is a test-only PR closing coverage gaps in the SQL Standard PATH feature (SPARK-56489 et al.). Coverage is broad and the in-source claims that the tests reference (Analyzer.matchesFunctionName, lookupBuiltinOrTempFunction non-synchronization, BaseSessionStateBuilder.functionRegistry.clone(), generateViewProperties(captureNewPath = false), resolutionPathEntriesForAnalysis semantics) all hold up.

A few small consistency / accuracy items inline. None blocking.

Summary of additions

  • PySpark API (current_path, SET PATH, disabled-feature error) — runs under classic and Connect parity via CatalogTestsMixin
  • Spark Connect E2E — proves AnalysisContext.resolutionPathEntries survives gRPC plan reification; persisted view frozen-path preserved across SET PATH
  • Golden SQL (sql-path.sql + analyzer/results) — 10 sections covering grammar, shortcuts, errors, frozen path, DEFAULT_PATH conf, and disabled state
  • COUNT(*) → COUNT(1) rewrite gate — suppression when session shadows builtin under SET PATH
  • ALTER VIEW ... WITH SCHEMA preserves the frozen path (v1 only; v2 uses viewInfoBuilderFrom and doesn't hit the same code path)
  • Catalyst unit tests — SqlPathFormat, PathElement.validateNoStaticDuplicates, CatalogManager.serializePathEntries round-trip / deserializePathEntriesOrFail error format
  • cloneSession matrix — 6 tests pinning what propagates: SET PATH ✓, USE ✓, temp views ✓, temp functions ✓ (with snapshot semantics check), temp variables ✗, child→parent leak ✗
  • V2 catalogs in SET PATH — table resolution (first-match verified) and function resolution
  • PATH-disabled view fallback — pinned entries dropped, falls back to viewCatalogAndNamespace
  • Concurrency smoke test — validates the non-synchronized lookupBuiltin* claim in SessionCatalog

On the "Explicitly deferred" note

The PR description says testing FunctionResolverUtils.isUnqualifiedCountShadowedByTemp "is not actionable yet" because "the single-pass resolver wiring for PATH was reverted in the SPARK-56750 follow-ups." That method still exists at FunctionResolverUtils.scala:115 and is called by isNonDistinctCount at line 107 — the wiring looks live, not reverted. Either the claim is stale (and a test parallel to the fixed-point SetPathSuite cases is in fact actionable), or there's a runtime gate that keeps this path dormant. Worth saying which one so a future author knows when the deferral resolves.

Comment thread sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala Outdated
@srielau (Contributor, Author) commented May 14, 2026

Thanks for the careful review @cloud-fan! All four items addressed in 542eee8:

  • SqlPathV2CatalogSuite first-match function test — your read was right, the two StrLen impls returned identical lengths so swapping path order proved nothing. Now uses a tiny StrLenTimes100 ScalarFunction[Int] on pathcat2 so strlen('abc') returns 3 or 300 depending on path order. Structurally symmetric with the table test (id=10 vs id=20).
  • SQLViewSuite test names — SPARK-56853: prefix restored on both new tests; matches the file-local convention (your SPARK-56639 frozen-path tests right next to them are feature tests with the prefix too, so the convention isn't bug-regression-only).
  • CatalogManagerSuite / PathElementSuite — taking the "update the PR description" option. PathElement is a small single-purpose helper whose only consumer is CatalogManager, so the seven validate-no-duplicates tests sit naturally alongside the existing serialize/deserialize cases in the sibling suite. PR description corrected to match. Happy to re-extract if you'd prefer the strict source-mirror pattern.
  • "Explicitly deferred" claim about the single-pass resolver — you were right, my claim was stale. FunctionResolverUtils.isUnqualifiedCountShadowedByTemp IS live and wired into isNonDistinctCount / handleStarInArguments; only the spark.sql.analyzer.singlePassResolver.enabled conf default keeps the path dormant in CI. Added a parallel SetPathSuite test that enables that conf for the SELECT only (CREATE TEMPORARY FUNCTION and SET PATH stay on the fixed-point analyzer because the single-pass analyzer doesn't yet support those operators) and asserts the analyzed-plan output dataType (BIGINT when the shortcut fires, INT when the gate suppresses it and the temp count(INT) RETURN x + 100 wins). Bypassing checkAnswer avoids the DeserializeToObject operator the single-pass analyzer also doesn't yet support, but the gate predicate is exercised end-to-end through the analyzer. The "deferred" bullet for this item has been removed from the PR description.
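The gate predicate the last item exercises can be modeled in a few lines of plain Python. This is an illustrative simplification of the behavior the test asserts, not Spark code: the count(*) → count(1) shortcut fires only when no temporary function named count shadows the builtin, and the analyzed output type (BIGINT from the builtin vs INT from the temp definition) reveals which path was taken.

```python
BUILTIN_COUNT_TYPE = "BIGINT"  # builtin count(1) returns a BIGINT

def analyze_count_star(temp_functions):
    """Resolve an unqualified count(*): return the result type.

    temp_functions maps temp-function name -> result type. If a temp
    function named `count` exists, the rewrite to count(1) is suppressed
    and the temp function's type wins; otherwise the builtin wins.
    """
    if "count" in temp_functions:
        return temp_functions["count"]  # gate suppresses the rewrite
    return BUILTIN_COUNT_TYPE           # shortcut fires: builtin count(1)

# No temp count registered: the rewrite fires, builtin wins.
assert analyze_count_star({}) == "BIGINT"
# An unrelated temp function must not affect the rewrite.
assert analyze_count_star({"other": "INT"}) == "BIGINT"
# A temp count shadows the builtin: the gate suppresses, INT wins.
assert analyze_count_star({"count": "INT"}) == "INT"
print("all three gate cases hold")
```

These three cases correspond to the three SetPathSuite tests described in the PR summary, with the type assertion standing in for the analyzed-plan dataType check.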

srielau added 3 commits May 14, 2026 14:05
Follow-up changes after the initial PR landed on the branch:

- Strip the SPARK-56853 JIRA tag from test names, comments, and file
  headers across the new tests. These are feature-coverage tests, not
  bug regressions, so the JIRA label does not belong inline.
- Correct the inaccurate "view's home schema" framing in the two new
  PATH-disabled compat tests in `SQLViewSuite`. The fallback target is
  the view's `viewCatalogAndNamespace` property -- i.e. the creator
  session's USE state captured at CREATE VIEW time -- not the schema
  the view physically lives in. The previous comment overgeneralized
  from a setup choice (USE-ing the destination schema before CREATE)
  to the general behavior.
- Merge `PathElementSuite` into `CatalogManagerSuite`. The seven
  `PathElement.validateNoStaticDuplicates` tests are pure data tests
  living in the same package as the consumer; folding them into the
  sibling suite keeps the test file count down without losing coverage.
  `CatalogManagerSuite` is now 16 tests grouped by section comment.
- Restructure `sql-tests/inputs/sql-path.sql` as the primary SQL-level
  reference for the feature. Adds a self-describing header, a Table of
  Contents, and ten clearly delimited sections covering: default path
  observability; SET PATH grammar (literal, DEFAULT_PATH, SYSTEM_PATH,
  PATH append, current_schema/current_database); CURRENT_PATH() builtin
  and its ANSI no-parens form; the full set of static error conditions;
  routine resolution via PATH for scalar AND table functions; relation
  resolution via PATH; persisted view frozen-path behavior including
  invoker-context current_schema/current_path; SQL function frozen-path
  behavior for scalar and table functions including the invoker-context
  guard; the `spark.sql.defaultPath` conf (explicit override, expansion,
  rejection of invalid values and the PATH keyword); and PATH-disabled
  behaviors. The corresponding `results/` and `analyzer-results/`
  golden files were regenerated and reviewed.
…ud-fan

- SqlPathV2CatalogSuite: the V2-function first-match test was using two
  `StrLen` impls that both return `s.length`, so swapping path order did
  not change the row and the test only proved "neither catalog raised
  NOT_FOUND". Add a tiny distinguishable `StrLenTimes100` ScalarFunction
  next to the suite and register it on `pathcat2` so the result is `3`
  vs `300` depending on which catalog supplies the function -- now
  symmetric with the parallel table test (id=10 vs id=20).
- SQLViewSuite: restore the `SPARK-56853:` prefix on the two new tests
  to match the file-local convention (all surrounding tests, including
  the SPARK-56639 feature tests right next to them, use the JIRA prefix).
  The PR description's own `-z SPARK-56853` command now matches.
- SetPathSuite: the single-pass-resolver counterpart of the COUNT(*)
  rewrite gate IS actionable -- `FunctionResolverUtils.isUnqualifiedCount
  ShadowedByTemp` is wired into `isNonDistinctCount` /
  `handleStarInArguments`; it is only the `singlePassResolver.enabled`
  conf default that keeps the path dormant. Add a parallel test that
  enables `spark.sql.analyzer.singlePassResolver.enabled` for the SELECT
  (CREATE TEMPORARY FUNCTION and SET PATH stay on the fixed-point
  analyzer because the single-pass analyzer does not yet support those
  operators) and asserts the analyzed-plan output type: BIGINT when the
  rewrite fires (builtin count) vs INT when the gate suppresses it and
  the temp count(INT) RETURN x + 100 wins. Bypassing `checkAnswer` avoids
  the `DeserializeToObject` operator the single-pass analyzer also does
  not yet support, while still exercising the analyzer-side gate.
@srielau force-pushed the SPARK-56853-path-qa-gaps branch from 542eee8 to 0d3c0ea on May 14, 2026 21:05