[SPARK-56853] Improve PATH Tests#55866
Conversation
cloud-fan
left a comment
There was a problem hiding this comment.
This is a test-only PR closing coverage gaps in the SQL Standard PATH feature (SPARK-56489 et al.). Coverage is broad and the in-source claims that the tests reference (Analyzer.matchesFunctionName, lookupBuiltinOrTempFunction non-synchronization, BaseSessionStateBuilder.functionRegistry.clone(), generateViewProperties(captureNewPath = false), resolutionPathEntriesForAnalysis semantics) all hold up.
A few small consistency / accuracy items inline. None blocking.
Summary of additions
- PySpark API (
current_path,SET PATH, disabled-feature error) — runs under classic and Connect parity viaCatalogTestsMixin - Spark Connect E2E — proves
AnalysisContext.resolutionPathEntriessurvives gRPC plan reification; persisted view frozen-path preserved acrossSET PATH - Golden SQL (
sql-path.sql+ analyzer/results) — 10 sections covering grammar, shortcuts, errors, frozen path,DEFAULT_PATHconf, and disabled state COUNT(*) → COUNT(1)rewrite gate — suppression when session shadows builtin underSET PATHALTER VIEW ... WITH SCHEMApreserves the frozen path (v1 only; v2 usesviewInfoBuilderFromand doesn't hit the same code path)- Catalyst unit tests —
SqlPathFormat,PathElement.validateNoStaticDuplicates,CatalogManager.serializePathEntriesround-trip /deserializePathEntriesOrFailerror format cloneSessionmatrix — 6 tests pinning what propagates:SET PATH✓,USE✓, temp views ✓, temp functions ✓ (with snapshot semantics check), temp variables ✗, child→parent leak ✗- V2 catalogs in
SET PATH— table resolution (first-match verified) and function resolution - PATH-disabled view fallback — pinned entries dropped, falls back to
viewCatalogAndNamespace - Concurrency smoke test — validates the non-synchronized
lookupBuiltin*claim inSessionCatalog
On the "Explicitly deferred" note
The PR description says testing FunctionResolverUtils.isUnqualifiedCountShadowedByTemp "is not actionable yet" because "the single-pass resolver wiring for PATH was reverted in the SPARK-56750 follow-ups." That method still exists at FunctionResolverUtils.scala:115 and is called by isNonDistinctCount at line 107 — the wiring looks live, not reverted. Either the claim is stale (and a test parallel to the fixed-point SetPathSuite cases is in fact actionable), or there's a runtime gate that keeps this path dormant. Worth saying which one so a future author knows when the deferral resolves.
|
Thanks for the careful review @cloud-fan! All four items addressed in 542eee8:
|
Follow-up changes after the initial PR landed on the branch: - Strip the SPARK-56853 JIRA tag from test names, comments, and file headers across the new tests. These are feature-coverage tests, not bug regressions, so the JIRA label does not belong inline. - Correct the inaccurate "view's home schema" framing in the two new PATH-disabled compat tests in `SQLViewSuite`. The fallback target is the view's `viewCatalogAndNamespace` property -- i.e. the creator session's USE state captured at CREATE VIEW time -- not the schema the view physically lives in. The previous comment overgeneralized from a setup choice (USE-ing the destination schema before CREATE) to the general behavior. - Merge `PathElementSuite` into `CatalogManagerSuite`. The seven `PathElement.validateNoStaticDuplicates` tests are pure data tests living in the same package as the consumer; folding them into the sibling suite keeps the test file count down without losing coverage. `CatalogManagerSuite` is now 16 tests grouped by section comment. - Restructure `sql-tests/inputs/sql-path.sql` as the primary SQL-level reference for the feature. Adds a self-describing header, a Table of Contents, and ten clearly delimited sections covering: default path observability; SET PATH grammar (literal, DEFAULT_PATH, SYSTEM_PATH, PATH append, current_schema/current_database); CURRENT_PATH() builtin and its ANSI no-parens form; the full set of static error conditions; routine resolution via PATH for scalar AND table functions; relation resolution via PATH; persisted view frozen-path behavior including invoker-context current_schema/current_path; SQL function frozen-path behavior for scalar and table functions including the invoker-context guard; the `spark.sql.defaultPath` conf (explicit override, expansion, rejection of invalid values and the PATH keyword); and PATH-disabled behaviors. The corresponding `results/` and `analyzer-results/` golden files were regenerated and reviewed.
…ud-fan - SqlPathV2CatalogSuite: the V2-function first-match test was using two `StrLen` impls that both return `s.length`, so swapping path order did not change the row and the test only proved "neither catalog raised NOT_FOUND". Add a tiny distinguishable `StrLenTimes100` ScalarFunction next to the suite and register it on `pathcat2` so the result is `3` vs `300` depending on which catalog supplies the function -- now symmetric with the parallel table test (id=10 vs id=20). - SQLViewSuite: restore the `SPARK-56853:` prefix on the two new tests to match the file-local convention (all surrounding tests, including the SPARK-56639 feature tests right next to them, use the JIRA prefix). The PR description's own `-z SPARK-56853` command now matches. - SetPathSuite: the single-pass-resolver counterpart of the COUNT(*) rewrite gate IS actionable -- `FunctionResolverUtils.isUnqualifiedCount ShadowedByTemp` is wired into `isNonDistinctCount` / `handleStarInArguments`; it is only the `singlePassResolver.enabled` conf default that keeps the path dormant. Add a parallel test that enables `spark.sql.analyzer.singlePassResolver.enabled` for the SELECT (CREATE TEMPORARY FUNCTION and SET PATH stay on the fixed-point analyzer because the single-pass analyzer does not yet support those operators) and asserts the analyzed-plan output type: BIGINT when the rewrite fires (builtin count) vs INT when the gate suppresses it and the temp count(INT) RETURN x + 100 wins. Bypassing `checkAnswer` avoids the `DeserializeToObject` operator the single-pass analyzer also does not yet support, while still exercising the analyzer-side gate.
542eee8 to
0d3c0ea
Compare
What changes were proposed in this pull request?
This is a test-only PR that closes coverage gaps identified by a follow-up QA audit of the SQL Standard PATH feature delivered in SPARK-56489, SPARK-56501, SPARK-56518, SPARK-56520, SPARK-56605, SPARK-56639, SPARK-56681, and SPARK-56750. No product code is changed.
New test surfaces:
sql/connect/.../SqlPathE2ETestSuite.scala(new):SET PATHandcurrent_path()round-trip over the gRPC client; a persisted view created under one path resolves its body under the frozen path even when the invoker switches PATH;SET PATHis rejected withUNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLEDover Connect when the feature flag is off. Previously,ProtoToParsedPlanTestSuitepinnedPATH_ENABLED=falsefor analyzer isolation and nothing exercised the feature over the wire.python/pyspark/sql/tests/test_catalog.py: three new methods added toCatalogTestsMixin(test_path_current_path_disabled,test_path_set_path_and_current_path,test_path_set_path_rejected_when_disabled). Because the mixin is shared withtest_parity_catalog.py, the same tests run under classic PySpark and under Connect parity. Previouslypyspark.sql.functions.current_pathhad no test calling it, andSET PATHwas never exercised from Python.sql-tests/inputs/sql-path.sql+ matchingresults/andanalyzer-results/outputs. The file has a TOC and ten labeled sections: default path observability; SET PATH grammar (literal, DEFAULT_PATH, SYSTEM_PATH, PATH append, current_schema/current_database, backticks/case); CURRENT_PATH() and the ANSI no-parens form; the full set of static error conditions; routine resolution via PATH for scalar AND table functions; relation resolution via PATH; persisted view frozen-path behavior incl. invoker-context current_schema/current_path; SQL function frozen-path for scalar and table functions incl. the invoker-context guard; thespark.sql.defaultPathconf (override, expansion to conf, invalid-value rejection); and PATH-disabled behaviors. Previously zero.sqlinput files referenced PATH.COUNT(*) → COUNT(1)rewrite gate — three tests inSetPathSuiteexercise the path-driven gate: (1) the fixed-point analyzer (Analyzer.matchesFunctionName/FunctionResolution.isSessionBeforeBuiltinInPath); (2) the gate fires ONLY when a temp count exists (an unrelated temp must not affect the rewrite); (3) the single-pass-resolver counterpart (FunctionResolverUtils.isUnqualifiedCountShadowedByTempviaisNonDistinctCount/handleStarInArguments), gated underspark.sql.analyzer.singlePassResolver.enabled = trueand asserted via the analyzed-plan output type (BIGINT vs INT) to avoid theDeserializeToObject/CreateSQLFunctionCommandoperators the single-pass analyzer does not yet support.ALTER VIEW ... WITH SCHEMApreserves the frozen path — new test inv1.AlterViewSchemaBindingSuiteasserts thatgenerateViewProperties(captureNewPath=false)keeps the persistedVIEW_RESOLUTION_PATHintact even when the caller's session PATH differs from the create-time path.SqlPathFormatSuite(toDescribeJsonvalid / malformed payloads,formatForDisplayquoting behavior, multi-level namespaces).CatalogManagerSuiteis extended with directPathElement.validateNoStaticDuplicatescases (case sensitivity, repeatedCurrentSchemaEntry, literal-vs-CurrentSchemaEntrytoleration, multi-part and dot-containing identifier error formatting) plusserializePathEntriesround-tripping (incl. multi-level / quoted / space-containing identifiers) anddeserializePathEntriesOrFailerror messaging. The path-element tests live inCatalogManagerSuiterather than a separate suite because they are pure-data tests on a single-purpose helper thatCatalogManageris the only consumer of; merging keeps the catalyst test-file count down.SQLViewSuitepin the documented behavior ofCatalogManager.resolutionPathEntriesForAnalysiswhenspark.sql.path.enabled=false: the pinned frozen path is dropped and analysis falls back to the recordedviewCatalogAndNamespace. The fully-qualified body path keeps working; the unqualified body resolves through the recorded namespace (or raisesTABLE_OR_VIEW_NOT_FOUNDwhen nothing in scope matches).SetPathSuitecase runsSET PATH(alternating between two paths) and unqualifiedcount(*)lookups concurrently for 200 iterations each, validating the in-source claim thatSessionCatalog.lookupBuiltinOrTempFunctionis intentionally non-synchronized to avoid lock-order inversion withCatalogManager.synchronized.cloneSession()propagation matrix — replaces the in-sourceTODOinSetPathSuite("audit and pin down clone semantics in a follow-up") with six tests pinning what does and does not survivespark.cloneSession(). Confirmed: storedSET PATH,USE SCHEMA, temp views, and temp functions (becausefunctionRegistry.clone()deep-copies them inBaseSessionStateBuilder) propagate; temp variables do not; a child'sSET PATHdoes not leak back to the parent.SET PATH— newSqlPathV2CatalogSuiteregisters twoInMemoryCataloginstances and verifies first-match resolution for unqualified tables (id=10 vs id=20 distinguishes which catalog supplied the row) and for unqualified V2 functions (a tinyStrLenTimes100ScalarFunction[Int]returnings.length * 100is registered onpathcat2, sostrlen('abc')returns 3 or 300 depending on path order), plus the negative case where the unqualified name only lives in a catalog that is not on the path. PreviouslyProcedureSuitehad a single V2-catalog test for the procedure side; tables and functions through real V2 catalogs were uncovered.Explicitly deferred:
DescribeFunctionCommandUtils.storedResolutionPathString). DESCRIBE FUNCTION needs broader improvements first; will land in a follow-up.Why are the changes needed?
A QA audit of the PATH feature found several substantive gaps:
AnalysisContext.resolutionPathEntrieswas entirely untested; a regression dropping pinned entries from the proto plan would not have been caught.current_path()PySpark function was documented (with a+SKIPdoctest) but never called from any test.SQLQueryTestSuitegolden file, despite golden coverage being the project convention for SQL-level features of similar surface (CURRENT_SCHEMA, variables, identifier clauses).COUNT(*) → COUNT(1)was a behavior change called out in the SPARK-56750 description but had no test that flipped PATH and verified the suppression -- in either the fixed-point or the single-pass resolver.ALTER VIEW ... WITH SCHEMApreserving the frozen path was undocumented in tests.PathElement,SqlPathFormat, andCatalogManager.serializePathEntrieswere only exercised end-to-end via SQL.SessionCatalogabout non-synchronized lookups was unvalidated.cloneSession()propagation matrix was flagged as "incidental" with an in-source TODO.SET PATHwere not exercised end-to-end (only stubcat/cat2for procedures).These tests pin the documented intent so future changes have to update the assertions deliberately.
Does this PR introduce any user-facing change?
No. Test-only changes.
How was this patch tested?
The new tests are themselves the patch. They all pass locally:
build/sbt "connect-client-jvm/testOnly *SqlPathE2ETestSuite"build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z sql-path.sql"(and regenerated viaSPARK_GENERATE_GOLDEN_FILES=1)build/sbt "sql/testOnly org.apache.spark.sql.SetPathSuite"build/sbt "sql/testOnly org.apache.spark.sql.execution.command.v1.AlterViewSchemaBindingSuite"build/sbt "sql/testOnly *SimpleSQLViewSuite -- -z SPARK-56853"build/sbt "sql/testOnly *SqlPathV2CatalogSuite"build/sbt "catalyst/testOnly *SqlPathFormatSuite *CatalogManagerSuite"./dev/lint-scalapasses (Scalastyle and Scalafmt).PySpark tests (
test_path_*inCatalogTestsMixinand the Connect parity variant) are syntactically validated; they are picked up automatically by the existingpyspark.sql.tests.test_catalogandpyspark.sql.tests.connect.test_parity_catalogmodules registered indev/sparktestsupport/modules.pyand will run in CI.Was this patch authored or co-authored using generative AI tooling?
Yes. Test design and implementation were iterated with coding-assistant tooling; the author reviewed and owns the final patch.
Generated-by: Cursor with Claude Opus 4.7