Add Vector Search semantic product discovery example #153

Open
janniklasrose wants to merge 2 commits into main from janniklasrose/vector-search-example

@janniklasrose

Summary

Adds a Declarative Automation Bundle under contrib/vector_search_product_discovery/ that demonstrates semantic product search end-to-end with Databricks Vector Search:

  • vector_search_endpoints + vector_search_indexes declared as bundle resources, with jobs referencing them via ${resources.*.name} so dev-mode prefixing flows through automatically
  • Direct Access index, deployed with the bundle's direct engine (engine: direct in databricks.yml, so no environment variable is needed). 01_upsert_products.py embeds the descriptions explicitly, and the query notebook embeds the query before calling similarity_search, because Direct Access indexes don't auto-embed (that's a Delta Sync feature)
  • schema_json uses the flat {"col":"type"} form required by the API
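
To illustrate the last two bullets, here is a minimal, self-contained sketch of what explicit embedding plus a flat schema might look like. It is not taken from 01_upsert_products.py: the column names, the `toy_embed` function, and the 4-dimensional vectors are hypothetical stand-ins. A real notebook would call an embedding model and pass the resulting rows to the index's upsert call.

```python
import hashlib
import json
import math

EMBEDDING_DIM = 4  # hypothetical; real embedding models use far more dimensions

# Flat {"col": "type"} mapping, serialized to a JSON string for schema_json.
# Column names and types here are illustrative assumptions.
SCHEMA = {
    "product_id": "int",
    "description": "string",
    "embedding": "array<float>",
}
schema_json = json.dumps(SCHEMA)


def toy_embed(text: str, dim: int = EMBEDDING_DIM) -> list[float]:
    """Deterministic stand-in for an embedding model (not semantically meaningful)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    raw = [digest[i] / 255.0 for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in raw)) or 1.0
    return [x / norm for x in raw]


def build_rows(products: list[dict]) -> list[dict]:
    """Attach an explicit embedding to each product, since the index will not compute one."""
    rows = []
    for product in products:
        row = {**product, "embedding": toy_embed(product["description"])}
        # Guard against rows drifting from the declared columns.
        if set(row) != set(SCHEMA):
            raise ValueError(f"row columns {sorted(row)} do not match schema")
        rows.append(row)
    return rows


products = [
    {"product_id": 1, "description": "Waterproof trail running shoes"},
    {"product_id": 2, "description": "Insulated stainless steel bottle"},
]
rows = build_rows(products)

# The query text is embedded the same way before it is sent to the index.
query_vector = toy_embed("footwear for slippery wet trails")
```

The same function is used for both descriptions and queries so that the vectors live in the same space; swapping `toy_embed` for a real model would not change the shape of the rows.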

Dependency

Requires databricks/cli#5123 (still open), which lands vector_search_indexes as a first-class DABs resource on the direct engine. Until that PR merges and ships in a CLI release, databricks bundle deploy against this example will fail to recognize the vector_search_indexes resource type.

Test plan

  • databricks bundle validate against a CLI built from databricks/cli#5123 ("Add vector_search_indexes resource (direct engine)")
  • databricks bundle deploy — endpoint reaches ONLINE, index created
  • databricks bundle run product_discovery_setup — products embedded and upserted
  • databricks bundle run product_discovery_query --params "query=footwear for slippery wet trails" — returns ranked results
  • databricks bundle destroy — clean teardown
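
The steps above could be scripted roughly as follows. This is only a sketch: the job keys and the `--params` value come from the test plan, while the `run_plan` helper and its `dry_run` flag are hypothetical conveniences. It assumes a `databricks` CLI with vector_search_indexes support is on the PATH and that it is run from the bundle directory.

```python
import shlex
import subprocess

QUERY = "footwear for slippery wet trails"

# Commands in the order given by the test plan.
PLAN = [
    ["databricks", "bundle", "validate"],
    ["databricks", "bundle", "deploy"],
    ["databricks", "bundle", "run", "product_discovery_setup"],
    ["databricks", "bundle", "run", "product_discovery_query", "--params", f"query={QUERY}"],
    ["databricks", "bundle", "destroy"],
]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    rendered = []
    for argv in plan:
        line = shlex.join(argv)
        rendered.append(line)
        print(line)
        if not dry_run:
            # check=True stops the plan at the first failing step.
            subprocess.run(argv, check=True)
    return rendered


if __name__ == "__main__":
    run_plan(PLAN, dry_run=True)
```

With `dry_run=True` nothing is deployed; the script only prints the commands. Passing arguments as a list avoids shell quoting issues with the multi-word query. The destroy step may prompt for confirmation when executed for real.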

This pull request and its description were written by Isaac.

Demonstrates a Direct Access Vector Search index and endpoint declared
as bundle resources (vector_search_endpoints, vector_search_indexes),
tested e2e against staging with the direct engine.

Key design decisions:
- Jobs use resource references (${resources.*.name}) for endpoint and
  index names so dev-mode prefixing flows through automatically
- schema_json uses flat {"col":"type"} format required by the API
- Notebooks embed descriptions/queries explicitly (Direct Access indexes
  don't auto-embed; that's a Delta Sync feature)
- engine: direct set in bundle config so no env var is needed

Co-authored-by: Isaac