Below is an AI generated summary of a bug I came across today:
Summary
optimizeOrExpression in packages/db/src/utils/index-optimization.ts is unsound. When a where clause contains an or(...) whose branches reference a mix of indexed and unindexed fields, the optimizer returns canOptimize: true with a result set that is the union of only the indexed branches — silently dropping items that would match the unindexed branch.
The result is that useLiveQuery / currentStateAsChanges returns an incomplete subset of the matching rows, with no warning or error. Callers don't fall back to a full scan because canOptimize is true.
Where the bug is
packages/db/src/utils/index-optimization.ts (verified on main and on the published 0.6.5 tarball — identical code):
function optimizeOrExpression(expression, collection) {
if (expression.type !== 'func' || expression.args.length < 2) {
return { canOptimize: false, matchingKeys: new Set() }
}
const results = []
// Try to optimize each part, keep the optimizable ones
for (const arg of expression.args) {
const result = optimizeQueryRecursive(arg, collection)
if (result.canOptimize) {
results.push(result)
}
// ⚠️ non-optimizable args are dropped on the floor
}
if (results.length > 0) {
const allMatchingSets = results.map((r) => r.matchingKeys)
const unionedKeys = unionSets(allMatchingSets)
return { canOptimize: true, matchingKeys: unionedKeys }
// ⚠️ canOptimize: true even though some branches were skipped
}
return { canOptimize: false, matchingKeys: new Set() }
}
For OR semantics this is unsound. If any branch can't be optimized, the optimizer can't soundly compute the answer from the indexes alone — the missing branch could match items not in any indexed branch's result set. The function should return canOptimize: false and let the consumer fall back to a full scan.
The corresponding optimizeAndExpression is fine because for AND, dropping a non-optimizable arg only widens the result set (the consumer is expected to apply the full predicate to that superset). For OR it narrows it.
Repro
Conceptual scenario that triggers it in practice:
- A collection has items where some match
field_A = X, others match field_B = Y, with no overlap.
- Some prior query has caused
field_A to be auto-indexed (autoIndex: 'eager' is the default).
field_B has not been queried in isolation, so no index exists on it.
- Run a live query with
where: or(eq(field_A, X), eq(field_B, Y)).
Expected: rows matching either branch.
Actual: only rows matching field_A = X. Rows matching field_B = Y (and not field_A = X) are dropped.
If you reorder the queries so a query touches field_B first (causing auto-indexing on field_B), the bug disappears, because now both branches are optimizable.
Suggested fix
function optimizeOrExpression(expression, collection) {
if (expression.type !== 'func' || expression.args.length < 2) {
return { canOptimize: false, matchingKeys: new Set() }
}
const results = []
for (const arg of expression.args) {
const result = optimizeQueryRecursive(arg, collection)
if (!result.canOptimize) {
// OR with any non-optimizable branch cannot be soundly index-optimized
return { canOptimize: false, matchingKeys: new Set() }
}
results.push(result)
}
const unionedKeys = unionSets(results.map((r) => r.matchingKeys))
return { canOptimize: true, matchingKeys: unionedKeys }
}
Below is an AI generated summary of a bug I came across today:
Summary
optimizeOrExpressioninpackages/db/src/utils/index-optimization.tsis unsound. When awhereclause contains anor(...)whose branches reference a mix of indexed and unindexed fields, the optimizer returnscanOptimize: truewith a result set that is the union of only the indexed branches — silently dropping items that would match the unindexed branch.The result is that
useLiveQuery/currentStateAsChangesreturns an incomplete subset of the matching rows, with no warning or error. Callers don't fall back to a full scan becausecanOptimizeistrue.Where the bug is
packages/db/src/utils/index-optimization.ts(verified onmainand on the published 0.6.5 tarball — identical code):For OR semantics this is unsound. If any branch can't be optimized, the optimizer can't soundly compute the answer from the indexes alone — the missing branch could match items not in any indexed branch's result set. The function should return
canOptimize: falseand let the consumer fall back to a full scan.The corresponding
optimizeAndExpressionis fine because for AND, dropping a non-optimizable arg only widens the result set (the consumer is expected to apply the full predicate to that superset). For OR it narrows it.Repro
Conceptual scenario that triggers it in practice:
field_A = X, others matchfield_B = Y, with no overlap.field_Ato be auto-indexed (autoIndex: 'eager'is the default).field_Bhas not been queried in isolation, so no index exists on it.where: or(eq(field_A, X), eq(field_B, Y)).Expected: rows matching either branch.
Actual: only rows matching
field_A = X. Rows matchingfield_B = Y(and notfield_A = X) are dropped.If you reorder the queries so a query touches
field_Bfirst (causing auto-indexing onfield_B), the bug disappears, because now both branches are optimizable.Suggested fix