
fix: change message indexes to sort created_at descending #147

Open
smoreinis wants to merge 9 commits into main from stas/messages-index

Conversation


@smoreinis smoreinis commented Feb 9, 2026

Summary

  • Sort created_at descending in message indexes, matching the typical newest-first query pattern
  • Remove redundant standalone task_id_idx from both messages and task_states collections — the compound indexes with task_id as leftmost prefix already cover task_id-only queries
  • Add unique: true to task_agent_compound_idx on task_states to enforce the (task_id, agent_id) uniqueness invariant at the database level
  • Auto drop-and-recreate indexes on startup when options change (e.g. sort direction, uniqueness), with fail-fast if recreate fails after drop
  • Clean up legacy task_id_idx zombie indexes on existing databases during startup
  • Wrap sync find_one in run_in_executor to avoid blocking the event loop in get_by_task_and_agent
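
The run_in_executor change in the last bullet can be sketched as follows. This is a minimal, self-contained illustration, not the repository code: `_find_one_sync` is a hypothetical stand-in for the synchronous pymongo `find_one` call that `get_by_task_and_agent` wraps.

```python
import asyncio
from functools import partial

# Hypothetical stand-in for the repository's synchronous pymongo call.
def _find_one_sync(query):
    return {"task_id": query["task_id"], "agent_id": query["agent_id"], "state": {}}

async def get_by_task_and_agent(task_id, agent_id):
    loop = asyncio.get_running_loop()
    # Offload the blocking driver call to the default thread-pool executor
    # so the event loop stays responsive while MongoDB responds.
    return await loop.run_in_executor(
        None, partial(_find_one_sync, {"task_id": task_id, "agent_id": agent_id})
    )

print(asyncio.run(get_by_task_and_agent("t1", "a1"))["agent_id"])  # → a1
```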

Test plan

  • Verify indexes are recreated on startup with the new sort direction
  • Confirm message list queries return results in expected order
  • Repository unit tests pass (214/214)
  • Confirm index creation logs on startup show 3 indexes for messages, 2 for task_states
  • Verify unique constraint rejects duplicate (task_id, agent_id) inserts
  • Verify legacy task_id_idx is dropped from existing databases on startup

Greptile Summary

This PR hardens MongoDB index lifecycle management by implementing drop-and-recreate for indexes with changed options, cleaning up legacy task_id_idx indexes, flipping created_at sort direction to DESCENDING (matching all callers' default sort), adding a unique constraint to task_agent_compound_idx, and fixing a blocking synchronous find_one call in get_by_task_and_agent.

Confidence Score: 5/5

Safe to merge — previous P1 concerns (silent index mismatch, index-loss-on-recreate-failure) are fully addressed; remaining findings are P2 style suggestions.

All prior blocking concerns around index migration correctness have been resolved. The drop-and-recreate path properly aborts startup on recreate failure, and legacy cleanup is explicit. The two remaining comments are non-blocking style observations.

agentex/src/config/mongodb_indexes.py — the broad OperationFailure catch in legacy cleanup is the only open item worth addressing before the next release.

Important Files Changed

| Filename | Overview |
| --- | --- |
| agentex/src/config/mongodb_indexes.py | Adds drop-and-recreate logic for indexes with changed options and a legacy cleanup block; the OperationFailure catch in legacy cleanup is broader than IndexNotFound alone |
| agentex/src/domain/repositories/task_message_repository.py | Changes created_at sort to DESCENDING in compound indexes and removes the redundant task_id_idx; consistent with all callers defaulting to descending order |
| agentex/src/domain/repositories/task_state_repository.py | Adds unique: True to task_agent_compound_idx, removes the redundant task_id_idx, and fixes get_by_task_and_agent to use run_in_executor so it no longer blocks the event loop |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[ensure_mongodb_indexes called at startup] --> B[Import repository classes]
    B --> C{For each repo_class}
    C --> D[collection = db[COLLECTION_NAME]]
    D --> E{For each index_spec in INDEXES}
    E --> F[collection.create_index]
    F -->|Success| G[log ✓ next index]
    G --> E
    F -->|OperationFailure: different options| H[drop_index name]
    H -->|drop fails| I[log ✗ continue]
    I --> E
    H -->|drop success| J[collection.create_index recreate]
    J -->|success| K[log ✓ next index]
    K --> E
    J -->|fails| L[log ✗ RAISE — aborts startup]
    F -->|other error| M[log ✗ next index]
    M --> E
    E -->|all indexes done| C
    C -->|all repos done| N[Legacy index cleanup]
    N --> O{For each legacy_indexes entry}
    O --> P[collection.drop_index]
    P -->|success| Q[log ✓ dropped]
    Q --> O
    P -->|OperationFailure| R[pass — already gone]
    R --> O
    P -->|other Exception| S[log ⚠ warning]
    S --> O
    O -->|done| T[log: index creation completed]
```
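The drop-and-recreate flow in the flowchart can be sketched in plain Python. This is a simplified stand-in, not the repository code: `OperationFailure` and the collection object mimic their pymongo counterparts, and only the control flow mirrors the PR (real code would also inspect the error and log each branch).

```python
class OperationFailure(Exception):
    """Stand-in for pymongo.errors.OperationFailure."""

def ensure_index(collection, keys, name, **options):
    """Create an index; if it exists with different options, drop and recreate."""
    try:
        collection.create_index(keys, name=name, **options)
        return "created"
    except OperationFailure:
        # Index exists with different options (e.g. sort direction, uniqueness).
        try:
            collection.drop_index(name)
        except Exception:
            return "drop-failed"  # old index stays in place; skip recreate
        # If recreate fails after a successful drop, let the exception
        # propagate so startup aborts rather than running without the index.
        collection.create_index(keys, name=name, **options)
        return "recreated"

# Fake collection whose first create_index call simulates the
# "already exists with different options" server error.
class FakeCollection:
    def __init__(self):
        self.create_calls = 0
    def create_index(self, keys, name, **options):
        self.create_calls += 1
        if self.create_calls == 1:
            raise OperationFailure("already exists with different options")
    def drop_index(self, name):
        pass

print(ensure_index(FakeCollection(), [("task_id", 1), ("created_at", -1)],
                   "task_id_created_at_idx"))  # → recreated
```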
This is a comment left during a code review.
Path: agentex/src/config/mongodb_indexes.py
Line: 134-135

Comment:
**OperationFailure catch is broader than IndexNotFound**

The comment says "Index doesn't exist — already clean", but `OperationFailure` covers many server-side errors beyond index-not-found (e.g., insufficient privileges, namespace issues). An unrelated `OperationFailure` during legacy cleanup would be silently dropped with `pass`. Scoping to error code 27 (`IndexNotFound`) makes the intent explicit and lets other failures surface:

```suggestion
        except OperationFailure as e:
            if e.code != 27:  # 27 = IndexNotFound — already clean
                logger.warning(
                    f"  ⚠ Unexpected OperationFailure dropping legacy index '{index_name}' "
                    f"from '{collection_name}': {e}"
                )
```


---

This is a comment left during a code review.
Path: agentex/src/domain/repositories/task_state_repository.py
Line: 43-50

Comment:
**`run_in_executor` pattern inconsistent with base class**

`get_by_task_and_agent` now correctly offloads its synchronous `find_one` call to a thread pool, avoiding event-loop blocking. However, the base class `MongoDBCRUDRepository` has many other `async` methods (`get`, `get_by_field`, `find_by_field`, `find_by_field_with_cursor`, `list`, etc.) that call synchronous pymongo operations directly without `run_in_executor`. It's worth tracking a follow-up to apply the same fix across all base-class read methods.



Reviews (5). Last reviewed commit: "Merge branch 'main' into stas/messages-i..."

@smoreinis smoreinis requested a review from a team as a code owner February 9, 2026 20:30
…k states

Drop standalone task_id_idx from both messages and task_states collections
since the compound indexes with task_id as leftmost prefix already cover
task_id-only queries. Add unique: true to task_agent_compound_idx to
enforce the (task_id, agent_id) uniqueness invariant at the database level.
@smoreinis smoreinis requested a review from danielmillerp April 3, 2026 17:17
Comment on lines 22 to 26

```python
"keys": [("task_id", pymongo.ASCENDING), ("agent_id", pymongo.ASCENDING)],
"name": "task_agent_compound_idx",
"description": "Compound index for get_by_task_and_agent queries",
},
{
"keys": [("task_id", pymongo.ASCENDING)],
"name": "task_id_idx",
"description": "Single index for task_id queries",
"unique": True,
"description": "Unique compound index for get_by_task_and_agent queries",
},
```
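
For reference, the resulting task_states index spec presumably reads as follows; this is reconstructed from the diff above, not copied from the merged file:

```python
import pymongo

# Presumed final INDEXES entry after this change: the compound index is
# unique and the standalone task_id_idx is gone.
INDEXES = [
    {
        "keys": [("task_id", pymongo.ASCENDING), ("agent_id", pymongo.ASCENDING)],
        "name": "task_agent_compound_idx",
        "unique": True,
        "description": "Unique compound index for get_by_task_and_agent queries",
    },
]
```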

P1 Unique constraint won't apply to existing deployments (agentex/src/domain/repositories/task_state_repository.py, lines 22-26)

Adding unique: True here will not enforce uniqueness on any environment where the task_agent_compound_idx index already exists as a non-unique index. When ensure_mongodb_indexes calls collection.create_index(keys, name="task_agent_compound_idx", unique=True) and an index with that name already exists without unique, MongoDB raises OperationFailure: "already exists with different options". The catch block in mongodb_indexes.py (lines 87–93) silently logs a warning and moves on — the old index remains in place and the uniqueness constraint is never applied.

The same applies to the sort-direction changes in task_message_repository.py: the pre-existing task_id_created_at_idx (with ASCENDING) won't be replaced with the DESCENDING variant on any running database.

To actually apply these changes, you need a one-time migration step (e.g., in ensure_mongodb_indexes or a separate script) that drops the old indexes before recreating them:

```python
# In ensure_mongodb_indexes or a migration script, before create_index:
try:
    collection.drop_index("task_agent_compound_idx")
except Exception:
    pass  # index might not exist yet
collection.create_index(keys, name="task_agent_compound_idx", unique=True)
```

Or expose the existing drop_all_indexes helper and run it once as part of the deploy process. Until this is addressed, the unique invariant described in the PR summary will only be enforced on fresh databases.


Previously, if an index already existed with different options (e.g.,
non-unique vs unique), the startup code just logged a warning and moved
on — meaning index changes never applied on existing deployments.
Now we auto-recover by dropping the old index and recreating it.
Separate drop and create into distinct try/except blocks so that:
- If drop fails, the old index stays in place and we skip recreate
- If drop succeeds but create fails, we raise to prevent the app
  from starting without the index (e.g., missing unique constraint)
- Wrap sync find_one in run_in_executor to avoid blocking the event loop
- Drop legacy task_id_idx indexes from existing databases on startup
- Add clarifying comment on exception propagation in index recreate path
