fix(agents): parallelize LLM calls and clean handle_chat (#78, #80)#103

Open
zohaib-7035 wants to merge 1 commit into INCF:main from zohaib-7035:perf/llm-parallelization
Conversation

@zohaib-7035
Contributor

Summary

This PR addresses two maintenance issues in backend/agents.py: LLM call latency and the structure of the chat-handling logic. It is an isolated refactoring and does not alter the external API.

Changes Made

  • Performance (Fixes perf: Parallelize LLM calls in extract_keywords_and_rewrite to reduce latency #78): Refactored extract_keywords_and_rewrite to parallelize the Gemini LLM calls with asyncio.gather. detect_intents and rewrite_with_history now run concurrently, and the keyword extraction then runs alongside the second intent detection. This cuts end-to-end latency from the sum of the sequential call durations to roughly the longest call in each concurrent pair.
  • Code Quality (Fixes bug: Duplicate more-query condition in handle_chat #80): Cleaned up the redundant branch check inside handle_chat. Evaluating query.strip().lower() in {...} before more_count makes the logic safer and removes the dead path that duplicated the _is_more_query check.
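The parallelization described in the first bullet can be sketched as below. The helper names mirror those mentioned in the PR, but their bodies and signatures here are placeholders, not the project's actual Gemini-backed implementations:

```python
import asyncio

# Hypothetical stand-ins for the LLM-backed helpers in backend/agents.py;
# real signatures and return types may differ.
async def detect_intents(query: str) -> list[str]:
    await asyncio.sleep(0.1)  # simulates LLM round-trip latency
    return ["search"]

async def rewrite_with_history(query: str, history: list[str]) -> str:
    await asyncio.sleep(0.1)  # simulates LLM round-trip latency
    return query.strip()

async def extract_keywords(query: str) -> list[str]:
    await asyncio.sleep(0.1)  # simulates LLM round-trip latency
    return query.lower().split()

async def extract_keywords_and_rewrite(query: str, history: list[str]):
    # Stage 1: the two independent calls run concurrently, so this stage
    # takes max(a, b) wall-clock time instead of a + b.
    intents, rewritten = await asyncio.gather(
        detect_intents(query),
        rewrite_with_history(query, history),
    )
    # Stage 2: keyword extraction runs alongside the second intent pass.
    keywords, intents2 = await asyncio.gather(
        extract_keywords(rewritten),
        detect_intents(rewritten),
    )
    return keywords, rewritten, intents or intents2

result = asyncio.run(extract_keywords_and_rewrite(" Hello World ", []))
```

With the 0.1 s placeholder delays, the four calls complete in roughly 0.2 s total rather than 0.4 s sequentially.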
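The reordered guard from the second bullet can be sketched as follows. The phrase set, return values, and simplified signature are illustrative assumptions, not the project's actual identifiers:

```python
# Hypothetical "more"-phrase set; the real set lives in backend/agents.py.
MORE_WORDS = {"more", "more results", "show more"}

def _is_more_query(query: str) -> bool:
    return query.strip().lower() in MORE_WORDS

def handle_chat(query: str, more_count: int = 0) -> str:
    # Check the literal "more" phrasing first, then the counter, so the
    # set-membership test is evaluated exactly once and cannot be
    # shadowed by a duplicate counter-driven branch.
    if _is_more_query(query):
        if more_count > 0:
            return "showing-more"
        return "nothing-more"
    return "new-query"
```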

Verification

  • python -m py_compile backend/agents.py passes.
  • No new dependencies are introduced.
  • The refactor uses only native asyncio primitives from the standard library.

Development

Successfully merging this pull request may close these issues.

  • bug: Duplicate more-query condition in handle_chat
  • perf: Parallelize LLM calls in extract_keywords_and_rewrite to reduce latency
