Improve Twinkle service stability#230
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a watchdog supervisor script for Megatron, adds a completion mask to the input processor, and implements unhandled exception handling middleware in the FastAPI server alongside task queue error formatting. The review feedback highlights several critical issues: the exception-handling middleware is registered outermost, which interferes with telemetry and header injection; returning raw tracebacks to clients poses a security risk; the hardcoded lock file path in /tmp can cause permission conflicts in multi-user environments; spawning Python for health checks and date for time tracking in the shell script is highly inefficient; and a task failure helper is bypassed in the worker, causing code duplication.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
Pull request overview
This PR improves Twinkle’s service stability by (1) fixing padding for a valid input field (completion_mask), (2) ensuring unhandled FastAPI route exceptions return a structured 500 response (with traceback) rather than destabilizing the replica, and (3) making the Megatron service startup more resilient via a container entrypoint watchdog and a single-instance/restart mechanism in run.sh.
Changes:
- Add
completion_masktoInputProcessor.padding_mapand cover it with a unit test to prevent padding-timeKeyError. - Add a deployment-level exception boundary middleware that converts unhandled route exceptions into
500JSON responses containing the traceback. - Improve operational robustness of Megatron service startup by introducing
entrypoint.shsupervision and enhancingrun.shwith single-instance + restart-request handling.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/twinkle/processor/base.py |
Adds completion_mask to padding defaults to prevent padding failures. |
tests/processor/test_processor.py |
Adds regression coverage for completion_mask padding; adjusts multimodal assertion. |
src/twinkle/server/deployment.py |
Adds an unhandled-exception middleware boundary returning 500 with traceback. |
tests/server/test_deployment_exception_boundary.py |
Validates the exception boundary returns tracebacks and continues serving; checks replica header behavior. |
tests/server/test_app_builders_characterization.py |
Updates expected middleware ordering to include the new exception boundary. |
src/twinkle/server/utils/task_errors.py |
Introduces a shared helper for standard task failure payloads. |
src/twinkle/server/utils/task_queue/worker.py |
Uses task_error_payload for failed task persistence. |
src/twinkle/server/utils/task_queue/mixin.py |
Uses task_error_payload for background-task failure persistence. |
tests/server/utils/test_task_errors.py |
Adds tests ensuring task failure payloads preserve traceback strings. |
cookbook/client/server/megatron/run.sh |
Replaces tail-based liveness with a supervised foreground wait loop and adds restart/single-instance mechanics. |
cookbook/client/server/megatron/entrypoint.sh |
Adds a container entrypoint watchdog that restarts run.sh based on Ray + HTTP health checks. |
Summary
completion_maskpadding so legal inputs no longer raiseKeyError.500with traceback while keeping the server alive.tail -f-based liveness inmegatron/run.shwith a singleton watchdog/supervisor that restarts server/Ray locally.MODELSCOPE_CACHEandTWINKLE_WORK_DIRinsiderun.sh../run.sh --restartorTWINKLE_RUN_EXISTING_ACTION=restart.Root Cause
completion_maskwas a valid input field but was missing fromInputProcessor.padding_map, so normal padding could fail withKeyError. Separately,run.shonly stayed alive throughtail -f run.log, so server/Ray failures could leave the process model fragile or fake-alive.Startup / Restart
Default startup can now be shortened to:
For a code update that should actively restart an already-running service:
or:
Default duplicate execution remains safe: without restart mode, a second
run.shexits instead of disrupting the live service.Validation
bash -n cookbook/client/server/megatron/run.shbash cookbook/client/server/megatron/run.sh --helpgit diff --check/Users/yunlin/miniconda3/envs/twinkle/bin/python -m pre_commit run --files .../Users/yunlin/miniconda3/envs/twinkle/bin/python -m pytest tests/processor/test_processor.py tests/server/utils/test_task_errors.py tests/server/utils/test_task_queue_mixin.py tests/server/test_deployment_exception_boundary.py tests/server/test_app_builders_characterization.py tests/server/model -qcompletion_maskpayload succeeded; repeated exception requests returned500 + traceback; the same replica remained healthy and RUNNING.Notes
Full megatron
run.shstartup was not executed locally because it can stop local Ray/vLLM/Redis/LGTM processes and consume GPU resources. First online rollout should set bounded restart env vars before enabling unlimited supervisor restarts.