CI: diagnose AOTI hang on macOS — isolated test with native stack sampling#19886
CI: diagnose AOTI hang on macOS — isolated test with native stack sampling#19886SS-JIA wants to merge 1 commit into
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19886
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (2 Unrelated Failures)As of commit ff048ff with merge base 88faab2 ( FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
This PR needs a
|
…pling Summary: The macOS unittest job has been timing out since the PyTorch pin was updated to 2.12. Three CI runs showed 38-42 minutes of complete silence after ~55% test completion, with faulthandler unable to fire (confirming the hang is in native C/C++ code, not Python). The last tests before the silence are in examples/models/llama/tests/; the next tests in collection order are the llama3_2_vision AOTI tests. This commit isolates the diagnosis: - Disables all CI jobs except the macos unittest job - Replaces the full test suite with a single AOTI test (test_llama3_2_text_decoder_aoti) run as an instrumented script - Prints timestamps before/after each step (export, compile, load, run) to identify which AOTI step hangs - Runs a background watchdog that uses macOS `sample` to capture native C/C++ call stacks every 60s, since faulthandler cannot see into native code that holds the GIL - Sets PYTHONUNBUFFERED=1 to prevent pipe buffering Co-Authored-By: Claude <noreply@anthropic.com>
Summary: AOTI tests (llama3_2_vision and select extension/llm tests) hang indefinitely on macOS CI runners after the PyTorch 2.12 pin update. The hang is in native C/C++ code (inductor compilation / dlopen), which prevents faulthandler from producing a traceback. Diagnosis is ongoing in #19886. Skip the affected tests and bump the macOS job timeout from the default 90 to 120 minutes to add margin (observed completion at ~79 min with skips applied). Co-Authored-By: Claude <noreply@anthropic.com>
Summary
The macOS unittest job has been timing out since the PyTorch pin was updated to 2.12. Three CI runs showed 38-42 minutes of complete silence after ~55% test completion, with faulthandler unable to fire (confirming the hang is in native C/C++ code, not Python).
This PR isolates the diagnosis:
test_llama3_2_text_decoder_aoti) run as an instrumented scriptsampleto capture native C/C++ call stacks every 60s, since faulthandler cannot see into native code that holds the GILPYTHONUNBUFFERED=1to prevent pipe bufferingTest plan
sampleoutput will show the exact native function blockingCo-Authored-By: Claude noreply@anthropic.com