
fix: log failure reason before COMPLETE and fix misleading SCRAPE ✓ (#1949) #1952

Open
hafezparast wants to merge 1 commit into unclecode:develop from hafezparast:fix/maysam-silent-scrape-failure-1949


@hafezparast
Contributor

Problem

When a crawl fails (anti-bot detection, empty HTML, etc.), users see only [COMPLETE] ✗ with zero explanation, even with verbose=True. CrawlResult.error_message is set correctly but was never logged to the console.

A second issue: the [SCRAPE] log always emitted ✓ regardless of whether scraping produced any content, which was actively misleading.

Before:

[FETCH]...  ↓ https://example.com  | ✗ | ⏱: 0.52s
[SCRAPE]..  ◆ https://example.com  | ✓ | ⏱: 0.00s   ← misleading ✓
[COMPLETE] ● https://example.com   | ✗ | ⏱: 0.52s   ← no reason shown

After:

[FETCH]...  ↓ https://example.com  | ✗ | ⏱: 0.64s
[SCRAPE]..  ◆ https://example.com  | ✗ | ⏱: 0.00s   ← correct ✗
[ERROR]...  × https://example.com  | Error: Blocked by anti-bot protection: Near-empty content (0 bytes) with HTTP 200
[COMPLETE] ● https://example.com   | ✗ | ⏱: 0.64s   ← user now knows why

Fix

Two-line change in async_webcrawler.py:

  1. Before the COMPLETE log: if crawl_result.success=False and error_message is set, emit an [ERROR] log with the reason
  2. In aprocess_html: change SCRAPE log from hardcoded success=True to success=bool(cleaned_html)
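The two changes can be illustrated in isolation with a minimal sketch. This is not the code from async_webcrawler.py: `CrawlResult` below is a hypothetical stand-in carrying only the fields mentioned in this description, and `RecordingLogger`, `log_scrape` and `log_completion` are hypothetical helpers that take the place of the crawler's real logger calls.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CrawlResult:
    # Hypothetical stand-in for crawl4ai's CrawlResult: only the fields this PR touches.
    url: str
    success: bool
    error_message: Optional[str] = None
    cleaned_html: Optional[str] = None


class RecordingLogger:
    # Hypothetical logger that records (tag, url, ok, detail) tuples instead of printing.
    def __init__(self) -> None:
        self.lines: List[Tuple[str, str, bool, str]] = []

    def log(self, tag: str, url: str, ok: bool, detail: str = "") -> None:
        self.lines.append((tag, url, ok, detail))


def log_scrape(logger: RecordingLogger, url: str, cleaned_html: Optional[str]) -> None:
    # Change 2: report whether scraping produced content, not a hardcoded success.
    logger.log("SCRAPE", url, bool(cleaned_html))


def log_completion(logger: RecordingLogger, result: CrawlResult) -> None:
    # Change 1: if the crawl failed and a reason is known, log it before COMPLETE.
    if not result.success and result.error_message:
        logger.log("ERROR", result.url, False, f"Error: {result.error_message}")
    logger.log("COMPLETE", result.url, result.success)


if __name__ == "__main__":
    logger = RecordingLogger()
    failed = CrawlResult(
        url="https://example.com",
        success=False,
        error_message="Blocked by anti-bot protection",
        cleaned_html="",
    )
    log_scrape(logger, failed.url, failed.cleaned_html)
    log_completion(logger, failed)
    for tag, url, ok, detail in logger.lines:
        print(f"[{tag}] {url} | {'✓' if ok else '✗'} {detail}".rstrip())
```

Recording the lines instead of printing them makes the ordering (SCRAPE, then ERROR, then COMPLETE) easy to assert on.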

Test plan

  • Reproduced bug: SCRAPE ✓, COMPLETE ✗, no error log; confirmed
  • After fix: ERROR log appears before COMPLETE with full reason
  • 7 targeted tests: all pass
  • Full regression suite: 297 passed, 1 skipped, 0 failures
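A regression test for the new ordering could be written with the standard library's `unittest.TestCase.assertLogs`. The sketch below is not one of the PR's seven tests: `report_crawl` is a hypothetical helper that mirrors the described logic using Python's `logging` module, and the logger name `crawl_sketch` is an arbitrary choice.

```python
import logging
import unittest
from typing import Optional

log = logging.getLogger("crawl_sketch")  # hypothetical logger name


def report_crawl(url: str, success: bool, error_message: Optional[str],
                 cleaned_html: Optional[str]) -> None:
    # Hypothetical helper mirroring the two changes described in this PR.
    scrape_ok = bool(cleaned_html)
    log.info("[SCRAPE] %s | %s", url, "✓" if scrape_ok else "✗")
    if not success and error_message:
        log.error("[ERROR] %s | Error: %s", url, error_message)
    log.info("[COMPLETE] %s | %s", url, "✓" if success else "✗")


class ReportCrawlTests(unittest.TestCase):
    def test_failure_logs_reason_before_complete(self) -> None:
        with self.assertLogs(log, level="INFO") as cm:
            report_crawl("https://example.com", False, "Near-empty content (0 bytes)", "")
        messages = [r.getMessage() for r in cm.records]
        self.assertEqual(len(messages), 3)
        self.assertIn("✗", messages[0])  # SCRAPE no longer claims success
        self.assertTrue(messages[1].startswith("[ERROR]"))
        self.assertIn("Near-empty content", messages[1])
        self.assertTrue(messages[2].startswith("[COMPLETE]"))

    def test_success_emits_no_error_line(self) -> None:
        with self.assertLogs(log, level="INFO") as cm:
            report_crawl("https://example.com", True, None, "<p>ok</p>")
        self.assertFalse(any(r.levelno >= logging.ERROR for r in cm.records))


if __name__ == "__main__":
    unittest.main()
```

`assertLogs` captures records only inside the `with` block, so each test checks exactly the lines one crawl produced.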

Fixes #1949

🤖 Generated with Claude Code

…nclecode#1949)

Two issues caused silent COMPLETE βœ— with no diagnostic output:

1. When crawl_result.success=False (anti-bot detection, empty HTML, etc.),
   the error_message was set on the CrawlResult but never logged, so users
   saw only [COMPLETE] ✗ with zero explanation. Fix: emit an [ERROR] log
   containing error_message before the COMPLETE line whenever success=False.

2. The SCRAPE log in aprocess_html always emitted success=True regardless
   of whether scraping produced any content. Fix: use bool(cleaned_html)
   so SCRAPE reflects the actual outcome.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
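Because `bool(cleaned_html)` relies on Python truthiness, it helps to be explicit about which values count as "no content". Assuming `cleaned_html` is either a `str` or `None` (an assumption, not confirmed by the PR), the check treats `None` and `""` as failures but treats whitespace-only markup as success. The hypothetical `scrape_succeeded` below is the plain check; `scrape_succeeded_strict` is a stricter alternative that is not part of this PR.

```python
from typing import Optional


def scrape_succeeded(cleaned_html: Optional[str]) -> bool:
    # The check as described in the PR: plain truthiness, so None and "" are failures.
    return bool(cleaned_html)


def scrape_succeeded_strict(cleaned_html: Optional[str]) -> bool:
    # Hypothetical stricter alternative (NOT in the PR): whitespace-only output also fails.
    return bool(cleaned_html and cleaned_html.strip())


if __name__ == "__main__":
    for value in [None, "", "   \n", "<p>hi</p>"]:
        print(repr(value), scrape_succeeded(value), scrape_succeeded_strict(value))
```

The only input where the two disagree is whitespace-only HTML, which the plain check would still report as ✓.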