fix: Expose `impersonate` flag on HTTP crawlers. by Mantisus · Pull Request #1957 · apify/crawlee-python

Mantisus · 2026-06-09T21:50:03Z

Description

Expose an impersonate flag on the HTTP crawlers (HttpCrawler, BeautifulSoupCrawler, ParselCrawler) to turn browser impersonation on or off in the default ImpitHttpClient. The flag applies only to the default client; if a custom http_client is passed, it is ignored.
Add a guide on working with HTTP headers in web scraping (docs/guides/http_headers.mdx) with a runnable example.
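
A minimal sketch of the semantics described above, using stand-in classes rather than the real Crawlee ones. `SketchHttpClient`, `SketchCrawler`, and the `impersonate` attribute on the client are hypothetical names chosen for illustration, and the default value of `True` is an assumption; the implementation in this PR may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SketchHttpClient:
    """Hypothetical stand-in for an HTTP client such as ImpitHttpClient."""

    impersonate: bool = True


class SketchCrawler:
    """Hypothetical stand-in for an HTTP crawler that accepts the flag."""

    def __init__(
        self,
        *,
        impersonate: bool = True,  # assumed default, not confirmed by the PR
        http_client: SketchHttpClient | None = None,
    ) -> None:
        if http_client is None:
            # No custom client: the flag configures the default client.
            http_client = SketchHttpClient(impersonate=impersonate)
        # A custom client is used as-is; the flag has no effect on it.
        self.http_client = http_client


# The flag turns impersonation off in the default client.
default_off = SketchCrawler(impersonate=False)

# With a custom client, the flag is ignored and the client's own setting wins.
custom = SketchCrawler(
    impersonate=False,
    http_client=SketchHttpClient(impersonate=True),
)
```

In this sketch the custom client silently wins, which is the edge case discussed in the review below.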

Issues

Closes: Expose a browser-impersonation toggle directly on HTTP crawlers #1923

Testing

Added new tests verifying the impersonate flag for all HTTP crawlers.

Pijukatel

Hi, I think it would be better to do just a documentation change and keep the current implementation.

I wrote the reasons into the issue, as the current wording of the issue is asking for a code change.

#1923 (comment)

vdusek

Two comments from my side.

And regarding this:

> Hi, I think it would be better to do just a documentation change and keep the current implementation.
>
> I wrote the reasons into the issue, as the current wording of the issue is asking for a code change.

Exposing high-level convenience arguments on crawlers, which configure the underlying components, is a Crawlee design choice. We follow it throughout the codebase: crawlers (and Actor in the SDK) already act as partial facades over the components they compose. A few examples:

- PlaywrightCrawler exposes headless, browser_type, browser_launch_options, use_incognito_pages, user_data_dir, and similar BrowserPool/plugin internals directly on the crawler. With your approach, it should accept only browser_pool.
- StagehandCrawler follows the same pattern.
- BasicCrawler has use_session_pool: bool next to the session_pool object, which is the same shape as impersonate.

So impersonate is not introducing a new pattern. It follows the same approach we already use for browsers, sessions, proxy configuration, concurrency, and more.

This design decision was made a long time ago, and we should stay consistent with it rather than diverge from it. And AFAIK @B4nan is a strong proponent of this approach.

> This creates the ugly edge case HttpCrawler(impersonate=False, http_client=...).

We should simply validate the argument combination and raise an error when it is invalid, like in other places.
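
A rough sketch of what such a guard could look like. `check_http_arguments` and `SketchHttpClient` are hypothetical names, and using `None` to mean "flag not specified" is an assumption made for this example, not something taken from the PR.

```python
from __future__ import annotations


class SketchHttpClient:
    """Hypothetical stand-in for a user-supplied HTTP client."""


def check_http_arguments(
    *,
    impersonate: bool | None = None,
    http_client: SketchHttpClient | None = None,
) -> None:
    """Reject an explicit flag combined with a custom client."""
    # None means "not specified", so only an explicit True/False conflicts.
    if http_client is not None and impersonate is not None:
        raise ValueError(
            'Cannot combine `impersonate` with a custom `http_client`; '
            'configure impersonation on the client instead.'
        )


check_http_arguments(impersonate=False)  # flag only: accepted
check_http_arguments(http_client=SketchHttpClient())  # client only: accepted

try:
    check_http_arguments(impersonate=False, http_client=SketchHttpClient())
except ValueError as exc:
    conflict_message = str(exc)
```

The same check could emit a warning instead of raising, if a softer failure mode is preferred.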

...

TL;DR: A convenience flag (with a guard and documentation) is consistent with the rest of Crawlee. It also keeps the simple case simple: users can turn off impersonation without needing to know what an HTTP client is.

Pijukatel · 2026-06-10T08:16:38Z

...
Exposing high-level convenience arguments on crawlers, which configure the underlying components, is a Crawlee design choice.
...

Those internal components usually have more arguments than what is exposed on the Crawler level, and many internal component arguments remain unexposed (which is fine). I do not think we have sufficient evidence to say that this specific internal component argument is so useful for the general user base that it deserves to be exposed on the Crawler level. JS tooling has been around for a while. Was anyone missing such an argument?

We should exercise restraint when exposing such convenience arguments. The more we have, the harder it is to understand the code.

Mantisus · 2026-06-10T11:49:55Z

Regarding the impersonate flag: to me, this situation is similar to #1487. Both cases require only a minor configuration change on the user's part, and if it were entirely up to me, I would limit myself to providing documentation.

But since we’re already taking the approach of "giving the user a simple configuration option", this PR is fully consistent with that approach.

expose impersonate flag on HTTP crawlers

97e3c75

Mantisus self-assigned this Jun 9, 2026

Mantisus requested review from szaganek and vdusek June 9, 2026 21:51

Pijukatel reviewed Jun 10, 2026

vdusek reviewed Jun 10, 2026

Comment thread src/crawlee/crawlers/_abstract_http/_abstract_http_crawler.py

Comment thread src/crawlee/crawlers/_abstract_http/_abstract_http_crawler.py

add warning

681bc28

Mantisus requested a review from vdusek June 10, 2026 18:03

Mantisus wants to merge 2 commits into apify:master from Mantisus:http-impersonation-expose

4 participants
