Skip to content

fix: raise NotFoundError on ambiguous 404 responses#755

Merged
vdusek merged 2 commits intomasterfrom
fix/chained-get-404
Apr 23, 2026
Merged

fix: raise NotFoundError on ambiguous 404 responses#755
vdusek merged 2 commits intomasterfrom
fix/chained-get-404

Conversation

@vdusek
Copy link
Copy Markdown
Contributor

@vdusek vdusek commented Apr 22, 2026

Follow-up to #737 (comment)

.get() previously collapsed every 404 into None via catch_not_found_or_throw. That works for direct, ID-identified fetches (client.dataset(id).get()), where a 404 unambiguously means the named resource is missing. It's misleading whenever a 404 is ambiguous — the client can't tell which resource in the path is actually gone, and silently returning None hides the cause.

This PR identifies three categories where a 404 is ambiguous and propagates NotFoundError instead:

  1. Chained calls without an ID (run.dataset(), run.key_value_store(), run.request_queue(), run.log()) — a 404 could mean either the parent run is missing or the default sub-resource. Covered by the base _get / _delete via a resource_id is None → raise guard, so .delete() on a chained client also raises now; direct dataset(id).delete() keeps its idempotent-DELETE semantics.
  2. Chained LogClientrun.log().get() / .get_as_bytes() / .stream() raise; direct client.log(id).get() still returns None.
  3. Singleton sub-path endpointsScheduleClient.get_log, TaskClient.get_input, DatasetClient.get_statistics, UserClient.monthly_usage, UserClient.limits, WebhookClient.test. These hit a fixed path (/.../{id}/log, /.../{id}/input, etc.) where a 404 effectively always means the parent is missing. Return types moved from T | None to T.

Record-by-key lookups (KeyValueStoreClient.get_record(key), RequestQueueClient.get_request(request_id)) keep the existing "None on missing" behavior — the 404 is specifically about the record/request, which is the natural meaning.

A shared helper catch_not_found_for_resource_or_throw(exc, resource_id) in _utils.py centralizes the resource_id is None → raise pattern across all 10 call sites. The v3 upgrade guide documents the new semantics. Sync/async tests cover every code path, including client.actor('id').last_run().dataset().get() (happy path + ambiguous-404 case).

@vdusek vdusek added adhoc Ad-hoc unplanned task added during the sprint. t-tooling Issues with this label are in the ownership of the tooling team. labels Apr 22, 2026
@vdusek vdusek self-assigned this Apr 22, 2026
@vdusek vdusek added the t-tooling Issues with this label are in the ownership of the tooling team. label Apr 22, 2026
@github-actions github-actions Bot added this to the 139th sprint - Tooling team milestone Apr 22, 2026
@github-actions github-actions Bot added the tested Temporary label used only programatically for some analytics. label Apr 22, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 22, 2026

Codecov Report

❌ Patch coverage is 86.36364% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.10%. Comparing base (a6daff7) to head (95ff5b9).
⚠️ Report is 10 commits behind head on master.

Files with missing lines Patch % Lines
src/apify_client/_resource_clients/log.py 14.28% 6 Missing ⚠️
...apify_client/_resource_clients/_resource_client.py 60.00% 2 Missing ⚠️
src/apify_client/_utils.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #755      +/-   ##
==========================================
+ Coverage   95.55%   96.10%   +0.54%     
==========================================
  Files          45       45              
  Lines        5152     5105      -47     
==========================================
- Hits         4923     4906      -17     
+ Misses        229      199      -30     
Flag Coverage Δ
integration 96.10% <86.36%> (+0.54%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@vdusek vdusek requested review from Pijukatel and janbuchar April 22, 2026 08:33
`.get()` previously collapsed every 404 into `None` via
`catch_not_found_or_throw`. That works for direct, ID-identified fetches
(`client.dataset(id).get()`), where a 404 unambiguously means the named
resource is missing. It's misleading for chained calls that target a default
sub-resource without an ID (`run.dataset()`, `run.key_value_store()`,
`run.request_queue()`, `run.log()`): a 404 there could mean the parent run
is missing or the default sub-resource is missing, and the API body cannot
disambiguate the two.

This change keeps the `None` behavior for ID-identified clients and
propagates `NotFoundError` from chained clients (`self._resource_id is
None`). The v3 upgrade guide is updated to document the new semantics, and
sync/async tests cover both code paths.
@vdusek vdusek force-pushed the fix/chained-get-404 branch from 84e57c0 to 8b6773d Compare April 22, 2026 08:35
Copy link
Copy Markdown
Contributor

@janbuchar janbuchar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@Pijukatel Pijukatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be applied to more than just GET methods?
For example DELETE?

Also there are probably more such endpoints where this could be applied, for example ScheduleClient.get_log

Copy link
Copy Markdown
Contributor

@Pijukatel Pijukatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also could you please test the special cases like
client.actor('actor-id').last_run().dataset().get()

Where cases should be:

  1. missing actor -> raise 404
  2. existing actor, but missing last run -> raise 404
  3. existing actor, existing run, missing dataset -> return None
  4. everything exists -> return dataset (probably already tested somewhere)

@vdusek
Copy link
Copy Markdown
Contributor Author

vdusek commented Apr 23, 2026

Should this also be applied to more than just GET methods? For example DELETE?

Good point. Updated, now there is the same guard for the _delete in the base class. I also added tests.

Also there are probably more such endpoints where this could be applied, for example ScheduleClient.get_log

I let claude audit every call of catch_not_found_or_throw.

Here is the summary of updates:

  1. Chained default sub-resources (run.dataset(), run.key_value_store(), run.request_queue(), run.log()) - covered by the existing _get/_delete guard.
  2. Chained LogClient - added the same guard to LogClient.get / .get_as_bytes / .stream. Direct client.log(id).get() still returns None; chained run.log().get() now raises.
  3. Singleton sub-path endpoints - dropped the 404 swallowing entirely and changed return types from T | None to T for:
  • ScheduleClient.get_log (the one you called out)
  • TaskClient.get_input
  • DatasetClient.get_statistics
  • UserClient.monthly_usage
  • UserClient.limits
  • WebhookClient.test

These all hit a fixed path like /schedules/{id}/log - a 404 there effectively always means the parent is missing, so silently returning None was hiding the real cause.

Left KeyValueStoreClient.get_record(key) / RequestQueueClient.get_request(request_id) and variants as-is - those look up a record by its own key/id, so None on missing is the legitimate semantic.

Also could you please test the special cases like client.actor('actor-id').last_run().dataset().get()
Where cases should be:

  1. missing actor -> raise 404
  2. existing actor, but missing last run -> raise 404
  3. existing actor, existing run, missing dataset -> return None
  4. everything exists -> return dataset (probably already tested somewhere)

Added tests:

  • test_actor_last_run_dataset_get_raises_on_404 - covers cases 1/2/3.
  • test_actor_last_run_dataset_get_returns_dataset - case 4.

@Pijukatel

@vdusek vdusek requested a review from Pijukatel April 23, 2026 08:03
@vdusek vdusek changed the title fix: raise NotFoundError on 404 from chained .get() without ID fix: raise NotFoundError on ambiguous 404 responses Apr 23, 2026
@vdusek vdusek merged commit 701185e into master Apr 23, 2026
26 checks passed
@vdusek vdusek deleted the fix/chained-get-404 branch April 23, 2026 11:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

adhoc Ad-hoc unplanned task added during the sprint. t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants