Retry Behavior for SigV4Adapter in REST Catalog #3008

@Tommo56700

Feature Request / Improvement

Hi team,

I’ve recently migrated to AWS S3 Tables and switched from using the GlueCatalog to the REST catalog. After updating the catalog configuration, everything works correctly in local, single‑process scenarios. However, I’m encountering intermittent failures when scaling out to multiple Dask workers making parallel requests.
Specifically, I’m seeing occasional ThrottlingException errors coming from AWS SigV4‑signed requests. Once throttling occurs, subsequent requests sometimes fail with:
requests.exceptions.HTTPError: 403 Client Error

My understanding is that throttled SigV4 signing attempts can lead to follow‑on request failures, resulting in unauthorized S3 operations. According to AWS’s recommendation for handling throttling on signed requests, retry configuration should be applied via botocore: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html
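
To make the retry behavior under discussion concrete, here is a minimal, standard-library-only sketch of retrying a throttled call with exponential backoff and jitter. It is an illustration of the general technique, not botocore's or PyIceberg's implementation; `ThrottlingError`, `call_with_retries`, and `flaky_request` are hypothetical names introduced only for this example.

```python
import random
import time


class ThrottlingError(Exception):
    """Hypothetical stand-in for a throttled response (e.g. ThrottlingException)."""


def call_with_retries(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Call operation(), retrying on ThrottlingError with capped exponential
    backoff and full jitter. Re-raises once max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottlingError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the throttling error
            # Cap the exponential delay, then sleep a random time below the cap.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))


# Usage: a fake operation that is throttled twice, then succeeds.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ThrottlingError("Rate exceeded")
    return "ok"


# Pass a no-op sleep so the example runs instantly.
result = call_with_retries(flaky_request, sleep=lambda seconds: None)
```

With botocore itself, the corresponding setting is a `Config(retries={"max_attempts": ..., "mode": ...})` object passed when creating a client, as described in the guide linked above. Whether and how that configuration reaches the SigV4Adapter path is the open question in this issue.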

While reviewing the PyIceberg implementation, I noticed:

This creates an inconsistency where switching from Glue to REST results in weaker retry behavior, which becomes visible under parallel load.

Question / Proposal

Should the REST catalog align its default retry behavior with what GlueCatalog already applies?
At present, users can manually configure retry settings by supplying a custom botocore session via catalog properties, although I have not yet tested whether this works. It seems reasonable, and more consistent, for the REST catalog to provide safe defaults, especially since SigV4Adapter is now a common path for AWS S3 Tables.

Matching (or at least approaching) the GlueCatalog’s retry policy would provide the following benefits:

  • Avoid intermittent throttling‑triggered failures in distributed workloads
  • Improve parity between Glue and REST behavior
  • Reduce the configuration burden on users switching to REST for AWS‑backed tables

Happy to discuss or test any proposed changes. Thanks for your work on the project!
