ETL: batch geocoding pipeline with rate limiting and resume support #26

@francescobianco

Problem

When geocoding large address lists (10k+ records), the API's rate limits kick in, and the SDK currently has no built-in mechanism to handle this gracefully.

Typical pattern people try

$client = new Client($token);
foreach ($addresses as $addr) {
    $result = $client->get('https://geocoding.openapi.com/geocode', ['address' => $addr]);
    // if the rate limit is hit (429), this throws immediately and the whole batch is lost
}

What would help

A retry/backoff decorator on top of HttpTransportInterface:

use Openapi\Transports\RetryTransport;

$client = new Client($token, new RetryTransport(
    maxRetries: 3,
    backoffMs: 500,
    retryOn: [429, 503]
));
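
To make the proposal concrete, here is a minimal sketch of what such a decorator could look like. Everything in it is an assumption for illustration: `SketchTransport` is a hypothetical stand-in for HttpTransportInterface (whose real signature may differ), the decorator wraps an inner transport rather than being constructed on its own, responses are assumed to come back as an array with a `status` key instead of throwing, and the backoff doubles on each attempt.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in: the SDK's real HttpTransportInterface may look different.
interface SketchTransport
{
    /** @return array{status: int, body: string} */
    public function request(string $method, string $url, array $params = []): array;
}

// Hypothetical decorator: retries the inner transport on selected status codes.
final class RetrySketchTransport implements SketchTransport
{
    /** @var callable(int): void */
    private $sleeper;

    public function __construct(
        private SketchTransport $inner,
        private int $maxRetries = 3,
        private int $backoffMs = 500,
        private array $retryOn = [429, 503],
        ?callable $sleeper = null
    ) {
        // Injectable sleeper so tests do not have to wait for real delays.
        $this->sleeper = $sleeper ?? static function (int $ms): void {
            usleep($ms * 1000);
        };
    }

    public function request(string $method, string $url, array $params = []): array
    {
        $attempt = 0;
        while (true) {
            $response = $this->inner->request($method, $url, $params);
            $retryable = in_array($response['status'], $this->retryOn, true);
            if (!$retryable || $attempt >= $this->maxRetries) {
                // Success, non-retryable status, or retries exhausted: hand it back.
                return $response;
            }
            // Exponential backoff: backoffMs, then 2x, 4x, ...
            ($this->sleeper)($this->backoffMs * (2 ** $attempt));
            $attempt++;
        }
    }
}
```

With the defaults above, a request that keeps returning 429 is attempted four times in total (one initial call plus three retries) before the last response is returned to the caller.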

And a checkpoint mechanism, so that an interrupted batch can resume from the last successfully processed record.
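
As a rough illustration of the checkpoint idea, here is a file-based sketch. The function name `runWithCheckpoint`, its parameters, and the choice of a plain file holding the next index are all hypothetical; whether the state should live in a file, Redis, or consumer code is still an open question.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: runs $process on each item and stores the index of the
 * next unprocessed item in $checkpointFile after every success, so a later
 * run can skip records that were already handled.
 *
 * @return int number of items processed during this run
 */
function runWithCheckpoint(array $items, callable $process, string $checkpointFile): int
{
    // Resume from the stored index, or start at 0 when no checkpoint exists.
    $start = is_file($checkpointFile) ? (int) file_get_contents($checkpointFile) : 0;

    $items = array_values($items); // ensure 0-based sequential indexes
    $processed = 0;

    for ($i = $start, $n = count($items); $i < $n; $i++) {
        // If this throws, the checkpoint still points at $i, so the next run retries it.
        $process($items[$i], $i);
        file_put_contents($checkpointFile, (string) ($i + 1), LOCK_EX);
        $processed++;
    }

    return $processed;
}
```

In the geocoding case, `$process` would wrap the `$client->get(...)` call and store the result somewhere durable before returning, so that a 429 on record N leaves records 0 to N-1 safely behind the checkpoint.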

Open questions

  • Should RetryTransport be part of the core SDK or a separate optional package?
  • Should the checkpoint state be file-based, Redis-based, or left to the consumer?
  • Is 429 the only rate-limit signal used across all Openapi endpoints?

Labels: enhancement (New feature or request), question (Further information is requested)
