
BatchJob.src is always None for Gemini Developer API — inputConfig is not parsed in _BatchJob_from_mldev #2367

@SoceyN

Description

BatchJob.src is never populated for Gemini Developer API (mldev), causing input file references to be lost

Environment

  • google-genai version: 1.73.1
  • API backend: Gemini Developer API (genai.Client(api_key=...), not Vertex AI)
  • Python: 3.12
  • File: google/genai/batches.py

Summary

_BatchJob_from_mldev (the response parser used for the Gemini Developer API) does not map metadata.inputConfig from the REST response into the src field of types.BatchJob. Only metadata.output is mapped (into dest).

As a result, after calling client.batches.create(...) or client.batches.get(name=...), batch.src is always None for the Gemini Developer API, even though the underlying REST response contains metadata.inputConfig.fileName.

The Vertex AI parser (_BatchJob_from_vertex) does not have this issue — it correctly maps inputConfig to src.

Reproduction

from io import BytesIO

from google import genai

client = genai.Client(api_key="...")

# 1. Upload an input file
uploaded = client.files.upload(
    file=BytesIO(b'{"key":"k1","request":{"contents":[{"parts":[{"text":"hi"}]}]}}\n'),
    config={"display_name": "demo", "mime_type": "application/jsonl"},
)
print("uploaded:", uploaded.name)  # e.g. "files/vx4mwl208pq9"

# 2. Create a batch from the uploaded file
batch = client.batches.create(
    model="gemini-2.5-flash-lite",
    src=uploaded.name,
    config={"display_name": "demo-batch"},
)

# 3. Retrieve the batch
fetched = client.batches.get(name=batch.name)

print("src:", fetched.src)   # ← None (BUG)
print("dest:", fetched.dest) # ← BatchJobDestination(file_name='files/batch-...')

The raw REST response (GET /v1beta/{batch.name}) contains both:

{
  "name": "batches/xhpy2t2ugh8l0obqasnhinyi7aeisy38w6h3",
  "metadata": {
    "inputConfig": {
      "fileName": "files/vx4mwl208pq9"
    },
    "output": {
      "responsesFile": "files/batch-xhpy2t2ugh8l0obqasnhinyi7aeisy38w6h3"
    },
    ...
  }
}

…but fetched.src is None because the SDK never parses metadata.inputConfig.

Expected behavior

fetched.src.file_name should equal "files/vx4mwl208pq9" (consistent with the Vertex AI parser, which sets src from inputConfig).

Actual behavior

fetched.src is None for the Gemini Developer API.

Root cause

In google/genai/batches.py:256-310, _BatchJob_from_mldev is missing the inputConfig → src mapping:

def _BatchJob_from_mldev(
    from_object: Union[dict[str, Any], object],
    parent_object: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
  to_object: dict[str, Any] = {}
  if getv(from_object, ['name']) is not None:
    setv(to_object, ['name'], getv(from_object, ['name']))
  # ... displayName, state, createTime, endTime, updateTime, model ...

  if getv(from_object, ['metadata', 'output']) is not None:
    setv(
        to_object,
        ['dest'],
        _BatchJobDestination_from_mldev(
            t.t_recv_batch_job_destination(
                getv(from_object, ['metadata', 'output'])
            ),
            to_object,
        ),
    )

  return to_object   # ← `src` is never set

Compare with _BatchJob_from_vertex (batches.py:313-369), which does map inputConfig to src:

if getv(from_object, ['inputConfig']) is not None:
    setv(
        to_object,
        ['src'],
        _BatchJobSource_from_vertex(
            getv(from_object, ['inputConfig']), to_object
        ),
    )
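The two parsers differ only in the lookup path. The Gemini Developer API nests inputConfig and output under metadata, while the Vertex AI payload exposes inputConfig at the top level. The sketch below uses simplified, hypothetical stand-ins for the SDK's getv/setv helpers (get_path and set_path are not the SDK's actual implementation). It shows that the Vertex-style top-level path finds nothing in the mldev response from the reproduction, while ['metadata', 'inputConfig'] recovers the file name that the current parser drops.

```python
from typing import Any, Optional


def get_path(obj: dict[str, Any], path: list[str]) -> Optional[Any]:
    """Simplified stand-in for getv: walk nested dicts, return None if any key is missing."""
    current: Any = obj
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def set_path(obj: dict[str, Any], path: list[str], value: Any) -> None:
    """Simplified stand-in for setv: create intermediate dicts as needed, then assign."""
    for key in path[:-1]:
        obj = obj.setdefault(key, {})
    obj[path[-1]] = value


# Trimmed copy of the Gemini Developer API response from the reproduction.
mldev_response = {
    "name": "batches/xhpy2t2ugh8l0obqasnhinyi7aeisy38w6h3",
    "metadata": {
        "inputConfig": {"fileName": "files/vx4mwl208pq9"},
        "output": {"responsesFile": "files/batch-xhpy2t2ugh8l0obqasnhinyi7aeisy38w6h3"},
    },
}

parsed: dict[str, Any] = {}

# The Vertex-style top-level path finds nothing in the mldev payload...
assert get_path(mldev_response, ["inputConfig"]) is None

# ...but the path nested under "metadata" does.
file_name = get_path(mldev_response, ["metadata", "inputConfig", "fileName"])
if file_name is not None:
    set_path(parsed, ["src", "file_name"], file_name)

print(parsed)  # {'src': {'file_name': 'files/vx4mwl208pq9'}}
```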

Impact

This bug has caused a serious production issue for us. Our code relied on batch.src.file_name to identify the input file for cleanup after the batch completes:

src_name = getattr(batch.src, "file_name", None)  # always None
if src_name:
    client.files.delete(name=src_name)             # never executes

Because batch.src is always None, the client.files.delete(...) call is silently skipped. Input files (uploaded via client.files.upload(...), default 30-day TTL on the Gemini Developer API) accumulate in the project's Files API quota (20 GB), eventually exhausting it and blocking new batch submissions until the user manually cleans up out-of-band.

Anyone using BatchJob.src to track or clean up the input file on the Gemini Developer API path is silently affected.
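Much of the damage comes from the cleanup path being skipped silently when src is missing. The following is a minimal sketch of a more defensive lookup. It assumes you also record the uploaded file name yourself at submit time. It prefers batch.src.file_name, falls back to the recorded name, and raises instead of skipping when neither is available. resolve_input_file and MissingInputFileError are hypothetical names, not part of the SDK, and SimpleNamespace stands in for the SDK's BatchJob objects.

```python
from types import SimpleNamespace
from typing import Any, Optional


class MissingInputFileError(RuntimeError):
    """Raised when neither the SDK nor local records know a batch's input file."""


def resolve_input_file(batch: Any, recorded_name: Optional[str] = None) -> str:
    """Return the batch's input file name, raising rather than skipping silently.

    Hypothetical helper: prefers batch.src.file_name (always None on the Gemini
    Developer API while this bug exists), then a name recorded at submit time.
    """
    src_name = getattr(getattr(batch, "src", None), "file_name", None)
    if src_name:
        return src_name
    if recorded_name:
        return recorded_name
    raise MissingInputFileError(
        f"no input file known for {getattr(batch, 'name', '<unknown batch>')}"
    )


# Stand-in for a batch returned by client.batches.get(...) on the mldev path.
affected = SimpleNamespace(name="batches/demo", src=None)
print(resolve_input_file(affected, recorded_name="files/vx4mwl208pq9"))  # files/vx4mwl208pq9
```

In real code, the returned name would be passed to client.files.delete(name=...). Raising means a missing name surfaces in logs or alerts instead of quietly leaking Files API quota.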

Workaround

We are working around this by issuing a raw REST GET to retrieve the batch and reading metadata.inputConfig.fileName directly:

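import json
import urllib.request

api_key = "..."          # same key passed to genai.Client(api_key=...)
batch_name = batch.name  # e.g. "batches/xhpy2t2ugh8l0obqasnhinyi7aeisy38w6h3"

# GET the batch directly from the REST API, bypassing the SDK's response parser
req = urllib.request.Request(
    f"https://generativelanguage.googleapis.com/v1beta/{batch_name}",
    headers={"x-goog-api-key": api_key},
)
with urllib.request.urlopen(req) as resp:
    meta = json.loads(resp.read())
# None if the batch was not created from an uploaded file
input_file = meta.get("metadata", {}).get("inputConfig", {}).get("fileName")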

A workaround that stays within the SDK is to persist the file name returned by client.files.upload(...) ourselves at submit time and ignore batch.src entirely.
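Here is a minimal sketch of that SDK-only workaround. It assumes a local JSON file is an acceptable place to keep the mapping; a database table would work the same way. InputFileRegistry and its file path are hypothetical. The idea is to call record(batch.name, uploaded.name) right after client.batches.create(...), and pop(batch.name) once the batch finishes, then pass the result to client.files.delete(name=...).

```python
import json
import os
import tempfile
from typing import Optional


class InputFileRegistry:
    """Persist batch name -> uploaded input file name, independent of BatchJob.src."""

    def __init__(self, path: str) -> None:
        self.path = path

    def _load(self) -> dict[str, str]:
        if not os.path.exists(self.path):
            return {}
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)

    def _save(self, data: dict[str, str]) -> None:
        # Write to a temp file in the same directory, then atomically replace,
        # so a crash mid-write never leaves a half-written registry.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp_path, self.path)

    def record(self, batch_name: str, input_file_name: str) -> None:
        data = self._load()
        data[batch_name] = input_file_name
        self._save(data)

    def pop(self, batch_name: str) -> Optional[str]:
        """Return and forget the input file for a batch (None if never recorded)."""
        data = self._load()
        name = data.pop(batch_name, None)
        self._save(data)
        return name


# Example using the names from the reproduction above.
registry = InputFileRegistry(os.path.join(tempfile.gettempdir(), "batch_inputs.json"))
registry.record("batches/xhpy2t2ugh8l0obqasnhinyi7aeisy38w6h3", "files/vx4mwl208pq9")
print(registry.pop("batches/xhpy2t2ugh8l0obqasnhinyi7aeisy38w6h3"))  # files/vx4mwl208pq9
```

This sketch assumes a single writer. Concurrent workers would need file locking or a shared datastore instead.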

Proposed fix

Add the inputConfig → src mapping to _BatchJob_from_mldev, mirroring _BatchJob_from_vertex:

def _BatchJob_from_mldev(from_object, parent_object=None):
    to_object = {}
    # ... existing mappings ...

    if getv(from_object, ['metadata', 'inputConfig']) is not None:
        setv(
            to_object,
            ['src'],
            _BatchJobSource_from_mldev(
                getv(from_object, ['metadata', 'inputConfig']),
                to_object,
            ),
        )

    if getv(from_object, ['metadata', 'output']) is not None:
        setv(
            to_object,
            ['dest'],
            _BatchJobDestination_from_mldev(
                t.t_recv_batch_job_destination(
                    getv(from_object, ['metadata', 'output'])
                ),
                to_object,
            ),
        )

    return to_object

(_BatchJobSource_from_mldev already exists in the file; this just calls it with the right input.)

Thanks for maintaining the SDK!

Metadata

Assignees: No one assigned
Labels: priority: p2 (Moderately-important priority. Fix may not be included in next release.), type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
