BatchJob.src is never populated for Gemini Developer API (mldev), causing input file references to be lost
Environment
- google-genai version: 1.73.1
- API backend: Gemini Developer API (`genai.Client(api_key=...)`, not Vertex AI)
- Python: 3.12
- File: `google/genai/batches.py`
Summary
_BatchJob_from_mldev (the response parser used for the Gemini Developer API) does not map metadata.inputConfig from the REST response into the src field of types.BatchJob. Only metadata.output is mapped (into dest).
As a result, after calling client.batches.create(...) or client.batches.get(name=...), batch.src is always None for the Gemini Developer API, even though the underlying REST response contains metadata.inputConfig.fileName.
The Vertex AI parser (_BatchJob_from_vertex) does not have this issue — it correctly maps inputConfig to src.
Reproduction
```python
from io import BytesIO

from google import genai

client = genai.Client(api_key="...")

# 1. Upload an input file
uploaded = client.files.upload(
    file=BytesIO(b'{"key":"k1","request":{"contents":[{"parts":[{"text":"hi"}]}]}}\n'),
    config={"display_name": "demo", "mime_type": "application/jsonl"},
)
print("uploaded:", uploaded.name)  # e.g. "files/vx4mwl208pq9"

# 2. Create a batch from the uploaded file
batch = client.batches.create(
    model="gemini-2.5-flash-lite",
    src=uploaded.name,
    config={"display_name": "demo-batch"},
)

# 3. Retrieve the batch
fetched = client.batches.get(name=batch.name)
print("src:", fetched.src)    # ← None (BUG)
print("dest:", fetched.dest)  # ← BatchJobDestination(file_name='files/batch-...')
```
The raw REST response (GET /v1beta/{batch.name}) contains both:
```json
{
  "name": "batches/xhpy2t2ugh8l0obqasnhinyi7aeisy38w6h3",
  "metadata": {
    "inputConfig": {
      "fileName": "files/vx4mwl208pq9"
    },
    "output": {
      "responsesFile": "files/batch-xhpy2t2ugh8l0obqasnhinyi7aeisy38w6h3"
    },
    ...
  }
}
```
…but fetched.src is None because the SDK never parses metadata.inputConfig.
Expected behavior
fetched.src.file_name should equal "files/vx4mwl208pq9" (consistent with the Vertex AI parser, which sets src from inputConfig).
Actual behavior
fetched.src is None for the Gemini Developer API.
Root cause
google/genai/batches.py:256-310 — _BatchJob_from_mldev is missing the inputConfig → src mapping:
```python
def _BatchJob_from_mldev(
    from_object: Union[dict[str, Any], object],
    parent_object: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
  to_object: dict[str, Any] = {}
  if getv(from_object, ['name']) is not None:
    setv(to_object, ['name'], getv(from_object, ['name']))
  # ... displayName, state, createTime, endTime, updateTime, model ...
  if getv(from_object, ['metadata', 'output']) is not None:
    setv(
        to_object,
        ['dest'],
        _BatchJobDestination_from_mldev(
            t.t_recv_batch_job_destination(
                getv(from_object, ['metadata', 'output'])
            ),
            to_object,
        ),
    )
  return to_object  # ← `src` is never set
```
Compare with _BatchJob_from_vertex (batches.py:313-369), which does map inputConfig to src:
```python
  if getv(from_object, ['inputConfig']) is not None:
    setv(
        to_object,
        ['src'],
        _BatchJobSource_from_vertex(
            getv(from_object, ['inputConfig']), to_object
        ),
    )
```
Impact
This bug has caused a serious production issue for us. Our code relied on batch.src.file_name to identify the input file for cleanup after the batch completes:
```python
src_name = getattr(batch.src, "file_name", None)  # always None
if src_name:
    client.files.delete(name=src_name)  # never executes
```
Because batch.src is always None, the client.files.delete(...) call is silently skipped. Input files (uploaded via client.files.upload(...), default 30-day TTL on the Gemini Developer API) accumulate in the project's Files API quota (20 GB), eventually exhausting it and blocking new batch submissions until the user manually cleans up out-of-band.
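To put rough numbers on the accumulation (illustrative only: the 20 GB quota is the Files API storage limit mentioned above, while the 50 MB per input file is an assumed workload size, not something from the SDK):

```python
# Illustrative arithmetic: how quickly leaked input files exhaust the quota.
QUOTA_BYTES = 20 * 1024**3   # 20 GB Files API storage quota
INPUT_SIZE = 50 * 1024**2    # assumed average batch input file size (50 MB)

# Number of leaked inputs that fit before new uploads start failing.
batches_until_full = QUOTA_BYTES // INPUT_SIZE
print(batches_until_full)  # 409
```

With the 30-day TTL, any workload submitting more than ~14 such batches per day would hit the ceiling before the oldest files expire on their own.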
Anyone using BatchJob.src to track or clean up the input file on the Gemini Developer API path is silently affected.
Workaround
We are working around this by issuing a raw REST GET to retrieve the batch and reading metadata.inputConfig.fileName directly:
```python
import json
import urllib.request

req = urllib.request.Request(
    f"https://generativelanguage.googleapis.com/v1beta/{batch_name}",
    headers={"x-goog-api-key": api_key},
)
with urllib.request.urlopen(req) as resp:
    meta = json.loads(resp.read())
input_file = meta["metadata"]["inputConfig"]["fileName"]
```
A workaround that stays within the SDK is to persist the file name returned by client.files.upload(...) ourselves at submit time and ignore batch.src entirely.
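That submit-time bookkeeping can be sketched as below. `create_batch_tracked` and `cleanup_input` are our own helper names, not google-genai APIs; they assume only the documented `client.batches.create(...)` and `client.files.delete(...)` calls:

```python
# Sketch of the in-SDK workaround: record the input file name ourselves at
# submit time, since batch.src comes back as None on the mldev path.
# `registry` is any mapping we persist (here, a plain dict).

def create_batch_tracked(client, model, file_name, registry, display_name="batch"):
    """Create a batch and record batch.name -> input file name in `registry`."""
    batch = client.batches.create(
        model=model,
        src=file_name,
        config={"display_name": display_name},
    )
    registry[batch.name] = file_name  # survives even though batch.src is None
    return batch

def cleanup_input(client, batch_name, registry):
    """Delete the input file we recorded for a finished batch, if any."""
    file_name = registry.pop(batch_name, None)
    if file_name is not None:
        client.files.delete(name=file_name)
```

In a real deployment the registry would need to be durable (database, object tags, etc.) so cleanup survives process restarts.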
Proposed fix
Add the inputConfig → src mapping to _BatchJob_from_mldev, mirroring _BatchJob_from_vertex:
```python
def _BatchJob_from_mldev(from_object, parent_object=None):
  to_object = {}
  # ... existing mappings ...
  if getv(from_object, ['metadata', 'inputConfig']) is not None:
    setv(
        to_object,
        ['src'],
        _BatchJobSource_from_mldev(
            getv(from_object, ['metadata', 'inputConfig']),
            to_object,
        ),
    )
  if getv(from_object, ['metadata', 'output']) is not None:
    setv(
        to_object,
        ['dest'],
        _BatchJobDestination_from_mldev(
            t.t_recv_batch_job_destination(
                getv(from_object, ['metadata', 'output'])
            ),
            to_object,
        ),
    )
  return to_object
```
(_BatchJobSource_from_mldev already exists in the file; this just calls it with the right input.)
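A regression test along these lines could lock the behavior in. `parse_batch_mldev` below is a simplified stand-in we wrote for this sketch (plain dict access instead of `getv`/`setv`, dicts instead of the SDK's source/destination types); a real test would import `_BatchJob_from_mldev` from `google/genai/batches.py` and feed it the same payload:

```python
# Regression-test sketch: given an mldev-style response payload, the parsed
# batch job should expose both the input file (src) and output file (dest).

def parse_batch_mldev(from_object):
    """Simplified stand-in for _BatchJob_from_mldev with the fix applied."""
    to_object = {}
    if from_object.get("name") is not None:
        to_object["name"] = from_object["name"]
    metadata = from_object.get("metadata") or {}
    if metadata.get("inputConfig") is not None:  # the mapping the fix adds
        to_object["src"] = {"file_name": metadata["inputConfig"]["fileName"]}
    if metadata.get("output") is not None:
        to_object["dest"] = {"file_name": metadata["output"]["responsesFile"]}
    return to_object

# Payload mirrors the raw REST response shown in the reproduction above.
payload = {
    "name": "batches/xhpy2t2ugh8l0obqasnhinyi7aeisy38w6h3",
    "metadata": {
        "inputConfig": {"fileName": "files/vx4mwl208pq9"},
        "output": {
            "responsesFile": "files/batch-xhpy2t2ugh8l0obqasnhinyi7aeisy38w6h3"
        },
    },
}
parsed = parse_batch_mldev(payload)
```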
Thanks for maintaining the SDK!