Search before asking
What happened
The CircleCI plugin's collectJobs subtask fails on every incremental run whenever any CircleCI job is in a non-terminal status (running / not_running / queued / on_hold). The whole CircleCI collection aborts, so no CI/CD data (deployments, builds) is refreshed and DORA metrics go stale.
The task fails with repeated HTTP 500s against a malformed URL that has an empty workflow id:
/v2/workflow//job
^^ empty path segment
Root cause: CollectJobs runs two API collectors that share the URL template "/v2/workflow/{{ .Input.Id }}/job". The first ("new records") collector populates .Input.Id correctly; the second ("unfinished details") collector's iterator query selects DISTINCT workflow_id (not id), so the templated field .Input.Id is empty and DevLake requests GET /v2/workflow//job, which CircleCI answers with 500 Internal Server Error.
Environment
|
|
| DevLake version |
v1.0.3-beta12 (image apache/devlake:v1.0.3-beta12) |
| Plugin |
circleci |
| Database |
MySQL 8.x |
| CircleCI |
CircleCI Cloud (circleci.com API v2) |
| Trigger |
Nightly blueprint (incremental, skipCollectors=false) |
Error / logs
Exact error_name / message recorded on the failed task (_devlake_tasks, plugin='circleci', status='TASK_FAILED'):
subtask collectJobs ended unexpectedly
caused by: Retry exceeded 3 times calling /v2/workflow//job.
The last error was: Http DoAsync error calling [method:GET path:/v2/workflow//job query:map[]].
Response: {"message":"Internal Server Error"} (500)
The message repeats the same /v2/workflow//job 500 once per affected workflow id (8+ combined messages in our case) before the subtask aborts.
Root cause analysis
backend/plugins/circleci/tasks/job_collector.go defines the subtask:
var CollectJobsMeta = plugin.SubTaskMeta{
Name: "collectJobs",
EntryPoint: CollectJobs,
EnabledByDefault: true,
Description: "collect circleci jobs",
DomainTypes: []string{plugin.DOMAIN_TYPE_CICD},
}
CollectJobs creates two API collectors, both using the same URL template "/v2/workflow/{{ .Input.Id }}/job".
Collector 1 — "new records" (works). Its iterator selects id from the workflow table, so the templated field .Input.Id is populated:
clauses := []dal.Clause{
dal.Select("id, pipeline_id"),
dal.From(&models.CircleciWorkflow{}),
dal.Where("connection_id = ? and project_slug = ?",
data.Options.ConnectionId, data.Options.ProjectSlug),
}
// ...
UrlTemplate: "/v2/workflow/{{ .Input.Id }}/job",
Collector 2 — "unfinished details" (broken). Its iterator selects DISTINCT workflow_id (the column is workflow_id, not id), yet it reuses the same URL template referencing .Input.Id:
clauses := []dal.Clause{
dal.Select("DISTINCT workflow_id"),
dal.From(&models.CircleciJob{}),
dal.Where("connection_id = ? AND project_slug = ? AND status IN ('running', 'not_running', 'queued', 'on_hold')",
data.Options.ConnectionId, data.Options.ProjectSlug),
}
// ...
UrlTemplate: "/v2/workflow/{{ .Input.Id }}/job",
Because the row scanned by collector 2 has no Id field set (the query projected workflow_id, which maps to the struct field WorkflowId, not Id), {{ .Input.Id }} renders empty and the request path collapses to /v2/workflow//job. CircleCI returns 500 for the malformed path, the collector retries 3×, and the subtask fails — taking the entire CircleCI collection down with it.
Evidence that this is a code bug, not missing data
The affected jobs all have a valid, non-empty workflow_id in the DB — the empty id is introduced by the collector, not by the data.
Status distribution of _tool_circleci_jobs at the time of failure:
status jobs
------- -----
success 74622
blocked 51314 <- NOT in the collector's status filter; not involved
canceled 17742
on_hold 12368 <- queried by collector 2
failed 3031
not_run 2801 <- note: 'not_run' (data) != 'not_running' (query); not matched
running 2 <- queried by collector 2
Rows in collector 2's query set (status IN ('running','not_running','queued','on_hold')), and how many have a valid workflow_id:
status jobs with_valid_workflow_id
------- ---- ----------------------
on_hold 12368 12368
running 2 2
TOTAL 12370 12370 <- 100% have a valid workflow_id
A concrete example of an affected job (valid workflow_id, yet the collector requests /v2/workflow//job):
id : b8719700-9035-4a96-8053-58596b802e79
workflow_id : 0021f5f7-0161-4845-a8ef-eca400e9de7c <- present and valid
project_slug : gh/elopage/elopage_web_client
status : on_hold
started_at : 2026-04-29 07:45:28
So the correct request would be GET /v2/workflow/0021f5f7-0161-4845-a8ef-eca400e9de7c/job, but DevLake issues GET /v2/workflow//job.
Impact
- The entire CircleCI collection aborts on the failing subtask. CI/CD domain data (deployments/builds) is not refreshed → DORA Deployment Frequency, Lead Time for Changes, and Change Failure Rate go stale.
- It recurs indefinitely.
on_hold approval jobs accumulate naturally, and each successful collection re-collects them at their real status, so the failure returns on the next run (see "Durability of the workaround" below).
Proposed fix
Alias the projected column in collector 2 so the templated field is populated, mirroring collector 1:
// backend/plugins/circleci/tasks/job_collector.go — "unfinished details" collector
dal.Select("DISTINCT workflow_id AS id"),
Alternatively, give the iterator input struct an explicit field for the workflow id and reference it in the template (e.g. "/v2/workflow/{{ .Input.WorkflowId }}/job"), so the two collectors don't silently depend on column aliasing. Either way the fix is one line in CollectJobs.
Workaround (operator-side, non-durable)
Terminalize the jobs in collector 2's query set so the "unfinished details" collector gets zero input rows and never builds the malformed URL:
UPDATE _tool_circleci_jobs
SET status = 'canceled'
WHERE status IN ('running', 'not_running', 'queued', 'on_hold');
After this, collectJobs completes successfully (verified — see below).
Durability of the workaround
The workaround lasts exactly one collection cycle. We verified the mechanism on our instance:
- Before fix:
12,370 jobs in the query set → collectJobs fails.
- After the
UPDATE (query set = 0) we triggered an incremental collection → the circleci task reached TASK_COMPLETED (no /v2/workflow//job error).
- However, that same collection's "new records" phase re-collected the workflows and left
9,013 jobs back in on_hold (our canceled count fell by ~8,800 as the upsert overwrote the workaround, plus freshly-collected on_hold jobs). The collection succeeded only because collector 2 had already iterated the empty set before re-population happened.
- Consequently the next collection's "unfinished details" phase queries those
9,013 on_hold jobs again and fails — so the operator must re-run the UPDATE before every collection. This treadmill is why an upstream code fix is needed.
What do you expect to happen
CircleCI sync will work without issues.
How to reproduce
- Configure a CircleCI connection + scope and run a first (full) collection —
_tool_circleci_jobs gets populated.
- Ensure at least one collected job is in a non-terminal status —
on_hold is the most common (any approval-gated workflow that is waiting for / never received manual approval), but running/queued/not_running also trigger it.
- Run the blueprint again (incremental). The
collectJobs subtask enters its "unfinished details" phase, iterates the non-terminal jobs, and calls GET /v2/workflow//job (empty id) → 500 → subtask collectJobs ended unexpectedly.
In practice this reproduces on every subsequent collection once approval-gated (on_hold) jobs exist, which for most CircleCI setups is continuously.
Anything else
No response
Version
v1.0.3-beta12
Are you willing to submit PR?
Code of Conduct
Search before asking
What happened
The CircleCI plugin's
collectJobssubtask fails on every incremental run whenever any CircleCI job is in a non-terminal status (running/not_running/queued/on_hold). The whole CircleCI collection aborts, so no CI/CD data (deployments, builds) is refreshed and DORA metrics go stale.The task fails with repeated HTTP 500s against a malformed URL that has an empty workflow id:
Root cause:
CollectJobsruns two API collectors that share the URL template"/v2/workflow/{{ .Input.Id }}/job". The first ("new records") collector populates.Input.Idcorrectly; the second ("unfinished details") collector's iterator query selectsDISTINCT workflow_id(notid), so the templated field.Input.Idis empty and DevLake requestsGET /v2/workflow//job, which CircleCI answers with500 Internal Server Error.Environment
apache/devlake:v1.0.3-beta12)circlecicircleci.comAPI v2)skipCollectors=false)Error / logs
Exact
error_name/messagerecorded on the failed task (_devlake_tasks,plugin='circleci',status='TASK_FAILED'):The message repeats the same
/v2/workflow//job500 once per affected workflow id (8+ combined messages in our case) before the subtask aborts.Root cause analysis
backend/plugins/circleci/tasks/job_collector.godefines the subtask:CollectJobscreates two API collectors, both using the same URL template"/v2/workflow/{{ .Input.Id }}/job".Collector 1 — "new records" (works). Its iterator selects
idfrom the workflow table, so the templated field.Input.Idis populated:Collector 2 — "unfinished details" (broken). Its iterator selects
DISTINCT workflow_id(the column isworkflow_id, notid), yet it reuses the same URL template referencing.Input.Id:Because the row scanned by collector 2 has no
Idfield set (the query projectedworkflow_id, which maps to the struct fieldWorkflowId, notId),{{ .Input.Id }}renders empty and the request path collapses to/v2/workflow//job. CircleCI returns500for the malformed path, the collector retries 3×, and the subtask fails — taking the entire CircleCI collection down with it.Evidence that this is a code bug, not missing data
The affected jobs all have a valid, non-empty
workflow_idin the DB — the empty id is introduced by the collector, not by the data.Status distribution of
_tool_circleci_jobsat the time of failure:Rows in collector 2's query set (
status IN ('running','not_running','queued','on_hold')), and how many have a validworkflow_id:A concrete example of an affected job (valid
workflow_id, yet the collector requests/v2/workflow//job):So the correct request would be
GET /v2/workflow/0021f5f7-0161-4845-a8ef-eca400e9de7c/job, but DevLake issuesGET /v2/workflow//job.Impact
on_holdapproval jobs accumulate naturally, and each successful collection re-collects them at their real status, so the failure returns on the next run (see "Durability of the workaround" below).Proposed fix
Alias the projected column in collector 2 so the templated field is populated, mirroring collector 1:
Alternatively, give the iterator input struct an explicit field for the workflow id and reference it in the template (e.g.
"/v2/workflow/{{ .Input.WorkflowId }}/job"), so the two collectors don't silently depend on column aliasing. Either way the fix is one line inCollectJobs.Workaround (operator-side, non-durable)
Terminalize the jobs in collector 2's query set so the "unfinished details" collector gets zero input rows and never builds the malformed URL:
After this,
collectJobscompletes successfully (verified — see below).Durability of the workaround
The workaround lasts exactly one collection cycle. We verified the mechanism on our instance:
12,370jobs in the query set →collectJobsfails.UPDATE(query set =0) we triggered an incremental collection → thecirclecitask reachedTASK_COMPLETED(no/v2/workflow//joberror).9,013jobs back inon_hold(ourcanceledcount fell by ~8,800 as the upsert overwrote the workaround, plus freshly-collectedon_holdjobs). The collection succeeded only because collector 2 had already iterated the empty set before re-population happened.9,013on_holdjobs again and fails — so the operator must re-run theUPDATEbefore every collection. This treadmill is why an upstream code fix is needed.What do you expect to happen
CircleCI sync will work without issues.
How to reproduce
_tool_circleci_jobsgets populated.on_holdis the most common (any approval-gated workflow that is waiting for / never received manual approval), butrunning/queued/not_runningalso trigger it.collectJobssubtask enters its "unfinished details" phase, iterates the non-terminal jobs, and callsGET /v2/workflow//job(empty id) →500→subtask collectJobs ended unexpectedly.In practice this reproduces on every subsequent collection once approval-gated (
on_hold) jobs exist, which for most CircleCI setups is continuously.Anything else
No response
Version
v1.0.3-beta12
Are you willing to submit PR?
Code of Conduct