Skip to content

[Bug][plugin-circleci] collectJobs fails with HTTP 500 on /v2/workflow//job — the "unfinished details" collector selects workflow_id but the URL template uses {{ .Input.Id }} #8907

@zahorniak

Description

@zahorniak

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

The CircleCI plugin's collectJobs subtask fails on every incremental run whenever any CircleCI job is in a non-terminal status (running / not_running / queued / on_hold). The whole CircleCI collection aborts, so no CI/CD data (deployments, builds) is refreshed and DORA metrics go stale.

The task fails with repeated HTTP 500s against a malformed URL that has an empty workflow id:

/v2/workflow//job
         ^^ empty path segment

Root cause: CollectJobs runs two API collectors that share the URL template "/v2/workflow/{{ .Input.Id }}/job". The first ("new records") collector populates .Input.Id correctly; the second ("unfinished details") collector's iterator query selects DISTINCT workflow_id (not id), so the templated field .Input.Id is empty and DevLake requests GET /v2/workflow//job, which CircleCI answers with 500 Internal Server Error.

Environment

DevLake version v1.0.3-beta12 (image apache/devlake:v1.0.3-beta12)
Plugin circleci
Database MySQL 8.x
CircleCI CircleCI Cloud (circleci.com API v2)
Trigger Nightly blueprint (incremental, skipCollectors=false)

Error / logs

Exact error_name / message recorded on the failed task (_devlake_tasks, plugin='circleci', status='TASK_FAILED'):

subtask collectJobs ended unexpectedly
caused by: Retry exceeded 3 times calling /v2/workflow//job.
The last error was: Http DoAsync error calling [method:GET path:/v2/workflow//job query:map[]].
Response: {"message":"Internal Server Error"} (500)

The message repeats the same /v2/workflow//job 500 once per affected workflow id (8+ combined messages in our case) before the subtask aborts.

Root cause analysis

backend/plugins/circleci/tasks/job_collector.go defines the subtask:

var CollectJobsMeta = plugin.SubTaskMeta{
	Name:             "collectJobs",
	EntryPoint:       CollectJobs,
	EnabledByDefault: true,
	Description:      "collect circleci jobs",
	DomainTypes:      []string{plugin.DOMAIN_TYPE_CICD},
}

CollectJobs creates two API collectors, both using the same URL template "/v2/workflow/{{ .Input.Id }}/job".

Collector 1 — "new records" (works). Its iterator selects id from the workflow table, so the templated field .Input.Id is populated:

clauses := []dal.Clause{
	dal.Select("id, pipeline_id"),
	dal.From(&models.CircleciWorkflow{}),
	dal.Where("connection_id = ? and project_slug = ?",
		data.Options.ConnectionId, data.Options.ProjectSlug),
}
// ...
UrlTemplate: "/v2/workflow/{{ .Input.Id }}/job",

Collector 2 — "unfinished details" (broken). Its iterator selects DISTINCT workflow_id (the column is workflow_id, not id), yet it reuses the same URL template referencing .Input.Id:

clauses := []dal.Clause{
	dal.Select("DISTINCT workflow_id"),
	dal.From(&models.CircleciJob{}),
	dal.Where("connection_id = ? AND project_slug = ? AND status IN ('running', 'not_running', 'queued', 'on_hold')",
		data.Options.ConnectionId, data.Options.ProjectSlug),
}
// ...
UrlTemplate: "/v2/workflow/{{ .Input.Id }}/job",

Because the row scanned by collector 2 has no Id field set (the query projected workflow_id, which maps to the struct field WorkflowId, not Id), {{ .Input.Id }} renders empty and the request path collapses to /v2/workflow//job. CircleCI returns 500 for the malformed path, the collector retries 3×, and the subtask fails — taking the entire CircleCI collection down with it.

Evidence that this is a code bug, not missing data

The affected jobs all have a valid, non-empty workflow_id in the DB — the empty id is introduced by the collector, not by the data.

Status distribution of _tool_circleci_jobs at the time of failure:

status     jobs
-------    -----
success    74622
blocked    51314    <- NOT in the collector's status filter; not involved
canceled   17742
on_hold    12368    <- queried by collector 2
failed      3031
not_run     2801    <- note: 'not_run' (data) != 'not_running' (query); not matched
running        2    <- queried by collector 2

Rows in collector 2's query set (status IN ('running','not_running','queued','on_hold')), and how many have a valid workflow_id:

status     jobs    with_valid_workflow_id
-------    ----    ----------------------
on_hold    12368   12368
running        2       2
TOTAL      12370   12370   <- 100% have a valid workflow_id

A concrete example of an affected job (valid workflow_id, yet the collector requests /v2/workflow//job):

id           : b8719700-9035-4a96-8053-58596b802e79
workflow_id  : 0021f5f7-0161-4845-a8ef-eca400e9de7c   <- present and valid
project_slug : gh/elopage/elopage_web_client
status       : on_hold
started_at   : 2026-04-29 07:45:28

So the correct request would be GET /v2/workflow/0021f5f7-0161-4845-a8ef-eca400e9de7c/job, but DevLake issues GET /v2/workflow//job.

Impact

  • The entire CircleCI collection aborts on the failing subtask. CI/CD domain data (deployments/builds) is not refreshed → DORA Deployment Frequency, Lead Time for Changes, and Change Failure Rate go stale.
  • It recurs indefinitely. on_hold approval jobs accumulate naturally, and each successful collection re-collects them at their real status, so the failure returns on the next run (see "Durability of the workaround" below).

Proposed fix

Alias the projected column in collector 2 so the templated field is populated, mirroring collector 1:

// backend/plugins/circleci/tasks/job_collector.go — "unfinished details" collector
dal.Select("DISTINCT workflow_id AS id"),

Alternatively, give the iterator input struct an explicit field for the workflow id and reference it in the template (e.g. "/v2/workflow/{{ .Input.WorkflowId }}/job"), so the two collectors don't silently depend on column aliasing. Either way the fix is one line in CollectJobs.

Workaround (operator-side, non-durable)

Terminalize the jobs in collector 2's query set so the "unfinished details" collector gets zero input rows and never builds the malformed URL:

UPDATE _tool_circleci_jobs
SET status = 'canceled'
WHERE status IN ('running', 'not_running', 'queued', 'on_hold');

After this, collectJobs completes successfully (verified — see below).

Durability of the workaround

The workaround lasts exactly one collection cycle. We verified the mechanism on our instance:

  1. Before fix: 12,370 jobs in the query set → collectJobs fails.
  2. After the UPDATE (query set = 0) we triggered an incremental collection → the circleci task reached TASK_COMPLETED (no /v2/workflow//job error).
  3. However, that same collection's "new records" phase re-collected the workflows and left 9,013 jobs back in on_hold (our canceled count fell by ~8,800 as the upsert overwrote the workaround, plus freshly-collected on_hold jobs). The collection succeeded only because collector 2 had already iterated the empty set before re-population happened.
  4. Consequently the next collection's "unfinished details" phase queries those 9,013 on_hold jobs again and fails — so the operator must re-run the UPDATE before every collection. This treadmill is why an upstream code fix is needed.

What do you expect to happen

CircleCI sync will work without issues.

How to reproduce

  1. Configure a CircleCI connection + scope and run a first (full) collection — _tool_circleci_jobs gets populated.
  2. Ensure at least one collected job is in a non-terminal status — on_hold is the most common (any approval-gated workflow that is waiting for / never received manual approval), but running/queued/not_running also trigger it.
  3. Run the blueprint again (incremental). The collectJobs subtask enters its "unfinished details" phase, iterates the non-terminal jobs, and calls GET /v2/workflow//job (empty id) → 500subtask collectJobs ended unexpectedly.

In practice this reproduces on every subsequent collection once approval-gated (on_hold) jobs exist, which for most CircleCI setups is continuously.

Anything else

No response

Version

v1.0.3-beta12

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    type/bugThis issue is a bug

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions