Skip to content

Shared Hardware Resource Pool for Compute Environments#1390

Open
alexcos20 wants to merge 8 commits into
next-4from
feature/shared_hw_resources
Open

Shared Hardware Resource Pool for Compute Environments#1390
alexcos20 wants to merge 8 commits into
next-4from
feature/shared_hw_resources

Conversation

@alexcos20

@alexcos20 alexcos20 commented Jun 12, 2026

Copy link
Copy Markdown
Member

Closing #1388

Shared hardware resource pool for compute environments

Summary

  • Hardware resources (GPUs, NICs, custom devices) are now defined once at the Docker-connection level and shared across all compute environments, instead of being duplicated inside each environment.
  • Each environment now references pool resources by id via lightweight refs { id, total?, min?, max? }, removing the need to repeat hardware metadata per environment.
  • cpu, ram, and disk are auto-detected from the host at startup — no need to declare them unless you want to cap/override the detected value.
  • Clean break: the old format (hardware fields inside environment resources) is rejected at startup with a clear migration message.

What changed

Resource tracking

  • Added kind field ("discrete" | "fungible") to formalize resource behavior. GPUs/FPGAs are discrete (exclusive or shareable); CPU/RAM/disk are fungible.
  • Added shareable boolean for discrete resources — allows NICs, TPMs, and HSMs to be used by multiple jobs simultaneously while GPUs/FPGAs remain exclusive.
  • Dual-gate availability for fungible resources: Gate 1 enforces the per-environment ceiling, Gate 2 enforces the engine-wide physical pool. Both must pass.
  • Global tracking of discrete resources is now driven by kind === 'discrete' instead of type === 'gpu', so any hardware with an init block is tracked correctly.

Resource constraints

  • Constraints defined on a pool resource (e.g. "renting this GPU requires at least 4 GB RAM") are automatically inherited by all environments that reference it.
  • Per-environment constraint overrides supported via EnvironmentResourceRef.constraints — replaces the pool constraints entirely for that environment. Set [] to remove all constraints for an env.

CPU affinity

  • All environments now share the full physical core pool dynamically. The old static partitioning (cpuOffset) is removed. allocateCpus() already prevents conflicts via in-use tracking.

Config & docs

  • config.json, .env.example, and scripts/ocean-node-quickstart.sh updated to the new two-level format.
  • GPU auto-detection in the quickstart script now emits one resource per physical GPU (single DeviceID) instead of grouping them.
  • docs/GPU.md, docs/env.md, docs/compute-pricing.md, and README.md fully updated with examples, field references, migration guide, and security notes.

Tests — 40 new unit tests covering: pool resolution, kind inference, dual-gate blocking, discrete exclusive/shareable behavior, constraint-driven exhaustion, per-env constraint override, schema clean-break validation.

Migration

Move hardware fields from inside environments[].resources up to the connection-level resources array. Environment resources become lightweight refs:

// Before
"environments": [{ "resources": [{ "id": "myGPU", "total": 1, "init": {...} }] }]

// After
"resources": [{ "id": "myGPU", "kind": "discrete", "total": 1, "init": {...} }],
"environments": [{ "resources": [{ "id": "myGPU" }] }]

cpu, ram, and disk no longer need to be declared — they are auto-detected from the host.

No breaking API/P2P changes

The ComputeEnvironment wire format returned by getComputeEnvironments is unchanged. All existing P2P commands and start-compute handlers work without modification.

Summary by CodeRabbit

  • New Features

    • Introduced a two-level resource configuration model where hardware resources are defined at the Docker connection level and referenced by compute environments, enabling better resource sharing across environments.
    • Added resource kind classification ("discrete" and "fungible") for improved resource tracking and allocation behavior.
  • Documentation

    • Expanded and restructured guides covering the new resource configuration architecture, GPU setup, compute pricing, and environment variables.
    • Added migration guidance for updating existing configurations to the new model.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@alexcos20

Copy link
Copy Markdown
Member Author

/run-security-scan

@alexcos20 alexcos20 left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI automated code review (Gemini 3).

Overall risk: low

Summary:
This PR introduces a robust architectural refactoring of hardware resource management. By separating physical hardware definitions (connection-level pools) from environment logic (business rules and soft limits), it enables features like resource sharing across pricing tiers. The implementation of the dual-gate logic (environment ceiling vs. global pool ceiling) is elegant, well-documented, and thoroughly tested. I only found a few minor bugs regarding TypeScript optional chaining, inference consistency, and Zod schema error paths.

Comments:
• [WARNING][bug] Because the if condition uses optional chaining (res.init?.deviceRequests?.DeviceIDs?.length > 1), TypeScript type narrowing might not carry over to the string template if the compiler assumes init or deviceRequests could still be undefined. This could lead to a compilation or runtime error. Use optional chaining in the interpolated string as well.

-            `Resource "${res.id}": DeviceIDs has ${res.init.deviceRequests.DeviceIDs.length} entries. ` +
+            `Resource "${res.id}": DeviceIDs has ${res.init?.deviceRequests?.DeviceIDs?.length} entries. ` +

• [WARNING][logic] When auto-generating the benchmark environment, the code filters for res.kind !== 'fungible'. However, if a user provided a custom GPU resource without explicitly specifying the kind field, res.kind will be undefined, which evaluates to !== 'fungible'. To ensure it respects the exact same inference logic used throughout the rest of the application, it's safer to use the resolveResourceKind method here.

     const gpuResources: ComputeResource[] = (envConfig.resources ?? []).filter(
       (res: ComputeResource) =>
         res.id !== 'cpu' &&
         res.id !== 'ram' &&
         res.id !== 'disk' &&
-        res.kind !== 'fungible'
+        this.resolveResourceKind(res) !== 'fungible'
     )

• [INFO][style] When checking for invalid shareable: true properties, you can include the array index i in the Zod error path. This greatly improves error logs, helping users trace the exact misconfigured resource object.

-      ;(dockerConfig.resources ?? []).forEach((res) => {
+      ;(dockerConfig.resources ?? []).forEach((res, i) => {
         if (res.shareable === true && (res.type === 'gpu' || res.type === 'fpga')) {
           ctx.addIssue({
             code: z.ZodIssueCode.custom,
             message: `Resource "${res.id}": shareable:true is not allowed for type "${res.type}" — GPUs and FPGAs require exclusive access per job`,
-            path: ['resources']
+            path: ['resources', i]
           })
         }
       })

• [INFO][other] Excellent work writing comprehensive unit tests. The additions effectively cover the edge cases for global vs. environment ceilings, shared discrete resources, and constraint mapping.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 8980021f-b1e4-4bb5-a615-80c0294f0dfa

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • ✅ Review completed - (🔄 Check again to review again)
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/shared_hw_resources

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (2)
src/utils/config/schemas.ts (1)

306-314: 💤 Low value

Error message mentions platform but the condition doesn't check for it.

The condition only checks init and driverVersion, but the error message also mentions platform. Either add platform to the condition or update the message to match what's actually checked.

🔧 Proposed fix to align condition with message
           if (
             (ref as any).init !== undefined ||
-            (ref as any).driverVersion !== undefined
+            (ref as any).driverVersion !== undefined ||
+            (ref as any).platform !== undefined
           ) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/config/schemas.ts` around lines 306 - 314, The error message in the
validation check for hardware fields mentions `platform` and "etc." but the
condition in the if statement only checks for `init` and `driverVersion`. Either
add `platform` to the condition alongside the existing checks for `init` and
`driverVersion`, or update the error message to only reference the fields that
are actually being validated. Ensure consistency between what the code validates
and what the error message claims to validate.
scripts/ocean-node-quickstart.sh (1)

652-655: ⚡ Quick win

Deduplicate GPU ids before merge.

Line 652 appends detected GPUs/refs unconditionally. If the config already includes the same ids, duplicates are introduced and later id-based resolution can become ambiguous. De-dup on .id during merge.

Suggested patch
-    DOCKER_COMPUTE_ENVIRONMENTS=$(echo "$DOCKER_COMPUTE_ENVIRONMENTS" | jq --argjson gpus "$DETECTED_GPUS" '
-      .[0].resources += $gpus |
-      .[0].environments[0].resources += ($gpus | map({ id: .id }))
-    ')
+    DOCKER_COMPUTE_ENVIRONMENTS=$(echo "$DOCKER_COMPUTE_ENVIRONMENTS" | jq --argjson gpus "$DETECTED_GPUS" '
+      .[0].resources = (((.[0].resources // []) + $gpus) | unique_by(.id)) |
+      .[0].environments[0].resources = (((.[0].environments[0].resources // []) + ($gpus | map({ id: .id }))) | unique_by(.id))
+    ')
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/ocean-node-quickstart.sh` around lines 652 - 655, The jq filter in
the DOCKER_COMPUTE_ENVIRONMENTS assignment unconditionally appends detected GPUs
to both .[0].resources and .[0].environments[0].resources without checking for
duplicates. If the config already includes GPUs with the same ids, this creates
duplicate entries and causes ambiguity in later id-based resolution. Modify the
jq filter to deduplicate on the .id field by applying unique_by(.id) to both the
resources array additions after the += operations that merge $gpus and the GPU
id mappings, ensuring that only unique GPU IDs are retained in the final merged
result.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/compute-pricing.md`:
- Around line 44-59: The fenced code block containing the
DOCKER_COMPUTE_ENVIRONMENTS tree structure is missing a language tag, which
violates markdownlint rule MD040. Add `text` as the language specifier to the
opening triple backticks (change ` ``` ` to ` ```text `) at the start of the
DOCKER_COMPUTE_ENVIRONMENTS code block to satisfy the linting requirement.

In `@docs/env.md`:
- Around line 196-215: The markdown heading for "Connection-level fields" is
using a four-level heading (####) which skips the three-level heading (###) in
the hierarchy, violating proper markdown structure. Change the "Connection-level
fields" heading from #### to ### to create the correct hierarchical progression.
Additionally, check the "Environment-level fields" heading that appears at the
end of the diff section and apply the same correction if it also uses ####
instead of ###, ensuring all subsection headers follow consistent three-level
heading formatting.

In `@src/components/c2d/compute_engine_base.ts`:
- Around line 438-451: The double-counting issue occurs because getUsedResources
aggregates discrete resource usage globally into a single usedResources dict,
and then compute_engine_docker.ts assigns that same global value to each
environment's inUse field, causing checkGlobalResourceAvailability to sum the
same usage across all environments. To verify and expose this bug, add a test
case with multiple environments (e.g., env1 and env2) that both actively use the
same discrete resource (e.g., gpu0) with non-zero inUse values, ensuring the
test validates that the global availability check does not double-count the
discrete resource usage when summing across environments.

In `@src/components/c2d/compute_engine_docker.ts`:
- Around line 303-305: The filter for discrete resources in the
computeEngineDocker.ts file only checks the explicit `res.kind === 'discrete'`
property, but it misses resources that should be inferred as discrete (like GPUs
with `init` set) because the inference logic in `resolveConnectionResourcePool`
hasn't run yet at that point. Instead of just checking the explicit kind
property, reuse the same inference logic that determines whether a
ComputeResource is discrete implicitly, so that user-configured GPUs without an
explicit kind declaration are still properly identified and included in the
discreteResources array.
- Around line 237-241: The code in compute_engine_docker.ts unconditionally uses
res.total in the Math.min() call on line 241, but the ComputeResourceSchema
allows total to be optional, which will result in NaN if undefined. Add a guard
condition to check if res.total is defined before using it in Math.min(), and
provide a sensible default value (such as using cap or skipping the assignment)
when res.total is undefined to prevent silent corruption of the resource pool.

In `@src/test/unit/compute.test.ts`:
- Around line 824-850: The test for the old format rejection in the it block
starting at line 824 only asserts that result.success is false, but does not
verify the specific error reason for the failure. This means the test could pass
incorrectly if the config is rejected due to an unrelated rule like
unknown-pool-id instead of the legacy init detection. Add an assertion on the
result.error or result.issues to check that the specific legacy-format or
migration-related error message is present in the failure details, ensuring that
the test validates the actual root cause of rejection rather than just any parse
failure.

---

Nitpick comments:
In `@scripts/ocean-node-quickstart.sh`:
- Around line 652-655: The jq filter in the DOCKER_COMPUTE_ENVIRONMENTS
assignment unconditionally appends detected GPUs to both .[0].resources and
.[0].environments[0].resources without checking for duplicates. If the config
already includes GPUs with the same ids, this creates duplicate entries and
causes ambiguity in later id-based resolution. Modify the jq filter to
deduplicate on the .id field by applying unique_by(.id) to both the resources
array additions after the += operations that merge $gpus and the GPU id
mappings, ensuring that only unique GPU IDs are retained in the final merged
result.

In `@src/utils/config/schemas.ts`:
- Around line 306-314: The error message in the validation check for hardware
fields mentions `platform` and "etc." but the condition in the if statement only
checks for `init` and `driverVersion`. Either add `platform` to the condition
alongside the existing checks for `init` and `driverVersion`, or update the
error message to only reference the fields that are actually being validated.
Ensure consistency between what the code validates and what the error message
claims to validate.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 8e50da9f-5607-4565-b33f-647c1a55663c

📥 Commits

Reviewing files that changed from the base of the PR and between cd3f609 and 367a606.

📒 Files selected for processing (13)
  • .env.example
  • CLAUDE.md
  • README.md
  • config.json
  • docs/GPU.md
  • docs/compute-pricing.md
  • docs/env.md
  • scripts/ocean-node-quickstart.sh
  • src/@types/C2D/C2D.ts
  • src/components/c2d/compute_engine_base.ts
  • src/components/c2d/compute_engine_docker.ts
  • src/test/unit/compute.test.ts
  • src/utils/config/schemas.ts

Comment thread docs/compute-pricing.md
Comment on lines +44 to +59
```
DOCKER_COMPUTE_ENVIRONMENTS
└── [ Docker connection ] ← socketPath, resources[], environments[]
├── resources[] ← full hardware definitions (CPU, RAM, disk, GPU, …)
│ ├── { id: "cpu", total: 6 } (optional: caps auto-detected value)
│ ├── { id: "ram", total: 28 } (optional: caps auto-detected value)
│ ├── { id: "disk", total: 80 } (optional: caps auto-detected value)
│ └── { id: "gpu0", kind: "discrete", … } (required for custom hardware)
└── environments[] ← one or more compute environments
└── { id, fees, resources[], free? }
└── resources[] ← lightweight refs to the pool above
├── { id: "cpu", total: 4, min: 1, max: 4 }
├── { id: "ram", total: 16, min: 1, max: 8 }
├── { id: "disk", max: 20 }
└── { id: "gpu0" }
```

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced block.

Line 44 uses an unlabeled fenced code block; use text to satisfy markdownlint (MD040).

Suggested patch
-```
+```text
 DOCKER_COMPUTE_ENVIRONMENTS
 └── [ Docker connection ]          ← socketPath, resources[], environments[]
@@
-```
+```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```
DOCKER_COMPUTE_ENVIRONMENTS
└── [ Docker connection ] ← socketPath, resources[], environments[]
├── resources[] ← full hardware definitions (CPU, RAM, disk, GPU, …)
│ ├── { id: "cpu", total: 6 } (optional: caps auto-detected value)
│ ├── { id: "ram", total: 28 } (optional: caps auto-detected value)
│ ├── { id: "disk", total: 80 } (optional: caps auto-detected value)
│ └── { id: "gpu0", kind: "discrete", … } (required for custom hardware)
└── environments[] ← one or more compute environments
└── { id, fees, resources[], free? }
└── resources[] ← lightweight refs to the pool above
├── { id: "cpu", total: 4, min: 1, max: 4 }
├── { id: "ram", total: 16, min: 1, max: 8 }
├── { id: "disk", max: 20 }
└── { id: "gpu0" }
```
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 44-44: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/compute-pricing.md` around lines 44 - 59, The fenced code block
containing the DOCKER_COMPUTE_ENVIRONMENTS tree structure is missing a language
tag, which violates markdownlint rule MD040. Add `text` as the language
specifier to the opening triple backticks (change ` ``` ` to ` ```text `) at the
start of the DOCKER_COMPUTE_ENVIRONMENTS code block to satisfy the linting
requirement.

Source: Linters/SAST tools

Comment thread docs/env.md
Comment on lines +196 to +215
#### Connection-level fields

- **socketPath** / **host** / **port** / **protocol** / **caPath** / **certPath** / **keyPath**: Docker connection settings.
- **scanImages**: Scan algorithm images for vulnerabilities with Trivy. Default: `false`
- **scanImageDBUpdateInterval**: Vulnerability DB update interval in seconds. Default: `43200` (12 hours)
- **imageRetentionDays**: How long to keep Docker images, in days. Default: `7`
- **imageCleanupInterval**: Image cleanup interval in seconds. Min: `3600`, Default: `86400`
- **paymentClaimInterval**: Payment claim interval in seconds. Min: `60`, Default: `3600`
- **resources** *(optional)*: Hardware resource pool for this connection. `cpu`, `ram`, and `disk` are auto-detected from the host — include them only to cap their totals or to add custom resources (GPUs, NICs, etc.).
- **id**: Resource identifier. `cpu`, `ram`, `disk` are built-in; any other string defines a custom resource.
- **kind**: `"discrete"` (non-fungible device, e.g. GPU) or `"fungible"` (interchangeable units, e.g. CPU). Auto-inferred: `"discrete"` if `init` is present, `"fungible"` otherwise.
- **shareable** *(discrete only)*: `true` allows multiple jobs to use the device simultaneously (NIC, TPM). Default: `false`. **Not allowed** on `type: "gpu"` or `type: "fpga"`.
- **total**: Total units available. Capped at the physical host limit.
- **min** / **max**: Per-job minimum/maximum.
- **description**, **platform**, **driverVersion**, **memoryTotal**: Informational metadata.
- **init**: Docker device configuration (`deviceRequests` for NVIDIA, `advanced` for AMD/Intel). See [GPU Guide](GPU.md).
- **constraints**: Cross-resource requirements. `{ "id": "ram", "min": 4 }` means renting this resource also requires 4 GB RAM.

#### Environment-level fields

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix heading level increment in the compute section.

Line 196 jumps from ## to ####. Use ### for these subsection headers to satisfy markdown heading hierarchy.

Suggested patch
-#### Connection-level fields
+### Connection-level fields
@@
-#### Environment-level fields
+### Environment-level fields
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#### Connection-level fields
- **socketPath** / **host** / **port** / **protocol** / **caPath** / **certPath** / **keyPath**: Docker connection settings.
- **scanImages**: Scan algorithm images for vulnerabilities with Trivy. Default: `false`
- **scanImageDBUpdateInterval**: Vulnerability DB update interval in seconds. Default: `43200` (12 hours)
- **imageRetentionDays**: How long to keep Docker images, in days. Default: `7`
- **imageCleanupInterval**: Image cleanup interval in seconds. Min: `3600`, Default: `86400`
- **paymentClaimInterval**: Payment claim interval in seconds. Min: `60`, Default: `3600`
- **resources** *(optional)*: Hardware resource pool for this connection. `cpu`, `ram`, and `disk` are auto-detected from the host — include them only to cap their totals or to add custom resources (GPUs, NICs, etc.).
- **id**: Resource identifier. `cpu`, `ram`, `disk` are built-in; any other string defines a custom resource.
- **kind**: `"discrete"` (non-fungible device, e.g. GPU) or `"fungible"` (interchangeable units, e.g. CPU). Auto-inferred: `"discrete"` if `init` is present, `"fungible"` otherwise.
- **shareable** *(discrete only)*: `true` allows multiple jobs to use the device simultaneously (NIC, TPM). Default: `false`. **Not allowed** on `type: "gpu"` or `type: "fpga"`.
- **total**: Total units available. Capped at the physical host limit.
- **min** / **max**: Per-job minimum/maximum.
- **description**, **platform**, **driverVersion**, **memoryTotal**: Informational metadata.
- **init**: Docker device configuration (`deviceRequests` for NVIDIA, `advanced` for AMD/Intel). See [GPU Guide](GPU.md).
- **constraints**: Cross-resource requirements. `{ "id": "ram", "min": 4 }` means renting this resource also requires 4 GB RAM.
#### Environment-level fields
### Connection-level fields
- **socketPath** / **host** / **port** / **protocol** / **caPath** / **certPath** / **keyPath**: Docker connection settings.
- **scanImages**: Scan algorithm images for vulnerabilities with Trivy. Default: `false`
- **scanImageDBUpdateInterval**: Vulnerability DB update interval in seconds. Default: `43200` (12 hours)
- **imageRetentionDays**: How long to keep Docker images, in days. Default: `7`
- **imageCleanupInterval**: Image cleanup interval in seconds. Min: `3600`, Default: `86400`
- **paymentClaimInterval**: Payment claim interval in seconds. Min: `60`, Default: `3600`
- **resources** *(optional)*: Hardware resource pool for this connection. `cpu`, `ram`, and `disk` are auto-detected from the host — include them only to cap their totals or to add custom resources (GPUs, NICs, etc.).
- **id**: Resource identifier. `cpu`, `ram`, `disk` are built-in; any other string defines a custom resource.
- **kind**: `"discrete"` (non-fungible device, e.g. GPU) or `"fungible"` (interchangeable units, e.g. CPU). Auto-inferred: `"discrete"` if `init` is present, `"fungible"` otherwise.
- **shareable** *(discrete only)*: `true` allows multiple jobs to use the device simultaneously (NIC, TPM). Default: `false`. **Not allowed** on `type: "gpu"` or `type: "fpga"`.
- **total**: Total units available. Capped at the physical host limit.
- **min** / **max**: Per-job minimum/maximum.
- **description**, **platform**, **driverVersion**, **memoryTotal**: Informational metadata.
- **init**: Docker device configuration (`deviceRequests` for NVIDIA, `advanced` for AMD/Intel). See [GPU Guide](GPU.md).
- **constraints**: Cross-resource requirements. `{ "id": "ram", "min": 4 }` means renting this resource also requires 4 GB RAM.
### Environment-level fields
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 196-196: Heading levels should only increment by one level at a time
Expected: h3; Actual: h4

(MD001, heading-increment)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/env.md` around lines 196 - 215, The markdown heading for
"Connection-level fields" is using a four-level heading (####) which skips the
three-level heading (###) in the hierarchy, violating proper markdown structure.
Change the "Connection-level fields" heading from #### to ### to create the
correct hierarchical progression. Additionally, check the "Environment-level
fields" heading that appears at the end of the diff section and apply the same
correction if it also uses #### instead of ###, ensuring all subsection headers
follow consistent three-level heading formatting.

Source: Linters/SAST tools

Comment thread src/components/c2d/compute_engine_base.ts
Comment thread src/components/c2d/compute_engine_docker.ts Outdated
Comment thread src/components/c2d/compute_engine_docker.ts
Comment thread src/test/unit/compute.test.ts
alexcos20 and others added 4 commits June 14, 2026 12:38
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/components/c2d/compute_engine_base.ts (1)

461-467: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing null guard for envResource.inUse on free resource check.

Line 452 correctly uses (envResource.inUse ?? 0) for the fungible gate, but line 465 accesses envResource.inUse without the nullish coalescing operator. If inUse is undefined (e.g., no free jobs currently running), the subtraction yields NaN, and NaN < request.amount is always false—incorrectly allowing the request through.

🐛 Proposed fix
       if (isFree) {
         if (!env.free) throw new Error(`No free resources`)
         envResource = this.getResource(env.free?.resources, request.id)
         if (!envResource) throw new Error(`No such free resource ${request.id}`)
-        if (envResource.total - envResource.inUse < request.amount)
+        if (envResource.total - (envResource.inUse ?? 0) < request.amount)
           throw new Error(`Not enough available ${request.id} for free`)
       }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/components/c2d/compute_engine_base.ts` around lines 461 - 467, The free
resource availability check in the isFree block is missing a nullish coalescing
operator on envResource.inUse, which can result in NaN when inUse is undefined,
causing the validation to always pass. Fix the line that calculates
`envResource.total - envResource.inUse < request.amount` by applying the nullish
coalescing operator to envResource.inUse (use `?? 0` as done elsewhere in the
code) to ensure undefined inUse values are treated as 0 and the comparison works
correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/components/c2d/compute_engine_base.ts`:
- Around line 461-467: The free resource availability check in the isFree block
is missing a nullish coalescing operator on envResource.inUse, which can result
in NaN when inUse is undefined, causing the validation to always pass. Fix the
line that calculates `envResource.total - envResource.inUse < request.amount` by
applying the nullish coalescing operator to envResource.inUse (use `?? 0` as
done elsewhere in the code) to ensure undefined inUse values are treated as 0
and the comparison works correctly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 69b6f1bc-0fde-4bf2-8ba5-14cfd8bfb5f4

📥 Commits

Reviewing files that changed from the base of the PR and between 367a606 and 9461e21.

📒 Files selected for processing (3)
  • src/components/c2d/compute_engine_base.ts
  • src/components/c2d/compute_engine_docker.ts
  • src/test/unit/compute.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/components/c2d/compute_engine_docker.ts
  • src/test/unit/compute.test.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant