[Draft] AMD ROCm support #1072

Draft
benoit-cty wants to merge 28 commits into master from feat/rocm

@benoit-cty
Contributor

Description

Continuing #490

Related Issue

Please link to the issue this PR resolves: [issue #178]

Motivation and Context

AMD GPUs are not yet supported.

How Has This Been Tested?

Tested on the Adastra supercomputer, with AMD MI250 GPUs.

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

Go over all the following points, and put an x in all the boxes that apply.

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING.md document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@benoit-cty
Contributor Author

I managed to make CodeCarbon work on Adastra and upgraded @IlyasMoutawwakil's code to support more recent versions of the amdsmi package.

There is still work to do, as the reported metrics are implausible (note the total GPU power of roughly 12.6 MW):

[codecarbon INFO @ 17:04:13] Energy consumed for all GPUs : 4.254300 kWh. Total GPU Power : 12572969.923258875 W
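
As a rough consistency check on the log line above, the two figures can be compared directly. The sketch below is plain arithmetic on the logged values; the function name `seconds_to_consume` is hypothetical and is not part of CodeCarbon.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def seconds_to_consume(energy_kwh: float, power_w: float) -> float:
    """Return the time (in seconds) needed to consume energy_kwh at constant power_w."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return energy_kwh * JOULES_PER_KWH / power_w


# Values copied from the log line in this comment.
logged_energy_kwh = 4.254300
logged_power_w = 12572969.923258875

duration_s = seconds_to_consume(logged_energy_kwh, logged_power_w)
print(f"Logged energy would be consumed in {duration_s:.2f} s at the logged power")
```

At the logged power, the logged energy would be used up in a little over one second, so the two numbers cannot both describe a real run. This hints at a unit or time-interval problem in the power computation, but the sketch alone does not say where.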

                value = os.environ.get("CUDA_VISIBLE_DEVICES")
            elif value is None and os.environ.get("ROCR_VISIBLE_DEVICES"):
                value = os.environ.get("ROCR_VISIBLE_DEVICES")
            logger.debug(f"_set_from_conf() gpu_ids: {value}")

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (password) as clear text.

Copilot Autofix

AI about 1 hour ago

In general, the problem is that _set_from_conf logs the raw value it derives from configuration with logger.debug(f"_set_from_conf() gpu_ids: {value}"). Since _set_from_conf is a generic helper that can process both sensitive (e.g., api_key) and non-sensitive fields, logging the raw value at this central point is unsafe: misconfiguration, refactoring, or future calls might cause secrets to be logged in clear text. The best way to fix this without changing user-visible functionality is to avoid logging the potentially sensitive value at all, or to log only non-sensitive metadata (such as whether a value is present, its type, or a redacted form). Because GPU IDs are not credentials but still may be considered semi-sensitive in some environments, a conservative approach is to avoid printing them in full here.

Concretely, within BaseEmissionsTracker._set_from_conf in codecarbon/emissions_tracker.py, in the if name == "gpu_ids": block, we should replace the final debug statement logger.debug(f"_set_from_conf() gpu_ids: {value}") with a version that does not include the raw value. For example, we can log only whether value is None, or the number of IDs, or a generic message. This preserves the diagnostic intent (indicating that GPU IDs were set or resolved) without risking cleartext leakage of any token that might, now or in the future, flow into value at this point. No new imports or helper methods are required; we just modify the existing logging call.
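
The "number of IDs" option mentioned above could look like the following minimal sketch. The helper `summarize_gpu_ids` and the logger name are hypothetical and are not part of CodeCarbon; the sketch assumes `gpu_ids` is either `None`, a comma-separated string (as in `CUDA_VISIBLE_DEVICES`), or a sequence of IDs.

```python
import logging
from typing import Sequence, Union

logger = logging.getLogger("gpu_ids_example")  # hypothetical logger name


def summarize_gpu_ids(value: Union[None, str, Sequence[int]]) -> str:
    """Describe gpu_ids without echoing the raw value."""
    if value is None:
        return "gpu_ids not set"
    if isinstance(value, str):
        # Environment variables hold a comma-separated list such as "0,1,2".
        ids = [part for part in value.split(",") if part.strip()]
    else:
        ids = list(value)
    return f"gpu_ids set; type={type(value).__name__}, count={len(ids)}"


# Only metadata reaches the log, never the IDs themselves.
logger.debug("_set_from_conf() %s", summarize_gpu_ids("0,1,2"))
```

Logging the count keeps most of the diagnostic value (it shows that a value was resolved and how many devices it names) while leaving the raw string out of the log.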


Suggested changeset 1
codecarbon/emissions_tracker.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/codecarbon/emissions_tracker.py b/codecarbon/emissions_tracker.py
--- a/codecarbon/emissions_tracker.py
+++ b/codecarbon/emissions_tracker.py
@@ -155,7 +155,11 @@
                 value = os.environ.get("CUDA_VISIBLE_DEVICES")
             elif value is None and os.environ.get("ROCR_VISIBLE_DEVICES"):
                 value = os.environ.get("ROCR_VISIBLE_DEVICES")
-            logger.debug(f"_set_from_conf() gpu_ids: {value}")
+            # Do not log the raw gpu_ids value to avoid leaking potentially sensitive data
+            logger.debug(
+                "_set_from_conf() gpu_ids configured; "
+                f"type={type(value).__name__}, is_none={value is None}"
+            )
         # store final value
         self._conf[name] = value
         # set `self._{name}` to `value`
EOF