[Version 3] AMD GPUs support
I managed to get CodeCarbon working on Adastra and upgraded @IlyasMoutawwakil's code to support more recent versions of the amdsmi package. There is still work to do, as the metrics are weird:
        value = os.environ.get("CUDA_VISIBLE_DEVICES")
    elif value is None and os.environ.get("ROCR_VISIBLE_DEVICES"):
        value = os.environ.get("ROCR_VISIBLE_DEVICES")
    logger.debug(f"_set_from_conf() gpu_ids: {value}")
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
Copilot Autofix
AI about 1 hour ago
In general, the problem is that _set_from_conf logs the raw value it derives from configuration with logger.debug(f"_set_from_conf() gpu_ids: {value}"). Since _set_from_conf is a generic helper that can process both sensitive (e.g., api_key) and non-sensitive fields, logging the raw value at this central point is unsafe: misconfiguration, refactoring, or future calls might cause secrets to be logged in clear text. The best way to fix this without changing user-visible functionality is to avoid logging the potentially sensitive value at all, or to log only non-sensitive metadata (such as whether a value is present, its type, or a redacted form). Because GPU IDs are not credentials but still may be considered semi-sensitive in some environments, a conservative approach is to avoid printing them in full here.
Concretely, within BaseEmissionsTracker._set_from_conf in codecarbon/emissions_tracker.py, in the if name == "gpu_ids": block, we should replace the final debug statement logger.debug(f"_set_from_conf() gpu_ids: {value}") with a version that does not include the raw value. For example, we can log only whether value is None, or the number of IDs, or a generic message. This preserves the diagnostic intent (indicating that GPU IDs were set or resolved) without risking cleartext leakage of any token that might, now or in the future, flow into value at this point. No new imports or helper methods are required; we just modify the existing logging call.
@@ -155,7 +155,11 @@
         value = os.environ.get("CUDA_VISIBLE_DEVICES")
     elif value is None and os.environ.get("ROCR_VISIBLE_DEVICES"):
         value = os.environ.get("ROCR_VISIBLE_DEVICES")
-    logger.debug(f"_set_from_conf() gpu_ids: {value}")
+    # Do not log the raw gpu_ids value to avoid leaking potentially sensitive data
+    logger.debug(
+        "_set_from_conf() gpu_ids configured; "
+        f"type={type(value).__name__}, is_none={value is None}"
+    )
 # store final value
 self._conf[name] = value
 # set `self._{name}` to `value`
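For discussion, the fallback and the redacted log line from the suggestion above can be reproduced in isolation. This is only a minimal sketch, not the code in `codecarbon/emissions_tracker.py`: the helper name `resolve_gpu_ids`, its `env` parameter, and the logger name are hypothetical, and the first `if` branch is an assumption, since the diff only shows the `elif`.

```python
import logging
import os
from typing import Mapping, Optional

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("gpu_ids_sketch")


def resolve_gpu_ids(
    configured: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper mirroring the gpu_ids fallback shown in the diff.

    An explicitly configured value wins; otherwise CUDA_VISIBLE_DEVICES is
    tried first, then ROCR_VISIBLE_DEVICES.
    """
    env = os.environ if env is None else env
    value = configured
    # Assumed condition: the diff does not show the opening `if` branch.
    if value is None and env.get("CUDA_VISIBLE_DEVICES"):
        value = env.get("CUDA_VISIBLE_DEVICES")
    elif value is None and env.get("ROCR_VISIBLE_DEVICES"):
        value = env.get("ROCR_VISIBLE_DEVICES")
    # Log metadata only, never the raw value, as in the autofix suggestion.
    logger.debug(
        "gpu_ids resolved; type=%s, is_none=%s",
        type(value).__name__,
        value is None,
    )
    return value
```

Passing `env` explicitly keeps the sketch testable without touching the real process environment; with `env=None` it reads `os.environ`, as the original code does.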
Description
Continuing #490
Related Issue
Please link to the issue this PR resolves: [issue #178]
Motivation and Context
AMD GPUs are not yet supported.
How Has This Been Tested?
On the Adastra supercomputer, with AMD MI250 GPUs.
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
Checklist:
Go over all the following points, and put an x in all the boxes that apply.