fix: prevent CSV column inconsistency for gpu_count and gpu_model fields (fixes #1092) #1094
Merged
benoit-cty merged 1 commit into mlco2:master on Mar 1, 2026
dropna(axis=1, how="all") was applied to both the existing CSV data and the new row before concatenation. On CPU-only machines, gpu_count and gpu_model are always NaN, so the second write stripped those columns entirely. The third write then detected a schema mismatch via has_valid_headers(), triggering a spurious backup and a new file, and creating a repeating cycle of columns appearing and disappearing.
- Remove dropna from existing DataFrame in the append path; apply only to new_df
- Simplify task_out to avoid redundant empty DataFrame + concat
- Default gpu_count to 0 and gpu_model to "" when no GPU is detected
- Add regression tests for both None and zero-default GPU values
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##           master    #1094      +/-   ##
==========================================
- Coverage   78.19%   78.19%   -0.01%
==========================================
  Files          38       38
  Lines        3637     3636       -1
==========================================
- Hits         2844     2843       -1
  Misses        793      793
Summary
Fixes inconsistent CSV column output when running successive measurements on machines without a GPU (#1092). The gpu_count and gpu_model columns would disappear after the second write, triggering the warning "The CSV format has changed, backing up old emission file" on every subsequent run and creating unnecessary .bak files.
Root Cause
In FileOutput.out(), the append path applied df.dropna(axis=1, how="all") to the existing CSV DataFrame. On CPU-only machines, gpu_count and gpu_model are None (NaN), so after the first write produced a CSV with those columns, the second write's dropna stripped them from the existing data before re-saving. On the third write, has_valid_headers() detected the column mismatch and backed up the file, restarting the cycle.
Changes
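To make the root cause and the primary fix concrete, here is a minimal pandas sketch. The column names other than gpu_count and gpu_model, and all values, are illustrative assumptions rather than codecarbon's real schema, and the code only mirrors the dropna/concat pattern described in this PR, not the actual FileOutput.out() implementation.

```python
import pandas as pd

# Illustrative data: on a CPU-only machine the GPU columns hold only missing values.
existing = pd.DataFrame(
    {"run_id": ["a"], "emissions": [0.1], "gpu_count": [None], "gpu_model": [None]}
)
new_row = pd.DataFrame(
    {"run_id": ["b"], "emissions": [0.2], "gpu_count": [None], "gpu_model": [None]}
)

# Pre-fix pattern: dropna on BOTH frames, so the all-NaN GPU columns vanish
# from the data that is about to be written back to the CSV.
old_result = pd.concat(
    [existing.dropna(axis=1, how="all"), new_row.dropna(axis=1, how="all")],
    ignore_index=True,
)

# Post-fix pattern: dropna only on the new row. concat performs an outer join
# on columns by default, so the existing frame's GPU columns survive.
fixed_result = pd.concat(
    [existing, new_row.dropna(axis=1, how="all")],
    ignore_index=True,
)

print(list(old_result.columns))
print(list(fixed_result.columns))
```

In this sketch the existing frame alone decides which columns are kept, which is why leaving it untouched is enough to keep the header stable from one write to the next.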
Primary fix: codecarbon/output_methods/file.py
- out() append path: Removed df.dropna(axis=1, how="all") from the existing DataFrame. Only new_df has dropna applied (to suppress the pandas concat dtype warning).
- task_out(): Simplified by removing the redundant empty-DataFrame creation and double dropna.
Defense-in-depth: default GPU values
- codecarbon/core/resource_tracker.py: set_GPU_tracking() now sets gpu_count=0 and gpu_model="" in _conf when no GPU is found, instead of leaving them unset.
- codecarbon/emissions_tracker.py: _prepare_emissions_data() uses fallback defaults of 0 and "" for gpu_count/gpu_model.
Tests: tests/output_methods/test_file.py
- test_file_output_out_append_no_gpu_consistent_columns: Verifies that successive appends with gpu_count=None/gpu_model=None never trigger a backup or lose columns.
- test_file_output_out_append_no_gpu_zero_defaults: Verifies that the new default values (0/"") produce consistent CSV output across runs.
Reproduction
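The write cycle can also be observed without a tracker or a GPU-less machine. The sketch below is a pure-Python model of the behavior described in this PR, not codecarbon code: ModelCsv, drop_empty_columns, run, and EXPECTED_HEADER are hypothetical names, drop_empty_columns stands in for dropna(axis=1, how="all"), and the header comparison stands in for has_valid_headers().

```python
EXPECTED_HEADER = ["run_id", "emissions", "gpu_count", "gpu_model"]


def drop_empty_columns(header, rows):
    """Model of dropna(axis=1, how="all"): keep columns with at least one value."""
    kept = [c for c in header if any(r.get(c) is not None for r in rows)]
    return kept, [{c: r.get(c) for c in kept} for r in rows]


class ModelCsv:
    """In-memory stand-in for the emissions CSV (hypothetical, for illustration)."""

    def __init__(self, strip_existing):
        self.strip_existing = strip_existing  # True models the pre-fix append path
        self.header = None  # None means "file does not exist yet"
        self.rows = []
        self.backups = 0

    def write(self, row):
        if self.header is not None and self.header != EXPECTED_HEADER:
            # Models the header check failing: back up and start a new file.
            self.backups += 1
            self.header, self.rows = None, []
        if self.header is None:
            # The first write to a new file keeps every column.
            self.header, self.rows = list(EXPECTED_HEADER), [dict(row)]
            return
        header, rows = self.header, self.rows
        if self.strip_existing:
            header, rows = drop_empty_columns(header, rows)
        new_header, new_rows = drop_empty_columns(EXPECTED_HEADER, [row])
        # Union of columns, existing first, as an outer concat would produce.
        merged = header + [c for c in new_header if c not in header]
        self.header = merged
        self.rows = [{c: r.get(c) for c in merged} for r in rows + new_rows]


def run(strip_existing, writes=5):
    """Return the column count after each write and the number of backups."""
    csv_file = ModelCsv(strip_existing)
    widths = []
    for i in range(writes):
        csv_file.write(
            {"run_id": f"run-{i}", "emissions": 0.1, "gpu_count": None, "gpu_model": None}
        )
        widths.append(len(csv_file.header))
    return widths, csv_file.backups


before_fix = run(strip_existing=True)
after_fix = run(strip_existing=False)
print(before_fix)  # ([4, 2, 4, 2, 4], 2)
print(after_fix)   # ([4, 4, 4, 4, 4], 0)
```

With strip_existing=True the header alternates between four and two columns and a backup is made on every other write; with strip_existing=False the header stays at four columns and no backup occurs.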
On a CPU-only machine (or with NVML unavailable), run any tracker twice in append mode. Before this fix, the second run drops GPU columns; the third run detects a schema mismatch and creates a .bak file.
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
Checklist:
Go over all the following points, and put an x in all the boxes that apply.