
fix: prevent CSV column inconsistency for gpu_count and gpu_model fields (fixes #1092) #1094

Merged
benoit-cty merged 1 commit into mlco2:master from ArmaanjeetSandhu:fix/csv-gpu-columns-dropna
Mar 1, 2026

@ArmaanjeetSandhu
Contributor

Summary

Fixes inconsistent CSV column output when running successive measurements on machines without a GPU (#1092). The gpu_count and gpu_model columns would disappear after the second write, triggering the warning "The CSV format has changed, backing up old emission file" on every subsequent run and creating unnecessary .bak files.

Root Cause

In FileOutput.out(), the append path applied df.dropna(axis=1, how="all") to the existing CSV DataFrame. On CPU-only machines, gpu_count and gpu_model are None (NaN), so after the first write produced a CSV with those columns, the second write's dropna stripped them from the existing data before re-saving. On the third write, has_valid_headers() detected the column mismatch and backed up the file, restarting the cycle.
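The column loss can be seen in isolation with plain pandas. This is a minimal sketch, not the project's code: the row contents and column subset are made up for illustration, and only the `dropna(axis=1, how="all")` call comes from the description above.

```python
import pandas as pd

# Hypothetical one-row emissions record from a CPU-only machine.
# The real CSV has many more columns; these four are enough to show the effect.
existing = pd.DataFrame(
    [
        {
            "timestamp": "2026-03-01T12:00:00",
            "emissions": 0.001,
            "gpu_count": None,
            "gpu_model": None,
        }
    ]
)

# The same call that was applied to the existing data in the append path.
stripped = existing.dropna(axis=1, how="all")

print(list(existing.columns))  # ['timestamp', 'emissions', 'gpu_count', 'gpu_model']
print(list(stripped.columns))  # ['timestamp', 'emissions']
```

Because every value in the two GPU columns is missing, `how="all"` removes both columns, which is exactly what changes the header on the next save.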

Changes

Primary fix — codecarbon/output_methods/file.py

  • out() append path: Removed df.dropna(axis=1, how="all") from the existing DataFrame. Only new_df has dropna applied (to suppress the pandas concat dtype warning).
  • task_out(): Simplified by removing the redundant empty-DataFrame creation and double dropna.

Defense-in-depth — default GPU values

  • codecarbon/core/resource_tracker.py: set_GPU_tracking() now sets gpu_count=0 and gpu_model="" in _conf when no GPU is found, instead of leaving them unset.
  • codecarbon/emissions_tracker.py: _prepare_emissions_data() uses fallback defaults of 0 and "" for gpu_count/gpu_model.
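The fallback idea can be sketched as follows. `gpu_fields` and the plain `conf` dict are hypothetical stand-ins for illustration; they are not the actual `_conf` object or the body of `_prepare_emissions_data()`.

```python
def gpu_fields(conf: dict) -> dict:
    """Return GPU fields with explicit defaults instead of None.

    Hypothetical helper: `conf` stands in for the tracker configuration.
    """
    gpu_count = conf.get("gpu_count")
    gpu_model = conf.get("gpu_model")
    return {
        # Treat both a missing key and an explicit None as "no GPU".
        "gpu_count": 0 if gpu_count is None else gpu_count,
        "gpu_model": "" if gpu_model is None else gpu_model,
    }


print(gpu_fields({}))  # {'gpu_count': 0, 'gpu_model': ''}
print(gpu_fields({"gpu_count": 2, "gpu_model": "2 x Example GPU"}))
```

Checking for `None` explicitly, rather than relying on `conf.get("gpu_count", 0)`, also covers the case where the key is present but set to `None`.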

Tests — tests/output_methods/test_file.py

  • test_file_output_out_append_no_gpu_consistent_columns: Verifies that successive appends with gpu_count=None / gpu_model=None never trigger a backup or lose columns.
  • test_file_output_out_append_no_gpu_zero_defaults: Verifies that the new default values (0 / "") produce consistent CSV output across runs.

Reproduction

On a CPU-only machine (or with NVML unavailable), run any tracker twice in append mode. Before this fix, the second run drops GPU columns; the third run detects a schema mismatch and creates a .bak file.
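For readers without a CPU-only setup at hand, the header change can be approximated without codecarbon. The sketch below is an assumption-laden simulation: `append` is a hypothetical stand-in for the append path of `FileOutput.out()`, the row is made up, and the `drop_existing_nan_columns` flag only exists to switch between the old and new behavior.

```python
import os
import tempfile

import pandas as pd

# Hypothetical row from a machine without a GPU.
ROW = {"timestamp": "t", "emissions": 0.001, "gpu_count": None, "gpu_model": None}


def append(path: str, row: dict, drop_existing_nan_columns: bool) -> None:
    """Simplified stand-in for the CSV append path (not the real implementation)."""
    new_df = pd.DataFrame([row])
    if not os.path.exists(path):
        # First write: all columns are saved, including the empty GPU ones.
        new_df.to_csv(path, index=False)
        return
    existing = pd.read_csv(path)
    if drop_existing_nan_columns:
        # Old behavior: all-NaN columns are stripped from the data already on disk.
        existing = existing.dropna(axis=1, how="all")
    # Both versions drop all-NaN columns from the new row only.
    new_df = new_df.dropna(axis=1, how="all")
    pd.concat([existing, new_df], ignore_index=True).to_csv(path, index=False)


def headers_after_two_writes(drop_existing_nan_columns: bool) -> list:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "emissions.csv")
        append(path, ROW, drop_existing_nan_columns)
        append(path, ROW, drop_existing_nan_columns)
        # Read only the header line back.
        return list(pd.read_csv(path, nrows=0).columns)


buggy = headers_after_two_writes(drop_existing_nan_columns=True)
fixed = headers_after_two_writes(drop_existing_nan_columns=False)
print(buggy)
print(fixed)
```

With the old behavior the header after the second write no longer contains `gpu_count` and `gpu_model`, so a later header check against the full column set would fail. With the fix the header stays identical to the first write.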

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

Go over all the following points, and put an x in all the boxes that apply.

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING.md document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

dropna(axis=1, how="all") was applied to both the existing CSV data and
the new row before concatenation. On CPU-only machines, gpu_count and
gpu_model are always NaN, so the second write stripped those columns
entirely. The third write then detected a schema mismatch via
has_valid_headers(), triggering a spurious backup and new file, creating
a repeating cycle of columns appearing and disappearing.

- Remove dropna from existing DataFrame in the append path; apply only
  to new_df
- Simplify task_out to avoid redundant empty DataFrame + concat
- Default gpu_count to 0 and gpu_model to "" when no GPU is detected
- Add regression tests for both None and zero-default GPU values
@ArmaanjeetSandhu ArmaanjeetSandhu requested a review from a team as a code owner March 1, 2026 13:54
@benoit-cty benoit-cty linked an issue Mar 1, 2026 that may be closed by this pull request
@codecov

codecov bot commented Mar 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.19%. Comparing base (db0d749) to head (4ff4107).
⚠️ Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1094      +/-   ##
==========================================
- Coverage   78.19%   78.19%   -0.01%     
==========================================
  Files          38       38              
  Lines        3637     3636       -1     
==========================================
- Hits         2844     2843       -1     
  Misses        793      793              

Contributor

@benoit-cty benoit-cty left a comment

Thanks a lot !

@benoit-cty benoit-cty merged commit 0eee57c into mlco2:master Mar 1, 2026
10 checks passed
@ArmaanjeetSandhu ArmaanjeetSandhu deleted the fix/csv-gpu-columns-dropna branch March 1, 2026 18:23

Development

Successfully merging this pull request may close these issues.

Random columns in CSV