Use git ls-files in upload-ct-artifacts.sh to avoid phantom deletions #48
Closed
ntarakad-aws wants to merge 1 commit into
The previous exclusion list (-x '.env*' -x '*.pem' -x '*.key' -x 'node_modules/*'
-x '.aws/*') filters out files by pattern regardless of whether they are tracked
in git. This creates a confusing experience for customers when those patterns
match TRACKED files: extracting the resulting code.zip and running 'git status'
shows them as deleted (because .git/ is preserved but the working tree files
were excluded by the upload).
Example: openapi-generator tracks test certificates (.pem files for the rust-server
template). After analysis, the customer's extracted artifact shows ~50 .pem files
as 'deleted' in git status, even though the analysis bot didn't touch them.
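The effect can be reproduced without the upload script. The sketch below assumes only that git is installed. It uses throwaway names (certs/test.pem, demo_phantom_delete) that are not from the PR, and it simulates the pattern-based exclusion with find rather than zip -x:

```bash
#!/usr/bin/env bash
# Reproduce a "phantom deletion": .git/ is shipped, but a tracked file is not.
set -euo pipefail

demo_phantom_delete() {
  local work
  work="$(mktemp -d)"

  # 1. A throwaway repo that tracks a .pem file (hypothetical names).
  git init -q "$work/src"
  mkdir -p "$work/src/certs"
  echo "test certificate" > "$work/src/certs/test.pem"
  echo "hello" > "$work/src/README.md"
  git -C "$work/src" add .
  git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "track a test cert"

  # 2. Simulate the old upload: copy everything, then drop *.pem files from
  #    the working tree while leaving .git/ untouched.
  cp -R "$work/src" "$work/extracted"
  find "$work/extracted" -path "$work/extracted/.git" -prune -o \
    -name '*.pem' -type f -exec rm -f {} +

  # 3. git compares the index with the working tree and reports a deletion.
  git -C "$work/extracted" status --porcelain

  # 4. The file content is still present in the object store.
  git -C "$work/extracted" show HEAD:certs/test.pem

  rm -rf "$work"
}

demo_phantom_delete
```

Run as written, this should print ` D certs/test.pem` followed by `test certificate`: the file is reported as deleted from the working tree, yet .git/ still holds its content.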
Fix: use 'git ls-files' (tracked files, plus non-gitignored new files via
'--others --exclude-standard') as the source of truth for what to include. This:
- Respects the repo's .gitignore (node_modules, build dirs, .env if gitignored,
etc. continue to be excluded automatically)
- Includes the analysis bot's auto-committed output (e.g., ATXDocumentation/
on the result branch) since git ls-files reflects HEAD's contents
- Includes tracked files that happen to match the old patterns (test certs,
.envrc) without phantom deletes
- Preserves .git/ for git log / git diff review
- Falls back to conservative pattern-based exclusion if the repo is not a git
working tree (defensive — shouldn't occur post-clone, but keeps the script
safe for edge cases)
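
The points above can be sketched as a pair of shell functions. This is an illustration under assumptions, not the contents of upload-ct-artifacts.sh: the function names list_git_files and make_code_zip are hypothetical, the fallback branch simply reuses the old exclusion patterns quoted at the top of this message, and zip is assumed to be installed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# List tracked files plus untracked files that .gitignore does not exclude.
list_git_files() {
  { git ls-files --recurse-submodules
    git ls-files --others --exclude-standard; } | sort -u
}

# Hypothetical wrapper: build the archive at the path given as $1.
make_code_zip() {
  local out="$1"
  if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    # Inside a git working tree: zip exactly what git reports, then .git/.
    list_git_files | zip -q "$out" -@
    zip -qry "$out" .git
  else
    # Not a git working tree: fall back to the old pattern-based exclusion.
    zip -qr "$out" . -x '.env*' -x '*.pem' -x '*.key' \
      -x 'node_modules/*' -x '.aws/*'
  fi
}
```

One caveat with a newline-delimited list: git ls-files quotes paths containing unusual characters unless core.quotePath is disabled or -z is used, so repositories with such file names may need extra handling.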
Problem
The current zip exclusion list (-x '.env*' -x '*.pem' -x '*.key' -x 'node_modules/*' -x '.aws/*') filters by pattern regardless of whether files are tracked in git. When tracked files match a pattern, customers extracting code.zip see them as "deleted" in git status.

Real example: openapi-generator tracks test certificates (.pem files for the rust-server template). After running tech-debt-comprehensive on it, the customer's extracted artifact shows ~50 .pem files as deleted, even though the analysis bot didn't touch them. The exclusion list is removing them from the working tree, but .git/ (correctly preserved for diff review) still references them.

Fix
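Use git ls-files as the source of truth for what to zip:

```bash
# Tracked files (including submodules) plus untracked, non-ignored files.
{ git ls-files --recurse-submodules; \
  git ls-files --others --exclude-standard; } | sort -u > /tmp/code-files.txt
# Add the listed files, then .git/ itself (-y stores symlinks as symlinks).
zip -q /tmp/code.zip -@ < /tmp/code-files.txt
zip -qry /tmp/code.zip .git
```

This: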
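- Respects the repo's .gitignore: node_modules/, build dirs, .env (if gitignored), etc. are excluded automatically
- Includes the analysis bot's auto-committed output (e.g., ATXDocumentation/ on the result branch) since git ls-files reflects HEAD's contents
- Includes tracked files that happen to match the old patterns (test certs, .envrc) without phantom deletes
- Preserves .git/ for git log / git diff review

Behavior comparison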
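[Table comparing old and new behavior for node_modules/, .env, .pem certs, and .envrc. Most cell text did not survive; the remaining fragments read ".git/ still has them" and ".git/ anyway (exclusion never hid them)".]

Note on secret exclusion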
The old exclusion list gave a false sense of security: if a real secret was committed and tracked, the file was "excluded" from the working tree but .git/objects/ (which IS in the zip) still contained it. The exclusion never actually hid secrets; it only created phantom deletes. The new approach is honest: whatever's in git history is in the artifact, period. Secret hygiene needs to happen in the repository itself, by keeping secrets out of commits or rewriting history (e.g., git filter-branch, BFG), not at upload time.

Testing
Local test on a repo with tracked .env / .pem files:
- Before: git status after extract shows tens of "deleted" files
- After: git status after extract shows clean working tree (or only the bot's actual changes)

Customer-side workflow now works as expected:
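
A hedged sketch of what such a customer-side check could look like, assuming code.zip has already been extracted into a directory. The function name check_no_phantom_deletes and its message are illustrative and not part of upload-ct-artifacts.sh:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Return 0 if no tracked file is missing from the extracted working tree,
# 1 otherwise (the missing paths are printed to stderr).
check_no_phantom_deletes() {
  local dir="$1" deleted
  # --deleted lists files that are in the index but absent on disk.
  deleted="$(git -C "$dir" ls-files --deleted)"
  if [ -n "$deleted" ]; then
    printf 'missing tracked files:\n%s\n' "$deleted" >&2
    return 1
  fi
  return 0
}

# Example use after extraction (assumes unzip is available):
#   unzip -q code.zip -d extracted
#   check_no_phantom_deletes extracted && git -C extracted log --oneline -5
```

With an artifact built from the git ls-files list, the check should pass. With the old exclusion list and tracked .pem files, it would list each excluded file.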