
feat(remote): Add checksum path filters for download and extract tasks (fixes #68). #113

Open
Bill-hbrhbr wants to merge 2 commits into y-scope:main from Bill-hbrhbr:add-checksum-path-patterns-for-download-and-extract

@Bill-hbrhbr (Contributor) commented Apr 23, 2026

Description

(closes #69)

For npm packages, we sometimes want to checksum only a subset of the downloaded source, or to ignore in-place updates in known locations such as node_modules. The node_modules directory has to live inside the source tree because package managers like npm and Yarn install relative to the project root, and many tools assume this layout.

These in-place updates, however, cause checksum drift, which can trigger a needless re-download and re-extraction. Adding CHECKSUM_EXCLUDE_PATTERNS to the download-and-extract tasks is the most direct and practical way to handle this. Beyond node_modules, it can filter out build outputs, caches, logs, generated docs, and other nonessential files that do not affect the core content of the source tree.
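A minimal sketch of the node_modules case, as a consuming Taskfile might write it. It assumes the utils are included from a hypothetical path under a hypothetical remote namespace, that the task also takes an OUTPUT_DIR var (assumed name), and that exclude patterns are written as paths under that directory; the URL, hash, and directory names are placeholders.

```yaml
version: "3"

includes:
  # Hypothetical path and namespace; point this at wherever remote.yaml lives in your checkout.
  remote: "tools/yscope-dev-utils/exports/taskfiles/utils/remote.yaml"

vars:
  # Hypothetical extraction directory for an npm project's source.
  G_WEB_SRC_DIR: "{{.ROOT_DIR}}/build/web-src"

tasks:
  web-src:
    cmds:
      - task: "remote:download-and-extract-zip"
        vars:
          URL: "https://example.com/web-src.zip"  # Placeholder.
          FILE_SHA256: "<sha256-of-archive>"  # Placeholder.
          OUTPUT_DIR: "{{.G_WEB_SRC_DIR}}"  # Assumed variable name.
          # `npm install` and local builds write into these directories after extraction.
          # Excluding them should keep the stored checksum stable across runs.
          # Assumes patterns are given as paths under the output directory.
          CHECKSUM_EXCLUDE_PATTERNS:
            - "{{.G_WEB_SRC_DIR}}/node_modules"
            - "{{.G_WEB_SRC_DIR}}/dist"
```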

Alongside this, we introduce CHECKSUM_INCLUDE_PATTERNS for completeness, though it is narrower in scope: it is useful when only a small, well-defined portion of a larger directory is the actual artifact of interest and the rest is generated. It defaults to the entire extraction output directory, which preserves the behavior from before this PR.
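A sketch of the narrower include case, assuming a hypothetical include path and remote namespace, an OUTPUT_DIR var (assumed name), and placeholder URL and hash. Only a hypothetical src subdirectory is treated as the artifact; leaving CHECKSUM_INCLUDE_PATTERNS unset would checksum the whole output directory, as described above.

```yaml
version: "3"

includes:
  # Hypothetical path and namespace for the remote utils.
  remote: "tools/yscope-dev-utils/exports/taskfiles/utils/remote.yaml"

vars:
  # Hypothetical extraction directory.
  G_LIB_SRC_DIR: "{{.ROOT_DIR}}/build/lib-src"

tasks:
  lib-src:
    cmds:
      - task: "remote:download-and-extract-tar"
        vars:
          URL: "https://example.com/lib-src.tar.gz"  # Placeholder.
          FILE_SHA256: "<sha256-of-archive>"  # Placeholder.
          OUTPUT_DIR: "{{.G_LIB_SRC_DIR}}"  # Assumed variable name.
          # Only the hypothetical `src` subtree is the artifact of interest; files generated
          # elsewhere under the output directory are left out of the checksum.
          # Omitting this var falls back to checksumming the whole output directory.
          CHECKSUM_INCLUDE_PATTERNS: ["{{.G_LIB_SRC_DIR}}/src"]
```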

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Passed the newly added unit tests.

Summary by CodeRabbit

  • New Features

    • Tar and zip extraction tasks now support configurable checksum include/exclude patterns, letting you control which files or directories are considered during checksum validation and computation while retaining sensible defaults.
  • Tests

    • Added verification scenarios that confirm exclude/include scopes behave as expected and that checksum-limited runs correctly skip work when only out-of-scope files change.

@Bill-hbrhbr Bill-hbrhbr requested a review from a team as a code owner April 23, 2026 17:16
@Bill-hbrhbr Bill-hbrhbr requested a review from junhaoliao April 23, 2026 17:16
@coderabbitai (bot, Contributor) commented Apr 23, 2026

Walkthrough

The tar and zip extraction tasks now accept CHECKSUM_INCLUDE_PATTERNS and CHECKSUM_EXCLUDE_PATTERNS. Checksum validation and computation were changed to use these new configurable include/exclude patterns. New tests were added to verify checksum behaviour for exclude and include scopes during zip extraction.

Changes

  • Task definitions (exports/taskfiles/utils/remote.yaml): Added public task inputs CHECKSUM_INCLUDE_PATTERNS and CHECKSUM_EXCLUDE_PATTERNS to download-and-extract-tar and download-and-extract-zip. Updated checksum validation and computation steps to use derived INCLUDE_PATTERNS / EXCLUDE_PATTERNS instead of always targeting the output dir.
  • Tests for checksum scope (taskfiles/remote/tests.yaml): Added two public test tasks: download-and-extract-zip-test-checksum-exclude (validates exclude-pattern checksum parity) and download-and-extract-zip-test-checksum-include (validates include-scope prevents rework when out-of-scope files change).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 5 passed

  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Title check: ✅ Passed. The title clearly and specifically describes the main change: adding checksum path filters to remote download and extract tasks.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.


@junhaoliao (Member) left a comment

let's refer to #69 and add tests to cover those parameters

the title should include (resolves #68).

@junhaoliao (Member) left a comment

waiting for tests to be ported over then this is good to go

@Bill-hbrhbr Bill-hbrhbr changed the title feat(remote): Add checksum path filters for download and extract tasks. feat(remote): Add checksum path filters for download and extract tasks (fixes #68). Apr 28, 2026
@Bill-hbrhbr Bill-hbrhbr linked an issue Apr 28, 2026 that may be closed by this pull request
@Bill-hbrhbr Bill-hbrhbr requested a review from junhaoliao April 28, 2026 12:38
@junhaoliao (Member) left a comment

the rest lgtm. just docstring issues

Comment on lines +63 to +64
# @param {string[]} [CHECKSUM_EXCLUDE_PATTERNS=[]] Path wildcard patterns, relative to any
# `CHECKSUM_INCLUDE_PATTERNS`, to exclude from the checksum.
Member

the "relative to any CHECKSUM_INCLUDE_PATTERNS" wording seems ambiguous. since those patterns are excluded whether or not CHECKSUM_INCLUDE_PATTERNS is specified, i think it's fine to remove the wording to avoid confusion?

Suggested change
- # @param {string[]} [CHECKSUM_EXCLUDE_PATTERNS=[]] Path wildcard patterns, relative to any
- # `CHECKSUM_INCLUDE_PATTERNS`, to exclude from the checksum.
+ # @param {string[]} [CHECKSUM_EXCLUDE_PATTERNS=[]] Path wildcard patterns to exclude from the
+ # checksum computation.

Comment on lines +152 to +153
# @param {string[]} [CHECKSUM_EXCLUDE_PATTERNS=[]] Path wildcard patterns, relative to any
# `CHECKSUM_INCLUDE_PATTERNS`, to exclude from the checksum.
Member

Suggested change
- # @param {string[]} [CHECKSUM_EXCLUDE_PATTERNS=[]] Path wildcard patterns, relative to any
- # `CHECKSUM_INCLUDE_PATTERNS`, to exclude from the checksum.
+ # @param {string[]} [CHECKSUM_EXCLUDE_PATTERNS=[]] Path wildcard patterns to exclude from the
+ # checksum computation.

FILE_SHA256: "{{.G_TEST_ZIP_FILE_SHA256}}"

# Test that the checksums from the two cases match
- "diff -q '{{.CHECKSUM_FILE_0}}' '{{.CHECKSUM_FILE_1}}'"
Member

seems we prefer using cmp in the utils?

Suggested change
- - "diff -q '{{.CHECKSUM_FILE_0}}' '{{.CHECKSUM_FILE_1}}'"
+ - "cmp -s '{{.CHECKSUM_FILE_0}}' '{{.CHECKSUM_FILE_1}}'"

URL: "{{.G_TEST_ZIP_FILE_URL}}"
FILE_SHA256: "{{.G_TEST_ZIP_FILE_SHA256}}"

# Case 1: Eexclude the same paths during extraction, and compute the checksum of the whole
Member

Suggested change
- # Case 1: Eexclude the same paths during extraction, and compute the checksum of the whole
+ # Case 1: Exclude the same paths during extraction, and compute the checksum of the whole


Development

Successfully merging this pull request may close these issues.

Remote utils should pass EXCLUDE_PATTERNS to the checksum tasks.

2 participants