
Fix #4225: Dynamically generate manifest extensions from package handlers#4857

Open
HasTheDev wants to merge 2 commits into aboutcode-org:develop from HasTheDev:fix-manifest-classification


HasTheDev commented Mar 22, 2026

Fixes #4225

Replaces the hardcoded _MANIFEST_ENDS list in src/summarycode/classify.py with a dynamically generated set of manifest file extensions from APPLICATION_PACKAGE_DATAFILE_HANDLERS.

Key Updates:

  • Created get_dynamic_manifest_ends() to safely extract path_patterns from registered package handlers.

  • Placed the import of APPLICATION_PACKAGE_DATAFILE_HANDLERS inside the function and moved the assignment to the bottom of the file to avoid a circular import.

  • Added a fallback that seeds the dynamic set with the legacy hardcoded list, to preserve full backward compatibility with outdated test-suite files (such as elm-package.json, project.clj, and metadata).

  • All tests in test_classify.py pass locally.
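A minimal sketch of the approach described above, assuming each handler exposes a path_patterns tuple of glob patterns, as scancode's datafile handlers do. The three handler classes are hypothetical stand-ins for APPLICATION_PACKAGE_DATAFILE_HANDLERS, and pattern_to_suffix is an illustrative helper, not code from this PR. In the PR the handler list is imported inside the function to avoid the circular import; here it is passed in as a parameter so the block runs on its own.

```python
# Characters that make a glob pattern non-literal.
GLOB_CHARS = set("*?[")


class NpmPackageJsonHandler:  # hypothetical stand-in for a real handler
    path_patterns = ("*/package.json",)


class GemspecHandler:  # hypothetical stand-in for a real handler
    path_patterns = ("*.gemspec",)


class JarManifestHandler:  # hypothetical stand-in for a real handler
    path_patterns = ("*/META-INF/MANIFEST.MF",)


HANDLERS = [NpmPackageJsonHandler, GemspecHandler, JarManifestHandler]


def pattern_to_suffix(pattern):
    """
    Return the literal tail of a glob pattern after its last "*",
    or None if that tail is empty or still contains glob characters.
    """
    # rfind returns -1 when there is no "*", so the whole pattern is kept.
    suffix = pattern[pattern.rfind("*") + 1:]
    if not suffix or GLOB_CHARS & set(suffix):
        return None
    return suffix


def get_dynamic_manifest_ends(handlers=HANDLERS):
    """Collect manifest path suffixes from every handler's path_patterns."""
    ends = set()
    for handler in handlers:
        for pattern in getattr(handler, "path_patterns", ()):
            suffix = pattern_to_suffix(pattern)
            if suffix:
                ends.add(suffix)
    # A tuple can be passed directly to str.endswith().
    return tuple(sorted(ends))


ends = get_dynamic_manifest_ends()
print(ends)  # ('.gemspec', '/META-INF/MANIFEST.MF', '/package.json')
print("src/app/package.json".endswith(ends))  # True
```

Only the literal tail after the last wildcard survives, so a pattern such as */META-INF/*.MF would reduce to .MF and lose its directory constraint.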

Tasks

  • Reviewed contribution guidelines

  • PR is descriptively titled 📑 and links the original issue above 🔗

  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.

  • Commits are in a uniquely named feature branch and have no merge conflicts 📁

  • Updated documentation pages (if applicable)

  • Updated CHANGELOG.rst (if applicable)

Signed-off-by: HasTheDev <122232470+HasTheDev@users.noreply.github.com>

Signed-off-by: HasTheDev <hassanazam2021@gmail.com>
HasTheDev force-pushed the fix-manifest-classification branch 3 times, most recently from 3c8e1a2 to 7a79419, on March 22, 2026 09:08
…age handlers

Signed-off-by: HasTheDev <122232470+HasTheDev@users.noreply.github.com>
HasTheDev force-pushed the fix-manifest-classification branch from 7a79419 to 5b358b7
AyanSinhaMahapatra (Member) left a comment


@HasTheDev thank you for your PR, but your code does not work as intended.
It needs major changes and restructuring, and it should also use newer code that has since been merged and supports this functionality.

 "is_script": false,
 "is_legal": true,
-"is_manifest": false,
+"is_manifest": true,

This is wrong: this file is not a manifest.

The full path of a copyright file should be checked to confirm that it is a Debian copyright file; only then can it be classified as a manifest.
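One way to act on this, sketched with the standard-library fnmatch module, is to match the whole path, not just the file name, against Debian-specific patterns. The two entries in DEBIAN_COPYRIGHT_PATTERNS are assumptions for illustration. The authoritative patterns would be the path_patterns of scancode's Debian copyright handlers.

```python
import fnmatch

# Assumed patterns for illustration only; the real ones should come from
# the path_patterns of the Debian copyright datafile handlers.
DEBIAN_COPYRIGHT_PATTERNS = (
    "*/debian/copyright",
    "*usr/share/doc/*/copyright",
)


def is_debian_copyright_path(path):
    """
    Return True if the whole path (not just its file name) matches one of
    the assumed Debian copyright patterns.
    """
    # Normalize Windows separators so the POSIX-style patterns apply.
    path = path.replace("\\", "/")
    return any(
        fnmatch.fnmatchcase(path, pattern)
        for pattern in DEBIAN_COPYRIGHT_PATTERNS
    )


print(is_debian_copyright_path("pkg/debian/copyright"))  # True
print(is_debian_copyright_path("src/copyright"))  # False
```

In fnmatch, * also matches /, so */debian/copyright accepts a debian/copyright file at any depth. A bare src/copyright or COPYRIGHT matches neither pattern.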

We may need to maintain a reject list of datafile handlers that should be ignored when checking whether a file is a manifest, because:

  • sometimes the path pattern alone is not enough, and the is_datafile() checks happen in functions that either perform additional checks or open the file
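A minimal sketch of such a reject list, assuming each handler has a datasource_id and a path_patterns tuple. The Handler dataclass, the handler entries and MANIFEST_REJECT_LIST are all hypothetical. The fast path-only check skips rejected handlers, which would still need their own is_datafile() logic.

```python
import fnmatch
from dataclasses import dataclass


@dataclass(frozen=True)
class Handler:
    """Hypothetical stand-in for a scancode datafile handler."""

    datasource_id: str
    path_patterns: tuple


# Hypothetical handlers; the datasource ids are illustrative only.
HANDLERS = (
    Handler("npm_package_json", ("*/package.json",)),
    Handler("debian_copyright", ("*/debian/copyright",)),
    Handler("content_checked_metadata", ("*/metadata",)),
)

# Hypothetical reject list: handlers whose is_datafile() does more than a
# path-pattern match (extra checks, or opening the file), so a path match
# alone must not mark a file as a manifest.
MANIFEST_REJECT_LIST = frozenset({"content_checked_metadata"})


def manifest_patterns(handlers=HANDLERS, reject=MANIFEST_REJECT_LIST):
    """Return the path patterns of all handlers not on the reject list."""
    return tuple(
        pattern
        for handler in handlers
        if handler.datasource_id not in reject
        for pattern in handler.path_patterns
    )


def is_manifest_by_path(path, patterns=None):
    """Return True if path matches a pattern from a non-rejected handler."""
    if patterns is None:
        patterns = manifest_patterns()
    return any(fnmatch.fnmatchcase(path, pattern) for pattern in patterns)


print(is_manifest_by_path("web/package.json"))  # True
print(is_manifest_by_path("gem/metadata"))  # False
```

Keying the reject list on datasource_id rather than on individual patterns means a handler's patterns can change without the list going stale.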


# Seed the set with the original legacy list to appease old, rigid tests
manifest_ends = set([
    'package.json',

Why do you have this list? It is not dynamically extracted; it just replaces one static list with another, even though you have additional checks.

from commoncode.fileutils import file_base_name

def get_dynamic_manifest_ends():
    """

See how we now have a package manifest patterns index with https://github.com/aboutcode-org/scancode-toolkit/pull/4606/changes#diff-13120b0eb8c69b520b66229f7090c12b1102859a76ddd19b4789de2ed1b8818cR85-R100. Can you reuse it to do a fast manifest pattern check directly?
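Below is a sketch of a fast check built on a patterns index. The index layout is a hypothetical stand-in, not the index added in the linked PR. Literal file names map to the patterns that end with them, so most paths need only a dictionary lookup on their base name. Only patterns whose final segment contains a wildcard fall back to a linear fnmatch scan. The function names build_patterns_index and is_manifest_path are illustrative.

```python
import fnmatch
import posixpath
from collections import defaultdict

GLOB_CHARS = set("*?[")


def build_patterns_index(patterns):
    """
    Split glob patterns into a dict keyed by their literal final path
    segment, plus a tuple of patterns whose final segment has a wildcard.
    """
    by_name = defaultdict(list)
    wildcard = []
    for pattern in patterns:
        last_segment = pattern.rsplit("/", 1)[-1]
        if GLOB_CHARS & set(last_segment):
            wildcard.append(pattern)
        else:
            by_name[last_segment].append(pattern)
    return dict(by_name), tuple(wildcard)


def is_manifest_path(path, index):
    """Return True if path matches any indexed manifest pattern."""
    by_name, wildcard = index
    # Fast path: only patterns ending in this exact file name can match.
    for pattern in by_name.get(posixpath.basename(path), ()):
        if fnmatch.fnmatchcase(path, pattern):
            return True
    # Slow path: patterns such as *.gemspec still need a scan.
    return any(fnmatch.fnmatchcase(path, pattern) for pattern in wildcard)


index = build_patterns_index(("*/package.json", "*.gemspec", "*/debian/copyright"))
print(is_manifest_path("src/debian/copyright", index))  # True
print(is_manifest_path("src/copyright", index))  # False
```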



Development

Successfully merging this pull request may close these issues.

is_manifest flag is not accurately determined

2 participants