fix(metadata): encode multi-word Scopus queries by WilmerGaspar · Pull Request #5 · slimeslab/ComProScanner

WilmerGaspar · 2026-06-18T03:16:00Z

This PR improves Scopus metadata URL construction for multi-word query terms.

Changes:

Encodes query and special_query using urllib.parse.quote.
Converts spaces into AND before URL encoding to make multi-word searches more explicit.
Keeps the change focused on the metadata query-construction path.
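
A minimal sketch of the encoding step listed above. The helper name `build_scopus_query` is hypothetical and the join logic is an illustration, not the code from this PR's diff; only `urllib.parse.quote` is named in the description.

```python
from urllib.parse import quote


def build_scopus_query(keyword: str) -> str:
    """Hypothetical helper: make multi-word terms explicit, then encode."""
    # split() with no arguments also collapses repeated whitespace.
    words = keyword.split()
    # Join the words with AND so every word is required by the search.
    explicit = " AND ".join(words)
    # quote() percent-encodes spaces as %20; safe="" also encodes "/".
    return quote(explicit, safe="")


if __name__ == "__main__":
    print(build_scopus_query("piezoelectric coefficient"))
    # prints: piezoelectric%20AND%20coefficient
```

A single-word keyword passes through unchanged, so existing one-word searches should behave as before under this sketch.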

Testing:

Not run locally. This change was prepared through the GitHub web editor.

Related issue:

Related to [QUESTION]: Handling of multi-word keywords in Scopus query construction #4

Encode main and special Scopus query terms after converting spaces to AND for multi-word keyword handling.

codecov-commenter · 2026-06-18T07:32:03Z

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.


aritraroy24 · 2026-06-18T08:02:56Z

Hi @WilmerGaspar, thanks for working on the issue!

I recommend a modification to ensure multi-word main_keyword handling throughout the entire workflow, not just in the Scopus query. As we discussed in #4, except for the less accurate Scopus search, nothing will break. However, replacing the space with _ right after receiving the keyword gives us a uniform approach for file/table naming, which is cleaner from a scripting perspective. We can then replace _ back with a space wherever the original form is needed:

Scopus search
article collection regex matching
data extraction query

Could you extend the PR to cover these cases too, so the fix covers space handling across the whole workflow, not just the Scopus search?

WilmerGaspar · 2026-06-19T04:31:35Z

Thanks, Aritra. I extended the PR to cover the workflow-wide space handling more consistently.

The update now:

Normalizes main_property_keyword into an underscore-safe internal form for file/table/path naming.
Converts _ back to spaces before Scopus query construction.
Uses the readable search keyword form in the data extraction query.
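
The round trip described in this list could look roughly like the sketch below. The function names and the `_results.csv` suffix are assumptions made for illustration; the two `replace` calls mirror the ones quoted later in this thread.

```python
from pathlib import Path


def to_internal_form(keyword: str) -> str:
    """Underscore form, safe for file, table, and path names."""
    return keyword.replace(" ", "_")


def to_search_form(internal_keyword: str) -> str:
    """Readable form, used for the Scopus and data extraction queries."""
    return internal_keyword.replace("_", " ")


def results_csv_path(base_dir: str, internal_keyword: str) -> Path:
    """Hypothetical naming scheme for an output file."""
    return Path(base_dir) / f"{internal_keyword}_results.csv"


if __name__ == "__main__":
    internal = to_internal_form("thermal conductivity")
    print(internal)                                   # thermal_conductivity
    print(to_search_form(internal))                   # thermal conductivity
    print(results_csv_path("results", internal).name)
```

One property of this round trip: a keyword that already contains an underscore comes back with a space in its place, since the search form cannot tell the two apart.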

I also reviewed the article processor path. From what I saw, self.keyword is mainly used for paths, CSV/database naming, and related outputs, while the article text matching relies on property_keywords. So I kept the matching logic unchanged and focused the normalization on the main keyword flow.

Please let me know if you’d prefer any part handled differently.

aritraroy24 · 2026-06-22T09:13:54Z

Hi @WilmerGaspar, sorry for the delay in replying. Got stuck with some other work.

Yeah. You were right about the property_keywords. So, we don't need to modify anything in the regex matching.

However, the tests are failing due to wrong indentation in the scripts. Maybe due to the GitHub web editor, both scripts now have indentation issues in the following functions:

_construct_url in fetch_metadata.py
__init__ in comproscanner.py
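
Indentation problems of this kind can be caught without running the test suite. The sketch below is one possible check, not part of the project's tooling: `find_syntax_error` is a hypothetical helper built on the standard `ast.parse`, and `BROKEN` is a made-up snippet with the same kind of mistake.

```python
import ast


def find_syntax_error(source: str, filename: str = "<string>"):
    """Return (line, message) for the first syntax error, or None."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return exc.lineno, exc.msg
    return None


# Illustrative snippet: the method body sits at the same depth as "def".
BROKEN = (
    "class Demo:\n"
    "        def __init__(self):\n"
    "        self.value = 1\n"
)

if __name__ == "__main__":
    print(find_syntax_error(BROKEN))     # reports a problem on line 3
    print(find_syntax_error("x = 1\n"))  # None
```

For real files, the source string would come from reading the file, for example with `pathlib.Path(path).read_text()`.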

WilmerGaspar · 2026-06-23T03:06:25Z

No problem at all, Aritra. I completely understand. Thanks for taking the time to review it.

It’s a pleasure to help with the project. I corrected the indentation issues in comproscanner.py and fetch_metadata.py and pushed the update to the PR.

aritraroy24 · 2026-06-23T09:47:27Z

Hi @WilmerGaspar,

The indentation error is still there for the __init__ method in comproscanner.py:

Current code:

class ComProScanner:
        def __init__(self, main_property_keyword: str = None):
        if main_property_keyword is None:
            raise ValueErrorHandler(
                "Please provide a main property keyword to proceed."
            )

        self.main_property_keyword = main_property_keyword.replace(" ", "_")
        self.main_property_search_keyword = self.main_property_keyword.replace("_", " ")

Should be:

class ComProScanner:
    def __init__(self, main_property_keyword: str = None):
        if main_property_keyword is None:
            raise ValueErrorHandler(
                "Please provide a main property keyword to proceed."
            )

        self.main_property_keyword = main_property_keyword.replace(" ", "_")
        self.main_property_search_keyword = self.main_property_keyword.replace("_", " ")

WilmerGaspar · 2026-06-23T12:49:00Z

Thanks again, Aritra. I also corrected the indentation in _construct_url inside fetch_metadata.py and pushed the update. The workflow is now waiting for approval to run again.

aritraroy24 · 2026-06-23T13:56:55Z

Hi @WilmerGaspar, the test is failing again with an AttributeError: "'FetchMetadata' object has no attribute '_construct_url'".

Reason: _construct_url is now a nested function inside __init__ instead of a method of the FetchMetadata class.
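
A small illustration of why the extra indentation produces this AttributeError. The classes `Broken` and `Fixed` and the example URL are made up for the demonstration and are not the project's code.

```python
class Broken:
    def __init__(self):
        self.base = "https://example.invalid/search"

        # Indented one level too deep: this is a local function inside
        # __init__, discarded as soon as __init__ returns.
        def _construct_url(self, query):
            return f"{self.base}?query={query}"


class Fixed:
    def __init__(self):
        self.base = "https://example.invalid/search"

    # Defined at class level, so it is bound as a method on instances.
    def _construct_url(self, query):
        return f"{self.base}?query={query}"


if __name__ == "__main__":
    print(hasattr(Broken(), "_construct_url"))  # False
    print(hasattr(Fixed(), "_construct_url"))   # True
```

The file parses without errors in both cases, which is why this mistake only shows up when the method is called.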

fix(metadata): encode multi-word Scopus queries

e9fddc9

Encode main and special Scopus query terms after converting spaces to AND for multi-word keyword handling.

fix(metadata): normalize multi-word keyword handling

dd605f7

fix: correct indentation in keyword handling changes

6c2e4ac

aritraroy24 assigned WilmerGaspar Jun 23, 2026

WilmerGaspar added 2 commits June 23, 2026 06:24

fix: correct ComProScanner init indentation

96ce3fb

fix: correct metadata URL construction indentation

2fbe7f8

WilmerGaspar wants to merge 5 commits into slimeslab:main from WilmerGaspar:fix/multi-word-keyword-handling

3 participants