fix: handle adjacent @@ variable tokens in split_words() by hwu71 · Pull Request #15 · binsync/varbert_api

hwu71 · 2026-02-22T23:51:57Z

Fixes #14

Problem

When processing decompiled code where variables appear adjacent without spaces
(e.g., func(a,b,c) — common in Ghidra output), VarBERT silently returns
zero predictions for the entire function.

Root Cause

_process_code_with_text() replaces variable names with @@varname@@id@@
placeholders. When variables are adjacent without whitespace, multiple
placeholders merge into a single space-delimited word:

FUN(@@local_18@@varid_abc@@,@@param_3@@varid_def@@,@@pcVar2@@varid_ghi@@);

split_words() uses re.search(), which returns only the first @@
match per word, so any subsequent adjacent placeholders are silently dropped.
As a result, generate_popular_names() sees a holder/mask count mismatch and
discards all predictions.
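
A minimal sketch of the failure mode, using the example line above. The `PLACEHOLDER` regex is an assumption about the `@@varname@@id@@` format, not the pattern actually used in varbert_api:

```python
import re

# Hypothetical pattern for the @@varname@@id@@ placeholder format;
# the real regex inside split_words() may differ.
PLACEHOLDER = re.compile(r"@@(\w+)@@(\w+)@@")

line = "FUN(@@local_18@@varid_abc@@,@@param_3@@varid_def@@,@@pcVar2@@varid_ghi@@);"

# There is no whitespace in the line, so splitting yields a single "word".
words = line.split()

# re.search() stops at the first placeholder in each word.
first_only = [m.group(1) for m in (PLACEHOLDER.search(w) for w in words) if m]

# re.finditer() walks every non-overlapping placeholder in each word.
all_matches = [m.group(1) for w in words for m in PLACEHOLDER.finditer(w)]

print(len(words), first_only, all_matches)
```

With one name recovered instead of three, any downstream check that compares the number of placeholders against the number of masked positions would fail.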

Fix

Replace re.search() with re.finditer() in split_words() to extract
all @@ patterns from each word.
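
The idea behind the change can be sketched as follows. `extract_placeholders` and `PLACEHOLDER` are hypothetical names used for illustration only; this is not the body of split_words() from the repository, just the per-word finditer() loop it describes:

```python
import re

# Assumed placeholder format: @@<name>@@<id>@@ (illustrative only).
PLACEHOLDER = re.compile(r"@@(\w+)@@(\w+)@@")


def extract_placeholders(code):
    """Return (name, id) pairs for every placeholder, in source order."""
    found = []
    for word in code.split():
        # finditer() yields all matches in the word, so placeholders
        # separated only by punctuation such as "," are each collected.
        for match in PLACEHOLDER.finditer(word):
            found.append((match.group(1), match.group(2)))
    return found


sample = "x = FUN(@@a@@id1@@,@@b@@id2@@) + @@c@@id3@@;"
pairs = extract_placeholders(sample)
print(pairs)
```

The sample mixes adjacent and space-separated placeholders, which is the case where a search()-based loop and a finditer()-based loop give different counts.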

Testing

Tested with a minimal Ghidra-decompiled function containing adjacent variables:

Before fix: 0 predictions (bug triggered)
After fix: 7 predictions (all variables renamed correctly)

Existing tests continue to pass.

When variables appear adjacent without spaces in decompiled code (e.g., func(a,b,c)), the @@ placeholder tokens merge into one word. re.search() only matched the first pattern, silently losing the rest and causing a holder/mask count mismatch that discards all predictions. Replace re.search() with re.finditer() to extract all @@ patterns.

hwu71 mentioned this pull request Feb 22, 2026

[BUG] split_words() silently drops adjacent @@ variable tokens → zero predictions #14

Closed

fix broken tests with pin

09d3e8c

mahaloz merged commit 0434324 into binsync:main Feb 23, 2026
1 check passed

mahaloz merged 2 commits into binsync:main from hwu71:main
