
feat(mzid): DirectPinWriter + -outputFormat 3 (pin) (Q7)#20

Open
ypriverol wants to merge 1 commit into dev from feat/direct-pin-writer


@ypriverol
Member

Summary

Emits a Percolator .pin file directly from the in-memory result list, bypassing the separate msgf2pin converter. Closes one of the long-standing MS²Rescore / Percolator integration gaps identified in the landscape review (Q7 in .claude/investigations/msgfplus_research_report.md §0, and §5 Priority 3).

Invocation

java -Xmx8G -jar MSGFPlus.jar -s spectra.mzML -d db.fasta -outputFormat 3 ...

The output path is the mzid output path with its .mzid extension replaced by .pin. All other search flags behave identically.

Column layout

SpecId  Label  ScanNr  ExpMass  CalcMass
RawScore  DeNovoScore  lnSpecEValue  lnEValue  IsotopeError
PepLen  dM  absdM
Charge{min..max}           (one-hot over the search's charge range)
NumMatchedMainIons  ExplainedIonCurrentRatio  NTermIonCurrentRatio
CTermIonCurrentRatio  MS2IonCurrent  IsolationWindowEfficiency
MeanErrorTop7  StdevErrorTop7  MeanRelErrorTop7  StdevRelErrorTop7
Peptide  Proteins
  • Label: +1 when any target protein matches, -1 when every match is a decoy — Percolator uses -1 rows as the null distribution, so we emit them rather than drop them.
  • Peptide: Percolator's pre.PEPTIDE.post flanking format with inline mod masses (+nnn.mmm / -nnn.mmm) matching MS-GF+'s TSV output.
  • Proteins: tab-separated after Peptide, so Percolator's "read all remaining columns" rule works as-is.
  • Additional-feature columns are zero-filled when -addFeatures 1 is off, so the column count stays stable across runs.
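
As a rough illustration of the layout above, the header and the one-hot charge cells could be assembled as follows. This is a minimal sketch and not the code in DirectPinWriter: the class name, the method names, and the per-charge column naming ("Charge2", "Charge3", ...) are assumptions made for the example; only the fixed column names come from the layout listed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PinHeaderSketch {
    // Builds the tab-separated header. minCharge/maxCharge stand in for the
    // search's charge range; the "Charge" + z naming is an assumption.
    public static String header(int minCharge, int maxCharge) {
        List<String> cols = new ArrayList<>(Arrays.asList(
            "SpecId", "Label", "ScanNr", "ExpMass", "CalcMass",
            "RawScore", "DeNovoScore", "lnSpecEValue", "lnEValue", "IsotopeError",
            "PepLen", "dM", "absdM"));
        for (int z = minCharge; z <= maxCharge; z++) {
            cols.add("Charge" + z);
        }
        cols.addAll(Arrays.asList(
            "NumMatchedMainIons", "ExplainedIonCurrentRatio", "NTermIonCurrentRatio",
            "CTermIonCurrentRatio", "MS2IonCurrent", "IsolationWindowEfficiency",
            "MeanErrorTop7", "StdevErrorTop7", "MeanRelErrorTop7", "StdevRelErrorTop7",
            "Peptide", "Proteins"));
        return String.join("\t", cols);
    }

    // One-hot encoding: "1" in the column matching the PSM's charge, "0" elsewhere.
    public static String chargeOneHot(int charge, int minCharge, int maxCharge) {
        StringBuilder sb = new StringBuilder();
        for (int z = minCharge; z <= maxCharge; z++) {
            if (sb.length() > 0) sb.append('\t');
            sb.append(z == charge ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(header(2, 4));          // 28 tab-separated column names
        System.out.println(chargeOneHot(3, 2, 4)); // 0<TAB>1<TAB>0
    }
}
```

Because the number of charge columns depends on the charge range, two searches with different ranges would produce headers of different widths under this sketch.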

Touched

  • DirectPinWriter.java (new, ~353 lines) — writer + peptide formatter duplicated from DirectTSVWriter (extracting a shared PeptideFormatter is a clean follow-up; kept verbatim here so review is self-contained).
  • ParamManager.java — registers pin as enum index 3 on -outputFormat.
  • SearchParams.java — new writePin() getter.
  • MSGFPlus.java — wire call site in runMSGFPlus next to the existing writeTsv / writeMzid blocks.

Test plan

  • mvn -B verify — 145 tests pass (was 141 on dev; +4 new Q7 cases).
  • TestDirectPinWriter covers flag acceptance, the writePin getter's flip behaviour, index stability vs pre-existing enum entries (mzid/tsv/both/pin = 0/1/2/3), and header shape (verifying all Percolator-required column names appear).
  • No behaviour change for existing -outputFormat 0/1/2 users — the new enum entry is appended at the end, no existing index shifted.
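
The index-stability property can be pictured with a small stand-alone sketch. It does not use ParamManager, EnumParameter, or SearchParams; the list and the writePin helper below are hypothetical stand-ins that only mirror the index assignment stated in this PR (mzid/tsv/both/pin = 0/1/2/3).

```java
import java.util.Arrays;
import java.util.List;

public class OutputFormatIndexSketch {
    // Order mirrors the indices stated in the PR; "pin" is appended last,
    // so the three pre-existing entries keep their positions.
    static final List<String> FORMATS = Arrays.asList("mzid", "tsv", "both", "pin");

    public static int indexOf(String name) {
        return FORMATS.indexOf(name);
    }

    // Hypothetical stand-in for the getter: true only for the pin entry.
    public static boolean writePin(int outputFormat) {
        return outputFormat == indexOf("pin");
    }

    public static void main(String[] args) {
        for (String name : FORMATS) {
            System.out.println(name + " = " + indexOf(name));
        }
        System.out.println("writePin(3) = " + writePin(3));
    }
}
```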

Not in this PR

🤖 Generated with Claude Code

Emits a Percolator .pin file directly from the in-memory result list,
bypassing the separate msgf2pin converter. This closes one of the
long-standing MS²Rescore / Percolator integration gaps for bigbio/msgfplus
users.

Column layout (tab-separated):
  SpecId Label ScanNr ExpMass CalcMass
  RawScore DeNovoScore lnSpecEValue lnEValue IsotopeError
  PepLen dM absdM
  Charge{min..max}          (one-hot over params' charge range)
  NumMatchedMainIons ExplainedIonCurrentRatio NTermIonCurrentRatio
  CTermIonCurrentRatio MS2IonCurrent IsolationWindowEfficiency
  MeanErrorTop7 StdevErrorTop7 MeanRelErrorTop7 StdevRelErrorTop7
  Peptide Proteins

Label is +1 when at least one target protein matches, -1 when every
match is a decoy (Percolator uses those as the null distribution).
Peptide is emitted in Percolator's pre.PEPTIDE.post flanking format
with inline mod masses (+/-mass.mmm). Proteins are tab-separated so
Percolator's "read all remaining columns" rule works as-is.
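
The Label rule and the flanking format described above can be sketched as below. This is an illustration under stated assumptions and not the duplicated formatPeptideWithMods code: the class and method names are hypothetical, decoys are assumed to be recognisable by an accession prefix passed in by the caller, and terminal modifications are left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class PinLabelPeptideSketch {
    // +1 if at least one matched protein is a target, -1 if every match is a decoy.
    // decoyPrefix is an assumption of this sketch (for example "XXX").
    public static int label(List<String> proteins, String decoyPrefix) {
        for (String p : proteins) {
            if (!p.startsWith(decoyPrefix)) return 1;
        }
        return -1;
    }

    // modMass[i] is the mass shift on residue i, or 0.0 when unmodified.
    // "%+.3f" always prints the sign, giving +nnn.mmm or -nnn.mmm.
    public static String flankedPeptide(char pre, String residues, double[] modMass, char post) {
        StringBuilder sb = new StringBuilder().append(pre).append('.');
        for (int i = 0; i < residues.length(); i++) {
            sb.append(residues.charAt(i));
            if (modMass[i] != 0.0) {
                sb.append(String.format(Locale.ROOT, "%+.3f", modMass[i]));
            }
        }
        return sb.append('.').append(post).toString();
    }

    public static void main(String[] args) {
        System.out.println(label(Arrays.asList("XXX_P1", "P2"), "XXX")); // 1
        System.out.println(label(Arrays.asList("XXX_P1"), "XXX"));       // -1
        double[] mods = {0, 57.021464, 0, 0, 0};
        System.out.println(flankedPeptide('K', "ACDEK", mods, 'R'));     // K.AC+57.021DEK.R
    }
}
```

Locale.ROOT keeps the decimal separator a dot regardless of the JVM's default locale, which matters for a file that another tool parses.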

When -addFeatures 1 is off the feature columns are zero-filled rather
than dropped — the column count stays stable across runs so any
downstream config that references a column index keeps working.
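
A minimal sketch of the zero-fill behaviour, assuming the additional features are the ten columns from NumMatchedMainIons through StdevRelErrorTop7 and that a missing feature set is represented as null. Both are assumptions for illustration; the class and method names are hypothetical.

```java
public class ZeroFillSketch {
    // Assumed count of additional-feature columns
    // (NumMatchedMainIons .. StdevRelErrorTop7 in the layout above).
    static final int NUM_FEATURES = 10;

    // Returns the tab-separated feature cells. When features is null
    // (standing in for "-addFeatures 1 is off"), every cell is "0",
    // so the row width is the same either way.
    public static String featureCells(double[] features) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < NUM_FEATURES; i++) {
            if (i > 0) sb.append('\t');
            sb.append(features == null ? "0" : Double.toString(features[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(featureCells(null));
        System.out.println(featureCells(new double[NUM_FEATURES]));
    }
}
```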

Wiring: new "pin" entry (index 3) on the existing -outputFormat
EnumParameter, a SearchParams.writePin() getter, and a call site in
MSGFPlus.runMSGFPlus next to the existing writeTsv / writeMzid blocks.
PIN output file path is outputFile.getPath().replaceAll("\\.mzid$", ".pin").
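
The path expression quoted above can be exercised in isolation. The wrapper class and method name below are hypothetical; the replaceAll call is the one stated in the commit message.

```java
import java.io.File;

public class PinPathSketch {
    // Swaps a trailing ".mzid" for ".pin"; the regex is anchored with $,
    // so only a suffix match is rewritten.
    public static String pinPath(File outputFile) {
        return outputFile.getPath().replaceAll("\\.mzid$", ".pin");
    }

    public static void main(String[] args) {
        System.out.println(pinPath(new File("run1.mzid"))); // run1.pin
        System.out.println(pinPath(new File("run1.tsv")));  // run1.tsv (unchanged)
    }
}
```

Note that a path not ending in .mzid passes through this expression unchanged, so a caller may want to check for that case before writing.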

formatPeptideWithMods is duplicated from DirectTSVWriter for now;
extracting a shared PeptideFormatter is a clean follow-up.

Tests: TestDirectPinWriter (4 cases) covers flag acceptance, the
writePin getter, index stability vs the existing entries, and the
header shape via reflection on writeHeader.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Apr 17, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

