feat(mzid): DirectPinWriter + -outputFormat 3 (pin) (Q7) #20
Open
Emits a Percolator .pin file directly from the in-memory result list,
bypassing the separate msgf2pin converter. This closes one of the
long-standing MS²Rescore / Percolator integration gaps for bigbio/msgfplus
users.
Column layout (tab-separated):
SpecId Label ScanNr ExpMass CalcMass
RawScore DeNovoScore lnSpecEValue lnEValue IsotopeError
PepLen dM absdM
Charge{min..max} (one-hot over params' charge range)
NumMatchedMainIons ExplainedIonCurrentRatio NTermIonCurrentRatio
CTermIonCurrentRatio MS2IonCurrent IsolationWindowEfficiency
MeanErrorTop7 StdevErrorTop7 MeanRelErrorTop7 StdevRelErrorTop7
Peptide Proteins
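A minimal sketch of how a header with this layout could be assembled. The class and method names (`PinHeaderSketch`, `buildHeader`, `oneHotCharge`) and the `Charge<z>` naming of the one-hot columns are assumptions for illustration, not the actual `DirectPinWriter` code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names and the "Charge<z>" column naming are assumptions.
final class PinHeaderSketch {

    // Builds the tab-separated header for an inclusive charge range.
    static String buildHeader(int minCharge, int maxCharge) {
        List<String> cols = new ArrayList<>(Arrays.asList(
                "SpecId", "Label", "ScanNr", "ExpMass", "CalcMass",
                "RawScore", "DeNovoScore", "lnSpecEValue", "lnEValue", "IsotopeError",
                "PepLen", "dM", "absdM"));
        // One column per charge state in the configured range.
        for (int z = minCharge; z <= maxCharge; z++) {
            cols.add("Charge" + z);
        }
        cols.addAll(Arrays.asList(
                "NumMatchedMainIons", "ExplainedIonCurrentRatio", "NTermIonCurrentRatio",
                "CTermIonCurrentRatio", "MS2IonCurrent", "IsolationWindowEfficiency",
                "MeanErrorTop7", "StdevErrorTop7", "MeanRelErrorTop7", "StdevRelErrorTop7",
                "Peptide", "Proteins"));
        return String.join("\t", cols);
    }

    // One-hot cells for a PSM's charge; all zeros if the charge is out of range.
    static int[] oneHotCharge(int charge, int minCharge, int maxCharge) {
        int[] cells = new int[maxCharge - minCharge + 1];
        if (charge >= minCharge && charge <= maxCharge) {
            cells[charge - minCharge] = 1;
        }
        return cells;
    }
}
```

With a charge range of 2..4 this sketch produces 28 header columns: 13 fixed, 3 charge, 10 feature, then Peptide and Proteins.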
Label is +1 when at least one target protein matches, -1 when every
match is a decoy (Percolator uses those as the null distribution).
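The label rule can be sketched as below. `PinLabelSketch` and its boolean-per-match input are hypothetical; how the real writer decides whether a protein match is a decoy is not shown in this PR text.

```java
// Hypothetical sketch of the Label rule; not the actual DirectPinWriter code.
final class PinLabelSketch {

    // matchIsDecoy holds one flag per protein match of the PSM.
    // Returns +1 if at least one match is a target, -1 if every match is a decoy.
    // Assumption: a PSM always has at least one match; an empty input yields -1 here.
    static int label(boolean... matchIsDecoy) {
        for (boolean isDecoy : matchIsDecoy) {
            if (!isDecoy) {
                return 1;
            }
        }
        return -1;
    }
}
```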
Peptide is emitted in Percolator's pre.PEPTIDE.post flanking format
with inline mod masses (+/-mass.mmm). Proteins are tab-separated so
Percolator's "read all remaining columns" rule works as-is.
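A rough sketch of the row tail described above, under stated assumptions: the helper names are hypothetical, the three-decimal signed mass format is inferred from the `+/-mass.mmm` notation, and the example peptide string is illustrative only.

```java
import java.util.Locale;

// Hypothetical sketch; helper names and the example strings are assumptions.
final class PinPeptideSketch {

    // Signed mod mass with three decimals, e.g. +57.021 or -17.027.
    static String formatModMass(double massDelta) {
        return String.format(Locale.ROOT, "%+.3f", massDelta);
    }

    // Wraps an already mod-annotated peptide in pre.PEPTIDE.post form.
    static String flank(char pre, String peptideWithMods, char post) {
        return pre + "." + peptideWithMods + "." + post;
    }

    // Peptide followed by one tab-separated cell per protein accession,
    // so the proteins occupy all remaining columns of the row.
    static String rowTail(String flankedPeptide, String... proteins) {
        return flankedPeptide + "\t" + String.join("\t", proteins);
    }
}
```

For example, `rowTail(flank('K', "AC+57.021DEK", 'R'), "P1", "P2")` gives `K.AC+57.021DEK.R`, then `P1` and `P2` as separate tab-delimited cells.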
When -addFeatures is not set to 1, the feature columns are zero-filled
rather than dropped, so the column count stays stable across runs and any
downstream config that references a column index keeps working.
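One way the zero-fill could look, as a sketch only. It assumes the "feature columns" are the ten columns from NumMatchedMainIons through StdevRelErrorTop7, and that missing features are signalled by a `null` array; both are assumptions, as are the names.

```java
// Hypothetical sketch; the feature-column count and the null convention are assumptions.
final class PinFeatureFillSketch {

    // Assumed: NumMatchedMainIons .. StdevRelErrorTop7 in the layout above.
    static final int NUM_FEATURE_COLUMNS = 10;

    // Returns exactly NUM_FEATURE_COLUMNS cells: the computed values when
    // features are available, otherwise zeros so the row width never changes.
    static double[] featureCells(double[] computed) {
        if (computed == null) {
            return new double[NUM_FEATURE_COLUMNS]; // Java zero-initialises arrays
        }
        if (computed.length != NUM_FEATURE_COLUMNS) {
            throw new IllegalArgumentException(
                    "expected " + NUM_FEATURE_COLUMNS + " features, got " + computed.length);
        }
        return computed.clone();
    }
}
```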
Wiring: new "pin" entry (index 3) on the existing -outputFormat
EnumParameter, a SearchParams.writePin() getter, and a call site in
MSGFPlus.runMSGFPlus next to the existing writeTsv / writeMzid blocks.
PIN output file path is outputFile.getPath().replaceAll("\\.mzid$", ".pin").
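A small sketch of the path derivation, wrapping the same `replaceAll` call in a hypothetical helper (`PinPathSketch.toPinPath` is not a name from the PR). The regex is anchored at the end of the string, so a path that does not end in `.mzid` comes back unchanged; whether the real call site guards against that case is not stated here.

```java
// Hypothetical helper around the replaceAll expression quoted above.
final class PinPathSketch {

    // Swaps a trailing ".mzid" for ".pin"; any other path is returned unchanged.
    static String toPinPath(String outputPath) {
        return outputPath.replaceAll("\\.mzid$", ".pin");
    }
}
```

For instance, `toPinPath("out/run1.mzid")` returns `out/run1.pin`, while `toPinPath("run1.tsv")` returns `run1.tsv`.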
formatPeptideWithMods is duplicated from DirectTSVWriter for now;
extracting a shared PeptideFormatter is a clean follow-up.
Tests: TestDirectPinWriter (4 cases) covers flag acceptance, the
writePin getter, index stability vs the existing entries, and the
header shape via reflection on writeHeader.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary

Emits a Percolator `.pin` file directly from the in-memory result list, bypassing the separate `msgf2pin` converter. Closes one of the long-standing MS²Rescore / Percolator integration gaps in the landscape review (Q7 in `.claude/investigations/msgfplus_research_report.md` §0, and §5 Priority 3).

## Invocation

Output path is the same as the mzid but with a `.pin` extension. All other search flags behave identically.

## Column layout

- Label: `+1` when any target protein matches, `-1` when every match is a decoy. Percolator uses `-1` rows as the null distribution, so we emit them rather than drop them.
- Peptide: `pre.PEPTIDE.post` flanking format with inline mod masses (`+nnn.mmm` / `-nnn.mmm`) matching MS-GF+'s TSV output.
- Proteins: tab-separated after `Peptide`, so Percolator's "read all remaining columns" rule works as-is.
- Feature columns are zero-filled when `-addFeatures 1` is off, so the column count stays stable across runs.

## Touched

- `DirectPinWriter.java` (new, ~353 lines): writer + peptide formatter duplicated from `DirectTSVWriter` (extracting a shared `PeptideFormatter` is a clean follow-up; kept verbatim here so review is self-contained).
- `ParamManager.java`: registers `pin` as enum index 3 on `-outputFormat`.
- `SearchParams.java`: new `writePin()` getter.
- `MSGFPlus.java`: wires the call site in `runMSGFPlus` next to the existing `writeTsv` / `writeMzid` blocks.

## Test plan

- `mvn -B verify`: 145 tests pass (was 141 on `dev`; +4 new Q7 cases).
- `TestDirectPinWriter` covers flag acceptance, the `writePin` getter's flip behaviour, index stability vs pre-existing enum entries (`mzid`/`tsv`/`both`/`pin` = 0/1/2/3), and header shape (verifying all Percolator-required column names appear).
- Existing `-outputFormat 0/1/2` users are unaffected: the new enum entry is appended at the end, so no existing index shifts.

## Not in this PR

- (…`Pr(G|P)`/`Pr(G|O)` as separate columns): deliberately deferred; requires decomposing the log-ratio score on the hot path (2-3 day effort with PSM@1%FDR parity risk after the perf stack in PRs #15 "perf(msgf): CSR graph + streaming GF merge + drop Hashtable sync", #16 "perf(scorer): precompute log scores in NewRankScorer", and #18 "feat: -minSpectraPerThread flag, MSGFLogger, and run-manifest sidecar (Q9/Q11/Q4)").
- (…`-Xmx` vs FASTA pre-flight warning): queued behind PR #18's `MSGFLogger`.
- Extracting `formatPeptideWithMods` into a shared `PeptideFormatter` used by both `DirectTSVWriter` and `DirectPinWriter`: clean follow-up.

🤖 Generated with Claude Code