First bigbio release of msgf+ #14

Open
ypriverol wants to merge 45 commits into master from dev

@ypriverol
Member

No description provided.

ypriverol and others added 30 commits April 13, 2026 07:51
…Java 17

- Change skipTests from true to false so mvn test actually runs
- Update maven-compiler-plugin source/target from 1.8 to 17 (matches runtime)
- Add missing compile dependencies: jmzml 1.7.11, fastutil 8.5.12,
  slf4j-api 1.7.36, logback-classic 1.2.12, commons-io 2.15.1
  (master code references these classes but they were not declared)
- @Ignore the TestMzML test, which requires Windows-specific DMS files

Result: 120 tests run, 53 active, 67 skipped, 0 failures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…shold

In MZIdentMLGen.addSpectrumIdentificationResults(), change `break` to
`continue` when a match has DeNovoScore below the minimum threshold.
The `break` was incorrectly stopping emission of all subsequent matches
for that spectrum, silently dropping valid PSMs from the mzid output.

Also add null safety check for spectrum index lookup — if a spectrum
index is not found in the spectrum file, log a warning and skip
instead of throwing a NullPointerException.
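
To make the `break` versus `continue` difference concrete, here is a minimal sketch. The `Match` record and `emit` method are hypothetical stand-ins for illustration, not the actual MZIdentMLGen code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; the real MZIdentMLGen loop is more involved.
final class BreakVsContinueSketch {
    record Match(String peptide, int deNovoScore) {}

    /** Emits every match at or above the threshold, skipping weak ones instead of stopping. */
    static List<String> emit(List<Match> matches, int minDeNovoScore) {
        List<String> out = new ArrayList<>();
        for (Match m : matches) {
            if (m.deNovoScore() < minDeNovoScore) {
                continue; // a `break` here would also drop every later match
            }
            out.add(m.peptide());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Match> matches = List.of(
                new Match("PEPTIDEA", 40),
                new Match("PEPTIDEB", -5),
                new Match("PEPTIDEC", 35));
        System.out.println(emit(matches, 0)); // prints [PEPTIDEA, PEPTIDEC]
    }
}
```

With `break` in place of `continue`, the third match would never be reached, which is the silent PSM loss described above.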

Add TestMZIdentMLGen with two integration tests:
- testMzidScoreCompleteness: runs MSGF+ search, verifies every SII
  has all 4 score CVParams (RawScore, DeNovoScore, SpecEValue, EValue)
- testMzidStructuralValidity: verifies output mzid has required
  mzIdentML structure elements

Closes MSGFPlus#157

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add new -msLevel CLI parameter to filter spectra by MS level.
Accepts single value (e.g., -msLevel 2) or comma-separated range
(e.g., -msLevel 2,3). Default is 2 (MS2 only).
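
The accepted value shapes can be sketched as follows. This is an illustrative parser only, assuming inclusive bounds on both ends; the class and method names are hypothetical and do not reproduce IntRangeParameter.

```java
// Illustrative only: names are hypothetical, not the IntRangeParameter API.
final class MsLevelRangeSketch {
    /** Parses "2" as [2, 2] and "2,3" as [2, 3]; both bounds are inclusive. */
    static int[] parse(String value) {
        String[] parts = value.trim().split(",");
        if (parts.length > 2) {
            throw new IllegalArgumentException("Expected N or N,M but got: " + value);
        }
        int min = Integer.parseInt(parts[0].trim());
        int max = parts.length == 2 ? Integer.parseInt(parts[1].trim()) : min;
        if (min < 1 || max < min) {
            throw new IllegalArgumentException("Invalid MS level range: " + value);
        }
        return new int[] {min, max};
    }

    /** True when msLevel lies inside the inclusive range. */
    static boolean accepts(int[] range, int msLevel) {
        return msLevel >= range[0] && msLevel <= range[1];
    }

    public static void main(String[] args) {
        int[] range = parse("2,3");
        System.out.println(accepts(range, 1)); // false
        System.out.println(accepts(range, 3)); // true
    }
}
```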

Changes:
- ParamManager: add MS_LEVEL enum and registration
- IntRangeParameter: enable single-value parsing, fix typo
- SearchParams: add minMSLevel/maxMSLevel fields
- SpecKey: filter spectra by MS level in getSpecKeyList()
- SpectraAccessor: add setMSLevelRange(), wire to parsers
- MzMLAdapter/MzXMLSpectraMap: fix maxMSLevel to be inclusive
- MSGFPlus/MSGFDB/MSGFDBLib: wire MS level parameters
- pom.xml: remove fastutil shade filter (jmzml 1.7.11 needs full fastutil)

Tests: TestIntRangeParameter (9 tests), TestMSLevelFiltering (6 tests)

Benchmark (TMT 1.1GB, TDA):
  Baseline: 1245s, 6654 PSMs@1%FDR
  -msLevel 2: 957s (-23%), 6936 PSMs@1%FDR (+4.2%)

Closes MSGFPlus#159

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat(MSGFPlus#159): add -msLevel parameter for MS level filtering
fix(MSGFPlus#157): preserve PSM scores when DeNovoScore is below threshold
fix: enable test suite and fix broken build dependencies
Remove standalone scripts, legacy tools, and unused classes that are
not referenced by the core MSGF+ search pipeline, reducing codebase
by ~22,000 lines.

Deleted entire packages:
- ims/ (9 files) — legacy IMS utilities
- ipa/ (5 files) — unused isotope pattern analysis
- msgf2d/ (8 files) — abandoned 2D scoring experiment
- msdictionary/ (7 files) — unused genome dictionary tool
- mstag/ (3 files) — unused sequence tagging
- scripts/ (6 files) — standalone CLI utilities
- msutil/test/ (3 files) — misplaced test classes
- msgf/test/ (2 files) — legacy test stubs
- msgf/analysis/ (1 file) — unused ROC generator

Cleaned mixed packages:
- misc/: removed 59 standalone scripts, kept 5 core utilities
- msgf/: removed 6 unused graph/scoring classes
- msutil/: removed 9 unused filter/annotation classes
- msdbsearch/: removed 4 standalone DB tools
- parser/: removed 9 legacy format parsers (InsPecT, Mascot, etc.)
- ui/: removed 6 legacy entry points (MSGF, MSGFLib, etc.)
- mzid/: removed 1 unused adapter stub
- msscorer/: removed 1 unused stats class
- suffixarray/: removed 1 unused mass array class

Also removed dead test methods and cleaned dangling imports.
Tests: 119 run, 0 failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrite README.md:
- Full parameter reference tables covering all 30+ flags organized
  by category (core search, fragmentation, enzyme, filtering, etc.)
- Quick start examples for basic and TMT searches
- Modification file format documentation with examples
- Build-from-source instructions
- Updated requirements to Java 17+

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add parameter docs to README and CI/CD workflow
Remove dead code: 150 unused classes, -22K lines
Write search results directly to TSV from in-memory objects, bypassing
mzIdentML serialization. Output is column-identical to MzIDToTsv
(verified by diff on test.mgf search). This avoids generating large
.mzid files when only TSV is needed downstream (e.g. OpenMS
MSGFPlusAdapter, Percolator).

- New DirectTSVWriter class with same score/protein/mod logic as
  MZIdentMLGen but streaming tab-delimited output
- New -outputFormat parameter: 0=mzid (default), 1=tsv, 2=both
- Includes fixed + variable mods, MGF Title column, decoy filtering
- Backwards compatible: default remains mzid
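
A rough sketch of the idea, under stated assumptions: the enum mirrors the documented 0/1/2 codes, while the class name, helper methods, and column names are hypothetical and are not taken from DirectTSVWriter.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical sketch; not the DirectTSVWriter implementation.
final class TsvStreamingSketch {
    /** Mirrors the documented codes: 0 = mzid, 1 = tsv, 2 = both. */
    enum OutputFormat {
        MZID, TSV, BOTH;

        static OutputFormat fromCode(int code) {
            switch (code) {
                case 0: return MZID;
                case 1: return TSV;
                case 2: return BOTH;
                default: throw new IllegalArgumentException("Unknown -outputFormat code: " + code);
            }
        }

        boolean writesMzid() { return this != TSV; }
        boolean writesTsv() { return this != MZID; }
    }

    /** Writes one tab-delimited row straight to the writer, without buffering all results. */
    static void writeRow(Writer out, Object... cells) {
        try {
            for (int i = 0; i < cells.length; i++) {
                if (i > 0) out.write('\t');
                out.write(String.valueOf(cells[i]));
            }
            out.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        if (OutputFormat.fromCode(1).writesTsv()) {
            writeRow(out, "SpecID", "Peptide", "SpecEValue"); // hypothetical column names
            writeRow(out, "index=0", "PEPM+15.995TIDE", 1.2e-10);
        }
        System.out.print(out);
    }
}
```

Writing rows as they are produced keeps memory use independent of the number of PSMs, which is the point of skipping the intermediate mzIdentML document.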

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When -addFeatures 1 is used with -outputFormat tsv, the TSV now
includes all PSMFeatureFinder columns needed for Percolator:
ExplainedIonCurrentRatio, NTermIonCurrentRatio, CTermIonCurrentRatio,
MS2IonCurrent, MS1IonCurrent, IsolationWindowEfficiency,
NumMatchedMainIons, and all error statistics (MeanError/StdevError
for All and Top7, both absolute and relative).

These features were previously only available as UserParams in mzid
and were not extracted by OpenMS's addMSGFFeatures() — now they are
directly accessible as TSV columns.

The peptide modification format (M+15.995) is already compatible with
OpenMS MSGFPlusAdapter's modifySequence_() converter which transforms
it to bracket notation M[+15.995] for AASequence.
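
For illustration, the notation change can be expressed as a single regex rewrite. This is a simplified sketch in Java, not the OpenMS modifySequence_() code, and it assumes only residue-attached mass shifts (terminal modifications written before the first residue are not handled).

```java
import java.util.regex.Pattern;

// Simplified illustration; OpenMS's actual converter is C++ and handles more cases.
final class ModNotationSketch {
    // A residue letter followed by a signed decimal mass shift, e.g. M+15.995.
    private static final Pattern RESIDUE_MOD =
            Pattern.compile("([A-Z])([+-]\\d+(?:\\.\\d+)?)");

    /** Rewrites M+15.995 as M[+15.995], leaving unmodified residues untouched. */
    static String toBracket(String peptide) {
        return RESIDUE_MOD.matcher(peptide).replaceAll("$1[$2]");
    }

    public static void main(String[] args) {
        System.out.println(toBracket("PEPM+15.995TIDE")); // PEPM[+15.995]TIDE
    }
}
```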

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the jmzml JAXB-based MzMLUnmarshaller with a lightweight StAX
streaming parser that extracts only the 11 fields MSGF+ needs. The new
parser builds a spectrum index in a single pass, then preloads all
spectra into memory on first random access, eliminating repeated XML
parsing during the search phase.
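
As a minimal sketch of the streaming approach, the following reads only the spectrum id and the MS level cvParam (accession MS:1000511) and ignores everything else. The class and method names are hypothetical; the real StaxMzMLParser extracts more fields and builds an index.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch of a StAX pass; not the StaxMzMLParser implementation.
final class StaxMsLevelSketch {
    /** Returns spectrum id -> MS level, reading the document in one forward pass. */
    static Map<String, Integer> msLevels(String xml) {
        Map<String, Integer> levels = new LinkedHashMap<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            String currentId = null; // id of the <spectrum> we are inside, if any
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if ("spectrum".equals(name)) {
                        currentId = r.getAttributeValue(null, "id");
                    } else if ("cvParam".equals(name) && currentId != null
                            && "MS:1000511".equals(r.getAttributeValue(null, "accession"))) {
                        levels.put(currentId, Integer.parseInt(r.getAttributeValue(null, "value")));
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "spectrum".equals(r.getLocalName())) {
                    currentId = null;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
        return levels;
    }

    public static void main(String[] args) {
        String xml = "<spectrumList>"
                + "<spectrum id=\"scan=1\"><cvParam accession=\"MS:1000511\" value=\"1\"/></spectrum>"
                + "<spectrum id=\"scan=2\"><cvParam accession=\"MS:1000511\" value=\"2\"/></spectrum>"
                + "</spectrumList>";
        System.out.println(msLevels(xml)); // {scan=1=1, scan=2=2}
    }
}
```

Unlike a JAXB unmarshaller, nothing here builds an object tree for elements the search does not need.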

Benchmark (TMT 1.1GB mzML, target-decoy, 4 threads):
- Wall time: 957s -> 853s (-10.9%)
- PSMs at 1% FDR: 6,936 (unchanged)
- Score completeness: 100% (unchanged)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…port

- Remove jmzml (JAXB-based mzML parser) dependency from pom.xml
- Delete old jmzml-dependent classes: MzMLAdapter, MzMLSpectraMap,
  MzMLSpectraIterator, SpectrumConverter
- Add referenceableParamGroupRef resolution to StaxMzMLParser: builds
  a map of param groups during index pass, resolves refs during
  spectrum parsing (critical for files that define polarity, MS level,
  etc. in referenceable groups)
- Move turnOffLogs() utility to StaxMzMLParser, update all callers
- Keep fastutil dependency (needed by jmzidml at runtime)
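
The param-group resolution step can be pictured as a map lookup followed by a merge. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, params are reduced to accession-to-value pairs, and letting inline params override group params is a choice made here rather than something the commit states.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; StaxMzMLParser's internal data structures may differ.
final class ParamGroupResolutionSketch {
    /**
     * Merges the params of every referenced group, then applies the spectrum's
     * inline params on top (override order is an assumption of this sketch).
     */
    static Map<String, String> resolve(Map<String, Map<String, String>> groups,
                                       List<String> refs,
                                       Map<String, String> inline) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (String ref : refs) {
            Map<String, String> group = groups.get(ref);
            if (group == null) {
                throw new IllegalArgumentException("Unresolved referenceableParamGroupRef: " + ref);
            }
            resolved.putAll(group);
        }
        resolved.putAll(inline);
        return resolved;
    }

    public static void main(String[] args) {
        // MS:1000511 = ms level, MS:1000130 = positive scan
        Map<String, Map<String, String>> groups =
                Map.of("commonParams", Map.of("MS:1000511", "2"));
        System.out.println(resolve(groups, List.of("commonParams"), Map.of("MS:1000130", "")));
    }
}
```

Without this step, a spectrum whose MS level lives only in a referenced group would appear to have no MS level at all.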

JAR size reduced from 39.5MB to 38MB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jmzReader (uk.ac.ebi.pride.tools:jmzreader:2.0.6) had zero imports
anywhere in the codebase — a dead dependency from earlier development.
All spectrum file format parsing uses custom implementations:
mzML (StaxMzMLParser), mzXML (embedded jrap/stax), MGF/MS2/PKL
(custom parsers).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove mzXML file format support entirely:
- Delete embedded jrap/stax library (20 files, ~5,800 lines)
- Delete MzXMLSpectraMap, MzXMLSpectraIterator, MzXMLToMgfConverter
- Delete MzXMLToMgf utility and mzXML test resources (38MB)
- Remove MZXML from SpecFileFormat enum, SpectraAccessor, ParamManager
- Update misc/scripts/ui classes to remove mzXML code paths

mzXML is a legacy format superseded by mzML. Users with mzXML files
can convert to mzML using msconvert (ProteoWizard).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- StaxMzMLParser: use ConcurrentHashMap for thread-safe spectrum cache,
  fix class-level doc (preload-all, not bounded LRU), check index before
  preloading, propagate exceptions instead of returning null
- StaxMzMLSpectraIterator: throw NoSuchElementException when exhausted
- SpectraAccessor: throw exception instead of System.exit(-1),
  validate specFormat is non-null in constructor
- SelectSpectra: update stale .mzXML reference to .mzML
- pom.xml: fix duplicate <manifest>, remove stale comments, note
  fastutil is required by jmzidentml at runtime
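
A small sketch of two of these patterns together: a `ConcurrentHashMap`-backed cache and failing loudly instead of returning null. It is a per-key lazy variant written for illustration, not the preload-all behaviour described for StaxMzMLParser, and every name in it is hypothetical.

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; the real cache preloads all spectra on first random access.
final class SpectrumCacheSketch {
    private final Map<Integer, String> source; // stand-in for the indexed file
    private final ConcurrentHashMap<Integer, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();

    SpectrumCacheSketch(Map<Integer, String> source) {
        this.source = source;
    }

    /** Checks the index first, then loads at most once per key, safely across threads. */
    String get(int index) {
        if (!source.containsKey(index)) {
            throw new NoSuchElementException("No spectrum at index " + index);
        }
        return cache.computeIfAbsent(index, i -> {
            loads.incrementAndGet();
            return source.get(i);
        });
    }

    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        SpectrumCacheSketch cache = new SpectrumCacheSketch(Map.of(1, "spectrum-1"));
        cache.get(1);
        cache.get(1);
        System.out.println(cache.loadCount()); // 1
    }
}
```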

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace ConvertToMgf-based tests (class removed in PR #7) with
StaxMzMLParser and SpectraAccessor mzML parsing tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: native TSV output — bypass mzIdentML for OpenMS/Percolator pipelines
perf: replace jmzml JAXB parser with StAX-based mzML reader
* chore: add CI/release packaging and benchmark scaffolding

Split infra and repository maintenance updates into a dedicated reviewable change set, including workflow automation, Docker packaging, benchmark scripts/docs, and project documentation updates. Exclude large local benchmark artifacts and keep this PR focused on non-hot-path code organization and release hygiene.

Made-with: Cursor

* chore: keep benchmark folder local-only

Remove benchmark scripts/docs from this branch and ignore the entire benchmark directory so local benchmarking assets do not appear in review PRs.

Made-with: Cursor

* docs: keep single canonical primitives plan

Fold memory-reduction guidance into the balanced primitives plan and remove the old duplicate plan file so review and maintenance use one canonical document.

Made-with: Cursor

* chore: narrow PR1 plans to scope-only docs

Remove unrelated strategy and optimization plan documents from PR1 so this branch stays focused on infra/packaging cleanup. Keep only the plans index file in this PR.

Made-with: Cursor

* chore: remove legacy ZippedReleases folder

Delete the obsolete Windows release helper scripts and reference files under ZippedReleases from the repository.

Made-with: Cursor

* chore: remove legacy extlib dependency jar

Delete the obsolete jrap/stax legacy jar under extlib as part of repository cleanup.

Made-with: Cursor

* fix: address copilot review feedback for PR11

Align docs with actual supported legacy formats, update release pipeline to build from tag version with tests, and fix Docker build JDK requirement.

Made-with: Cursor

* chore: minor packaging/docs hygiene for PR1

Normalize ignore files, shrink Docker build context, align agent README with dev/CI, and clarify release workflow step naming.

Made-with: Cursor

* docs: trim examples folder to small referenced artifacts

Remove duplicate Tryp_Pig_Bov DB/index copies (tests use src/test/resources),
drop large unlinked Excel/PNG teaching files, and add docs/examples/README.md
so the directory purpose is obvious. Link the index from the main README.

Made-with: Cursor

* chore: remove IntelliJ IDEA tips screenshots from docs

Made-with: Cursor

* docs: replace legacy HTML manuals with Markdown

Convert docs/*.html to GitHub-flavored Markdown (pandoc), fix internal links,
add docs/README.md as the documentation index, and remove unused style.css.

Made-with: Cursor

* docs: strip leftover HTML span wrappers from converted Markdown

Made-with: Cursor
Add a workflow-dispatch benchmark pipeline on a fixed self-hosted runner profile, with public-data download, metrics emission, and baseline TSV comparison under benchmark/ci/PXD001819 for future dataset expansion.

Made-with: Cursor
Use uppercase PXD001819 naming in workflow-visible labels/artifacts and update README to state mzXML is not available in this fork.

Made-with: Cursor
- run_ci.sh: count only opening <SpectrumIdentificationItem> tags for
  sii_count (prior substring match double-counted closing tags and
  picked up SpectrumIdentificationItemRef)
- run_ci.sh: always emit peak_rss_kb and cpu_percent (NA when GNU time
  does not expose them) so metrics file format is consistent
- compare_metrics.py: support an `optional` column; optional missing/NA
  metrics warn instead of failing CI
- baseline.tsv: add optional column, mark peak_rss_kb optional, fix
  ubuntu-latest note to reference the self-hosted runner, widen
  sii_count floor to match the de-duplicated count
- README pointers: update stale references to a non-existent
  benchmark/run_pxd001819_benchmark.sh script
- benchmark/README.md: describe the actual committed CI scaffold
  instead of an uncommitted local harness layout
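
The double-counting problem in the first bullet is easy to reproduce. The sketch below is a Java illustration of the counting logic only; the actual fix lives in the run_ci.sh shell script, and the class and method names here are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration of the counting pitfall; not a port of run_ci.sh.
final class SiiCountSketch {
    // Opening tag only: the name must be followed by whitespace, '>' or '/',
    // which excludes </SpectrumIdentificationItem> and SpectrumIdentificationItemRef.
    private static final Pattern OPENING_TAG =
            Pattern.compile("<SpectrumIdentificationItem(?=[\\s>/])");

    static int countOpeningTags(String xml) {
        Matcher m = OPENING_TAG.matcher(xml);
        int count = 0;
        while (m.find()) count++;
        return count;
    }

    /** The naive approach: counts every occurrence of the bare element name. */
    static int countSubstring(String xml) {
        String needle = "SpectrumIdentificationItem";
        int count = 0;
        for (int i = xml.indexOf(needle); i >= 0; i = xml.indexOf(needle, i + needle.length())) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String xml = "<SpectrumIdentificationItem id=\"a\"></SpectrumIdentificationItem>"
                + "<SpectrumIdentificationItemRef ref=\"a\"/>"
                + "<SpectrumIdentificationItem id=\"b\"/>";
        System.out.println(countSubstring(xml));   // 4
        System.out.println(countOpeningTags(xml)); // 2
    }
}
```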

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- extract_metrics.py: stream-parse mzIdentML with ElementTree.iterparse
  so SII counting and PSM 1% FDR counting no longer rely on line-shaped
  regex matches over XML
- run_ci.sh: use a bash array for SEARCH_ARGS (safe against future
  flags with spaces), atomic .part downloads, validate cached gzip,
  default MSGFPLUS_THREADS to 8 to match the workflow, drop the
  always-zero java_exit metric, and emit integer wall_time_sec
- workflow: pin Python via actions/setup-python@v5 so self-hosted
  runners have a known 3.11 interpreter for the helper scripts
- compare_metrics.py: add test_compare_metrics.py covering
  in-range pass, out-of-range fail, missing required/optional,
  NA, non-numeric, and empty-range rows (7 tests, all passing)
- .gitignore: drop redundant benchmark/** patterns (already covered
  by benchmark/* + ci/ allowlist); add __pycache__/ and *.pyc
- docs: describe new helper and test scripts in both READMEs
fix(benchmark): harden PXD001819 scaffold per review feedback
@coderabbitai

coderabbitai bot commented Apr 16, 2026

Important

Review skipped

Too many files!

This PR contains 282 files, which is 132 over the limit of 150.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1977927b-c8cb-498a-9833-14e5a459cdb3

📥 Commits

Reviewing files that changed from the base of the PR and between f0bc79e and 1bd9ff2.

⛔ Files ignored due to path filters (18)
  • benchmark/ci/PXD001819/baseline.tsv is excluded by !**/*.tsv
  • docs/IntelliJ_IDEA_Tips/InvalidateCaches.png is excluded by !**/*.png
  • docs/IntelliJ_IDEA_Tips/UnitTestDebugging1.png is excluded by !**/*.png
  • docs/IntelliJ_IDEA_Tips/UnitTestDebugging2_TestClass.png is excluded by !**/*.png
  • docs/IntelliJ_IDEA_Tips/UnitTestDebugging2_TestMethod.png is excluded by !**/*.png
  • docs/IntelliJ_IDEA_Tips/UnitTestDebugging3.png is excluded by !**/*.png
  • docs/IntelliJ_IDEA_Tips/UnitTestDebugging4.png is excluded by !**/*.png
  • docs/IntelliJ_IDEA_Tips/UnitTestDebugging5.png is excluded by !**/*.png
  • docs/IntelliJ_IDEA_Tips/UnitTestDebugging6.png is excluded by !**/*.png
  • docs/examples/Find_best_peptide_for_each_scan.xlsx is excluded by !**/*.xlsx
  • docs/examples/IsotopeRangeExample1.xlsx is excluded by !**/*.xlsx
  • docs/examples/IsotopeRangeExample2.xlsx is excluded by !**/*.xlsx
  • docs/examples/MassErrorHistogram_QC_Mam_19_01-run3_03Jan20.png is excluded by !**/*.png
  • docs/examples/MassErrorHistogram_QC_Mam_19_01-run3_03Jan20.xlsx is excluded by !**/*.xlsx
  • docs/examples/PepQValue_Computation_Mockup.xlsx is excluded by !**/*.xlsx
  • docs/examples/QValue_Computation.xlsx is excluded by !**/*.xlsx
  • docs/examples/QValue_Computation_Excerpt.xlsx is excluded by !**/*.xlsx
  • extlib/jrap_StAX_v5.2.jar is excluded by !**/*.jar
📒 Files selected for processing (282)
  • .claude/CLAUDE.md
  • .claude/investigations/001-mgf-scan-number-extraction-failure.md
  • .claude/investigations/002-evalue-target-decoy-leakage-to-percolator.md
  • .claude/investigations/README.md
  • .claude/plans/README.md
  • .claude/skills/README.md
  • .claude/skills/score-output-safety.md
  • .dockerignore
  • .github/workflows/benchmark-pxd001819.yml
  • .github/workflows/ci.yml
  • .github/workflows/release.yml
  • .gitignore
  • Dockerfile
  • README.md
  • ZippedReleases/Distribute_Program.bat
  • ZippedReleases/Distribute_Program_Use_Proto-2.bat
  • ZippedReleases/ReferenceFiles/BuildSuffixArray.bat
  • ZippedReleases/ReferenceFiles/Convert_mzid_to_tsv.bat
  • ZippedReleases/ReferenceFiles/HowTo_Run_MSGFPlus.txt
  • ZippedReleases/ReferenceFiles/MSGFPlus_Mods1.txt
  • ZippedReleases/ReferenceFiles/MSGFPlus_Mods2.txt
  • ZippedReleases/ReferenceFiles/MSGFPlus_Mods3.txt
  • ZippedReleases/ReferenceFiles/Start_MSGFPlus.bat
  • ZippedReleases/ReferenceFiles/Syntax.txt
  • ZippedReleases/ZipFilesForRelease.bat
  • benchmark/README.md
  • benchmark/ci/PXD001819/compare_metrics.py
  • benchmark/ci/PXD001819/extract_metrics.py
  • benchmark/ci/PXD001819/run_ci.sh
  • benchmark/ci/PXD001819/test_compare_metrics.py
  • benchmark/ci/README.md
  • docs/BuildSA.html
  • docs/BuildSA.md
  • docs/Changelog.html
  • docs/Changelog.md
  • docs/IsobaricLabeling.md
  • docs/MS-GFDB.html
  • docs/MS-GFDB.md
  • docs/MSGFDB_ModFile.html
  • docs/MSGFDB_ModFile.md
  • docs/MSGFPlus.html
  • docs/MSGFPlus.md
  • docs/MzidToTsv.html
  • docs/MzidToTsv.md
  • docs/README.md
  • docs/ScoringParamGen.html
  • docs/ScoringParamGen.md
  • docs/Troubleshooting.md
  • docs/examples/IndexFasta.bat
  • docs/examples/README.md
  • docs/examples/Tryp_Pig_Bov.canno
  • docs/examples/Tryp_Pig_Bov.cnlcp
  • docs/examples/Tryp_Pig_Bov.csarr
  • docs/examples/Tryp_Pig_Bov.cseq
  • docs/examples/Tryp_Pig_Bov.fasta
  • docs/index.html
  • docs/style.css
  • pom.xml
  • src/main/java/edu/ucsd/msjava/ims/DtaToMSGFInput.java
  • src/main/java/edu/ucsd/msjava/ims/DtaToMSGFInputDB.java
  • src/main/java/edu/ucsd/msjava/ims/GetTheBestPerPeptide.java
  • src/main/java/edu/ucsd/msjava/ims/GetTheBestPerScan.java
  • src/main/java/edu/ucsd/msjava/ims/MaskSpectra.java
  • src/main/java/edu/ucsd/msjava/ims/Misc.java
  • src/main/java/edu/ucsd/msjava/ims/OptimizeCE.java
  • src/main/java/edu/ucsd/msjava/ims/SplitDta.java
  • src/main/java/edu/ucsd/msjava/ims/Summarize.java
  • src/main/java/edu/ucsd/msjava/ipa/Feature.java
  • src/main/java/edu/ucsd/msjava/ipa/IPA.java
  • src/main/java/edu/ucsd/msjava/ipa/MS1SpectraMap.java
  • src/main/java/edu/ucsd/msjava/ipa/MSGFPlusResultSet.java
  • src/main/java/edu/ucsd/msjava/ipa/PSM.java
  • src/main/java/edu/ucsd/msjava/misc/AgilentQTOF.java
  • src/main/java/edu/ucsd/msjava/misc/AnnotatedMgfToMSGFInput.java
  • src/main/java/edu/ucsd/msjava/misc/AnnotatedSpecGenerator.java
  • src/main/java/edu/ucsd/msjava/misc/CIDETDPairs.java
  • src/main/java/edu/ucsd/msjava/misc/CalcFastaDBSize.java
  • src/main/java/edu/ucsd/msjava/misc/ChargePrediction.java
  • src/main/java/edu/ucsd/msjava/misc/Chores.java
  • src/main/java/edu/ucsd/msjava/misc/Clauser.java
  • src/main/java/edu/ucsd/msjava/misc/CompGraphPaper.java
  • src/main/java/edu/ucsd/msjava/misc/CompactSATest.java
  • src/main/java/edu/ucsd/msjava/misc/CompareSearchResults.java
  • src/main/java/edu/ucsd/msjava/misc/CompositionFirst.java
  • src/main/java/edu/ucsd/msjava/misc/ControlNew.java
  • src/main/java/edu/ucsd/msjava/misc/ConvertToMgf.java
  • src/main/java/edu/ucsd/msjava/misc/CountID.java
  • src/main/java/edu/ucsd/msjava/misc/CountPSMs.java
  • src/main/java/edu/ucsd/msjava/misc/CountSequestIDs.java
  • src/main/java/edu/ucsd/msjava/misc/DatToTxt.java
  • src/main/java/edu/ucsd/msjava/misc/Deconvolution.java
  • src/main/java/edu/ucsd/msjava/misc/FileFilter.java
  • src/main/java/edu/ucsd/msjava/misc/FilteringEfficiency.java
  • src/main/java/edu/ucsd/msjava/misc/FindPSMIntersection.java
  • src/main/java/edu/ucsd/msjava/misc/GetProteinLength.java
  • src/main/java/edu/ucsd/msjava/misc/GetSearchParams.java
  • src/main/java/edu/ucsd/msjava/misc/HCDCIDETD.java
  • src/main/java/edu/ucsd/msjava/misc/HeckPercolator.java
  • src/main/java/edu/ucsd/msjava/misc/HeckRevision.java
  • src/main/java/edu/ucsd/msjava/misc/HeckWhole.java
  • src/main/java/edu/ucsd/msjava/misc/IPA.java
  • src/main/java/edu/ucsd/msjava/misc/IPRGStudy.java
  • src/main/java/edu/ucsd/msjava/misc/ISBETDAnalysis.java
  • src/main/java/edu/ucsd/msjava/misc/LibraryScripts.java
  • src/main/java/edu/ucsd/msjava/misc/MS2ToMgf.java
  • src/main/java/edu/ucsd/msjava/misc/MSGFDBToInspect.java
  • src/main/java/edu/ucsd/msjava/misc/MSGFDBToQSpec.java
  • src/main/java/edu/ucsd/msjava/misc/MSGFLogger.java
  • src/main/java/edu/ucsd/msjava/misc/MSGFPlusPaper.java
  • src/main/java/edu/ucsd/msjava/misc/MakePrefixDB.java
  • src/main/java/edu/ucsd/msjava/misc/MassCalc.java
  • src/main/java/edu/ucsd/msjava/misc/MergeTargetDecoyFiles.java
  • src/main/java/edu/ucsd/msjava/misc/MiscScripts.java
  • src/main/java/edu/ucsd/msjava/misc/MultiThreadExercise.java
  • src/main/java/edu/ucsd/msjava/misc/MzXMLToMgf.java
  • src/main/java/edu/ucsd/msjava/misc/PEMMRProcessor.java
  • src/main/java/edu/ucsd/msjava/misc/ParamToTxt.java
  • src/main/java/edu/ucsd/msjava/misc/PepIdxToFasta.java
  • src/main/java/edu/ucsd/msjava/misc/PhosAnalysis.java
  • src/main/java/edu/ucsd/msjava/misc/PreprocessSpec.java
  • src/main/java/edu/ucsd/msjava/misc/RunMSGFDBOnGrid.java
  • src/main/java/edu/ucsd/msjava/misc/RunManifestWriter.java
  • src/main/java/edu/ucsd/msjava/misc/RunOMSSAOnCCMS.java
  • src/main/java/edu/ucsd/msjava/misc/SpectraSTToMSGFInput.java
  • src/main/java/edu/ucsd/msjava/misc/SpectraSTToTSV.java
  • src/main/java/edu/ucsd/msjava/misc/SplitFasta.java
  • src/main/java/edu/ucsd/msjava/misc/SplitMgf.java
  • src/main/java/edu/ucsd/msjava/misc/SuffixArrayTest.java
  • src/main/java/edu/ucsd/msjava/misc/SwedCAD.java
  • src/main/java/edu/ucsd/msjava/misc/TopDownAnalysis.java
  • src/main/java/edu/ucsd/msjava/misc/TrainScoringParameters.java
  • src/main/java/edu/ucsd/msjava/misc/VennDiagram.java
  • src/main/java/edu/ucsd/msjava/misc/Zubarev.java
  • src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java
  • src/main/java/edu/ucsd/msjava/msdbsearch/FilterDatabase.java
  • src/main/java/edu/ucsd/msjava/msdbsearch/MakePairedSpectra.java
  • src/main/java/edu/ucsd/msjava/msdbsearch/ReverseLibDB.java
  • src/main/java/edu/ucsd/msjava/msdbsearch/SearchParams.java
  • src/main/java/edu/ucsd/msjava/msdbsearch/ShuffleDB.java
  • src/main/java/edu/ucsd/msjava/msdictionary/Codon.java
  • src/main/java/edu/ucsd/msjava/msdictionary/GenomeLocator.java
  • src/main/java/edu/ucsd/msjava/msdictionary/GenomeSplitter.java
  • src/main/java/edu/ucsd/msjava/msdictionary/GenomeTranslator.java
  • src/main/java/edu/ucsd/msjava/msdictionary/MSDicLauncher.java
  • src/main/java/edu/ucsd/msjava/msdictionary/ProteinLocator.java
  • src/main/java/edu/ucsd/msjava/msdictionary/TestMSDictionary.java
  • src/main/java/edu/ucsd/msjava/msgf/AminoAcidGraph.java
  • src/main/java/edu/ucsd/msjava/msgf/DeNovoSequencer.java
  • src/main/java/edu/ucsd/msjava/msgf/GenericDeNovoGraph.java
  • src/main/java/edu/ucsd/msjava/msgf/LengthPredictor.java
  • src/main/java/edu/ucsd/msjava/msgf/PercolatorAdapter.java
  • src/main/java/edu/ucsd/msjava/msgf/PrimitiveAminoAcidGraph.java
  • src/main/java/edu/ucsd/msjava/msgf/PrimitiveGeneratingFunction.java
  • src/main/java/edu/ucsd/msjava/msgf/PrimitiveGeneratingFunctionGroup.java
  • src/main/java/edu/ucsd/msjava/msgf/ReachableNode.java
  • src/main/java/edu/ucsd/msjava/msgf/ScoredSpectrumSumHP.java
  • src/main/java/edu/ucsd/msjava/msgf/SpectrumGraphNode.java
  • src/main/java/edu/ucsd/msjava/msgf/analysis/ROCGenerator.java
  • src/main/java/edu/ucsd/msjava/msgf/test/MSGFTest.java
  • src/main/java/edu/ucsd/msjava/msgf/test/MSGFValidation.java
  • src/main/java/edu/ucsd/msjava/msgf2d/BacktrackPointer2D.java
  • src/main/java/edu/ucsd/msjava/msgf2d/BacktrackTable2D.java
  • src/main/java/edu/ucsd/msjava/msgf2d/CombinePairedSpectra.java
  • src/main/java/edu/ucsd/msjava/msgf2d/GeneratingFunction2D.java
  • src/main/java/edu/ucsd/msjava/msgf2d/ScoreBound2D.java
  • src/main/java/edu/ucsd/msjava/msgf2d/ScoreDist2D.java
  • src/main/java/edu/ucsd/msjava/msgf2d/ScoreDistMerged.java
  • src/main/java/edu/ucsd/msjava/msgf2d/TestMSGF2D.java
  • src/main/java/edu/ucsd/msjava/msscorer/ListStat.java
  • src/main/java/edu/ucsd/msjava/msscorer/NewRankScorer.java
  • src/main/java/edu/ucsd/msjava/msscorer/NewScorerFactory.java
  • src/main/java/edu/ucsd/msjava/msscorer/ScoringParameterGenerator.java
  • src/main/java/edu/ucsd/msjava/msscorer/ScoringParameterGeneratorWithErrors.java
  • src/main/java/edu/ucsd/msjava/mstag/Tag.java
  • src/main/java/edu/ucsd/msjava/mstag/TagTester.java
  • src/main/java/edu/ucsd/msjava/mstag/Tagger.java
  • src/main/java/edu/ucsd/msjava/msutil/CompositionAASet.java
  • src/main/java/edu/ucsd/msjava/msutil/GappedPeptide.java
  • src/main/java/edu/ucsd/msjava/msutil/IntMassAASet.java
  • src/main/java/edu/ucsd/msjava/msutil/NominalMassAASet.java
  • src/main/java/edu/ucsd/msjava/msutil/ScoringFunction.java
  • src/main/java/edu/ucsd/msjava/msutil/SpecFileFormat.java
  • src/main/java/edu/ucsd/msjava/msutil/SpecKey.java
  • src/main/java/edu/ucsd/msjava/msutil/SpectraAccessor.java
  • src/main/java/edu/ucsd/msjava/msutil/Spectrum.java
  • src/main/java/edu/ucsd/msjava/msutil/SpectrumAnnotator.java
  • src/main/java/edu/ucsd/msjava/msutil/SpectrumRecalibrator.java
  • src/main/java/edu/ucsd/msjava/msutil/SpectrumTester.java
  • src/main/java/edu/ucsd/msjava/msutil/TopNFilter.java
  • src/main/java/edu/ucsd/msjava/msutil/test/MzXMLSpectraIteratorTest.java
  • src/main/java/edu/ucsd/msjava/msutil/test/MzXMLSpectraMapTest.java
  • src/main/java/edu/ucsd/msjava/msutil/test/SpectraTest.java
  • src/main/java/edu/ucsd/msjava/mzid/DirectTSVWriter.java
  • src/main/java/edu/ucsd/msjava/mzid/MZIdentMLGen.java
  • src/main/java/edu/ucsd/msjava/mzid/MzIDParser.java
  • src/main/java/edu/ucsd/msjava/mzid/MzIdAdapter.java
  • src/main/java/edu/ucsd/msjava/mzml/MzMLAdapter.java
  • src/main/java/edu/ucsd/msjava/mzml/MzMLSpectraIterator.java
  • src/main/java/edu/ucsd/msjava/mzml/MzMLSpectraMap.java
  • src/main/java/edu/ucsd/msjava/mzml/SpectrumConverter.java
  • src/main/java/edu/ucsd/msjava/mzml/StaxMzMLParser.java
  • src/main/java/edu/ucsd/msjava/mzml/StaxMzMLSpectraIterator.java
  • src/main/java/edu/ucsd/msjava/mzml/StaxMzMLSpectraMap.java
  • src/main/java/edu/ucsd/msjava/params/IntRangeParameter.java
  • src/main/java/edu/ucsd/msjava/params/ParamManager.java
  • src/main/java/edu/ucsd/msjava/parser/InsPecTPSM.java
  • src/main/java/edu/ucsd/msjava/parser/InsPecTParser.java
  • src/main/java/edu/ucsd/msjava/parser/MSGFDBParser.java
  • src/main/java/edu/ucsd/msjava/parser/MSGappedDictionaryPSM.java
  • src/main/java/edu/ucsd/msjava/parser/MSGappedDictionaryParser.java
  • src/main/java/edu/ucsd/msjava/parser/MascotParser.java
  • src/main/java/edu/ucsd/msjava/parser/MzXMLSpectraIterator.java
  • src/main/java/edu/ucsd/msjava/parser/MzXMLSpectraMap.java
  • src/main/java/edu/ucsd/msjava/parser/MzXMLToMgfConverter.java
  • src/main/java/edu/ucsd/msjava/parser/OMSSAParser.java
  • src/main/java/edu/ucsd/msjava/parser/PSM.java
  • src/main/java/edu/ucsd/msjava/parser/PSMList.java
  • src/main/java/edu/ucsd/msjava/parser/PepXMLParser.java
  • src/main/java/edu/ucsd/msjava/parser/SearchParameterParser.java
  • src/main/java/edu/ucsd/msjava/parser/SortedSpectraIterator.java
  • src/main/java/edu/ucsd/msjava/scripts/AgilentCyclicSpecPreProcess.java
  • src/main/java/edu/ucsd/msjava/scripts/CountSpectra.java
  • src/main/java/edu/ucsd/msjava/scripts/GetDBInfo.java
  • src/main/java/edu/ucsd/msjava/scripts/MergeSpectra.java
  • src/main/java/edu/ucsd/msjava/scripts/SelectSpectra.java
  • src/main/java/edu/ucsd/msjava/scripts/SpecFileValidator.java
  • src/main/java/edu/ucsd/msjava/suffixarray/MassArray.java
  • src/main/java/edu/ucsd/msjava/ui/MSDictionary.java
  • src/main/java/edu/ucsd/msjava/ui/MSGF.java
  • src/main/java/edu/ucsd/msjava/ui/MSGFDB.java
  • src/main/java/edu/ucsd/msjava/ui/MSGFDBLib.java
  • src/main/java/edu/ucsd/msjava/ui/MSGFLib.java
  • src/main/java/edu/ucsd/msjava/ui/MSGFPlus.java
  • src/main/java/edu/ucsd/msjava/ui/MSProfile.java
  • src/main/java/edu/ucsd/msjava/ui/MzIDToTsv.java
  • src/main/java/edu/ucsd/msjava/ui/PRMSpecGen.java
  • src/main/java/edu/ucsd/msjava/ui/ScoringParamGen.java
  • src/main/java/org/systemsbiology/jrap/stax/Base64.java
  • src/main/java/org/systemsbiology/jrap/stax/ByteBufferIterator.java
  • src/main/java/org/systemsbiology/jrap/stax/DataProcessingInfo.java
  • src/main/java/org/systemsbiology/jrap/stax/EndPatternStringIterator.java
  • src/main/java/org/systemsbiology/jrap/stax/FileHeaderParser.java
  • src/main/java/org/systemsbiology/jrap/stax/IndexParser.java
  • src/main/java/org/systemsbiology/jrap/stax/LineIterator.java
  • src/main/java/org/systemsbiology/jrap/stax/MLScanAndHeaderParser.java
  • src/main/java/org/systemsbiology/jrap/stax/MSInstrumentInfo.java
  • src/main/java/org/systemsbiology/jrap/stax/MSOperator.java
  • src/main/java/org/systemsbiology/jrap/stax/MSXMLParser.java
  • src/main/java/org/systemsbiology/jrap/stax/MSXMLSequentialParser.java
  • src/main/java/org/systemsbiology/jrap/stax/MZXMLFileInfo.java
  • src/main/java/org/systemsbiology/jrap/stax/ParentFile.java
  • src/main/java/org/systemsbiology/jrap/stax/Scan.java
  • src/main/java/org/systemsbiology/jrap/stax/ScanAndHeaderParser.java
  • src/main/java/org/systemsbiology/jrap/stax/ScanHeader.java
  • src/main/java/org/systemsbiology/jrap/stax/SoftwareInfo.java
  • src/main/java/org/systemsbiology/jrap/stax/StringBuilderReader.java
  • src/main/java/org/systemsbiology/jrap/stax/TestParser.java
  • src/test/java/edu/ucsd/msjava/mzid/AnalysisProtocolCollectionGenTest.java
  • src/test/java/ims/IMSMSGFTest.java
  • src/test/java/ims/IMSMiscTest.java
  • src/test/java/ims/IMSResultProcessor.java
  • src/test/java/ims/SarcTest.java
  • src/test/java/ipa/TestIPA.java
  • src/test/java/msgfplus/TestIntRangeParameter.java
  • src/test/java/msgfplus/TestMSGFLogger.java
  • src/test/java/msgfplus/TestMSGFPlus.java
  • src/test/java/msgfplus/TestMSLevelFiltering.java
  • src/test/java/msgfplus/TestMZIdentMLGen.java
  • src/test/java/msgfplus/TestMinSpectraPerThread.java
  • src/test/java/msgfplus/TestMisc.java
  • src/test/java/msgfplus/TestParsers.java
  • src/test/java/msgfplus/TestPrimitiveRegression.java
  • src/test/java/msgfplus/TestRunManifestWriter.java
  • src/test/java/msgfplus/TestScoring.java
  • src/test/java/msgfplus/TestStaxMzMLParser.java
  • src/test/resources/Tryp_Pig_Bov.revCat.canno
  • src/test/resources/Tryp_Pig_Bov.revCat.cnlcp
  • src/test/resources/Tryp_Pig_Bov.revCat.csarr
  • src/test/resources/Tryp_Pig_Bov.revCat.cseq
  • src/test/resources/Tryp_Pig_Bov.revCat.fasta
  • src/test/resources/benchmark/PXD001819/README.md
  • src/test/resources/benchmark/PXD001819/mods.txt

ypriverol and others added 14 commits April 16, 2026 23:21
…ives-gf

Replaces FlexAminoAcidGraph + GeneratingFunction with CSR-based
PrimitiveAminoAcidGraph + flat-array PrimitiveGeneratingFunction in
DBScanner.computeSpecEValue hot path. Ported without the experimental
remainingUses / eager ScoreDist release (proven 5.9% CPU regression
for 3.3% RSS gain).
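
For readers unfamiliar with the layout: a compressed sparse row (CSR) graph keeps every edge in flat arrays and finds a node's edges through per-node offsets. The sketch below is a generic illustration of that layout only. `CsrGraphSketch` and its fields are hypothetical names and do not reproduce the PR's PrimitiveAminoAcidGraph.

```java
import java.util.Arrays;

/** Generic CSR adjacency sketch; hypothetical, not the PR's PrimitiveAminoAcidGraph. */
final class CsrGraphSketch {
    private final int[] rowStart; // rowStart[v]..rowStart[v + 1] bounds node v's edge slice
    private final int[] target;   // edge targets, grouped by source node

    /** Builds the graph from edges given as {from, to} pairs. */
    CsrGraphSketch(int numNodes, int[][] edges) {
        rowStart = new int[numNodes + 1];
        for (int[] e : edges) {
            rowStart[e[0] + 1]++;                 // count the out-degree of each node
        }
        for (int v = 0; v < numNodes; v++) {
            rowStart[v + 1] += rowStart[v];       // prefix sum turns counts into offsets
        }
        target = new int[edges.length];
        int[] next = Arrays.copyOf(rowStart, numNodes); // per-node write cursor
        for (int[] e : edges) {
            target[next[e[0]]++] = e[1];
        }
    }

    int outDegree(int v) {
        return rowStart[v + 1] - rowStart[v];
    }

    int[] successors(int v) {
        return Arrays.copyOfRange(target, rowStart[v], rowStart[v + 1]);
    }
}
```

Two int arrays replace one object per node and per edge, which is the usual reason to pick this layout for a hot path.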

Legacy FlexAminoAcidGraph and GeneratingFunction remain for other
callers. Parity verified by new TestPrimitiveRegression (ported from
feat/primitives-gf) and existing test suite.

Phase 1 of 3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rewrites PrimitiveGeneratingFunctionGroup as a running merger. Each
per-mass-index GF is computed, merged into the aggregate via
addProbDist, and released before the next mass index is built, so peak
memory stays at one graph + one GF regardless of tolerance-window
width. Math is identical because addProbDist with scoreDiff=0 and
aaProb=1f is a linear sum; the running aggregate transparently widens
its bounds if a later GF has a wider score range.

This addresses the Phase-1 Astral regression (127% slower, +7.3% RSS)
caused by concurrent accumulation of distByNode across all mass
indices. TMT parity preserved.
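
The running merge described above can be sketched as a plain element-wise sum that widens its score range on demand. This is a minimal sketch under assumptions: `RunningDistSketch`, `add`, and `probabilityAt` are hypothetical names, and the real addProbDist also takes scoreDiff and aaProb arguments, which are fixed at 0 and 1f here and therefore omitted.

```java
/** Running sum of score distributions; hypothetical stand-in for the PR's aggregate. */
final class RunningDistSketch {
    private int minScore;                  // score represented by prob[0]
    private double[] prob = new double[0]; // empty until the first distribution arrives

    /** Adds a distribution covering scores otherMin .. otherMin + other.length - 1. */
    void add(int otherMin, double[] other) {
        if (other.length == 0) {
            return;
        }
        if (prob.length == 0) {            // first distribution: adopt its bounds
            minScore = otherMin;
            prob = other.clone();
            return;
        }
        int newMin = Math.min(minScore, otherMin);
        int newMaxExclusive = Math.max(minScore + prob.length, otherMin + other.length);
        if (newMin != minScore || newMaxExclusive != minScore + prob.length) {
            // A later distribution has a wider range: grow the aggregate to fit.
            double[] wider = new double[newMaxExclusive - newMin];
            System.arraycopy(prob, 0, wider, minScore - newMin, prob.length);
            prob = wider;
            minScore = newMin;
        }
        for (int i = 0; i < other.length; i++) {
            prob[otherMin - minScore + i] += other[i];
        }
    }

    double probabilityAt(int score) {
        int i = score - minScore;
        return (i >= 0 && i < prob.length) ? prob[i] : 0.0;
    }
}
```

Because the caller can drop its reference to each per-mass-index distribution right after `add` returns, only the aggregate and the distribution currently being built need to be alive at once.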

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds two user-facing pages to close documentation gaps surfaced in the
landscape review:

- Troubleshooting.md: centroiding (MSConvert peakPicking filter), XML
  prolog / encoding errors, FASTA size limits and split-and-merge
  workaround, -Xmx sizing table, -tasks tuning for OOM, thread-cap
  behaviour, and the OpenMS TOPPAS adapter workaround.
- IsobaricLabeling.md: fixed-mod recipes for TMT-6/10/11, TMTpro-16/18,
  iTRAQ-4/8, the correct -protocol flag per label, and a full worked
  Mods.txt example. Addresses the recurring "does MS-GF+ support
  TMT-16plex?" question (issue MSGFPlus#82) which is a docs, not a code, gap.

Also wires the two pages into docs/README.md's table of contents.

The .gitignore update excludes a local-only working research note
(.claude/investigations/msgfplus_research_report.md) from the repo,
consistent with the existing "benchmark harness is local-only"
convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Astral profiling revealed that synchronized Hashtable contention in the
shared scorer tables dominated runtime: Hashtable.get alone took 23%
of CPU and the surrounding monitor machinery (TrySpin, SafeFetch,
ObjectSynchronizer) added another ~20% — 43% of CPU was sync overhead
with 4 worker threads serializing through per-table monitors.

NewRankScorer tables (fragOFFTable, insignificantFragOFFTable,
rankDistTable, ionErrDistTable, noiseErrDistTable, ionExistenceTable)
are populated once in readFromFile and read-only during search, so
plain HashMap is safe. NewScorerFactory.scorerTable is a mutable cache
with possible concurrent writes before warmup, so it uses
ConcurrentHashMap. ScoringParameterGenerator(s) use HashMap too to
match the NewRankScorer field types (build-time, not search path).
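
The split between the two map types can be illustrated as follows. This is a sketch of the pattern only: `ScorerTablesSketch`, its fields, and the String-valued cache are hypothetical and far simpler than the NewRankScorer and NewScorerFactory tables.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration of read-only tables versus a concurrently filled cache. */
final class ScorerTablesSketch {
    // Populated once during loading, before worker threads start; read-only afterwards.
    private final Map<String, float[]> readOnlyTable = new HashMap<>();

    // Mutable cache that several worker threads may fill at the same time.
    private final Map<String, String> scorerCache = new ConcurrentHashMap<>();

    /** Load phase only: must not be called once worker threads are running. */
    void load(String key, float[] values) {
        readOnlyTable.put(key, values);
    }

    /** Unsynchronized read; no monitor is taken. */
    float[] lookup(String key) {
        return readOnlyTable.get(key);
    }

    /** Atomic get-or-create; at most one value is ever stored per key. */
    String scorerFor(String key) {
        return scorerCache.computeIfAbsent(key, k -> "scorer:" + k);
    }
}
```

The plain HashMap is only safe if it is fully populated before the reading threads can see it, for example by loading it before those threads are started.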

Benchmarks (baseline = dev, branch = feat/primitives-optimization):
  PXD001819 LFQ:  176.5s -> 109.4s (-38.0%),  2254 -> 2472 MB (+9.7%)
  TMT:            644.3s -> 265.6s (-58.8%),  2872 -> 3125 MB (+8.8%)
  Astral:        2155.9s -> 723.3s (-66.5%),  6775 -> 7627 MB (+12.6%)

PSM@1%FDR and SII counts match exactly on all three datasets. RSS
grows modestly because worker threads now actually run in parallel
(no more monitor serialization), so 4 concurrent graph/GF states are
alive instead of effectively one.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
perf(msgf): CSR graph + streaming GF merge + drop Hashtable sync
Native Math.log (libmLog) was 5.46% of CPU in the post-PR#15 Astral
profile. The call sites in NewRankScorer.getErrorScore and
getNodeScore / getMissingIonScore compute log(x/y) over frequency
arrays that are immutable after scorer load; the denominator scale
factor min(ionType.getCharge(), numSegments) is also load-time
constant. Cache the resulting float values once at the end of
readFromInputStream and replace the runtime Math.log calls with
direct array indexing.

Scoring results are bit-identical: same expressions, same operand
ordering, same float rounding; the only difference is that the cast
to float happens once per cell at load instead of per call. Both
hot-path methods keep a fallback to the original computation so
legacy callers that populate the tables without going through
readFromInputStream still work.
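
A minimal sketch of the caching pattern, assuming a single pair of frequency arrays: `LogScoreCacheSketch`, `precompute`, and `score` are hypothetical names, and the real tables in NewRankScorer are keyed by partition and ion type rather than a bare index.

```java
/** Hypothetical sketch: cache log(x / y) once at load time, with a compute fallback. */
final class LogScoreCacheSketch {
    private final float[] ionFreq;
    private final float[] noiseFreq;
    private float[] cachedLogRatio; // null until precompute() runs

    LogScoreCacheSketch(float[] ionFreq, float[] noiseFreq) {
        this.ionFreq = ionFreq;
        this.noiseFreq = noiseFreq;
    }

    /** Call once after loading, before any scoring thread starts. */
    void precompute() {
        float[] cache = new float[ionFreq.length];
        for (int i = 0; i < cache.length; i++) {
            cache[i] = compute(i);
        }
        cachedLogRatio = cache;
    }

    /** Hot path: array lookup when the cache exists, original computation otherwise. */
    float score(int i) {
        float[] cache = cachedLogRatio;
        return cache != null ? cache[i] : compute(i);
    }

    // Same expression and same cast as the uncached path, so both paths agree.
    private float compute(int i) {
        return (float) Math.log(ionFreq[i] / noiseFreq[i]);
    }
}
```

Keeping one private `compute` method for both paths is what makes the cached and fallback results agree: the expression is written once.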

Benchmarks on the current machine state (baseline = dev HEAD jar,
same run, same thermal state):

  PXD001819 LFQ:  122.7s -> 110.4s (-10.0%),  2410 ->  2292 MB (-4.9%)
  TMT:            295.7s -> 277.9s ( -6.0%),  2793 ->  2818 MB (+0.9%)
  Astral:        1002.9s -> 883.5s (-12.0%),  7707 ->  7351 MB (-4.6%)

PSM@1%FDR and SII counts match exactly on all three datasets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
perf(scorer): precompute log scores in NewRankScorer
Two small quick-wins from the April 2026 landscape review (research
report Q1 and Q3, msgfplus_research_report.md §0.1):

Q1 - "Skip spectrum since it is not centroided" now tells the user how
to fix the input:
  - profile spectra: "Re-run MSConvert with --filter \"peakPicking true 1-\"
    to centroid the spectra."
  - dense centroided: "Pass -allowDenseCentroidedPeaks 1 if the spectrum
    is already centroided."
This is the most common onboarding failure mode surfaced in issue MSGFPlus#116
and in user support threads; the previous message only said that a
spectrum was skipped, leaving users to guess why.

Q3 - New AnalysisProtocolCollectionGenTest locks in the fix from issue
MSGFPlus#72. If the mzid Enzyme/@missedCleavages attribute ever stops reflecting
the user's -maxMissedCleavages (including the 0 and -1/"no limit"
sentinel values), the test fails. Three cases covered: 2, 0, -1.

No production-code behaviour change for the Q3 path; Q1 is an error
message tweak only. 144 tests pass (was 141).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rser

quantms and other nf-core / SDRF-driven pipelines use
ThermoRawFileParser (TRFP) for raw-to-mzML conversion, not MSConvert.
The previous inline error message mentioned only MSConvert, which is
the wrong tool for a large and growing user base.

- SpecKey.java: per-spectrum hint now covers both paths in one short
  line: "ThermoRawFileParser centroids Thermo MS2 by default;
  MSConvert --filter \"peakPicking true 1-\"".
- Troubleshooting.md: restructured the "Skip spectrum since it is not
  centroided" section into a per-tool fix list (TRFP, MSConvert,
  OpenMS) so users on any pipeline can map the error to their own
  conversion step.

No production behaviour change; strings + docs only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The thread-count cap in MSGFPlus.runMSGFPlus and MSGFDB.runMSGFDB
previously hardcoded "minimum 250 spectra per thread" (ui/MSGFPlus.java,
ui/MSGFDB.java). On many-core hosts running small inputs (e.g. 20 cores,
~1,000 spectra) this capped the search at ~4 threads, surprising users.

Rather than guess a new default, expose the divisor as -minSpectraPerThread
(default 250, min 1). Power users can lower it to raise parallelism on
small inputs; everyone else gets identical behaviour to before.
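
The cap arithmetic can be sketched as below. This assumes the cap is the spectrum count divided by the divisor with integer division, floored at one thread; the actual rounding in MSGFPlus.runMSGFPlus may differ. `ThreadCapSketch` and `effectiveThreads` are hypothetical names.

```java
/** Hypothetical sketch of the thread cap; the real rounding rule may differ. */
final class ThreadCapSketch {
    static int effectiveThreads(int requestedThreads, int numSpectra, int minSpectraPerThread) {
        if (minSpectraPerThread < 1) {
            throw new IllegalArgumentException("minSpectraPerThread must be >= 1");
        }
        // Assumption: integer division, never fewer than one thread.
        int cap = Math.max(1, numSpectra / minSpectraPerThread);
        return Math.min(requestedThreads, cap);
    }
}
```

Under this assumption, 20 requested threads on 1,000 spectra gives 4 threads with the default divisor of 250, and all 20 threads once the divisor is lowered to 50.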

Wired in both MSGFPlus and the deprecated MSGFDB entry points so behaviour
stays consistent. Addresses issue MSGFPlus#52.

Tests: TestMinSpectraPerThread covers default, override, zero-rejection,
and MSGFDB registration. mvn -B verify: 145/145 tests pass, 57 skipped.

Docs: Troubleshooting.md and MSGFPlus.md now show the flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a lightweight static logger at edu.ucsd.msjava.misc.MSGFLogger with
info/debug/warn/error levels. Debug is gated on the existing -verbose 0/1
flag; warn/error go to stderr with [Warning]/[Error] prefixes. No external
dependencies (no slf4j/log4j) to keep the jar small.
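
A dependency-free logger of this shape could look roughly like the sketch below. It is not the MSGFLogger source: `VerboseLoggerSketch` is a hypothetical name, and the use of `String.format` for interpolation and of stdout for info/debug are assumptions.

```java
/** Hypothetical sketch of a static, dependency-free logger gated on a verbose flag. */
final class VerboseLoggerSketch {
    private static volatile boolean verbose = false;

    private VerboseLoggerSketch() {}

    static void setVerbose(boolean value) { verbose = value; }

    static boolean isVerbose() { return verbose; }

    /** Always printed (assumed destination: stdout). */
    static void info(String fmt, Object... args) {
        System.out.println(format(fmt, args));
    }

    /** Printed only when verbose is on. */
    static void debug(String fmt, Object... args) {
        if (verbose) {
            System.out.println(format(fmt, args));
        }
    }

    static void warn(String fmt, Object... args) {
        System.err.println("[Warning] " + format(fmt, args));
    }

    static void error(String fmt, Object... args) {
        System.err.println("[Error] " + format(fmt, args));
    }

    // With no arguments the message is passed through untouched, so a literal '%' is safe.
    private static String format(String fmt, Object... args) {
        return args.length == 0 ? fmt : String.format(fmt, args);
    }
}
```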

Wires MSGFPlus.main() to call MSGFLogger.setVerbose(...) once after
parseParams, so the whole run inherits the CLI setting. Migrates the
top-level main() and the runMSGFPlus(ParamManager) dispatch loop:

  - Error paths: System.err.println("[Error] ...") -> MSGFLogger.error(...)
  - "Processing N spectra" (summary)          -> info
  - Per-file enumeration                      -> debug
  - Per-file "Processing"/"Ignoring" banner   -> info
  - Per-file "Writing results to"/"Output... exists" detail -> debug
  - "MS-GF+ complete" footer                  -> info
  - Decoy-ratio mismatch errors               -> MSGFLogger.error

Default behaviour (-verbose 0) is unchanged for all non-debug messages.
Running with -verbose 1 now exposes the per-file enumeration and the
per-file output/ignore details.

Intentionally narrow scope: the other ~260 System.out.println call sites
across DBScanner, ConcurrentMSGFPlus, BuildSA, etc. are unchanged. This
PR establishes the logger and wiring; case-by-case migration of those
sites can follow as they are touched.

Tests: TestMSGFLogger (7 tests) covers info-always, debug-gating, warn/
error stderr routing, format interpolation, and the isVerbose getter.
mvn -B verify: 152/152 tests pass, 57 skipped (same as before).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Writes <output.mzid>.manifest.json next to each mzIdentML output
capturing the run context: MS-GF+ version and timestamp; Java version,
vendor, OS; max heap and thread count; enzyme / instrument /
activation / protocol; precursor tolerance, isotope-error range,
length and charge bounds, missed-cleavage cap; spec-file and
FASTA-file absolute paths with byte sizes; and the original CLI argv
verbatim. Downstream pipelines (quantms, Galaxy-P, custom
reanalysis scripts) can then verify or reproduce a search without
re-parsing logs.

Called from MSGFPlus.runMSGFPlus after each successful per-file
search. Failures to write are MSGFLogger.warn()-logged and never
abort the search — manifests are advisory metadata, not output.

JSON is hand-rolled (stable key order, UTF-8, 2-space indent) so no
new dependency is pulled into the shaded jar.
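
A hand-rolled writer with stable key order needs little more than an insertion-ordered map and string escaping. The sketch below is illustrative only: `ManifestJsonSketch`, its flat string-to-string model, and any key names passed to it are hypothetical, not the manifest's actual schema.

```java
import java.util.Map;

/** Hypothetical sketch of a flat JSON object writer with 2-space indent. */
final class ManifestJsonSketch {
    /** Keys are emitted in the map's iteration order, so pass a LinkedHashMap. */
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{\n");
        int written = 0;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("  ").append(quote(e.getKey())).append(": ").append(quote(e.getValue()));
            sb.append(++written < fields.size() ? ",\n" : "\n");
        }
        return sb.append("}").toString();
    }

    /** Escapes quotes, backslashes, and control characters; null becomes JSON null. */
    static String quote(String s) {
        if (s == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```

Writing the returned string with an explicit UTF-8 charset, rather than the platform default, keeps the file encoding stable across hosts.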

Tests: TestRunManifestWriter covers required identity fields, echoed
SearchParams values, argv preservation, null-argv tolerance, and
end-to-end sidecar write/read.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
chore(reliability): actionable centroiding error + missedCleavages test
feat: -minSpectraPerThread flag, MSGFLogger, and run-manifest sidecar (Q9/Q11/Q4)

3 participants