First bigbio release of msgf+ by ypriverol · Pull Request #14 · bigbio/msgfplus

ypriverol · 2026-04-16T21:28:05Z

No description provided.

@ignore

…Java 17
- Change skipTests from true to false so mvn test actually runs
- Update maven-compiler-plugin source/target from 1.8 to 17 (matches runtime)
- Add missing compile dependencies: jmzml 1.7.11, fastutil 8.5.12, slf4j-api 1.7.36, logback-classic 1.2.12, commons-io 2.15.1 (master code references these classes but they were not declared)
- @ignore TestMzML test that requires Windows-specific DMS files
Result: 120 tests run, 53 active, 67 skipped, 0 failures
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…shold In MZIdentMLGen.addSpectrumIdentificationResults(), change `break` to `continue` when a match has DeNovoScore below the minimum threshold. The `break` was incorrectly stopping emission of all subsequent matches for that spectrum, silently dropping valid PSMs from the mzid output.
Also add null safety check for spectrum index lookup: if a spectrum index is not found in the spectrum file, log a warning and skip instead of throwing a NullPointerException.
Add TestMZIdentMLGen with two integration tests:
- testMzidScoreCompleteness: runs MSGF+ search, verifies every SII has all 4 score CVParams (RawScore, DeNovoScore, SpecEValue, EValue)
- testMzidStructuralValidity: verifies output mzid has required mzIdentML structure elements
Closes MSGFPlus#157
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
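The control-flow difference behind this fix can be sketched in a few lines. The `Match` record, the `emit` helper, and the threshold value are hypothetical stand-ins for illustration, not the actual `MZIdentMLGen` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a scored match; not the real MSGF+ type.
record Match(String peptide, int deNovoScore) {}

class ThresholdFilterSketch {
    // Returns the peptides that would be emitted for one spectrum.
    static List<String> emit(List<Match> matches, int minDeNovoScore) {
        List<String> out = new ArrayList<>();
        for (Match m : matches) {
            if (m.deNovoScore() < minDeNovoScore) {
                // `continue` skips only this match. A `break` here would
                // also drop every later match for the same spectrum.
                continue;
            }
            out.add(m.peptide());
        }
        return out;
    }
}
```

With `break` in place of `continue`, a single low-scoring match in the middle of the list would hide all matches after it, which is the silent PSM loss the commit describes.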

Add new -msLevel CLI parameter to filter spectra by MS level. Accepts single value (e.g., -msLevel 2) or comma-separated range (e.g., -msLevel 2,3). Default is 2 (MS2 only).
Changes:
- ParamManager: add MS_LEVEL enum and registration
- IntRangeParameter: enable single-value parsing, fix typo
- SearchParams: add minMSLevel/maxMSLevel fields
- SpecKey: filter spectra by MS level in getSpecKeyList()
- SpectraAccessor: add setMSLevelRange(), wire to parsers
- MzMLAdapter/MzXMLSpectraMap: fix maxMSLevel to be inclusive
- MSGFPlus/MSGFDB/MSGFDBLib: wire MS level parameters
- pom.xml: remove fastutil shade filter (jmzml 1.7.11 needs full fastutil)
Tests: TestIntRangeParameter (9 tests), TestMSLevelFiltering (6 tests)
Benchmark (TMT 1.1GB, TDA):
- Baseline: 1245s, 6654 PSMs@1%FDR
- -msLevel 2: 957s (-23%), 6936 PSMs@1%FDR (+4.2%)
Closes MSGFPlus#159
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
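A minimal sketch of the parsing rule described above: a single value or a comma-separated pair, with an inclusive upper bound. The class and method names are hypothetical and do not mirror `IntRangeParameter`.

```java
class MsLevelRangeSketch {
    // Parses "2" or "2,3" into an inclusive {min, max} pair.
    static int[] parse(String value) {
        String[] parts = value.trim().split(",");
        if (parts.length > 2) {
            throw new IllegalArgumentException("expected N or N,M but got: " + value);
        }
        int min = Integer.parseInt(parts[0].trim());
        // A single value means min == max.
        int max = parts.length == 2 ? Integer.parseInt(parts[1].trim()) : min;
        if (min < 1 || max < min) {
            throw new IllegalArgumentException("invalid MS level range: " + value);
        }
        return new int[] {min, max};
    }

    // Both bounds are inclusive, so "2,3" accepts MS2 and MS3 spectra.
    static boolean accepts(int[] range, int msLevel) {
        return msLevel >= range[0] && msLevel <= range[1];
    }
}
```

For example, `parse("2")` yields the range 2..2, so only MS2 spectra pass, while `parse("2,3")` also admits MS3.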

feat(MSGFPlus#159): add -msLevel parameter for MS level filtering

fix(MSGFPlus#157): preserve PSM scores when DeNovoScore is below threshold

fix: enable test suite and fix broken build dependencies

Remove standalone scripts, legacy tools, and unused classes that are not referenced by the core MSGF+ search pipeline, reducing codebase by ~22,000 lines.
Deleted entire packages:
- ims/ (9 files): legacy IMS utilities
- ipa/ (5 files): unused isotope pattern analysis
- msgf2d/ (8 files): abandoned 2D scoring experiment
- msdictionary/ (7 files): unused genome dictionary tool
- mstag/ (3 files): unused sequence tagging
- scripts/ (6 files): standalone CLI utilities
- msutil/test/ (3 files): misplaced test classes
- msgf/test/ (2 files): legacy test stubs
- msgf/analysis/ (1 file): unused ROC generator
Cleaned mixed packages:
- misc/: removed 59 standalone scripts, kept 5 core utilities
- msgf/: removed 6 unused graph/scoring classes
- msutil/: removed 9 unused filter/annotation classes
- msdbsearch/: removed 4 standalone DB tools
- parser/: removed 9 legacy format parsers (InsPecT, Mascot, etc.)
- ui/: removed 6 legacy entry points (MSGF, MSGFLib, etc.)
- mzid/: removed 1 unused adapter stub
- msscorer/: removed 1 unused stats class
- suffixarray/: removed 1 unused mass array class
Also removed dead test methods and cleaned dangling imports.
Tests: 119 run, 0 failures.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Rewrite README.md:
- Full parameter reference tables covering all 30+ flags organized by category (core search, fragmentation, enzyme, filtering, etc.)
- Quick start examples for basic and TMT searches
- Modification file format documentation with examples
- Build-from-source instructions
- Updated requirements to Java 17+
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add parameter docs to README and CI/CD workflow

Remove dead code: 150 unused classes, -22K lines

Write search results directly to TSV from in-memory objects, bypassing mzIdentML serialization. Output is column-identical to MzIDToTsv (verified by diff on test.mgf search). This avoids generating large .mzid files when only TSV is needed downstream (e.g. OpenMS MSGFPlusAdapter, Percolator).
- New DirectTSVWriter class with same score/protein/mod logic as MZIdentMLGen but streaming tab-delimited output
- New -outputFormat parameter: 0=mzid (default), 1=tsv, 2=both
- Includes fixed + variable mods, MGF Title column, decoy filtering
- Backwards compatible: default remains mzid
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When -addFeatures 1 is used with -outputFormat tsv, the TSV now includes all PSMFeatureFinder columns needed for Percolator: ExplainedIonCurrentRatio, NTermIonCurrentRatio, CTermIonCurrentRatio, MS2IonCurrent, MS1IonCurrent, IsolationWindowEfficiency, NumMatchedMainIons, and all error statistics (MeanError/StdevError for All and Top7, both absolute and relative). These features were previously only available as UserParams in mzid and were not extracted by OpenMS's addMSGFFeatures() — now they are directly accessible as TSV columns. The peptide modification format (M+15.995) is already compatible with OpenMS MSGFPlusAdapter's modifySequence_() converter which transforms it to bracket notation M[+15.995] for AASequence. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
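The notation change mentioned above (`M+15.995` to `M[+15.995]`) can be illustrated with a small regex. This is only a sketch of the transformation and is not OpenMS's `modifySequence_()` implementation. The class name is hypothetical, and the sketch simply wraps every signed decimal mass in brackets.

```java
class ModNotationSketch {
    // Wraps each signed mass shift (e.g. +15.995 or -17.027) in brackets.
    static String toBracketNotation(String peptide) {
        return peptide.replaceAll("([+-]\\d+(?:\\.\\d+)?)", "[$1]");
    }
}
```

Under this rule `PEPTM+15.995IDE` becomes `PEPTM[+15.995]IDE`, and unmodified sequences pass through unchanged.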

Replace the jmzml JAXB-based MzMLUnmarshaller with a lightweight StAX streaming parser that extracts only the 11 fields MSGF+ needs. The new parser builds a spectrum index in a single pass, then preloads all spectra into memory on first random access, eliminating repeated XML parsing during the search phase.
Benchmark (TMT 1.1GB mzML, target-decoy, 4 threads):
- Wall time: 957s -> 853s (-10.9%)
- PSMs at 1% FDR: 6,936 (unchanged)
- Score completeness: 100% (unchanged)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
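To show the general shape of a StAX pull parser, here is a minimal sketch that streams over mzML-like XML and records one field per spectrum, the MS level (cvParam accession `MS:1000511`). It is not the `StaxMzMLParser` from this PR: the class name is hypothetical, and it omits the index pass, preloading, and the other extracted fields.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxSpectrumSketch {
    // Maps spectrum id -> MS level, reading the document in one forward pass.
    static Map<String, Integer> msLevelsById(String xml) {
        Map<String, Integer> levels = new LinkedHashMap<>();
        try {
            XMLStreamReader r = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            String currentId = null; // non-null only while inside <spectrum>
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if ("spectrum".equals(name)) {
                        currentId = r.getAttributeValue(null, "id");
                    } else if ("cvParam".equals(name) && currentId != null
                            && "MS:1000511".equals(r.getAttributeValue(null, "accession"))) {
                        levels.put(currentId, Integer.parseInt(r.getAttributeValue(null, "value")));
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "spectrum".equals(r.getLocalName())) {
                    currentId = null;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            // Propagate as unchecked rather than returning a partial result.
            throw new IllegalStateException("malformed XML", e);
        }
        return levels;
    }
}
```

Because the reader only moves forward and no object tree is built, memory use depends on what the caller chooses to keep rather than on the size of the file.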

…port
- Remove jmzml (JAXB-based mzML parser) dependency from pom.xml
- Delete old jmzml-dependent classes: MzMLAdapter, MzMLSpectraMap, MzMLSpectraIterator, SpectrumConverter
- Add referenceableParamGroupRef resolution to StaxMzMLParser: builds a map of param groups during index pass, resolves refs during spectrum parsing (critical for files that define polarity, MS level, etc. in referenceable groups)
- Move turnOffLogs() utility to StaxMzMLParser, update all callers
- Keep fastutil dependency (needed by jmzidml at runtime)
JAR size reduced from 39.5MB to 38MB.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jmzReader (uk.ac.ebi.pride.tools:jmzreader:2.0.6) had zero imports anywhere in the codebase — a dead dependency from earlier development. All spectrum file format parsing uses custom implementations: mzML (StaxMzMLParser), mzXML (embedded jrap/stax), MGF/MS2/PKL (custom parsers). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove mzXML file format support entirely:
- Delete embedded jrap/stax library (20 files, ~5,800 lines)
- Delete MzXMLSpectraMap, MzXMLSpectraIterator, MzXMLToMgfConverter
- Delete MzXMLToMgf utility and mzXML test resources (38MB)
- Remove MZXML from SpecFileFormat enum, SpectraAccessor, ParamManager
- Update misc/scripts/ui classes to remove mzXML code paths
mzXML is a legacy format superseded by mzML. Users with mzXML files can convert to mzML using msconvert (ProteoWizard).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

- StaxMzMLParser: use ConcurrentHashMap for thread-safe spectrum cache, fix class-level doc (preload-all, not bounded LRU), check index before preloading, propagate exceptions instead of returning null
- StaxMzMLSpectraIterator: throw NoSuchElementException when exhausted
- SpectraAccessor: throw exception instead of System.exit(-1), validate specFormat is non-null in constructor
- SelectSpectra: update stale .mzXML reference to .mzML
- pom.xml: fix duplicate <manifest>, remove stale comments, note fastutil is required by jmzidentml at runtime
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
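The iterator change follows the standard `java.util.Iterator` contract, under which `next()` must throw `NoSuchElementException` once the sequence is exhausted. A minimal sketch with a hypothetical class (not `StaxMzMLSpectraIterator` itself):

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ExhaustibleIteratorSketch implements Iterator<String> {
    private final List<String> ids;
    private int pos = 0;

    ExhaustibleIteratorSketch(List<String> ids) {
        this.ids = ids;
    }

    @Override
    public boolean hasNext() {
        return pos < ids.size();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            // Fail loudly instead of returning null past the end.
            throw new NoSuchElementException("no more spectra");
        }
        return ids.get(pos++);
    }
}
```

Throwing here gives callers that forget to check `hasNext()` an immediate, descriptive failure rather than a later `NullPointerException` somewhere downstream.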

Replace ConvertToMgf-based tests (class removed in PR #7) with StaxMzMLParser and SpectraAccessor mzML parsing tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: native TSV output — bypass mzIdentML for OpenMS/Percolator pipelines

perf: replace jmzml JAXB parser with StAX-based mzML reader

* chore: add CI/release packaging and benchmark scaffolding
  Split infra and repository maintenance updates into a dedicated reviewable change set, including workflow automation, Docker packaging, benchmark scripts/docs, and project documentation updates. Exclude large local benchmark artifacts and keep this PR focused on non-hot-path code organization and release hygiene. Made-with: Cursor
* chore: keep benchmark folder local-only
  Remove benchmark scripts/docs from this branch and ignore the entire benchmark directory so local benchmarking assets do not appear in review PRs. Made-with: Cursor
* docs: keep single canonical primitives plan
  Fold memory-reduction guidance into the balanced primitives plan and remove the old duplicate plan file so review and maintenance use one canonical document. Made-with: Cursor
* chore: narrow PR1 plans to scope-only docs
  Remove unrelated strategy and optimization plan documents from PR1 so this branch stays focused on infra/packaging cleanup. Keep only the plans index file in this PR. Made-with: Cursor
* chore: remove legacy ZippedReleases folder
  Delete the obsolete Windows release helper scripts and reference files under ZippedReleases from the repository. Made-with: Cursor
* chore: remove legacy extlib dependency jar
  Delete the obsolete jrap/stax legacy jar under extlib as part of repository cleanup. Made-with: Cursor
* fix: address copilot review feedback for PR11
  Align docs with actual supported legacy formats, update release pipeline to build from tag version with tests, and fix Docker build JDK requirement. Made-with: Cursor
* chore: minor packaging/docs hygiene for PR1
  Normalize ignore files, shrink Docker build context, align agent README with dev/CI, and clarify release workflow step naming. Made-with: Cursor
* docs: trim examples folder to small referenced artifacts
  Remove duplicate Tryp_Pig_Bov DB/index copies (tests use src/test/resources), drop large unlinked Excel/PNG teaching files, and add docs/examples/README.md so the directory purpose is obvious. Link the index from the main README. Made-with: Cursor
* chore: remove IntelliJ IDEA tips screenshots from docs
  Made-with: Cursor
* docs: replace legacy HTML manuals with Markdown
  Convert docs/*.html to GitHub-flavored Markdown (pandoc), fix internal links, add docs/README.md as the documentation index, and remove unused style.css. Made-with: Cursor
* docs: strip leftover HTML span wrappers from converted Markdown
  Made-with: Cursor

Add a workflow-dispatch benchmark pipeline on a fixed self-hosted runner profile, with public-data download, metrics emission, and baseline TSV comparison under benchmark/ci/PXD001819 for future dataset expansion. Made-with: Cursor

Use uppercase PXD001819 naming in workflow-visible labels/artifacts and update README to state mzXML is not available in this fork. Made-with: Cursor

Made-with: Cursor

- run_ci.sh: count only opening <SpectrumIdentificationItem> tags for sii_count (prior substring match double-counted closing tags and picked up SpectrumIdentificationItemRef)
- run_ci.sh: always emit peak_rss_kb and cpu_percent (NA when GNU time does not expose them) so metrics file format is consistent
- compare_metrics.py: support an `optional` column; optional missing/NA metrics warn instead of failing CI
- baseline.tsv: add optional column, mark peak_rss_kb optional, fix ubuntu-latest note to reference the self-hosted runner, widen sii_count floor to match the de-duplicated count
- README pointers: update stale references to a non-existent benchmark/run_pxd001819_benchmark.sh script
- benchmark/README.md: describe the actual committed CI scaffold instead of an uncommitted local harness layout
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
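The over-counting described in the first bullet is easy to reproduce. The actual helper is a shell script; the sketch below restates the counting rule in Java for illustration only, with hypothetical names, and assumes un-prefixed element names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SiiCountSketch {
    // Opening tag only: the name must be followed by whitespace, '>' or '/',
    // which excludes </SpectrumIdentificationItem> and ...ItemRef.
    private static final Pattern OPENING =
            Pattern.compile("<SpectrumIdentificationItem[\\s>/]");

    // Naive substring count, as in the prior behaviour described above.
    static int naiveCount(String xml) {
        String needle = "SpectrumIdentificationItem";
        int count = 0;
        for (int i = xml.indexOf(needle); i >= 0; i = xml.indexOf(needle, i + needle.length())) {
            count++;
        }
        return count;
    }

    static int openingTagCount(String xml) {
        Matcher m = OPENING.matcher(xml);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```

On a fragment with one item, its closing tag, and one `SpectrumIdentificationItemRef`, the naive count reports three where the opening-tag count reports one.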

- extract_metrics.py: stream-parse mzIdentML with ElementTree.iterparse so SII counting and PSM 1% FDR counting no longer rely on line-shaped regex matches over XML
- run_ci.sh: use a bash array for SEARCH_ARGS (safe against future flags with spaces), atomic .part downloads, validate cached gzip, default MSGFPLUS_THREADS to 8 to match the workflow, drop the always-zero java_exit metric, and emit integer wall_time_sec
- workflow: pin Python via actions/setup-python@v5 so self-hosted runners have a known 3.11 interpreter for the helper scripts
- compare_metrics.py: add test_compare_metrics.py covering in-range pass, out-of-range fail, missing required/optional, NA, non-numeric, and empty-range rows (7 tests, all passing)
- .gitignore: drop redundant benchmark/** patterns (already covered by benchmark/* + ci/ allowlist); add __pycache__/ and *.pyc
- docs: describe new helper and test scripts in both READMEs

fix(benchmark): harden PXD001819 scaffold per review feedback

benchmark

coderabbitai · 2026-04-16T21:28:14Z

Important

Review skipped

Too many files!

This PR contains 282 files, which is 132 over the limit of 150.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1977927b-c8cb-498a-9833-14e5a459cdb3

📥 Commits

Reviewing files that changed from the base of the PR and between f0bc79e and 1bd9ff2.

⛔ Files ignored due to path filters (18)

benchmark/ci/PXD001819/baseline.tsv is excluded by !**/*.tsv
docs/IntelliJ_IDEA_Tips/InvalidateCaches.png is excluded by !**/*.png
docs/IntelliJ_IDEA_Tips/UnitTestDebugging1.png is excluded by !**/*.png
docs/IntelliJ_IDEA_Tips/UnitTestDebugging2_TestClass.png is excluded by !**/*.png
docs/IntelliJ_IDEA_Tips/UnitTestDebugging2_TestMethod.png is excluded by !**/*.png
docs/IntelliJ_IDEA_Tips/UnitTestDebugging3.png is excluded by !**/*.png
docs/IntelliJ_IDEA_Tips/UnitTestDebugging4.png is excluded by !**/*.png
docs/IntelliJ_IDEA_Tips/UnitTestDebugging5.png is excluded by !**/*.png
docs/IntelliJ_IDEA_Tips/UnitTestDebugging6.png is excluded by !**/*.png
docs/examples/Find_best_peptide_for_each_scan.xlsx is excluded by !**/*.xlsx
docs/examples/IsotopeRangeExample1.xlsx is excluded by !**/*.xlsx
docs/examples/IsotopeRangeExample2.xlsx is excluded by !**/*.xlsx
docs/examples/MassErrorHistogram_QC_Mam_19_01-run3_03Jan20.png is excluded by !**/*.png
docs/examples/MassErrorHistogram_QC_Mam_19_01-run3_03Jan20.xlsx is excluded by !**/*.xlsx
docs/examples/PepQValue_Computation_Mockup.xlsx is excluded by !**/*.xlsx
docs/examples/QValue_Computation.xlsx is excluded by !**/*.xlsx
docs/examples/QValue_Computation_Excerpt.xlsx is excluded by !**/*.xlsx
extlib/jrap_StAX_v5.2.jar is excluded by !**/*.jar

📒 Files selected for processing (282)

.claude/CLAUDE.md
.claude/investigations/001-mgf-scan-number-extraction-failure.md
.claude/investigations/002-evalue-target-decoy-leakage-to-percolator.md
.claude/investigations/README.md
.claude/plans/README.md
.claude/skills/README.md
.claude/skills/score-output-safety.md
.dockerignore
.github/workflows/benchmark-pxd001819.yml
.github/workflows/ci.yml
.github/workflows/release.yml
.gitignore
Dockerfile
README.md
ZippedReleases/Distribute_Program.bat
ZippedReleases/Distribute_Program_Use_Proto-2.bat
ZippedReleases/ReferenceFiles/BuildSuffixArray.bat
ZippedReleases/ReferenceFiles/Convert_mzid_to_tsv.bat
ZippedReleases/ReferenceFiles/HowTo_Run_MSGFPlus.txt
ZippedReleases/ReferenceFiles/MSGFPlus_Mods1.txt
ZippedReleases/ReferenceFiles/MSGFPlus_Mods2.txt
ZippedReleases/ReferenceFiles/MSGFPlus_Mods3.txt
ZippedReleases/ReferenceFiles/Start_MSGFPlus.bat
ZippedReleases/ReferenceFiles/Syntax.txt
ZippedReleases/ZipFilesForRelease.bat
benchmark/README.md
benchmark/ci/PXD001819/compare_metrics.py
benchmark/ci/PXD001819/extract_metrics.py
benchmark/ci/PXD001819/run_ci.sh
benchmark/ci/PXD001819/test_compare_metrics.py
benchmark/ci/README.md
docs/BuildSA.html
docs/BuildSA.md
docs/Changelog.html
docs/Changelog.md
docs/IsobaricLabeling.md
docs/MS-GFDB.html
docs/MS-GFDB.md
docs/MSGFDB_ModFile.html
docs/MSGFDB_ModFile.md
docs/MSGFPlus.html
docs/MSGFPlus.md
docs/MzidToTsv.html
docs/MzidToTsv.md
docs/README.md
docs/ScoringParamGen.html
docs/ScoringParamGen.md
docs/Troubleshooting.md
docs/examples/IndexFasta.bat
docs/examples/README.md
docs/examples/Tryp_Pig_Bov.canno
docs/examples/Tryp_Pig_Bov.cnlcp
docs/examples/Tryp_Pig_Bov.csarr
docs/examples/Tryp_Pig_Bov.cseq
docs/examples/Tryp_Pig_Bov.fasta
docs/index.html
docs/style.css
pom.xml
src/main/java/edu/ucsd/msjava/ims/DtaToMSGFInput.java
src/main/java/edu/ucsd/msjava/ims/DtaToMSGFInputDB.java
src/main/java/edu/ucsd/msjava/ims/GetTheBestPerPeptide.java
src/main/java/edu/ucsd/msjava/ims/GetTheBestPerScan.java
src/main/java/edu/ucsd/msjava/ims/MaskSpectra.java
src/main/java/edu/ucsd/msjava/ims/Misc.java
src/main/java/edu/ucsd/msjava/ims/OptimizeCE.java
src/main/java/edu/ucsd/msjava/ims/SplitDta.java
src/main/java/edu/ucsd/msjava/ims/Summarize.java
src/main/java/edu/ucsd/msjava/ipa/Feature.java
src/main/java/edu/ucsd/msjava/ipa/IPA.java
src/main/java/edu/ucsd/msjava/ipa/MS1SpectraMap.java
src/main/java/edu/ucsd/msjava/ipa/MSGFPlusResultSet.java
src/main/java/edu/ucsd/msjava/ipa/PSM.java
src/main/java/edu/ucsd/msjava/misc/AgilentQTOF.java
src/main/java/edu/ucsd/msjava/misc/AnnotatedMgfToMSGFInput.java
src/main/java/edu/ucsd/msjava/misc/AnnotatedSpecGenerator.java
src/main/java/edu/ucsd/msjava/misc/CIDETDPairs.java
src/main/java/edu/ucsd/msjava/misc/CalcFastaDBSize.java
src/main/java/edu/ucsd/msjava/misc/ChargePrediction.java
src/main/java/edu/ucsd/msjava/misc/Chores.java
src/main/java/edu/ucsd/msjava/misc/Clauser.java
src/main/java/edu/ucsd/msjava/misc/CompGraphPaper.java
src/main/java/edu/ucsd/msjava/misc/CompactSATest.java
src/main/java/edu/ucsd/msjava/misc/CompareSearchResults.java
src/main/java/edu/ucsd/msjava/misc/CompositionFirst.java
src/main/java/edu/ucsd/msjava/misc/ControlNew.java
src/main/java/edu/ucsd/msjava/misc/ConvertToMgf.java
src/main/java/edu/ucsd/msjava/misc/CountID.java
src/main/java/edu/ucsd/msjava/misc/CountPSMs.java
src/main/java/edu/ucsd/msjava/misc/CountSequestIDs.java
src/main/java/edu/ucsd/msjava/misc/DatToTxt.java
src/main/java/edu/ucsd/msjava/misc/Deconvolution.java
src/main/java/edu/ucsd/msjava/misc/FileFilter.java
src/main/java/edu/ucsd/msjava/misc/FilteringEfficiency.java
src/main/java/edu/ucsd/msjava/misc/FindPSMIntersection.java
src/main/java/edu/ucsd/msjava/misc/GetProteinLength.java
src/main/java/edu/ucsd/msjava/misc/GetSearchParams.java
src/main/java/edu/ucsd/msjava/misc/HCDCIDETD.java
src/main/java/edu/ucsd/msjava/misc/HeckPercolator.java
src/main/java/edu/ucsd/msjava/misc/HeckRevision.java
src/main/java/edu/ucsd/msjava/misc/HeckWhole.java
src/main/java/edu/ucsd/msjava/misc/IPA.java
src/main/java/edu/ucsd/msjava/misc/IPRGStudy.java
src/main/java/edu/ucsd/msjava/misc/ISBETDAnalysis.java
src/main/java/edu/ucsd/msjava/misc/LibraryScripts.java
src/main/java/edu/ucsd/msjava/misc/MS2ToMgf.java
src/main/java/edu/ucsd/msjava/misc/MSGFDBToInspect.java
src/main/java/edu/ucsd/msjava/misc/MSGFDBToQSpec.java
src/main/java/edu/ucsd/msjava/misc/MSGFLogger.java
src/main/java/edu/ucsd/msjava/misc/MSGFPlusPaper.java
src/main/java/edu/ucsd/msjava/misc/MakePrefixDB.java
src/main/java/edu/ucsd/msjava/misc/MassCalc.java
src/main/java/edu/ucsd/msjava/misc/MergeTargetDecoyFiles.java
src/main/java/edu/ucsd/msjava/misc/MiscScripts.java
src/main/java/edu/ucsd/msjava/misc/MultiThreadExercise.java
src/main/java/edu/ucsd/msjava/misc/MzXMLToMgf.java
src/main/java/edu/ucsd/msjava/misc/PEMMRProcessor.java
src/main/java/edu/ucsd/msjava/misc/ParamToTxt.java
src/main/java/edu/ucsd/msjava/misc/PepIdxToFasta.java
src/main/java/edu/ucsd/msjava/misc/PhosAnalysis.java
src/main/java/edu/ucsd/msjava/misc/PreprocessSpec.java
src/main/java/edu/ucsd/msjava/misc/RunMSGFDBOnGrid.java
src/main/java/edu/ucsd/msjava/misc/RunManifestWriter.java
src/main/java/edu/ucsd/msjava/misc/RunOMSSAOnCCMS.java
src/main/java/edu/ucsd/msjava/misc/SpectraSTToMSGFInput.java
src/main/java/edu/ucsd/msjava/misc/SpectraSTToTSV.java
src/main/java/edu/ucsd/msjava/misc/SplitFasta.java
src/main/java/edu/ucsd/msjava/misc/SplitMgf.java
src/main/java/edu/ucsd/msjava/misc/SuffixArrayTest.java
src/main/java/edu/ucsd/msjava/misc/SwedCAD.java
src/main/java/edu/ucsd/msjava/misc/TopDownAnalysis.java
src/main/java/edu/ucsd/msjava/misc/TrainScoringParameters.java
src/main/java/edu/ucsd/msjava/misc/VennDiagram.java
src/main/java/edu/ucsd/msjava/misc/Zubarev.java
src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java
src/main/java/edu/ucsd/msjava/msdbsearch/FilterDatabase.java
src/main/java/edu/ucsd/msjava/msdbsearch/MakePairedSpectra.java
src/main/java/edu/ucsd/msjava/msdbsearch/ReverseLibDB.java
src/main/java/edu/ucsd/msjava/msdbsearch/SearchParams.java
src/main/java/edu/ucsd/msjava/msdbsearch/ShuffleDB.java
src/main/java/edu/ucsd/msjava/msdictionary/Codon.java
src/main/java/edu/ucsd/msjava/msdictionary/GenomeLocator.java
src/main/java/edu/ucsd/msjava/msdictionary/GenomeSplitter.java
src/main/java/edu/ucsd/msjava/msdictionary/GenomeTranslator.java
src/main/java/edu/ucsd/msjava/msdictionary/MSDicLauncher.java
src/main/java/edu/ucsd/msjava/msdictionary/ProteinLocator.java
src/main/java/edu/ucsd/msjava/msdictionary/TestMSDictionary.java
src/main/java/edu/ucsd/msjava/msgf/AminoAcidGraph.java
src/main/java/edu/ucsd/msjava/msgf/DeNovoSequencer.java
src/main/java/edu/ucsd/msjava/msgf/GenericDeNovoGraph.java
src/main/java/edu/ucsd/msjava/msgf/LengthPredictor.java
src/main/java/edu/ucsd/msjava/msgf/PercolatorAdapter.java
src/main/java/edu/ucsd/msjava/msgf/PrimitiveAminoAcidGraph.java
src/main/java/edu/ucsd/msjava/msgf/PrimitiveGeneratingFunction.java
src/main/java/edu/ucsd/msjava/msgf/PrimitiveGeneratingFunctionGroup.java
src/main/java/edu/ucsd/msjava/msgf/ReachableNode.java
src/main/java/edu/ucsd/msjava/msgf/ScoredSpectrumSumHP.java
src/main/java/edu/ucsd/msjava/msgf/SpectrumGraphNode.java
src/main/java/edu/ucsd/msjava/msgf/analysis/ROCGenerator.java
src/main/java/edu/ucsd/msjava/msgf/test/MSGFTest.java
src/main/java/edu/ucsd/msjava/msgf/test/MSGFValidation.java
src/main/java/edu/ucsd/msjava/msgf2d/BacktrackPointer2D.java
src/main/java/edu/ucsd/msjava/msgf2d/BacktrackTable2D.java
src/main/java/edu/ucsd/msjava/msgf2d/CombinePairedSpectra.java
src/main/java/edu/ucsd/msjava/msgf2d/GeneratingFunction2D.java
src/main/java/edu/ucsd/msjava/msgf2d/ScoreBound2D.java
src/main/java/edu/ucsd/msjava/msgf2d/ScoreDist2D.java
src/main/java/edu/ucsd/msjava/msgf2d/ScoreDistMerged.java
src/main/java/edu/ucsd/msjava/msgf2d/TestMSGF2D.java
src/main/java/edu/ucsd/msjava/msscorer/ListStat.java
src/main/java/edu/ucsd/msjava/msscorer/NewRankScorer.java
src/main/java/edu/ucsd/msjava/msscorer/NewScorerFactory.java
src/main/java/edu/ucsd/msjava/msscorer/ScoringParameterGenerator.java
src/main/java/edu/ucsd/msjava/msscorer/ScoringParameterGeneratorWithErrors.java
src/main/java/edu/ucsd/msjava/mstag/Tag.java
src/main/java/edu/ucsd/msjava/mstag/TagTester.java
src/main/java/edu/ucsd/msjava/mstag/Tagger.java
src/main/java/edu/ucsd/msjava/msutil/CompositionAASet.java
src/main/java/edu/ucsd/msjava/msutil/GappedPeptide.java
src/main/java/edu/ucsd/msjava/msutil/IntMassAASet.java
src/main/java/edu/ucsd/msjava/msutil/NominalMassAASet.java
src/main/java/edu/ucsd/msjava/msutil/ScoringFunction.java
src/main/java/edu/ucsd/msjava/msutil/SpecFileFormat.java
src/main/java/edu/ucsd/msjava/msutil/SpecKey.java
src/main/java/edu/ucsd/msjava/msutil/SpectraAccessor.java
src/main/java/edu/ucsd/msjava/msutil/Spectrum.java
src/main/java/edu/ucsd/msjava/msutil/SpectrumAnnotator.java
src/main/java/edu/ucsd/msjava/msutil/SpectrumRecalibrator.java
src/main/java/edu/ucsd/msjava/msutil/SpectrumTester.java
src/main/java/edu/ucsd/msjava/msutil/TopNFilter.java
src/main/java/edu/ucsd/msjava/msutil/test/MzXMLSpectraIteratorTest.java
src/main/java/edu/ucsd/msjava/msutil/test/MzXMLSpectraMapTest.java
src/main/java/edu/ucsd/msjava/msutil/test/SpectraTest.java
src/main/java/edu/ucsd/msjava/mzid/DirectTSVWriter.java
src/main/java/edu/ucsd/msjava/mzid/MZIdentMLGen.java
src/main/java/edu/ucsd/msjava/mzid/MzIDParser.java
src/main/java/edu/ucsd/msjava/mzid/MzIdAdapter.java
src/main/java/edu/ucsd/msjava/mzml/MzMLAdapter.java
src/main/java/edu/ucsd/msjava/mzml/MzMLSpectraIterator.java
src/main/java/edu/ucsd/msjava/mzml/MzMLSpectraMap.java
src/main/java/edu/ucsd/msjava/mzml/SpectrumConverter.java
src/main/java/edu/ucsd/msjava/mzml/StaxMzMLParser.java
src/main/java/edu/ucsd/msjava/mzml/StaxMzMLSpectraIterator.java
src/main/java/edu/ucsd/msjava/mzml/StaxMzMLSpectraMap.java
src/main/java/edu/ucsd/msjava/params/IntRangeParameter.java
src/main/java/edu/ucsd/msjava/params/ParamManager.java
src/main/java/edu/ucsd/msjava/parser/InsPecTPSM.java
src/main/java/edu/ucsd/msjava/parser/InsPecTParser.java
src/main/java/edu/ucsd/msjava/parser/MSGFDBParser.java
src/main/java/edu/ucsd/msjava/parser/MSGappedDictionaryPSM.java
src/main/java/edu/ucsd/msjava/parser/MSGappedDictionaryParser.java
src/main/java/edu/ucsd/msjava/parser/MascotParser.java
src/main/java/edu/ucsd/msjava/parser/MzXMLSpectraIterator.java
src/main/java/edu/ucsd/msjava/parser/MzXMLSpectraMap.java
src/main/java/edu/ucsd/msjava/parser/MzXMLToMgfConverter.java
src/main/java/edu/ucsd/msjava/parser/OMSSAParser.java
src/main/java/edu/ucsd/msjava/parser/PSM.java
src/main/java/edu/ucsd/msjava/parser/PSMList.java
src/main/java/edu/ucsd/msjava/parser/PepXMLParser.java
src/main/java/edu/ucsd/msjava/parser/SearchParameterParser.java
src/main/java/edu/ucsd/msjava/parser/SortedSpectraIterator.java
src/main/java/edu/ucsd/msjava/scripts/AgilentCyclicSpecPreProcess.java
src/main/java/edu/ucsd/msjava/scripts/CountSpectra.java
src/main/java/edu/ucsd/msjava/scripts/GetDBInfo.java
src/main/java/edu/ucsd/msjava/scripts/MergeSpectra.java
src/main/java/edu/ucsd/msjava/scripts/SelectSpectra.java
src/main/java/edu/ucsd/msjava/scripts/SpecFileValidator.java
src/main/java/edu/ucsd/msjava/suffixarray/MassArray.java
src/main/java/edu/ucsd/msjava/ui/MSDictionary.java
src/main/java/edu/ucsd/msjava/ui/MSGF.java
src/main/java/edu/ucsd/msjava/ui/MSGFDB.java
src/main/java/edu/ucsd/msjava/ui/MSGFDBLib.java
src/main/java/edu/ucsd/msjava/ui/MSGFLib.java
src/main/java/edu/ucsd/msjava/ui/MSGFPlus.java
src/main/java/edu/ucsd/msjava/ui/MSProfile.java
src/main/java/edu/ucsd/msjava/ui/MzIDToTsv.java
src/main/java/edu/ucsd/msjava/ui/PRMSpecGen.java
src/main/java/edu/ucsd/msjava/ui/ScoringParamGen.java
src/main/java/org/systemsbiology/jrap/stax/Base64.java
src/main/java/org/systemsbiology/jrap/stax/ByteBufferIterator.java
src/main/java/org/systemsbiology/jrap/stax/DataProcessingInfo.java
src/main/java/org/systemsbiology/jrap/stax/EndPatternStringIterator.java
src/main/java/org/systemsbiology/jrap/stax/FileHeaderParser.java
src/main/java/org/systemsbiology/jrap/stax/IndexParser.java
src/main/java/org/systemsbiology/jrap/stax/LineIterator.java
src/main/java/org/systemsbiology/jrap/stax/MLScanAndHeaderParser.java
src/main/java/org/systemsbiology/jrap/stax/MSInstrumentInfo.java
src/main/java/org/systemsbiology/jrap/stax/MSOperator.java
src/main/java/org/systemsbiology/jrap/stax/MSXMLParser.java
src/main/java/org/systemsbiology/jrap/stax/MSXMLSequentialParser.java
src/main/java/org/systemsbiology/jrap/stax/MZXMLFileInfo.java
src/main/java/org/systemsbiology/jrap/stax/ParentFile.java
src/main/java/org/systemsbiology/jrap/stax/Scan.java
src/main/java/org/systemsbiology/jrap/stax/ScanAndHeaderParser.java
src/main/java/org/systemsbiology/jrap/stax/ScanHeader.java
src/main/java/org/systemsbiology/jrap/stax/SoftwareInfo.java
src/main/java/org/systemsbiology/jrap/stax/StringBuilderReader.java
src/main/java/org/systemsbiology/jrap/stax/TestParser.java
src/test/java/edu/ucsd/msjava/mzid/AnalysisProtocolCollectionGenTest.java
src/test/java/ims/IMSMSGFTest.java
src/test/java/ims/IMSMiscTest.java
src/test/java/ims/IMSResultProcessor.java
src/test/java/ims/SarcTest.java
src/test/java/ipa/TestIPA.java
src/test/java/msgfplus/TestIntRangeParameter.java
src/test/java/msgfplus/TestMSGFLogger.java
src/test/java/msgfplus/TestMSGFPlus.java
src/test/java/msgfplus/TestMSLevelFiltering.java
src/test/java/msgfplus/TestMZIdentMLGen.java
src/test/java/msgfplus/TestMinSpectraPerThread.java
src/test/java/msgfplus/TestMisc.java
src/test/java/msgfplus/TestParsers.java
src/test/java/msgfplus/TestPrimitiveRegression.java
src/test/java/msgfplus/TestRunManifestWriter.java
src/test/java/msgfplus/TestScoring.java
src/test/java/msgfplus/TestStaxMzMLParser.java
src/test/resources/Tryp_Pig_Bov.revCat.canno
src/test/resources/Tryp_Pig_Bov.revCat.cnlcp
src/test/resources/Tryp_Pig_Bov.revCat.csarr
src/test/resources/Tryp_Pig_Bov.revCat.cseq
src/test/resources/Tryp_Pig_Bov.revCat.fasta
src/test/resources/benchmark/PXD001819/README.md
src/test/resources/benchmark/PXD001819/mods.txt

…ives-gf Replaces FlexAminoAcidGraph + GeneratingFunction with CSR-based PrimitiveAminoAcidGraph + flat-array PrimitiveGeneratingFunction in DBScanner.computeSpecEValue hot path. Ported without the experimental remainingUses / eager ScoreDist release (proven 5.9% CPU regression for 3.3% RSS gain). Legacy FlexAminoAcidGraph and GeneratingFunction remain for other callers. Parity verified by new TestPrimitiveRegression (ported from feat/primitives-gf) and existing test suite. Phase 1 of 3. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
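For readers unfamiliar with the term, CSR (compressed sparse row) stores a graph's adjacency in two flat arrays instead of per-node objects. The following is a generic sketch of that layout, not the `PrimitiveAminoAcidGraph` from this PR; all names are hypothetical.

```java
import java.util.Arrays;

class CsrGraphSketch {
    final int[] rowStart; // rowStart[v]..rowStart[v+1] delimits v's edges
    final int[] target;   // edge targets, grouped by source node

    // edges[i] = {source, target}
    CsrGraphSketch(int numNodes, int[][] edges) {
        rowStart = new int[numNodes + 1];
        for (int[] e : edges) {
            rowStart[e[0] + 1]++; // count out-degree per source
        }
        for (int v = 0; v < numNodes; v++) {
            rowStart[v + 1] += rowStart[v]; // prefix sum -> start offsets
        }
        target = new int[edges.length];
        int[] next = Arrays.copyOf(rowStart, numNodes); // write cursors
        for (int[] e : edges) {
            target[next[e[0]]++] = e[1];
        }
    }

    int outDegree(int node) {
        return rowStart[node + 1] - rowStart[node];
    }

    int[] neighbors(int node) {
        return Arrays.copyOfRange(target, rowStart[node], rowStart[node + 1]);
    }
}
```

Iterating a node's edges becomes a scan over a contiguous slice of `target`, which tends to be friendlier to the CPU cache than following object references.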

Rewrites PrimitiveGeneratingFunctionGroup as a running merger. Each per-mass-index GF is computed, merged into the aggregate via addProbDist, and released before the next mass index is built, so peak memory stays at one graph + one GF regardless of tolerance-window width. Math is identical because addProbDist with scoreDiff=0 and aaProb=1f is a linear sum; the running aggregate transparently widens its bounds if a later GF has a wider score range.

This addresses the Phase-1 Astral regression (127% slower, +7.3% RSS) caused by concurrent accumulation of distByNode across all mass indices. TMT parity preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
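The "running merger" idea can be sketched as below: each incoming distribution is summed bin by bin into one aggregate, and the aggregate's score bounds are widened when a later distribution covers a wider range. `RunningDistSketch` is a hypothetical class written for illustration; it assumes integer score bins and does not reproduce the real addProbDist signature.

```java
/**
 * Hypothetical running aggregate of score distributions (illustration only).
 * Covers integer scores [minScore, minScore + prob.length).
 */
final class RunningDistSketch {
    private int minScore;
    private double[] prob = new double[0];

    /** Adds another distribution bin by bin, widening the bounds first if needed. */
    void add(int otherMin, double[] other) {
        if (other.length == 0) {
            return;
        }
        if (prob.length == 0) {           // first distribution: just adopt a copy
            minScore = otherMin;
            prob = other.clone();
            return;
        }
        int newMin = Math.min(minScore, otherMin);
        int newMax = Math.max(minScore + prob.length, otherMin + other.length);
        if (newMin != minScore || newMax != minScore + prob.length) {
            double[] widened = new double[newMax - newMin];
            System.arraycopy(prob, 0, widened, minScore - newMin, prob.length);
            prob = widened;
            minScore = newMin;
        }
        for (int i = 0; i < other.length; i++) {
            prob[otherMin - minScore + i] += other[i]; // plain linear sum
        }
    }

    /** Returns the accumulated value for a score, or 0 outside the current bounds. */
    double get(int score) {
        int i = score - minScore;
        return (i < 0 || i >= prob.length) ? 0.0 : prob[i];
    }
}
```

Because the caller can drop each per-mass-index array right after `add` returns, only the aggregate and the distribution currently being built need to be alive at once.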

Adds two user-facing pages to close documentation gaps surfaced in the landscape review:

- Troubleshooting.md: centroiding (MSConvert peakPicking filter), XML prolog / encoding errors, FASTA size limits and split-and-merge workaround, -Xmx sizing table, -tasks tuning for OOM, thread-cap behaviour, and the OpenMS TOPPAS adapter workaround.
- IsobaricLabeling.md: fixed-mod recipes for TMT-6/10/11, TMTpro-16/18, iTRAQ-4/8, the correct -protocol flag per label, and a full worked Mods.txt example. Addresses the recurring "does MS-GF+ support TMT-16plex?" question (issue MSGFPlus#82) which is a docs, not a code, gap.

Also wires the two pages into docs/README.md's table of contents. The .gitignore update excludes a local-only working research note (.claude/investigations/msgfplus_research_report.md) from the repo, consistent with the existing "benchmark harness is local-only" convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Astral profiling revealed that synchronized Hashtable contention in the shared scorer tables dominated runtime: Hashtable.get alone took 23% of CPU and the surrounding monitor machinery (TrySpin, SafeFetch, ObjectSynchronizer) added another ~20%, so 43% of CPU was sync overhead with 4 worker threads serializing through per-table monitors.

NewRankScorer tables (fragOFFTable, insignificantFragOFFTable, rankDistTable, ionErrDistTable, noiseErrDistTable, ionExistenceTable) are populated once in readFromFile and read-only during search, so plain HashMap is safe. NewScorerFactory.scorerTable is a mutable cache with possible concurrent writes before warmup, so it uses ConcurrentHashMap. ScoringParameterGenerator(s) use HashMap too to match the NewRankScorer field types (build-time, not search path).

Benchmarks (baseline = dev, branch = feat/primitives-optimization):

- PXD001819 LFQ: 176.5s -> 109.4s (-38.0%), 2254 -> 2472 MB (+9.7%)
- TMT: 644.3s -> 265.6s (-58.8%), 2872 -> 3125 MB (+8.8%)
- Astral: 2155.9s -> 723.3s (-66.5%), 6775 -> 7627 MB (+12.6%)

PSM@1%FDR and SII counts match exactly on all three datasets. RSS grows modestly because worker threads now actually run in parallel (no more monitor serialization), so 4 concurrent graph/GF states are alive instead of effectively one.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
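The two map choices described in this commit follow a common Java pattern, sketched below under stated assumptions. `ScorerTablesSketch`, `forKey`, `lookup`, and the table contents are hypothetical; only the java.util classes are real. The read-only table is built in the constructor and stored in a final field, which is what makes unsynchronized reads safe, while the shared cache uses ConcurrentHashMap.computeIfAbsent so concurrent first requests for the same key create a single entry.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical scorer holder (illustration only, not NewRankScorer or NewScorerFactory). */
final class ScorerTablesSketch {
    // Mutable cache that several worker threads may fill concurrently.
    private static final Map<String, ScorerTablesSketch> CACHE = new ConcurrentHashMap<>();

    // Populated once during construction, never mutated afterwards:
    // a plain HashMap behind a final field needs no per-lookup locking.
    private final Map<String, float[]> readOnlyTable;

    private ScorerTablesSketch(String key) {
        Map<String, float[]> table = new HashMap<>();
        table.put("rank", new float[] {0.1f, 0.2f}); // stand-in for data read from a file
        readOnlyTable = Collections.unmodifiableMap(table);
    }

    /** Returns the cached instance for a key, creating it at most once. */
    static ScorerTablesSketch forKey(String key) {
        return CACHE.computeIfAbsent(key, ScorerTablesSketch::new);
    }

    /** Lock-free read on the search path. */
    float lookup(String table, int index) {
        return readOnlyTable.get(table)[index];
    }
}
```

With Hashtable, every `get` takes the table's monitor, so four threads doing nothing but reads still queue behind each other; the pattern above removes that lock from the read path entirely.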

perf(msgf): CSR graph + streaming GF merge + drop Hashtable sync

Native Math.log (libmLog) was 5.46% of CPU in the post-PR#15 Astral profile. The call sites in NewRankScorer.getErrorScore and getNodeScore / getMissingIonScore compute log(x/y) over frequency arrays that are immutable after scorer load; the denominator scale factor min(ionType.getCharge(), numSegments) is also load-time constant. Cache the resulting float values once at the end of readFromInputStream and replace the runtime Math.log calls with direct array indexing.

Scoring results are bit-identical: same expressions, same operand ordering, same float rounding; the only difference is that the cast to float happens once per cell at load instead of per call. Both hot-path methods keep a fallback to the original computation so legacy callers that populate the tables without going through readFromInputStream still work.

Benchmarks on the current machine state (baseline = dev HEAD jar, same run, same thermal state):

- PXD001819 LFQ: 122.7s -> 110.4s (-10.0%), 2410 -> 2292 MB (-4.9%)
- TMT: 295.7s -> 277.9s (-6.0%), 2793 -> 2818 MB (+0.9%)
- Astral: 1002.9s -> 883.5s (-12.0%), 7707 -> 7351 MB (-4.6%)

PSM@1%FDR and SII counts match exactly on all three datasets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
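A minimal sketch of the precompute-with-fallback pattern described here, assuming two parallel frequency arrays. `LogScoreCacheSketch` and its field names are hypothetical and much simpler than the real scorer tables; the point is only that the cached path and the fallback path evaluate the same expression, so the results compare as identical floats.

```java
/** Hypothetical log-ratio cache (illustration only, not NewRankScorer). */
final class LogScoreCacheSketch {
    private final float[] ionFreq;
    private final float[] noiseFreq;
    private float[] cachedLogRatio; // null until precompute() has run

    LogScoreCacheSketch(float[] ionFreq, float[] noiseFreq) {
        this.ionFreq = ionFreq.clone();
        this.noiseFreq = noiseFreq.clone();
    }

    /** Call once after loading; the frequency arrays never change afterwards. */
    void precompute() {
        float[] cache = new float[ionFreq.length];
        for (int i = 0; i < cache.length; i++) {
            cache[i] = (float) Math.log(ionFreq[i] / noiseFreq[i]);
        }
        cachedLogRatio = cache;
    }

    /** Hot path: array read if the cache exists, otherwise the original computation. */
    float score(int i) {
        if (cachedLogRatio != null) {
            return cachedLogRatio[i];
        }
        return (float) Math.log(ionFreq[i] / noiseFreq[i]);
    }
}
```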

perf(scorer): precompute log scores in NewRankScorer

Two small quick-wins from the April 2026 landscape review (research report Q1 and Q3, msgfplus_research_report.md §0.1):

Q1 - "Skip spectrum since it is not centroided" now tells the user how to fix the input:

- profile spectra: "Re-run MSConvert with --filter \"peakPicking true 1-\" to centroid the spectra."
- dense centroided: "Pass -allowDenseCentroidedPeaks 1 if the spectrum is already centroided."

This is the most common onboarding failure mode surfaced in issue MSGFPlus#116 and in user support threads; the previous message only said that a spectrum was skipped, leaving users to guess why.

Q3 - New AnalysisProtocolCollectionGenTest locks in the fix from issue MSGFPlus#72. If the mzid Enzyme/@missedCleavages attribute ever stops reflecting the user's -maxMissedCleavages (including the 0 and -1/"no limit" sentinel values), the test fails. Three cases covered: 2, 0, -1.

No production-code behaviour change for the Q3 path; Q1 is an error message tweak only. 144 tests pass (was 141).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…rser

quantms and other nf-core / SDRF-driven pipelines use ThermoRawFileParser (TRFP) for raw-to-mzML conversion, not MSConvert. The previous inline error message mentioned only MSConvert, which is the wrong tool for a large and growing user base.

- SpecKey.java: per-spectrum hint now covers both paths in one short line: "ThermoRawFileParser centroids Thermo MS2 by default; MSConvert --filter \"peakPicking true 1-\"".
- Troubleshooting.md: restructured the "Skip spectrum since it is not centroided" section into a per-tool fix list (TRFP, MSConvert, OpenMS) so users on any pipeline can map the error to their own conversion step.

No production behaviour change; strings + docs only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The thread-count cap in MSGFPlus.runMSGFPlus and MSGFDB.runMSGFDB previously hardcoded "minimum 250 spectra per thread" (ui/MSGFPlus.java, ui/MSGFDB.java). On many-core hosts running small inputs (e.g. 20 cores, ~1,000 spectra) this capped the search at ~4 threads, surprising users.

Rather than guess a new default, expose the divisor as -minSpectraPerThread (default 250, min 1). Power users can lower it to raise parallelism on small inputs; everyone else gets identical behaviour to before. Wired in both MSGFPlus and the deprecated MSGFDB entry points so behaviour stays consistent. Addresses issue MSGFPlus#52.

Tests: TestMinSpectraPerThread covers default, override, zero-rejection, and MSGFDB registration. mvn -B verify: 145/145 tests pass, 57 skipped.

Docs: Troubleshooting.md and MSGFPlus.md now show the flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
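The arithmetic behind the "20 cores, ~1,000 spectra, ~4 threads" example can be sketched as follows. The class and method names are hypothetical, and the use of integer division with a floor of one thread is an assumption; the exact expression in MSGFPlus.runMSGFPlus is not shown in this PR description.

```java
/** Hypothetical thread-cap helper (illustration only). */
final class ThreadCapSketch {
    /**
     * Caps the requested thread count so that each thread gets at least
     * minSpectraPerThread spectra, but never drops below one thread.
     */
    static int cappedThreads(int requestedThreads, int numSpectra, int minSpectraPerThread) {
        if (minSpectraPerThread < 1) {
            throw new IllegalArgumentException("minSpectraPerThread must be >= 1");
        }
        int allowedBySpectra = Math.max(1, numSpectra / minSpectraPerThread);
        return Math.min(requestedThreads, allowedBySpectra);
    }
}
```

Under these assumptions, `cappedThreads(20, 1000, 250)` gives 4, and lowering the divisor to 50 lets all 20 requested threads run.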

Adds a lightweight static logger at edu.ucsd.msjava.misc.MSGFLogger with info/debug/warn/error levels. Debug is gated on the existing -verbose 0/1 flag; warn/error go to stderr with [Warning]/[Error] prefixes. No external dependencies (no slf4j/log4j) to keep the jar small.

Wires MSGFPlus.main() to call MSGFLogger.setVerbose(...) once after parseParams, so the whole run inherits the CLI setting. Migrates the top-level main() and the runMSGFPlus(ParamManager) dispatch loop:

- Error paths: System.err.println("[Error] ...") -> MSGFLogger.error(...)
- "Processing N spectra" (summary) -> info
- Per-file enumeration -> debug
- Per-file "Processing"/"Ignoring" banner -> info
- Per-file "Writing results to"/"Output... exists" detail -> debug
- "MS-GF+ complete" footer -> info
- Decoy-ratio mismatch errors -> MSGFLogger.error

Default behaviour (-verbose 0) is unchanged for all non-debug messages. Running with -verbose 1 now exposes the per-file enumeration and the per-file output/ignore details.

Intentionally narrow scope: the other ~260 System.out.println call sites across DBScanner, ConcurrentMSGFPlus, BuildSA, etc. are unchanged. This PR establishes the logger and wiring; case-by-case migration of those sites can follow as they are touched.

Tests: TestMSGFLogger (7 tests) covers info-always, debug-gating, warn/error stderr routing, format interpolation, and the isVerbose getter. mvn -B verify: 152/152 tests pass, 57 skipped (same as before).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
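A dependency-free logger with the behaviour described above might look roughly like this. It is a sketch, not the MSGFLogger source: the class name `VerboseLoggerSketch`, the `setStreams` hook (added here only to make the output testable), and the printf-style formatting are all assumptions.

```java
import java.io.PrintStream;

/** Hypothetical static logger (illustration only, not edu.ucsd.msjava.misc.MSGFLogger). */
final class VerboseLoggerSketch {
    private static volatile boolean verbose = false;
    private static PrintStream out = System.out;
    private static PrintStream err = System.err;

    private VerboseLoggerSketch() { }

    static void setVerbose(boolean value) { verbose = value; }

    static boolean isVerbose() { return verbose; }

    /** Test hook (assumption): redirect output away from the real console. */
    static void setStreams(PrintStream stdout, PrintStream stderr) {
        out = stdout;
        err = stderr;
    }

    /** Always printed to stdout. */
    static void info(String fmt, Object... args) { out.println(format(fmt, args)); }

    /** Printed to stdout only when verbose mode is on. */
    static void debug(String fmt, Object... args) {
        if (verbose) {
            out.println(format(fmt, args));
        }
    }

    /** Always printed to stderr with a fixed prefix. */
    static void warn(String fmt, Object... args) { err.println("[Warning] " + format(fmt, args)); }

    static void error(String fmt, Object... args) { err.println("[Error] " + format(fmt, args)); }

    private static String format(String fmt, Object... args) {
        // Skip String.format when there are no arguments so a literal '%' is safe.
        return args.length == 0 ? fmt : String.format(fmt, args);
    }
}
```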

Writes <output.mzid>.manifest.json next to each mzIdentML output capturing the run context: MS-GF+ version and timestamp; Java version, vendor, OS; max heap and thread count; enzyme / instrument / activation / protocol; precursor tolerance, isotope-error range, length and charge bounds, missed-cleavage cap; spec-file and FASTA-file absolute paths with byte sizes; and the original CLI argv verbatim. Downstream pipelines (quantms, Galaxy-P, custom reanalysis scripts) can then verify or reproduce a search without re-parsing logs.

Called from MSGFPlus.runMSGFPlus after each successful per-file search. Failures to write are MSGFLogger.warn()-logged and never abort the search; manifests are advisory metadata, not output. JSON is hand-rolled (stable key order, UTF-8, 2-space indent) so no new dependency is pulled into the shaded jar.

Tests: TestRunManifestWriter covers required identity fields, echoed SearchParams values, argv preservation, null-argv tolerance, and end-to-end sidecar write/read.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
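To illustrate what "hand-rolled JSON with stable key order and 2-space indent" can mean in practice, here is a small sketch. `ManifestJsonSketch`, its methods, and the field names used in the example are hypothetical and limited to flat string values; the actual manifest schema and writer are not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical flat JSON writer (illustration only, not the PR's manifest writer). */
final class ManifestJsonSketch {
    /** Serializes string fields in the map's iteration order with a 2-space indent. */
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{\n");
        int written = 0;
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            sb.append("  \"").append(escape(entry.getKey())).append("\": \"")
              .append(escape(entry.getValue())).append('"');
            if (++written < fields.size()) {
                sb.append(',');
            }
            sb.append('\n');
        }
        return sb.append("}\n").toString();
    }

    /** Escapes the characters JSON requires to be escaped inside a string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Example input: a LinkedHashMap keeps insertion order, giving a stable key order. */
    static Map<String, String> exampleFields() {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("tool", "MS-GF+");            // hypothetical field names
        fields.put("fasta", "C:\\db.fasta");
        return fields;
    }
}
```

Escaping backslashes and quotes matters most for the path and argv fields, since Windows paths and quoted CLI arguments would otherwise produce invalid JSON.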

chore(reliability): actionable centroiding error + missedCleavages test

feat: -minSpectraPerThread flag, MSGFLogger, and run-manifest sidecar (Q9/Q11/Q4)

ypriverol and others added 30 commits April 13, 2026 07:51

Merge pull request #5 from bigbio/feature/159-ms-level-filtering

65c2592

feat(MSGFPlus#159): add -msLevel parameter for MS level filtering

Merge pull request #4 from bigbio/fix/157-mzid-missing-psm-scores

5f0ced1

fix(MSGFPlus#157): preserve PSM scores when DeNovoScore is below threshold

Merge pull request #3 from bigbio/fix/test-infrastructure

4f74816

fix: enable test suite and fix broken build dependencies

Merge pull request #8 from bigbio/feature/ci-readme-cleanup

3909fe8

Add parameter docs to README and CI/CD workflow

Merge pull request #7 from bigbio/refactor/dead-code-removal

e955869

Remove dead code: 150 unused classes, -22K lines

Update src/main/java/edu/ucsd/msjava/mzml/StaxMzMLParser.java

55daec4

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

fix: update TestParsers after dead-code removal rebase

3eb2966

Replace ConvertToMgf-based tests (class removed in PR #7) with StaxMzMLParser and SpectraAccessor mzML parsing tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge pull request #9 from bigbio/feature/native-tsv-output

1f00a2b

feat: native TSV output — bypass mzIdentML for OpenMS/Percolator pipelines

Merge pull request #6 from bigbio/feature/stax-mzml-reader

d4c1f9c

perf: replace jmzml JAXB parser with StAX-based mzML reader

chore: align benchmark naming and mzXML messaging

b3f2e98

Use uppercase PXD001819 naming in workflow-visible labels/artifacts and update README to state mzXML is not available in this fork.

Made-with: Cursor

docs: drop PXD001819 plan file; point READMEs at CI docs

ea0de94

Made-with: Cursor

Merge pull request #13 from bigbio/claude/review-msgfplus-pr-12-YfoTI

3c47109

fix(benchmark): harden PXD001819 scaffold per review feedback

Merge pull request #12 from bigbio/benchmark

032d088

benchmark

ypriverol and others added 14 commits April 16, 2026 23:21

Merge pull request #15 from bigbio/feat/primitives-optimization

e597230

perf(msgf): CSR graph + streaming GF merge + drop Hashtable sync

Merge pull request #16 from bigbio/perf/precompute-log-scores

9cdae16

perf(scorer): precompute log scores in NewRankScorer

Merge pull request #17 from bigbio/chore/reliability-quick-wins

fb76029

chore(reliability): actionable centroiding error + missedCleavages test

Merge pull request #18 from bigbio/feat/logger-and-run-manifest

1bd9ff2

feat: -minSpectraPerThread flag, MSGFLogger, and run-manifest sidecar (Q9/Q11/Q4)

ypriverol wants to merge 45 commits into master from dev

ypriverol commented Apr 16, 2026

coderabbitai bot commented Apr 16, 2026 (edited)

Review skipped

3 participants
