Conversation
…Java 17
- Change skipTests from true to false so mvn test actually runs
- Update maven-compiler-plugin source/target from 1.8 to 17 (matches runtime)
- Add missing compile dependencies: jmzml 1.7.11, fastutil 8.5.12, slf4j-api 1.7.36, logback-classic 1.2.12, commons-io 2.15.1 (master code references these classes but they were not declared)
- Mark the TestMzML test that requires Windows-specific DMS files with @Ignore
Result: 120 tests run, 53 active, 67 skipped, 0 failures
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…shold
In MZIdentMLGen.addSpectrumIdentificationResults(), change `break` to `continue` when a match has DeNovoScore below the minimum threshold. The `break` was incorrectly stopping emission of all subsequent matches for that spectrum, silently dropping valid PSMs from the mzid output.
Also add a null-safety check for the spectrum index lookup: if a spectrum index is not found in the spectrum file, log a warning and skip it instead of throwing a NullPointerException.
Add TestMZIdentMLGen with two integration tests:
- testMzidScoreCompleteness: runs an MSGF+ search and verifies every SII has all 4 score CVParams (RawScore, DeNovoScore, SpecEValue, EValue)
- testMzidStructuralValidity: verifies the output mzid has the required mzIdentML structure elements
Closes MSGFPlus#157
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
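The control-flow difference behind this fix can be sketched as follows. This is a minimal illustration, not the actual MZIdentMLGen code: the `Match` record, the `emitMatches` method, and the threshold value are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class BreakVsContinueSketch {
    // Hypothetical stand-in for a database match; not an MS-GF+ class.
    record Match(String peptide, int deNovoScore) {}

    // Emit every match whose DeNovoScore meets the threshold.
    // With `break` in place of `continue`, the first low-scoring match
    // would end the loop and hide all later matches for the spectrum.
    static List<String> emitMatches(List<Match> matches, int minDeNovoScore) {
        List<String> emitted = new ArrayList<>();
        for (Match m : matches) {
            if (m.deNovoScore() < minDeNovoScore) {
                continue; // skip only this match, keep scanning the rest
            }
            emitted.add(m.peptide());
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Match> matches = List.of(
                new Match("PEPTIDEA", 40),
                new Match("PEPTIDEB", -5),
                new Match("PEPTIDEC", 35));
        System.out.println(emitMatches(matches, 0));
    }
}
```

In this toy input the third match survives only because the loop continues past the second one.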
Add new -msLevel CLI parameter to filter spectra by MS level. Accepts a single value (e.g., -msLevel 2) or a comma-separated range (e.g., -msLevel 2,3). Default is 2 (MS2 only).
Changes:
- ParamManager: add MS_LEVEL enum and registration
- IntRangeParameter: enable single-value parsing, fix typo
- SearchParams: add minMSLevel/maxMSLevel fields
- SpecKey: filter spectra by MS level in getSpecKeyList()
- SpectraAccessor: add setMSLevelRange(), wire to parsers
- MzMLAdapter/MzXMLSpectraMap: fix maxMSLevel to be inclusive
- MSGFPlus/MSGFDB/MSGFDBLib: wire MS level parameters
- pom.xml: remove fastutil shade filter (jmzml 1.7.11 needs full fastutil)
Tests: TestIntRangeParameter (9 tests), TestMSLevelFiltering (6 tests)
Benchmark (TMT 1.1GB, TDA):
- Baseline: 1245s, 6654 PSMs@1%FDR
- -msLevel 2: 957s (-23%), 6936 PSMs@1%FDR (+4.2%)
Closes MSGFPlus#159
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
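The single-value and range forms of -msLevel, with an inclusive upper bound, could be parsed roughly as below. This is a sketch under assumptions: `MsLevelRangeSketch` and its methods are hypothetical and do not reproduce the real IntRangeParameter API.

```java
final class MsLevelRangeSketch {
    final int min; // inclusive lower bound
    final int max; // inclusive upper bound

    MsLevelRangeSketch(int min, int max) {
        if (min < 1 || max < min) {
            throw new IllegalArgumentException("invalid MS level range: " + min + "," + max);
        }
        this.min = min;
        this.max = max;
    }

    // Accepts "2" (single level) or "2,3" (inclusive range).
    static MsLevelRangeSketch parse(String value) {
        String[] parts = value.trim().split(",");
        if (parts.length == 1) {
            int level = Integer.parseInt(parts[0].trim());
            return new MsLevelRangeSketch(level, level);
        }
        if (parts.length == 2) {
            return new MsLevelRangeSketch(
                    Integer.parseInt(parts[0].trim()),
                    Integer.parseInt(parts[1].trim()));
        }
        throw new IllegalArgumentException("expected N or N,M but got: " + value);
    }

    // Both ends are inclusive, matching the maxMSLevel fix described above.
    boolean accepts(int msLevel) {
        return msLevel >= min && msLevel <= max;
    }

    public static void main(String[] args) {
        MsLevelRangeSketch range = parse("2,3");
        System.out.println(range.accepts(1) + " " + range.accepts(3));
    }
}
```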
feat(MSGFPlus#159): add -msLevel parameter for MS level filtering
fix(MSGFPlus#157): preserve PSM scores when DeNovoScore is below threshold
fix: enable test suite and fix broken build dependencies
Remove standalone scripts, legacy tools, and unused classes that are not referenced by the core MSGF+ search pipeline, reducing codebase by ~22,000 lines.
Deleted entire packages:
- ims/ (9 files): legacy IMS utilities
- ipa/ (5 files): unused isotope pattern analysis
- msgf2d/ (8 files): abandoned 2D scoring experiment
- msdictionary/ (7 files): unused genome dictionary tool
- mstag/ (3 files): unused sequence tagging
- scripts/ (6 files): standalone CLI utilities
- msutil/test/ (3 files): misplaced test classes
- msgf/test/ (2 files): legacy test stubs
- msgf/analysis/ (1 file): unused ROC generator
Cleaned mixed packages:
- misc/: removed 59 standalone scripts, kept 5 core utilities
- msgf/: removed 6 unused graph/scoring classes
- msutil/: removed 9 unused filter/annotation classes
- msdbsearch/: removed 4 standalone DB tools
- parser/: removed 9 legacy format parsers (InsPecT, Mascot, etc.)
- ui/: removed 6 legacy entry points (MSGF, MSGFLib, etc.)
- mzid/: removed 1 unused adapter stub
- msscorer/: removed 1 unused stats class
- suffixarray/: removed 1 unused mass array class
Also removed dead test methods and cleaned dangling imports.
Tests: 119 run, 0 failures.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrite README.md:
- Full parameter reference tables covering all 30+ flags organized by category (core search, fragmentation, enzyme, filtering, etc.)
- Quick start examples for basic and TMT searches
- Modification file format documentation with examples
- Build-from-source instructions
- Updated requirements to Java 17+
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add parameter docs to README and CI/CD workflow
Remove dead code: 150 unused classes, -22K lines
Write search results directly to TSV from in-memory objects, bypassing mzIdentML serialization. Output is column-identical to MzIDToTsv (verified by diff on test.mgf search). This avoids generating large .mzid files when only TSV is needed downstream (e.g. OpenMS MSGFPlusAdapter, Percolator).
- New DirectTSVWriter class with same score/protein/mod logic as MZIdentMLGen but streaming tab-delimited output
- New -outputFormat parameter: 0=mzid (default), 1=tsv, 2=both
- Includes fixed + variable mods, MGF Title column, decoy filtering
- Backwards compatible: default remains mzid
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When -addFeatures 1 is used with -outputFormat tsv, the TSV now includes all PSMFeatureFinder columns needed for Percolator: ExplainedIonCurrentRatio, NTermIonCurrentRatio, CTermIonCurrentRatio, MS2IonCurrent, MS1IonCurrent, IsolationWindowEfficiency, NumMatchedMainIons, and all error statistics (MeanError/StdevError for All and Top7, both absolute and relative). These features were previously only available as UserParams in mzid and were not extracted by OpenMS's addMSGFFeatures() — now they are directly accessible as TSV columns. The peptide modification format (M+15.995) is already compatible with OpenMS MSGFPlusAdapter's modifySequence_() converter which transforms it to bracket notation M[+15.995] for AASequence. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the jmzml JAXB-based MzMLUnmarshaller with a lightweight StAX streaming parser that extracts only the 11 fields MSGF+ needs. The new parser builds a spectrum index in a single pass, then preloads all spectra into memory on first random access, eliminating repeated XML parsing during the search phase.
Benchmark (TMT 1.1GB mzML, target-decoy, 4 threads):
- Wall time: 957s -> 853s (-10.9%)
- PSMs at 1% FDR: 6,936 (unchanged)
- Score completeness: 100% (unchanged)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
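A single forward pass with the JDK's StAX API, in the spirit of the parser described above, might look like this. It is only a sketch: `StaxSpectrumScanSketch` is a hypothetical class, it reads just the spectrum `id` and the "ms level" cvParam (PSI-MS accession MS:1000511), and it does not cover the indexing, preloading, or the other fields the real StaxMzMLParser handles.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxSpectrumScanSketch {
    // Returns spectrum id -> MS level, collected in one streaming pass.
    static Map<String, Integer> readMsLevels(String xml) {
        Map<String, Integer> levels = new LinkedHashMap<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            String currentId = null; // id of the <spectrum> we are inside, if any
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("spectrum".equals(name)) {
                        currentId = reader.getAttributeValue(null, "id");
                    } else if ("cvParam".equals(name) && currentId != null
                            && "MS:1000511".equals(reader.getAttributeValue(null, "accession"))) {
                        levels.put(currentId,
                                Integer.parseInt(reader.getAttributeValue(null, "value")));
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "spectrum".equals(reader.getLocalName())) {
                    currentId = null;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed XML", e);
        }
        return levels;
    }

    public static void main(String[] args) {
        String xml = "<mzML><run><spectrumList>"
                + "<spectrum id=\"scan=1\"><cvParam accession=\"MS:1000511\" value=\"1\"/></spectrum>"
                + "<spectrum id=\"scan=2\"><cvParam accession=\"MS:1000511\" value=\"2\"/></spectrum>"
                + "</spectrumList></run></mzML>";
        System.out.println(readMsLevels(xml));
    }
}
```

Unlike a JAXB unmarshaller, the loop never materializes an object tree for elements it does not care about, which is where the memory and time savings of a streaming reader come from.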
…port
- Remove jmzml (JAXB-based mzML parser) dependency from pom.xml
- Delete old jmzml-dependent classes: MzMLAdapter, MzMLSpectraMap, MzMLSpectraIterator, SpectrumConverter
- Add referenceableParamGroupRef resolution to StaxMzMLParser: builds a map of param groups during index pass, resolves refs during spectrum parsing (critical for files that define polarity, MS level, etc. in referenceable groups)
- Move turnOffLogs() utility to StaxMzMLParser, update all callers
- Keep fastutil dependency (needed by jmzidml at runtime)
JAR size reduced from 39.5MB to 38MB.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jmzReader (uk.ac.ebi.pride.tools:jmzreader:2.0.6) had zero imports anywhere in the codebase — a dead dependency from earlier development. All spectrum file format parsing uses custom implementations: mzML (StaxMzMLParser), mzXML (embedded jrap/stax), MGF/MS2/PKL (custom parsers). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove mzXML file format support entirely:
- Delete embedded jrap/stax library (20 files, ~5,800 lines)
- Delete MzXMLSpectraMap, MzXMLSpectraIterator, MzXMLToMgfConverter
- Delete MzXMLToMgf utility and mzXML test resources (38MB)
- Remove MZXML from SpecFileFormat enum, SpectraAccessor, ParamManager
- Update misc/scripts/ui classes to remove mzXML code paths
mzXML is a legacy format superseded by mzML. Users with mzXML files can convert to mzML using msconvert (ProteoWizard).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- StaxMzMLParser: use ConcurrentHashMap for thread-safe spectrum cache, fix class-level doc (preload-all, not bounded LRU), check index before preloading, propagate exceptions instead of returning null
- StaxMzMLSpectraIterator: throw NoSuchElementException when exhausted
- SpectraAccessor: throw exception instead of System.exit(-1), validate specFormat is non-null in constructor
- SelectSpectra: update stale .mzXML reference to .mzML
- pom.xml: fix duplicate <manifest>, remove stale comments, note fastutil is required by jmzidentml at runtime
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace ConvertToMgf-based tests (class removed in PR #7) with StaxMzMLParser and SpectraAccessor mzML parsing tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: native TSV output — bypass mzIdentML for OpenMS/Percolator pipelines
perf: replace jmzml JAXB parser with StAX-based mzML reader
* chore: add CI/release packaging and benchmark scaffolding
Split infra and repository maintenance updates into a dedicated reviewable change set, including workflow automation, Docker packaging, benchmark scripts/docs, and project documentation updates. Exclude large local benchmark artifacts and keep this PR focused on non-hot-path code organization and release hygiene.
Made-with: Cursor
* chore: keep benchmark folder local-only
Remove benchmark scripts/docs from this branch and ignore the entire benchmark directory so local benchmarking assets do not appear in review PRs.
Made-with: Cursor
* docs: keep single canonical primitives plan
Fold memory-reduction guidance into the balanced primitives plan and remove the old duplicate plan file so review and maintenance use one canonical document.
Made-with: Cursor
* chore: narrow PR1 plans to scope-only docs
Remove unrelated strategy and optimization plan documents from PR1 so this branch stays focused on infra/packaging cleanup. Keep only the plans index file in this PR.
Made-with: Cursor
* chore: remove legacy ZippedReleases folder
Delete the obsolete Windows release helper scripts and reference files under ZippedReleases from the repository.
Made-with: Cursor
* chore: remove legacy extlib dependency jar
Delete the obsolete jrap/stax legacy jar under extlib as part of repository cleanup.
Made-with: Cursor
* fix: address copilot review feedback for PR11
Align docs with actual supported legacy formats, update release pipeline to build from tag version with tests, and fix Docker build JDK requirement.
Made-with: Cursor
* chore: minor packaging/docs hygiene for PR1
Normalize ignore files, shrink Docker build context, align agent README with dev/CI, and clarify release workflow step naming.
Made-with: Cursor
* docs: trim examples folder to small referenced artifacts
Remove duplicate Tryp_Pig_Bov DB/index copies (tests use src/test/resources), drop large unlinked Excel/PNG teaching files, and add docs/examples/README.md so the directory purpose is obvious. Link the index from the main README.
Made-with: Cursor
* chore: remove IntelliJ IDEA tips screenshots from docs
Made-with: Cursor
* docs: replace legacy HTML manuals with Markdown
Convert docs/*.html to GitHub-flavored Markdown (pandoc), fix internal links, add docs/README.md as the documentation index, and remove unused style.css.
Made-with: Cursor
* docs: strip leftover HTML span wrappers from converted Markdown
Made-with: Cursor
Add a workflow-dispatch benchmark pipeline on a fixed self-hosted runner profile, with public-data download, metrics emission, and baseline TSV comparison under benchmark/ci/PXD001819 for future dataset expansion. Made-with: Cursor
Use uppercase PXD001819 naming in workflow-visible labels/artifacts and update README to state mzXML is not available in this fork. Made-with: Cursor
Made-with: Cursor
- run_ci.sh: count only opening <SpectrumIdentificationItem> tags for sii_count (prior substring match double-counted closing tags and picked up SpectrumIdentificationItemRef)
- run_ci.sh: always emit peak_rss_kb and cpu_percent (NA when GNU time does not expose them) so metrics file format is consistent
- compare_metrics.py: support an `optional` column; optional missing/NA metrics warn instead of failing CI
- baseline.tsv: add optional column, mark peak_rss_kb optional, fix ubuntu-latest note to reference the self-hosted runner, widen sii_count floor to match the de-duplicated count
- README pointers: update stale references to a non-existent benchmark/run_pxd001819_benchmark.sh script
- benchmark/README.md: describe the actual committed CI scaffold instead of an uncommitted local harness layout
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- extract_metrics.py: stream-parse mzIdentML with ElementTree.iterparse so SII counting and PSM 1% FDR counting no longer rely on line-shaped regex matches over XML
- run_ci.sh: use a bash array for SEARCH_ARGS (safe against future flags with spaces), atomic .part downloads, validate cached gzip, default MSGFPLUS_THREADS to 8 to match the workflow, drop the always-zero java_exit metric, and emit integer wall_time_sec
- workflow: pin Python via actions/setup-python@v5 so self-hosted runners have a known 3.11 interpreter for the helper scripts
- compare_metrics.py: add test_compare_metrics.py covering in-range pass, out-of-range fail, missing required/optional, NA, non-numeric, and empty-range rows (7 tests, all passing)
- .gitignore: drop redundant benchmark/** patterns (already covered by benchmark/* + ci/ allowlist); add __pycache__/ and *.pyc
- docs: describe new helper and test scripts in both READMEs
fix(benchmark): harden PXD001819 scaffold per review feedback
benchmark
Important: Review skipped. Too many files! This PR contains 282 files, which is 132 over the limit of 150.
…ives-gf Replaces FlexAminoAcidGraph + GeneratingFunction with CSR-based PrimitiveAminoAcidGraph + flat-array PrimitiveGeneratingFunction in DBScanner.computeSpecEValue hot path. Ported without the experimental remainingUses / eager ScoreDist release (proven 5.9% CPU regression for 3.3% RSS gain). Legacy FlexAminoAcidGraph and GeneratingFunction remain for other callers. Parity verified by new TestPrimitiveRegression (ported from feat/primitives-gf) and existing test suite. Phase 1 of 3. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
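For readers unfamiliar with the layout, a compressed sparse row (CSR) graph stores all edges in two flat int arrays instead of per-node objects. The sketch below shows the generic technique only; `CsrGraphSketch` is a hypothetical class and says nothing about the fields or scoring data the real PrimitiveAminoAcidGraph carries.

```java
import java.util.Arrays;

final class CsrGraphSketch {
    final int[] offsets; // offsets[v]..offsets[v + 1] delimit v's outgoing edges
    final int[] targets; // edge targets, grouped by source node

    private CsrGraphSketch(int[] offsets, int[] targets) {
        this.offsets = offsets;
        this.targets = targets;
    }

    // Build from {source, target} pairs with a counting sort on the source.
    static CsrGraphSketch fromEdges(int numNodes, int[][] edges) {
        int[] offsets = new int[numNodes + 1];
        for (int[] e : edges) {
            offsets[e[0] + 1]++; // count out-degree per source
        }
        for (int v = 0; v < numNodes; v++) {
            offsets[v + 1] += offsets[v]; // prefix sum turns counts into start positions
        }
        int[] targets = new int[edges.length];
        int[] cursor = Arrays.copyOf(offsets, numNodes); // next free slot per source
        for (int[] e : edges) {
            targets[cursor[e[0]]++] = e[1];
        }
        return new CsrGraphSketch(offsets, targets);
    }

    int outDegree(int v) {
        return offsets[v + 1] - offsets[v];
    }

    int[] successors(int v) {
        return Arrays.copyOfRange(targets, offsets[v], offsets[v + 1]);
    }

    public static void main(String[] args) {
        CsrGraphSketch g = fromEdges(3, new int[][] {{0, 1}, {0, 2}, {2, 1}});
        System.out.println(Arrays.toString(g.successors(0)));
    }
}
```

Traversal over such arrays is sequential in memory, which is the usual motivation for CSR in a hot path.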
Rewrites PrimitiveGeneratingFunctionGroup as a running merger. Each per-mass-index GF is computed, merged into the aggregate via addProbDist, and released before the next mass index is built, so peak memory stays at one graph + one GF regardless of tolerance-window width. Math is identical because addProbDist with scoreDiff=0 and aaProb=1f is a linear sum; the running aggregate transparently widens its bounds if a later GF has a wider score range. This addresses the Phase-1 Astral regression (127% slower, +7.3% RSS) caused by concurrent accumulation of distByNode across all mass indices. TMT parity preserved. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
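The running-merge idea (add each per-mass-index distribution into one aggregate, widening the aggregate's score bounds when needed, then drop the input) can be sketched as below. `RunningDistSketch` is a hypothetical class using plain double arrays; it mirrors only the linear-sum case (scoreDiff=0, aaProb=1) mentioned above, not the real addProbDist signature.

```java
final class RunningDistSketch {
    private int minScore;  // score represented by prob[0]
    private double[] prob; // prob[i] is the mass at score minScore + i; null until first add

    // Add another distribution covering scores otherMin .. otherMin + other.length - 1.
    void add(int otherMin, double[] other) {
        if (prob == null) {
            minScore = otherMin;
            prob = other.clone();
            return;
        }
        int newMin = Math.min(minScore, otherMin);
        int newEnd = Math.max(minScore + prob.length, otherMin + other.length); // exclusive
        if (newMin != minScore || newEnd != minScore + prob.length) {
            // Widen the aggregate so it covers both score ranges.
            double[] widened = new double[newEnd - newMin];
            System.arraycopy(prob, 0, widened, minScore - newMin, prob.length);
            prob = widened;
            minScore = newMin;
        }
        for (int i = 0; i < other.length; i++) {
            prob[otherMin - minScore + i] += other[i];
        }
    }

    double get(int score) {
        int i = score - minScore;
        return (prob == null || i < 0 || i >= prob.length) ? 0.0 : prob[i];
    }

    public static void main(String[] args) {
        RunningDistSketch sum = new RunningDistSketch();
        sum.add(0, new double[] {0.5, 0.5});
        sum.add(-1, new double[] {0.25, 0.25, 0.25, 0.25});
        System.out.println(sum.get(-1) + " " + sum.get(0));
    }
}
```

Because each input array can be garbage-collected as soon as `add` returns, peak memory is bounded by the aggregate plus one input, independent of how many inputs are merged.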
Adds two user-facing pages to close documentation gaps surfaced in the landscape review:
- Troubleshooting.md: centroiding (MSConvert peakPicking filter), XML prolog / encoding errors, FASTA size limits and split-and-merge workaround, -Xmx sizing table, -tasks tuning for OOM, thread-cap behaviour, and the OpenMS TOPPAS adapter workaround.
- IsobaricLabeling.md: fixed-mod recipes for TMT-6/10/11, TMTpro-16/18, iTRAQ-4/8, the correct -protocol flag per label, and a full worked Mods.txt example. Addresses the recurring "does MS-GF+ support TMT-16plex?" question (issue MSGFPlus#82) which is a docs, not a code, gap.
Also wires the two pages into docs/README.md's table of contents.
The .gitignore update excludes a local-only working research note (.claude/investigations/msgfplus_research_report.md) from the repo, consistent with the existing "benchmark harness is local-only" convention.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Astral profiling revealed that synchronized Hashtable contention in the shared scorer tables dominated runtime: Hashtable.get alone took 23% of CPU and the surrounding monitor machinery (TrySpin, SafeFetch, ObjectSynchronizer) added another ~20%, so about 43% of CPU was sync overhead with 4 worker threads serializing through per-table monitors.
NewRankScorer tables (fragOFFTable, insignificantFragOFFTable, rankDistTable, ionErrDistTable, noiseErrDistTable, ionExistenceTable) are populated once in readFromFile and read-only during search, so plain HashMap is safe. NewScorerFactory.scorerTable is a mutable cache with possible concurrent writes before warmup, so it uses ConcurrentHashMap. ScoringParameterGenerator(s) use HashMap too to match the NewRankScorer field types (build-time, not search path).
Benchmarks (baseline = dev, branch = feat/primitives-optimization):
- PXD001819 LFQ: 176.5s -> 109.4s (-38.0%), 2254 -> 2472 MB (+9.7%)
- TMT: 644.3s -> 265.6s (-58.8%), 2872 -> 3125 MB (+8.8%)
- Astral: 2155.9s -> 723.3s (-66.5%), 6775 -> 7627 MB (+12.6%)
PSM@1%FDR and SII counts match exactly on all three datasets. RSS grows modestly because worker threads now actually run in parallel (no more monitor serialization), so 4 concurrent graph/GF states are alive instead of effectively one.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
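The split between a read-only table and a mutable shared cache can be illustrated with the following sketch. `ScorerTablesSketch`, its `load` method, and the toy table contents are hypothetical; the only point is which map type goes where. A plain HashMap is safe for concurrent reads as long as nothing modifies it after it has been safely published (here via a final field), while the cache that may be written from several threads uses ConcurrentHashMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ScorerTablesSketch {
    // Mutable cache shared by worker threads: needs a concurrent map.
    private static final Map<String, ScorerTablesSketch> CACHE = new ConcurrentHashMap<>();

    // Filled once in the constructor and never modified afterwards,
    // so unsynchronized reads from many threads are fine.
    private final Map<Integer, Float> rankTable;

    private ScorerTablesSketch(Map<Integer, Float> loaded) {
        this.rankTable = new HashMap<>(loaded);
    }

    // computeIfAbsent runs the loader at most once per key.
    static ScorerTablesSketch forKey(String key) {
        return CACHE.computeIfAbsent(key, ScorerTablesSketch::load);
    }

    // Hypothetical loader standing in for reading a parameter file.
    private static ScorerTablesSketch load(String key) {
        Map<Integer, Float> table = new HashMap<>();
        for (int rank = 1; rank <= 3; rank++) {
            table.put(rank, 1.0f / rank);
        }
        return new ScorerTablesSketch(table);
    }

    // Lock-free read path used during the search.
    float lookup(int rank) {
        return rankTable.getOrDefault(rank, 0.0f);
    }

    public static void main(String[] args) {
        System.out.println(forKey("HCD").lookup(2));
    }
}
```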
perf(msgf): CSR graph + streaming GF merge + drop Hashtable sync
Native Math.log (libmLog) was 5.46% of CPU in the post-PR#15 Astral profile. The call sites in NewRankScorer.getErrorScore and getNodeScore / getMissingIonScore compute log(x/y) over frequency arrays that are immutable after scorer load; the denominator scale factor min(ionType.getCharge(), numSegments) is also load-time constant. Cache the resulting float values once at the end of readFromInputStream and replace the runtime Math.log calls with direct array indexing.
Scoring results are bit-identical: same expressions, same operand ordering, same float rounding; the only difference is that the cast to float happens once per cell at load instead of per call. Both hot-path methods keep a fallback to the original computation so legacy callers that populate the tables without going through readFromInputStream still work.
Benchmarks on the current machine state (baseline = dev HEAD jar, same run, same thermal state):
- PXD001819 LFQ: 122.7s -> 110.4s (-10.0%), 2410 -> 2292 MB (-4.9%)
- TMT: 295.7s -> 277.9s (-6.0%), 2793 -> 2818 MB (+0.9%)
- Astral: 1002.9s -> 883.5s (-12.0%), 7707 -> 7351 MB (-4.6%)
PSM@1%FDR and SII counts match exactly on all three datasets.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
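The precompute-once pattern, and why it can be bit-identical, is sketched below. `LogTableSketch` and its two frequency arrays are hypothetical simplifications; the real NewRankScorer tables and their scale factors are more involved. The cached value is produced by exactly the same expression as the on-demand one, so the stored float equals what the per-call computation would return.

```java
final class LogTableSketch {
    private final float[] ionFreq;
    private final float[] noiseFreq;
    private final float[] cachedScore; // filled once at load time

    LogTableSketch(float[] ionFreq, float[] noiseFreq) {
        this.ionFreq = ionFreq.clone();
        this.noiseFreq = noiseFreq.clone();
        this.cachedScore = new float[ionFreq.length];
        for (int i = 0; i < ionFreq.length; i++) {
            cachedScore[i] = direct(i); // same expression as the fallback path
        }
    }

    // Original per-call computation, kept as the reference and fallback.
    float direct(int rank) {
        return (float) Math.log(ionFreq[rank] / noiseFreq[rank]);
    }

    // Hot path: a plain array read instead of a native log call.
    float cached(int rank) {
        return cachedScore[rank];
    }

    public static void main(String[] args) {
        LogTableSketch t = new LogTableSketch(
                new float[] {0.8f, 0.4f, 0.1f},
                new float[] {0.2f, 0.3f, 0.4f});
        System.out.println(t.cached(0) == t.direct(0));
    }
}
```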
perf(scorer): precompute log scores in NewRankScorer
Two small quick-wins from the April 2026 landscape review (research
report Q1 and Q3, msgfplus_research_report.md §0.1):
Q1 - "Skip spectrum since it is not centroided" now tells the user how
to fix the input:
- profile spectra: "Re-run MSConvert with --filter \"peakPicking true 1-\"
to centroid the spectra."
- dense centroided: "Pass -allowDenseCentroidedPeaks 1 if the spectrum
is already centroided."
This is the most common onboarding failure mode surfaced in issue MSGFPlus#116
and in user support threads; the previous message only said that a
spectrum was skipped, leaving users to guess why.
Q3 - New AnalysisProtocolCollectionGenTest locks in the fix from issue
MSGFPlus#72. If the mzid Enzyme/@missedCleavages attribute ever stops reflecting
the user's -maxMissedCleavages (including the 0 and -1/"no limit"
sentinel values), the test fails. Three cases covered: 2, 0, -1.
No production-code behaviour change for the Q3 path; Q1 is an error
message tweak only. 144 tests pass (was 141).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rser
quantms and other nf-core / SDRF-driven pipelines use ThermoRawFileParser (TRFP) for raw-to-mzML conversion, not MSConvert. The previous inline error message mentioned only MSConvert, which is the wrong tool for a large and growing user base.
- SpecKey.java: per-spectrum hint now covers both paths in one short line: "ThermoRawFileParser centroids Thermo MS2 by default; MSConvert --filter \"peakPicking true 1-\"".
- Troubleshooting.md: restructured the "Skip spectrum since it is not centroided" section into a per-tool fix list (TRFP, MSConvert, OpenMS) so users on any pipeline can map the error to their own conversion step.
No production behaviour change; strings + docs only.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The thread-count cap in MSGFPlus.runMSGFPlus and MSGFDB.runMSGFDB previously hardcoded "minimum 250 spectra per thread" (ui/MSGFPlus.java, ui/MSGFDB.java). On many-core hosts running small inputs (e.g. 20 cores, ~1,000 spectra) this capped the search at ~4 threads, surprising users. Rather than guess a new default, expose the divisor as -minSpectraPerThread (default 250, min 1). Power users can lower it to raise parallelism on small inputs; everyone else gets identical behaviour to before. Wired in both MSGFPlus and the deprecated MSGFDB entry points so behaviour stays consistent. Addresses issue MSGFPlus#52. Tests: TestMinSpectraPerThread covers default, override, zero-rejection, and MSGFDB registration. mvn -B verify: 145/145 tests pass, 57 skipped. Docs: Troubleshooting.md and MSGFPlus.md now show the flag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
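The arithmetic behind the cap can be made concrete with a small sketch. The formula below (integer division, floor of one thread) is an assumption that reproduces the example in the commit message (20 cores, about 1,000 spectra, divisor 250 gives 4 threads); `ThreadCapSketch` is hypothetical and the exact rounding in MSGFPlus.runMSGFPlus may differ.

```java
final class ThreadCapSketch {
    // Cap the requested thread count so each thread gets at least
    // minSpectraPerThread spectra, but never go below one thread.
    static int effectiveThreads(int requested, int numSpectra, int minSpectraPerThread) {
        if (minSpectraPerThread < 1) {
            throw new IllegalArgumentException("minSpectraPerThread must be >= 1");
        }
        int cap = Math.max(1, numSpectra / minSpectraPerThread);
        return Math.min(requested, cap);
    }

    public static void main(String[] args) {
        System.out.println(effectiveThreads(20, 1000, 250)); // default divisor
        System.out.println(effectiveThreads(20, 1000, 50));  // lowered divisor
    }
}
```

With the default divisor the 20-core host is limited to 4 threads; lowering the divisor to 50 lets all 20 requested threads run.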
Adds a lightweight static logger at edu.ucsd.msjava.misc.MSGFLogger with
info/debug/warn/error levels. Debug is gated on the existing -verbose 0/1
flag; warn/error go to stderr with [Warning]/[Error] prefixes. No external
dependencies (no slf4j/log4j) to keep the jar small.
Wires MSGFPlus.main() to call MSGFLogger.setVerbose(...) once after
parseParams, so the whole run inherits the CLI setting. Migrates the
top-level main() and the runMSGFPlus(ParamManager) dispatch loop:
- Error paths: System.err.println("[Error] ...") -> MSGFLogger.error(...)
- "Processing N spectra" (summary) -> info
- Per-file enumeration -> debug
- Per-file "Processing"/"Ignoring" banner -> info
- Per-file "Writing results to"/"Output... exists" detail -> debug
- "MS-GF+ complete" footer -> info
- Decoy-ratio mismatch errors -> MSGFLogger.error
Default behaviour (-verbose 0) is unchanged for all non-debug messages.
Running with -verbose 1 now exposes the per-file enumeration and the
per-file output/ignore details.
Intentionally narrow scope: the other ~260 System.out.println call sites
across DBScanner, ConcurrentMSGFPlus, BuildSA, etc. are unchanged. This
PR establishes the logger and wiring; case-by-case migration of those
sites can follow as they are touched.
Tests: TestMSGFLogger (7 tests) covers info-always, debug-gating, warn/
error stderr routing, format interpolation, and the isVerbose getter.
mvn -B verify: 152/152 tests pass, 57 skipped (same as before).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
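A dependency-free static logger along the lines described in this commit might be shaped like the sketch below. `LoggerSketch` is a hypothetical stand-in, not the real MSGFLogger: the method signatures and the use of String.format for interpolation are assumptions.

```java
final class LoggerSketch {
    // volatile so a setVerbose call from main is visible to worker threads
    private static volatile boolean verbose = false;

    private LoggerSketch() {}

    static void setVerbose(boolean value) {
        verbose = value;
    }

    static boolean isVerbose() {
        return verbose;
    }

    // Always printed to stdout.
    static void info(String format, Object... args) {
        System.out.println(String.format(format, args));
    }

    // Printed to stdout only when verbose mode is on.
    static void debug(String format, Object... args) {
        if (verbose) {
            System.out.println(String.format(format, args));
        }
    }

    // Warnings and errors go to stderr with a fixed prefix.
    static void warn(String format, Object... args) {
        System.err.println("[Warning] " + String.format(format, args));
    }

    static void error(String format, Object... args) {
        System.err.println("[Error] " + String.format(format, args));
    }

    public static void main(String[] args) {
        debug("not shown while verbose is %s", isVerbose());
        setVerbose(true);
        debug("now shown, verbose is %s", isVerbose());
        warn("example warning for %d files", 2);
    }
}
```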
Writes <output.mzid>.manifest.json next to each mzIdentML output capturing the run context: MS-GF+ version and timestamp; Java version, vendor, OS; max heap and thread count; enzyme / instrument / activation / protocol; precursor tolerance, isotope-error range, length and charge bounds, missed-cleavage cap; spec-file and FASTA-file absolute paths with byte sizes; and the original CLI argv verbatim. Downstream pipelines (quantms, Galaxy-P, custom reanalysis scripts) can then verify or reproduce a search without re-parsing logs. Called from MSGFPlus.runMSGFPlus after each successful per-file search. Failures to write are MSGFLogger.warn()-logged and never abort the search — manifests are advisory metadata, not output. JSON is hand-rolled (stable key order, UTF-8, 2-space indent) so no new dependency is pulled into the shaded jar. Tests: TestRunManifestWriter covers required identity fields, echoed SearchParams values, argv preservation, null-argv tolerance, and end-to-end sidecar write/read. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
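Hand-rolling JSON with a stable key order and 2-space indent needs little more than an insertion-ordered map and string escaping, as the sketch below shows. `ManifestJsonSketch` is hypothetical and flattens every value to a string; the key names in the demo are invented and the real manifest's schema is not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ManifestJsonSketch {
    // Escape the characters JSON requires inside a string literal.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Emit keys in the map's iteration order with a 2-space indent.
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{\n");
        int written = 0;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("  \"").append(escape(e.getKey())).append("\": \"")
              .append(escape(e.getValue())).append('"');
            if (++written < fields.size()) {
                sb.append(',');
            }
            sb.append('\n');
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // preserves insertion order
        fields.put("javaVersion", System.getProperty("java.version"));
        fields.put("argv", "-s \"my file.mzML\" -d db.fasta");
        System.out.print(toJson(fields));
    }
}
```

Using a LinkedHashMap keeps the key order identical from run to run, which makes manifests easy to diff.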
chore(reliability): actionable centroiding error + missedCleavages test
feat: -minSpectraPerThread flag, MSGFLogger, and run-manifest sidecar (Q9/Q11/Q4)
No description provided.