FASTA preflight QC for modern bioinformatics pipelines.
FastaGuard checks assembly FASTA files before QUAST, BUSCO, BlobToolKit, CheckM, annotation, or other expensive downstream steps. It validates structure, flags obvious FASTA-level problems, and writes stable reports for humans, workflow engines, and future tool agents.
Run it first when you need to know:
- is this FASTA file structurally valid?
- are identifiers, records, and sequence characters sane?
- are duplicate IDs, high-N content, gap runs, tiny contigs, or GC/length anomalies worth attention?
- can a workflow make a PASS/WARN/FAIL decision from machine-readable output?
FastaGuard is not a replacement for QUAST, BUSCO, BlobToolKit, CheckM, FastQC, seqkit, or MultiQC. It is the earlier preflight and triage layer.
Before QUAST. Before BUSCO. Before BlobToolKit. Before annotation.
Run FastaGuard first.
| Channel | Status |
|---|---|
| GitHub release | v0.3.0 is live with Linux and macOS binaries |
| Bioconda | v0.3.0 is live for Linux and macOS x86_64/ARM64 |
| BioContainers | v0.3.0 is live as a pinned workflow image |
| Source build | v0.3.0 can be built from the Git tag |
Recommended bioinformatics install:

    mamba install -c conda-forge -c bioconda fastaguard=0.3.0

Containerized workflow install:

    docker pull quay.io/biocontainers/fastaguard:0.3.0--hfa8f182_0

Run through BioContainers:

    docker run --rm quay.io/biocontainers/fastaguard:0.3.0--hfa8f182_0 fastaguard --version

GitHub release binary for Linux x86_64:

    curl -L -O https://github.com/ehsanestaji/FastaGuard/releases/download/v0.3.0/fastaguard-v0.3.0-x86_64-unknown-linux-gnu.tar.gz
    tar -xzf fastaguard-v0.3.0-x86_64-unknown-linux-gnu.tar.gz
    ./fastaguard-v0.3.0-x86_64-unknown-linux-gnu/fastaguard --version

GitHub release binary for macOS Apple Silicon:

    curl -L -O https://github.com/ehsanestaji/FastaGuard/releases/download/v0.3.0/fastaguard-v0.3.0-aarch64-apple-darwin.tar.gz
    tar -xzf fastaguard-v0.3.0-aarch64-apple-darwin.tar.gz
    ./fastaguard-v0.3.0-aarch64-apple-darwin/fastaguard --version

Build from the released Git tag:

    cargo install --git https://github.com/ehsanestaji/FastaGuard --tag v0.3.0
    fastaguard --version

Verify any installed CLI:

    fastaguard --version
    fastaguard --schema

Local development build:

    cargo build --release --locked

The `--gate pipeline` examples below require FastaGuard v0.3.0 or newer.
Run the assembly preflight check:

    fastaguard sample.fa \
      --profile assembly \
      --out fastaguard_report.html \
      --json fastaguard.json \
      --tsv fastaguard.tsv \
      --multiqc fastaguard_mqc.json

Pipeline gate example:

    fastaguard sample.fa --profile assembly --gate pipeline

The pipeline gate is the v0.3 assembly preset for workflow stop/go decisions. It fails on duplicate IDs, invalid characters, invalid FASTA structure, and high-N content. GC and length outliers remain advisory by default because they are routing signals, not proof of contamination or misassembly. To make an advisory finding block a pipeline, add it explicitly with `--fail-on`.
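A workflow step can also make the stop/go decision from the JSON report instead of the exit code. The following is a minimal sketch, not the project's own client code. It assumes the file written by `--json` has a top-level `gate` object whose `blocking_findings` value is a list; the field name comes from the v0.3 gate contract, but the list shape and the helper names `blocking_findings` and `should_stop` are assumptions, so confirm the layout with `fastaguard --schema`.

```python
import json
from pathlib import Path


def blocking_findings(report: dict) -> list:
    """Return the gate's blocking findings, or an empty list if absent.

    Assumption: report["gate"]["blocking_findings"] is a list.
    Missing or null fields are treated as "nothing blocks".
    """
    gate = report.get("gate") or {}
    findings = gate.get("blocking_findings") or []
    return list(findings)


def should_stop(report_path: str) -> bool:
    """True when the JSON report lists at least one blocking finding."""
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    return len(blocking_findings(report)) > 0
```

For example, `should_stop("fastaguard.json")` would return `True` when the gate reports any blocking finding, under the assumptions above.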
Inspect the machine-readable contract:

    fastaguard --schema
    fastaguard --finding-catalog
    fastaguard --explain-finding high_n_rate

Build and run the local Docker image:

    docker build -t fastaguard:local .
    docker run --rm -v "$PWD:/data" fastaguard:local /data/sample.fa \
      --profile assembly \
      --out /data/fastaguard_report.html \
      --json /data/fastaguard.json \
      --tsv /data/fastaguard.tsv \
      --multiqc /data/fastaguard_mqc.json

BioContainers publishes the v0.3 image for workflow engines:

    docker pull quay.io/biocontainers/fastaguard:0.3.0--hfa8f182_0

Exit codes:

    0 = pass
    1 = warnings above configured threshold
    2 = hard QC failure
    3 = invalid input / tool error
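The exit codes above can drive a workflow decision directly. The sketch below shows one way a Python wrapper might do that. The `decide` policy (warnings continue unless `warnings_block` is set, anything else stops) and the helper names are illustrative assumptions, not behavior defined by FastaGuard; only the exit-code meanings and the command-line flags are taken from this README.

```python
import subprocess

# Exit-code meanings as documented in this README.
EXIT_MEANINGS = {
    0: "pass",
    1: "warnings above configured threshold",
    2: "hard QC failure",
    3: "invalid input / tool error",
}


def decide(exit_code: int, warnings_block: bool = False) -> str:
    """Map an exit code to "continue" or "stop" (hypothetical policy)."""
    if exit_code == 0:
        return "continue"
    if exit_code == 1:
        return "stop" if warnings_block else "continue"
    # 2, 3, and any undocumented code are treated as blocking.
    return "stop"


def run_gate(fasta: str) -> str:
    """Run the pipeline gate on one FASTA file and return the decision."""
    result = subprocess.run(
        ["fastaguard", fasta, "--profile", "assembly", "--gate", "pipeline"],
        check=False,  # non-zero codes are expected outcomes, not exceptions
    )
    return decide(result.returncode)
```

Passing `check=False` matters here: a non-zero exit is a QC verdict to interpret, not a crash to raise on.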
FASTA files are everywhere, but FASTA QC is fragmented across ad hoc scripts, seqkit stats, assembly QC tools, completeness tools, contamination workflows, and pipeline-specific checks. Each is useful, but none is the simple default first command for:
Is this FASTA file valid, sane, interpretable, and ready for downstream tools?
FastaGuard fills that gap:
FastaGuard is a fast, explainable FASTA QC tool that validates assembly FASTA files, detects structural and composition red flags, and produces pipeline-ready reports before expensive downstream analysis.
FastaGuard is assembly-first.

    fastaguard sample.fa \
      --profile assembly \
      --gate pipeline \
      --out fastaguard_report.html \
      --json fastaguard.json \
      --tsv fastaguard.tsv \
      --multiqc fastaguard_mqc.json

The MVP focuses on:
- FASTA validity
- invalid FASTA structure reports with explainable FAIL verdicts
- duplicate IDs
- duplicate sequences
- invalid nucleotide/IUPAC characters
- empty records
- core assembly stats
- N50, N90, L50, L90
- GC, AT, N, and ambiguity rates
- high-N scaffolds
- gap runs
- suspicious tiny contigs
- explainable PASS / WARN / FAIL verdicts
- machine-readable summaries, actions, scope, and provenance
- stable JSON, TSV, HTML, and MultiQC-compatible outputs
- length histogram and GC-vs-length plot data in JSON and HTML
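For readers unfamiliar with the contiguity and composition statistics in the list above, here is a small reference sketch using the standard definitions. It is not FastaGuard's implementation, and two details are assumptions: rates are computed as fractions of total sequence length (some tools divide GC by A+C+G+T only), and the function names are illustrative.

```python
def nx_lx(lengths, percent):
    """Return (Nx, Lx) for contig lengths, e.g. percent=50 for N50/L50.

    Nx is the length of the shortest contig in the smallest set of longest
    contigs that together cover at least `percent` of the total length.
    Lx is the number of contigs in that set.
    """
    total = sum(lengths)
    if total == 0:
        return 0, 0
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length, count
    return 0, 0


def composition(seq):
    """Return GC, AT, and N rates as fractions of sequence length."""
    seq = seq.upper()
    size = len(seq)
    if size == 0:
        return {"gc": 0.0, "at": 0.0, "n": 0.0}
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return {"gc": gc / size, "at": at / size, "n": seq.count("N") / size}
```

With contig lengths 100, 200, 300, and 400 (total 1000), the two longest contigs reach 700 bases, which passes the 500-base halfway mark, so N50 is 300 and L50 is 2.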
v0.2 expands the assembly preflight layer with:
- composition outliers
- richer provenance, taxonomy context, and routing hints
- hardened MultiQC and pipeline adoption material
v0.3 adds the assembly gate contract:
- `--gate pipeline` for default workflow blocking behavior
- `gate.blocking_findings` for machine stop/go decisions
- checksum provenance with `provenance.input_sha256`
- explicit advisory findings for evidence that should route follow-up QC rather than stop a pipeline by default
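Checksum provenance lets a downstream step confirm that a report describes the file it is about to process. The sketch below illustrates the idea under stated assumptions: `provenance.input_sha256` is taken to be the hex SHA-256 digest of the raw input file bytes, stored as a string in the JSON report. The helper names are hypothetical, and the exact encoding should be checked against `fastaguard --schema`.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large assemblies fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(report: dict, fasta_path: str) -> bool:
    """Compare the report's recorded checksum with the file on disk.

    Assumption: report["provenance"]["input_sha256"] is a hex string.
    A missing or non-string value is treated as a mismatch.
    """
    recorded = (report.get("provenance") or {}).get("input_sha256")
    if not isinstance(recorded, str):
        return False
    return recorded.lower() == sha256_of_file(fasta_path)
```

A mismatch usually means the FASTA was modified, recompressed, or swapped after the report was written, so the safe response is to rerun the preflight check.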
FastaGuard should recommend deeper tools when they are appropriate:
- QUAST for assembly quality evaluation
- BUSCO for biological completeness
- BlobToolKit for contamination and cobiont exploration
- CheckM for microbial genome completeness and contamination
- seqkit for ad hoc sequence operations
The strategic wedge is earlier:
FastaGuard catches FASTA-level assembly problems before expensive assembly QC.
- Example reports
- Product thesis
- Vision plan
- MVP spec
- Output contract
- Tool landscape
- Adoption plan
- LLM and tooling vision
- Benchmarking
- v0.2 evidence pack
- v0.3 evidence workflow
- Packaging
- v0.3.0 release notes
- v0.2.0 release notes
- v0.1.1 release notes
- v0.1.0 release notes
- Roadmap
- First-release design
v0.3.0 is published on GitHub with Linux and macOS release binaries. It adds the assembly gate contract, checksum provenance, and evidence workflow.
Bioconda serves v0.3.0 for `linux-64`, `linux-aarch64`, `osx-64`, and `osx-arm64`. BioContainers publishes the pinned v0.3 workflow image `quay.io/biocontainers/fastaguard:0.3.0--hfa8f182_0`.