Add pdf_oxide to Parsers, OCR and extraction by yfedoseev · Pull Request #22 · OneOffTech/awesome-pdf

yfedoseev · 2026-05-30T23:23:40Z

Adds pdf_oxide to the Parsers, OCR and extraction section, following the README format and the awesome-lint rules (verified locally with npx awesome-lint; uses "WebAssembly" per the spell-check rule).

pdf_oxide - A fast Rust PDF library for text and image extraction, markdown conversion, and structured extraction, with bindings for Python, Go, JS/TS, .NET, Java, PHP, Ruby, and WebAssembly, plus a CLI and MCP server.

MIT-licensed Rust core. Happy to move it or adjust wording to fit your conventions — thanks for maintaining the list!

avvertix

Hey @yfedoseev, thanks for sharing. I'm suggesting a small change to keep the description short. If you agree with my rewording, just accept the change; otherwise, please provide a shortened description that names the main feature.

avvertix · 2026-06-02T15:19:07Z

 - [CatchTheTornado/pdf-extract-api](https://github.com/CatchTheTornado/pdf-extract-api) - Document (PDF) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSON or Markdown.
 - [climatepolicyradar/navigator-document-parser](https://github.com/climatepolicyradar/navigator-document-parser) - Parsing PDFs and websites containing laws and policies.
 - [Iteration Layer](https://iterationlayer.com) - An AI-powered API that extracts structured data from PDFs, images, DOCX, and text files.
+- [pdf_oxide](https://github.com/yfedoseev/pdf_oxide) - A fast Rust PDF library for text and image extraction, markdown conversion, and structured extraction, with bindings for Python, Go, JS/TS, .NET, Java, PHP, Ruby, and WebAssembly, plus a CLI and MCP server.


Suggested change

-- [pdf_oxide](https://github.com/yfedoseev/pdf_oxide) - A fast Rust PDF library for text and image extraction, markdown conversion, and structured extraction, with bindings for Python, Go, JS/TS, .NET, Java, PHP, Ruby, and WebAssembly, plus a CLI and MCP server.
+- [PDF Oxide](https://github.com/yfedoseev/pdf_oxide) - Rust PDF library and CLI for text and image extraction and markdown conversion with bindings for Python, Go, JS, .NET, Java, PHP, Ruby, and MCP server.

yfedoseev · 2026-06-04T21:04:51Z

Thanks @avvertix! I've shortened it. Kept it almost exactly as your suggestion — just retained WASM (it's a first-class binding) and moved the MCP server out of the bindings list so the list reads as pure language bindings:

- [pdf_oxide](https://github.com/yfedoseev/pdf_oxide) - Fast Rust PDF library and CLI for text and image extraction and markdown conversion, with bindings for Python, Go, JS, .NET, Java, PHP, Ruby, and WASM.

Pushed in f4b06d2. Happy to trim further if you'd like it even shorter.

Add pdf_oxide to Parsers, OCR and extraction

ba1bb9d

avvertix reviewed Jun 2, 2026

Shorten pdf_oxide description per review feedback

f4b06d2

Merge remote-tracking branch 'origin/main' into add-pdf-oxide

653cc3a

# Conflicts: # README.md

yfedoseev wants to merge 3 commits into OneOffTech:main from yfedoseev:add-pdf-oxide