This is the official repository for MathReader, an advanced TTS document reader for academic mathematical documents.
Demo page: https://hyeonsieun.github.io/MathReader_demo/
The experimental code and test dataset developed for our research can be found here.
- Install Nougat, NVIDIA NeMo, and the transformers library in your development environment.
- Alternatively, set up the environment from the mathreader_environment.yml file:
  conda env create -f ./mathreader_environment.yml
- Create a folder named 'test_audio' in the same directory as MathReader.py.
- Edit line 102 of MathReader.py to set the path of the PDF file you want to run OCR on.
- Run the following in the terminal:
  python MathReader.py
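The folder and path steps above can be checked before launching the script. The following is a minimal sketch, not part of MathReader itself: the helper names `prepare_output_dir` and `check_pdf` are hypothetical, and it assumes only what the steps state, namely that 'test_audio' must sit next to MathReader.py and that the configured path should point to an existing PDF file.

```python
from pathlib import Path


def prepare_output_dir(script_dir: str, name: str = "test_audio") -> Path:
    """Create the output folder next to MathReader.py if it is missing.

    `script_dir` is the directory that contains MathReader.py (assumed).
    """
    out_dir = Path(script_dir) / name
    # exist_ok=True makes the call safe to repeat on later runs.
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir


def check_pdf(pdf_path: str) -> Path:
    """Return the path if it points to an existing .pdf file, else raise."""
    path = Path(pdf_path)
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"Not a PDF file: {path}")
    if not path.is_file():
        raise FileNotFoundError(f"PDF not found: {path}")
    return path
```

For example, calling `prepare_output_dir(".")` from the repository root and then `check_pdf(...)` with the same path you wrote on line 102 should surface a missing folder or a mistyped path before the OCR and TTS models are loaded.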
@INPROCEEDINGS{HyeonICASSP25,
author={Hyeon, Sieun and Jung, Kyudan and Kim, Nam-Joon and Ryu, Hyun Gon and Do, Jaeyoung},
booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={MathReader : Text-to-Speech for Mathematical Documents},
year={2025},
pages={1-5},
keywords={Text recognition;Error analysis;Pipelines;Optical character recognition;Graphics processing units;Signal processing;Real-time systems;Mathematical models;Text to speech;Speech processing;OCR;T5;TTS;document reader;LaTeX},
doi={10.1109/ICASSP49660.2025.10890531}
}