Extract PDF text in your browser with LiteParse for the web

What it is
LiteParse is an open-source PDF text extractor from LlamaIndex that uses spatial parsing instead of AI models. It reads a PDF the way a human would, tracking text position, reading order, and columns, without needing OCR or LLMs. Simon Willison's browser port takes the same libraries the Node.js tool uses and compiles them to run in the browser via WebAssembly.
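To make "spatial parsing" concrete, here is a minimal TypeScript sketch of the general idea: group positioned text fragments into lines by their vertical coordinate, then order each line left to right. This is an illustration under stated assumptions, not LiteParse's actual algorithm or API. The `TextItem` shape, the `itemsToLines` function, and the `yTolerance` parameter are all hypothetical names introduced for this example.

```typescript
// Hypothetical shape for a positioned text fragment (not LiteParse's real types).
interface TextItem {
  text: string;
  x: number; // left edge, in page units
  y: number; // baseline position, measured from the top of the page
}

// Group fragments into lines by baseline proximity, then read each line left to right.
// yTolerance is an assumed threshold for treating two baselines as the same line.
function itemsToLines(items: TextItem[], yTolerance: number = 2): string[] {
  // Sort top to bottom first, breaking ties left to right.
  const sorted = [...items].sort((a, b) => a.y - b.y || a.x - b.x);

  const lines: { y: number; items: TextItem[] }[] = [];
  for (const item of sorted) {
    const current = lines[lines.length - 1];
    if (current !== undefined && Math.abs(current.y - item.y) <= yTolerance) {
      current.items.push(item); // close enough vertically: same line
    } else {
      lines.push({ y: item.y, items: [item] }); // start a new line
    }
  }

  // Within each line, restore left-to-right order and join the fragments.
  return lines.map((line) =>
    [...line.items]
      .sort((a, b) => a.x - b.x)
      .map((item) => item.text)
      .join(" ")
  );
}

// Usage: fragments arrive out of order, with slightly different baselines.
const example: TextItem[] = [
  { text: "world", x: 60, y: 10.5 },
  { text: "Hello", x: 10, y: 10 },
  { text: "line", x: 70, y: 30 },
  { text: "Second", x: 10, y: 30 },
];
console.log(itemsToLines(example)); // [ 'Hello world', 'Second line' ]
```

A real spatial parser also has to handle multiple columns, tables, and varying font sizes. The single fixed tolerance here is a deliberate simplification.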
Why it matters
This addresses a common privacy problem with PDF extraction tools. Most hosted services require uploading your documents to their servers. With browser-based LiteParse, everything stays local, which is useful for confidential documents, compliance work, or simply avoiding data leaks. If you're building RAG systems or need to pipe PDF content into LLMs, you now have a client-side preprocessing step.
Key details
- Original LiteParse is a Node.js CLI tool from LlamaIndex's open source stack
- Browser version uses the same underlying libraries, compiled to WebAssembly
- No AI models involved: it uses spatial text parsing to maintain document structure
- Runs entirely client-side, so PDFs never leave your browser
- Open source and available now
