OCR — Extract Text from Scans
Powered by Tesseract.js — runs entirely in your browser. Works on scanned PDFs, photographs, screenshots, and images. No data leaves your device.
ℹ️ First run: Tesseract.js (~700 KB engine script plus ~4 MB of data per language) loads from a CDN and is then cached locally in your browser. All subsequent uses are fully offline.
Drop a scanned PDF or image
PDF · JPG · PNG · TIFF · BMP · WebP · GIF
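As a rough illustration of how a dropped file might be routed, the sketch below classifies a file name against the format list above. The function name `classifyInput`, the `InputKind` type, and the `jpeg`/`tif` aliases are assumptions made for this example, not part of the tool. PDFs are kept separate from images on the assumption that a scanned PDF would typically need its pages rendered to images before OCR.

```typescript
// Hypothetical helper: classify a dropped file by its extension.
type InputKind = "pdf" | "image" | "unsupported";

// Extensions taken from the format list above. "jpeg" and "tif" are
// assumed aliases that the list does not mention explicitly.
const IMAGE_EXTENSIONS = new Set([
  "jpg", "jpeg", "png", "tiff", "tif", "bmp", "webp", "gif",
]);

function classifyInput(fileName: string): InputKind {
  const dot = fileName.lastIndexOf(".");
  // No extension at all, or a trailing dot with nothing after it.
  if (dot < 0 || dot === fileName.length - 1) return "unsupported";

  const ext = fileName.slice(dot + 1).toLowerCase();
  if (ext === "pdf") return "pdf";
  return IMAGE_EXTENSIONS.has(ext) ? "image" : "unsupported";
}
```

A real drop handler would probably also check the file's MIME type, since extensions can be missing or wrong.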
Preparing…
Extracted Text
All pages
Offline Cache Manager
Download Tesseract engine and language data to your browser's local cache. Once cached, OCR works fully offline — no internet required.
The Tesseract.js engine (~700 KB script + ~4 MB per language) is downloaded once and stored in your browser. It persists across browser restarts until you clear browser data or remove it here.
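One way a cache manager like this could work is sketched below, assuming the browser Cache API is used for storage. This is an illustration only: the page does not say which mechanism it uses, and Tesseract.js has its own language-data caching, so the real implementation may differ. The cache name, function names, and the `CacheLike` / `CacheStorageLike` types are hypothetical, and the sizes are the approximate figures quoted above.

```typescript
// Minimal structural types matching the parts of the Cache API used here,
// so the sketch can be exercised with the browser's `caches` or a stand-in.
interface CacheLike {
  match(url: string): Promise<unknown>;
  addAll(urls: string[]): Promise<void>;
}
interface CacheStorageLike {
  open(name: string): Promise<CacheLike>;
  delete(name: string): Promise<boolean>;
}

// Hypothetical cache name, chosen for this example.
const CACHE_NAME = "ocr-offline-v1";

// Approximate sizes from the text above; actual sizes vary by language.
const ENGINE_KB = 700;
const LANGUAGE_KB = 4 * 1024;

// Estimate the total download for the engine plus a set of language packs.
// Duplicate language codes are counted once.
function estimateDownloadKb(languages: string[]): number {
  return ENGINE_KB + new Set(languages).size * LANGUAGE_KB;
}

// Fetch the given URLs and store the responses for offline use.
async function cacheForOffline(
  storage: CacheStorageLike,
  urls: string[],
): Promise<void> {
  const cache = await storage.open(CACHE_NAME);
  await cache.addAll(urls);
}

// Report whether a given asset is already stored.
// Cache.match resolves to undefined when nothing matches.
async function isCached(
  storage: CacheStorageLike,
  url: string,
): Promise<boolean> {
  const cache = await storage.open(CACHE_NAME);
  return (await cache.match(url)) !== undefined;
}

// Remove everything stored under CACHE_NAME ("remove it here" in the text).
async function removeOfflineCache(storage: CacheStorageLike): Promise<boolean> {
  return storage.delete(CACHE_NAME);
}
```

In a browser, the global `caches` object would be passed as `storage`, for example `await cacheForOffline(caches, urls)`, where `urls` holds the engine and language-data addresses (not specified here). The Cache API is only available in secure contexts.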
Engine Status
Language Packs Cached
—