Do scanned PDFs need OCR before translation?

Yes. Scanned PDFs are images, so the text must be extracted with OCR before it can be translated. PDFTranslate handles both steps automatically.

Can text under stamps be recognized?

Partly. At 300 DPI or higher, about 70-85% of the text under stamps is recognized; at 150 DPI, about 40-55%. Lighter stamps that cover less text give better results.

Will handwritten notes be translated?

Yes, but handwriting OCR accuracy is only 60-75% to begin with, and translation errors compound on top of OCR errors. Results are usable but not perfect.
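The compounding effect can be made concrete with a little arithmetic. The sketch below assumes OCR errors and translation errors are independent, and it uses a hypothetical 95% translation accuracy on correctly recognized text; that figure is an illustration, not a measured result.

```python
def end_to_end_accuracy(ocr_accuracy, translation_accuracy):
    """Share of text that is both recognized and translated correctly.

    Assumes the two error sources are independent, which is a simplification.
    """
    return ocr_accuracy * translation_accuracy


# 0.95 is an assumed translation accuracy, chosen only for illustration.
ASSUMED_TRANSLATION_ACCURACY = 0.95

low = end_to_end_accuracy(0.60, ASSUMED_TRANSLATION_ACCURACY)
high = end_to_end_accuracy(0.75, ASSUMED_TRANSLATION_ACCURACY)
print(f"End-to-end handwriting accuracy: {low:.0%} to {high:.0%}")
```

Under these assumptions, the end-to-end figure is always lower than the OCR figure alone, which is why handwritten notes deserve a manual check after translation.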

Scanned PDF Translation: From OCR to Bilingual Output

The Scenario: 30 Scanned Lease Agreements

A property management company needs 30 scanned lease contracts (PDF) translated from Chinese to English for foreign landlords. The batch totals about 180 pages, scanned at a mix of 150 to 300 DPI. Complications: stamps overlapping signatures, and handwritten notes ("renewed", "confirmed").

Challenge Analysis

DPI variation: 150 DPI scans reach ~80-85% OCR accuracy, while 300 DPI scans reach ~97-98%. Results are inconsistent within the same batch.

Stamps: Usually placed over signatures, the most critical text. Recognition under stamps drops to 70-85% at 300+ DPI and 40-55% at 150 DPI.

Handwritten notes: Lower OCR accuracy than print to begin with, compounded by sloppy handwriting.

Results

DPI	Print OCR	Under-stamp OCR	Handwriting OCR	Translation Accuracy
300+	97-98%	70-85%	65-75%	92-95%
200	92-95%	50-65%	50-60%	87-90%
150	80-85%	40-55%	35-45%	78-83%
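To tell which row of the table applies to a given page, it helps to estimate the scan's effective DPI from its pixel width and the PDF page width. The sketch below relies on the fact that PDF page sizes are expressed in points (72 per inch). The function names and the rule for assigning in-between values to a row are assumptions made for illustration.

```python
POINTS_PER_INCH = 72        # PDF page dimensions are given in points
A4_WIDTH_POINTS = 595.276   # 210 mm expressed in points


def effective_dpi(pixel_width, page_width_points):
    """Estimate scan resolution from image width in pixels and page width in points."""
    page_width_inches = page_width_points / POINTS_PER_INCH
    return pixel_width / page_width_inches


def dpi_tier(dpi):
    """Map an estimated DPI to the nearest lower row of the results table.

    Assigning in-between values to the lower row is an assumption,
    chosen so that expectations stay conservative.
    """
    rounded = round(dpi)
    if rounded >= 300:
        return "300+"
    if rounded >= 200:
        return "200"
    return "150"


# Hypothetical A4 page images of different pixel widths.
for width in (2480, 1654, 1240):
    dpi = effective_dpi(width, A4_WIDTH_POINTS)
    print(width, round(dpi), dpi_tier(dpi))
```

Sorting files by tier before processing makes it easier to decide which ones to rescan and which ones to review by hand afterwards.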

What Can Go Wrong

Tilted scans: Skewed documents can reduce OCR accuracy by ~30%. Some pages need manual deskewing before OCR.
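When deskewing by hand, the first step is knowing the skew angle. One common way to estimate it is a projection-profile search: try candidate angles and keep the one that packs dark pixels into the fewest, densest rows. The sketch below is a minimal pure-Python version that works on a list of (x, y) dark-pixel coordinates. It is an illustration of the general technique, not the method PDFTranslate uses, and the function name and defaults are assumptions.

```python
import math
from collections import Counter


def estimate_skew_degrees(dark_pixels, max_angle=10.0, step=0.5):
    """Return the candidate angle (degrees) at which text rows line up best.

    dark_pixels: iterable of (x, y) coordinates of dark pixels.
    The angle is expressed in the same coordinate system as the input points.
    """
    points = list(dark_pixels)
    best_angle, best_score = 0.0, -1
    candidate_count = int(round(2 * max_angle / step)) + 1
    for i in range(candidate_count):
        angle = -max_angle + i * step
        theta = math.radians(angle)
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        # Undo a rotation by `angle` and count how many pixels land in each row.
        rows = Counter(round(y * cos_t - x * sin_t) for x, y in points)
        # Sharp, well-aligned text lines give a few very full rows,
        # which maximizes the sum of squared row counts.
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Rotating the page by the negative of the returned angle should bring the text lines back to horizontal. A finer `step` gives a more precise estimate at the cost of more passes over the pixels.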

Handwriting intruding into body text: Notes are sometimes misread as contract clauses, which creates ambiguity in the translated text.

Batch memory: Processing 30 files at once strains the browser. Process in batches of 10.
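If the files are prepared with a script before upload, splitting them into groups of 10 takes only a few lines. The sketch below is one way to do it; the file names are hypothetical placeholders.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical file names standing in for the 30 scanned leases.
lease_files = [f"lease_{n:02d}.pdf" for n in range(1, 31)]

batches = list(chunked(lease_files, 10))
for number, batch in enumerate(batches, start=1):
    print(f"Batch {number}: {batch[0]} to {batch[-1]} ({len(batch)} files)")
```

The last batch is simply shorter when the file count is not a multiple of the batch size, so the same helper works for any number of files.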

