Case Study 8 min

Scanned PDF Translation: From OCR to Bilingual Output

The Scenario: 30 Scanned Lease Agreements

A property management company needed 30 scanned lease contracts (PDF) translated from Chinese to English for foreign landlords. The batch came to about 180 pages, scanned at a mix of 150-300 DPI. Two complications: stamps overlapping signatures, and handwritten notes such as "renewed" and "confirmed".

Challenge Analysis

DPI variation: At 150 DPI, print OCR accuracy is roughly 80-85%; at 300 DPI it reaches roughly 97-98%. Because the batch mixes resolutions, results vary from file to file.
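
One way to handle a mixed-resolution batch is to sort files by DPI before OCR and set the low-resolution ones aside for rescanning. The sketch below is a minimal illustration, not the tool's actual pipeline: the `Scan` class, the bucket boundaries, and the 0.90 cutoff are assumptions, and the accuracy ranges are the print-OCR figures reported in this case study.

```python
from dataclasses import dataclass

# Print-OCR accuracy ranges reported in this case study, keyed by minimum DPI.
# Treating each figure as a "this DPI or higher" bucket is an assumption.
PRINT_OCR_RANGES = [
    (300, (0.97, 0.98)),
    (200, (0.92, 0.95)),
    (150, (0.80, 0.85)),
]


@dataclass
class Scan:
    name: str  # hypothetical file name
    dpi: int   # scan resolution, e.g. read from the PDF's image metadata


def expected_print_accuracy(dpi):
    """Return the (low, high) range for the highest bucket this DPI reaches."""
    for min_dpi, accuracy in PRINT_OCR_RANGES:
        if dpi >= min_dpi:
            return accuracy
    return None  # below 150 DPI: the case study reports no figure


def triage(scans, min_low=0.90):
    """Split scans into (ok, needs_rescan) using the low end of the range."""
    ok, needs_rescan = [], []
    for scan in scans:
        accuracy = expected_print_accuracy(scan.dpi)
        if accuracy is not None and accuracy[0] >= min_low:
            ok.append(scan)
        else:
            needs_rescan.append(scan)
    return ok, needs_rescan
```

With the assumed 0.90 cutoff, 200 and 300 DPI files pass and 150 DPI files are flagged, which matches where the reported accuracy falls off most sharply.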

Stamps: Usually placed over signatures, the most critical text on the page. Recognition of under-stamp text drops to 70-85% even at 300+ DPI, and to 40-55% at 150 DPI.

Handwritten notes: Lower OCR accuracy than print to begin with, compounded by sloppy handwriting.

Results

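| DPI  | Print OCR | Under-stamp OCR | Handwriting OCR | Translation Accuracy |
|------|-----------|-----------------|-----------------|----------------------|
| 300+ | 97-98%    | 70-85%          | 65-75%          | 92-95%               |
| 200  | 92-95%    | 50-65%          | 50-60%          | 87-90%               |
| 150  | 80-85%    | 40-55%          | 35-45%          | 78-83%               |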

What Can Go Wrong

Tilted scans: Skewed pages can lower OCR accuracy by roughly 30%. Some pages need manual deskewing before OCR.
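
To decide which pages need deskewing, one option is to estimate the skew angle from a single line of text. The sketch below fits a least-squares line through baseline points and converts the slope to degrees. The input format (one `(x, y)` pixel coordinate per character box) and the 1-degree tolerance are assumptions for illustration; production tools typically use projection profiles or Hough transforms on the image itself.

```python
import math


def estimate_skew_degrees(baseline_points):
    """Least-squares slope of a text baseline, returned as an angle in degrees.

    baseline_points: (x, y) pixel coordinates along one text line
    (hypothetical input, e.g. the bottom-centre of each character box).
    """
    n = len(baseline_points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in baseline_points) / n
    mean_y = sum(y for _, y in baseline_points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in baseline_points)
    if sxx == 0:
        raise ValueError("points must span more than one x position")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in baseline_points)
    # slope = sxy / sxx; atan2 with sxx > 0 gives the same angle as atan(slope)
    return math.degrees(math.atan2(sxy, sxx))


def needs_deskew(angle_degrees, tolerance=1.0):
    """Flag pages whose skew exceeds the tolerance (1 degree is an assumption)."""
    return abs(angle_degrees) > tolerance
```

In image coordinates y usually grows downward, so the sign of the angle depends on that convention; the magnitude is what matters for flagging a page.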

Handwriting intruding into body text: Handwritten notes are sometimes read as part of the contract clauses, which leaves the translated text ambiguous.
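
A possible mitigation is to keep handwritten or low-confidence regions out of the body text and send them to a human reviewer instead. The sketch below assumes the OCR step returns regions labelled with a kind and a confidence score; the `Region` class, its fields, and the 0.85 threshold are hypothetical and not taken from any specific OCR library.

```python
from dataclasses import dataclass


@dataclass
class Region:
    text: str
    kind: str          # "print" or "handwriting" (hypothetical layout label)
    confidence: float  # 0.0-1.0, hypothetical OCR confidence score


def split_for_translation(regions, min_confidence=0.85):
    """Return (body, review): confident printed text vs. everything else."""
    body, review = [], []
    for region in regions:
        if region.kind == "print" and region.confidence >= min_confidence:
            body.append(region)
        else:
            # Handwriting and uncertain print go to manual review so they
            # are not merged into the contract clauses before translation.
            review.append(region)
    return body, review
```

Translating the two groups separately keeps an annotation like "renewed" from being merged into the clause it happens to sit next to.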

Batch memory: Loading all 30 files at once strains browser memory. Process them in batches of 10.
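
Splitting the job is simple to script. The helper below is a minimal sketch; the function name and the file names in the usage example are made up for illustration.

```python
def batches(items, size=10):
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Usage: 30 hypothetical lease files become 3 batches of 10.
files = [f"lease_{i:02d}.pdf" for i in range(1, 31)]
groups = list(batches(files))
print([len(group) for group in groups])  # prints [10, 10, 10]
```

Finishing one batch before starting the next means only 10 documents are held in memory at any time.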
