The Scenario: 30 Scanned Lease Agreements
A property management company needed 30 scanned lease contracts (PDF) translated from Chinese into English for foreign landlords: about 180 pages, scanned at anywhere from 150 to 300 DPI. Two complications: stamps overlapping the signatures, and handwritten notes such as "renewed" and "confirmed".
Challenge Analysis
DPI variation: 150 DPI scans give roughly 80-85% OCR accuracy, while 300 DPI scans give 97-98%. Because both resolutions appear in the same batch, results are inconsistent from page to page.
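Low-resolution pages can be caught before OCR by computing each page's effective DPI from its pixel width and the physical paper width. A minimal sketch, assuming the leases are A4 (210 mm wide, so a 300 DPI scan is about 2480 pixels across) and that the pixel width of each page image is already known; the page names and the 300 DPI cutoff are illustrative choices.

```python
MM_PER_INCH = 25.4
A4_WIDTH_MM = 210.0  # assumption: lease pages are A4 portrait


def effective_dpi(pixel_width: int, paper_width_mm: float = A4_WIDTH_MM) -> float:
    """Pixels across the page divided by the page width in inches."""
    return pixel_width / (paper_width_mm / MM_PER_INCH)


def pages_below(pages: dict[str, int], min_dpi: float = 300.0) -> list[str]:
    """Return names of pages whose effective DPI falls under min_dpi.

    Rounding lets a standard 2480 px A4 scan (299.96 DPI) count as 300.
    """
    return sorted(name for name, width in pages.items()
                  if round(effective_dpi(width)) < min_dpi)


if __name__ == "__main__":
    # Hypothetical page images: file name -> pixel width
    pages = {"lease01_p1.png": 2480, "lease02_p1.png": 1240, "lease03_p1.png": 1654}
    for name, width in pages.items():
        print(f"{name}: {effective_dpi(width):.0f} DPI")
    print("Rescan candidates:", pages_below(pages))
```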
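Stamps: Stamps usually sit on top of the signatures, which are the most critical text on the page, and recognition of the text beneath them drops sharply (to 70-85% even at 300 DPI, per the results table).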
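Chinese company seals are typically printed in red ink, while signatures and body text are dark. A common pre-processing trick, shown here as a sketch and not as the pipeline used for this project, is to build the grayscale image from the red channel alone: red seal pixels turn nearly white, while black or blue-black ink stays dark. This assumes pixels arrive as (R, G, B) tuples and that the stamps really are red; the sample pixel values are hypothetical, and in practice you would apply the same idea with an imaging library.

```python
Pixel = tuple[int, int, int]


def red_channel_gray(pixels: list[Pixel]) -> list[int]:
    """Use the red channel as grayscale: red seal ink becomes light,
    dark ink (black, blue-black) stays dark."""
    return [r for r, _g, _b in pixels]


def binarize(gray: list[int], threshold: int = 128) -> list[int]:
    """1 = ink kept for OCR, 0 = background."""
    return [1 if v < threshold else 0 for v in gray]


# Hypothetical samples: black signature ink, red seal ink,
# seal printed over a signature stroke (still dark), blank paper
samples = [(20, 20, 25), (210, 30, 40), (90, 15, 20), (245, 245, 240)]
print(binarize(red_channel_gray(samples)))  # -> [1, 0, 1, 0]
```

Where the seal overlaps a signature stroke, the mixed pixel stays dark in the red channel, so the signature survives while the surrounding seal disappears.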
Handwritten notes: Handwriting is recognized less accurately than print to begin with, and sloppy handwriting lowers accuracy further.
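Because handwriting is the least reliable input, one safeguard is to route it, along with any low-confidence print, to a human reviewer before translation. A minimal sketch, assuming the OCR engine returns per-region text with a 0-1 confidence score; the `OcrRegion` type, its `handwritten` flag, and the 0.80 threshold are hypothetical and should be tuned on your own scans.

```python
from dataclasses import dataclass


@dataclass
class OcrRegion:
    text: str
    confidence: float  # 0.0-1.0, as reported by the (hypothetical) OCR engine
    handwritten: bool


def split_for_review(regions: list[OcrRegion], threshold: float = 0.80):
    """Return (auto, review): regions safe to translate automatically,
    and regions a human should check first."""
    auto, review = [], []
    for region in regions:
        # Handwriting always gets a second look; print only when confidence is low.
        if region.handwritten or region.confidence < threshold:
            review.append(region)
        else:
            auto.append(region)
    return auto, review


auto, review = split_for_review([
    OcrRegion("Term: 12 months", 0.97, False),
    OcrRegion("renewed", 0.62, True),
    OcrRegion("Deposit clause", 0.71, False),
])
print([r.text for r in review])
```

Sending all handwriting to review, regardless of score, reflects how unreliable handwriting confidence values tend to be; a stricter per-type threshold is an alternative.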
Results
| Scan DPI | OCR: printed text | OCR: text under stamps | OCR: handwriting | Translation accuracy |
|---|---|---|---|---|
| 300+ | 97-98% | 70-85% | 65-75% | 92-95% |
| 200 | 92-95% | 50-65% | 50-60% | 87-90% |
| 150 | 80-85% | 40-55% | 35-45% | 78-83% |
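The table can double as a triage rule: look up a page's DPI tier and decide how much human review its translation needs. A sketch using the translation-accuracy ranges above; rounding a page down to the nearest measured tier, the 90% cutoff, and the review labels are assumptions, not part of the original workflow.

```python
from typing import Optional, Tuple

# (minimum DPI, expected translation accuracy range in %) from the results table
TIERS = [
    (300, (92, 95)),
    (200, (87, 90)),
    (150, (78, 83)),
]


def expected_translation_accuracy(dpi: int) -> Optional[Tuple[int, int]]:
    """Return the range for the highest tier the page reaches,
    or None if it is below every measured tier."""
    for min_dpi, accuracy in TIERS:
        if dpi >= min_dpi:
            return accuracy
    return None


def review_level(dpi: int) -> str:
    """Hypothetical policy: spot-check the best tier, fully review the rest."""
    accuracy = expected_translation_accuracy(dpi)
    if accuracy is None:
        return "rescan"  # below anything measured in this batch
    low, _high = accuracy
    return "spot-check" if low >= 90 else "full review"


for dpi in (600, 250, 150, 72):
    print(dpi, expected_translation_accuracy(dpi), review_level(dpi))
```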
What Can Go Wrong
Tilted scans: Skewed pages reduce OCR accuracy by roughly 30%, so some pages need manual deskewing before OCR.
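Much of the deskewing can be automated, leaving only the stubborn pages for manual work. One standard approach is the projection-profile method: try a range of rotation angles and keep the one where dark pixels pile up into the sharpest horizontal rows. A minimal pure-Python sketch, assuming the dark-pixel coordinates have already been extracted from a binarized page; the search range and 0.5° step are illustrative, and a real pipeline would use an imaging library and a finer search.

```python
import math
from collections import Counter


def profile_score(points, angle_deg):
    """Rotate points by -angle_deg and score how sharply they form rows
    (sum of squared row counts; higher means straighter text lines)."""
    phi = math.radians(-angle_deg)
    cos_p, sin_p = math.cos(phi), math.sin(phi)
    rows = Counter(round(x * sin_p + y * cos_p) for x, y in points)
    return sum(count * count for count in rows.values())


def estimate_skew(points, max_angle=10.0, step=0.5):
    """Return the candidate skew angle (degrees) with the sharpest profile."""
    n = int(max_angle / step)
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda a: profile_score(points, a))


# Synthetic page: three text lines rotated by 3 degrees
theta = math.radians(3.0)
page = [
    (x * math.cos(theta) - y * math.sin(theta),
     x * math.sin(theta) + y * math.cos(theta))
    for y in (100, 150, 200)
    for x in range(501)
]
print(estimate_skew(page))
```

The returned angle is the rotation the text lines carry; rotating the page by its negative undoes it.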
Handwriting intruding into body text: Notes written into the body text are sometimes read as part of a contract clause, which leaves the translated clause ambiguous.
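One way to keep margin notes from being mistaken for contract language is to carry a printed/handwritten label from OCR through to the output and render notes visibly apart from the clauses. A sketch, assuming the OCR step can label each block; the `Block` type, its `kind` values, and the bracketed note format are hypothetical, and the translation step itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str  # already-translated text of one OCR block
    kind: str  # "printed" or "handwritten" (hypothetical labels)


def render(blocks: list[Block]) -> str:
    """Keep contract clauses in order and fence off handwritten notes
    so they cannot be read as part of a clause."""
    lines = []
    for block in blocks:
        if block.kind == "handwritten":
            lines.append(f"[Handwritten note: {block.text}]")
        else:
            lines.append(block.text)
    return "\n".join(lines)


print(render([
    Block("The lease term is 12 months.", "printed"),
    Block("renewed", "handwritten"),
]))
```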
Batch memory: Loading all 30 files at once strains the browser's memory; process them in batches of 10 instead.
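Splitting the job into fixed-size groups takes only a few lines. A sketch with hypothetical file names; `process_batch` is a placeholder for whatever upload, OCR, or translation call the tool actually uses.

```python
def batches(items: list[str], size: int = 10) -> list[list[str]]:
    """Split items into consecutive groups of at most `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_batch(files: list[str]) -> None:
    # Hypothetical placeholder for the real upload/OCR/translate step.
    print(f"processing {len(files)} files: {files[0]} .. {files[-1]}")


leases = [f"lease_{n:02d}.pdf" for n in range(1, 31)]
for group in batches(leases):
    process_batch(group)
```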