Same Chinese Economics Paper: Three Versions Compared
The source text is a 15-page paper on the digital economy from the Journal of Finance and Economics (2024, Issue 3). It was translated three ways: by GLM-4.5-Air, by DeepSeek V3.2, and by a human translator with an economics background. Each version was scored on five dimensions.
Results
| Dimension | GLM-4.5-Air | DeepSeek V3.2 | Human |
|---|---|---|---|
| Terminology consistency | 8.0 | 7.2 | 9.5 |
| ZH to EN (academic) | 7.8 | 7.5 | 9.2 |
| EN to ZH | 7.5 | 8.0 | 9.3 |
| Cultural metaphor handling | 7.0 | 6.5 | 8.5 |
| Total | 36.5 | 36.2 | 45.0 |
Surprise Finding
DeepSeek V3.2 outperformed GLM-4.5-Air on EN→ZH (8.0 vs. 7.5), possibly because of a higher EN:ZH ratio in DeepSeek's pretraining data. GLM leads on ZH→EN (7.8 vs. 7.5), so the results suggest that each engine has directional strengths.
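The directional comparison can be checked against the table with a few lines of Python. This is a minimal sketch: the scores are copied from the four dimension rows shown in the table, while the helper names `leading_engine` and `gap_to_human` are made up for illustration.

```python
# Scores copied from the four dimension rows shown in the results table.
SCORES = {
    "Terminology consistency": {"GLM-4.5-Air": 8.0, "DeepSeek V3.2": 7.2, "Human": 9.5},
    "ZH to EN (academic)": {"GLM-4.5-Air": 7.8, "DeepSeek V3.2": 7.5, "Human": 9.2},
    "EN to ZH": {"GLM-4.5-Air": 7.5, "DeepSeek V3.2": 8.0, "Human": 9.3},
    "Cultural metaphor handling": {"GLM-4.5-Air": 7.0, "DeepSeek V3.2": 6.5, "Human": 8.5},
}
ENGINES = ("GLM-4.5-Air", "DeepSeek V3.2")


def leading_engine(row):
    """Return the machine engine with the higher score in this row."""
    return max(ENGINES, key=lambda name: row[name])


def gap_to_human(row):
    """Return how far the human score is above the better engine, to one decimal."""
    best = max(row[name] for name in ENGINES)
    return round(row["Human"] - best, 1)


for dimension, row in SCORES.items():
    print(f"{dimension}: {leading_engine(row)} leads, human ahead by {gap_to_human(row)}")
```

Comparing each row separately, rather than only the totals, is what makes the directional split visible.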
Human translators lead clearly on terminology and cultural metaphors, as expected. The gap is easiest to see in cultural metaphors. For "内卷", DeepSeek chose "involution", GLM chose "overcompetition", and the human translator chose by context between "excessive competition" and "involution" with an annotation.
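The human translator's context-dependent choice can be approximated with a small glossary lookup. The sketch below is hypothetical: the context labels (`general`, `term_of_art`), the `Rendering` type, and the wording of the note are assumptions for illustration, not the translator's actual workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendering:
    text: str       # English rendering of the source term
    note: str = ""  # optional translator's annotation


# Hypothetical glossary; the context labels are illustrative assumptions.
GLOSSARY = {
    "内卷": {
        "general": Rendering("excessive competition"),
        "term_of_art": Rendering("involution", note="translating 内卷 (neijuan)"),
    },
}


def render(term: str, context: str) -> str:
    """Return the rendering for this context, with its note in brackets if any."""
    entry = GLOSSARY.get(term, {}).get(context)
    if entry is None:
        return term  # unknown term or context: leave the source text for review
    return f"{entry.text} [{entry.note}]" if entry.note else entry.text


print(render("内卷", "general"))
print(render("内卷", "term_of_art"))
```

Returning the untranslated term for unknown entries, instead of guessing, keeps culture-loaded words flagged for a human decision.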