Journal of Humanities, Arts and Social Science

ISSN Online: 2576-0548
Frequency: monthly ISSN Print: 2576-0556 CODEN: JHASAY
Email: jhass@hillpublisher.com
Article Open Access http://dx.doi.org/10.26855/jhass.2025.08.028

Evaluating and Improving LLM-based Translation in Aviation Contexts

Weiwei Liu*, Chenyuan Yu, Yangxi Gong

School of Foreign Languages, Civil Aviation Flight University of China, Guanghan 618307, Sichuan, China.

*Corresponding author: Weiwei Liu

Funding: This work was supported by the Fundamental Research Funds for the Central Universities (25CAFUC03077).
Published: September 18, 2025

Abstract

Large language model (LLM)-based translation has advanced rapidly, with machine translation plus post-editing (MTPE) becoming the norm. However, the application of LLMs to translation in safety-critical domains such as aviation, where precision and reliability are non-negotiable, remains underexplored. This study evaluates the performance of four general-purpose LLMs (ChatGPT, DeepSeek, Kimi, and ERNIE Bot) in translating aviation-related texts from English into Chinese. Using a corpus of ten paired sentences drawn from ICAO documentation and flight manuals, the study combines quantitative evaluation with qualitative analysis. BLEU scores serve as the quantitative metric because of their widespread adoption, reproducibility, and suitability. In addition, the translations are analyzed for terminological accuracy and for the translation technique most frequently used in aviation texts. The findings suggest that LLMs can support aviation translation by generating quality drafts in which most technical terms are understood and translated correctly. However, limitations in terminological precision, zero-translation consistency, and stylistic fidelity underscore the need for domain-specific fine-tuning, glossary integration, and post-editing.
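
For readers unfamiliar with the metric, the BLEU-based comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation pipeline: the abstract does not say how the Chinese output was tokenized or whether smoothing was applied, so the character-level n-grams and the add-one smoothing here are assumptions, and the function names `char_ngrams` and `sentence_bleu` as well as the example sentences are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring whitespace (assumed tokenization for Chinese)."""
    chars = [c for c in text if not c.isspace()]
    return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Character-level sentence BLEU with add-one smoothing for n > 1 (an assumption)."""
    hyp_len = sum(1 for c in hypothesis if not c.isspace())
    ref_len = sum(1 for c in reference if not c.isspace())
    if hyp_len == 0:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = char_ngrams(hypothesis, n)
        ref_counts = char_ngrams(reference, n)
        # Clipped matches: an n-gram is credited at most as often as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n > 1:
            # Add-one smoothing keeps short sentences from scoring zero on higher orders.
            matches, total = matches + 1, total + 1
        if matches == 0:
            return 0.0  # no shared characters at all
        log_precision_sum += math.log(matches / total) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision_sum)


# Invented example pair, not taken from the study's corpus.
reference = "保持跑道外等待"
hypothesis = "在跑道外等待"
score = sentence_bleu(hypothesis, reference)
```

A score of 1.0 means the output matches the reference exactly, and lower values indicate less n-gram overlap. With a corpus as small as ten sentence pairs, such scores are best read alongside the qualitative analysis of terminology rather than on their own.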

Keywords

LLM; Translation; Aviation; Post-editing

How to cite this paper

Weiwei Liu, Chenyuan Yu, Yangxi Gong. (2025). Evaluating and Improving LLM-based Translation in Aviation Contexts. Journal of Humanities, Arts and Social Science, 9(8), 1664-1669.

DOI: http://dx.doi.org/10.26855/jhass.2025.08.028