3078-3348






Sara Rovira-Esteva
Departament de Traducció, Interpretació i Estudis de l’Àsia Oriental
Universitat Autònoma de Barcelona, Spain
Abstract
The Chinese government introduced the “Chinese Phonetic Notation Plan” (known as Pinyin) in 1958 to combat illiteracy, eventually formalising it as a standardised transcription system in 2012. The correct application of Pinyin orthographic rules is essential for language learning, international communication, and digitisation. This research starts from the premise that accurate transcription of Chinese text into Pinyin is crucial, yet difficult and tedious when done manually. The study therefore aims to assess the performance of various automatic Pinyin transcription tools, identify the problematic aspects of transcription, and determine whether customised systems can improve results while reducing user effort. It employs a multi-phase methodology comprising the analysis of representative transcription tools, a comparison of their errors, and the customisation of a chatbot for enhanced performance. The results reveal that most dedicated tools segment transcriptions at the character level rather than by word. General-purpose GenAI systems perform better than the dedicated tools, but none follows the rules consistently. Common problems involve reduplication, punctuation, the neutral tone, and word identification. Although DeepSeek performed better initially, a customised and trained version of ChatGPT-4 adhered more closely to Pinyin rules, though perfect accuracy proved unattainable. This research highlights the challenges of automated transcription and offers insights into future improvements for systems that assist users with Pinyin transcription.
Keywords
Pinyin, Chinese transcription, Pinyin converters, ChatGPT-4, DeepSeek
现有拼音转写工具评估:人工智能能否引领下一代技术?
罗飒岚
翻译、口译与东亚研究系,巴塞罗那自治大学, 西班牙
摘要
1958 年,中国政府颁布《汉语拼音方案》以推动扫盲运动; 2012 年,该方案被进一步确立为标准化转写体系。拼音正词法规则的准确应用直接影响语言学习、国际传播与信息处理。鉴于人工标注耗时费力,而汉语文本的拼音转写在上述场合重要且必须,本研究评估了当前主流自动拼音转写工具的表现,以识别现存问题,并探讨定制化系统能否在减轻人工负荷的同时提高转写精度。研究采用多阶段方法,包括典型工具测评、错误对比以及定制聊天机器人实验。结果显示,(1)大多数专用转写工具仍以单字为转写单元,未能实现词语级切分;(2)通用生成式人工智能系统的表现虽优于部分转写工具,但仍难以稳定地遵循正词法规则;(3)常见误差集中于叠词、标点符号、中性声调及词语切分。虽然 DeepSeek 在初始测试中暂居优势,经定制与训练后的ChatGPT-4 在遵循拼音规则方面却更胜一筹,然仍未达到完全准确。本研究呈现了自动转写实践中的主要挑战,并为后续系统的优化提供了实证参考。
关键词
拼音, 汉字转写, 拼音转写工具, ChatGPT-4, DeepSeek