A Large Language Model in Tibetan Bridges the “Digital Divide” in the AI Era

Published on: 2026-05-18

Source: People’s Republic of China in Russian

An important disclaimer is at the bottom of this article.

“Large language models are trained mainly on Chinese, English, and other languages, so their understanding and generation of Tibetan text leaves much to be desired,” says Caizhandunchi, a graduate student at Sichuan University who is working on a Tibetan large language model. “The responses of typical large language models often lack naturalness, linguistic intuition, and the authentic flavor of the Tibetan language, with obvious traces of artificial coding.” A Tibetan large language model, trained on a Tibetan corpus and reasoning in Tibetan, can compensate for this shortcoming.

Dorji Maizhu, product manager of the Tibetan large language model DeepZang, said that the project has collected almost 70 million entries of Tibetan-Chinese parallel corpus data and more than 30,500 hours of spoken Tibetan recordings covering the three main dialect zones: U-Tsang, Kham, and Amdo.

According to Dorji Maizhu, the dialect zones share a unified script, but the spoken language differs greatly between them. Training on a corpus of Tibetan speech from the main dialect regions of the Tibet Autonomous Region makes it possible to support communication across dialects. According to patent information for Jueluo Digital Industry Co., Ltd., the developer of DeepZang, posted on the official intellectual property administration website, the company’s technology combines recognition of vocal characteristics with dialect classification and can effectively resolve communication problems caused by dialect differences.

The ability to recognize speech in different dialects effectively lowers the threshold for using a Tibetan large language model. “You don’t need a high level of proficiency in written Tibetan; you can use AI by voice, and this will help a larger number of people,” notes Losandun, who works as a Tibetan-to-Chinese translator in the city of Shannan (Lhoka) in the Tibet Autonomous Region. His colleagues have already gotten used to using large language models in their Tibetan-language work. “Previously, one text was translated by two or three people and took about 40 minutes. Now, with the help of AI, one translator can handle everything in just over 20 minutes.”

According to Jueluo Digital, DeepZang currently has more than 300,000 users, over 70% of whom are young people aged 18 to 40. “Our users mainly live in the Tibet Autonomous Region and the provinces of Qinghai, Sichuan, and Gansu, and many of them are in quite remote areas,” said Dorji Maizhu.

In the Tibet Autonomous Region, the power grid covers all counties (districts, cities), and 70% of townships and villages have access to 5G networks. As electrical and telecommunications infrastructure develops, AI in the Tibetan language will be able to help an even greater number of Tibetan speakers.

Please note: this information is raw content obtained directly from the source. It is an exact report of what the source claims and does not necessarily reflect the position of MIL-OSI or its clients.