The preponderance of content written in simplified Chinese characters on the Internet poses a hurdle to Taiwan’s development of machine-learning technology, artificial intelligence (AI) experts said.
The Trustworthy Artificial Intelligence Dialog Engine being developed by the National Science and Technology Council (NSTC) was billed by the government as a key program for bringing innovation to Taiwan’s tech sector.
However, experts are sounding the alarm that the initiative is outgunned by the competition in China, which enjoys superior funding, computing power and staffing, as well as a far larger amount of content available for training AI.
Photo: Reuters
The data put into generative AI has a deep impact on its behavior and capabilities, as the algorithm needs to be trained by reading texts, Lee Yuh-jye (李育杰), a distinguished research fellow at Academia Sinica’s Research Center for Information Technology Innovation, said on Saturday.
Academia Sinica’s own experimental chatbot, which embarrassed its researchers by claiming to be Chinese and calling Chinese President Xi Jinping (習近平) the leader of Taiwan, was an example of training gone awry, he said.
Web content from China that the bot was exposed to during its training, via the Llama 2 language model, was determined to be the source of the problem, Lee said, adding that the bot’s Chinese views stemmed from data it consumed.
This incident has shown that a domestic language model is needed for the nation to make progress in AI research, he said.
Although Taiwanese social media create a large amount of content, much of it consists of gibberish, short-lived buzzwords and flame war threads on Professional Technology Temple and Dcard, he said.
The dubious quality of the content means that the data have to be cleaned up before they can be used to train AI models, Lee said.
Taiwanese researchers prevent Chinese influence on the AI by converting simplified Chinese characters to traditional ones or by excluding material originating in China altogether, Taiwan Artificial Intelligence Association chairman Eric Huang (黃逸華) said.
The latter approach runs the risk of giving too little data to the algorithm, which could worsen the AI’s “hallucinations” and limit its versatility, he said.
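The two approaches Huang describes can be illustrated with a minimal Python sketch. It is not the method used by any of the researchers quoted here. The mapping table `S2T`, the function names and the `max_ratio` threshold are all hypothetical, and the sketch uses the share of simplified characters as a stand-in for "material originating in China," which is only a rough proxy. A real pipeline would need a full conversion table, and character-by-character conversion is not always one-to-one.

```python
# Hypothetical, deliberately tiny simplified-to-traditional table (illustration only).
S2T = {"国": "國", "学": "學", "习": "習", "语": "語", "训": "訓", "练": "練"}
_S2T_TABLE = str.maketrans(S2T)


def to_traditional(text: str) -> str:
    """Approach 1: convert simplified characters to traditional ones."""
    return text.translate(_S2T_TABLE)


def simplified_ratio(text: str) -> float:
    """Share of CJK characters in the text that appear in the simplified table."""
    cjk = [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]
    if not cjk:
        return 0.0
    return sum(ch in S2T for ch in cjk) / len(cjk)


def filter_corpus(docs: list[str], max_ratio: float = 0.1) -> list[str]:
    """Approach 2: drop documents that look like simplified-Chinese text.

    Assumption: script is used as a proxy for origin; max_ratio is arbitrary.
    """
    return [doc for doc in docs if simplified_ratio(doc) <= max_ratio]


if __name__ == "__main__":
    corpus = ["学习国语", "學習國語"]
    print([to_traditional(doc) for doc in corpus])  # conversion keeps both documents
    print(filter_corpus(corpus))                    # exclusion keeps only one
```

The sketch also shows the trade-off Huang points to: conversion keeps every document, while exclusion shrinks the corpus.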
The problem is decades in the making and is unlikely to be resolved soon, while making use of social media platforms would lead to significant cost increases due to the need to curate the data, he said.
Meanwhile, Tunghai University assistant law professor Chang Kai-hsin (張凱鑫) said legislation is needed to deal with the implications of generative AI for copyright and for Taiwan to make headway in machine learning.
Backlash from content creators over their creations being used in AI training has highlighted the legal problems that could arise from the large-scale data mining machine learning requires, he said.
The copyright walls erected around content have exacerbated the scarcity of text in traditional Chinese characters that could be used for Taiwan’s AI research, Chang said, adding that the problem especially hampers government-funded projects.
The safest legal way to use content, which is to obtain permission individually from each rightful owner, cannot be applied to AI training, which harvests tremendous amounts of data, he said.
Crawling the Internet for data is especially fraught with respect to copyrights, trade patents and privacy, Chang said.
The legal strategy of US enterprises engaged in machine learning, including OpenAI, is to invoke the fair use principle when sued, with their success in court being determined on a case-by-case basis, he said.
Taiwan does not have laws governing copyright issues originating from AI use, as the nation’s research on large language models has not yet produced a working specimen, Chang said.
The nation does have an advantage due to the high degree of specificity in Subsection 3 of the Copyright Act (著作權法), which does not need extensive reworking for regulating AI, he said.
The ownership of copyrights on AI-generated content would be calculated by the percentage of human contributions to the creation if the principles espoused in the law are upheld, Chang said.
The NSTC should consider the needs of each link in the AI industry’s supply chain when drafting the promised AI basic law, he said, adding that efforts to grow the sector require a suitable regulatory framework.