Machine learning improves Arabic language transcription capabilities
Thanks to advances in speech and natural language processing, there is hope that one day you may be able to ask your virtual assistant what the best salad ingredients are. Currently, it is possible to ask your home gadget to play music or open by voice command, a feature already found on some devices.
If you speak Moroccan, Algerian, Egyptian, Sudanese, or any of the other dialects of Arabic, which vary so widely from region to region that speakers of different dialects sometimes cannot understand one another, it is a different story. If your native language is Arabic, Finnish, Mongolian, Navajo, or any other language with a high level of morphological complexity, you may feel left out.
These complexities intrigued Ahmed Ali to find a solution. He is a chief engineer in the Arabic Language Technologies group at the Qatar Computing Research Institute (QCRI), part of Qatar Foundation's Hamad Bin Khalifa University, and founder of ArabicSpeech, a "community that exists for the benefit of Arabic speech science and speech technologies."
Ali was drawn to the idea of talking to cars, appliances, and gadgets many years ago while at IBM. "Can we build a machine capable of understanding different dialects: an Egyptian pediatrician to automate a prescription, a Syrian teacher to help children get the core parts of their lesson, or a Moroccan chef describing the best couscous recipe?" he asks. However, the algorithms that power those machines cannot sift through the approximately 30 varieties of Arabic, let alone understand them. Today, most speech recognition tools function only in English and a handful of other languages.
The coronavirus pandemic has further fueled the growing reliance on voice technologies, with natural language processing helping people comply with stay-at-home guidelines and physical distancing measures. However, as we use voice commands to help with e-commerce purchases and manage our households, even more applications lie ahead.
Millions of people around the world use massive open online courses (MOOCs) for their open access and unlimited participation. Speech recognition is one of the key features of MOOCs, allowing students to search within specific areas of the spoken content of courses and enabling translations via subtitles. Speech technology also makes it possible to digitize lectures and display spoken words as text in university classrooms.
According to a recent article in Speech Technology magazine, the voice and speech recognition market is expected to reach $26.8 billion by 2025, as millions of consumers and companies around the world come to rely on voice bots not only to interact with their appliances or cars but also to improve customer service, drive health care innovations, and improve accessibility and inclusivity for those with hearing, speech, or motor impairments.
In a 2019 survey, Capgemini predicted that by 2022, more than two out of three consumers would opt for voice assistants rather than visits to stores or bank branches; a share that could reasonably climb, given the homebound, physically distanced life and commerce that the global pandemic has forced upon the world for more than a year and a half.
However, these devices have failed to deliver in many parts of the world. For the roughly 30 varieties of Arabic and their millions of speakers, that is a largely missed opportunity.
Arabic for machines
English- or French-speaking voice bots are far from perfect. Yet teaching machines to understand Arabic is more difficult still, for several reasons. These are three commonly recognized challenges:
- Lack of diacritics. Arabic dialects are vernaculars, primarily spoken rather than written. Most of the available text is undiacritized, meaning it lacks the accent marks, such as the acute (´) or grave (`), that indicate the sound values of letters. Therefore, it is difficult to determine where the vowels go.
- Lack of resources. There is a dearth of labeled data for the different Arabic dialects. Collectively, they lack standardized orthographic rules that dictate how to write a language, including norms for spelling, hyphenation, word breaks, and emphasis. These resources are essential for training computer models, and the fact that so few of them exist has hindered the development of Arabic speech recognition.
- Morphological complexity. Arabic speakers engage in a lot of code switching. For example, in areas colonized by the French (North Africa, including Morocco, Algeria, and Tunisia) the dialects include many borrowed French words. Consequently, there is a high number of so-called out-of-vocabulary words, which speech recognition technologies cannot fathom because those words are not Arabic.
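To see why the missing diacritics described above matter, consider a minimal Python sketch (an illustration of the problem, not code from QCRI): it strips the optional short-vowel marks from two fully vocalized Arabic words with different pronunciations and meanings, showing that they collapse to the same undiacritized spelling.

```python
import re

# Arabic short-vowel and related diacritic marks (harakat) occupy
# the Unicode range U+064B..U+0652.
DIACRITICS = re.compile(r"[\u064B-\u0652]")

def strip_diacritics(text: str) -> str:
    """Remove the vowel marks, leaving only the consonantal skeleton."""
    return DIACRITICS.sub("", text)

# Two different words: kataba ("he wrote") vs. kutiba ("it was written").
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # كَتَبَ
kutiba = "\u0643\u064F\u062A\u0650\u0628\u064E"  # كُتِبَ

# Without diacritics both collapse to the same three letters (ktb),
# so a model reading undiacritized text cannot tell the vowels apart.
print(strip_diacritics(kataba) == strip_diacritics(kutiba))  # True
```

Since most written Arabic looks like the stripped form, a recognizer trained on such text must infer the vowels from context alone.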
"But the field is moving at lightning speed," Ali says, and it is a joint effort among many researchers to move it even faster. Ali's Arabic Language Technologies lab spearheaded the ArabicSpeech project to bring together Arabic transcriptions and the dialects native to each region. For example, Arabic dialects can be divided into four regional groups: North African, Egyptian, Gulf, and Levantine. However, since dialects do not follow borders, the breakdown can go as fine-grained as one dialect per city; for example, a native Egyptian speaker can distinguish a compatriot's Alexandrian dialect from an Aswan dialect (a 1,000-kilometer distance on the map).
Building a tech-savvy future for everyone
At this point, machines are about as accurate as human transcribers, thanks in large part to advances in deep neural networks, a subfield of machine learning in artificial intelligence that relies on algorithms inspired by how the human brain works, biologically and functionally. Until recently, however, speech recognition has been somewhat hand-stitched together. The technology has a history of relying on separate modules for acoustic modeling, building pronunciation lexicons, and language modeling, and all of these modules have to be trained separately. More recently, researchers have been training models that map acoustic features directly to text transcriptions, which makes it possible to optimize all the parts for the end task.
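One common building block of such end-to-end systems is CTC decoding, in which the model's per-frame label predictions are turned directly into a transcript. As a hedged sketch (the article does not describe QCRI's actual decoder; this is a standard technique), here is a minimal greedy CTC decoder in Python: it collapses repeated labels and removes the special blank symbol.

```python
from itertools import groupby

# Assumption for illustration: "_" stands in for the CTC blank symbol.
BLANK = "_"

def ctc_greedy_decode(frame_labels):
    """Apply the CTC collapse rule: merge consecutive duplicates,
    then drop blank symbols, yielding the final transcript."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in collapsed if label != BLANK)

# Hypothetical per-frame argmax output of an acoustic model:
frames = ["_", "s", "s", "_", "a", "a", "_", "l", "a", "a", "m"]
print(ctc_greedy_decode(frames))  # salam
```

Because the blank separates genuine repeats from duplicated frames, `["a", "a", "_", "a"]` decodes to `"aa"`, not `"a"`.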
Even with these advances, Ali still cannot give a voice command to most devices in his native Arabic. "It's 2021, and I still can't speak to many machines in my dialect," he comments. "I mean, now I have a device that can understand my English, but machine recognition of multi-dialect Arabic speech hasn't happened yet."
Making it happen has been the focus of Ali's work, which culminated in the first transformer for Arabic speech recognition and its dialects, one that has achieved hitherto unmatched performance. Called the QCRI Advanced Transcription System, the technology is now being used by the broadcasters Al-Jazeera, DW, and BBC to transcribe online content.
There are several reasons Ali and his team have succeeded in building these speech engines right now. Primarily, he says, "There is a need to have resources in all dialects. We need to build the resources to train the model." Advances in computer processing mean that computationally intensive machine learning now happens on graphics processing units, which can rapidly process and display complex graphics. As Ali says, "We have a good architecture, good modules, and we have data that represents the truth."
Researchers from QCRI and Kanari AI recently built models that achieve human parity in Arabic broadcast news. The system's impact was demonstrated by subtitling Al-Jazeera's daily reports. While the English human error rate (HER) is about 5.6%, the research revealed that the Arabic HER is significantly higher and can reach 10%, owing to the morphological complexity of the language and the lack of standard orthographic rules in dialectal Arabic. Thanks to recent advances in deep learning and end-to-end architectures, the Arabic speech recognition engine has managed to outperform native speakers in broadcast news.
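Error rates like the 5.6% and 10% figures above are conventionally computed as word error rate: the word-level edit distance between the system's hypothesis and a reference transcript, divided by the number of reference words. As a hedged sketch (not QCRI's evaluation code), a minimal Python implementation:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference words,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for the edit distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word in a ten-word reference gives a 10% error rate.
ref = "the news was read clearly by the anchor this morning"
hyp = "the news was read quickly by the anchor this morning"
print(word_error_rate(ref, hyp))  # 0.1
```

Note that because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.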
While Modern Standard Arabic speech recognition seems to work well, researchers from QCRI and Kanari AI are engrossed in testing the boundaries of dialectal processing and achieving great results. Since nobody speaks Modern Standard Arabic at home, attention to dialect is what we need if our voice assistants are to understand us.
This content was written by the Qatar Computing Research Institute, Hamad Bin Khalifa University, a member of Qatar Foundation. It was not written by MIT Technology Review's editorial staff.