INFORMATICA JOURNAL

DOI LINK: https://doi.org/10.59671/MBULJ
Paper ID: MBULJ
Volume: 35
Issue: 2
Title: Boosting English-Amharic machine translation using corpus augmentation and Transformer
Abstract: The Transformer-based neural machine translation (NMT) model has been very successful in recent years and has become the new mainstream approach. However, applying it to low-resourced languages requires large amounts of data and efficient model configuration (hyper-parameter tuning) mechanisms. The scarcity of parallel texts is a bottleneck for high-quality (N)MT, especially for under-resourced languages such as Amharic. This paper therefore presents an attempt to improve English-Amharic MT by introducing three vanilla Transformer architectures with different hyper-parameter values. To obtain additional training material, offline token-level corpus augmentation was applied to a previously collected English-Amharic parallel corpus. Compared to previous work on Amharic MT, the best of the three Transformer models achieves state-of-the-art BLEU scores. This result was obtained by combining corpus augmentation techniques with hyper-parameter tuning.
Keywords: machine translation, Amharic language, corpus augmentation, NMT, Transformer
Authors: Yohannes Biadgligne, Kamel Smaili
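
For illustration only, the following Python sketch shows one simple form of offline token-level corpus augmentation applied to a parallel corpus. The specific operations (adjacent-token swaps and token drops on the source side only), their rates, and the example sentence pair are assumptions made for this sketch, not the augmentation recipe used in the paper; see the full PDF for the authors' actual method.

    # Minimal sketch of offline, token-level corpus augmentation for a
    # parallel corpus. The noising operations, rates, and the decision to
    # perturb only the source side are illustrative assumptions.
    import random

    def swap_adjacent(tokens, rate=0.1):
        """Randomly swap adjacent tokens with probability `rate`."""
        tokens = list(tokens)
        for i in range(len(tokens) - 1):
            if random.random() < rate:
                tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
        return tokens

    def drop_tokens(tokens, rate=0.1):
        """Randomly drop tokens, always keeping at least one."""
        kept = [t for t in tokens if random.random() >= rate]
        return kept or list(tokens[:1])

    def augment_pair(src, tgt, swap_rate=0.1, drop_rate=0.1):
        """Return a new (source, target) pair with token-level noise on the source."""
        noisy = drop_tokens(swap_adjacent(src.split(), swap_rate), drop_rate)
        return " ".join(noisy), tgt

    if __name__ == "__main__":
        random.seed(42)
        # Placeholder sentence pair; a real corpus would hold English-Amharic pairs.
        corpus = [("the cat sits on the mat", "<Amharic target sentence>")]
        # Keep the original pairs and append one augmented copy of each.
        augmented = corpus + [augment_pair(s, t) for s, t in corpus]
        for s, t in augmented:
            print(s, "|||", t)

The augmented pairs are written out once, before training (hence "offline"), so the enlarged corpus can be fed to any Transformer training pipeline unchanged.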