Detecting and Integrating Multiword Expression into English-Arabic Statistical Machine Translation

Ebrahim S.; Hegazy, Doaa; Mostafa M.; El-Beltagy S.

Abstract


© 2017 The Author(s). This paper introduces a new method for detecting one type of English multiword expression (MWE), phrasal verbs, and integrating them into an English-Arabic phrase-based statistical machine translation (PBSMT) system. The method parses the English side of the parallel corpus, detects various linguistic patterns for phrasal verbs, and finally integrates them into the En-Ar PBSMT system. In addition, the paper explores the effect of cliticizing specific English words that have no Arabic equivalent. The results, reported as BLEU scores, show that some patterns achieved significant improvements over other patterns, although the baseline still achieves the highest score. The paper shows that translation quality could be improved by detecting more linguistic patterns and integrating them into the En-Ar SMT system with other integration methods. The results also indicate which path is worth following and support the view that linguistic features are not handled properly in statistically learned models.
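The detect-then-integrate idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the English tokens already carry Penn Treebank part-of-speech tags from some parser, treats any verb followed by a particle (tag `RP`) as a phrasal verb, and integrates it by joining the two words into a single token before training. The function names, the underscore joining convention, and the `max_gap` parameter are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# Assumption: each token arrives as (word, Penn Treebank POS tag),
# e.g. produced by an external parser or tagger.
Tagged = Tuple[str, str]


def find_particle(sentence: List[Tagged], verb_idx: int, max_gap: int) -> Optional[int]:
    """Return the index of a particle (tag RP) that follows the verb with at
    most max_gap tokens in between, or None if there is none."""
    last = min(verb_idx + max_gap + 1, len(sentence) - 1)
    for j in range(verb_idx + 1, last + 1):
        tag = sentence[j][1]
        if tag == "RP":
            return j
        if tag.startswith("VB"):
            # Another verb starts here; do not attach a later particle to this one.
            return None
    return None


def merge_phrasal_verbs(sentence: List[Tagged], max_gap: int = 2) -> List[str]:
    """Join each verb with its particle into one token (hypothetical
    convention: verb_particle) and drop the particle from its old position."""
    merged: List[str] = []
    skip = set()
    for i, (token, tag) in enumerate(sentence):
        if i in skip:
            continue
        j = find_particle(sentence, i, max_gap) if tag.startswith("VB") else None
        if j is None:
            merged.append(token)
        else:
            merged.append(f"{token}_{sentence[j][0]}")
            skip.add(j)
    return merged


if __name__ == "__main__":
    split_example = [("She", "PRP"), ("turned", "VBD"), ("the", "DT"),
                     ("light", "NN"), ("off", "RP")]
    print(" ".join(merge_phrasal_verbs(split_example)))
```

Rewriting the English side this way lets a phrase-based system align the phrasal verb as one unit, including the split pattern where the object sits between verb and particle. Whether that helps is an empirical question; the abstract reports that the baseline still achieved the highest BLEU score.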


Other data

Title Detecting and Integrating Multiword Expression into English-Arabic Statistical Machine Translation
Authors Ebrahim S. ; Hegazy, Doaa ; Mostafa M. ; El-Beltagy S. 
Issue Date 1-Jan-2017
Journal Procedia Computer Science 
111
http://api.elsevier.com/content/abstract/scopus_id/85037742623
117
DOI 10.1016/j.procs.2017.10.099
Scopus ID 2-s2.0-85037742623

Citations 7 in scopus
views 14 in Shams Scholar


Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.