Automatic generation of the multilingual dictionary of synonyms for crime-related terms and words
26/04/2021, 11:00, online via Teams
In this study we propose an approach to the automated generation of a multilingual dictionary for the criminal domain. We divide the process into three steps. First, we manually designed the structure of the dictionary as an XML document and filled it with basic vocabulary. The second step is the construction of two corpora focused on criminal topics. The first is a multilingual corpus comprising texts in Russian, Ukrainian and English. The second is a parallel Russian-Kazakh corpus.
The last step is the automatic filling and extension of the dictionary, which relies on facts found in the corpus texts. In the most common case, a fact is a triplet of Subject, Predicate and Object. If two components of a triplet are found in our basic dictionary and the third is not, the missing component is automatically added to the dictionary.
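The two-of-three rule described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes the facts have already been extracted as triplets, treats the basic dictionary as a flat set of terms (ignoring languages and synonym groups), and uses hypothetical names (`Fact`, `extend_dictionary`) and invented sample data.

```python
from typing import Iterable, NamedTuple


class Fact(NamedTuple):
    """A fact as a (subject, predicate, object) triplet; the name is hypothetical."""
    subject: str
    predicate: str
    obj: str


def extend_dictionary(basic: set[str], facts: Iterable[Fact]) -> set[str]:
    """Return a copy of `basic` extended by the two-of-three rule (assumed reading)."""
    extended = set(basic)
    for fact in facts:
        # Components of the triplet that the basic dictionary does not contain.
        missing = [term for term in fact if term not in basic]
        # Exactly one unknown component means the other two are already known.
        if len(missing) == 1:
            extended.add(missing[0])
    return extended


# Invented sample data, for illustration only.
basic_terms = {"thief", "steal", "police", "arrest"}
facts = [
    Fact("burglar", "steal", "jewellery"),  # two unknown components: skipped
    Fact("police", "arrest", "burglar"),    # one unknown component: added
    Fact("thief", "steal", "wallet"),       # one unknown component: added
]
result = extend_dictionary(basic_terms, facts)
print(sorted(result - basic_terms))  # ['burglar', 'wallet']
```

In this sketch each triplet is checked against the basic dictionary only, so newly added terms do not trigger further additions within the same pass. Whether the extension should be iterated is a design choice that the description above leaves open.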
Keywords: criminal topics, multilingual corpus, parallel corpus, logical-linguistic model