KFU-developed anti-plagiarism tool for Tatar-language publications enters use
According to Radif Zamaletdinov, Director of the Institute of Philology and Intercultural Communication, automated plagiarism checking for indigenous languages is a pressing problem.
“Developing a program to search for borrowings in the languages of the peoples of Russia, including Tatar, is not an easy task. The structures of Russian and Tatar, for example, differ significantly, and there are no systematized databases of scientific information written in the languages of the peoples of Russia. Nevertheless, the task is very important, because we firmly believe this program will help assess the real scientific contribution of each Tatar scholar and encourage independence in research,” notes Dr Zamaletdinov.
The Center for Strategic Research in Indigenous Languages and Cultures has done extensive work, launching a unified Tatar-language scientific library: a corpus of texts for the plagiarism assessment model. An initial borrowing-detection model for Tatar-language scientific texts, based on word frequency analysis and homoglyph search, has been designed. Work has also begun on a model for detecting cross-lingual borrowings from publications in other languages and on integrating it into the borrowing search system. A test version of the program has been built on the corpus compiled so far, and the project has been launched in pilot mode. The search algorithms are now being optimized.
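The article does not say how the borrowing model works internally. As a rough illustration only, here is a minimal Python sketch of the two ideas it names. It assumes that "homoglyph search" means catching text in which Cyrillic letters have been swapped for visually identical Latin ones (a common trick for fooling plagiarism checkers), and it compares texts by cosine similarity of their word counts. The `HOMOGLYPHS` table, the `normalize`, `word_freq` and `similarity` functions, and the choice of cosine similarity are illustrative assumptions, not details of the KFU tool.

```python
import math
import re
from collections import Counter

# Hypothetical mapping from Latin characters to the Cyrillic (including
# Tatar-specific) letters they resemble. Escapes are used because the pairs
# look identical on screen. A real system would need a much larger table and
# a way to leave genuinely Latin-script passages untouched.
HOMOGLYPHS = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e",
    "p": "\u0440", "x": "\u0445", "y": "\u0443",
    "h": "\u04bb",       # Latin h -> Tatar shha (һ)
    "\u0259": "\u04d9",  # Latin schwa (ə) -> Tatar schwa (ә)
    "A": "\u0410", "B": "\u0412", "C": "\u0421", "E": "\u0415", "H": "\u041d",
    "K": "\u041a", "M": "\u041c", "O": "\u041e", "P": "\u0420", "T": "\u0422",
    "X": "\u0425",
    "\u018f": "\u04d8",  # Latin capital schwa (Ə) -> Tatar capital schwa (Ә)
}
_TABLE = str.maketrans(HOMOGLYPHS)


def normalize(text: str) -> str:
    """Replace Latin look-alikes with Cyrillic letters, then lowercase."""
    return text.translate(_TABLE).lower()


def word_freq(text: str) -> Counter:
    """Count words in the normalized text (\\w is Unicode-aware in Python 3)."""
    return Counter(re.findall(r"\w+", normalize(text)))


def similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-frequency vectors of two texts."""
    fa, fb = word_freq(a), word_freq(b)
    dot = sum(fa[w] * fb[w] for w in fa.keys() & fb.keys())
    norm_a = math.sqrt(sum(v * v for v in fa.values()))
    norm_b = math.sqrt(sum(v * v for v in fb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    original = "татар теле һәм әдәбияты"
    # Disguise the text by swapping Cyrillic а/ә for Latin a/ə look-alikes.
    disguised = original.replace("\u0430", "a").replace("\u04d9", "\u0259")
    print(disguised == original)                         # False
    print(round(similarity(original, disguised), 2))     # 1.0
```

Normalizing before counting means a text whose letters were swapped for look-alikes still scores as a full match. A practical system would also need stemming or lemmatization suited to Tatar's agglutinative morphology, which this sketch ignores.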
“We focused primarily on the web service. We also jointly wrote parsers for Tatar-language texts, since a primary text corpus had to be assembled, and tested common borrowing-detection algorithms. In the second stage, the work shifted to more advanced NLP methods and to the specific features of the Tatar language. It was important to determine what forms borrowings could take, how to expand the corpus of Tatar texts, and much more. This project is an excellent example of the synergy that comes from interdisciplinary collaboration between Tatar language specialists and software developers,” says Mikhail Abramsky, Director of the Institute of IT and Intelligent Systems.
A copyright certificate has been obtained for the anti-plagiarism software tool.