[This article belongs to Volume - 54, Issue - 02]
Gongcheng Kexue Yu Jishu/Advanced Engineering Science
Journal ID : AES-19-11-2022-454

Title : PARTS OF SPEECH TAGGING OF THE NYISHI LANGUAGE USING HMM
Joyir Siram [1], Koj Sambyo [2], Achyuth Sarkar [3]

Abstract :

A natural language is one that humans speak, write, or sign for everyday communication, in contrast to formal languages. Natural language processing refers to the computational techniques that allow a computer to process information expressed in natural language. Part-of-speech (POS) tagging for Nyishi is more challenging than for English because it must be solved together with the word identification problem. A POS tagger assigns the appropriate tag, such as noun, adjective, verb, or adverb, to each word of the input sentence. We adopt the Penn Treebank tag set conventions for the word tagging format. The tag set and the disambiguation rules are essential components of a POS tagger. The lack of a corpus for computational processing makes POS tagging for the Nyishi language challenging. Here, we discuss our work on Nyishi part-of-speech tagging based on first-order, fully connected hidden Markov models (HMMs). A corpus of about 30,000 Nyishi characters is used for training and testing. A Viterbi-based word identification algorithm divides an article into clauses and subsequently into words. Experimental results are presented for various testing scenarios; the system accurately tags 89% of the words in the testing data.
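To make the tagging step concrete, the following is a minimal sketch of Viterbi decoding over a first-order HMM, and is not the authors' implementation. It assumes count-based transition and emission estimates with add-one smoothing, a hypothetical start symbol `<s>`, and a toy English corpus labeled with three Penn Treebank tags (DT, NN, VBZ) standing in for Nyishi data, which is not reproduced in the abstract. The function names `train_hmm`, `log_prob`, and `viterbi` are illustrative only.

```python
import math
from collections import defaultdict

def train_hmm(tagged_sentences):
    """Collect transition and emission counts for a first-order HMM."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    tags, vocab = set(), set()
    for sent in tagged_sentences:
        prev = "<s>"  # hypothetical sentence-start symbol
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tags.add(tag)
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(tags), vocab

def log_prob(counts, context, event, n_outcomes):
    """Add-one smoothed log P(event | context); an assumed smoothing choice."""
    total = sum(counts[context].values())
    return math.log((counts[context][event] + 1) / (total + n_outcomes))

def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra outcome reserved for unseen words
    # best[i][t] = (log-probability of best path ending in t at i, previous tag)
    best = [{}]
    for t in tags:
        score = (log_prob(trans, "<s>", t, n_tags)
                 + log_prob(emit, t, words[0], n_words))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = log_prob(emit, t, words[i], n_words)
            # Every tag may follow every other tag (fully connected model).
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(trans, p, t, n_tags) + e, p)
                for p in tags
            )
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]

# Toy training data (illustrative, not from the Nyishi corpus).
TOY_CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]

trans, emit, tags, vocab = train_hmm(TOY_CORPUS)
print(viterbi(["the", "cat", "barks"], trans, emit, tags, vocab))
```

Working in log space avoids numerical underflow on long sentences. The word identification step described in the abstract would run before this decoder and is not shown here.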