A Hierarchical Part of Speech Tag set for Saraiki Language
Keywords:
Natural Language Processing, Language Processing, Saraiki Language, POS Tagging, Tag Set
Abstract
Human languages are complex because expression takes diverse forms, whether spoken or written. Natural Language Processing (NLP) combines linguistics with Artificial Intelligence (AI) techniques to enable computers to understand natural languages as humans do. Each language has its own grammar, syntax, vocabulary, slang and rules. Language processing relies on tools that play a vital role in the construction, analysis and manipulation of a language. Part of Speech (POS) tagging is one of the essential, basic processes on which other NLP applications depend. POS tagging is the process of assigning each word its part of speech, such as noun, verb, preposition or adverb. It requires a suitable POS tagset, which is used to label distinct parts of the text with grammatical annotations. This helps to identify the linguistic features of a word, phrase or discourse in a text corpus during the annotation process. The Saraiki language (SL) is one of the ancient regional languages of present-day central Pakistan. This research work focuses on the development of an SL POS tagset that will help the community to manipulate SL resources digitally. The SL tagset consists of forty-seven classes of words, along with sub-classes and their tags.
License
Copyright (c) 2023 Southern Journal of Arts & Humanities
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.