A DISCOURSE-BASED INFORMATION RETRIEVAL FOR TAMIL LITERARY TEXTS

Authors

  • Anita Ramalingam, Department of Computer Science and Engineering, SRM Institute of Science and Technology, India
  • Subalalitha Chinnaudayar Navaneethakrish, Department of Computer Science and Engineering, SRM Institute of Science and Technology, India

DOI:

https://doi.org/10.32890/jict2021.20.3.4

Keywords:

Discourse parser, Morphological Analyzer, Inverted indexing, Ranking, Tamil information retrieval

Abstract

Tamil literature contains many valuable thoughts that can help people lead successful and happy lives. Tamil literary works are abundantly available and widely searched on the World Wide Web (WWW), but existing search systems follow a keyword-based matching strategy that fails to satisfy user needs. This creates a demand for a focused Information Retrieval system that semantically analyses Tamil literary text and thereby improves search performance. This paper proposes a novel Information Retrieval framework that uses discourse processing techniques to aid the semantic analysis and representation of Tamil literary text. The proposed framework has been tested using two ancient literary works, the Thirukkural and the Naladiyar, which were written during 300 BCE. The Thirukkural comprises 1330 couplets, each 7 words long, while the Naladiyar consists of 400 quatrains, each 15 words long. The proposed system, tested with all 1330 Thirukkural couplets and 400 Naladiyar quatrains, achieved a mean average precision (MAP) score of 89%. The performance of the proposed framework has been compared with Google Tamil search and with a keyword-based search, which is a reduced version of the proposed framework. Google Tamil search achieved a MAP score of 56% and the keyword-based method achieved a MAP score of 62%, which shows that discourse processing techniques improve the search performance of an Information Retrieval system.
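
For readers unfamiliar with the metric, the following is a minimal sketch of how a mean average precision (MAP) score is conventionally computed, assuming binary relevance judgements. It is not the authors' evaluation code; the function names and the document identifiers (such as "kural_1") are hypothetical and used only for illustration.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision for one query, assuming binary relevance."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision measured at the rank of each relevant document.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: List[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of the per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical rankings and relevance sets for two queries.
example_runs = [
    (["kural_1", "kural_7", "kural_3", "kural_9"], {"kural_1", "kural_3"}),
    (["nalad_2", "nalad_5"], {"nalad_5"}),
]
map_score = mean_average_precision(example_runs)
print(f"MAP = {map_score:.4f}")
```

Under this definition, a system that ranks relevant couplets or quatrains nearer the top of the result list scores higher, which is the sense in which the 89%, 62%, and 56% figures can be compared.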

Published

11-06-2021

How to Cite

Ramalingam, A., & Navaneethakrish, S. C. (2021). A DISCOURSE-BASED INFORMATION RETRIEVAL FOR TAMIL LITERARY TEXTS. Journal of Information and Communication Technology, 20(3), 353–389. https://doi.org/10.32890/jict2021.20.3.4