[This article belongs to Volume - 55, Issue - 01, 2023]
Gongcheng Kexue Yu Jishu/Advanced Engineering Science
Journal ID : AES-05-1-2023-002

Title : HANDLING SPARSE AND MISSING TEXT DATA USING DEEP LEARNING APPROACH
Sowmya V¹, Dr M V Vijaya Kumar²

Abstract :

In recent years, deep learning techniques have transformed how natural language processing (NLP) is applied to real-world language generation tasks such as machine translation, text summarization, chatbots, and dialog generation. Many challenges remain, however, one of which is handling missing data. Real-world text datasets are often incomplete, with missing values arising from unrecorded observations, and this limits the usefulness of language generation models. Imputation techniques typically replace missing data with substituted values in order to preserve the information in the data. Unfortunately, most imputation methods operate on numerical data and are rarely applied to textual data, for which imputation remains a challenging problem. This paper proposes a deep learning-driven text imputation model that estimates the probability of each missing word in a sentence from the preceding and subsequent terms. A sequence-to-sequence language model is developed using a recurrent neural network and an attention mechanism. In addition, the study applies an iterative search optimization algorithm to the trained model to predict the most likely words and insert them at the missing positions, which may occur anywhere in the input text. The results indicate that the proposed imputation scheme consistently and efficiently replaces missing words with appropriate words in the given incomplete text. The proposed model is suitable for many real-world language generation and text analytics tasks.
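The idea of scoring a missing word from both its preceding and subsequent terms, and then searching iteratively for the most likely fill, can be illustrated with a minimal sketch. This is not the authors' model: a count-based bigram model with add-one smoothing stands in for the sequence-to-sequence recurrent network with attention, and a simple coordinate-wise search stands in for the iterative search optimization algorithm, whose details the abstract does not give. The `<missing>` placeholder, the toy corpus, and all function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

MASK = "<missing>"  # hypothetical placeholder marking a missing word


def train_bigrams(sentences):
    """Count adjacent word pairs; return (bigram counts, sorted vocabulary)."""
    counts = defaultdict(Counter)
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens[1:-1])
        for left, right in zip(tokens, tokens[1:]):
            counts[left][right] += 1
    return counts, sorted(vocab)


def pair_prob(counts, vocab_size, left, right):
    """Add-one smoothed estimate of P(right | left)."""
    total = sum(counts[left].values())
    return (counts[left][right] + 1) / (total + vocab_size)


def score_candidate(counts, vocab, tokens, i, word):
    """Score `word` at position i using its left and right neighbours."""
    left, right = tokens[i - 1], tokens[i + 1]
    score = 1.0
    if left != MASK:   # preceding context, if already known or filled
        score *= pair_prob(counts, len(vocab), left, word)
    if right != MASK:  # subsequent context, if already known or filled
        score *= pair_prob(counts, len(vocab), word, right)
    return score


def impute(sentence, counts, vocab, max_passes=5):
    """Fill every gap, revisiting gaps until no fill changes."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    gaps = [i for i, tok in enumerate(tokens) if tok == MASK]
    for _ in range(max_passes):
        changed = False
        for i in gaps:
            best = max(
                vocab,
                key=lambda w: score_candidate(counts, vocab, tokens, i, w),
            )
            if best != tokens[i]:
                tokens[i] = best
                changed = True
        if not changed:  # converged: another pass would alter nothing
            break
    return " ".join(tokens[1:-1])


# Toy corpus, invented for this sketch.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]
counts, vocab = train_bigrams(corpus)
print(impute("the cat <missing> on the mat", counts, vocab))
```

Repeated passes matter when gaps are adjacent: on the first pass a gap next to another gap is scored from one side only, and later passes rescore it once its neighbour has been filled. A neural model would replace `pair_prob` with learned conditional probabilities, but the search loop would keep the same shape.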