OPHELIA a neural solution for text classification using joint embeddings of words and KG entities
Main title
OPHELIA [electronic resource] : a neural solution for text classification using joint embeddings of words and KG entities / Liliane Soares da Costa ; advisor, Renato Fileto
Publication date
2023
Physical description
95 p. : ill.
Note
Available in online version only.
Thesis (doctorate) – Universidade Federal de Santa Catarina, Centro Tecnológico, Graduate Program in Computer Science, Florianópolis, 2023.
Includes references.
Abstract: The continuous expansion of textual data collection and dissemination has made text classification a crucial task for harnessing the massive amounts of digital text available today. Text classification aims to categorize a text document into one or more predefined categories within a specific application domain. Approaches that rely solely on the bag-of-words model to represent features can be hindered because that model ignores word order and word senses, which vary with context. Word embeddings have emerged to address these limitations, enabling significant performance improvements by condensing language knowledge into dense vectors. Likewise, real-world entity relationships expressed in knowledge graphs can be condensed into dense vectors through knowledge embeddings. However, existing approaches fail to consider knowledge embeddings in their models and thus do not fully leverage them. Traditional text representation models focus solely on words and therefore cannot differentiate between documents that share the same vocabulary but offer different perspectives on a subject. In this context, this work responds to the diverse applications of automatic text classification. It builds upon the potential of vector space representations and seeks to bridge the gap in understanding the semantics present in natural language data. The primary goal of this study is to advance research in text classification by incorporating semantic aspects into the representation of document collections. To achieve this, we propose OPHELIA, a Deep Neural Network (DNN) approach for text classification tasks using knowledge and word embeddings. OPHELIA exploits jointly trained embeddings of knowledge graphs and text.
These embeddings can provide more consolidated contextual information than separate embeddings of text and knowledge, and their use for enhancing text classification has not yet been sufficiently explored. FastText is used to jointly train word and knowledge embeddings, so that they are consistently integrated into a single embedded space. The neural architectures used in OPHELIA are a feedforward neural network and a capsule network. This thesis first provides a comprehensive review of the literature on text classification using embeddings as features. It then describes the algorithms and architectures that constitute OPHELIA. We conduct experiments with deep neural network models with varying numbers of hidden cells and hidden layers, evaluating each architecture with its optimal parameter combination to compare its performance against state-of-the-art approaches. Our results demonstrate that OPHELIA outperforms existing approaches on the BBC dataset and remains competitive on AG News and Reuters-21578.
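A common way to enable this kind of joint training (a minimal sketch, not the thesis's own pipeline) is to replace entity mentions in the corpus with single KG-entity tokens before running fastText, so that words and entities end up in one embedded space. The `ENT:` prefix, the entity identifiers, and the example mentions below are illustrative assumptions.

```python
# Illustrative preprocessing for joint word/entity embedding training:
# entity surface forms are rewritten as single entity tokens so that a
# standard fastText run (e.g. gensim.models.FastText) learns one vector
# per word and per KG entity in the same space.
# The ENT: prefix and the Wikidata-style IDs here are assumptions.

def link_entities(text, entity_mentions):
    """Replace each surface form with a single KG-entity token.

    entity_mentions maps a surface form (possibly multi-word) to an
    entity identifier. Longer mentions are replaced first, so that
    "New York Times" would win over "New York".
    """
    for surface in sorted(entity_mentions, key=len, reverse=True):
        token = "ENT:" + entity_mentions[surface]
        text = text.replace(surface, token)
    return text

# Toy example with two hypothetical entity mentions.
mentions = {"New York": "Q60", "United Nations": "Q1065"}
doc = "The United Nations met in New York last week"
print(link_entities(doc, mentions))
# → The ENT:Q1065 met in ENT:Q60 last week
```

Sentences preprocessed this way can be fed to fastText as ordinary token sequences; the entity tokens then receive dense vectors trained jointly with the surrounding words, which is the property OPHELIA exploits.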