DBpedia Type and Entity Detection Using Word Embeddings and N-gram Models

Zhou, Hanqing

dc.contributor.author	Zhou, Hanqing
dc.contributor.supervisor	Zouaq, Amal
dc.contributor.supervisor	Inkpen, Diana
dc.date.accessioned	2018-03-20T13:39:46Z
dc.date.available	2018-03-20T13:39:46Z
dc.date.issued	2018
dc.description.abstract	Knowledge bases are increasingly used in Semantic Web tasks such as knowledge acquisition (Hellmann et al., 2013), disambiguation (Garcia et al., 2009), and named entity corpus construction (Hahm et al., 2014). DBpedia plays a central role in the Linked Open Data cloud, so the quality of this knowledge base has become a central concern. However, DBpedia has several quality issues. In particular, it suffers from three major types of problems: a) invalid types for entities, b) missing types for entities, and c) invalid entities in the resources' descriptions. To improve the quality of DBpedia, it is important to detect these invalid types and resources and to complete the missing types. The three main goals of this thesis are: a) invalid entity type detection, to address invalid DBpedia types for entities; b) automatic entity typing, to address missing DBpedia types for entities; and c) invalid entity detection, to address invalid entities in the resource description of a DBpedia entity. We compare several methods for invalid type detection, automatic entity typing, and invalid entity detection in resource descriptions. In particular, we compare different classification and clustering algorithms based on various sets of features: entity embedding features (Skip-gram and CBOW models) and traditional n-gram features. We present evaluation results for 358 DBpedia classes extracted from the DBpedia ontology. The main contribution of this work is the development of automatic methods for invalid type detection, entity typing, and invalid entity detection using clustering and classification. Our results show that entity embedding models, especially the Skip-gram model, usually perform better than n-gram models.	en
dc.identifier.uri	http://hdl.handle.net/10393/37324
dc.identifier.uri	http://dx.doi.org/10.20381/ruor-21596
dc.language.iso	en	en
dc.publisher	Université d'Ottawa / University of Ottawa	en
dc.subject	Natural Language Processing	en
dc.subject	Semantic Web	en
dc.subject	Machine Learning	en
dc.subject	DBpedia	en
dc.subject	Entity Embeddings	en
dc.subject	N-grams	en
dc.title	DBpedia Type and Entity Detection Using Word Embeddings and N-gram Models	en
dc.type	Thesis	en
thesis.degree.discipline	Génie / Engineering	en
thesis.degree.level	Masters	en
thesis.degree.name	MCS	en
uottawa.department	Science informatique et génie électrique / Electrical Engineering and Computer Science	en
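The abstract contrasts n-gram features with entity embeddings for detecting invalid entity types. As a rough illustration only, the Python sketch below represents each entity by a bag of word unigrams and bigrams taken from a short text description. This is an assumption, since the record does not say what text the thesis's n-grams are drawn from. The sketch flags an entity as a candidate invalid type when its cosine similarity to the centroid of the other entities sharing its type falls below a threshold. This leave-one-out nearest-centroid check is a simple stand-in, not the clustering or classification methods evaluated in the thesis. The function names, example entities, and the 0.2 threshold are all hypothetical.

```python
from collections import Counter
import math


def ngrams(text, n):
    """Word n-grams (as tuples) of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_vector(text, n_values=(1, 2)):
    """Sparse bag-of-n-grams count vector, pooling several n (assumed: 1 and 2)."""
    counts = Counter()
    for n in n_values:
        counts.update(ngrams(text, n))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds counts key by key
    return {k: c / len(vectors) for k, c in total.items()}


def flag_invalid(entities, threshold=0.2):
    """Hypothetical check: `entities` maps entity name -> description for
    entities that all carry the same asserted type. Returns the names whose
    similarity to the centroid of the *other* entities is below `threshold`."""
    vectors = {name: ngram_vector(text) for name, text in entities.items()}
    flagged = []
    for name, vec in vectors.items():
        others = [v for other, v in vectors.items() if other != name]
        if others and cosine(vec, centroid(others)) < threshold:
            flagged.append(name)
    return flagged


# Hypothetical toy data: three people and one river all typed as Person.
persons = {
    "Alan_Turing": "british mathematician and computer scientist born in london",
    "Ada_Lovelace": "english mathematician and writer born in london",
    "Grace_Hopper": "american computer scientist and mathematician born in new york",
    "Danube": "river flowing through central europe into the black sea",
}

print(flag_invalid(persons))  # prints ['Danube']
```

Because `cosine` and `centroid` work on any dict-based vector, the same check could in principle be run on dense entity embeddings (for example, Skip-gram or CBOW vectors stored as index-to-value dicts) instead of n-gram counts. That substitution is the kind of feature comparison the abstract describes.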

Files

Original bundle

Name: Zhou_Hanqing_2018_thesis.pdf
Size: 2.64 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission

Collections

- Theses, 2011 -