
DBpedia Type and Entity Detection Using Word Embeddings and N-gram Models

dc.contributor.author: Zhou, Hanqing
dc.contributor.supervisor: Zouaq, Amal
dc.contributor.supervisor: Inkpen, Diana
dc.date.accessioned: 2018-03-20T13:39:46Z
dc.date.available: 2018-03-20T13:39:46Z
dc.date.issued: 2018
dc.description.abstract: Nowadays, knowledge bases are increasingly used in Semantic Web tasks such as knowledge acquisition (Hellmann et al., 2013), disambiguation (Garcia et al., 2009), and named entity corpus construction (Hahm et al., 2014). DBpedia plays a central role in the Linked Open Data cloud; therefore, the quality of this knowledge base has become a central point of focus. However, DBpedia has several quality issues. In particular, it suffers from three major types of problems: a) invalid types for entities, b) missing types for entities, and c) invalid entities in the resources' descriptions. To enhance the quality of DBpedia, it is important to detect these invalid types and resources, as well as to complete the missing types. Accordingly, the three main goals of this thesis are: a) invalid entity type detection, to address invalid DBpedia types for entities; b) automatic entity typing, to address missing DBpedia types for entities; and c) invalid entity detection, to address invalid entities in the resource description of a DBpedia entity. We compare several methods for the detection of invalid types, the automatic typing of entities, and the detection of invalid entities in resource descriptions. In particular, we compare different classification and clustering algorithms based on various sets of features: entity embedding features (Skip-gram and CBOW models) and traditional n-gram features. We present evaluation results for 358 DBpedia classes extracted from the DBpedia ontology. The main contribution of this work is the development of automatic invalid type detection, automatic entity typing, and automatic invalid entity detection methods using clustering and classification. Our results show that entity embedding models, especially the Skip-gram model, usually perform better than n-gram models.
dc.identifier.uri: http://hdl.handle.net/10393/37324
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-21596
dc.language.iso: en
dc.publisher: Université d'Ottawa / University of Ottawa
dc.subject: Natural Language Processing
dc.subject: Semantic Web
dc.subject: Machine Learning
dc.subject: DBpedia
dc.subject: Entity Embeddings
dc.subject: N-grams
dc.title: DBpedia Type and Entity Detection Using Word Embeddings and N-gram Models
dc.type: Thesis
thesis.degree.discipline: Génie / Engineering
thesis.degree.level: Masters
thesis.degree.name: MCS
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science
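
The abstract describes typing entities and detecting invalid types from per-entity feature vectors (Skip-gram or CBOW embeddings, or n-gram features). As a minimal illustration only, and not the thesis's actual method, the sketch below uses a nearest-centroid rule: each type is represented by the mean vector of its asserted members, an untyped entity gets the type of its most similar centroid, and an entity whose asserted type is not its nearest centroid is flagged as a candidate invalid type. The function names, entity names, two-dimensional vectors, and type assignments are all hypothetical toy data, not real DBpedia embeddings.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroids(vectors, types):
    """Mean vector of the entities asserted to belong to each type."""
    sums = {}
    counts = defaultdict(int)
    for entity, t in types.items():
        vec = vectors[entity]
        if t not in sums:
            sums[t] = [0.0] * len(vec)
        sums[t] = [s + x for s, x in zip(sums[t], vec)]
        counts[t] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sums}


def predict_type(vec, cents):
    """Nearest-centroid typing: the type whose centroid is most similar to vec."""
    return max(cents, key=lambda t: cosine(vec, cents[t]))


def flag_suspicious(vectors, types):
    """Entities whose asserted type differs from their nearest-centroid type."""
    cents = centroids(vectors, types)
    return sorted(e for e, t in types.items() if predict_type(vectors[e], cents) != t)


# Hypothetical 2-D toy vectors; real entity embeddings from a Skip-gram or
# CBOW model would have far more dimensions.
vectors = {
    "Ottawa": [0.9, 0.1],
    "Toronto": [0.8, 0.2],
    "Montreal": [0.85, 0.15],
    "Einstein": [0.1, 0.9],
    "Curie": [0.2, 0.8],
    "Paris": [0.95, 0.05],
}
# Hypothetical type assertions; "Paris" is deliberately given a wrong type.
types = {
    "Ottawa": "City",
    "Toronto": "City",
    "Montreal": "City",
    "Einstein": "Person",
    "Curie": "Person",
    "Paris": "Person",
}

suspicious = flag_suspicious(vectors, types)  # → ['Paris']
new_type = predict_type([0.15, 0.85], centroids(vectors, types))  # → 'Person'
```

Because each mistyped entity still contributes to its asserted type's centroid, a class with many wrong members can mask its own errors; excluding the entity being checked from its centroid (leave-one-out) is one simple way to reduce that effect.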

Files

Original bundle

Name: Zhou_Hanqing_2018_thesis.pdf
Size: 2.64 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed to upon submission