A Step-Wise Methodology using Semi-Supervised Topic Modelling to Recommend Contextual Relationships for an Ontology Engineer

Author(s)
SAWYER JONATHAN PETER
Advisor
Seok-Won Lee
Department
Department of Computer Engineering, Graduate School
Publisher
The Graduate School, Ajou University
Publication Year
2016-08
Language
eng
Keyword
Topic Modelling; Ontology Learning; Labelled-LDA; WordNET
Alternative Abstract
Students face an increasing amount of complex decision making over the course of their studies. Those decisions might involve choosing a degree that aligns with their career objectives. They may have a specific curiosity about mathematics or science that they do not entirely understand and wish to explore further, which may lead them to a search engine. This in itself can be challenging, because the learner has little knowledge of the subject matter to begin with and may not know how to frame an enquiry. They may search for "Computer Science" but find it hard to understand the purpose or content of the subject. We therefore found it necessary to create a frequently asked questions system called GOLD. One of the challenges of building such a system is extracting the large amounts of information hidden in a corpus, so we wanted a high-performance algorithm to gather topics from large corpora. This led us to investigate different approaches, including Naive Bayes and Support Vector Machines, both of which can be used to classify text. After careful examination of their suitability for our context, we decided that the family of topic modelling algorithms based on Latent Dirichlet Allocation (LDA) was the most suitable. Two semi-supervised algorithms, Labeled Latent Dirichlet Allocation (L-LDA) and Partial Latent Dirichlet Allocation (P-LDA), both provide a means of supervising topic models around specific topic labels. The next challenge is determining suitable labels to train our topic models. Our approach was to generate a set of concepts that make up all possible answers that can be generated from the ontology. We organise these concepts into concept hierarchies, which represent the logical flow of information. These form what we call the known, or visible, layer of our system.
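L-LDA supervises a topic model by allowing each document to use only the topics named by its own labels. As a rough illustration of that constraint, the following is a minimal pure-Python collapsed Gibbs sampler for Labeled LDA (Ramage et al., 2009). It is a sketch under stated assumptions, not the thesis's implementation or the Stanford toolbox's code; the function name `train_labeled_lda`, the hyperparameter defaults and the toy corpus are hypothetical.

```python
import random
from collections import defaultdict


def train_labeled_lda(docs, labels, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for Labeled LDA (illustrative sketch).

    docs:   list of token lists
    labels: list of non-empty label lists, one per document
    Returns {label: {word: P(word | label)}}.
    Hyperparameter defaults are hypothetical, not taken from the thesis.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    topics = sorted({lab for ls in labels for lab in ls})
    n_tw = {t: defaultdict(int) for t in topics}  # topic-word counts
    n_t = defaultdict(int)                         # tokens assigned to each topic
    n_dt = [defaultdict(int) for _ in docs]        # document-topic counts

    def update(d, w, t, delta):
        n_tw[t][w] += delta
        n_t[t] += delta
        n_dt[d][t] += delta

    # Initialise: each token gets a random topic drawn from its own document's labels.
    z = []
    for d, (doc, ls) in enumerate(zip(docs, labels)):
        z.append([rng.choice(ls) for _ in doc])
        for w, t in zip(doc, z[d]):
            update(d, w, t, +1)

    for _ in range(iterations):
        for d, (doc, ls) in enumerate(zip(docs, labels)):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)
                # The supervision: only this document's labels are candidate topics.
                weights = [
                    (n_dt[d][k] + alpha) * (n_tw[k][w] + beta) / (n_t[k] + V * beta)
                    for k in ls
                ]
                z[d][i] = rng.choices(ls, weights=weights)[0]
                update(d, w, z[d][i], +1)

    # Smoothed topic-word distributions; each sums to 1 over the vocabulary.
    return {
        t: {w: (n_tw[t][w] + beta) / (n_t[t] + V * beta) for w in vocab}
        for t in topics
    }
```

For a document with a single label every token is pinned to that label, so the labels act as hard constraints; sampling only redistributes tokens in documents that carry several labels.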
The hidden layer of the system consists of a corpus, the concept trees from the known layer, a local taxonomy, and a label selection algorithm that chooses the most suitable labels for training our L-LDA models. This pre-processing step flags unsuitable labels and constructs a CSV file for consumption by our Scala code, which is subsequently fed into the Stanford Topic Modeling Toolbox. The results form the basis for evaluating the completeness of the ontology and of the derived ontology answers. We then approached the problem from a different angle by focusing on providing new branches and relationships that did not exist within the ontology. First, we created a tool that uses topic modelling to discover topics related to a concept and provides an interface to WordNet for exploring connected graphs of words. Second, we used Brill's part-of-speech tagger to tag the words in the topic models with their parts of speech, and used those tags to construct relationships between existing nodes and new concepts. We evaluated this system using the case study methodology, a set of topic model evaluation experiments, and two separate surveys: one on the quality of the topic models and the other on the ontology analysis results.
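The label selection and CSV construction step can be sketched as follows, under loudly stated assumptions. The abstract does not give the selection criteria or the CSV layout, so here a label is (hypothetically) treated as unsuitable if it appears in fewer than `min_docs` documents or is absent from the local taxonomy, and each row is written as `id, labels, text` with labels space-separated. The function names `select_labels` and `write_training_csv` are illustrative only.

```python
import csv
import io
from collections import Counter


def select_labels(doc_labels, taxonomy, min_docs=2):
    """Split candidate labels into (suitable, unsuitable) sets.

    Hypothetical criteria: a label is suitable only if it appears in at least
    `min_docs` documents and also exists in the local taxonomy.
    """
    counts = Counter(label for labels in doc_labels for label in set(labels))
    suitable = {lab for lab, c in counts.items() if c >= min_docs and lab in taxonomy}
    return suitable, set(counts) - suitable


def write_training_csv(rows, suitable, out):
    """Write (id, labels, text) rows, keeping only suitable labels.

    Assumed layout: labels are space-separated, so labels must not contain spaces.
    Documents left with no suitable label are skipped, because L-LDA needs at
    least one label per document.
    """
    writer = csv.writer(out)
    for doc_id, labels, text in rows:
        kept = [label for label in labels if label in suitable]
        if kept:
            writer.writerow([doc_id, " ".join(kept), text])


if __name__ == "__main__":
    rows = [
        ("d1", ["computer_science", "algorithm"], "sorting algorithms"),
        ("d2", ["computer_science"], "how compilers work"),
        ("d3", ["misc"], "unrelated text"),
    ]
    suitable, unsuitable = select_labels([r[1] for r in rows], {"computer_science", "algorithm"})
    print("unsuitable:", sorted(unsuitable))
    buf = io.StringIO()
    write_training_csv(rows, suitable, buf)
    print(buf.getvalue())
```

Keeping the unsuitable set as an explicit return value mirrors the abstract's point that this step highlights unsuitable labels rather than silently discarding them.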
URI
https://dspace.ajou.ac.kr/handle/2018.oak/13285
Appears in Collections:
Graduate School of Ajou University > Department of Computer Engineering > Theses (Master)