AJOU Central Library Repository: A Graph Model Based Approach for Document Novelty Detection

A Graph Model Based Approach for Document Novelty Detection

DC Field	Value	Language
dc.contributor.advisor	Tae Sun Chung	-
dc.contributor.author	ARUL RAYAN NAVINO NIRMAL	-
dc.date.accessioned	2018-11-08T08:21:56Z	-
dc.date.available	2018-11-08T08:21:56Z	-
dc.date.issued	2015-08	-
dc.identifier.other	20293	-
dc.identifier.uri	https://dspace.ajou.ac.kr/handle/2018.oak/13150	-
dc.description	Thesis (Master's)--Ajou University Graduate School: Department of Computer Engineering, 2015. 8	-
dc.description.tableofcontents	ACKNOWLEDGEMENTS i ABSTRACT ii TABLE OF CONTENTS iii LIST OF FIGURES v LIST OF TABLES vi CHAPTER 1. Introduction 1 1.1 Motivation 1 1.2 Challenges 2 1.3 Contribution 3 CHAPTER 2. Related Work 4 2.1 Authorship Attribution 4 2.2 Document Classification 5 2.3 E-mails Classification and Categorization 6 2.4 Novelty Detection 7 CHAPTER 3. The Proposed Model 8 3.1 Overview of the model 8 3.2 Feature-Set Selection 11 3.2.1 Frequency method 12 3.2.2 TFIDF method 12 3.2.3 Graph-Model based approach 13 3.3 Text Representation 17 3.3.1 Binary representation 17 3.3.2 Frequency representation 17 3.3.3 TFIDF representation 18 3.3.4 Hadamard representation 18 3.3.5 Probability representations 18 3.4 Classifier Algorithms 20 3.4.1 Prototype algorithm 20 3.4.2 Nearest neighbor algorithm 21 3.4.3 Naive Bayes method 21 3.4.4 Auto encoder 21 3.4.5 One Class SVM 22 CHAPTER 4. Experimental Results 24 4.1 Dataset & Evaluation Parameters 24 4.1.1 Data collection 24 4.1.2 Evaluation Parameters 25 4.2 Experiment Setup 27 4.2.1 Cross validation 27 4.2.2 Implementation 27 4.3 Comparison with other techniques 29 4.3.1 Feature Set Selection 29 4.3.2 Text Representation 33 4.3.3 Classification Models 37 4.4 Evaluation of the Proposed Model 41 4.4.1 Effect of ?? 42 4.4.2 Effect of feature Size 44 4.5 Optimization 47 CHAPTER 5. Discussion and Conclusion 49 5.1 Possible Applications 49 5.2 Future Work 50 5.3 Conclusion 51 REFERENCES 52	-
dc.language.iso	eng	-
dc.publisher	The Graduate School, Ajou University	-
dc.rights	Ajou University theses are protected by copyright.	-
dc.title	A Graph Model Based Approach for Document Novelty Detection	-
dc.type	Thesis	-
dc.contributor.affiliation	Graduate School of Ajou University	-
dc.contributor.department	Graduate School, Department of Computer Engineering	-
dc.date.awarded	2015. 8	-
dc.description.degree	Master	-
dc.identifier.localId	705397	-
dc.identifier.url	http://dcoll.ajou.ac.kr:9080/dcollection/jsp/common/DcLoOrgPer.jsp?sItemId=000000020293	-
dc.subject.keyword	Computer Science	-
dc.subject.keyword	Text Mining	-
dc.subject.keyword	Novelty Detection	-
dc.description.alternativeAbstract	Document novelty detection is a concept learning problem in which the system learns only from positive documents belonging to a concept and, with that limited knowledge, attempts to detect negative cases. This work focuses on learning author style as a concept from a given set of documents, particularly e-mails. Because author attribution is harder for short texts such as e-mails than for longer documents, techniques originally designed for long documents prove inefficient on short texts. The main goal of this work is to address this shortcoming of existing algorithms in detecting aberrations in author style. A graph-model-based technique for feature set extraction from small documents is proposed and evaluated. In addition, two probability-based text representation schemes are developed to best represent a text document to an underlying one-class SVM classifier. The proposed models are compared and evaluated on the public Enron e-mail dataset. Combining graph-based feature set extraction with the inclusive compound probability-based text representation proved very efficient, so the effect of all controlling parameters is evaluated extensively to arrive at optimal values.	-
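
The one-class setting described in the abstract (train on positive documents only, then flag documents that do not fit) can be illustrated with a minimal sketch. This is not the thesis's method: it swaps in a plain TF-IDF representation and a prototype (centroid) classifier, both of which appear in the table of contents only as baseline options, instead of the graph-based feature selection, probability-based representation, and one-class SVM the author proposes. All function names, the whitespace tokenizer, the sample e-mails, and the threshold rule (lowest training similarity) are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency from the positive documents only."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector; terms never seen in training are ignored."""
    tf = Counter(tokenize(doc))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fit_prototype(docs):
    """Learn a centroid of the positive documents and a similarity threshold."""
    idf = fit_idf(docs)
    vectors = [tfidf(doc, idf) for doc in docs]
    centroid = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            centroid[term] += weight / len(vectors)
    centroid = dict(centroid)
    # Assumed rule: accept anything at least as similar as the worst training doc.
    threshold = min(cosine(vec, centroid) for vec in vectors)
    return idf, centroid, threshold


def is_novel(doc, model):
    """True when the document is less similar to the prototype than the threshold."""
    idf, centroid, threshold = model
    return cosine(tfidf(doc, idf), centroid) < threshold


# Hypothetical e-mails from a single author (the positive class).
train_emails = [
    "hi team please find the meeting notes below thanks",
    "hi team the meeting is moved to friday thanks",
    "please send the notes before friday thanks",
]
model = fit_prototype(train_emails)
print(is_novel("yo dude wanna grab pizza tonight lol", model))
print(is_novel(train_emails[0], model))
```

Setting the threshold from the training data alone keeps the sketch faithful to the one-class constraint, since no negative examples are used at any point. A real evaluation would hold out positive documents to estimate the false-alarm rate rather than reuse the training set.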

Appears in Collections: Graduate School of Ajou University > Department of Computer Engineering > 3. Theses(Master)

Files in This Item: There are no files associated with this item.

The AJOU Central Library Repository was built through the National Library of Korea's OAK distribution project.
