Improving Language Modeling Performance by Combining Various Korean Representation Methods in an Encoder-Decoder Model

Alternative Title
Jaeyeon Lee
Author(s)
이재연
Alternative Author(s)
Jaeyeon Lee
Advisor
손경아
Department
Graduate School, Department of Computer Engineering
Publisher
The Graduate School, Ajou University
Publication Year
2017-02
Language
eng
Keyword
Deep Learning; NLP; Korean Language Model; Sequence-to-Sequence; Encoder-Decoder
Alternative Abstract
Most successful neural language models have focused on English and operate at the word level. Although word-level neural language models perform well, word-level tokens are not easy to obtain in some languages, Korean in particular. In natural language processing for Korean, morpheme-level approaches are commonly used in place of English-style words. However, they introduce a dependency on an external morphological analyzer. Character-level approaches are therefore preferred for Korean neural language modeling, but Korean can be represented at the character level in several ways, which we call the Korean-letter level and Korean-grapheme level representations. In this thesis, we investigate which representation of Korean works best by evaluating on three tasks: reconstruction, marker classification, and spelling correction. Since no public Korean datasets exist for these three tasks, we first generate the datasets from a public Korean corpus. Furthermore, we propose an architecture that can effectively combine multiple representations for sequence-to-sequence problems in Korean. In our experiments, the proposed architecture outperforms traditional architectures that use only one of the representations.
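The two character-level representations named in the abstract can be illustrated with a short sketch. It assumes that "Korean-letter level" means whole syllable blocks and "Korean-grapheme level" means the individual jamo inside each block; this record does not define the terms, so that reading is an assumption. The function names `to_letters` and `to_graphemes` are hypothetical and are not taken from the thesis. The decomposition uses the standard Unicode arithmetic for precomposed Hangul syllables.

```python
# Illustrative sketch only: the thesis's own preprocessing is not shown in this record.
# Assumption: "letter" = syllable block, "grapheme" = individual jamo.

HANGUL_BASE = 0xAC00   # first precomposed syllable, "가"
HANGUL_LAST = 0xD7A3   # last precomposed syllable, "힣"

# Jamo tables in Unicode order: 19 initial consonants, 21 vowels, 28 finals (first is "none").
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def to_letters(text):
    """Letter-level view: one token per character (syllable block)."""
    return list(text)


def to_graphemes(text):
    """Grapheme-level view: split each Hangul syllable into its jamo.

    Characters outside the precomposed Hangul range pass through unchanged.
    """
    tokens = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            lead, rest = divmod(index, 21 * 28)   # 588 syllables per initial consonant
            vowel, tail = divmod(rest, 28)        # 28 final slots per vowel
            tokens.append(LEADS[lead])
            tokens.append(VOWELS[vowel])
            if TAILS[tail]:                       # skip the empty "no final" slot
                tokens.append(TAILS[tail])
        else:
            tokens.append(ch)
    return tokens


if __name__ == "__main__":
    word = "한국"
    print(to_letters(word))
    print(to_graphemes(word))
```

Under this reading, the letter-level sequence is shorter but draws on a vocabulary of thousands of syllable blocks, while the grapheme-level sequence is longer but uses only a few dozen symbols.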
URI
https://dspace.ajou.ac.kr/handle/2018.oak/11137
Appears in Collections:
Graduate School of Ajou University > Department of Computer Engineering > 3. Theses(Master)
Files in This Item:
There are no files associated with this item.
