A Technique for Improving Proper Noun Recognition Accuracy Using Phoneme Statistics

DC Field | Value | Language
dc.contributor.advisor | 노병희 | -
dc.contributor.author | 김규석 | -
dc.date.accessioned | 2019-08-13T16:41:12Z | -
dc.date.available | 2019-08-13T16:41:12Z | -
dc.date.issued | 2019-08 | -
dc.identifier.other | 29033 | -
dc.identifier.uri | https://dspace.ajou.ac.kr/handle/2018.oak/15555 | -
dc.description | Thesis (Master's)--Ajou University: Information and Communication, 2019. 8 | -
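dc.description.abstract | In the era of the Fourth Industrial Revolution, voice recognition interfaces are widely used for device control and for information retrieval tasks such as placing calls, finding locations, and checking the weather. Current voice recognition interfaces raise recognition accuracy by learning intonation and pronunciation with machine learning techniques built on accumulated data, such as deep learning, and in some cases by drawing on reference data. However, as new things are continually created, the number of proper nouns keeps growing. Moreover, a language such as Korean has fewer speakers than Chinese or English, so less training data is available. To improve the recognition accuracy of proper nouns, it is therefore necessary to improve acoustic speech recognition and post-correction techniques rather than rely on machine learning driven by data accumulation, such as deep learning. This thesis computes TF (term frequency) statistics over the phonemes of one or more N-best voice recognition results. When no reference data exists, the result with the highest statistic is selected; when reference data exists, the reference entry that matches a result is selected. If no reference entry matches, the result is selected based on the LED (Levenshtein Edit Distance) value. To verify the proposed algorithm, Google Voice is used together with a technique that generates a new word by combining the highest TF values at each index. Experiments with proper nouns that machine learning has rarely encountered in its accumulated data confirmed that the proposed method improves the voice recognition accuracy of proper nouns. | -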
dc.description.tableofcontents |
    1. Introduction
        1.1. Research Background
        1.2. Research Need and Purpose
        1.3. Research Scope and Organization
    2. Related Work
        2.1. Characteristics of the Voice Recognition Interface
            2.1.1. Principles of Voice Recognition and Correction Techniques
            2.1.2. Internal Structure of the Voice Recognition API
            2.1.3. Usage Frequency of the Voice Recognition Interface by Situation
            2.1.4. Voice Recognition Commands by Situation
        2.2. Related Technologies
            2.2.1. Soundex
            2.2.2. Levenshtein Edit Distance
            2.2.3. TF-IDF (Term Frequency-Inverse Document Frequency)
    3. Improving Proper Noun Recognition Accuracy Using TF and LED
        3.1. Definition of Variables and Proper Nouns
        3.2. Computing Per-Phoneme TF for Proper Noun Voice Recognition Results
            3.2.1. Definition of the Per-Phoneme Index
            3.2.2. Per-Index TF Calculation
        3.3. Proposed Method
    4. Simulation and Evaluation
        4.1. Simulation Setup
        4.2. Results With and Without Reference Data
        4.3. Selection Criteria for Phonemes with Equal TF Values
        4.4. Criteria for Determining the Presence of a Final Consonant
        4.5. TF Selection Criteria in the Results
        4.6. Difference from the Results Without Reference Data
        4.7. Results Using Google Voice and TF When No Reference Data Exists
        4.8. Results Using Soundex and TF/LED When Reference Data Exists
    5. Conclusion
    References
    Abstract
| -
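dc.language.iso | kor | -
dc.publisher | The Graduate School, Ajou University | -
dc.rights | Ajou University theses are protected by copyright. | -
dc.title | A Technique for Improving Proper Noun Recognition Accuracy Using Phoneme Statistics | -
dc.title.alternative | Kyuseok Kim | -
dc.type | Thesis | -
dc.contributor.affiliation | Ajou University Graduate School of Information and Communication | -
dc.contributor.alternativeName | Kyuseok Kim | -
dc.contributor.department | Graduate School of Information and Communication, Information and Communication | -
dc.date.awarded | 2019. 8 | -
dc.description.degree | Master | -
dc.identifier.localId | 952159 | -
dc.identifier.uci | I804:41038-000000029033 | -
dc.identifier.url | http://dcoll.ajou.ac.kr:9080/dcollection/common/orgView/000000029033 | -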
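
The selection procedure summarized in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not give the exact index definition, tie-breaking rule, or what the LED is measured against. The sketch assumes that an index is a (syllable position, jamo slot) pair, that only candidates of the most common length are counted, that ties go to the higher-ranked N-best candidate, and that the LED is computed per syllable between the TF-combined word and each reference entry. The function names `combine_by_tf`, `levenshtein`, and `select` are hypothetical.

```python
from collections import Counter

HANGUL_BASE = 0xAC00  # first precomposed Hangul syllable, U+AC00


def decompose(syllable):
    """Split a precomposed Hangul syllable into (initial, medial, final) indices."""
    code = ord(syllable) - HANGUL_BASE
    if not 0 <= code < 11172:
        raise ValueError(f"not a Hangul syllable: {syllable!r}")
    return code // 588, (code % 588) // 28, code % 28


def compose(initial, medial, final):
    """Inverse of decompose: rebuild the syllable from its three indices."""
    return chr(HANGUL_BASE + initial * 588 + medial * 28 + final)


def combine_by_tf(n_best):
    """Build a word from the most frequent phoneme at each (position, slot) index."""
    # Assumption: only candidates with the most common syllable count are used.
    length = Counter(len(w) for w in n_best).most_common(1)[0][0]
    words = [w for w in n_best if len(w) == length]
    syllables = []
    for pos in range(length):
        # Transpose into three slots: all initials, all medials, all finals.
        slots = zip(*(decompose(w[pos]) for w in words))
        # Assumption: ties resolve to the phoneme seen first (higher N-best rank).
        best = [Counter(slot).most_common(1)[0][0] for slot in slots]
        syllables.append(compose(*best))
    return "".join(syllables)


def levenshtein(a, b):
    """Levenshtein edit distance between two sequences (here: syllables)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def select(n_best, reference=None):
    """Pick a final result following the three cases described in the abstract."""
    if not reference:
        return combine_by_tf(n_best)       # no reference data: TF only
    for candidate in n_best:
        if candidate in reference:
            return candidate               # exact match in the reference data
    combined = combine_by_tf(n_best)       # assumption: LED is taken from this word
    return min(reference, key=lambda ref: levenshtein(combined, ref))


# Hypothetical N-best list; none of the candidates is the intended word.
print(combine_by_tf(["아주데", "아추대", "어주대"]))  # 아주대
```

In this example each candidate contains one wrong phoneme at a different index, so the per-index majority produces a word that appears in none of the individual results, which is the effect the abstract attributes to combining the highest TF values per index.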