A Fast Convolution Algorithm and Accelerator for Convolutional Neural Networks

DC Field: Value
dc.contributor.advisor: 선우명훈
dc.contributor.author: 김태선
dc.date.accessioned: 2019-04-01T16:42:37Z
dc.date.available: 2019-04-01T16:42:37Z
dc.date.issued: 2019-02
dc.identifier.other: 28581
dc.identifier.uri: https://dspace.ajou.ac.kr/handle/2018.oak/15241
dc.description: Thesis (Doctoral)--Graduate School, Ajou University: Department of Electronic Engineering, 2019. 2
dc.description.tableofcontents:
I. Introduction
II. Overview of Convolutional Neural Networks
  A. Overall Architecture
  B. Convolution Layer
  C. Pooling Layer
  D. Convolutional Neural Networks
    1. LeNet
    2. AlexNet
    3. VGG-16
    4. GoogLeNet
III. Acceleration for Deep Neural Networks
  A. Quantization and Binarization
  B. Pruning and Sharing
  C. Low-Rank Factorization and Sparsity
IV. Two-Step MAC Operation for Convolutional Layer
V. Architecture for Two-Step MAC Operation
  D. Modified HCCA
  E. Reconstruction of the Output Pixel Ordering
  F. Overall Architecture
  G. PEG Architecture
  H. Temporary Feature Map
VI. Experimental Results
  I. Algorithms Performance
  J. Hardware Accelerator
VII. Conclusions
Bibliography
dc.language.iso: eng
dc.publisher: The Graduate School, Ajou University
dc.rights: Ajou University theses are protected by copyright.
dc.title: A Fast Convolution Algorithm and Accelerator for Convolutional Neural Networks
dc.type: Thesis
dc.contributor.affiliation: Graduate School, Ajou University
dc.contributor.department: Department of Electronic Engineering, Graduate School
dc.date.awarded: 2019. 2
dc.description.degree: Doctoral
dc.identifier.localId: 905150
dc.identifier.uci: I804:41038-000000028581
dc.identifier.url: http://dcoll.ajou.ac.kr:9080/dcollection/common/orgView/000000028581
dc.description.alternativeAbstract: Recent advances in computing power, made possible by the development of faster general-purpose graphics processing units (GPGPUs), have increased the complexity of convolutional neural network (CNN) models. However, because of the limited applicability of existing GPGPUs, CNN accelerators are becoming more important. Current accelerators focus on improvements in memory scheduling and architecture, so the number of multiplier-accumulator (MAC) operations is not reduced. In this study, a new convolution-layer operation algorithm is proposed that uses a coarse-to-fine method instead of a hardware or architectural approach. This algorithm is shown to reduce the MAC operations by 33%, while the Top-1 accuracy decreases by only 3% and the Top-5 accuracy by only 1%. Furthermore, the proposed hardware accelerator demonstrates higher performance, lower power consumption, and higher energy efficiency than other ASIC implementations except for [45]. Compared to the hardware accelerator of [45], the proposed accelerator demonstrates 1.7× higher performance, 65% less on-chip memory, and a 20% lower gate count. Compared to the accelerator of [22], the proposed accelerator has a larger gate count but demonstrates higher performance, lower power consumption, 1.7–1.8× better energy efficiency, and a smaller on-chip memory.
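The record does not include the algorithm's details, so the following is only a minimal sketch of one common coarse-to-fine (two-step) MAC scheme of the kind the abstract describes: a cheap low-precision first pass predicts which output pixels a subsequent ReLU would zero out, and the full-precision MACs run only for the remaining pixels. The function name, the quantization choice, and the sign-prediction criterion are all illustrative assumptions, not the thesis's actual method.

```python
import numpy as np

def two_step_conv2d(x, w, coarse_bits=4):
    """Hypothetical coarse-to-fine 2-D convolution (valid padding).

    Step 1 (coarse): convolve a low-precision copy of the input to
    cheaply predict the sign of each output pixel.
    Step 2 (fine): recompute at full precision only the pixels whose
    coarse result is positive; the others would be zeroed by ReLU
    anyway, so their full-precision MACs are skipped.
    Returns the post-ReLU feature map and the fraction of output
    pixels that needed the fine (full-precision) pass.
    """
    H, W = x.shape
    k = w.shape[0]
    oh, ow = H - k + 1, W - k + 1

    # Coarse operand: keep only the top `coarse_bits` bits of magnitude.
    scale = 2.0 ** (np.ceil(np.log2(np.abs(x).max() + 1e-12)) - coarse_bits)
    xq = np.round(x / scale) * scale

    out = np.zeros((oh, ow))
    fine = 0
    for i in range(oh):
        for j in range(ow):
            coarse = np.sum(xq[i:i + k, j:j + k] * w)
            if coarse > 0:  # predicted to survive the ReLU
                out[i, j] = max(np.sum(x[i:i + k, j:j + k] * w), 0.0)
                fine += 1
            # else: skip this pixel's full-precision MACs entirely
    return out, fine / (oh * ow)
```

Under this sketch, the MAC savings come from the skipped fine passes; the accuracy cost (the small Top-1/Top-5 drops reported above) would come from pixels whose coarse sign prediction is wrong.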
Appears in Collections:
Graduate School of Ajou University > Department of Electronic Engineering > 4. Theses(Ph.D)
Files in This Item:
There are no files associated with this item.

