Recent advances in computing power, driven by faster general-purpose graphics processing units (GPGPUs), have increased the complexity of convolutional neural network (CNN) models. However, because GPGPUs are poorly suited to many target applications, dedicated CNN accelerators are becoming increasingly important. Existing accelerators focus on improving memory scheduling and hardware architectures and therefore do not reduce the number of multiply-accumulate (MAC) operations. In this study, a new convolution-layer operation algorithm based on a coarse-to-fine method is proposed, rather than a hardware or architectural approach. The algorithm is shown to reduce MAC operations by 33%, while Top-1 accuracy decreases by only 3% and Top-5 accuracy by only 1%. Furthermore, the proposed hardware accelerator demonstrates higher performance, lower power consumption, and higher energy efficiency than other ASIC implementations, except for [45]. Compared with the accelerator of [45], the proposed accelerator achieves 1.7× higher performance, 65% less on-chip memory, and a 20% smaller gate count. Although the proposed accelerator has a larger gate count than the accelerator of [22], it achieves higher performance, lower power consumption, 1.7–1.8× higher energy efficiency, and smaller on-chip memory.
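The abstract does not detail how the coarse-to-fine convolution works. The following is a minimal sketch of the general idea only, assuming (as an illustrative guess, not the paper's method) that the layer is first evaluated on a strided "coarse" grid and that full-resolution outputs are refined only where the coarse response is significant. The names `coarse_to_fine_conv2d` and `conv2d_at`, the stride, and the magnitude threshold are all hypothetical; MAC counts are tracked to show how skipped refinements reduce arithmetic work.

```python
# Illustrative coarse-to-fine 2-D convolution (single channel, valid padding).
# This is NOT the paper's algorithm; it is a generic sketch of the concept.
import numpy as np

def conv2d_at(x, w, i, j):
    """One output position of a valid 2-D convolution (kh*kw MACs)."""
    kh, kw = w.shape
    return float(np.sum(x[i:i + kh, j:j + kw] * w))

def coarse_to_fine_conv2d(x, w, stride=2, threshold=1.0):
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    y = np.zeros((oh, ow))
    macs = 0

    # Coarse pass: evaluate the convolution only on a strided grid of anchors.
    coarse = {}
    for i in range(0, oh, stride):
        for j in range(0, ow, stride):
            coarse[(i, j)] = conv2d_at(x, w, i, j)
            macs += kh * kw

    # Fine pass: refine a stride-by-stride block only when its coarse anchor
    # is significant; otherwise fill the block with the coarse estimate.
    # The fraction of MACs saved depends on how many coarse responses
    # fall below the threshold for the given activation statistics.
    for (i, j), v in coarse.items():
        block = [(ii, jj)
                 for ii in range(i, min(i + stride, oh))
                 for jj in range(j, min(j + stride, ow))]
        for ii, jj in block:
            if (ii, jj) == (i, j):
                y[ii, jj] = v  # already computed in the coarse pass
            elif abs(v) >= threshold:
                y[ii, jj] = conv2d_at(x, w, ii, jj)  # refine
                macs += kh * kw
            else:
                y[ii, jj] = v  # reuse coarse estimate, no extra MACs
    return y, macs

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
w = rng.standard_normal((3, 3))
y, macs = coarse_to_fine_conv2d(x, w)
dense_macs = (32 - 2) * (32 - 2) * 3 * 3  # every output position, every weight
print(f"MACs: {macs} vs dense {dense_macs} ({1 - macs / dense_macs:.0%} saved)")
```

Under this kind of scheme, the 33% MAC reduction reported in the abstract would correspond to roughly a third of refinement positions being skipped; in a trained network, where post-ReLU feature maps are sparse, the skippable fraction can be substantial.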