An Optimized Storage Architecture for Improving ML Platforms Provisioned with Underlying Deduplication Enabled Storage Clusters
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Tae-Sun Chung | - |
dc.contributor.author | HAMANDAWANA PRINCE | - |
dc.date.accessioned | 2022-11-29T02:32:42Z | - |
dc.date.available | 2022-11-29T02:32:42Z | - |
dc.date.issued | 2021-02 | - |
dc.identifier.other | 30533 | - |
dc.identifier.uri | https://dspace.ajou.ac.kr/handle/2018.oak/20268 | - |
dc.description | Doctoral dissertation -- The Graduate School, Ajou University: Department of Artificial Intelligence, 2021. 2 | - |
dc.description.tableofcontents | 1 Introduction: 1.1 Introduction; 1.2 Contributions of This Dissertation. 2 Background: 2.1 Related Works (2.1.1 ML Pipeline and Data Access Patterns; 2.1.2 Data Duplicate Elimination; Cluster-scale Data Deduplication); 2.2 Motivation (2.2.1 Dedup challenges in ML and DL platforms; 2.2.2 Preliminary Analysis). 3 Proposed REDUP Architecture: 3.1 REDUP Overview (3.1.1 Goals; 3.1.2 System Overview; RCR cache pool; RD cache pool); 3.2 Design and Implementation (3.2.1 Hierarchical Object Caching; 3.2.2 RCR Cache: RFCB caching policy, LA-caching policy, RCR cache admission and replacement flow; 3.2.3 REDUP Object; 3.2.4 RD Cache: RCO HitSet, HA-EB caching policy, RD cache admission flow, RD cache replacement flow; 3.2.5 Cache Throttling Manager). 4 Performance Evaluation: 4.1 Evaluation (4.1.1 Testbed Setup: Neural Networks and ML platforms; 4.1.2 Performance Analysis with CNNs; 4.1.3 Performance analysis with RNNs; 4.1.4 Hit Ratio Analysis: Read Hit Ratio, Effect of incremental component addition in REDUP; 4.1.5 Effect of RD SSD Temp; 4.1.6 Cumulative distribution function of Training progress). 5 Conclusions: 5.1 Conclusion. References. | - |
dc.language.iso | eng | - |
dc.publisher | The Graduate School, Ajou University | - |
dc.rights | Ajou University theses are protected by copyright. | - |
dc.title | An Optimized Storage Architecture for Improving ML Platforms Provisioned with Underlying Deduplication Enabled Storage Clusters | - |
dc.type | Thesis | - |
dc.contributor.affiliation | The Graduate School, Ajou University | - |
dc.contributor.department | Department of Artificial Intelligence, The Graduate School | - |
dc.date.awarded | 2021. 2 | - |
dc.description.degree | Doctoral | - |
dc.identifier.localId | 1218604 | - |
dc.identifier.uci | I804:41038-000000030533 | - |
dc.identifier.url | http://dcoll.ajou.ac.kr:9080/dcollection/common/orgView/000000030533 | - |
dc.subject.keyword | Machine Learning based storage architectures | - |
dc.description.alternativeAbstract | The advancement and ubiquity of Machine Learning (ML) is unarguably the new wave driving modern and future enterprise computing platforms. However, the incessant deluge of ML-associated data, collected from millions of data sources, presents data storage challenges. Continuously scaling storage to meet ML demands results in escalating storage costs. Fortunately, ML/DL workloads contain a large amount of duplicate data which, if eliminated, significantly amortizes those costs. The adoption of deduplication-provisioned storage has so far been a cost-cutting driver in today's enterprise clusters. However, large-scale ML platforms face challenges when integrated with deduplication-enabled storage clusters: in the quest for smart and efficient storage utilization, the removal of duplicate data introduces bottlenecks, since deduplication alters the I/O transaction layout of the storage system. It is therefore critical to address this deduplication overhead in order to accelerate ML/DL computation on deduplication storage. Existing state-of-the-art ML/DL storage solutions such as Alluxio and Auto-Cache adopt caching mechanisms that are not deduplication-aware and thus lack the performance boost needed in deduplication-enabled ML/DL clusters. In this dissertation, we introduce REDUP, which eliminates the performance drop caused by enabling deduplication in ML/DL storage clusters. At its core is the REDUP Caching Manager (RDCM), composed of a 2-tier deduplication layout-aware caching mechanism. The RDCM abstracts the underlying deduplication storage layout from ML/DL applications and provides decoupled acceleration of object reconstruction during ML/DL read operations. Our evaluation of REDUP shows a negligible drop in ML/DL training performance compared to a baseline cluster without deduplication. Compared to other state-of-the-art solutions, our design outperforms Alluxio and Auto-Cache in training speed by 16% in the worst case. | - |
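To make the 2-tier idea in the abstract concrete, below is a minimal Python sketch of a deduplication-layout-aware read path with two cache pools, loosely modeled on the RCR and RD cache pools named in the table of contents. The pool names come from the record, but the semantics assumed here (RCR holds fully reconstructed objects, RD holds individual deduplicated chunks), the plain LRU policies, and the backend interface (`get_recipe`, `read_chunk`) are illustrative assumptions standing in for the thesis's actual RFCB/LA and HA-EB policies, not the author's implementation.

```python
# Sketch of a 2-tier, deduplication-layout-aware cache lookup.
# Assumptions (not from the thesis text): RCR pool = reconstructed
# objects, RD pool = deduplicated chunks, LRU in both tiers, and a
# backend exposing get_recipe()/read_chunk().
from collections import OrderedDict

class TwoTierDedupCache:
    def __init__(self, rcr_capacity, rd_capacity, storage):
        self.rcr = OrderedDict()       # tier 1: fully reconstructed objects
        self.rd = OrderedDict()        # tier 2: deduplicated chunks
        self.rcr_capacity = rcr_capacity
        self.rd_capacity = rd_capacity
        self.storage = storage         # hypothetical dedup store backend

    def read(self, object_id):
        # Tier-1 hit: serve the reconstructed object directly, hiding
        # the deduplication layout from the ML/DL reader entirely.
        if object_id in self.rcr:
            self.rcr.move_to_end(object_id)
            return self.rcr[object_id]
        # Tier-1 miss: reconstruct the object from its chunk recipe,
        # using tier 2 to avoid re-reading hot (highly shared) chunks.
        recipe = self.storage.get_recipe(object_id)  # ordered fingerprints
        chunks = []
        for fp in recipe:
            if fp in self.rd:
                self.rd.move_to_end(fp)
            else:
                self.rd[fp] = self.storage.read_chunk(fp)
                if len(self.rd) > self.rd_capacity:
                    self.rd.popitem(last=False)      # evict LRU chunk
            chunks.append(self.rd[fp])
        obj = b"".join(chunks)
        # Admit the reconstructed object into tier 1 (simple LRU as a
        # stand-in for the thesis's admission/replacement policies).
        self.rcr[object_id] = obj
        if len(self.rcr) > self.rcr_capacity:
            self.rcr.popitem(last=False)
        return obj
```

The split mirrors the benefit the abstract claims: a tier-1 hit skips object reconstruction entirely, while the chunk tier exploits the cross-object sharing that deduplication creates, so repeated training epochs over overlapping data avoid redundant reads from the deduplicated store.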