Small File Indexing Scheme for HDFS with Erasure Coding
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Sangyoon Oh | - |
dc.contributor.author | TEREFE ANENE BEKUMA | - |
dc.date.accessioned | 2018-11-08T08:28:10Z | - |
dc.date.available | 2018-11-08T08:28:10Z | - |
dc.date.issued | 2018-08 | - |
dc.identifier.other | 28110 | - |
dc.identifier.uri | https://dspace.ajou.ac.kr/handle/2018.oak/14060 | - |
dc.description | Master's thesis -- Ajou University Graduate School: Department of Computer Engineering, 2018. 8 | - |
dc.description.tableofcontents | 1. Introduction 1 2. Related Works 4 3. Backgrounds 11 3.1 HDFS Architecture 11 3.2 Erasure Coding on HDFS 16 4. The Small File Problem 19 5. Proposed Scheme 23 5.1 Design 23 5.2 The Small File Processor (SFP) 24 5.3 File Extracting 28 5.4 Reading and Writing Files 30 6. Evaluation and Results 33 6.1 Experimental Environment Setup 33 6.2 Experimental Results 35 7. Conclusion and Future Work 42 REFERENCES 43 | - |
dc.language.iso | eng | - |
dc.publisher | The Graduate School, Ajou University | - |
dc.rights | Ajou University theses are protected by copyright. | - |
dc.title | Small File Indexing Scheme for HDFS with Erasure Coding | - |
dc.type | Thesis | - |
dc.contributor.affiliation | Ajou University Graduate School | - |
dc.contributor.department | Department of Computer Engineering, Graduate School | - |
dc.date.awarded | 2018. 8 | - |
dc.description.degree | Master | - |
dc.identifier.localId | 887573 | - |
dc.identifier.uci | I804:41038-000000028110 | - |
dc.identifier.url | http://dcoll.ajou.ac.kr:9080/dcollection/common/orgView/000000028110 | - |
dc.subject.keyword | Distributed File Systems | - |
dc.subject.keyword | Small File Storage | - |
dc.subject.keyword | Hadoop Distributed File System | - |
dc.subject.keyword | Erasure Coding | - |
dc.description.alternativeAbstract | The Hadoop Distributed File System (HDFS) is designed to store and manage large files. It keeps the file system metadata in the NameNode's memory for high performance. Since an HDFS cluster has a single NameNode, processing a massive number of small files drives the NameNode's memory usage up. This issue is referred to as 'the small file problem'. It occurs because HDFS stores each small file in a separate storage block on a DataNode and maintains individual metadata for it on the NameNode. Researchers have suggested merging small files into a large file the size of one HDFS block to reduce the NameNode's memory usage, but they considered only HDFS with a contiguous block layout. However, Hadoop adopts a striped block layout when Erasure Coding is enabled. In this layout, a file is divided into smaller 1 MB parts that are distributed across multiple storage blocks. This creates an opportunity to further reduce the memory used for small files by increasing the merged file size to fully fill multiple storage blocks while keeping the metadata the same size. We therefore propose a new scheme for the small file problem that further reduces the NameNode's memory usage by taking the striped block layout into account. This brings a new challenge, however, as it requires novel file indexing and file extracting methods to access the small files. We introduce a program named the Small File Processor (SFP), which merges and indexes small files, and we implement a file extracting algorithm that reads each small file back out of its corresponding merged file. The experimental results show that, compared to default HDFS with Erasure Coding, the proposed scheme reduces the NameNode's memory usage and improves the write access speed of small files. | - |
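The abstract describes merging small files into one large file and keeping an index so each small file can be extracted later. The thesis's actual SFP implementation is not reproduced in this record; the following is only a minimal Python sketch of the general merge-and-index idea it builds on, assuming a simple per-file (offset, length) index and illustrative function names.

```python
import io

def merge_files(small_files):
    """Concatenate small files into one merged blob and build an
    index mapping file name -> (offset, length).
    `small_files` maps name -> bytes (illustrative, not the SFP API)."""
    index = {}
    buf = io.BytesIO()
    for name, data in small_files.items():
        index[name] = (buf.tell(), len(data))  # record where this file starts
        buf.write(data)
    return buf.getvalue(), index

def extract_file(merged, index, name):
    """Read one small file back out of the merged blob via the index."""
    offset, length = index[name]
    return merged[offset:offset + length]

# Example: merge two small files, then extract one by name.
merged, index = merge_files({"a.txt": b"hello", "b.txt": b"world!"})
print(extract_file(merged, index, "b.txt"))  # b'world!'
```

In the scheme described above, the merged file would be sized to fill an entire striped block group rather than a single HDFS block, so one set of block metadata on the NameNode covers many more small files.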
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.