An Efficient Fault-Tolerant and Reliable Data Integrity Framework for Object-Based Big Data Transfer Systems

Author(s)
PREETHIKA KASU
Advisor
TAE-SUN CHUNG
Department
Department of Artificial Intelligence, Graduate School
Publisher
The Graduate School, Ajou University
Publication Year
2022-08
Language
eng
Keyword
Big data; bloom filter; data integrity; geo-distributed data centers; high-performance computing; parallel file system
Alternative Abstract
Data has overwhelmed the digital world in terms of volume, variety, and velocity. Individuals, business organizations, computational science simulations, and experiments produce huge volumes of data on a daily basis. Often, this data is shared by geographically distributed data centers for storage and analysis. However, data transfer tools face unprecedented challenges in moving such huge volumes of data across geo-distributed data centers in a timely manner. Faults are among the major challenges in distributed environments: hardware, network, and software components might fail at any instant. Thus, high-speed, fault-tolerant data transfer frameworks are vital for transferring data efficiently between data centers. In this thesis, we propose a novel bloom filter-based, data-aware probabilistic fault-tolerance (DAFT) mechanism to recover efficiently from such failures. We also propose a data- and layout-aware fault-tolerance (DLFT) mechanism to handle the false-positive matches of DAFT effectively. We evaluate the impact of the data transfer and recovery-time overheads of the proposed fault-tolerance mechanisms on overall data transfer performance. The experimental results demonstrate that the DAFT and DLFT mechanisms recover from faults efficiently while minimizing memory, storage, computation, and recovery-time overheads. Furthermore, we observe negligible impact on overall data transfer performance.

Protecting the integrity of data against failures of the various intermediate components in the end-to-end data transfer path is a salient feature of big data transfer tools. Although most of these components provide some degree of data integrity, they are either too expensive or inefficient at recovering corrupted data. This necessitates maintaining application-level end-to-end integrity verification during data transfer.
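The fault-tolerance idea above rests on tracking transfer progress in a bloom filter: completed objects are recorded in the filter, and after a fault only objects absent from the filter are re-sent. The Python sketch below illustrates this general mechanism under assumed names (`BloomFilter`, `obj-001`, etc.); it is not the thesis's DAFT implementation, and its parameters are arbitrary.

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter over object IDs (illustrative sketch only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a plain integer used as a bit array

    def _positions(self, item):
        # Derive `num_hashes` bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))

# Sender side: mark each object once its transfer is acknowledged.
transferred = BloomFilter()
for obj in ["obj-001", "obj-002", "obj-003"]:
    transferred.add(obj)

# After a fault, only objects not found in the filter are re-sent.
all_objects = ["obj-001", "obj-002", "obj-003", "obj-004"]
pending = [o for o in all_objects if o not in transferred]
```

Note that a bloom filter can report false positives: an object never transferred may appear to be in the filter and would then be skipped on recovery. This is exactly the failure mode the DLFT mechanism is proposed to handle.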
However, owing to the sheer size of the data, supporting end-to-end integrity verification in big data transfer tools incurs computation, memory, and storage overheads. In this thesis, we propose a cross-referencing bloom filter-based data integrity verification framework for big data transfer systems. This framework has three advantages over state-of-the-art data integrity techniques: lower computation overhead, lower memory overhead, and zero false-positive errors for a restricted number of elements. We evaluate the computation, memory, recovery-time, and false-positive overheads of the proposed framework and compare them with state-of-the-art solutions. The evaluation results show that the proposed framework is very efficient at detecting and recovering from integrity errors while eliminating the false positives of the bloom filter data structure. In addition, we observe negligible computation, memory, and recovery overheads for all workloads.
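One generic way to drive down bloom-filter false positives, in the spirit of cross-referencing, is to maintain two filters with independent hash families and report membership only when both agree. The sketch below illustrates that general idea in Python; the class name, salts, and parameters are assumptions for illustration, and this is not the thesis's exact cross-referencing construction (which additionally guarantees zero false positives for a restricted number of elements).

```python
import hashlib

def positions(item, salt, num_bits, num_hashes):
    """Bit positions for `item` under a salted SHA-256 hash family."""
    for i in range(num_hashes):
        d = hashlib.sha256(f"{salt}:{i}:{item}".encode()).digest()
        yield int.from_bytes(d[:8], "big") % num_bits

class CrossReferencedFilter:
    """Two bloom filters with independent hash families ("A" and "B").
    Membership is reported only when both filters agree, which roughly
    multiplies the individual false-positive rates together."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.filter_a = 0
        self.filter_b = 0

    def add(self, item):
        for p in positions(item, "A", self.num_bits, self.num_hashes):
            self.filter_a |= 1 << p
        for p in positions(item, "B", self.num_bits, self.num_hashes):
            self.filter_b |= 1 << p

    def __contains__(self, item):
        in_a = all(self.filter_a >> p & 1
                   for p in positions(item, "A", self.num_bits, self.num_hashes))
        in_b = all(self.filter_b >> p & 1
                   for p in positions(item, "B", self.num_bits, self.num_hashes))
        return in_a and in_b

# Record per-chunk checksums of verified data (checksum strings assumed).
verified = CrossReferencedFilter()
for checksum in ["sha256:chunk-aa11", "sha256:chunk-bb22"]:
    verified.add(checksum)
```

On lookup, a chunk whose checksum is absent from either filter is known to be unverified and can be re-fetched; only a simultaneous false positive in both independent filters could mask a corruption, which is far less likely than with a single filter.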
URI
https://dspace.ajou.ac.kr/handle/2018.oak/21195
Appears in Collections:
Graduate School of Ajou University > Department of Artificial Intelligence > 4. Theses(Ph.D)
