AJOU Central Library Repository: Can hierarchical client clustering mitigate the data heterogeneity effect in federated learning?

Can hierarchical client clustering mitigate the data heterogeneity effect in federated learning?

Author(s): 이승준

Alternative Author(s): Seungjun Lee

Advisor: 오상윤

Department: Graduate School, Department of Artificial Intelligence

Publisher: The Graduate School, Ajou University

Publication Year: 2023-02

Language: eng

Keyword: client clustering; data heterogeneity; federated learning; hierarchical aggregation

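Abstract: Federated learning was proposed to train deep neural networks on data from hundreds of thousands of users. The technique has attracted considerable attention because it can protect personal information. However, important problems remain to be solved. The first is the limit on the number of clients that can participate simultaneously. As the number of clients grows, the single parameter server can easily become a bottleneck, and stragglers become more likely. The second is data heterogeneity, which degrades the accuracy of the global model. Because user data must remain on user devices to protect privacy, data shuffling, which conventional distributed deep learning uses to homogenize data, is difficult to apply. This work proposes a client clustering and model aggregation method called CCFed to increase the number of clients that can participate simultaneously while also mitigating the data heterogeneity problem. CCFed uses the set partition problem to distribute data evenly across clusters, which mitigates the effect of non-IID data and improves training performance. In our experiments on benchmark datasets, CCFed improved accuracy by about 2.5 to 7%p over FedAvg while using only about 50% of the rounds FedAvg requires.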

Alternative Abstract: Federated learning (FL) was proposed for training a deep neural network model on data from millions of users. The technique has attracted considerable attention owing to its privacy-preserving characteristic. However, two major challenges exist. The first is the limit on the number of simultaneously participating clients. As the number of clients increases, the single parameter server easily becomes a bottleneck, and stragglers become more likely. The second is data heterogeneity, which adversely affects the accuracy of the global model. Because data must remain on user devices to preserve privacy, we cannot use data shuffling, which traditional distributed deep learning uses to homogenize training data. This work proposes a client clustering and model aggregation method, CCFed, to increase the number of simultaneously participating clients and mitigate the data heterogeneity problem. CCFed improves learning performance by using set partition modeling to distribute data evenly between clusters, which mitigates the effect of a non-IID environment. Experiments on benchmark datasets show that CCFed achieves 2.5-7%p higher accuracy than FedAvg while requiring only approximately 50% of the training rounds that FedAvg needs.
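The abstract describes two ideas: grouping clients so that data is evenly distributed between clusters, and aggregating models hierarchically. The following Python sketch illustrates both under stated assumptions. It is not the thesis's CCFed algorithm, since no full text is attached to this record. The greedy heuristic, the function names (`greedy_balanced_clusters`, `weighted_average`, `hierarchical_aggregate`), and the use of per-client label counts as the balancing signal are all hypothetical choices made for illustration.

```python
import numpy as np

def greedy_balanced_clusters(label_counts, num_clusters):
    """Assign clients to clusters so per-cluster label totals stay near equal.

    label_counts: (num_clients, num_classes) array of per-client class counts.
    Assumption: a simple greedy heuristic, not the thesis's set partition model.
    """
    label_counts = np.asarray(label_counts, dtype=float)
    num_clients, num_classes = label_counts.shape
    # Ideal per-cluster label totals if data were split perfectly evenly.
    target = label_counts.sum(axis=0) / num_clusters
    totals = np.zeros((num_clusters, num_classes))
    assignment = np.empty(num_clients, dtype=int)
    # Place the largest clients first; ties keep the original client order.
    order = np.argsort(-label_counts.sum(axis=1), kind="stable")
    for c in order:
        # Change in total squared deviation from the target for each cluster.
        after = ((totals + label_counts[c] - target) ** 2).sum(axis=1)
        before = ((totals - target) ** 2).sum(axis=1)
        k = int(np.argmin(after - before))
        assignment[c] = k
        totals[k] += label_counts[c]
    return assignment, totals

def weighted_average(models, weights):
    """Size-weighted average of flat parameter vectors (FedAvg-style)."""
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights / weights.sum(), np.stack(models), axes=1)

def hierarchical_aggregate(client_models, client_sizes, assignment, num_clusters):
    """Average within each cluster first, then average the cluster models."""
    client_sizes = np.asarray(client_sizes, dtype=float)
    cluster_models, cluster_sizes = [], []
    for k in range(num_clusters):
        members = np.flatnonzero(assignment == k)
        if members.size == 0:
            continue  # skip clusters that received no clients
        cluster_models.append(
            weighted_average([client_models[i] for i in members],
                             client_sizes[members]))
        cluster_sizes.append(client_sizes[members].sum())
    return weighted_average(cluster_models, cluster_sizes)

# Toy example: four clients, each holding samples of a single class.
counts = np.array([[10, 0], [0, 10], [10, 0], [0, 10]])
assignment, totals = greedy_balanced_clusters(counts, num_clusters=2)
print(assignment, totals)
```

In this toy setting every client holds a single class, yet each cluster ends up with both classes represented, which is the kind of between-cluster balance the abstract refers to. With size-proportional weights at both levels, the two-level average equals a flat weighted average over all clients, so any accuracy gain would have to come from how training is organized around the clusters, which this sketch does not model.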

URI: https://dspace.ajou.ac.kr/handle/2018.oak/24569

Appears in Collections: Graduate School of Ajou University > Department of Artificial Intelligence > 3. Theses(Master)

Files in This Item: There are no files associated with this item.


The AJOU Central Library Repository was built through the National Library of Korea's OAK distribution project.
