Real-Time Lightweight Human Parsing Based on Class Relationship Knowledge Distillation
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 황원준 | - |
dc.contributor.author | LANG YUQI | - |
dc.date.accessioned | 2025-01-25T01:35:51Z | - |
dc.date.available | 2025-01-25T01:35:51Z | - |
dc.date.issued | 2023-08 | - |
dc.identifier.other | 32930 | - |
dc.identifier.uri | https://dspace.ajou.ac.kr/handle/2018.oak/24285 | - |
dc.description | Thesis (Master's) -- Ajou University Graduate School: Department of Artificial Intelligence, 2023. 8 | - |
dc.description.tableofcontents | I. Introduction 1 <br>II. Related Works 5 <br>III. Proposed Method 8 <br> 3.1 Framework Overview 9 <br> 3.2 Proposed Method 9 <br> 3.2.1 Effective model light-weighting methods 9 <br> 3.2.2 An Effective Lightweight Spatial Feature Fusion Attention Method for Human Parsing Models (LSFA) 10 <br> 3.2.3 Applying the intra-class and inter-class relationship approach to knowledge distillation 12 <br>IV. Experimental Results and Discussion 16 <br> 4.1 Dataset 16 <br> 4.2 Implementation Details 16 <br> 4.3 Inference speed and performance 17 <br> 4.4 Ablation experiment 19 <br>V. Conclusion 21 <br>References 22 | - |
dc.language.iso | eng | - |
dc.publisher | The Graduate School, Ajou University | - |
dc.rights | Ajou University theses are protected by copyright. | - |
dc.title | Real-Time Lightweight Human Parsing Based on Class Relationship Knowledge Distillation | - |
dc.type | Thesis | - |
dc.contributor.affiliation | Ajou University Graduate School | - |
dc.contributor.alternativeName | LANG YUQI | - |
dc.contributor.department | Graduate School, Department of Artificial Intelligence | - |
dc.date.awarded | 2023-08 | - |
dc.description.degree | Master | - |
dc.identifier.localId | T000000032930 | - |
dc.identifier.url | https://dcoll.ajou.ac.kr/dcollection/common/orgView/000000032930 | - |
dc.subject.keyword | Human Parsing | - |
dc.subject.keyword | Knowledge Distillation | - |
dc.subject.keyword | Model Lightweight | - |
dc.description.alternativeAbstract | In the field of computer vision, understanding human subjects is a crucial and challenging task, as it requires recognizing and comprehending human presence and behavior in images or videos. Within this domain, human parsing is especially challenging, since it requires accurately locating the human region and dividing it into multiple semantic areas. It is a dense prediction task that demands substantial computational power and high-precision models. With the continuous development of computer vision technologies, human parsing has been widely applied to other human-centric tasks, such as pose estimation and human image generation, and these applications are expected to play an increasingly important role in future artificial intelligence research. <br><br>To achieve real-time human parsing on devices with limited computational resources, we design and introduce a lightweight human parsing model. We choose ResNet-18 as the core network structure and simplify the traditional pyramid module used to obtain high-resolution contextual information, significantly reducing model complexity. To further enhance parsing accuracy, we integrate a spatial attention fusion strategy. Our lightweight model runs efficiently and achieves high segmentation accuracy on Look Into Person (LIP), the most commonly used human parsing dataset. Although traditional models achieve excellent segmentation accuracy, their high complexity and large parameter counts restrict their use on devices with limited computational resources. To further improve the accuracy of our lightweight network, we also apply knowledge distillation. Traditional knowledge distillation uses the Kullback-Leibler (KL) divergence to match the prediction probability scores of the teacher and student models; however, this approach can fail to transfer useful knowledge when the teacher and student networks differ significantly. We therefore adopt a new distillation criterion based on inter-class and intra-class relationships in the prediction results, which significantly improves parsing accuracy. Experiments show that our lightweight model substantially reduces the number of parameters while maintaining high segmentation accuracy, achieving our intended goals. | - |
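The abstract outlines the model design: a ResNet-18 backbone, a simplified pyramid context module, and spatial attention fusion (LSFA). A minimal sketch of how such a design could be wired up in PyTorch follows; the class names, channel widths, and the single global-context branch standing in for the simplified pyramid are all illustrative assumptions, not the thesis's actual implementation.

```python
# Hypothetical sketch of a lightweight parser: ResNet-18 backbone,
# a single global-context branch as a "simplified pyramid", and
# spatial attention fusion of low- and high-level features.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class SpatialAttentionFusion(nn.Module):
    """Fuse low- and high-level features weighted by a learned spatial map."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels * 2, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, low, high):
        # Upsample the coarse high-level map to the low-level resolution.
        high = F.interpolate(high, size=low.shape[2:], mode="bilinear",
                             align_corners=False)
        w = self.attn(torch.cat([low, high], dim=1))  # (B, 1, H, W)
        return w * low + (1 - w) * high

class LightweightParser(nn.Module):
    def __init__(self, num_classes=20):  # LIP: 19 part classes + background
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.layer1, self.layer2 = backbone.layer1, backbone.layer2
        self.layer3, self.layer4 = backbone.layer3, backbone.layer4
        # Simplified pyramid: one global-average context branch instead of
        # the usual multi-scale pooling pyramid.
        self.context = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(512, 512, 1), nn.ReLU(inplace=True)
        )
        self.reduce_high = nn.Conv2d(512, 128, 1)
        self.reduce_low = nn.Conv2d(128, 128, 1)
        self.fuse = SpatialAttentionFusion(128)
        self.head = nn.Conv2d(128, num_classes, 1)

    def forward(self, x):
        l1 = self.layer1(self.stem(x))
        l2 = self.layer2(l1)                # (B, 128, H/8,  W/8)
        l4 = self.layer4(self.layer3(l2))   # (B, 512, H/32, W/32)
        l4 = l4 + self.context(l4)          # broadcast global context
        out = self.fuse(self.reduce_low(l2), self.reduce_high(l4))
        return self.head(out)               # per-pixel class logits
```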
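The abstract also contrasts plain KL-based distillation with a criterion built on inter-class and intra-class relationships in the prediction results. The sketch below shows one plausible reading of that idea, again assuming PyTorch: inter-class structure is matched through a class-by-class similarity matrix, and intra-class structure through a temperature-scaled distribution over the pixels within each class channel. The function name and both relation terms are assumptions for illustration, not the thesis's exact loss.

```python
# Hypothetical class-relationship distillation loss; not the thesis's
# actual formulation.
import torch
import torch.nn.functional as F

def class_relation_kd(student_logits, teacher_logits, temperature=1.0):
    """Distill class relationships from dense (B, C, H, W) prediction maps."""
    B, C, H, W = student_logits.shape
    # Flatten spatial dims: each class becomes a score vector over pixels.
    s = student_logits.reshape(B, C, -1)  # (B, C, HW)
    t = teacher_logits.reshape(B, C, -1)

    # Inter-class relation: cosine similarity between class score maps,
    # matched between teacher and student with an MSE penalty.
    s_rel = F.normalize(s, dim=2) @ F.normalize(s, dim=2).transpose(1, 2)
    t_rel = F.normalize(t, dim=2) @ F.normalize(t, dim=2).transpose(1, 2)
    inter_loss = F.mse_loss(s_rel, t_rel)

    # Intra-class relation: distribution of scores across pixels within
    # each class channel, matched with temperature-scaled KL divergence.
    s_dist = F.log_softmax(s / temperature, dim=2)
    t_dist = F.softmax(t / temperature, dim=2)
    intra_loss = F.kl_div(s_dist, t_dist, reduction="batchmean") * temperature ** 2

    return inter_loss + intra_loss
```

Unlike pixel-wise KL matching, both terms depend only on the relational structure of the predictions, which is one way a weak student could still learn from a much stronger teacher, as the abstract argues.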