In computer vision, understanding humans is a crucial and challenging goal, as it requires recognizing and interpreting human presence and behavior in images or videos. Within this domain, human parsing is a particularly demanding task: it requires accurately locating the human region and partitioning it into multiple semantic parts. As a dense prediction task, it calls for substantial computational power and high-precision models. With the continued development of computer vision technologies, human parsing has been widely applied to other human-centric tasks, such as pose estimation and human image generation, and these applications are expected to play an increasingly important role in future artificial intelligence research.

<br>To achieve real-time human parsing tasks on devices with limited computational re-
<br>sources, we have designed and introduced a lightweight human parsing model. We chose
<br>
<br>Resnet18 as the core network structure and simplified the traditional pyramid module used
<br>
<br>to obtain high-definition contextual information, thus significantly reducing the complex-
<br>ity of the model. Additionally, to enhance the parsing accuracy of the model, we integrated
<br>
<br>a spatial attention fusion strategy. Our lightweight model exhibits efficient performance
<br>and achieves high segmentation accuracy on the commonly used dataset for human parsing
<br>tasks, Look into Person (LIP). Although traditional models perform excellently in terms of
<br>segmentation accuracy, their high complexity and abundance of parameters restrict their
<br>use on devices with limited computational resources. To further improve the accuracy of
<br>
<br>our lightweight network, we also implemented knowledge distillation techniques. The tra-
<br>ditional knowledge distillation method uses the Kullback-Leibler (KL) divergence to match
<br>
<br>the prediction probability scores of teacher-student models. However, this approach may
<br>be ineffective at learning useful knowledge when there is a significant difference between
<br>the teacher and student networks. Therefore, we adopted a new distillation standard,
<br>based on inter-class and intra-class relationships in prediction results, which significantly
<br>improves parsing accuracy. Empirical evidence has shown that, while maintaining high
<br>segmentation accuracy, our lightweight model has substantially reduced the number of
<br>parameters, thereby achieving our expected goals.