A Study on Autonomous Flight of a Quadcopter Drone Using Sensor Fusion and Reinforcement Learning

Author(s)
임은수
Alternative Author(s)
Eunsoo Lim
Advisor
구형일
Department
Graduate School of IT Convergence, Department of IT Convergence Engineering
Publisher
The Graduate School, Ajou University
Publication Year
2022-02
Language
kor
Keyword
Reinforcement learning; Drone; Sensor fusion; Autonomous driving
Alternative Abstract
This paper presents research on autonomous drone flight based on sensor fusion and reinforcement learning.

1. Outdoor drones typically rely on IMU and GPS sensors for localization, and each sensor has pros and cons. The IMU updates quickly but is noisy; GPS provides data that is stable over time, but its update rate is slow and its accuracy depends on the environment. Sensor-fusion filters such as the Kalman filter, Extended Kalman Filter (EKF), and Unscented Kalman Filter therefore improve localization performance. In traditional filters, however, the sensor covariances are fixed at their initialized values. The algorithm proposed in this study adapts the sensor covariances to the environment, using a fuzzy system to adjust the covariance values.

2. Navigation is based on reinforcement learning. Among the many reinforcement learning algorithms, this paper uses PPO, a policy-based algorithm. Environments with countless variables are well suited to policy-based methods like PPO, which delivers strong performance despite its simple logic. The overall flow is: first, acquire the current position using the dynamic EKF and 2D lidar data; second, obtain the drone's action from the policy neural network, which takes the estimated pose and the 2D lidar data as input; third, gather data such as pose estimates and GAE values until the batch is full, then update the policy and value networks; fourth, repeat until the drone reaches the goal position or the epoch limit is hit.

The experiments conclude that the dynamic EKF and the PPO algorithm outperform the alternatives. For pose estimation, the error between the ground truth and the dynamic EKF estimate is the smallest among the filters tested, and its computing time is low enough for use in a real-time system. The PPO agent reaches the goal position within fewer epochs than a DQN agent; in particular, a PPO agent tuned by grid search reached the goal in only 9 trials. This shows that reinforcement learning can provide adaptive navigation in an unknown environment.
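The adaptive-covariance idea in the abstract can be illustrated with a minimal one-dimensional sketch: a Kalman update whose measurement covariance R is rescaled by a small fuzzy system driven by the normalized innovation. This is not the thesis implementation; the triangular membership breakpoints (0.5, 1.0, 2.0) and scale factors (1x, 2x, 4x) are illustrative assumptions.

```python
import numpy as np

def fuzzy_scale(innovation_ratio):
    """Map the normalized innovation to a covariance scale factor
    using triangular fuzzy memberships (small / medium / large).
    Breakpoints and scales are assumed for illustration."""
    r = innovation_ratio
    small = max(0.0, min(1.0, (1.0 - r) / 0.5))   # fully "small" below r = 0.5
    large = max(0.0, min(1.0, (r - 1.0) / 1.0))   # fully "large" above r = 2.0
    medium = max(0.0, 1.0 - small - large)
    # Defuzzify: "small" keeps R, "medium" doubles it, "large" quadruples it.
    return small * 1.0 + medium * 2.0 + large * 4.0

def adaptive_kf_step(x, P, z, Q, R):
    """One predict/update cycle of a 1-D Kalman filter whose
    measurement covariance R is rescaled by the fuzzy system."""
    # Predict (identity motion model, for brevity).
    x_pred, P_pred = x, P + Q
    # Normalized innovation: how surprising is the measurement?
    innovation = z - x_pred
    ratio = abs(innovation) / np.sqrt(P_pred + R)
    R_adapt = R * fuzzy_scale(ratio)
    # Update with the adapted covariance: a surprising measurement
    # (e.g. degraded GPS) gets a larger R and hence a smaller gain.
    K = P_pred / (P_pred + R_adapt)
    x_new = x_pred + K * innovation
    P_new = (1.0 - K) * P_pred
    return x_new, P_new
```

The same mechanism extends to the matrix form of the EKF by scaling the diagonal entries of R per sensor, which is how fixed-covariance filters are typically made environment-adaptive.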
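The navigation flow names two standard PPO ingredients: Generalized Advantage Estimation over a collected batch and the clipped policy update. A minimal numpy sketch of both follows; the function names and hyperparameter defaults are assumptions for illustration, not the thesis code.

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.
    `values` holds one extra bootstrap entry for the final state."""
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual at step t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future residuals.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate: the pessimistic minimum of the raw and
    clipped importance-weighted advantage, which caps how far a single
    batch update can move the policy."""
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)
```

In the loop described above, each batch of (pose estimate, lidar, action, reward) tuples would be turned into advantages by `gae`, and the policy network would ascend the mean of `ppo_clip_objective` while the value network regresses toward the returns.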
URI
https://dspace.ajou.ac.kr/handle/2018.oak/21343
Appears in Collections:
Special Graduate Schools > Graduate School of IT Convergence > Department of IT Convergence Engineering > 3. Theses(Master)