A Study on Autonomous Flight of a Quadcopter Drone Using Sensor Fusion and Reinforcement Learning

Author(s)
임은수
Alternative Author(s)
Eunsoo Lim
Advisor
구형일
Department
Graduate School of IT Convergence, Department of IT Convergence Engineering
Publisher
The Graduate School, Ajou University
Publication Year
2022-02
Language
kor
Keyword
Reinforcement learning; Drone; Sensor fusion; Autonomous driving
Alternative Abstract
This paper presents research on autonomous drone flight based on sensor fusion and reinforcement learning.

1. Outdoor drones typically rely on IMU and GPS sensors for localization, and each sensor has pros and cons. The IMU updates quickly but is noisy; GPS provides data that is stable over time, but its update rate is slow and its accuracy depends on the environment. Sensor-fusion filters such as the Kalman filter, Extended Kalman Filter (EKF), and Unscented Kalman Filter therefore improve localization performance. In traditional filters, however, the sensor covariances are fixed at their initialized values. The algorithm proposed in this study adapts the sensor covariances to the environment, using a fuzzy system to adjust the covariance values.

2. Navigation is based on reinforcement learning. Among the many reinforcement learning algorithms, this paper uses PPO, a policy-based algorithm. Environments with countless variables are well suited to policy-based methods like PPO, which delivers strong performance despite its simple logic. The overall flow is: first, acquire the current position using the dynamic EKF and 2D lidar data; second, obtain the drone's action from the policy neural network, which takes the estimated pose and the 2D lidar data as input; third, gather data such as pose estimates and GAE values until the batch is full, then update the policy and value networks; fourth, repeat until the drone reaches the goal position or the epoch limit is hit.

The experiments conclude that the dynamic EKF and the PPO algorithm outperform the alternatives. For pose estimation, the error between the ground truth and the dynamic EKF estimate is the smallest among the filters tested, and its computing time is low enough for use in a real-time system. The PPO agent reaches the goal position within fewer epochs than a DQN agent; in particular, a PPO agent tuned by grid search reached the goal in only 9 trials. This shows that reinforcement learning can provide adaptive navigation in an unknown environment.
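The adaptive-covariance idea in the abstract can be illustrated with a minimal one-dimensional sketch: a Kalman update whose measurement covariance R is rescaled by a small fuzzy system driven by the normalized innovation. This is not the thesis implementation; the triangular membership breakpoints (0.5, 1.0, 2.0) and scale factors (1x, 2x, 4x) are illustrative assumptions.

```python
import numpy as np

def fuzzy_scale(innovation_ratio):
    """Map the normalized innovation to a covariance scale factor
    using triangular fuzzy memberships (small / medium / large).
    Breakpoints and scales are assumed for illustration."""
    r = innovation_ratio
    small = max(0.0, min(1.0, (1.0 - r) / 0.5))   # fully "small" below r = 0.5
    large = max(0.0, min(1.0, (r - 1.0) / 1.0))   # fully "large" above r = 2.0
    medium = max(0.0, 1.0 - small - large)
    # Defuzzify: "small" keeps R, "medium" doubles it, "large" quadruples it.
    return small * 1.0 + medium * 2.0 + large * 4.0

def adaptive_kf_step(x, P, z, Q, R):
    """One predict/update cycle of a 1-D Kalman filter whose
    measurement covariance R is rescaled by the fuzzy system."""
    # Predict (identity motion model, for brevity).
    x_pred, P_pred = x, P + Q
    # Normalized innovation: how surprising is the measurement?
    innovation = z - x_pred
    ratio = abs(innovation) / np.sqrt(P_pred + R)
    R_adapt = R * fuzzy_scale(ratio)
    # Update with the adapted covariance: a surprising measurement
    # (e.g. degraded GPS) gets a larger R and hence a smaller gain.
    K = P_pred / (P_pred + R_adapt)
    x_new = x_pred + K * innovation
    P_new = (1.0 - K) * P_pred
    return x_new, P_new
```

The same mechanism extends to the matrix form of the EKF by scaling the diagonal entries of R per sensor, which is how fixed-covariance filters are typically made environment-adaptive.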
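The navigation flow names two standard PPO ingredients: Generalized Advantage Estimation over a collected batch and the clipped policy update. A minimal numpy sketch of both follows; the function names and hyperparameter defaults are assumptions for illustration, not the thesis code.

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.
    `values` holds one extra bootstrap entry for the final state."""
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual at step t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future residuals.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate: the pessimistic minimum of the raw and
    clipped importance-weighted advantage, which caps how far a single
    batch update can move the policy."""
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)
```

In the loop described above, each batch of (pose estimate, lidar, action, reward) tuples would be turned into advantages by `gae`, and the policy network would ascend the mean of `ppo_clip_objective` while the value network regresses toward the returns.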
URI
https://dspace.ajou.ac.kr/handle/2018.oak/21343
Appears in Collections:
Special Graduate Schools > Graduate School of IT Convergence > Department of IT Convergence Engineering > 3. Theses(Master)