This paper presents research on autonomous drone navigation based on sensor fusion and reinforcement learning.
1. For outdoor drones, an IMU and a GPS sensor are typically used for localization. Each sensor has pros and cons: the IMU has a fast update rate but suffers from significant noise, while GPS provides stable data over time but updates slowly and is dependent on the environment. Sensor fusion techniques such as the Kalman filter, Extended Kalman Filter (EKF), and Unscented Kalman Filter therefore improve localization performance. In traditional filters, however, the sensor covariances remain fixed at their initialized values. The algorithm proposed in this study adapts the sensor covariances to the environment, using a fuzzy system to change the covariance values, as sketched below.
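To make the idea concrete, the following is a minimal sketch of how a fuzzy rule could scale a GPS measurement covariance inside an EKF update. The membership functions, the innovation-based input, and the scaling factors are illustrative assumptions, not the exact fuzzy design used in the paper.

```python
import numpy as np

def fuzzy_covariance_scale(innovation_norm, low=0.5, high=3.0):
    """Illustrative fuzzy rule: map the normalized innovation magnitude to a
    covariance scaling factor. Small innovation -> trust GPS (scale R down),
    large innovation -> distrust GPS (scale R up). Membership functions and
    thresholds are assumptions, not taken from the paper."""
    small = max(0.0, 1.0 - innovation_norm)            # membership of "small"
    large = min(1.0, max(0.0, innovation_norm - 1.0))  # membership of "large"
    if small + large == 0.0:
        return 1.0
    # Weighted-average defuzzification between the two scale levels
    return (small * low + large * high) / (small + large)

def ekf_update(x, P, z, H, R_nominal):
    """Standard EKF measurement update with a fuzzy-scaled measurement covariance."""
    y = z - H @ x                                       # innovation
    S_nom = H @ P @ H.T + R_nominal
    innovation_norm = float(np.sqrt(y @ np.linalg.inv(S_nom) @ y))
    R = fuzzy_covariance_scale(innovation_norm) * R_nominal  # adaptive covariance
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                      # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```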
2. Navigation is based on reinforcement learning. Among the many reinforcement learning algorithms, this paper uses Proximal Policy Optimization (PPO). PPO belongs to the family of policy-based algorithms, and environments with a very large number of state variables are well suited to such methods. PPO delivers strong performance despite its simple algorithmic logic, the core of which is the clipped surrogate objective illustrated below.
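For reference, a minimal sketch of PPO's clipped surrogate objective is shown below. PyTorch is assumed here, and the clipping coefficient of 0.2 is a common default; the paper does not specify the implementation framework or hyperparameter values.

```python
import torch

def ppo_clipped_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective at the heart of PPO.
    clip_eps=0.2 is a commonly used default, assumed here."""
    ratio = torch.exp(log_probs_new - log_probs_old)   # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Maximize the surrogate objective -> minimize its negative mean
    return -torch.min(unclipped, clipped).mean()
```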
The overall flow of the logic is as follows. First, the current pose is estimated using the dynamic EKF and 2D lidar data. Second, the drone's action is obtained from the policy neural network, which takes the estimated pose and the 2D lidar data as input. Third, data such as pose estimates and generalized advantage estimates (GAE) are gathered until the batch is full, and the policy network and value network are then updated. Fourth, this loop repeats until the drone reaches the goal position or the epoch limit is exceeded. A sketch of this loop follows.
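The loop described above can be summarized roughly as in the outline below. Every name used here (env, dynamic_ekf, lidar, policy_net, value_net, build_state, compute_gae, ppo_update) is a hypothetical placeholder chosen for illustration, not an identifier from the paper's implementation.

```python
# Hypothetical outline of the navigation loop; all helper names are placeholders.
def run_navigation(env, dynamic_ekf, lidar, policy_net, value_net,
                   batch_size=2048, max_epochs=1000):
    batch = []
    for epoch in range(max_epochs):
        # 1) Estimate the current pose with the dynamic EKF and 2D lidar data
        pose = dynamic_ekf.estimate(env.imu(), env.gps())
        scan = lidar.read()

        # 2) Query the policy network with the estimated pose and lidar scan
        state = build_state(pose, scan)          # placeholder state encoding
        action = policy_net.sample_action(state)
        reward, done = env.step(action)
        batch.append((state, action, reward, done))

        # 3) Once the batch is full, compute GAE and update both networks
        if len(batch) >= batch_size:
            advantages = compute_gae(batch, value_net)
            ppo_update(policy_net, value_net, batch, advantages)
            batch.clear()

        # 4) Repeat until the goal is reached or the epoch limit is hit
        if done:
            break
```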
The experiments conclude that the dynamic EKF and the PPO algorithm outperform the alternatives. For pose estimation, the dynamic EKF yields the smallest error between the ground truth and the estimated pose among the filters compared. Moreover, its computing time is low enough for use in a real-time system.
The PPO agent reaches the goal position within fewer epochs than a DQN agent. In particular, the PPO algorithm tuned by grid search reached the goal position in only 9 trials. This indicates that reinforcement learning can provide adaptive navigation in an unknown environment.