Pervasive Artificial Intelligence Research (PAIR) Labs
Studies of Applications with Deep Reinforcement Learning Technologies
Principal Investigator: Professor I-Chen Wu
—
Summary
Recently, Deep Reinforcement Learning (DRL) has been applied to many AI applications. One of the most successful achievements is AlphaGo Zero (called the Zero method in this project), which learns to play Go without human knowledge and surprisingly surpasses all human players and all previous AI programs. This project studies five main topics in applications of Deep Reinforcement Learning: 1) continue to research and develop our Go program CGI; 2) apply the Zero method to other game AIs; 3) research the combination of the Zero method with exact methods; 4) research AI bots for video games; 5) research the random bin picking problem for robotic arms.
Keywords
Deep Reinforcement Learning, Reinforcement Learning, Deep Learning, Monte-Carlo Tree Search, AlphaGo Zero, Computer Games, Go, Video Games, Car Racing, Robotics, Random Bin Picking
Innovations
- We propose a novel value network architecture, called a multi-labeled value network, which outputs values for different komi settings in the game of Go and also lowers the mean squared error.
- We propose an approach to strength adjustment for MCTS-based game-playing programs, and provide a theoretical analysis showing that the adjusted policy is guaranteed to choose moves whose strength exceeds a lower bound determined by a threshold ratio.
- We investigate whether the Zero method can also learn theoretical values and optimal plays for non-deterministic games, and develop a Zero program for 2×4 Chinese Dark Chess.
- We propose hyperbolic-tangent decay, a learning-rate schedule that can be applied with stochastic gradient descent.
- We propose a new method for state discretization, which discretizes the agent's perception of environmental changes and generates the corresponding state transition diagram.
- We propose a new weighted cross-entropy method that achieves a success rate of nearly 100% on grasping tasks for robotic arms, whereas DDPG achieves only 70%.
- We propose a new end-to-end DRL method with a hybrid action space, which greatly improves performance on grasping and pushing tasks for robotic arms.
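To make the multi-labeled value network concrete, the following is a minimal sketch of such a value head: instead of a single scalar, it outputs one win-rate estimate per komi setting, so one network can serve games played under different komis. The class name, feature dimension, and komi list are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

class MultiLabeledValueHead:
    """Sketch of a multi-labeled value head: one output per komi setting."""

    def __init__(self, feat_dim, komis, seed=0):
        rng = np.random.default_rng(seed)
        self.komis = list(komis)
        # linear head mapping board features to one logit per komi label
        self.W = rng.normal(0.0, 0.01, (feat_dim, len(self.komis)))
        self.b = np.zeros(len(self.komis))

    def forward(self, features):
        # one value in (-1, 1) per komi label
        return np.tanh(features @ self.W + self.b)

    def value_for_komi(self, features, komi):
        # pick out the estimate for one specific komi setting
        return self.forward(features)[self.komis.index(komi)]
```

Training such a head against outcome labels for every komi simultaneously is what lets a single network play under different komis, and the extra supervision is one plausible reason the mean squared error drops.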
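The strength-adjustment idea above can be sketched as follows: sample the move from the MCTS visit counts (sharpened or flattened by a strength parameter), but first discard any move whose visit count falls below a threshold ratio of the best move's count, which is what bounds how weak the chosen move can be. The function name, parameters, and the exact weighting scheme here are assumptions for illustration, not the published algorithm.

```python
import random

def adjusted_move(visit_counts, strength, threshold_ratio=0.1):
    """Sample a move from MCTS visit counts with adjustable strength.

    visit_counts: dict mapping move -> visit count from MCTS.
    strength: exponent on the counts; large values approach the
        strongest move, small values play more uniformly.
    threshold_ratio: moves with fewer than this fraction of the
        maximum visit count are pruned, lower-bounding move quality.
    """
    max_count = max(visit_counts.values())
    candidates = {m: n for m, n in visit_counts.items()
                  if n >= threshold_ratio * max_count}
    weights = [n ** strength for n in candidates.values()]
    return random.choices(list(candidates), weights=weights)[0]
```

With a high threshold ratio only near-best moves survive the pruning step, so the program stays strong; lowering the ratio and the strength exponent lets it play more like a weaker opponent.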
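Hyperbolic-tangent decay can be illustrated with a short schedule function: the learning rate follows the falling half of a tanh curve from roughly the base rate down to near zero over training. The lower/upper bounds of the tanh argument used here are illustrative assumptions; the paper's actual hyperparameters may differ.

```python
import math

def htd_lr(step, total_steps, base_lr=0.1, lower=-6.0, upper=3.0):
    """Hyperbolic-tangent decay schedule (illustrative parameters).

    Maps training progress in [0, 1] onto the tanh argument range
    [lower, upper]; the (1 - tanh) factor then glides the learning
    rate smoothly from ~base_lr down toward zero.
    """
    progress = step / total_steps
    return base_lr / 2.0 * (1.0 - math.tanh(lower + (upper - lower) * progress))
```

Unlike step decay, the rate changes smoothly at every step, which makes the schedule easy to drop into any SGD loop by recomputing `htd_lr` before each update.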
Benefits
- By combining the multi-labeled value network with Go programs, we developed the world’s first Go Zero program that can play under different komis. The proposed method has also been published in the IEEE Transactions on Games.
- We develop a computer Go lifelong learning system, the world’s first Go system able to provide playing strengths ranging from beginner to superhuman. This result was selected for exhibition at the 2018 Future Tech expo. A paper on the strength adjustment has also been accepted by the top conference AAAI-19 (acceptance rate: 1,150/7,095 = 16.2%).
- The 2×4 Chinese Dark Chess Zero program we developed is the world’s first Zero program for stochastic games. A paper on this work also won the best paper award at the TAAI 2018 conference.
- The paper on hyperbolic-tangent decay has been accepted by the IEEE WACV 2019 conference.
- The proposed distributed end-to-end DRL algorithm has been successfully applied to racing games in an industry-university joint project.
- The new approach to state discretization is expected to be applied to many DRL applications.
- The end-to-end DRL method with a hybrid action space has been accepted by the Infer2Control workshop at the NIPS 2018 conference.