Pervasive Artificial Intelligence Research (PAIR) Labs
Development of Learning from Human Demonstration (LfD) Robotic Systems with Navigation Capability Based on Human Action Recognition
Principal Investigator: Professor Chen-Chien Hsu
Summary
This project develops a learning from demonstration (LfD) robot system that uses deep learning to recognize human actions and facial expressions, and employs algorithms such as object tracking and visual simultaneous localization and mapping (VSLAM), so that a mobile robot can act in accordance with human demonstrations in an indoor environment. In the first year, the project established a facial expression recognition system and an action recognition system through deep learning, for later incorporation into the LfD framework. Together with deep-learning-based control and motion planning of the robot arm, a mimic robot system that follows human demonstrations is established under the LfD framework. Toward a mobile LfD system, the first-year project also built an object tracking system on FPGA hardware to evaluate the performance of the feature detection and matching modules that we designed and implemented.
Keywords
learning from demonstration (LfD), deep learning, generative adversarial network, facial expression recognition, action recognition, reinforcement learning, object tracking, visual simultaneous localization and mapping (VSLAM), FPGA
Innovations
- We proposed a hybrid two-stream architecture incorporating a LeNet and a partial ResNet to build the facial expression recognition system (a minimal architecture sketch follows this list).
- We established an action recognition system based on a two-stream I3D architecture, whose two inputs are sequences of RGB images and of optical flow images, respectively. The network weights are initialized from a model pre-trained on the ImageNet database (a sketch of the flow preprocessing follows this list).
- A mimic robot system is developed that learns the purpose of an action by examining the overlapped trajectories of the demonstrated action. Object recognition is provided by a YOLO detector, and an RGB-D camera is used to derive the 3D trajectory of the action, from which the robot arm reproduces the human demonstration (a back-projection sketch follows this list).
- A novel DNN-based inverse kinematics algorithm is proposed for a large-size humanoid robot. The training database for the DNN is generated from random motor-angle data, and the network inputs include the motor angles as well as the corresponding expected step sizes (a training sketch follows this list).
- We developed a hardware-implemented object tracking system on an FPGA platform, where the SIFT and matching algorithms are redesigned and optimized to improve their overall hardware efficiency. The complete object tracking system runs in real time (a software reference for the same pipeline follows this list).
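The following is a minimal sketch of the hybrid two-stream idea from the first innovation: a shallow LeNet-style stream and a truncated ResNet stream whose features are fused before a 7-class prediction. It is written in PyTorch, the layer sizes are illustrative, and ResNet-18 stands in for the "partial ResNet"; none of these choices are taken from the project itself.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class TwoStreamFER(nn.Module):
    """Hypothetical two-stream facial expression classifier (7 classes)."""
    def __init__(self, num_classes=7):
        super().__init__()
        # LeNet-style stream: two conv/pool stages and a small projection.
        self.lenet_stream = nn.Sequential(
            nn.Conv2d(3, 6, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 64),
        )
        # Partial ResNet stream: a ResNet-18 backbone truncated before its
        # classification head (assumption: ResNet-18 as the "partial ResNet").
        backbone = resnet18()
        self.resnet_stream = nn.Sequential(*list(backbone.children())[:-1], nn.Flatten())
        # Fusion of the two feature vectors and the final expression prediction.
        self.classifier = nn.Linear(64 + 512, num_classes)

    def forward(self, x):
        f1 = self.lenet_stream(x)
        f2 = self.resnet_stream(x)
        return self.classifier(torch.cat([f1, f2], dim=1))

model = TwoStreamFER()
logits = model(torch.randn(1, 3, 224, 224))  # one dummy face crop
print(logits.shape)                          # torch.Size([1, 7])
```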
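For the two-stream I3D system, the sketch below shows only the preprocessing assumed for the flow stream: dense optical flow computed between consecutive RGB frames, producing the 2-channel flow images fed to the second stream. OpenCV's Farneback method is used here as a stand-in; the report does not specify which flow algorithm the project used.

```python
import cv2
import numpy as np

def clip_to_flow(frames):
    """frames: list of HxWx3 uint8 RGB images -> (T-1, H, W, 2) float32 flow stack."""
    gray = [cv2.cvtColor(f, cv2.COLOR_RGB2GRAY) for f in frames]
    flows = []
    for prev, nxt in zip(gray[:-1], gray[1:]):
        # Dense optical flow between consecutive frames (Farneback parameters
        # are the common defaults, not values from the project).
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow.astype(np.float32))
    return np.stack(flows)
```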
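For the mimic robot system, the sketch below illustrates how a detected object centre and its aligned depth value can be back-projected to a 3D point in the camera frame under a pinhole model, so that a sequence of detections becomes a 3D trajectory. The intrinsics (fx, fy, cx, cy) and the sample detections are placeholders, not the project's calibration or data.

```python
import numpy as np

def pixel_to_camera(u, v, depth_m, fx, fy, cx, cy):
    """Back-project one pixel with metric depth into camera coordinates."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

# Example: a sequence of (u, v, depth) detections becomes a 3D trajectory.
detections = [(320, 240, 0.85), (318, 236, 0.84), (315, 230, 0.82)]
trajectory = np.stack([pixel_to_camera(u, v, d, fx=615.0, fy=615.0, cx=320.0, cy=240.0)
                       for u, v, d in detections])
```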
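The DNN-based inverse kinematics idea can be sketched as follows: random motor angles are sampled, a forward-kinematics routine gives the Cartesian step produced by a small joint move, and a network is trained to map (current angles, desired step) to the joint-angle step. The fk() function here is a toy placeholder for the humanoid's real kinematic model, and all sizes and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

N_JOINTS = 6

def fk(angles):
    """Placeholder forward kinematics returning a 3D end-effector position.
    The real system would use the humanoid robot's kinematic model."""
    return torch.stack([angles.sum(-1), angles.cos().sum(-1), angles.sin().sum(-1)], dim=-1)

# Random-angle database: joint configurations, small joint steps, and the
# corresponding expected Cartesian step sizes.
angles = (torch.rand(10000, N_JOINTS) - 0.5) * 3.14
delta = (torch.rand(10000, N_JOINTS) - 0.5) * 0.05
step = fk(angles + delta) - fk(angles)

ik_net = nn.Sequential(nn.Linear(N_JOINTS + 3, 128), nn.ReLU(),
                       nn.Linear(128, 128), nn.ReLU(),
                       nn.Linear(128, N_JOINTS))
opt = torch.optim.Adam(ik_net.parameters(), lr=1e-3)
for _ in range(200):                                 # short illustrative training loop
    pred = ik_net(torch.cat([angles, step], dim=1))  # input: motor angles + expected step
    loss = nn.functional.mse_loss(pred, delta)       # target: joint-angle step
    opt.zero_grad()
    loss.backward()
    opt.step()
```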
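Finally, as a software reference point for the FPGA pipeline, the sketch below runs SIFT detection and descriptor matching with OpenCV; this is the kind of PC baseline a hardware implementation would be compared against, not the project's hardware design. cv2.SIFT_create requires opencv-python 4.4 or later.

```python
import cv2

def sift_match(img1, img2, ratio=0.75):
    """Detect SIFT keypoints in two grayscale images and return good matches."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher()
    # Lowe's ratio test keeps only matches clearly better than the runner-up.
    matches = matcher.knnMatch(des1, des2, k=2)
    return [m for m, n in matches if m.distance < ratio * n.distance]
```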
Benefits
- Based on the Real-world Affective Faces (RAF) database, recognition of seven facial expressions, namely anger, disgust, fear, happiness, sadness, neutral, and surprise, has been achieved. Real-time facial expression recognition from webcam images is also achieved (a sketch of such a real-time loop follows this list).
- Based on the UCF-101 database, experimental results show that the action recognition system reaches a satisfactory success rate of 95.5%. Real-time action recognition is also achieved by extracting a video stream of about 3 seconds from a webcam.
- A UR3 6-DOF robotic arm is used to establish the mimic robot system, where recognized objects are used to derive the 3D trajectories of an action.
- Compared with the conventional Jacobian-based inverse kinematics algorithm, the proposed DNN-based inverse kinematics algorithm can reliably and smoothly execute a given trajectory even at singular points, as shown in Figure 3 (a) and (b). The proposed approach is preferable for a large-size humanoid robot because it produces only small errors for large movements.
- SIFT detection and matching are implemented on an FPGA to significantly improve their computational efficiency compared with software platforms, including a Nios soft-core processor and a PC, as shown in Table 1.
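The real-time webcam recognition mentioned above can be sketched as a simple capture-classify-display loop. The trained classifier `model`, the fixed 224x224 resize (standing in for face detection and cropping), and the device argument are assumptions for illustration; only the seven RAF expression labels come from the report.

```python
import cv2
import torch

EXPRESSIONS = ["anger", "disgust", "fear", "happiness", "sadness", "neutral", "surprise"]

def run_webcam(model, device="cpu"):
    cap = cv2.VideoCapture(0)                      # default webcam
    model.eval()
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        face = cv2.resize(frame, (224, 224))       # placeholder for face detection + crop
        x = torch.from_numpy(face).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        with torch.no_grad():
            label = EXPRESSIONS[model(x.to(device)).argmax(1).item()]
        cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("expression", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):      # press q to quit
            break
    cap.release()
    cv2.destroyAllWindows()
```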