Pervasive Artificial Intelligence Research (PAIR) Labs
A Deep Learning-based Gesture Interface for Value-Added Location Services
Principal Investigator: Professor Kuo-Chin Fan
Summary
This project proposes a contactless human-computer interaction system that integrates stationary and mobile interfaces. The system consists of three major components supporting different usage models, so that users can interact with various applications in easy and enjoyable ways. Users can operate the system with freehand gestures, and they can acquire location-related information by photographing signs of interest on the street. Well-classified, site-specific images are retrieved automatically when needed. All traces of user interactions are modeled underneath, enabling a more precise recommendation scheme. We have implemented a unified framework comprising all of these modules using deep learning technologies.
Keywords
Interactive System, Contactless Human-Computer Interaction, Location Semantics, Freehand Input, Recommendation Scheme
Innovations
- A contactless human-computer interaction system is proposed, consisting of a stationary interface, a mobile interface, and underlying usage-footprint models, as shown in Figure 1.
- We train several deep neural networks for gesture recognition and apply them in a context-aware manner to improve recognition accuracy and provide a user-friendly interactive experience. Figure 2 shows some examples.
- An easy method for multi-language input is provided via a virtual keyboard that users operate in mid-air.
- Store-related information can be displayed automatically on camera-equipped mobile devices by taking a picture of the text on a store sign. We make this scenario possible by recognizing text in complicated street-view images; a minimal recognition sketch follows this list. See Figure 3 for an illustration.
- We propose an approach that classifies unlabeled images related to a specific site into pre-defined categories, giving users a quick glance at well-organized photos while filtering out unrelated photos of the site (see the categorization sketch after this list).
- For user recommendation, we design RNN-based models with several new characteristics, including variable-length inputs, feature-enhanced LSTMs, and sequence-planning network architectures (see the recommendation sketch after this list).
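
To make the street-sign scenario concrete, below is a minimal sketch of the recognition step, assuming a CRNN-style model (convolutional features followed by a bidirectional LSTM with per-timestep CTC outputs). The layer sizes and the rotate-vertical-crops handling are illustrative assumptions, not the project's exact implementation.

```python
# Minimal CRNN-style sign-text recognizer (sketch; shapes/layers are assumptions).
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # Convolutional feature extractor: collapses height, keeps width as time.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),      # height -> 1
        )
        self.rnn = nn.LSTM(256, 128, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, num_classes)     # per-timestep class scores (CTC)

    def forward(self, x):                         # x: (B, 1, H, W) grayscale crops
        f = self.cnn(x).squeeze(2)                # (B, 256, W')
        f = f.permute(0, 2, 1)                    # (B, W', 256) time-major features
        out, _ = self.rnn(f)
        return self.fc(out)                       # (B, W', num_classes)

def recognize(crop, model, vertical=False):
    """Vertical sign text is rotated 90 degrees so it reads left-to-right."""
    if vertical:
        crop = torch.rot90(crop, k=1, dims=(-2, -1))
    return model(crop.unsqueeze(0)).argmax(-1)    # greedy CTC path (decoding omitted)
```

In a full pipeline, a detection stage would first localize sign regions in the street-view image and tag each crop as horizontal or vertical before it reaches the recognizer.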
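The site-photo categorization can be sketched in the same spirit: a pretrained backbone with a new classification head over the pre-defined categories, plus a confidence threshold for filtering unrelated photos. The category list, backbone, and threshold here are assumptions for illustration.

```python
# Minimal site-photo categorizer (sketch; categories and threshold are assumptions).
import torch
import torch.nn as nn
from torchvision import models

CATEGORIES = ["food", "interior", "exterior", "menu", "unrelated"]  # assumed labels

def build_classifier(num_classes=len(CATEGORIES)):
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)   # new head
    return backbone

@torch.no_grad()
def categorize(images, model, threshold=0.5):
    """Assign each image a category; low-confidence or unrelated photos are dropped."""
    probs = torch.softmax(model(images), dim=1)    # (B, num_classes)
    conf, idx = probs.max(dim=1)
    return [
        CATEGORIES[i] if c >= threshold and CATEGORIES[i] != "unrelated" else None
        for c, i in zip(conf.tolist(), idx.tolist())
    ]
```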
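For the recommendation models, the following is a minimal feature-enhanced LSTM sketch: each step of a user's interaction sequence is an item embedding concatenated with side features (here, assumed distance and time features, echoing the dynamic tourism setting described under Benefits), and the network scores candidates for the next item. All dimensions and feature choices are illustrative.

```python
# Minimal feature-enhanced LSTM recommender (sketch; dimensions are assumptions).
import torch
import torch.nn as nn

class FeatureEnhancedLSTM(nn.Module):
    def __init__(self, num_items, emb_dim=64, feat_dim=2, hidden=128):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, emb_dim)
        # Each timestep = item embedding + side features (e.g., distance, time).
        self.lstm = nn.LSTM(emb_dim + feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_items)    # next-item scores

    def forward(self, items, feats):
        # items: (B, T) visited item ids; feats: (B, T, feat_dim) side features
        x = torch.cat([self.item_emb(items), feats], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h[:, -1])                  # score candidates for step T+1
```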
Benefits
- We design the gesture recognition modules by combining a 3D CNN with an LSTM, which extracts spatial and temporal features simultaneously and yields more accurate recognition results (see the sketch after this list).
- The text recognition technique for street-view images developed in this project is robust even when store and traffic signs contain horizontal or vertical text.
- In addition to the tools for categorizing unlabeled images, we implement a Web interface: entering the Google Maps URL of a specific location displays well-classified images of that location.
- RNN-based movie and tourism recommendation modules have been developed, and preliminary results demonstrate their effectiveness. For dynamic recommendation, e.g., tourism suggestions, we consider distance and timing factors simultaneously to achieve more reliable results.
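
A minimal sketch of the 3D-CNN-plus-LSTM design mentioned above, assuming gesture clips are split into short segments: a 3D CNN extracts spatio-temporal features per segment, and an LSTM aggregates the segment sequence into a gesture class. The segmenting scheme and layer sizes are illustrative assumptions, not the project's exact architecture.

```python
# Minimal 3D-CNN + LSTM gesture recognizer (sketch; layer sizes are assumptions).
import torch
import torch.nn as nn

class Gesture3DCNNLSTM(nn.Module):
    def __init__(self, num_gestures, hidden=256):
        super().__init__()
        # 3D CNN: spatio-temporal features over each short clip segment.
        self.cnn3d = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                 # -> (B*S, 64, 1, 1, 1)
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_gestures)

    def forward(self, clips):
        # clips: (B, S, C, T, H, W) — S short segments of T frames each
        B, S = clips.shape[:2]
        f = self.cnn3d(clips.flatten(0, 1)).flatten(1)   # (B*S, 64)
        h, _ = self.lstm(f.view(B, S, -1))               # aggregate over segments
        return self.fc(h[:, -1])                         # gesture class scores
```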