Development of the Joint Attention with a New Face Tracking Method for Multiple People

- Advisor: Dr. Han-Pang Huang; Student: Hung-Jing Jian

Lab. of Robotics, Department of Mechanical Engineering, National Taiwan University, Taiwan

Abstract:
This thesis develops a system for multiple-object tracking and for joint attention between people and a robot. We propose a new method, Modified Multi-CAMSHIFT (MMC), which uses the probability distributions of color and shape to solve the multiple-object tracking problem. The color cue is computed by MMC, which extends CAMSHIFT, and the shape cue is computed with a Scharr kernel mask. From these cues we compute a color histogram and an orientation histogram, respectively, and apply Adaptive Feature Selection for optimal tracking. To distinguish face regions from non-face regions, we include Eyes-pair Fast Extracting. MMC runs within an adaptive multi-resolution (AMR) framework to reduce computation. The experimental results show that, with all of these mechanisms, the proposed MMC achieves satisfactory tracking performance.
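As a rough illustration of the shape cue described above, the following Python sketch computes a gradient orientation histogram using the standard 3x3 Scharr kernels. It is a minimal sketch, not the thesis implementation: the bin count, the magnitude weighting, the border handling, and the helper names `filter3x3` and `orientation_histogram` are assumptions made for illustration.

```python
import numpy as np

# Standard 3x3 Scharr kernels for the horizontal and vertical derivatives.
SCHARR_X = np.array([[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]], dtype=float)
SCHARR_Y = SCHARR_X.T


def filter3x3(image, kernel):
    """Cross-correlate a 2-D image with a 3x3 kernel (edge-replicated borders)."""
    padded = np.pad(image.astype(float), 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def orientation_histogram(gray, bins=8):
    """Magnitude-weighted histogram of gradient orientations in [0, pi).

    The bin count and the weighting scheme are illustrative assumptions.
    """
    gx = filter3x3(gray, SCHARR_X)
    gy = filter3x3(gray, SCHARR_Y)
    magnitude = np.hypot(gx, gy)
    # Fold directions so that opposite gradients share one orientation.
    angle = np.mod(np.arctan2(gy, gx), np.pi)
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, np.pi), weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

A histogram of this kind could be compared between the tracked region and a candidate region in the same way as a color histogram; how the thesis weighs the two cues in Adaptive Feature Selection is not specified in this abstract.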
After locating the human faces, we estimate the direction of each face and study the form of human-robot interaction known as joint attention. The robot establishes joint attention with a human by using both static and dynamic information. The static information is the edge image of the human's face while he or she is gazing at an object. The dynamic information is the optical flow detected while the robot observes a human shifting his or her gaze from the robot to another object. The two kinds of information are complementary. The static information gives the exact direction of gaze but is difficult to interpret. The dynamic information gives only a rough direction, but the relationship between the direction of the gaze shift and the motor output needed to follow the gaze is easy to understand. We use a Support Vector Machine (SVM) as the learning model. Using both static and dynamic information acquired from observing a human's gaze shift, the SVM-based learning enables the robot to acquire joint attention efficiently and to interact naturally with the human. The dynamic information accelerates the learning of joint attention, while the static information improves task performance. Experimental results show that the proposed Modified Multi-CAMSHIFT was successfully applied to multiple-face tracking and to the development of joint attention.
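To make the idea of a "rough direction" from the dynamic cue concrete, the Python sketch below summarizes a dense optical-flow field as a single dominant direction. It assumes the flow field has already been computed by some optical-flow method and is stored as an array of per-pixel displacements; the function name `rough_shift_direction` and the threshold `min_magnitude` are hypothetical, and the thesis's actual feature extraction and SVM training are not shown.

```python
import numpy as np


def rough_shift_direction(flow, min_magnitude=0.5):
    """Return (angle_rad, mean_vector) of the dominant motion, or None.

    flow: array of shape (H, W, 2) holding per-pixel (dx, dy) displacements.
    min_magnitude: hypothetical threshold used to ignore near-static pixels.
    """
    vectors = flow.reshape(-1, 2)
    magnitudes = np.linalg.norm(vectors, axis=1)
    moving = vectors[magnitudes >= min_magnitude]
    if moving.size == 0:
        return None  # no significant motion observed
    mean_vec = moving.mean(axis=0)
    angle = float(np.arctan2(mean_vec[1], mean_vec[0]))
    return angle, mean_vec
```

A coarse angle like this maps directly onto a pan or tilt command, which is one way to read the statement that the dynamic information is easy to relate to motor output, while the static edge image would be needed to refine the final gaze direction.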

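Abstract (translated from Chinese):

The main purpose of this thesis is to study a face tracking method for multiple people and the development of joint attention between a humanoid robot and its users. We develop a new multi-face tracking method, Modified Multi-CAMSHIFT (MMC), to realize multiple-object tracking. By combining two main kinds of information, color and shape, it locates and tracks all faces in an image more effectively. The color information is computed with the Modified Multi-CAMSHIFT theory we developed, and the shape information is obtained with a Scharr kernel mask. The color histogram and the orientation histogram are then computed from these two cues and passed to the Adaptive Feature Selection mechanism to make the optimal tracking decision. To distinguish face regions from non-face regions, we add the Eyes-pair Fast Extracting mechanism. The proposed method performs all computation under Adaptive Multi-Resolution, which reduces the amount of image processing. Experimental results show that, with the mechanisms above, the proposed multi-face tracking method (Modified Multi-CAMSHIFT) tracks very well.

After the faces are found, we further determine the direction of each face and study its interaction with the robot, namely joint attention. We use two kinds of information, static and dynamic, to determine the face direction. The dynamic information uses optical flow to observe and compute the motion when the user's gaze shifts from the robot to another target. The static information is the face edge image computed while the face is gazing at a target. Static and dynamic information are complementary: the former requires a complex algorithm but gives a precise gaze direction, while the latter provides only rough information but makes it easy to understand the relationship between the gaze shift and the motor output that follows the user's gaze. The learning model is a Support Vector Machine (SVM). With the static and dynamic information obtained from observing the user's gaze movement, the robot can efficiently acquire the joint attention ability and interact naturally with people. Combining dynamic with static information accelerates the learning of joint attention and improves overall performance.

With the methods and theory above, we successfully realize multi-face tracking and the development of joint attention.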