US 7,526,361 B2
Robotics visual and auditory system
Kazuhiro Nakadai, Wako (Japan); Hiroshi Okuno, Wako (Japan); and Hiroaki Kitano, Kawagoe (Japan)
Assigned to Honda Motor Co., Ltd., Tokyo (Japan)
Appl. No. 10/506,167
PCT Filed Aug. 30, 2002, PCT No. PCT/JP02/08827
§ 371(c)(1), (2), (4) Date Feb. 09, 2006.
Claims priority of application No. 2002-056670 (JP), filed on Mar. 01, 2002.
Prior Publication US 2006/0241808 A1, Oct. 26, 2006
Int. Cl. G06F 19/00 (2006.01)
U.S. Cl. 700/245 [700/258; 700/259; 318/568.1; 901/1]
5 Claims
 
1. A robotics visual and auditory system comprising:
an audition module including at least a pair of microphones for collecting external sounds;
a face module including a camera for taking images in front of a robot;
a stereo module for extracting a matter by a stereo camera;
a motor control module including a drive motor for rotating the robot in the horizontal direction;
an association module for generating streams by associating events from said audition, said face, said stereo, and said motor control modules; and
an attention control module for conducting attention control based on the stream generated by said association module; characterized in that:
said audition module determines at least one speaker's direction from the sound source separation and localization by grouping based on pitch extraction and harmonic wave structure, based on sound signals from the microphones, and extracts an auditory event;
said face module identifies each speaker from each speaker's face recognition and localization based on the image taken by the camera, and extracts a face event;
said stereo module extracts a stereo event by extraction and localization of a longitudinally long matter based on a disparity extracted from the image taken by the stereo camera;
said motor control module extracts a motor event based on the rotational position of the drive motor; and thereby
said association module determines each speaker's direction based on directional information of sound source localization by the auditory event, face localization by the face event, and matter localization by the stereo event from the auditory, face, stereo, and motor events, generates an auditory, a face, and a stereo stream by connecting the events in the temporal direction using a Kalman filter, and further generates an association stream by associating these;
said attention control module conducts attention control based on said streams, and drive-control of the motor based on a result of planning for the action accompanying those; and
said audition module collects sub-bands having interaural phase difference (IPD) or interaural intensity difference (IID) within a predetermined range by an active direction pass filter having a pass range which, according to auditory characteristics, becomes minimum in the frontal direction, and larger as the angle becomes wider to the left and right, based on accurate sound source directional information from the association module, and conducts sound source separation by restructuring a wave shape of a sound source.
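
The sub-band collection recited in the final element of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The following are assumptions made only for illustration: a far-field model in which the expected IPD is 2*pi*f*d*sin(theta)/c, a pass range that widens linearly from 10 degrees at the front to 30 degrees at the sides, a microphone spacing of 0.18 m, and the hypothetical function names `pass_range_deg`, `expected_ipd`, and `select_subbands`. The sketch uses IPD only, assumes frequencies low enough that the phase does not wrap, and stops before the waveform of the selected sub-bands is rebuilt.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed room-temperature value)
MIC_SPACING = 0.18      # m (hypothetical distance between the two microphones)


def pass_range_deg(theta_deg, min_width=10.0, max_width=30.0):
    """Half-width of the pass range in degrees.

    Narrowest in the frontal direction (0 degrees) and wider toward
    the left and right. The linear shape and both widths are assumptions.
    """
    frac = min(abs(theta_deg), 90.0) / 90.0
    return min_width + (max_width - min_width) * frac


def expected_ipd(freq_hz, theta_deg):
    """Expected IPD in radians under the assumed far-field model."""
    # Clamp to [-90, 90] so that sin() stays monotonic over the range used.
    theta = math.radians(max(-90.0, min(90.0, theta_deg)))
    return 2.0 * math.pi * freq_hz * MIC_SPACING * math.sin(theta) / SPEED_OF_SOUND


def select_subbands(measured_ipd, theta_deg):
    """Return the sub-band frequencies whose measured IPD lies in the pass range.

    measured_ipd maps a sub-band centre frequency (Hz) to its measured IPD
    (radians); theta_deg is the source direction from the association module.
    """
    delta = pass_range_deg(theta_deg)
    kept = []
    for freq_hz, ipd in sorted(measured_ipd.items()):
        lower = expected_ipd(freq_hz, theta_deg - delta)
        upper = expected_ipd(freq_hz, theta_deg + delta)
        if lower <= ipd <= upper:
            kept.append(freq_hz)
    return kept
```

Given one sub-band whose IPD matches a frontal source and another whose IPD matches a source at 60 degrees, steering the filter to 0 degrees keeps only the first and steering it to 60 degrees keeps only the second. A full separation step would then resynthesize a waveform from the kept sub-bands, for example by an inverse transform, which this sketch omits.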