US 7,567,900 B2
Harmonic structure based acoustic speech interval detection method and device
Tetsu Suzuki, Neyagawa (Japan); Takeo Kanamori, Hirakata (Japan); and Takashi Kawamura, Settu (Japan)
Assigned to Panasonic Corporation, Osaka (Japan)
Appl. No. 10/542,931
PCT Filed Jun. 03, 2004, PCT No. PCT/JP2004/008051
§ 371(c)(1), (2), (4) Date Jul. 21, 2005,
PCT Pub. No. WO2004/111996, PCT Pub. Date Dec. 23, 2004.
Claims priority of application No. 2003-165946 (JP), filed on Jun. 11, 2003.
Prior Publication US 2006/0053003 A1, Mar. 09, 2006
Int. Cl. G10L 15/20 (2006.01); G10L 11/06 (2006.01); G10L 15/00 (2006.01); G10L 17/00 (2006.01)
U.S. Cl. 704—233  [704/248; 704/231; 704/208] 16 Claims
 
1. A harmonic structure acoustic signal detection method for detecting a segment that includes speech, as a speech segment, from an input acoustic signal which is divided into a plurality of frames with a predetermined period, said harmonic structure acoustic signal detection method comprising:
an acoustic feature extraction step of extracting an acoustic feature using a processor in each frame of the plurality of frames into which the input acoustic signal is divided; and
a segment determination step of evaluating a continuity of the extracted acoustic features and of determining a speech segment according to the evaluated continuity,
wherein said acoustic feature extraction step includes:
a frequency transformation step of frequency-transforming each frame of the plurality of frames to obtain components;
a correlation value calculation step of dividing the components obtained through said frequency transformation step into frequency bands of a predetermined bandwidth and calculating a correlation value between components in predetermined frequency bands in different frames;
a weight calculation step of calculating a weight, in a same frame or between adjacent frames, the calculated weight, when a difference between a maximum value of correlation values and a minimum value of the correlation values is larger than a threshold value, being smaller than the calculated weight when the difference between the maximum value of the correlation values and the minimum value of the correlation values is smaller than the threshold; and
a harmonic structure acoustic feature extraction step of extracting the acoustic feature that is a value of a harmonic structure represented by a number, using a product of the correlation value calculated in said correlation value calculation step and the weight calculated in said weight calculation step, and
wherein, in said segment determination step, the speech segment is determined based on at least one of a correlation value between acoustic features in the same frame and a correlation value between acoustic features in different frames.
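
The following is an illustrative sketch only, not part of the claim and not the patented implementation. It shows one way the recited steps could fit together in Python: a frequency transformation per frame, per-band correlation between two frames, a weight that is smaller when the spread of the correlation values (maximum minus minimum) exceeds a threshold, a weighted feature value, and a segment decision. All function names, the Pearson correlation, the weight values 0.5 and 1.0, the averaging of the weighted correlations, and the run-length continuity rule are assumptions made for illustration; in particular, the run-length rule stands in for, and is simpler than, the correlation-based segment determination the claim recites.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Frequency transformation: naive DFT magnitudes for bins 0..N/2-1."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def band_correlations(spec_a, spec_b, bandwidth):
    """Split both spectra into bands of fixed width and correlate matching bands."""
    return [
        pearson(spec_a[s:s + bandwidth], spec_b[s:s + bandwidth])
        for s in range(0, len(spec_a) - bandwidth + 1, bandwidth)
    ]


def weight(corrs, spread_threshold, low=0.5, high=1.0):
    """Return the smaller weight when max - min of the correlations exceeds the threshold."""
    return low if max(corrs) - min(corrs) > spread_threshold else high


def harmonic_feature(spec_a, spec_b, bandwidth, spread_threshold):
    """One number per frame pair: mean of (weight * band correlation). Assumed form."""
    corrs = band_correlations(spec_a, spec_b, bandwidth)
    w = weight(corrs, spread_threshold)
    return sum(w * c for c in corrs) / len(corrs)


def speech_segments(features, feature_threshold, min_run):
    """Assumed continuity rule: keep runs of at least min_run features above the threshold.

    Returns (start, end) index pairs with end exclusive.
    """
    segments, start = [], None
    # A sentinel below any threshold closes a run that reaches the last frame.
    for i, f in enumerate(list(features) + [float("-inf")]):
        if f > feature_threshold and start is None:
            start = i
        elif f <= feature_threshold and start is not None:
            if i - start >= min_run:
                segments.append((start, i))
            start = None
    return segments
```

In this sketch, a caller would compute `magnitude_spectrum` for each frame, call `harmonic_feature` on adjacent frame pairs to obtain one feature value per pair, and pass the resulting list to `speech_segments`.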