| US 7,590,529 B2 | ||
| Method and apparatus for reducing noise corruption from an alternative sensor signal during multi-sensory speech enhancement | ||
| Zhengyou Zhang, Bellevue, Wash. (US); Amarnag Subramanya, Seattle, Wash. (US); James G. Droppo, Duvall, Wash. (US); and Zicheng Liu, Bellevue, Wash. (US) | ||
| Assigned to Microsoft Corporation, Redmond, Wash. (US) | ||
| Filed on Feb. 04, 2005, as Appl. No. 11/050,936. | ||
| Prior Publication US 2006/0178880 A1, Aug. 10, 2006 | ||
| Int. Cl. G10L 21/02 (2006.01); G10L 15/20 (2006.01) | ||
| U.S. Cl. 704—226 [704/233] | 11 Claims |

| 1. A method of determining an estimate for a noise-reduced value representing a portion of a noise-reduced speech signal,
the method comprising:
generating frames of an alternative sensor signal using an alternative sensor other than an air conduction microphone;
generating frames of an air conduction microphone signal;
identifying frames of the alternative sensor signal that contain speech;
determining whether a frame of the alternative sensor signal that contains speech is corrupted by transient noise based in
part on a frame of the air conduction microphone signal, wherein the transient noise is detected more by the alternative sensor
than by the air conduction microphone by determining a value $F_t$ and comparing the value $F_t$ to a threshold value, where the value $F_t$ is determined as:
[equation image not recoverable from source] where K is the number of frequency components in the frequency domain values of the frame of the alternative sensor signal
$B_t$ and the frame of the air conduction microphone signal $Y_t$, H is a channel response for a path from a speaker to the alternative sensor, $\sigma_w^2$ is a variance for sensor noise of the alternative sensor, $\sigma_v^2$ is variance for ambient noise and $\sigma_H^2$ is the variance of a prior model for the channel response H; and
estimating the noise-reduced value based on the frame of the alternative sensor signal if the frame of the alternative sensor
signal is determined to not be corrupted by transient noise.
|
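
The transient-noise test in claim 1 can be illustrated with a minimal sketch. The claimed equation for the value Ft is not reproduced in this text, so the statistic below is an assumed stand-in and not the patented formula: it averages, over the K frequency components, the squared residual between the alternative sensor frame and the channel-filtered air conduction frame, normalized by the sensor and ambient noise variances. It omits the prior variance of H that the claim also names. The function names, the direction of the threshold comparison, and the sample values are hypothetical.

```python
from typing import Sequence


def residual_statistic(
    b: Sequence[complex],  # alternative sensor frame B_t, one value per frequency bin
    y: Sequence[complex],  # air conduction microphone frame Y_t, same length
    h: Sequence[complex],  # channel response H per bin
    var_w: float,          # variance of the alternative sensor noise
    var_v: float,          # variance of the ambient noise
) -> float:
    """Assumed stand-in for F_t; NOT the formula claimed in the patent."""
    if not (len(b) == len(y) == len(h)) or len(b) == 0:
        raise ValueError("b, y and h must be non-empty and of equal length")
    total = 0.0
    for b_k, y_k, h_k in zip(b, y, h):
        # Energy in the alternative sensor not explained by the filtered air signal.
        residual = abs(b_k - h_k * y_k) ** 2
        # Expected residual energy when only sensor and ambient noise are present.
        scale = var_w + (abs(h_k) ** 2) * var_v
        total += residual / scale
    return total / len(b)  # average over the K frequency components


def is_transient_corrupted(b, y, h, var_w, var_v, threshold: float) -> bool:
    """Flag the frame when the statistic exceeds the threshold (assumed direction)."""
    return residual_statistic(b, y, h, var_w, var_v) > threshold


if __name__ == "__main__":
    h = [1 + 0j, 1 + 0j]
    y = [1 + 0j, 2 + 0j]
    clean = [1 + 0j, 2 + 0j]   # matches h * y exactly
    clack = [5 + 0j, 2 + 0j]   # extra energy seen only by the alternative sensor
    print(is_transient_corrupted(clean, y, h, 1.0, 1.0, threshold=1.0))
    print(is_transient_corrupted(clack, y, h, 1.0, 1.0, threshold=1.0))
```

In this sketch a frame that fails the test would be excluded from the noise-reduced estimate, mirroring the final step of the claim, which estimates the value from the alternative sensor frame only when that frame is determined not to be corrupted.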