US 11,705,109 B2
Detection of live speech
John Paul Lesso, Edinburgh (GB); and Toru Ido, Tokyo (JP)
Assigned to Cirrus Logic, Inc., Austin, TX (US)
Filed by Cirrus Logic International Semiconductor Ltd., Edinburgh (GB)
Filed on Nov. 6, 2020, as Appl. No. 17/91,316.
Claims priority of provisional application 62/938,377, filed on Nov. 21, 2019.
Prior Publication US 2021/0158797 A1, May 27, 2021
Int. Cl. G10L 15/06 (2013.01); G10L 19/26 (2013.01); G10L 25/78 (2013.01); G10L 25/93 (2013.01)
CPC G10L 15/06 (2013.01) [G10L 19/26 (2013.01); G10L 25/78 (2013.01); G10L 2025/937 (2013.01)] 21 Claims
 
1. A method of detecting live speech, the method comprising:
receiving a signal containing speech;
obtaining a first component of the received signal in a first frequency band, wherein the first frequency band includes audio frequencies;
obtaining a second component of the received signal in a second frequency band higher than said first frequency band;
detecting modulation of the first component of the received signal;
detecting modulation of the second component of the received signal;
comparing the modulation of the first component of the received signal and the modulation of the second component of the received signal, wherein comparing the modulation of the first component of the received signal and the modulation of the second component of the received signal comprises:
obtaining a first parameter relating to an amount of modulation of the first component of the received signal; and
obtaining a second parameter relating to an amount of modulation of the second component of the received signal; and
determining whether the speech is live speech, depending on a result of comparing the modulation of the first component of the received signal and the modulation of the second component of the received signal, wherein determining that the speech may not be live speech if the modulation of the first component of the received signal differs from the modulation of the second component of the received signal comprises determining that the speech may not be live speech if the first parameter exceeds a first threshold, and the second parameter does not exceed a second threshold.
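The steps of claim 1 can be illustrated with a toy sketch. The claim does not specify how the bands are separated, how modulation is measured, or what the thresholds are, so every concrete choice below is an assumption made only for illustration and is not Cirrus Logic's implementation. The assumptions are: a 48 kHz sample rate; a one-pole low-pass at a hypothetical 2 kHz split frequency as the "first component", with its complement as the "second component"; an envelope formed by rectifying each component and smoothing it at 20 Hz; the coefficient of variation (std/mean) of that envelope as the "parameter relating to an amount of modulation"; and hypothetical thresholds of 0.3. The names `FIRST_THRESHOLD`, `SECOND_THRESHOLD`, `split_bands`, `modulation_amount`, `is_live_speech` and `synthetic_signal` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# All values below are illustrative assumptions; the claim fixes none of them.
FS = 48_000             # sample rate in Hz (assumed)
SPLIT_HZ = 2_000.0      # boundary between first and second band (assumed)
ENVELOPE_HZ = 20.0      # envelope smoothing cut-off; passes ~4 Hz modulation (assumed)
FIRST_THRESHOLD = 0.3   # threshold on the first-band parameter (assumed)
SECOND_THRESHOLD = 0.3  # threshold on the second-band parameter (assumed)


def one_pole_lowpass(x: Sequence[float], cutoff_hz: float, fs: int) -> List[float]:
    """First-order IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y: List[float] = []
    acc = 0.0
    for sample in x:
        acc += a * (sample - acc)
        y.append(acc)
    return y


def split_bands(x: Sequence[float], fs: int = FS) -> Tuple[List[float], List[float]]:
    """Return (first component, second component).

    The first component is the one-pole low-pass output; the second is the
    residual x - low, i.e. a complementary first-order high-pass.
    """
    low = one_pole_lowpass(x, SPLIT_HZ, fs)
    high = [s - l for s, l in zip(x, low)]
    return low, high


def modulation_amount(band: Sequence[float], fs: int = FS, skip_s: float = 0.25) -> float:
    """Hypothetical modulation parameter: std/mean of the smoothed rectified band.

    The first skip_s seconds are discarded to avoid the filters' start-up transient.
    """
    env = one_pole_lowpass([abs(s) for s in band], ENVELOPE_HZ, fs)
    env = env[int(skip_s * fs):]
    mean = sum(env) / len(env)
    if mean <= 1e-12:
        return 0.0  # silent band: treat as unmodulated
    var = sum((e - mean) ** 2 for e in env) / len(env)
    return math.sqrt(var) / mean


def is_live_speech(x: Sequence[float], fs: int = FS) -> Tuple[bool, float, float]:
    """Return (live, p1, p2) using the decision rule stated in claim 1."""
    low, high = split_bands(x, fs)
    p1 = modulation_amount(low, fs)   # first parameter
    p2 = modulation_amount(high, fs)  # second parameter
    # Speech may not be live if p1 exceeds the first threshold while
    # p2 does not exceed the second threshold.
    maybe_not_live = p1 > FIRST_THRESHOLD and p2 <= SECOND_THRESHOLD
    return (not maybe_not_live), p1, p2


def synthetic_signal(live: bool, fs: int = FS, seconds: float = 1.0) -> List[float]:
    """Toy test input: a 200 Hz tone plus a weaker 20 kHz tone.

    The 200 Hz tone is always amplitude-modulated at 4 Hz. The 20 kHz tone
    shares that modulation when live=True and is steady when live=False.
    """
    out: List[float] = []
    for n in range(int(fs * seconds)):
        t = n / fs
        m = 0.5 * (1.0 + math.sin(2.0 * math.pi * 4.0 * t))
        hi_gain = m if live else 1.0
        out.append(m * math.sin(2.0 * math.pi * 200.0 * t)
                   + 0.2 * hi_gain * math.sin(2.0 * math.pi * 20_000.0 * t))
    return out
```

For example, `is_live_speech(synthetic_signal(live=True))` reports both bands as modulated and returns `True` in the first position, while the `live=False` input has a strongly modulated first band and an almost steady second band, so it is flagged as possibly not live. A first-order split is used only to keep the sketch short and dependency-free; a practical detector would presumably need much sharper band-splitting filters and thresholds tuned on real recordings.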