US 11,810,435 B2
System and method for audio event detection in surveillance systems
Asif Salekin, Charlottesville, VA (US); Zhe Feng, Mountain View, CA (US); and Shabnam Ghaffarzadegan, San Mateo, CA (US)
Assigned to Robert Bosch GmbH, Stuttgart (DE)
Appl. No. 16/976,462
Filed by Robert Bosch GmbH, Stuttgart (DE)
PCT Filed Feb. 20, 2019, PCT No. PCT/EP2019/054196
§ 371(c)(1), (2) Date Aug. 27, 2020,
PCT Pub. No. WO2019/166296, PCT Pub. Date Sep. 6, 2019.
Claims priority of provisional application 62/636,185, filed on Feb. 28, 2018.
Prior Publication US 2021/0005067 A1, Jan. 7, 2021
Int. Cl. G08B 13/16 (2006.01); G06F 16/683 (2019.01); G06N 3/049 (2023.01); G10L 15/26 (2006.01); G10L 25/30 (2013.01); G06N 3/045 (2023.01); G10L 15/16 (2006.01); G10L 25/51 (2013.01); G10L 25/78 (2013.01); G10L 25/81 (2013.01); G10L 25/84 (2013.01); G10L 15/08 (2006.01)
CPC G08B 13/1672 (2013.01) [G06F 16/683 (2019.01); G06N 3/045 (2023.01); G06N 3/049 (2013.01); G10L 15/16 (2013.01); G10L 15/26 (2013.01); G10L 25/30 (2013.01); G10L 25/51 (2013.01); G10L 25/78 (2013.01); G10L 25/81 (2013.01); G10L 25/84 (2013.01); G10L 2015/088 (2013.01)]
20 Claims
 
1. A method for detecting and localizing a target audio event in an audio clip, the method comprising:
receiving, with a processor, an audio clip;
determining, with the processor, a plurality of audio features based on the audio clip;
determining, with the processor, whether the target audio event is present in the audio clip using a first neural network based on the plurality of audio features;
determining, with the processor, in response to determining that the target audio event is present in the audio clip, a plurality of vectors based on (i) the plurality of audio features and (ii) the target audio event, the vectors in the plurality of vectors indicating a correlation between audio features in the plurality of audio features and the target audio event; and
determining, with the processor, a position in time of the target audio event within the audio clip using a second neural network based on the plurality of vectors.
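The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the log-spectrum features, the untrained `TinyNet` class, the element-wise weighting used to form the per-frame vectors, the peak-frame rule for the position in time, and every name and parameter value below are assumptions chosen only to make the data flow concrete.

```python
import numpy as np

FRAME_LEN, HOP, N_BINS = 400, 160, 32  # hypothetical framing parameters


def audio_features(clip):
    """Return one log-magnitude spectrum per frame, shape (n_frames, N_BINS)."""
    n_frames = 1 + (len(clip) - FRAME_LEN) // HOP
    frames = np.stack([clip[i * HOP: i * HOP + FRAME_LEN] for i in range(n_frames)])
    spectra = np.abs(np.fft.rfft(frames * np.hanning(FRAME_LEN), axis=1))
    return np.log1p(spectra[:, :N_BINS])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class TinyNet:
    """Untrained one-hidden-layer network (assumed stand-in for either network)."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.w2 = rng.normal(0.0, 0.1, n_hidden)

    def hidden(self, x):
        return np.tanh(x @ self.w1)

    def score(self, h):
        return sigmoid(h @ self.w2)


def detect_and_localize(clip, sample_rate, detector, localizer,
                        event_embedding, threshold=0.5):
    """Return the estimated event time in seconds, or None if no event is detected."""
    feats = audio_features(clip)

    # First network: clip-level presence score from time-averaged hidden units.
    presence = float(detector.score(detector.hidden(feats).mean(axis=0)))
    if presence < threshold:
        return None

    # Per-frame vectors relating features to the target event. Assumption:
    # element-wise weighting by a hypothetical event embedding.
    vectors = feats * event_embedding

    # Second network: one score per frame; take the centre of the peak frame.
    frame_scores = localizer.score(localizer.hidden(vectors))
    peak = int(np.argmax(frame_scores))
    return (peak * HOP + FRAME_LEN / 2) / sample_rate


# Hypothetical setup: random weights and one second of noise sampled at 16 kHz.
rng = np.random.default_rng(0)
detector = TinyNet(N_BINS, 16, rng)
localizer = TinyNet(N_BINS, 16, rng)
event_embedding = rng.normal(0.0, 1.0, N_BINS)
clip = rng.normal(0.0, 1.0, 16000)
```

Calling `detect_and_localize(clip, 16000, detector, localizer, event_embedding)` returns either `None` or a time in seconds. Because the weights here are random, the result carries no meaning; the sketch only shows that localization runs conditionally, after the first network reports that the event is present.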