CPC G08B 13/1672 (2013.01) [G06F 16/683 (2019.01); G06N 3/045 (2023.01); G06N 3/049 (2013.01); G10L 15/16 (2013.01); G10L 15/26 (2013.01); G10L 25/30 (2013.01); G10L 25/51 (2013.01); G10L 25/78 (2013.01); G10L 25/81 (2013.01); G10L 25/84 (2013.01); G10L 2015/088 (2013.01)] | 20 Claims |
1. A method for detecting and localizing a target audio event in an audio clip, the method comprising:
receiving, with a processor, an audio clip;
determining, with the processor, a plurality of audio features based on the audio clip;
determining, with the processor, whether the target audio event is present in the audio clip using a first neural network based on the plurality of audio features;
determining, with the processor, in response to determining that the target audio event is present in the audio clip, a plurality of vectors based on (i) the plurality of audio features and (ii) the target audio event, the vectors in the plurality of vectors indicating a correlation between audio features in the plurality of audio features and the target audio event; and
determining, with the processor, a position in time of the target audio event within the audio clip using a second neural network based on the plurality of vectors.
|
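The two-stage flow recited in claim 1 (feature extraction, a presence check, correlation vectors, then localization in time) can be illustrated with a minimal sketch. This is not the patented implementation. Everything in it is an assumption made for illustration: the frame size and sample rate, the choice of energy and zero-crossing rate as audio features, the hand-set weights, the `event_embedding` parameter, and all function names. Each "neural network" is reduced to a single logistic unit with fixed weights so that the example stays self-contained; a real system would presumably use trained networks.

```python
import math

FRAME = 100   # samples per frame (assumed for illustration)
RATE = 1000   # sample rate in Hz (assumed for illustration)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def audio_features(clip):
    """Split the clip into frames; return [energy, zero-crossing rate] per frame."""
    feats = []
    for i in range(0, len(clip) - FRAME + 1, FRAME):
        frame = clip[i:i + FRAME]
        energy = sum(s * s for s in frame) / FRAME
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / (FRAME - 1)
        feats.append([energy, zcr])
    return feats


def event_present(feats, w=(40.0, 0.0), b=-2.0):
    """Stage 1 stand-in: one logistic unit on max-pooled features (hand-set weights)."""
    pooled = [max(f[k] for f in feats) for k in range(2)]
    return sigmoid(sum(wi * xi for wi, xi in zip(w, pooled)) + b) > 0.5


def correlation_vectors(feats, event_embedding):
    """Per-frame vectors: element-wise product of features and a hypothetical event embedding."""
    return [[f * e for f, e in zip(frame, event_embedding)] for frame in feats]


def localize(vectors, w=(40.0, 0.0), b=-2.0):
    """Stage 2 stand-in: score each frame, return (start_s, end_s) of the active span."""
    active = [
        i for i, v in enumerate(vectors)
        if sigmoid(sum(wi * xi for wi, xi in zip(w, v)) + b) > 0.5
    ]
    if not active:
        return None
    return (active[0] * FRAME / RATE, (active[-1] + 1) * FRAME / RATE)


def detect_and_localize(clip, event_embedding=(1.0, 1.0)):
    """Run localization only when stage 1 reports that the event is present."""
    feats = audio_features(clip)
    if not event_present(feats):
        return None
    return localize(correlation_vectors(feats, event_embedding))


# Toy clip: 0.3 s of silence, 0.3 s of a loud alternating signal, 0.4 s of silence.
clip = [0.0] * 300 + [0.5 if i % 2 == 0 else -0.5 for i in range(300)] + [0.0] * 400
result = detect_and_localize(clip)
```

With the assumed weights, `result` for this toy clip is `(0.3, 0.6)`, and an all-silent clip returns `None` because the second stage is skipped when the first stage reports no event. Gating the localization step on the presence check mirrors the "in response to determining that the target audio event is present" condition in the claim.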