CPC G10L 15/26 (2013.01) [G10L 15/02 (2013.01); G10L 15/04 (2013.01); G10L 17/00 (2013.01); G10L 25/30 (2013.01); G10L 25/78 (2013.01); G10L 2025/783 (2013.01)] | 30 Claims |
1. A computer-program product embodied in a non-transitory machine-readable storage medium storing computer instructions that, when executed by one or more processors, perform operations comprising:
distributing a plurality of audio data files of a speech data corpus to a plurality of computing nodes that each implement a plurality of audio processing threads;
executing the plurality of audio processing threads associated with each of the plurality of computing nodes to detect a plurality of tentative speakers participating in each of the plurality of audio data files, wherein each of the plurality of audio processing threads:
partitions a target audio data file of the plurality of audio data files into a plurality of audio data segments,
computes a plurality of embedding values based on the plurality of audio data segments,
generates, via a clustering algorithm, a plurality of clusters of audio data segments based on the plurality of embedding values, and
computes a plurality of embedding signatures for the plurality of tentative speakers participating in the target audio data file based on the plurality of clusters of audio data segments;
generating, via the clustering algorithm, a plurality of clusters of embedding signatures based on the plurality of embedding signatures associated with the plurality of tentative speakers in each of the plurality of audio data files; and
detecting a plurality of global speakers associated with the speech data corpus based on the plurality of clusters of embedding signatures.
|
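The two-level flow recited in claim 1 (per-file clustering of segment embeddings into tentative-speaker signatures, then corpus-wide clustering of those signatures into global speakers) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the claim: `embed` is a toy stand-in (the segment mean) for a real speaker-embedding model, `cluster` is a simple greedy distance-threshold method chosen only because the claim does not name a clustering algorithm, the signature is taken to be the cluster centroid, and a single local thread pool stands in for the plurality of computing nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(samples, seg_len):
    # Split one audio file (a list of samples) into fixed-length segments.
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]


def embed(segment):
    # Hypothetical embedding: a 1-D vector holding the segment mean.
    # A real system would use a trained speaker-embedding model here.
    return (sum(segment) / len(segment),)


def centroid(vectors):
    # Component-wise mean of a non-empty list of equal-length vectors.
    n = len(vectors)
    return tuple(sum(v[d] for v in vectors) / n for d in range(len(vectors[0])))


def distance(a, b):
    # Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cluster(vectors, threshold):
    # Greedy threshold clustering (an assumed, illustrative algorithm):
    # each vector joins the nearest cluster whose centroid lies within
    # `threshold`, otherwise it starts a new cluster.
    clusters = []
    for v in vectors:
        best = None
        for c in clusters:
            d = distance(v, centroid(c))
            if d <= threshold and (best is None or d < best[0]):
                best = (d, c)
        if best is None:
            clusters.append([v])
        else:
            best[1].append(v)
    return clusters


def process_file(samples, seg_len=4, threshold=1.0):
    # Work done by one audio processing thread: partition, embed,
    # cluster, and return one signature per tentative speaker.
    segments = partition(samples, seg_len)
    embeddings = [embed(s) for s in segments]
    return [centroid(c) for c in cluster(embeddings, threshold)]


def detect_global_speakers(corpus, workers=2, threshold=1.0):
    # Fan the files out to worker threads, then cluster the pooled
    # signatures with the same algorithm to obtain global speakers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_file = list(pool.map(process_file, corpus))
    signatures = [sig for sigs in per_file for sig in sigs]
    return [centroid(c) for c in cluster(signatures, threshold)]


# Toy corpus: file_1 contains speakers "1.0" and "5.0",
# file_2 contains speakers "5.0" and "9.0".
file_1 = [1.0] * 8 + [5.0] * 8
file_2 = [5.0] * 8 + [9.0] * 4
global_speakers = detect_global_speakers([file_1, file_2])
```

With this toy corpus each file yields two tentative speakers, and the second-level clustering merges the signature shared by both files, so `global_speakers` is `[(1.0,), (5.0,), (9.0,)]`. Reusing `cluster` at both levels mirrors the claim's wording that the clusters of embedding signatures are generated "via the clustering algorithm".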