US 11,810,572 B2
Multi-threaded speaker identification
Xiaozhuo Cheng, Cary, NC (US); Xiaolong Li, Cary, NC (US); and Xu Yang, Cary, NC (US)
Assigned to SAS INSTITUTE INC., Cary, NC (US)
Filed by SAS INSTITUTE INC., Cary, NC (US)
Filed on Jun. 8, 2023, as Appl. No. 18/207,433.
Application 18/207,433 is a continuation in part of application No. 17/994,554, filed on Nov. 28, 2022.
Application 17/994,554 is a continuation of application No. 17/993,385, filed on Nov. 23, 2022.
Application 17/993,385 is a continuation in part of application No. 17/851,264, filed on Jun. 28, 2022, granted, now 11,538,481, issued on Dec. 27, 2022.
Application 17/851,264 is a continuation in part of application No. 17/498,811, filed on Oct. 12, 2021, granted, now 11,373,655, issued on Jun. 28, 2022.
Application 17/498,811 is a continuation in part of application No. 17/370,441, filed on Jul. 8, 2021, granted, now 11,404,053, issued on Aug. 2, 2022.
Application 17/370,441 is a continuation of application No. PCT/CN2021/082572, filed on Mar. 24, 2021.
Application 17/498,811 is a continuation in part of application No. 17/205,871, filed on Mar. 18, 2021, granted, now 11,145,309, issued on Oct. 12, 2021.
Application 17/205,871 is a continuation in part of application No. 17/138,521, filed on Dec. 30, 2020, granted, now 11,049,502, issued on Jun. 29, 2021.
Application 17/138,521 is a continuation of application No. 17/138,445, filed on Dec. 30, 2020, granted, now 11,138,979, issued on Oct. 5, 2021.
Claims priority of provisional application 63/451,892, filed on Mar. 13, 2023.
Claims priority of provisional application 62/991,275, filed on Mar. 18, 2020.
Claims priority of provisional application 63/288,385, filed on Dec. 10, 2021.
Claims priority of provisional application 63/297,002, filed on Jan. 6, 2022.
Prior Publication US 2023/0317083 A1, Oct. 5, 2023
Int. Cl. G10L 17/00 (2013.01); G10L 15/16 (2006.01); G10L 15/26 (2006.01); G10L 15/04 (2013.01); G10L 25/78 (2013.01); G10L 25/30 (2013.01); G10L 15/02 (2006.01)

CPC G10L 15/26 (2013.01) [G10L 15/02 (2013.01); G10L 15/04 (2013.01); G10L 17/00 (2013.01); G10L 25/30 (2013.01); G10L 25/78 (2013.01); G10L 2025/783 (2013.01)]

30 Claims

1. A computer-program product embodied in a non-transitory machine-readable storage medium storing computer instructions that, when executed by one or more processors, perform operations comprising:

distributing a plurality of audio data files of a speech data corpus to a plurality of computing nodes that each implement a plurality of audio processing threads;

executing the plurality of audio processing threads associated with each of the plurality of computing nodes to detect a plurality of tentative speakers participating in each of the plurality of audio data files, wherein each of the plurality of audio processing threads:

partitions a target audio data file of the plurality of audio data files into a plurality of audio data segments,

computes a plurality of embedding values based on the plurality of audio data segments,

generates, via a clustering algorithm, a plurality of clusters of audio data segments based on the plurality of embedding values, and

computes a plurality of embedding signatures for the plurality of tentative speakers participating in the target audio data file based on the plurality of clusters of audio data segments;

generating, via the clustering algorithm, a plurality of clusters of embedding signatures based on the plurality of embedding signatures associated with the plurality of tentative speakers in each of the plurality of audio data files; and

detecting a plurality of global speakers associated with the speech data corpus based on the plurality of clusters of embedding signatures.
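
The two-level flow recited in claim 1 (per-file clustering of segment embeddings, then corpus-level clustering of the resulting signatures) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the function names, the fixed-length segmentation, the toy two-value embedding (a stand-in for a real speaker-embedding model), the greedy distance-threshold clustering (the claim does not name a clustering algorithm), and the use of a cluster centroid as the embedding signature. Computing nodes are simulated as lists of files processed one node after another, each with its own thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist


def partition(samples, seg_len):
    """Split one audio file (a list of floats) into fixed-length segments."""
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]


def embed(segment):
    """Toy 2-D embedding (mean, mean absolute value); placeholder for a real model."""
    n = len(segment)
    return (sum(segment) / n, sum(abs(x) for x in segment) / n)


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(v) / len(points) for v in zip(*points))


def cluster(points, threshold):
    """Greedy clustering (assumed): join the first cluster whose centroid is close."""
    clusters = []
    for p in points:
        for c in clusters:
            if dist(p, centroid(c)) <= threshold:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters


def signatures_for_file(samples, seg_len=4, threshold=0.5):
    """Work done by one audio processing thread on one target audio data file."""
    embeddings = [embed(s) for s in partition(samples, seg_len)]
    # One signature per tentative speaker: the centroid of that speaker's cluster.
    return [centroid(c) for c in cluster(embeddings, threshold)]


def detect_global_speakers(nodes, threads_per_node=2, threshold=0.5):
    """Cluster the per-file signatures from all nodes into global speakers."""
    all_signatures = []
    for files in nodes:  # each inner list stands in for one computing node
        with ThreadPoolExecutor(max_workers=threads_per_node) as pool:
            for sigs in pool.map(signatures_for_file, files):
                all_signatures.extend(sigs)
    # Second pass of the same clustering routine, now over signatures.
    return [centroid(c) for c in cluster(all_signatures, threshold)]


if __name__ == "__main__":
    # Synthetic "speakers": constant-valued samples that embed far apart.
    a, b, c = [1.0] * 4, [5.0] * 4, [-3.0] * 4
    nodes = [[a + b + a, b + c], [a + c + c]]
    print(detect_global_speakers(nodes))
```

In this synthetic corpus each file contains two tentative speakers, and the second clustering pass merges the six per-file signatures into three global speakers. A production system would replace `embed` with a trained speaker-embedding model and would distribute files across real machines rather than iterating over lists.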