US 11,756,555 B2
Biometric authentication through voice print categorization using artificial intelligence
Natan Katz, Tel Aviv (IL); and Tal Haguel, Petach Tikva (IL)
Assigned to NICE LTD., Ra'anana (IL)
Filed by NICE LTD., Ra'anana (IL)
Filed on May 6, 2021, as Appl. No. 17/313,040.
Prior Publication US 2022/0358933 A1, Nov. 10, 2022
Int. Cl. G10L 17/06 (2013.01); G10L 17/04 (2013.01); G06N 3/04 (2023.01); G10L 19/00 (2013.01); G10L 17/18 (2013.01)
CPC G10L 17/06 (2013.01) [G06N 3/04 (2013.01); G10L 17/04 (2013.01); G10L 17/18 (2013.01); G10L 19/00 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A biometric authentication system configured to categorize voice prints during a voice authentication, the biometric authentication system comprising:
a processor and a computer readable medium operably coupled thereto, the computer readable medium comprising a plurality of instructions stored in association therewith that are accessible to, and executable by, the processor, to perform voice authentication operations which comprise:
receiving an enrollment of a user in the biometric authentication system;
requesting a first voice print comprising a sample of a voice of the user;
receiving the first voice print of the user during the enrollment;
accessing a plurality of categorizations of the voice prints for the voice authentication, wherein each of the plurality of categorizations comprises a portion of the voice prints based on a plurality of similarity scores of distinct voice prints in the portion to a plurality of other voice prints;
processing the first voice print using a neural network comprising an input layer for features from the voice print, a plurality of hidden layers, and an output layer for a classification of the first voice print;
determining, using a hidden layer of the plurality of hidden layers of the neural network, an embedding of the features for the first voice print, wherein the embedding determined using the hidden layer is separate from the classification provided by the output layer;
calculating a first similarity score between the embedding of the features for the first voice print and a plurality of embeddings of the voice prints for other users, wherein the first similarity score is unique for the user and distinct from instances of similarity scores between each of the plurality of embeddings for each of the voice prints;
determining one of the plurality of categorizations of the voice prints for the first voice print based on the first similarity score; and
encoding the first voice print with a value identifying the one of the plurality of categorizations for the embedding.
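The claim above describes extracting an embedding from a hidden layer (distinct from the output-layer classification), scoring it against embeddings of other users' voice prints, and tagging the stored print with its category. A minimal sketch of that flow, assuming a toy two-layer NumPy network, random weights, cosine similarity, and hypothetical pre-enrolled category data (none of these specifics come from the patent itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network: 16 input features -> 8 hidden units -> 4 output classes.
# Per the claim, the embedding is taken from the hidden layer and is
# separate from the classification produced by the output layer.
W1 = rng.normal(size=(16, 8))
W2 = rng.normal(size=(8, 4))

def embed(features: np.ndarray) -> np.ndarray:
    """Hidden-layer activation used as the voice-print embedding."""
    return np.tanh(features @ W1)

def classify(features: np.ndarray) -> int:
    """Output-layer classification (not used for categorization)."""
    return int(np.argmax(embed(features) @ W2))

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity score between two embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical enrollment store: embeddings of other users' voice
# prints, already grouped into categories by mutual similarity.
enrolled = {cat: [embed(rng.normal(size=16)) for _ in range(3)]
            for cat in range(3)}

def categorize(first_print: np.ndarray) -> tuple[int, float]:
    """Score the new print's embedding against each category's
    embeddings and return the best category with its score."""
    e = embed(first_print)
    scores = {cat: float(np.mean([cosine(e, other) for other in others]))
              for cat, others in enrolled.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Enrollment of a new user: determine the categorization and encode
# the category value alongside the stored voice print.
new_print = rng.normal(size=16)
category, score = categorize(new_print)
record = {"voice_print": new_print, "category": category, "score": score}
```

In a production system the toy MLP would be replaced by a trained speaker-verification network (the patent recites only input, hidden, and output layers generically), and the per-category score here is a simple mean of pairwise cosine similarities, one plausible reading of scoring "a portion of the voice prints" against "a plurality of other voice prints."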