US 11,705,219 B2
Deep learning-based variant classifier
Ole Schulz-Trieglaff, Cambridge (GB); Anthony James Cox, Cambridge (GB); and Kai-How Farh, San Mateo, CA (US)
Assigned to Illumina, Inc., San Diego, CA (US); and Illumina Cambridge Limited, Cambridge (GB)
Filed by Illumina, Inc., San Diego, CA (US); and Illumina Cambridge Limited, Cambridge (GB)
Filed on Jan. 14, 2019, as Appl. No. 16/247,487.
Claims priority of provisional application 62/617,552, filed on Jan. 15, 2018.
Prior Publication US 2019/0220704 A1, Jul. 18, 2019
Int. Cl. G16B 40/20 (2019.01); G16B 20/20 (2019.01); G06F 18/214 (2023.01); G06F 18/2431 (2023.01); G06N 3/045 (2023.01); G16B 40/00 (2019.01); G16B 20/00 (2019.01); G06F 9/38 (2018.01); G06N 3/04 (2023.01); G06N 3/084 (2023.01)

CPC G16B 40/20 (2019.02) [G06F 9/3877 (2013.01); G06F 18/2148 (2023.01); G06F 18/2431 (2023.01); G06N 3/04 (2013.01); G06N 3/045 (2023.01); G06N 3/084 (2013.01); G16B 20/00 (2019.02); G16B 20/20 (2019.02); G16B 40/00 (2019.02)]

20 Claims

1. A system for a trained variant classifier, the system including:

numerous processors operating in parallel and coupled to memory;

a convolutional neural network running on the numerous processors, trained on at least 50000 training examples of groups of reads spanning candidate variant sites labeled with true variant classifications of the groups of reads using a backpropagation-based gradient update technique that progressively matches outputs of the convolutional neural network with corresponding ground truth labels;

wherein each of the at least 50000 training examples used in the training includes a group of reads aligned to a reference read, each of the reads including a target base position flanked by or padded to at least 110 bases on each side, each of the at least 110 bases in the reads accompanied by

a corresponding reference base in the reference read,

a base call accuracy score of reading the base,

a strandedness of reading the base,

insertion count of changes adjoining a position of the base, and

deletion flag at the position of the base;

an input module of the convolutional neural network which runs on at least one of the numerous processors and feeds the group of reads for evaluation of the target base position; and

an output module of the convolutional neural network which runs on at least one of the numerous processors and translates analysis by the convolutional neural network into classification scores for likelihood that each candidate variant at the target base position is a true variant or a false variant.
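
As an illustration only, the per-base annotations recited in claim 1 can be pictured as channels of an input array: one row per position in a window of 110 bases on each side of the target base position, and one array per read in the group. The following is a minimal sketch under stated assumptions, not the patented implementation. The channel order, the one-hot encoding of bases, the use of "N" as padding, the helper names (pad_to_window, encode_read, encode_group, softmax), and the two-class softmax standing in for the output module are all hypothetical choices made for this example.

```python
import numpy as np

FLANK = 110                # bases on each side of the target position (from claim 1)
WINDOW = 2 * FLANK + 1     # 221 positions per read
BASES = "ACGT"
# Assumed channel layout (not specified by the claim):
# 0-3 read base one-hot, 4-7 reference base one-hot, 8 base call accuracy score,
# 9 strandedness, 10 insertion count, 11 deletion flag
N_CHANNELS = 12


def pad_to_window(seq, target_idx, pad="N"):
    """Center seq on target_idx and pad both sides to FLANK bases (assumed padding symbol)."""
    left = seq[max(0, target_idx - FLANK):target_idx]
    right = seq[target_idx + 1:target_idx + 1 + FLANK]
    return pad * (FLANK - len(left)) + left + seq[target_idx] + right + pad * (FLANK - len(right))


def one_hot(base):
    """One-hot encode A/C/G/T; any other symbol (e.g. padding) maps to all zeros."""
    vec = np.zeros(4, dtype=np.float32)
    idx = BASES.find(base)
    if idx >= 0:
        vec[idx] = 1.0
    return vec


def encode_read(read_bases, ref_bases, quals, is_reverse, ins_counts, del_flags):
    """Encode one aligned read as a (WINDOW, N_CHANNELS) array."""
    assert len(read_bases) == len(ref_bases) == WINDOW
    out = np.zeros((WINDOW, N_CHANNELS), dtype=np.float32)
    for i in range(WINDOW):
        out[i, 0:4] = one_hot(read_bases[i])     # base in the read
        out[i, 4:8] = one_hot(ref_bases[i])      # corresponding reference base
        out[i, 8] = quals[i]                     # base call accuracy score
        out[i, 9] = 1.0 if is_reverse else 0.0   # strandedness
        out[i, 10] = ins_counts[i]               # insertion count adjoining the position
        out[i, 11] = del_flags[i]                # deletion flag at the position
    return out


def encode_group(reads):
    """Stack a group of reads into a (n_reads, WINDOW, N_CHANNELS) array."""
    return np.stack([encode_read(**r) for r in reads])


def softmax(logits):
    """Turn raw network outputs into scores that sum to 1 (stand-in for the output module)."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

In this sketch, a group of reads covering one candidate variant site becomes a three-dimensional array that a convolutional network could consume, and softmax over two hypothetical logits yields scores for the true variant and false variant classes.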