US 11,810,649 B2
Methods for identifying novel gene editing elements
Feng Zhang, Cambridge, MA (US); and David Arthur Scott, Cambridge, MA (US)
Assigned to THE BROAD INSTITUTE, INC., Cambridge, MA (US); and MASSACHUSETTS INSTITUTE OF TECHNOLOGY, Cambridge, MA (US)
Filed by THE BROAD INSTITUTE, INC., Cambridge, MA (US); and MASSACHUSETTS INSTITUTE OF TECHNOLOGY, Cambridge, MA (US)
Filed on Aug. 17, 2017, as Appl. No. 15/679,619.
Claims priority of provisional application 62/376,383, filed on Aug. 17, 2016.
Prior Publication US 2018/0068062 A1, Mar. 8, 2018
Int. Cl. G16B 40/00 (2019.01); G16B 20/00 (2019.01); G16B 20/50 (2019.01); G16B 40/30 (2019.01); G16B 20/30 (2019.01); G06F 18/20 (2023.01); G06F 18/231 (2023.01); G06F 18/2413 (2023.01); G06N 3/088 (2023.01); G06N 7/01 (2023.01)
CPC G16B 40/00 (2019.02) [G06F 18/231 (2023.01); G06F 18/2413 (2023.01); G06F 18/295 (2023.01); G16B 20/00 (2019.02); G16B 20/30 (2019.02); G16B 20/50 (2019.02); G16B 40/30 (2019.02); G06N 3/088 (2013.01); G06N 7/01 (2023.01)]
10 Claims
 
1. A method to identify novel CRISPR effector elements, comprising:
training, by a processor, an unsupervised neural network using a set of known CRISPR locus elements that include spacer and repeat elements of the known CRISPR locus, wherein the trained unsupervised neural network comprises hierarchical clustering;
generating, by the processor, and using the trained unsupervised neural network, a preliminary set of CRISPR locus classes by separating a set of known CRISPR loci based, at least in part, on a sequence similarity and/or domain similarity between one or more protein elements of the CRISPR loci;
generating, by the processor, a distance matrix data structure based, at least in part, on cumulative similarities between constituent proteins in each CRISPR locus class, wherein the putative CRISPR loci are classified by applying the hierarchical clustering to the distance matrix;
classifying, by the processor, a CRISPR locus using the unsupervised neural network applied to all or a subset of the known CRISPR locus elements as an initial set of inputs; and
identifying, by the processor, putative novel effector elements in the CRISPR locus, and optionally screening each identified putative novel effector element for one or more biological functions.
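
The distance-matrix and hierarchical-clustering steps recited above can be illustrated with a minimal sketch. It is not the patented implementation and it omits the neural-network training entirely. Everything named here is an assumption made for illustration: each locus is modeled as a list of proteins, each protein as a set of placeholder domain labels (`dom1`, `dom2`, ...), protein similarity is approximated by Jaccard overlap of those labels rather than by sequence alignment, the cumulative similarity is a symmetrized sum of best matches, and the clustering uses average linkage with a hypothetical cutoff of 0.5.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of domain labels."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def locus_distance(locus_a, locus_b):
    """1 minus the normalized cumulative best-match protein similarity.

    Assumes each locus contains at least one protein.
    """
    def one_way(src, dst):
        # For every protein in src, take its best match in dst and sum.
        return sum(max(jaccard(p, q) for q in dst) for p in src)

    total = one_way(locus_a, locus_b) + one_way(locus_b, locus_a)
    return 1.0 - total / (len(locus_a) + len(locus_b))


def distance_matrix(loci):
    """Return sorted locus names and a symmetric dict-of-pairs distance matrix."""
    names = sorted(loci)
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = locus_distance(loci[a], loci[b])
    return names, dist


def hierarchical_clusters(names, dist, cutoff):
    """Average-linkage agglomerative clustering, stopping above `cutoff`."""
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        return sum(dist[(x, y)] for x in c1 for y in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > cutoff:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy input with placeholder labels; these are not real locus annotations.
loci = {
    "locus_A": [{"dom1", "dom2"}, {"dom3"}, {"dom4"}],
    "locus_B": [{"dom1", "dom2"}, {"dom3"}, {"dom4"}, {"dom5"}],
    "locus_C": [{"dom6"}, {"dom3"}],
    "locus_D": [{"dom6"}],
}

names, dist = distance_matrix(loci)
classes = hierarchical_clusters(names, dist, cutoff=0.5)
print(classes)
```

On this toy input the two loci that share three proteins merge first, the two loci that share `dom6` merge next, and the remaining pair of clusters stays apart because their average distance exceeds the cutoff. A real pipeline would presumably replace the Jaccard measure with alignment-based or profile-based protein similarity.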