US 9,813,467 B1
Real-time alignment and processing of incomplete stream of data
Ryan Barrett, San Francisco, CA (US); Taylor Sittler, San Francisco, CA (US); Krishna Pant, San Jose, CA (US); Zhenghua Li, San Jose, CA (US); Katsuya Noguchi, San Francisco, CA (US); and Nishant Bhat, San Francisco, CA (US)
Assigned to COLOR GENOMICS, INC., Burlingame, CA (US)
Filed by Ryan Barrett, San Francisco, CA (US); Taylor Sittler, San Francisco, CA (US); Krishna Pant, San Jose, CA (US); Zhenghua Li, San Jose, CA (US); Katsuya Noguchi, San Francisco, CA (US); and Nishant Bhat, San Francisco, CA (US)
Filed on Mar. 7, 2017, as Appl. No. 15/452,241.
Claims priority of provisional application 62/304,474, filed on Mar. 7, 2016.
Int. Cl. H04L 29/08 (2006.01); G06F 15/16 (2006.01); H04L 29/06 (2006.01)
CPC H04L 65/4069 (2013.01) [H04L 65/80 (2013.01)]
20 Claims
 
1. A method comprising:
receiving, at a system, a stream of data from a data source, the stream of data including a plurality of reads, wherein each of the plurality of reads corresponds to a client;
while receiving the stream of data and prior to having received all of the plurality of reads:
extracting each of a set of reads from the stream of data;
aligning each of the set of reads to a corresponding portion of a reference data set;
for each particular position of a plurality of particular positions of the reference data set:
identifying a subset of reads of the aligned set of reads, each read in the subset of reads including an identifier that is aligned to the particular position of the reference data set; and
generating a value of a client data set based on the subset of reads, the value corresponding to the particular position, wherein generating the value of the client data set includes generating a value of a client coverage data set corresponding to a proportion and/or quantity of the subset of reads that include an identifier that is aligned to the particular position of the reference data set;
generating at least one variable based on the values of the client data set;
determining, based on the at least one variable, that at least one condition is satisfied; and
in response to determining that the at least one condition is satisfied, routing data that corresponds to the client data set, wherein routing data that corresponds to the client data set includes availing at least part of the client data set to one or more processors for sparse indicator detection.
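
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `align`, `process_stream`, and `route`, the exact-substring aligner, the per-position read count used as the coverage value, and the mean-coverage threshold used as the condition are all assumptions chosen for illustration. The claim itself leaves the alignment method, the variable, and the condition open.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Read = Tuple[str, str]  # (client identifier, read sequence); hypothetical layout


def align(sequence: str, reference: str) -> Optional[int]:
    """Toy aligner (assumption): exact substring match, returns start offset."""
    start = reference.find(sequence)
    return start if start >= 0 else None


def process_stream(
    reads: Iterable[Read],
    reference: str,
    positions: List[int],
    min_mean_coverage: float,
    route: Callable[[str, Dict[int, int]], None],
) -> Set[str]:
    """Consume reads one at a time and route a client's coverage data as soon
    as its mean coverage over `positions` reaches the threshold, without
    waiting for the stream to end."""
    if not positions:
        raise ValueError("positions must not be empty")
    tracked = set(positions)
    # client -> reference position -> number of aligned reads covering it
    coverage: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    routed: Set[str] = set()

    for client_id, sequence in reads:  # reads are extracted as they arrive
        start = align(sequence, reference)
        if start is None:
            continue  # unaligned reads contribute nothing
        client_cov = coverage[client_id]
        for pos in range(start, start + len(sequence)):
            if pos in tracked:
                client_cov[pos] += 1  # coverage value for this position

        if client_id in routed:
            continue
        # The "variable": mean coverage over the tracked positions (assumption).
        mean_cov = sum(client_cov.get(p, 0) for p in positions) / len(positions)
        # The "condition": the variable meets a fixed threshold (assumption).
        if mean_cov >= min_mean_coverage:
            routed.add(client_id)
            route(client_id, dict(client_cov))  # hand off to downstream analysis
    return routed
```

Because `reads` may be any iterable, including a generator fed by a network source, the `route` callback can fire while later reads are still pending. In this sketch `route` stands in for making the client data set available to the processors that perform sparse indicator detection.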