CPC A23J 3/04 (2013.01) [A23J 3/00 (2013.01); G06F 18/214 (2023.01); G06N 20/00 (2019.01); G16H 20/60 (2018.01)] | 24 Claims |
1. A method of identifying and developing individual proteins having a desired target function for use in a chosen industrial process, the method comprising:
(a) training a computer system to predict whether an individual protein has a preselected target function from one or more structural and/or functional characteristics of the individual protein including at least the individual protein's amino acid sequence, the computer system being trained by a process of machine learning that comprises inputting into the computer system a training data set that contains said characteristic(s) for a plurality of individual proteins known to have the target function and for a plurality of individual proteins known not to have the target function;
(b) applying the computer system trained in step (a) to a source data set that contains said characteristic(s) for each of a plurality of naturally occurring individual proteins for which it is not known whether the proteins have the target function, thereby predicting which of the naturally occurring individual proteins in the source data set have the target function;
(c) identifying or ranking by the computer system the naturally occurring individual proteins predicted in step (b) to have the target function, thereby obtaining a set of protein candidates;
(d) recombinantly expressing and purifying each of the protein candidates;
(e) conducting assays to determine or quantify which of the expressed protein candidates have the target function;
(f) adding structural data and/or assay results for the protein candidates tested in step (e) into the training data set;
(g) selecting one or more of the individual proteins assayed in step (e) as having potential for use in the industrial process if they are determined in step (e) as having the target function above a chosen threshold;
(h) performing additional cycle(s) of steps (a) to (g) until a desired number of individual proteins having potential for use in the industrial process have been selected as having the target function above the threshold; and then
(i) assessing one or more individual proteins selected in step (g) to determine whether it meets desired performance requirements in the industrial process.
|
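The iterative cycle recited in steps (a) to (h) can be sketched in code as a train, rank, assay, feed-back loop. The following is only an illustrative sketch under stated assumptions, not the claimed implementation: the claim does not name a learning algorithm or a feature set beyond the amino acid sequence, so the amino-acid-composition features, the nearest-centroid scoring, and the names `featurize`, `train`, `score`, `discovery_loop`, `assay`, `batch`, and `max_cycles` are all hypothetical. The `assay` callable is a stand-in for the wet-lab work of steps (d) and (e), and step (i) is left outside the loop.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def featurize(seq):
    """Assumed feature set: fraction of each standard residue in the sequence."""
    counts = Counter(seq)
    n = max(len(seq), 1)
    return [counts.get(a, 0) / n for a in AMINO_ACIDS]


def train(training_set):
    """Step (a): fit one centroid per class from (sequence, has_function) pairs."""
    sums = {True: [0.0] * len(AMINO_ACIDS), False: [0.0] * len(AMINO_ACIDS)}
    counts = {True: 0, False: 0}
    for seq, label in training_set:
        for i, value in enumerate(featurize(seq)):
            sums[bool(label)][i] += value
        counts[bool(label)] += 1
    if 0 in counts.values():
        raise ValueError("training set needs both positive and negative examples")
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in (True, False)}


def score(model, seq):
    """Positive when the sequence lies closer to the positive centroid."""
    x = featurize(seq)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return sq_dist(model[False]) - sq_dist(model[True])


def discovery_loop(training_set, source_set, assay, threshold, wanted,
                   batch=2, max_cycles=10):
    """Steps (a)-(h): retrain, rank, assay the top candidates, feed results back."""
    training_set = list(training_set)   # copy so the caller's data is untouched
    untested = list(source_set)
    selected = []
    for _ in range(max_cycles):
        if len(selected) >= wanted or not untested:
            break                                          # step (h) stop condition
        model = train(training_set)                        # step (a)
        ranked = sorted(untested,                          # steps (b) and (c)
                        key=lambda s: score(model, s), reverse=True)
        for seq in ranked[:batch]:
            untested.remove(seq)
            activity = assay(seq)                          # stand-in for (d) and (e)
            has_function = activity > threshold
            training_set.append((seq, has_function))       # step (f)
            if has_function:                               # step (g)
                selected.append((seq, activity))
    return selected


if __name__ == "__main__":
    # Toy data only: the "assay" here is the lysine fraction, purely for illustration.
    known = [("KKKKAKKK", True), ("GGGGAGGG", False)]
    unknown = ["KKKAKKGK", "GGGAGGGA", "KAKKKKKK", "AGAGGGGG"]
    print(discovery_loop(known, unknown, lambda s: s.count("K") / len(s),
                         threshold=0.5, wanted=2))
```

In this sketch every assayed candidate, whether it passes or fails, is appended to the training set, so each retraining in the next cycle sees both new positive and new negative examples, mirroring the two-class training data described in step (a).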