US 7,599,893 B2
Methods and systems for feature selection in machine learning based on feature contribution and model fitness
Marina Sapir, Mamaroneck, N.Y. (US); Faisal M. Khan, New Rochelle, N.Y. (US); David A. Verbel, New York, N.Y. (US); and Olivier Saidi, Greenwich, Conn. (US)
Assigned to Aureon Laboratories, Inc., Yonkers, N.Y. (US)
Filed on May 22, 2006, as Appl. No. 11/438,789.
Claims priority of provisional application 60/726,809, filed on Oct. 13, 2005.
Prior Publication US 2007/0112716 A1, May 17, 2007
Int. Cl. G06F 15/18 (2006.01)
U.S. Cl. 706/12 [706/14; 706/20; 706/47; 382/128; 382/129; 382/133; 382/134; 600/300; 600/301]
26 Claims
 
1. A method for selecting features for a final prediction rule predictive of an outcome with respect to a medical condition, said method comprising:
performing with a computer-implemented machine learning tool:
(a) generating a prediction rule based on training data for a cohort of patients whose outcomes with respect to said medical condition are at least partially known, wherein for each patient the data comprises measurements for a set of features and the outcome with respect to said medical condition for said patient to the extent known, wherein in a first iteration of (a) said set of features includes n features with n greater than or equal to 3 with n being decremented by one in each subsequent iteration of (a);
(b) determining a fitness value for said prediction rule, wherein said determining a fitness value comprises summing a concordance index (CI) of said prediction rule with a product of a sensitivity and a specificity of said prediction rule;
(c) determining a value of contribution to said prediction rule for each of said features in said set of features;
(d) removing a feature from consideration from said set of features based on the values of contribution, wherein the feature having the lowest value of contribution is removed;
(e) iterating (a)-(d) in order to produce n prediction rules and n fitness values; and
(f) selecting, based on the fitness values for said n prediction rules, one of said n prediction rules as said final prediction rule predictive of the outcome with respect to said medical condition, wherein of said n prediction rules said final prediction rule has the highest predictive ability with respect to the outcome with respect to said medical condition as indicated by said fitness values; and
evaluating data for a patient with a computer implementation of said final prediction rule to produce a value predictive of the patient's outcome with respect to said medical condition.
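
For illustration only, the loop recited in steps (a) through (f) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The claim does not fix the learner, so `fit_rule` is a hypothetical stand-in that weights each feature by the difference of its class means. Outcomes are assumed binary and fully known (no censoring), so the concordance index reduces to a pairwise ranking score. The threshold of 0 used for sensitivity and specificity, the use of the absolute weight as the value of contribution, and the evaluation of fitness on the training cohort itself are likewise assumptions. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def concordance_index(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """CI over pairs with different outcomes; tied scores count as half."""
    concordant, comparable = 0.0, 0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            if outcomes[i] == outcomes[j]:
                continue  # same outcome: the pair is not comparable
            comparable += 1
            hi, lo = (i, j) if outcomes[i] > outcomes[j] else (j, i)
            if scores[hi] > scores[lo]:
                concordant += 1.0
            elif scores[hi] == scores[lo]:
                concordant += 0.5
    return concordant / comparable if comparable else 0.5


def fitness(scores: Sequence[float], outcomes: Sequence[int],
            threshold: float = 0.0) -> float:
    """Step (b): CI plus the product of sensitivity and specificity."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    sensitivity = sum(s > threshold for s in pos) / len(pos) if pos else 0.0
    specificity = sum(s <= threshold for s in neg) / len(neg) if neg else 0.0
    return concordance_index(scores, outcomes) + sensitivity * specificity


def fit_rule(X: Sequence[Sequence[float]], y: Sequence[int],
             active: List[int]) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Step (a), toy stand-in learner (assumption): per-feature weight is the
    difference of class means. Requires both outcome classes to be present."""
    weights, centers = {}, {}
    for j in active:
        col = [row[j] for row in X]
        pos = [v for v, label in zip(col, y) if label == 1]
        neg = [v for v, label in zip(col, y) if label == 0]
        weights[j] = sum(pos) / len(pos) - sum(neg) / len(neg)
        centers[j] = sum(col) / len(col)
    return weights, centers


def select_features(X: Sequence[Sequence[float]], y: Sequence[int],
                    names: Sequence[str]):
    """Steps (a)-(f): fit, score, drop the weakest feature, repeat, pick best."""
    active = list(range(len(names)))
    history = []  # one (fitness, feature names) entry per iteration
    while active:
        weights, centers = fit_rule(X, y, active)
        scores = [sum(weights[j] * (row[j] - centers[j]) for j in active)
                  for row in X]
        history.append((fitness(scores, y), [names[j] for j in active]))
        # Steps (c)-(d): contribution is |weight| here; remove the lowest.
        worst = min(active, key=lambda j: abs(weights[j]))
        active.remove(worst)
    # Step (f): highest fitness wins; on ties the earlier (larger) set is kept.
    best = max(history, key=lambda h: h[0])
    return best, history
```

Starting from n features, `select_features` returns the best `(fitness, features)` pair together with a history of n entries, one per prediction rule. Because the concordance index, sensitivity, and specificity each lie between 0 and 1, the fitness value under these assumptions lies between 0 and 2.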