US 11,816,127 B2
Quality assessment of extracted features from high-dimensional machine learning datasets
Petr Novotny, Mount Kisco, NY (US); Aindrila Basak, Edmonton (CA); Shaikh Shahriar Quader, Scarborough (CA); Horst Cornelius Samulowitz, Armonk, NY (US); and Chad Marston, Bolton, MA (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Feb. 26, 2021, as Appl. No. 17/186,116.
Prior Publication US 2022/0292107 A1, Sep. 15, 2022
Int. Cl. G06F 16/26 (2019.01); G06F 16/215 (2019.01); G06N 20/00 (2019.01)
CPC G06F 16/26 (2019.01) [G06F 16/215 (2019.01); G06N 20/00 (2019.01)]
18 Claims
 
1. A computer-implemented quality determination method, the method comprising:
performing a dimensionality reduction on a high-dimensional dataset to form a dimensional-reduced dataset;
calculating relative proximities between a selected single data-point and a perturbed neighborhood both for the high-dimensional dataset and in an embedding; and
determining, using a machine-learning tool executed on a computing device, a quality of the dimensional-reduced dataset based on the relative proximities via a review of an extracted feature extracted from the dimensional-reduced dataset,
wherein the determining the quality further comprises:
identifying a K-ary neighborhood for the single data-point using an unsupervised nearest-neighbor search algorithm;
amplifying the K-ary neighborhood by approximating a locality of the point by sampling a finite number of points uniformly at a random interval around the single data-point; and
calculating feature distance contributions and feature influence explanations, the feature influence explanations being obtained as a linear correlation between the feature distance contributions and increasing proximity in the neighborhood.
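
The three steps recited in the wherein clause can be illustrated with a minimal sketch. It is not the patented implementation. It makes several assumptions that do not come from the claim: a brute-force Euclidean search stands in for the unsupervised nearest-neighbor algorithm, "sampling uniformly around the data-point" is read as uniform sampling inside the per-feature box spanned by the K neighbors, a feature's distance contribution is taken as its share of the squared distance, and "increasing proximity" is taken as rank order from farthest to nearest sample. All function names (`knn_indices`, `amplify_neighborhood`, `feature_influence`) are hypothetical.

```python
import numpy as np

def knn_indices(X, i, k):
    """Brute-force K nearest neighbors of row i, excluding row i itself."""
    dist = np.linalg.norm(X - X[i], axis=1)
    order = np.argsort(dist)
    return order[order != i][:k]

def amplify_neighborhood(X, i, k, n_samples, rng):
    """Draw n_samples points uniformly from the box around X[i] whose
    per-feature half-width is the largest offset among the K neighbors
    (an assumed reading of the claimed perturbation step)."""
    neighbors = X[knn_indices(X, i, k)]
    half_width = np.abs(neighbors - X[i]).max(axis=0)
    offsets = rng.uniform(-1.0, 1.0, size=(n_samples, X.shape[1]))
    return X[i] + offsets * half_width

def feature_influence(x, samples):
    """Pearson correlation, per feature, between that feature's share of
    the squared distance to x and the proximity rank of each sample.
    Returns NaN for a feature whose contribution is constant."""
    sq = (samples - x) ** 2
    total = sq.sum(axis=1)
    order = np.argsort(-total)            # farthest first, nearest last
    contrib = sq[order] / total[order, None]
    proximity = np.arange(len(order))     # grows as samples get closer
    return np.array([
        np.corrcoef(contrib[:, j], proximity)[0, 1]
        for j in range(sq.shape[1])
    ])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 4))          # toy high-dimensional dataset
    samples = amplify_neighborhood(X, i=0, k=5, n_samples=200, rng=rng)
    print(feature_influence(X[0], samples))
```

Under these assumptions, running the same three functions on the embedded coordinates of the same point would give a second set of influence values, and comparing the two sets is one possible way to relate proximities in the original space to those in the embedding.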