US 11,816,550 B1
Confidence score generation for boosting-based tree machine learning models
Deepak Gupta, Delhi (IN); Anirban Majumder, Bangalore (IN); Prateek Sircar, Noida (IN); and Rajeev Ramnarain Rastogi, Bangalore (IN)
Assigned to AMAZON TECHNOLOGIES, INC., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jul. 20, 2020, as Appl. No. 16/933,215.
Int. Cl. G06N 20/20 (2019.01); G06N 5/04 (2023.01); G06N 5/01 (2023.01)

CPC G06N 20/20 (2019.01) [G06N 5/01 (2023.01); G06N 5/04 (2013.01)]

17 Claims

1. A method of generating a confidence score for a boosting-based tree machine learning model score comprising:

generating a plurality of bootstrap models using a dataset of a boosting-based tree machine learning model;

determining, for first media data comprising a first plurality of attributes, a plurality of output scores comprising a respective output score for each of the plurality of bootstrap models;

determining, for the first media data, a first standard deviation of the plurality of output scores;

training a confidence score prediction machine learning model to generate confidence scores using a training instance comprising the first standard deviation as a target label;

receiving second media data comprising a second plurality of attributes;

determining, by the boosting-based tree machine learning model for the second media data, a first class score, wherein the first class score indicates a prediction related to the second media data;

determining, by an outlier prediction machine learning model, an outlier prediction score for the second media data using the second plurality of attributes;

inputting the outlier prediction score and the first class score into the confidence score prediction machine learning model trained using the training instance;

generating, by the confidence score prediction machine learning model, a confidence score indicating a confidence in the first class score determined by the boosting-based tree machine learning model for the second media data;

selecting the second media data as recommended media data based on the confidence score being above a confidence score threshold; and

generating first output data comprising a recommendation of the second media data.
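
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and everything concrete in it is an assumption made for illustration: the synthetic two-attribute dataset, the small stump-based booster standing in for the boosting-based tree model, the centroid-distance function standing in for the outlier prediction model, the use of the outlier score and class score as training features for the confidence model, the mapping 1 / (1 + predicted standard deviation) from predicted spread to a confidence score, the threshold 0.9, and all function and item names.

```python
import random
import statistics

def fit_stump(X, r):
    """Return the best single-split regression stump (feature, threshold, left, right)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = statistics.fmean(left), statistics.fmean(right)
            err = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or err < best[0]:
                best = (err, j, t, ml, mr)
    return best[1:] if best else None

def fit_boosted(X, y, rounds=10, lr=0.3):
    """Toy gradient boosting with squared loss; a stand-in for a boosted tree model."""
    base = statistics.fmean(y)
    pred = [base] * len(X)
    stumps = []
    for _ in range(rounds):
        stump = fit_stump(X, [yi - pi for yi, pi in zip(y, pred)])
        if stump is None:  # no valid split (all rows identical)
            break
        stumps.append(stump)
        j, t, ml, mr = stump
        pred = [p + lr * (ml if row[j] <= t else mr) for row, p in zip(X, pred)]
    return base, lr, stumps

def score(model, x):
    """Output score of a boosted model for one attribute vector."""
    base, lr, stumps = model
    return base + lr * sum(ml if x[j] <= t else mr for j, t, ml, mr in stumps)

# Synthetic dataset (assumption): two attributes per item, binary label.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(60)]
y = [1.0 if a + b > 1.0 else 0.0 for a, b in X]
primary = fit_boosted(X, y)  # the model whose class scores are being assessed

# Bootstrap models: each is fit on a resample (with replacement) of the dataset.
boot = []
for _ in range(8):
    idx = [rng.randrange(len(X)) for _ in X]
    boot.append(fit_boosted([X[i] for i in idx], [y[i] for i in idx]))

# Standard deviation of the bootstrap output scores, used as the target label.
spread = [statistics.pstdev([score(m, x) for m in boot]) for x in X]

# Stand-in outlier model (assumption): Euclidean distance from the training centroid.
centroid = [statistics.fmean(col) for col in zip(*X)]
def outlier_score(x):
    return sum((a - c) ** 2 for a, c in zip(x, centroid)) ** 0.5

# Confidence model: predicts the spread from (outlier score, class score).
features = [[outlier_score(x), score(primary, x)] for x in X]
conf_model = fit_boosted(features, spread)

def confidence(x):
    """Map the predicted spread to (0, 1]; this mapping is an assumption."""
    predicted = score(conf_model, [outlier_score(x), score(primary, x)])
    return 1.0 / (1.0 + max(predicted, 0.0))

THRESHOLD = 0.9  # hypothetical confidence score threshold
candidates = {"item_a": [0.9, 0.95], "item_b": [0.5, 0.52]}  # hypothetical new items
recommended = [name for name, x in candidates.items() if confidence(x) > THRESHOLD]
print("recommended:", recommended)
```

In this sketch the bootstrap models are needed only at training time. At inference, a single pass through the primary model, the outlier model, and the confidence model yields both the class score and its confidence score, so the cost of scoring every new item with the whole bootstrap ensemble is avoided.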