CPC G06V 20/49 (2022.01) [G06N 20/00 (2019.01); G06T 7/10 (2017.01); G10L 15/22 (2013.01); G10L 25/63 (2013.01)] | 20 Claims |
1. A method to generate conversation features for recorded conversations, the method comprising:
receiving one or more videos depicting a multiple-user interaction, each of the one or more videos including acoustic data and visual data;
segmenting the one or more videos into multiple utterance segments from the multiple users, wherein the segmenting is based on identifying utterances from individual users;
receiving label data for two or more of the utterance segments, the label data specifying the conversation features associated with the corresponding utterance segment, wherein the conversation features include the following: facial expressions, body postures or gestures, eye gaze directions, tone, volume, pitch, mel-frequency cepstral coefficients, chroma, and emotions of a user,
wherein the label data for at least a first segment corresponds to an utterance associated with a first of the multiple users and the label data for at least a second segment corresponds to an utterance associated with a second of the multiple users;
storing the label data in association with the utterance segments; and
generating conversation analysis indicators, comprising one or more scores for the conversation, by applying the conversation features to a machine learning system, wherein the conversation analysis indicators include coaching statistics.
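The claimed pipeline — speaker-attributed utterance segmentation, storing label data with each segment, and deriving a score — can be sketched as follows. All names (`UtteranceSegment`, `coaching_score`, etc.) are illustrative, and the talk-time ratio is a toy stand-in for the machine-learning system recited in the claim, not the patented method:

```python
from dataclasses import dataclass, field

# Hypothetical data model for one utterance segment; fields are assumptions.
@dataclass
class UtteranceSegment:
    speaker_id: str
    start_s: float
    end_s: float
    labels: dict = field(default_factory=dict)  # conversation features, e.g. {"tone": "calm"}

def segment_by_speaker(turns):
    """Turn (speaker, start, end) tuples into utterance segments."""
    return [UtteranceSegment(s, a, b) for s, a, b in turns]

def attach_labels(segments, label_data):
    """Store received label data in association with its utterance segment."""
    for idx, labels in label_data.items():
        segments[idx].labels.update(labels)
    return segments

def coaching_score(segments, speaker_id):
    """Toy conversation indicator: fraction of total talk time held by one speaker."""
    total = sum(seg.end_s - seg.start_s for seg in segments)
    spoke = sum(seg.end_s - seg.start_s
                for seg in segments if seg.speaker_id == speaker_id)
    return spoke / total if total else 0.0

turns = [("alice", 0.0, 4.0), ("bob", 4.0, 6.0), ("alice", 6.0, 10.0)]
segments = attach_labels(segment_by_speaker(turns),
                         {0: {"tone": "calm"}, 1: {"emotion": "neutral"}})
print(round(coaching_score(segments, "alice"), 2))  # 0.8
```

In a full implementation the segmentation step would typically use audio-visual speaker diarization, and the scoring step would feed the labeled features (MFCCs, chroma, gaze, etc.) to a trained model rather than a heuristic.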