US 11,756,300 B1
Method and apparatus for summarization of unsupervised video with efficient key frame selection reward functions
Geun Sik Jo, Incheon (KR); Ui Nyoung Yoon, Incheon (KR); and Myung Duk Hong, Incheon (KR)
Assigned to INHA UNIVERSITY RESEARCH AND BUSINESS FOUNDATION, Incheon (KR)
Filed by INHA University Research and Business Foundation, Incheon (KR)
Filed on Apr. 27, 2022, as Appl. No. 17/730,536.
Int. Cl. G06V 10/74 (2022.01); G06V 20/40 (2022.01)
CPC G06V 20/47 (2022.01) [G06V 10/761 (2022.01); G06V 20/41 (2022.01)] 8 Claims
1. An attention-based video summarization method comprising:
extracting frame-level visual features from an input video;
computing an attention weight and representing an importance score as a frame tracking probability for selecting a key frame by using the attention weight;
obtaining a temporal consistency reward function and a representativeness reward function so as to select the key frame, based on a visual similarity distance and temporal distance between key frames, and training an attention-based video summarization network to predict an importance score for selecting a key frame of a video summary by using the temporal consistency reward function and the representativeness reward function;
creating a video summary by selecting a corresponding key frame based on the predicted importance score, evaluating the quality of the created video summary, and performing policy gradient learning for the attention-based video summarization network;
calculating regularization and reconstruction loss for controlling the probability to select a key frame by using the importance score of the selected key frame; and
creating a video summary based on the calculated regularization and reconstruction loss.
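The steps recited in claim 1 can be illustrated with a minimal sketch in plain Python. The claim does not give formulas, so everything below is an assumption made for illustration: the function names, the scaled dot-product attention, the projection vector `w`, the exponential form of both rewards, the squared-error regularization term, and the `target_ratio` parameter are all hypothetical. The reconstruction loss is omitted because the claim does not say what is reconstructed.

```python
import math
import random


def cosine_distance(a, b):
    """Visual dissimilarity between two feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def importance_scores(features, w, b=0.0):
    """Attention weights -> per-frame selection probability in (0, 1).

    Hypothetical design: scaled dot-product self-attention builds a context
    vector per frame, then a sigmoid over a linear projection gives the score.
    """
    dim = len(features[0])
    scores = []
    for q in features:
        logits = [sum(x * y for x, y in zip(q, k)) / math.sqrt(dim) for k in features]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(l - peak) for l in logits]
        total = sum(exps)
        attn = [e / total for e in exps]
        context = [sum(a * f[d] for a, f in zip(attn, features)) for d in range(dim)]
        z = sum(wi * ci for wi, ci in zip(w, context)) + b
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


def representativeness_reward(features, keys):
    """Assumed form: high when every frame is visually close to some key frame."""
    if not keys:
        return 0.0
    total = sum(min(cosine_distance(f, features[k]) for k in keys) for f in features)
    return math.exp(-total / len(features))


def temporal_consistency_reward(features, keys):
    """Assumed form: for each key frame, find the visually nearest other key
    frame and penalize the normalized temporal distance between the two."""
    if len(keys) < 2:
        return 0.0
    total = 0.0
    for k in keys:
        nearest = min((j for j in keys if j != k),
                      key=lambda j: cosine_distance(features[k], features[j]))
        total += abs(k - nearest) / len(features)
    return math.exp(-total / len(keys))


def regularization_loss(scores, target_ratio=0.5):
    """Keeps the mean selection probability near a target ratio (assumed form)."""
    return (sum(scores) / len(scores) - target_ratio) ** 2


def reinforce_surrogate(scores, actions, reward, baseline=0.0):
    """Policy gradient (REINFORCE) surrogate: -(R - b) * log pi(actions)."""
    log_prob = sum(math.log(p) if a else math.log(1.0 - p)
                   for p, a in zip(scores, actions))
    return -(reward - baseline) * log_prob


# Toy run: 8 frames with 4-dimensional features standing in for CNN features.
rng = random.Random(0)
features = [[rng.uniform(0.1, 1.0) for _ in range(4)] for _ in range(8)]
w = [0.5, -0.25, 0.1, 0.3]  # hypothetical projection weights
scores = importance_scores(features, w)
actions = [1 if rng.random() < p else 0 for p in scores]  # Bernoulli sampling
keys = [t for t, a in enumerate(actions) if a]
reward = (representativeness_reward(features, keys)
          + temporal_consistency_reward(features, keys))
loss = reinforce_surrogate(scores, actions, reward) + 0.01 * regularization_loss(scores)
```

In a trainable version, `w` and the attention parameters would be updated by differentiating `loss` with an automatic differentiation framework; the sketch only evaluates the scalar objective for one sampled summary.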