US 7,613,365 B2
Video summarization system and the method thereof
Jhing-Fa Wang, Tainan (Taiwan); Jia-Ching Wang, Tainan (Taiwan); and Chen-Yu Chen, Tainan (Taiwan)
Assigned to National Cheng Kung University, Tainan (Taiwan)
Filed on Jul. 14, 2006, as Appl. No. 11/486,122.
Claims priority of application No. 95108210 A (TW), filed on Mar. 10, 2006.
Prior Publication US 2007/0214418 A1, Sep. 13, 2007
Int. Cl. G06K 9/54 (2006.01)
U.S. Cl. 382/305 [382/197; 382/236; 345/951; 725/115]
27 Claims
 
1. A video summarization method comprising:
providing a video wherein the video has a plurality of sentences and a plurality of frames;
applying a key frame extraction step to the frames of the video to acquire a plurality of key frames, wherein the key frame extraction step comprises:
computing the similarity between each frame to obtain a plurality of similarity values; and
choosing the key frames from the frames, wherein the sum of the similarity values between the key frames is the minimum;
applying a key sentence extraction step to the sentences of the video to acquire a plurality of key sentences, wherein the key sentence extraction step comprises:
converting the sentences into a plurality of corresponding sentence vectors;
computing the distance between each sentence vector to obtain a plurality of distance values;
according to the distance values, dividing the sentences into a plurality of clusters, wherein the clusters are members of a set;
computing the importance of each sentence of each cluster to obtain the importance of each cluster;
applying a splitting step to split a most important member with the highest importance in the cluster into a plurality of new clusters, wherein the new clusters replace the original most important member and join the set as members of the set;
repeating the splitting step until the number of the clusters reaches a predetermined value; and
choosing at least one key sentence from each member of the set, wherein the sum of the importance of the key sentences is the maximum; and
outputting the key frames and the key sentences.
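
The two extraction steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and the claim does not fix any of the following choices, which are assumptions made only for illustration: frames and sentences are already given as numeric feature vectors; frame similarity is cosine similarity; sentence distance is Euclidean; a cluster's importance is the sum of its sentences' importance scores; the initial division into clusters is performed by the same two-way split used afterwards; and one key sentence is taken per cluster. The function names (choose_key_frames, split_cluster, choose_key_sentences) are hypothetical.

```python
from itertools import combinations
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def choose_key_frames(frames, k):
    """Exhaustively pick the k frame indices whose pairwise similarities sum to the minimum.

    Brute force over all k-subsets; only practical for small inputs.
    """
    def total_similarity(indices):
        return sum(cosine_similarity(frames[i], frames[j])
                   for i, j in combinations(indices, 2))

    return list(min(combinations(range(len(frames)), k), key=total_similarity))


def split_cluster(cluster, vectors):
    """Split a cluster (list of sentence indices, len >= 2) into two new clusters.

    Assumed rule: the two most distant members act as seeds and every
    member joins the nearer seed.
    """
    s1, s2 = max(combinations(cluster, 2),
                 key=lambda p: math.dist(vectors[p[0]], vectors[p[1]]))
    left, right = [], []
    for i in cluster:
        near_s1 = math.dist(vectors[i], vectors[s1]) <= math.dist(vectors[i], vectors[s2])
        (left if near_s1 else right).append(i)
    if not right:  # all vectors identical: fall back to halving the cluster
        mid = len(cluster) // 2
        left, right = cluster[:mid], cluster[mid:]
    return [left, right]


def choose_key_sentences(vectors, importance, n_clusters):
    """Repeatedly split the most important cluster, then pick one sentence per cluster."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < n_clusters:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break  # nothing left to split
        # The member of the set with the highest importance is split ...
        target = max(splittable, key=lambda c: sum(importance[i] for i in c))
        # ... and the new clusters replace it in the set.
        clusters.remove(target)
        clusters.extend(split_cluster(target, vectors))
    # Taking the top-scoring sentence of each cluster maximizes the summed
    # importance under the one-sentence-per-cluster assumption.
    return sorted(max(c, key=lambda i: importance[i]) for c in clusters)
```

Both functions return indices into their input lists, so the final outputting step amounts to returning the selected frames and sentences together. The exhaustive search in choose_key_frames grows combinatorially with the number of frames; a practical system would need a cheaper search strategy, which the claim leaves open.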