US 11,682,238 B2
Re-timing a video sequence to an audio sequence based on motion and audio beat detection
Jimei Yang, Mountain View, CA (US); Deepali Aneja, Seattle, WA (US); Dingzeyu Li, Seattle, WA (US); Jun Saito, Seattle, WA (US); and Yang Zhou, Amherst, MA (US)
Assigned to Adobe Inc., San Jose, CA (US)
Filed by Adobe Inc., San Jose, CA (US)
Filed on Feb. 12, 2021, as Appl. No. 17/175,441.
Prior Publication US 2022/0261573 A1, Aug. 18, 2022
Int. Cl. G06V 40/20 (2022.01); G06T 7/215 (2017.01); G06V 20/40 (2022.01); G06V 40/10 (2022.01); H04N 5/06 (2006.01); H04N 21/8547 (2011.01); G11B 27/031 (2006.01); G10H 1/36 (2006.01); G11B 27/10 (2006.01); H04N 21/845 (2011.01)
CPC G06V 40/23 (2022.01) [G06T 7/215 (2017.01); G06V 20/41 (2022.01); G06V 20/46 (2022.01); G06V 40/103 (2022.01); H04N 5/06 (2013.01); H04N 21/8456 (2013.01); H04N 21/8547 (2013.01)] 20 Claims
1. A computer-implemented method, comprising:
receiving an input, the input including a video sequence;
detecting motion beats of a moving person in the video sequence by:
identifying a point of interest on the moving person for body tracking,
calculating a mean position of the point of interest in the video sequence,
identifying a greatest distance from the calculated mean position of the point of interest,
generating a ring centered at the calculated mean position having a radius equal to the identified greatest distance of the point of interest from the calculated mean position, wherein the ring has a specified number of evenly distributed points,
determining position data for the point of interest at each frame of the video sequence, including information indicating a distance of the point of interest from each of the specified number of evenly distributed points,
generating a representation of motion based on the determined position data, and
determining the motion beats for the video sequence using the generated representation of motion;
detecting audio beats in an audio sequence;
modifying the video sequence by matching the detected motion beats in the video sequence to the detected audio beats in the audio sequence; and
outputting the modified video sequence.
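The claim fixes the ring construction (mean position, greatest distance as radius, evenly spaced ring points, per-frame distances to each ring point) but leaves the motion representation, the beat criterion, and the matching strategy open. The following is a minimal sketch under stated assumptions, not the patented implementation. All function names are hypothetical. The tracked point arrives as one (x, y) pair per frame from some body tracker. Motion strength is taken to be the L1 change of the ring-distance vector between frames, and a motion beat is taken to be a strict local minimum of that curve. Audio beat times are assumed to come from an external beat tracker. Beats are paired in order, and time is warped piecewise-linearly between matched pairs.

```python
import math


def ring_features(positions, num_points=8):
    """Per-frame distances from the tracked point to evenly spaced ring points."""
    n = len(positions)
    mx = sum(x for x, _ in positions) / n
    my = sum(y for _, y in positions) / n
    # Ring radius = greatest distance of the point from its mean position.
    radius = max(math.hypot(x - mx, y - my) for x, y in positions)
    ring = [(mx + radius * math.cos(2 * math.pi * k / num_points),
             my + radius * math.sin(2 * math.pi * k / num_points))
            for k in range(num_points)]
    return [[math.hypot(x - rx, y - ry) for rx, ry in ring]
            for x, y in positions]


def motion_curve(features):
    """ASSUMPTION: motion strength = L1 change of the distance vector per frame."""
    diffs = [sum(abs(c - p) for c, p in zip(cur, prev))
             for prev, cur in zip(features, features[1:])]
    # Frame 0 has no predecessor; reuse the first difference so lengths match.
    return diffs[:1] + diffs


def motion_beats(curve):
    """ASSUMPTION: a motion beat is a strict local minimum of motion strength."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i] < curve[i - 1] and curve[i] < curve[i + 1]]


def source_time(t, anchors):
    """Piecewise-linear map from output time to source-video time."""
    for (t0, s0), (t1, s1) in zip(anchors, anchors[1:]):
        if t <= t1:
            return s0 + (t - t0) * (s1 - s0) / (t1 - t0)
    # Past the last matched beat: play at normal speed.
    t_last, s_last = anchors[-1]
    return s_last + (t - t_last)


def retime(beat_frames, audio_beats, fps, num_src_frames, out_duration):
    """ASSUMPTION: pair the k-th motion beat with the k-th audio beat, in order.

    Returns the source frame index to show at each output frame.
    """
    anchors = [(0.0, 0.0)]
    for frame, t in zip(beat_frames, audio_beats):
        s = frame / fps
        # Keep only pairs that advance both clocks, so the warp stays monotonic.
        if t > anchors[-1][0] and s > anchors[-1][1]:
            anchors.append((t, s))
    n_out = int(round(out_duration * fps))
    return [min(num_src_frames - 1,
                int(round(source_time(i / fps, anchors) * fps)))
            for i in range(n_out)]
```

As a usage example, `retime([10], [0.5], fps=10, num_src_frames=30, out_duration=1.0)` moves the motion beat at source frame 10 to output frame 5, the time of the 0.5 s audio beat. It does this by playing the first segment at double speed. Playback then resumes at normal speed. The in-order pairing is the simplest possible choice. A fuller system might instead search for the best subset of beats to align, for example with dynamic programming.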