US 9,811,761 B2
System, method, and recording medium for detecting video face clustering with inherent and weak supervision
Yu Cheng, Ossining, NY (US); Sharathchandra U. Pankanti, Darien, CT (US); and Nalini K. Ratha, White Plains, NY (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Aug. 28, 2015, as Appl. No. 14/839,424.
Prior Publication US 2017/0061245 A1, Mar. 2, 2017
Int. Cl. G06K 9/62 (2006.01); G06K 9/00 (2006.01)
CPC G06K 9/6218 (2013.01) [G06K 9/00288 (2013.01); G06K 9/00295 (2013.01); G06K 9/00718 (2013.01); G06K 9/6264 (2013.01)]
19 Claims
 
1. A face clustering system for video face clustering in a video sequence, the system comprising:
an inherent supervision summarization device configured to collect group-level supervision and instance level supervision within a same chunklet based on a user input of face images for a person;
a discriminative projection learning device configured to embed group constraints of the group-level supervision into a transformed space, and configured to generate an embedding space from the original image feature space;
a clustering device, in the embedding space, configured to execute pair-wise based clustering to cluster the video images into different clusters with the instance level supervision collected by the inherent supervision summarization device; and
a face detection and verification device configured to extract a face region from the video sequence and extract shift features from the face region,
wherein the face detection and verification device excludes items in the video sequence that are not a face.
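
As a non-limiting illustration of the supervision collected in claim 1, the following Python sketch derives group-level and instance-level constraints from face tracks. It rests on assumptions that the claim does not state: a chunklet is taken to be one continuous face track, faces within a chunklet are treated as must-link pairs, and faces from two tracks that appear in the same frame are treated as cannot-link pairs. The function name `collect_constraints` and the `(face_id, frame_index, track_id)` tuple layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def collect_constraints(detections):
    """Derive chunklets and pair-wise constraints from face detections.

    detections: iterable of (face_id, frame_index, track_id) tuples.
    This layout is an assumption made for illustration only.
    """
    chunklets = defaultdict(list)  # track_id -> face ids (group-level supervision)
    by_frame = defaultdict(set)    # frame_index -> track ids visible in that frame
    for face_id, frame, track in detections:
        chunklets[track].append(face_id)
        by_frame[frame].add(track)

    # Instance-level must-link: every pair of faces inside one chunklet.
    must_link = {
        tuple(sorted(pair))
        for faces in chunklets.values()
        for pair in combinations(faces, 2)
    }

    # Assumption: two tracks sharing a frame belong to different people.
    exclusive_tracks = {
        pair
        for tracks in by_frame.values()
        for pair in combinations(sorted(tracks), 2)
    }
    cannot_link = {
        tuple(sorted((a, b)))
        for t1, t2 in exclusive_tracks
        for a in chunklets[t1]
        for b in chunklets[t2]
    }
    return dict(chunklets), must_link, cannot_link


# Tracks "A" and "B" overlap in frame 1; track "C" appears alone.
detections = [(0, 0, "A"), (1, 1, "A"), (2, 1, "B"), (3, 2, "B"), (4, 5, "C")]
chunklets, must_link, cannot_link = collect_constraints(detections)
```

In this sketch the chunklets play the role of group-level supervision and the two pair sets play the role of instance-level supervision; track "C" contributes no pair-wise constraint because it never shares a frame with another track.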
 
12. A face clustering method for video face clustering in a video sequence, the method comprising:
extracting group constraints and pair-wise constraints from the video sequence;
embedding the group constraints into a feature space while generating an embedding space from the original image feature space;
in the generated space, executing pair-wise based clustering to cluster the video images into different clusters; and
extracting a face region from the video sequence and shift features from the face region,
wherein the extracting the face region excludes items in the video sequence that are not a face.
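
As a non-limiting illustration of the pair-wise based clustering step of claim 12, the following Python sketch clusters points that are assumed to already lie in the embedding space. The claim does not name a specific clustering algorithm; constrained agglomerative clustering is swapped in here as one familiar instance. The projection-learning step is not shown, the constraints are assumed to be mutually consistent, and the function name `constrained_clustering` and its parameters are hypothetical.

```python
import math


def constrained_clustering(points, must_link, cannot_link, n_clusters):
    """Agglomerative clustering that honors must-link and cannot-link pairs.

    points: list of equal-length coordinate tuples (assumed already embedded).
    must_link, cannot_link: iterables of (i, j) index pairs into points.
    """
    # Union-find: faces joined by must-link start in the same cluster.
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in must_link:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), set()).add(i)
    clusters = list(groups.values())

    forbidden = {frozenset(pair) for pair in cannot_link}

    def violates(c1, c2):
        return any(frozenset((a, b)) in forbidden for a in c1 for b in c2)

    def centroid(cluster):
        dim = len(points[0])
        return [sum(points[i][d] for i in cluster) / len(cluster) for d in range(dim)]

    # Repeatedly merge the closest pair of clusters that breaks no cannot-link.
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if violates(clusters[i], clusters[j]):
                    continue
                dist = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        if best is None:  # every remaining merge is forbidden
            break
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]

    labels = [0] * len(points)
    for label, cluster in enumerate(clusters):
        for i in cluster:
            labels[i] = label
    return labels


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.2, 0.1)]
labels = constrained_clustering(
    points,
    must_link={(0, 1), (2, 3)},
    cannot_link={(0, 2), (0, 3), (1, 2), (1, 3)},
    n_clusters=2,
)
```

Because cannot-link pairs block merges, the sketch may return more than `n_clusters` clusters when the constraints leave no permissible merge.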