US 11,756,332 B2
Image recognition method, apparatus, device, and computer storage medium
Zhizhi Guo, Beijing (CN); Yipeng Sun, Beijing (CN); Jingtuo Liu, Beijing (CN); and Junyu Han, Beijing (CN)
Assigned to Beijing Baidu Netcom Science and Technology Co., Ltd.
Filed by Beijing Baidu Netcom Science and Technology Co., Ltd., Beijing (CN)
Filed on Mar. 22, 2021, as Appl. No. 17/208,568.
Claims priority of application No. 202010611133.8 (CN), filed on Jun. 30, 2020.
Prior Publication US 2021/0209343 A1, Jul. 8, 2021
Int. Cl. G06V 40/10 (2022.01); G06V 40/16 (2022.01); G06N 3/08 (2023.01); G06F 18/10 (2023.01); G06F 18/20 (2023.01); G06N 3/045 (2023.01); G06V 10/764 (2022.01); G06V 10/80 (2022.01); G06V 10/82 (2022.01)
CPC G06V 40/171 (2022.01) [G06F 18/10 (2023.01); G06F 18/29 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06V 10/764 (2022.01); G06V 10/806 (2022.01); G06V 10/82 (2022.01)]
12 Claims
 
1. An image recognition method, comprising:
performing organ recognition on a human face image and marking positions of human facial five sense organs in the human face image, obtaining a marked human face image;
inputting the marked human face image into a backbone network model and performing feature extraction, obtaining defect features of the marked human face image outputted by different convolutional neural network levels of the backbone network model; and
fusing the defect features of different levels that are located in a same area of the human face image, obtaining a defect recognition result of the human face image,
wherein the inputting the marked human face image into a backbone network model and performing feature extraction, obtaining defect features of the marked human face image outputted by different convolutional neural network levels of the backbone network model, comprises:
setting a priori box on the marked human face image in the convolutional neural network level of a target level, wherein the target level is one of the multiple levels in the convolutional neural network in the backbone network model, and the size of the priori box corresponds to the target level; and
determining whether there are human facial defects in the priori box, and outputting a partial human face image in the priori box as a defect feature of the marked human face image outputted by the convolutional neural network level of the target level if it is determined that there are human facial defects in the priori box.
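
The prior-box and fusion steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes square prior boxes whose side doubles with each level (`base_size * 2**level`), a non-overlapping tiling of the image, and a hypothetical stand-in `has_defect` that checks known defect coordinates instead of running a convolutional neural network level. Grouping boxes that overlap is only one possible reading of "located in a same area"; all function names here are illustrative.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels
Point = Tuple[int, int]           # (x, y) in pixels


def prior_boxes(image_size: int, level: int, base_size: int = 8) -> List[Box]:
    """Tile a square image with prior boxes whose size depends on the level.

    Assumption: side = base_size * 2**level, so deeper levels get larger boxes.
    """
    side = base_size * 2 ** level
    return [
        (x, y, x + side, y + side)
        for y in range(0, image_size - side + 1, side)
        for x in range(0, image_size - side + 1, side)
    ]


def has_defect(box: Box, defects: List[Point]) -> bool:
    """Hypothetical stand-in for the per-level defect decision.

    A real system would classify the image content inside the box.
    """
    return any(box[0] <= x < box[2] and box[1] <= y < box[3] for x, y in defects)


def detect(image_size: int, levels: List[int], defects: List[Point]) -> Dict[int, List[Box]]:
    """For each level, keep only the prior boxes judged to contain a defect."""
    return {
        level: [b for b in prior_boxes(image_size, level) if has_defect(b, defects)]
        for level in levels
    }


def overlaps(a: Box, b: Box) -> bool:
    """True when the two boxes share a region of non-zero area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def fuse_same_area(per_level: Dict[int, List[Box]]) -> List[dict]:
    """Group boxes from different levels that cover the same area.

    Each group records the enclosing area and the levels that reported it.
    """
    fused: List[dict] = []
    for level in sorted(per_level):
        for box in per_level[level]:
            for group in fused:
                if overlaps(group["area"], box):
                    a = group["area"]
                    # Widen the group's area so it encloses the new box.
                    group["area"] = (min(a[0], box[0]), min(a[1], box[1]),
                                     max(a[2], box[2]), max(a[3], box[3]))
                    group["levels"].append(level)
                    break
            else:
                fused.append({"area": box, "levels": [level]})
    return fused


if __name__ == "__main__":
    per_level = detect(64, [0, 1, 2], defects=[(10, 10), (50, 50)])
    for group in fuse_same_area(per_level):
        print(group)
```

In this sketch a small defect is reported by the fine-grained level and confirmed by the coarser levels, so each fused group lists every level that flagged the same region. A real system would fuse feature tensors rather than box coordinates.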