US 11,816,855 B2
Array-based depth estimation
Chenchi Luo, Plano, TX (US); Yingmao Li, Allen, TX (US); Kaimo Lin, Richardson, TX (US); and Youngjun Yoo, Plano, TX (US)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed by Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed on Sep. 21, 2020, as Appl. No. 17/027,106.
Claims priority of provisional application 63/056,999, filed on Jul. 27, 2020.
Claims priority of provisional application 62/972,689, filed on Feb. 11, 2020.
Prior Publication US 2021/0248769 A1, Aug. 12, 2021
Int. Cl. G06T 7/593 (2017.01); H04N 13/271 (2018.01); H04N 13/128 (2018.01); H04N 13/00 (2018.01)
CPC G06T 7/593 (2017.01) [H04N 13/128 (2018.05); H04N 13/271 (2018.05); G06T 2207/10028 (2013.01); H04N 2013/0081 (2013.01)]
21 Claims
9. An apparatus comprising:
at least three imaging sensors; and
at least one processor configured to:
obtain at least three input image frames of a scene using the at least three imaging sensors, the input image frames comprising a reference image frame and multiple non-reference image frames;
generate multiple disparity maps using the input image frames, wherein each disparity map is associated with the reference image frame and different ones of the disparity maps are associated with different ones of the non-reference image frames;
generate multiple confidence maps using the input image frames, wherein each confidence map identifies weights associated with one of the disparity maps; and
generate a depth map of the scene using the disparity maps and the confidence maps;
wherein the imaging sensors are positioned to define different baseline directions, each baseline direction extending between the imaging sensor used to capture the reference image frame and one of the imaging sensors used to capture one of the non-reference image frames; and
wherein, to generate the disparity maps, the at least one processor is configured to:
generate multiple feature maps each identifying features of a different one of the input image frames using first convolutional layers;
perform multiple cross-correlations, each occurring between (i) at least part of the feature map of the reference image frame and (ii) a different one of the feature maps of the non-reference image frames, to produce different sets of correlated feature maps; and
generate the disparity maps based on the sets of correlated feature maps using deconvolutional or upsampling layers and second convolutional layers.
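Illustrative sketch (not part of the claim): the data flow recited above can be approximated in a few lines of plain Python. The code below is a simplified stand-in, not the claimed neural network. It assumes the feature maps have already been produced (the claim uses convolutional layers for that), it replaces the deconvolutional/upsampling stage with nothing at all, and it fuses disparities by a confidence-weighted average, which is one plausible fusion rule that the claim does not specify. It also assumes all disparity maps are already normalized to a common baseline length. The function names, the (dy, dx) direction encoding, and the parameters focal_px and baseline_m are hypothetical.

```python
from typing import List, Sequence, Tuple

Map = List[List[float]]                # 2-D map: rows of per-pixel values
FeatureMap = List[List[List[float]]]   # [row][col][channel]


def correlate_along_baseline(ref: FeatureMap, other: FeatureMap,
                             direction: Tuple[int, int],
                             max_shift: int) -> List[Map]:
    """Correlate reference features with non-reference features shifted
    along one baseline direction. Returns one correlation plane per shift
    (0..max_shift); out-of-bounds positions score 0."""
    dy, dx = direction
    rows, cols = len(ref), len(ref[0])
    volume = []
    for s in range(max_shift + 1):
        plane = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                rr, cc = r + s * dy, c + s * dx
                if 0 <= rr < rows and 0 <= cc < cols:
                    # Dot product over channels of the two feature vectors.
                    plane[r][c] = sum(a * b for a, b
                                      in zip(ref[r][c], other[rr][cc]))
        volume.append(plane)
    return volume


def fuse_disparities(disparities: Sequence[Map],
                     confidences: Sequence[Map],
                     eps: float = 1e-6) -> Map:
    """Confidence-weighted average of per-pair disparity maps
    (an assumed fusion rule; weights come from the confidence maps)."""
    rows, cols = len(disparities[0]), len(disparities[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total_w = sum(w[r][c] for w in confidences)
            weighted = sum(d[r][c] * w[r][c]
                           for d, w in zip(disparities, confidences))
            fused[r][c] = weighted / max(total_w, eps)
    return fused


def disparity_to_depth(disparity: Map, focal_px: float,
                       baseline_m: float, eps: float = 1e-6) -> Map:
    """Pinhole stereo relation: depth = focal length * baseline / disparity."""
    return [[focal_px * baseline_m / max(d, eps) for d in row]
            for row in disparity]
```

In this sketch, each non-reference frame would be correlated with the reference frame along its own baseline direction, for example (0, 1) for a horizontally offset sensor and (1, 0) for a vertically offset one. A low confidence weight lets a pixel that is unreliable along one baseline direction be dominated by the estimate from another.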