US 11,816,910 B2
Approximate modeling of next combined result for stopping text-field recognition in a video stream
Konstantin Bulatovich Bulatov, Moscow (RU); and Vladimir Viktorovich Arlazarov, Moscow (RU)
Assigned to Smart Engines Service, LLC, Moscow (RU)
Filed by Smart Engines Service, LLC, Moscow (RU)
Filed on Feb. 19, 2021, as Appl. No. 17/180,238.
Claims priority of application No. RU2020122468 (RU), filed on Jul. 7, 2020.
Prior Publication US 2022/0012484 A1, Jan. 13, 2022
Int. Cl. G06K 9/00 (2022.01); G06F 16/901 (2019.01); G06K 9/60 (2006.01); G06V 30/413 (2022.01); G06V 10/20 (2022.01)
CPC G06V 30/413 (2022.01) [G06F 16/9027 (2019.01); G06V 10/20 (2022.01)]
7 Claims
 
1. A method comprising using at least one hardware processor to:
until a determination to stop processing is made, for each of a plurality of image frames in a video stream,
receive the image frame,
generate a text-recognition result from the image frame, wherein the text-recognition result comprises a vector of class estimations for each of one or more characters,
combine the text-recognition result with an accumulated text-recognition result,
estimate a distance between the accumulated text-recognition result and a next accumulated text-recognition result based on an approximate model of the next accumulated text-recognition result, wherein the distance between the accumulated text-recognition result and the next accumulated text-recognition result is estimated as

[equation image not reproduced in this text; its symbols are defined in the clauses that follow]
wherein Δ_n is the estimated distance,
wherein n is a current number of image frames for which text-recognition results have been combined with the accumulated text-recognition result,
wherein δ is an external parameter,
wherein S_n is a number of vectors of class estimations in the accumulated text-recognition result,
wherein K is a number of classes represented in each vector of class estimations in the accumulated text-recognition result, and
wherein Δ_ijk is a contribution to the estimated distance by a class estimation for a k-th class to a j-th component of the accumulated text-recognition result from the vector of class estimations in the text-recognition result generated from an i-th image frame, and
determine whether or not to stop the processing based on the estimated distance; and,
after stopping the processing, output a character string based on the accumulated text-recognition result.
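
The control flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `combine`, `decode`, and `recognize_stream` are hypothetical, the running-mean combiner and the threshold comparison are assumptions made for illustration, and the distance estimator is passed in as a callable because the claimed formula is not reproduced in this text.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Vector = List[float]   # class estimations for one character position
Result = List[Vector]  # one vector per character in the text field


def combine(acc: Optional[Result], new: Result, n: int) -> Result:
    """Hypothetical combiner: running mean over n already-combined results.

    Assumes every result has the same number of characters and classes.
    """
    if acc is None:
        return [list(v) for v in new]
    return [[(a * n + b) / (n + 1) for a, b in zip(va, vb)]
            for va, vb in zip(acc, new)]


def decode(acc: Result, alphabet: str) -> str:
    """Pick the highest-scoring class for each character position."""
    return "".join(alphabet[max(range(len(v)), key=v.__getitem__)] for v in acc)


def recognize_stream(
    frames: Iterable[object],
    recognize: Callable[[object], Result],
    estimate_distance: Callable[[Result, List[Result]], float],
    threshold: float,
    alphabet: str,
) -> Tuple[str, int]:
    """Process frames until the estimated distance to the next
    accumulated result is at or below `threshold` (assumed stopping rule).

    Returns the decoded string and the number of frames consumed.
    """
    acc: Optional[Result] = None
    history: List[Result] = []
    for frame in frames:
        result = recognize(frame)                  # per-frame recognition
        acc = combine(acc, result, len(history))   # update accumulated result
        history.append(result)
        if estimate_distance(acc, history) <= threshold:
            break                                  # decision to stop
    text = decode(acc, alphabet) if acc is not None else ""
    return text, len(history)


if __name__ == "__main__":
    frame_result = [[0.9, 0.1], [0.2, 0.8]]
    # Stand-in estimator that simply shrinks as more frames are combined.
    print(recognize_stream([frame_result] * 3, lambda f: f,
                           lambda acc, hist: 1.0 / len(hist), 0.5, "AB"))
```

Passing the estimator as a parameter keeps the loop independent of any particular distance model, so an estimator built from the quantities n, δ, S_n, K, and Δ_ijk named in the claim could be supplied without changing the loop.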