US 7,508,984 B2
Language recognition method, system and software
Yoshihisa Ohguro, Yokohama (Japan)
Assigned to Ricoh Company, Ltd., Tokyo (Japan)
Filed on Jul. 30, 2004, as Appl. No. 10/903,131.
Claims priority of application No. 2003-204353 (JP), filed on Jul. 31, 2003.
Prior Publication US 2005/0027511 A1, Feb. 03, 2005
Int. Cl. G06K 9/00 (2006.01); G06F 17/27 (2006.01)
U.S. Cl. 382/181 [704/9]
12 Claims
 
1. A method of quantifying document image data, comprising:
using a processor to implement the following steps:
quantifying a predetermined number of consecutive characters in first document image data into first quantified data based upon layout characteristic information, the first document image data containing character lines, each of the character lines including characters, the layout characteristic information being based upon a minimal circumscribing rectangle around each of the characters, the layout characteristic information including a plurality of parameters, the parameters including a combination of information on a height of the minimal circumscribing rectangle starting from a bottom line in the character line, a height of the minimal circumscribing rectangle, a width of the minimal circumscribing rectangle, a black pixel density in the minimal circumscribing rectangle and a distance between two adjacent ones of the minimal circumscribing rectangles;
converting the first quantified data into symbol series; and
generating a table representing occurrence probabilities of the consecutive characters based upon the symbol series.