US 7,613,602 B2
Structured document processing apparatus, structured document search apparatus, structured document system, method, and program
Takuya Kanawa, Kawasaki (Japan)
Assigned to Kabushiki Kaisha Toshiba, Tokyo (Japan)
Filed on Mar. 24, 2006, as Appl. No. 11/388,131.
Claims priority of application No. 2005-219165 (JP), filed on Jul. 28, 2005.
Prior Publication US 2007/0027671 A1, Feb. 01, 2007
Int. Cl. G06F 17/27 (2006.01); G06F 17/20 (2006.01); G06F 17/30 (2006.01); G06F 17/00 (2006.01)
U.S. Cl. 704/9 [704/1; 707/4; 707/6; 707/102]
20 Claims
 
1. A structured document processing apparatus comprising:
an acquisition unit configured to acquire a structured document;
a storage unit configured to store a structure model tree which indicates a typical structure of the acquired structured document;
a parsing unit configured to parse the acquired structured document;
an updating unit configured to update the structure model tree to match a structure of the parsed structured document therewith;
a division unit configured to divide the acquired structured document into a plurality of lexical items;
a calculation unit configured to calculate frequency-of-occurrence information indicating locations of each of the lexical items in the acquired structured document;
a broadening unit configured to broaden a range until a lexical item having not less than a frequency of occurrence is present within the range; and
an assignment unit configured to assign a lexical identifier of a lexical item which has a highest frequency of occurrence within the broadened range as a relevant lexical identifier of the lexical item.
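
For illustration only, the units recited in claim 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation. The claim does not fix several details, so the following are assumptions: the structure model tree is a nested dictionary keyed by element tag, lexical items are lowercase whitespace-separated tokens, the token string itself serves as the lexical identifier, the "range" is a structural path that is broadened by moving to its parent, a numeric threshold decides when broadening stops, and the item being processed is excluded from its own candidates. The function names `update_model`, `count_terms`, and `relevant_item` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def update_model(model, element):
    """Merge the structure of `element` into the model tree (nested dicts keyed by tag)."""
    node = model.setdefault(element.tag, {})
    for child in element:
        update_model(node, child)


def count_terms(element, path=(), counts=None):
    """Divide text into lexical items and count them per structural path."""
    if counts is None:
        counts = defaultdict(Counter)
    path = path + (element.tag,)
    counts[path].update((element.text or "").lower().split())
    for child in element:
        count_terms(child, path, counts)
        # Text following a child element belongs to the enclosing element.
        counts[path].update((child.tail or "").lower().split())
    return counts


def relevant_item(counts, path, item, threshold):
    """Broaden the range from `path` toward the root until another item
    occurs at least `threshold` times, then return the most frequent one.
    Returns None if no range satisfies the threshold (assumed behavior)."""
    while path:
        in_range = Counter()
        for p, c in counts.items():
            if p[:len(path)] == path:  # p lies inside the current range
                in_range.update(c)
        in_range.pop(item, None)  # assumption: an item is not its own relative
        if in_range and max(in_range.values()) >= threshold:
            # Ties are broken alphabetically to keep the result deterministic.
            return max(sorted(in_range), key=in_range.__getitem__)
        path = path[:-1]  # broaden the range to the parent structure
    return None


# Acquire and parse a small structured document (hypothetical sample data).
doc = ("<book><title>xml search</title><chapter>"
       "<para>xml index xml</para><para>tree index</para>"
       "</chapter></book>")
root = ET.fromstring(doc)

model = {}
update_model(model, root)
counts = count_terms(root)
related = relevant_item(counts, ("book", "chapter", "para"), "tree", threshold=3)
```

In this sample, no item other than "tree" reaches three occurrences within the `para` or `chapter` ranges, so the range is broadened to the whole `book`, where "xml" occurs three times and is assigned as the relevant item. A real system would likely map tokens to numeric identifiers and use a proper morphological analyzer for the division step.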