US 7,613,602 B2
Structured document processing apparatus, structured document search apparatus, structured document system, method, and program
Takuya Kanawa, Kawasaki (Japan)
Assigned to Kabushiki Kaisha Toshiba, Tokyo (Japan)
Filed on Mar. 24, 2006, as Appl. No. 11/388,131.
Claims priority of application No. 2005-219165 (JP), filed on Jul. 28, 2005.
Prior Publication US 2007/0027671 A1, Feb. 1, 2007
Int. Cl. G06F 17/27 (2006.01); G06F 17/20 (2006.01); G06F 17/30 (2006.01); G06F 17/00 (2006.01)
U.S. Cl. 704/9 [704/1; 707/4; 707/6; 707/102]
20 Claims

1. A structured document processing apparatus comprising:
an acquisition unit configured to acquire a structured document;
a storage unit configured to store a structure model tree which indicates a typical structure of the acquired structured document;
a parsing unit configured to parse the acquired structured document;
an updating unit configured to update the structure model tree to match a structure of the parsed structured document therewith;
a division unit configured to divide the acquired structured document into a plurality of lexical items;
a calculation unit configured to calculate frequency-of-occurrence information indicating locations of each of the lexical items in the acquired structured document;
a broadening unit configured to broaden a range until a lexical item having not less than a frequency of occurrence is present within the range; and
an assignment unit configured to assign a lexical identifier of a lexical item which has a highest frequency of occurrence within the broadened range as a relevant lexical identifier of the lexical item.
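The claim names its units but not how they interact. The Python sketch below is one hedged reading of that pipeline, not the patented implementation. It assumes, all hypothetically: each structure model tree node stands for an element tag at a given position; lexical items are lowercase whitespace tokens of element text; a "range" is the model-tree subtree rooted at the node where an item occurs, broadened by moving up to the parent; and `threshold` stands in for the frequency value the claim leaves unstated. The names `ModelNode`, `StructureModel`, `add_document`, `relevant_id`, and `assign` are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import Counter


class ModelNode:
    """One node of a hypothetical structure model tree, keyed by element tag."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = {}     # tag -> ModelNode
        self.freq = Counter()  # lexical item -> frequency of occurrence at this node

    def child(self, tag):
        # Updating step: grow the tree when a document shows a structure not yet modeled.
        if tag not in self.children:
            self.children[tag] = ModelNode(tag, self)
        return self.children[tag]

    def subtree_freq(self):
        # Frequencies summed over this node and all of its descendants.
        total = Counter(self.freq)
        for c in self.children.values():
            total.update(c.subtree_freq())
        return total


class StructureModel:
    def __init__(self):
        self.root = ModelNode("#root")  # virtual node above the document element
        self.lexicon = {}                # lexical item -> lexical identifier

    def add_document(self, xml_text):
        """Parse one document, update the model tree, and count its lexical items."""
        occurrences = []  # (lexical item, model node) for every token in the document

        def walk(elem, parent_node):
            node = parent_node.child(elem.tag)
            # Hypothetical division rule: lowercase whitespace tokens of elem.text
            # (tail text is ignored in this sketch).
            for item in (elem.text or "").lower().split():
                self.lexicon.setdefault(item, len(self.lexicon))
                node.freq[item] += 1
                occurrences.append((item, node))
            for sub in elem:
                walk(sub, node)

        walk(ET.fromstring(xml_text), self.root)
        return occurrences

    def relevant_id(self, node, threshold):
        """Broaden the range from `node` toward the root until some item reaches threshold."""
        rng = node
        freq = rng.subtree_freq()
        while max(freq.values(), default=0) < threshold and rng.parent is not None:
            rng = rng.parent
            freq = rng.subtree_freq()
        # Most frequent item in the broadened range; ties broken alphabetically.
        best = min(freq, key=lambda item: (-freq[item], item))
        return self.lexicon[best]

    def assign(self, occurrences, threshold):
        """Return (item, own lexical id, relevant lexical id) for each occurrence."""
        return [(item, self.lexicon[item], self.relevant_id(node, threshold))
                for item, node in occurrences]
```

For example, after adding `<book><title>XML Basics</title><author>Sato</author></book>`, `<book><title>XML Search</title><author>Sato</author></book>`, and `<book><title>Tree Models</title><author>Ito</author></book>`, calling `assign` on the third document's occurrences with `threshold=2` maps `ito` to the identifier of `sato` and `tree` to that of `xml`, because each item's own `author` or `title` node already holds an item seen twice. With `threshold=3` no subtree reaches the threshold, so the range grows to the whole tree and the alphabetical tie-break between `sato` and `xml` selects `sato`.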