US 7,490,116 B2
Identifying history of modification within large collections of unstructured data
Dwayne A. Carson, Mendon, Mass. (US); Donato Buccella, Watertown, Mass. (US); and Michael Smolsky, Brookline, Mass. (US)
Assigned to Verdasys, Inc., Waltham, Mass. (US)
Filed on Dec. 17, 2003, as Appl. No. 10/738,924.
Claims priority of provisional application 60/442464, filed on Jan. 23, 2003.
Prior Publication US 2004/0167921 A1, Aug. 26, 2004
Int. Cl. G06F 17/30 (2006.01)
U.S. Cl. 707—205  [707/6] 18 Claims
 
1. A method for maintaining a representation of a history of operations performed on document files in a data processing environment, the method comprising:
sensing access events that involve accessing one or more digital assets at a user client computer device, the step of sensing access events being carried out by a monitor process located within an operating system kernel of the user client computer device;
in response to sensing an access event involving an operation on an existing document file, determining a relationship descriptor that depends on the sensed access event;
in response to sensing an access event in which a new document file is created, comparing contents of the new document file with contents of existing document files contained in a database to measure a percentage of similar content between the contents of the new document file and the contents of the existing document files contained in the database to determine a relationship descriptor for the new document file, the relationship descriptor quantifying a degree by which at least one of the existing document files was modified to create the new document file; and
creating an entry in the representation that contains the relationship descriptor.
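
The steps of claim 1 can be illustrated with a minimal sketch. This is not the patented implementation: the claim does not specify how the percentage of similar content is measured, so Python's `difflib.SequenceMatcher` is swapped in as a stand-in. The names `HistoryStore`, `HistoryEntry`, `on_existing_file_event`, `on_new_file`, and the `derive_threshold_pct` parameter are all hypothetical, and the kernel-level monitor process is not reproduced; access events are simply passed in as method calls.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Dict, List, Optional


@dataclass
class HistoryEntry:
    """One entry in the history representation (hypothetical layout)."""
    file: str
    relation: str                       # relationship descriptor label
    related_to: Optional[str] = None    # existing file the descriptor refers to
    similarity_pct: Optional[float] = None


@dataclass
class HistoryStore:
    """Stand-in for the database of existing files plus the history."""
    derive_threshold_pct: float = 50.0  # assumed cutoff, not taken from the claim
    contents: Dict[str, str] = field(default_factory=dict)
    entries: List[HistoryEntry] = field(default_factory=list)

    def on_existing_file_event(self, operation: str, path: str) -> HistoryEntry:
        # Operation on an existing file: the descriptor depends only on
        # the sensed event (e.g. "rename", "copy", "edit").
        entry = HistoryEntry(file=path, relation=operation, related_to=path)
        self.entries.append(entry)
        return entry

    def on_new_file(self, path: str, text: str) -> HistoryEntry:
        # New file: compare its contents with every stored file and keep
        # the closest match as a percentage of similar content.
        best_name, best_pct = None, 0.0
        for name, existing in self.contents.items():
            pct = 100.0 * SequenceMatcher(None, existing, text).ratio()
            if best_name is None or pct > best_pct:
                best_name, best_pct = name, pct

        if best_name is not None and best_pct >= self.derive_threshold_pct:
            entry = HistoryEntry(path, "derived_from", best_name, best_pct)
        else:
            entry = HistoryEntry(path, "original")

        self.contents[path] = text
        self.entries.append(entry)
        return entry


if __name__ == "__main__":
    store = HistoryStore()
    store.on_new_file("a.txt", "the quick brown fox")
    store.on_new_file("b.txt", "the quick brown fox jumps")
    store.on_existing_file_event("rename", "a.txt")
    for e in store.entries:
        print(e)
```

In this sketch an exact copy of a stored file is recorded as `derived_from` with a similarity of 100 percent, while a file with no sufficiently similar predecessor is recorded as `original`. A production system would presumably use a comparison technique that scales to large collections; pairwise `SequenceMatcher` calls are used here only to keep the example short.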