US 7,502,732 B2
Compressing messages on a per semantic component basis while maintaining a degree of human readability
Sharad Mathur, Redmond, Wash. (US); and Gregory P. Baribault, Kirkland, Wash. (US)
Assigned to Microsoft Corporation, Redmond, Wash. (US)
Filed on Dec. 09, 2005, as Appl. No. 11/299,125.
Application 09/781823 is a division of application No. 11/040548, filed on Jan. 21, 2005.
Application 11/299125 is a continuation of application No. 09/781823, filed on Feb. 12, 2001, granted, now 7,010,478.
Prior Publication US 2006/0089831 A1, Apr. 27, 2006
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 17/27 (2006.01)
U.S. Cl. 704—9 [704/4; 704/7]
14 Claims
 
2. A computing system having access to a text message that contains a plurality of semantic components, the computing system comprising:
one or more physical computer-readable media storing computer-executable instructions which, when executed by the computer system, implement a method for compressing the text message on a per semantic component basis to form a compressed message while maintaining a degree of human readability, wherein the method includes:
an act of accessing the text message;
an act of parsing the text message into the plurality of semantic components; and
for at least some of the plurality of semantic components, performing a step for differentiating between each of the parsed semantic components and selecting a corresponding compression method, if any, to be used for each corresponding semantic component when compressing the semantic component for inclusion in the compressed message, taking into consideration the specific attributes of each semantic component in selecting a compression method appropriate for each semantic component so as to optimize the text compression on a per semantic component basis so that the more important information is included in the compressed message;
wherein differentiating between the parsed semantic components includes determining whether each semantic component is considered to be a natural language component having natural language expressions, wherein selection of the compression method to use for each corresponding semantic component is based at least in part on whether said corresponding semantic component is determined to be a natural language component, and wherein semantic components determined to be natural language components are treated differently, using different compression techniques during compression, than semantic components that are determined to not be natural language components;
such that compression of semantic components determined to be natural language components includes obtaining a plurality of versions of compressed content and determining which of the plurality of versions provides a greatest amount of content without exceeding a threshold limit, and such that compression of semantic components determined to not be natural language components includes using customized compression including at least one of replacing text with substitute text, removing at least one header in a message, deleting text and replacing at least one name with an initial.
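
The method recited in claim 2 can be illustrated with a minimal Python sketch. This is not the patented implementation: the message layout (header lines, a blank line, then a body), the component labels, the substitution table, and every function name below are hypothetical and chosen only for illustration. The sketch also assumes that candidate versions of a natural language component are ranked from most to least content, so the first version that fits the threshold is taken as the one providing the greatest amount of content.

```python
from dataclasses import dataclass


@dataclass
class Component:
    kind: str  # hypothetical label, e.g. "from", "date", "subject", "body"
    text: str


# Assumption: only these component kinds carry natural language expressions.
NATURAL_LANGUAGE_KINDS = {"subject", "body"}

# Hypothetical substitute-text table (whole words only, no punctuation handling).
SUBSTITUTIONS = {"please": "pls", "tomorrow": "tmrw", "meeting": "mtg"}


def parse(message: str) -> list[Component]:
    """Parse an email-like message into header components and a body."""
    head, _, body = message.partition("\n\n")
    components = []
    for line in head.splitlines():
        name, _, value = line.partition(":")
        components.append(Component(name.strip().lower(), value.strip()))
    components.append(Component("body", body.strip()))
    return components


def substitute(text: str) -> str:
    """Replace known words with shorter substitute text."""
    return " ".join(SUBSTITUTIONS.get(w.lower(), w) for w in text.split())


def first_sentence(text: str) -> str:
    """Keep only the text up to the first sentence break, if there is one."""
    head, sep, _ = text.partition(". ")
    return head + "." if sep else text


def best_version(text: str, limit: int) -> str:
    """Build several compressed versions and return the first that fits.

    Versions are ordered from most to least content (an assumption of this
    sketch); the final truncated version always fits the limit.
    """
    shortened = substitute(text)
    versions = [text, shortened, first_sentence(shortened), shortened[:limit]]
    return next(v for v in versions if len(v) <= limit)


def compress_structured(component: Component) -> str:
    """Customized compression for components that are not natural language."""
    if component.kind in ("from", "to"):
        parts = component.text.split()
        if len(parts) < 2:
            return component.text
        # Replace the first name with an initial.
        return f"{parts[0][0]}. {' '.join(parts[1:])}"
    return ""  # delete every other header, e.g. the date


def compress(message: str, limit: int) -> str:
    """Compress each semantic component with the method chosen for its kind."""
    out = []
    for component in parse(message):
        if component.kind in NATURAL_LANGUAGE_KINDS:
            out.append(best_version(component.text, limit))
        else:
            out.append(compress_structured(component))
    return "\n".join(part for part in out if part)


MESSAGE = (
    "From: Gregory Baribault\n"
    "Date: Mon, 12 Feb 2001\n"
    "Subject: Meeting tomorrow\n"
    "\n"
    "Please bring the slides. We will review them after lunch."
)

print(compress(MESSAGE, 30))
```

With a threshold of 30 characters per natural language component, the subject fits unchanged, while the body falls back to the first sentence of its substituted form. The sender name is reduced to an initial plus surname, and the date header is deleted. The header labels themselves are also dropped, which corresponds loosely to removing headers in the claim.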