Linguistically intelligent text compression
Abstract
A text processor processes text in a message. The text processor generates a plurality of compressed forms of components of the message. The processor performs a linguistic analysis on the body of text to obtain a linguistic output indicative of linguistic components of the body of text. The processor then generates the plurality of compressed forms that can be used to compress the body of text. The plurality of compressed forms are generated based on the linguistic output. The invention can be implemented as a method of generating the compressed forms and as an apparatus.
16 Claims
1. A method of processing a body of text to generate compression options, comprising:
- performing a linguistic analysis on the body of text to obtain a plurality of different tokens, comprising one of a word and a number, in the body of text;
- after performing the linguistic analysis, automatically generating a plurality of correct compression options for each of a plurality of different corresponding tokens in the body of text to compress the body of text, each of the correct compression options comprising a different, correct compressed form of the corresponding tokens in the body of text wherein generating at least one of the compressed forms comprises reducing a number of letters or numbers in the corresponding token so the compressed form includes some, but not all, letters or numbers in the corresponding tokens;
- wherein automatically generating a plurality of correct compression options comprises automatically subjecting each of the tokens in the body of text to different sets of compression rules in a predetermined order to obtain the plurality of correct compression options, such that the plurality of correct compression options reflect varying degrees of compression of a same token in the body of text; and
- selecting one of the plurality of correct compression options for each of the plurality of different tokens in the body of text and outputting a compressed form of the body of text as the selected compression option for each token.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
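The steps recited in claim 1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: a regular-expression tokenizer stands in for the linguistic analysis, the two rules (`drop_inner_vowels`, `collapse_double_letters`) and their grouping in `RULE_SETS` are hypothetical, and the `level` parameter is just one possible way to select among the generated options.

```python
import re

def tokenize(text):
    # Stand-in for the linguistic analysis: split into word and number tokens.
    # (Assumption: punctuation is simply dropped in this sketch.)
    return re.findall(r"[A-Za-z]+|\d+", text)

def drop_inner_vowels(token):
    # Hypothetical rule: keep the first and last letter, drop vowels in between.
    inner = "".join(c for c in token[1:-1] if c.lower() not in "aeiou")
    return token[0] + inner + token[-1]

def collapse_double_letters(token):
    # Hypothetical rule: reduce a doubled letter to a single letter.
    return re.sub(r"([A-Za-z])\1", r"\1", token)

# Hypothetical rule sets, tried in a fixed order; later sets compress more.
RULE_SETS = [
    [drop_inner_vowels],
    [drop_inner_vowels, collapse_double_letters],
]

def compression_options(token):
    # Short tokens and numbers are left uncompressed in this sketch.
    if len(token) <= 3 or not token.isalpha():
        return []
    options = []
    for rule_set in RULE_SETS:
        form = token
        for rule in rule_set:
            form = rule(form)
        if form != token and form not in options:
            options.append(form)
    return options

def compress(text, level=1):
    # level 0 keeps every token; higher levels pick more aggressive options.
    out = []
    for token in tokenize(text):
        options = compression_options(token)
        if level > 0 and options:
            out.append(options[min(level, len(options)) - 1])
        else:
            out.append(token)
    return " ".join(out)

print(compression_options("tomorrow"))
print(compress("See you tomorrow at the meeting", level=2))
```

Under these assumed rules, the options for a token are ordered from least to most compressed, so a caller could pick a level to fit, for example, a message length limit.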
16. A message handler receiving a message and generating compression options indicative of different forms of a portion of a body of text in the message, the message handler comprising:
- a linguistic analyzer linguistically configured to analyze the body of text and provide a linguistic analysis having leaf nodes representing individual tokens in the body of text;
- a compression form generator configured to automatically generate a plurality of different compressed forms of at least a plurality of tokens represented by the leaf nodes in the linguistic analysis, the plurality of different compressed forms each representing a correct compressed form of a corresponding individual token, wherein generating at least one of the compressed forms comprises reducing a number of letters or numbers in the corresponding token so the compressed form includes some, but not all, letters or numbers in the corresponding tokens, wherein the compression form generator generates the plurality of different compressed forms by automatically subjecting each of the tokens in the body of text to different sets of compression rules in a predetermined order to obtain the plurality of correct compressed forms, such that the plurality of correct compressed forms reflect varying degrees of compression of a same token in the body of text; and
- a compressor configured to generate an output indicative of selected ones of the plurality of different compressed forms for the individual tokens in the body of text.
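The three components of claim 16 can be pictured as separate objects working over a tree whose leaves are tokens. The sketch below is only an illustration under stated assumptions: the class names mirror the claim language but their interfaces are hypothetical, the "analysis" is a flat one-level tree rather than a real parse, the two rules are invented for the example, and the compressor simply picks the shortest available form.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    text: str = ""
    children: list = field(default_factory=list)

    def leaves(self):
        # Yield leaf nodes left to right.
        if not self.children:
            yield self
        for child in self.children:
            yield from child.leaves()

class LinguisticAnalyzer:
    def analyze(self, body):
        # Assumption: a flat tree with one leaf per word or number token.
        leaves = [Node("NUMBER" if t.isdigit() else "WORD", t)
                  for t in re.findall(r"[A-Za-z]+|\d+", body)]
        return Node("MESSAGE", children=leaves)

def strip_inner_vowels(word):
    # Hypothetical rule: drop vowels between the first and last letter.
    return word[0] + re.sub(r"[aeiouAEIOU]", "", word[1:-1]) + word[-1]

def squeeze_doubles(word):
    # Hypothetical rule: reduce a doubled letter to a single letter.
    return re.sub(r"([A-Za-z])\1", r"\1", word)

class CompressionFormGenerator:
    # Hypothetical rule sets, applied in a fixed order.
    RULE_SETS = [(strip_inner_vowels,), (strip_inner_vowels, squeeze_doubles)]

    def generate(self, leaf):
        # Only longer word leaves are compressed in this sketch.
        if leaf.label != "WORD" or len(leaf.text) <= 3:
            return []
        forms = []
        for rule_set in self.RULE_SETS:
            form = leaf.text
            for rule in rule_set:
                form = rule(form)
            if form != leaf.text and form not in forms:
                forms.append(form)
        return forms

class Compressor:
    def compress(self, options):
        # Assumed selection policy: shortest form, or the token itself.
        return " ".join(min([text] + forms, key=len) for text, forms in options)

def handle_message(body):
    tree = LinguisticAnalyzer().analyze(body)
    generator = CompressionFormGenerator()
    options = [(leaf.text, generator.generate(leaf)) for leaf in tree.leaves()]
    return options, Compressor().compress(options)

options, output = handle_message("Meeting moved to 1530 tomorrow")
print(options)
print(output)
```

Keeping the generator separate from the compressor means the full list of options per token stays available, so a different selection policy could be swapped in without touching the rules.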
Specification