Compression of logs of language data
Abstract
A method and apparatus for compressing query logs is provided. Multiple levels of user-specifiable compression include character-based compression, token-based compression, and subsumption. An efficient method for performing subsumption is also provided. The compressed query logs are then used to train a statistical process such as a help function for a computer operating system.
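The abstract names three levels of compression but does not define them in this section. The following is a minimal sketch of one plausible reading. It assumes that character-based compression means case folding, turning punctuation into spaces, and collapsing whitespace. It assumes that token-based compression means treating a query as an unordered set of tokens. It assumes that subsumption means dropping a query whose tokens are a strict subset of another query's tokens. The names `char_compress`, `token_compress`, and `subsume` are hypothetical. The quadratic subsumption scan is a naive stand-in, not the efficient method the abstract refers to.

```python
import string

# Hypothetical compression levels. The exact rules are assumptions,
# not taken from the patent text.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def char_compress(query: str) -> str:
    """Character-based (assumed): lowercase, punctuation to spaces, collapse whitespace."""
    return " ".join(query.lower().translate(_PUNCT_TO_SPACE).split())


def token_compress(query: str) -> str:
    """Token-based (assumed): ignore token order and repeated tokens."""
    return " ".join(sorted(set(char_compress(query).split())))


def subsume(queries: list[str]) -> list[str]:
    """Subsumption (assumed): drop a query whose token set is a strict subset
    of another query's token set. Naive O(n^2) scan, for illustration only."""
    token_sets = [frozenset(char_compress(q).split()) for q in queries]
    kept = []
    for i, query in enumerate(queries):
        subsumed = any(
            token_sets[i] < other for j, other in enumerate(token_sets) if j != i
        )
        if not subsumed:
            kept.append(query)
    return kept
```

Under these assumptions, "Print document" and "document print print" reduce to the same token-based form, "document print". The query "print" is subsumed by "print document". Queries with identical token sets are not strict subsets of each other, so subsumption keeps both and leaves them for ordinary duplicate removal.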
25 Claims
1. A method of compressing a log of linguistic data, the log having a plurality of linguistic strings, each string including at least two tokens, the method comprising:
applying a compression operation to each string;
determining if any two strings match each other after the compression operation; and
removing one of the two matching strings from the log.
(Dependent claims 2 to 13.)
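Claim 1 does not specify which compression operation is used or how matches are found. The following is a minimal sketch of the three claimed steps. It assumes that the compression operation is passed in as a function. It also assumes that matching strings are detected by hashing their compressed forms, which replaces an explicit pairwise comparison. `compress_log` is a hypothetical name. This version keeps the original text of the first string in each matching group and removes the later ones.

```python
from typing import Callable, Iterable


def compress_log(log: Iterable[str], compress: Callable[[str], str]) -> list[str]:
    """Hypothetical sketch of claim 1: compress each string, detect matches,
    and remove all but the first string in each matching group."""
    seen: set[str] = set()
    kept: list[str] = []
    for s in log:
        key = compress(s)   # apply the compression operation to the string
        if key in seen:     # it matches an earlier string after compression
            continue        # so remove this one from the log
        seen.add(key)
        kept.append(s)
    return kept


if __name__ == "__main__":
    normalize = lambda s: " ".join(s.lower().split())  # assumed example operation
    print(compress_log(["Open File", "open  file", "print"], normalize))
```

The hash set makes the match check linear in the number of strings. The trade-off is that the compression operation must map matching strings to exactly equal keys.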
14. A system for compressing a query log having a plurality of linguistic strings, each string having a plurality of tokens, the system comprising:
an input for receiving a raw query log;
memory for storing the raw query log;
a processor for applying at least one compression operation to each string, and for scanning the modified strings to determine if any match each other so that one of the matching strings can be removed; and
an output for providing a compressed query log once the removal is complete.
(Dependent claims 15 to 25.)
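Claim 14 recites an input, a memory, a processor, and an output. The class `QueryLogCompressor` is a hypothetical mapping of those elements onto Python objects. Text streams stand in for the input and the output. A list stands in for the memory. A single case-and-whitespace normalization, which is an assumed example, stands in for "at least one compression operation". The sketch writes the modified strings to the output. That is one reading of the claim; emitting the original text instead would be equally reasonable.

```python
import io
from typing import TextIO


class QueryLogCompressor:
    """Hypothetical mapping of the claimed system onto Python objects."""

    def __init__(self) -> None:
        # Stands in for "memory for storing the raw query log".
        self.raw_log: list[str] = []

    def receive(self, source: TextIO) -> None:
        """Input: read one query per line from a text stream, skipping blank lines."""
        self.raw_log = [line.rstrip("\n") for line in source if line.strip()]

    @staticmethod
    def compress_one(query: str) -> str:
        """Assumed compression operation: lowercase and collapse whitespace."""
        return " ".join(query.lower().split())

    def process(self) -> list[str]:
        """Processor: compress each string, then keep only the first of any matches."""
        seen: set[str] = set()
        compressed: list[str] = []
        for query in self.raw_log:
            modified = self.compress_one(query)
            if modified not in seen:
                seen.add(modified)
                compressed.append(modified)
        return compressed

    def emit(self, sink: TextIO) -> None:
        """Output: write the compressed log after duplicate removal is complete."""
        for query in self.process():
            sink.write(query + "\n")


if __name__ == "__main__":
    compressor = QueryLogCompressor()
    compressor.receive(io.StringIO("Print Document\nprint   document\n\nOpen File\n"))
    out = io.StringIO()
    compressor.emit(out)
    print(out.getvalue(), end="")
```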
Specification