GRAMMAR COMPRESSION
First Claim
1. A method for compressing a grammar, the method comprising:
- receiving a grammar, the grammar comprising a plurality of rules and the rules comprising a plurality of token classes;
parsing the grammar to identify the plurality of rules within the grammar and the plurality of token classes within the plurality of rules;
identifying, from the plurality of token classes, two or more unimportant token classes that are eligible for compression;
analyzing the two or more unimportant classes to identify at least one subset of two or more unimportant token classes as a candidate subset for compression;
merging the two or more unimportant token classes from the candidate subset to generate a merged token class; and
substituting the merged token class in the grammar for the two or more unimportant token classes from the candidate subset to generate a compressed grammar.
2 Assignments
0 Petitions
Accused Products
Abstract
Compression of extensive, rule-based grammars used to facilitate search queries is provided herein. Rule-based grammars includes a list of rules that each comprise a sequence of token classes. Each token class is a logical grouping of tokens, and each token is a string of characters. A grammar is parsed to identify rules and token classes. Unimportant token classes are identified and sets of unimportant token classes are merged to generated merged token classes. A compressed grammar is generated by substituting the merged token classes into the grammar for corresponding unimportant token classes used to generate the merged token classes.
-
Citations
20 Claims
-
1. A method for compressing a grammar, the method comprising:
-
receiving a grammar, the grammar comprising a plurality of rules and the rules comprising a plurality of token classes; parsing the grammar to identify the plurality of rules within the grammar and the plurality of token classes within the plurality of rules; identifying, from the plurality of token classes, two or more unimportant token classes that are eligible for compression; analyzing the two or more unimportant classes to identify at least one subset of two or more unimportant token classes as a candidate subset for compression; merging the two or more unimportant token classes from the candidate subset to generate a merged token class; and substituting the merged token class in the grammar for the two or more unimportant token classes from the candidate subset to generate a compressed grammar. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. One or more computer-storage media embodying computer-useable instructions that, when employed by a computing device, cause the computing device to perform a method comprising:
-
receiving a grammar usable by a search engine to route search queries to corresponding domains of information to find and return information for the search queries, the grammar comprising a plurality of rules, each rule comprising a sequence of token classes; parsing the grammar to identify the plurality of rules and token classes; identifying, from the token classes, two or more unimportant token classes that are eligible for compression and at least one important token class that is not eligible for compression; breaking at least one rule into a plurality of sub-rules based on important token classes, wherein each sub-rule includes a portion of the token classes from the at least one rule; analyzing the plurality of sub-rules to identify at least one set of sub-rules as compression candidates; analyzing the unimportant token classes in the at least one set of sub-rules to identify two or more unimportant token classes for compression; merging the two or more unimportant token classes from the at least one set of sub-rules to generate a merged token class; and generating a compressed grammar by substituting the merged token class in the grammar for the two or more unimportant token classes that were merged to generate the merged token class.
-
-
9. One or more computer-storage media embodying computer-useable instructions that, when employed by a computing device, cause the computing device to perform a method comprising:
-
receiving a grammar usable by a search engine to route search queries to corresponding domains of information to find and return information for the search queries, the grammar comprising a plurality of rules, each rule comprising a sequence of token classes used to describe search queries, each token class comprising a logical grouping of tokens, each token comprising a string of one or more characters; parsing the grammar to identify the plurality of rules and token classes; eliminating, from the grammar, any duplicate rules identified from parsing the grammar; assigning a score to each rule indicative of an importance of each rule to the grammar, wherein the score for each rule is based at least in part on the frequency with which each rule corresponds with search queries contained in query logs; identifying one or more rules as important rules based on the one or more rules having a high score indicative of a high importance to the grammar; removing the one or more important rules from consideration for compression. identifying, from the token classes, two or more unimportant token classes that are eligible for compression and at least one important token class that is not eligible for compression; breaking at least one rule into a plurality of sub-rules based on important token classes, wherein each sub-rule begins and ends with an important token class and wherein a beginning token class and ending token class in each rule is treated as an important token class for purposes of breaking each rule into the plurality of sub-rules; identifying one or more sub-rules containing only important token classes; removing the one or more sub-rules containing only important token classes from consideration for compression; eliminating, from the grammar, any duplicate sub-rules identified; analyzing the plurality of sub-rules to identify at least one set of sub-rules as compression candidates; analyzing the unimportant token classes in the at least one set of sub-rules to identify two or more unimportant token classes for compression; merging the two or more unimportant token classes from the at least one set of sub-rules to generate a merged token class; substituting the merged token class in the grammar for the two or more unimportant token classes that were merged to generate the merged token class; and eliminating any duplicate sub-rules and any duplicate rules after substituting the merged token classes in the grammar to generate a compressed grammar. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
-
Specification