Method and system for generating grammar rules
Abstract
An information retrieval system, including a natural language parser (3) for parsing documents of a document space (1) to identify key terms of each document based on linguistic structure, and for parsing a search query to determine the search term, a feature extractor (4) for determining an importance score for terms of the document space (1) based on distribution of the terms in the document space (1), an index term generator (5) for generating index terms using the key terms identified by the parser (3) and the extractor (4) and having an importance score above a threshold level, and a query clarifier (16) for selecting from the index terms, on the basis of the search term, index terms for selecting at least one document from the document space (1). A speech recognition engine (12) is used to generate the query, and a bi-gram language module (6) generates grammar rules for the speech recognition engine (12) using the index terms.
20 Claims
1. A method of generating domain-specific grammar rules using a computer system having data processing logic, the method comprising:
parsing a plurality of documents stored in a digital document database on computer-accessible storage media to identify key terms of each document based on sentence structure;
extracting a plurality of n-grams from each document, wherein one or more of the n-grams include spaces and partial words;
extracting a frequency of each n-gram in each document;
extracting a frequency of each n-gram in the plurality of documents;
assigning a novelty score to each of the n-grams in each corresponding document, said novelty score representing and being based on the extracted frequency of the n-gram in the document and the extracted frequency of the n-gram in the plurality of documents;
determining which of the extracted n-grams are in each identified key term;
assigning a weight to each key term based on the novelty scores assigned to the extracted n-grams in the key term; and
generating the domain-specific grammar rules for a speech recognition engine, said grammar rules including said key terms in association with respective probabilities based on the weights of the key terms, wherein the key terms define phrases that are likely to be spoken from the plurality of documents, and the grammar rules define which of the phrases are likely to follow others of the phrases with the likelihoods defined by the probabilities.
Dependent claims: 2, 3, 4, 15, 16, 17, 18
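The extraction and scoring steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not state a formula, so everything specific here is an assumption made for illustration: the function names (`char_ngrams`, `novelty_scores`, `key_term_weight`), the choice of character trigrams, the log-ratio novelty score, and the use of a mean to weight a key term. Key terms are taken as given; the sentence-structure parsing that identifies them is not shown.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Sliding character n-grams; windows may span spaces and cut words."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def novelty_scores(docs, n=3):
    """Return one {ngram: score} dict per document.

    Assumed formula (not taken from the claim): the log of the n-gram's
    relative frequency in the document divided by its relative frequency
    in the whole collection.
    """
    per_doc = [Counter(char_ngrams(d, n)) for d in docs]
    corpus = Counter()
    for counts in per_doc:
        corpus.update(counts)
    corpus_total = sum(corpus.values())

    scores = []
    for counts in per_doc:
        doc_total = sum(counts.values())
        scores.append({
            gram: math.log((freq / doc_total) / (corpus[gram] / corpus_total))
            for gram, freq in counts.items()
        })
    return scores


def key_term_weight(term, doc_scores, n=3):
    """Assumed weighting: mean novelty of the term's n-grams found in the document."""
    grams = [g for g in char_ngrams(term, n) if g in doc_scores]
    if not grams:
        return 0.0
    return sum(doc_scores[g] for g in grams) / len(grams)
```

Under these assumptions, an n-gram that occurs in only one document of the collection scores above zero for that document, while an n-gram spread evenly across all documents scores near zero, so key terms built from document-specific n-grams receive higher weights.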
5. An extraction system for generating domain-specific grammar rules, the extraction system including a computer system having data processing logic configured to provide:
a parser for parsing a plurality of documents stored in a digital document database on computer-accessible storage media to identify key terms of each document based on sentence structure;
a feature extractor for:
  extracting a plurality of n-grams from each document, wherein one or more of the n-grams include spaces and partial words;
  extracting a frequency of each n-gram in each document;
  extracting a frequency of each n-gram in the plurality of documents;
  assigning a novelty score to each of the n-grams in corresponding documents, said novelty score representing and being based on the extracted frequency of the n-gram in the document and the extracted frequency of the n-gram in the plurality of documents;
  determining which of the extracted n-grams are in each identified key term; and
  assigning a weight to each key term based on the novelty scores assigned to the extracted n-grams in the key term; and
a grammar generator for generating the domain-specific grammar rules for a speech recognition engine, said grammar rules including said key terms in association with respective probabilities based on the weights of the key terms, wherein the key terms define phrases that are likely to be spoken from the plurality of documents, and the grammar rules define which of the phrases are likely to follow others of the phrases with the likelihoods defined by the probabilities.
Dependent claims: 7, 8, 9, 10, 19
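The grammar generator of claim 5 associates key terms with probabilities and records which phrases are likely to follow which. A rough sketch of one way to do that is given below. It is only an illustration under stated assumptions: the function name `bigram_rules`, the input shape (an ordered list of key terms per document plus a dictionary of non-negative weights), and the rule that each observed transition contributes the weight of the following term are all hypothetical and are not taken from the claims.

```python
from collections import defaultdict


def bigram_rules(term_sequences, weights):
    """Map each key term to {following term: probability}.

    Assumption: every observed transition prev -> nxt contributes the
    (non-negative) weight of nxt, and totals are normalized per prev
    so that the probabilities of all followers of prev sum to 1.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for seq in term_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            totals[prev][nxt] += weights.get(nxt, 0.0)

    rules = {}
    for prev, followers in totals.items():
        norm = sum(followers.values())
        if norm > 0:  # skip terms whose followers all have zero weight
            rules[prev] = {t: w / norm for t, w in followers.items()}
    return rules


# Hypothetical key-term sequences and weights, for illustration only.
sequences = [
    ["check balance", "savings account"],
    ["check balance", "credit card"],
]
weights = {"check balance": 2.0, "savings account": 3.0, "credit card": 1.0}
rules = bigram_rules(sequences, weights)
```

In this example "savings account" carries three times the weight of "credit card", so it is three times as likely to follow "check balance" in the resulting rules. A table of this kind could then be serialized into whatever weighted grammar format the target speech recognition engine accepts.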
6. A machine-readable non-transitory medium having stored thereon instructions for generating domain-specific grammar rules comprising machine executable code which, when executed by at least one machine, causes the machine to:
parse a plurality of documents stored in a digital document database on computer-accessible storage media to identify key terms of each document based on sentence structure;
extract a plurality of n-grams from each document, wherein one or more of the n-grams include spaces and partial words;
extract a frequency of each n-gram in each document;
extract a frequency of each n-gram in the plurality of documents;
assign a novelty score to each of the n-grams in each corresponding document, said novelty score representing and being based on the extracted frequency of the n-gram in the document and the extracted frequency of the n-gram in the plurality of documents;
determine which of the extracted n-grams are in each identified key term;
assign a weight to each key term based on the novelty scores assigned to the extracted n-grams in the key term; and
generate the domain-specific grammar rules for a speech recognition engine, said grammar rules including said key terms in association with respective probabilities based on the weights of the key terms, wherein the key terms define phrases that are likely to be spoken from the plurality of documents, and the grammar rules define which of the phrases are likely to follow others of the phrases with the likelihoods defined by the probabilities.
Dependent claims: 11, 12, 13, 14, 20
Specification