Method and system for generating grammar rules

  • US 8,793,261 B2
  • Filed: 10/17/2001
  • Issued: 07/29/2014
  • Est. Priority Date: 10/17/2000
  • Status: Expired due to Fees
First Claim

1. An information retrieval method for use with documents, including:

  • parsing a plurality of documents stored in a digital document database on computer-accessible storage media to identify key terms of each document based on sentence structure;

    extracting a plurality of n-grams from each document, wherein one or more of the n-grams include spaces and partial words;

    extracting a frequency of each n-gram in each document;

    extracting a frequency of each n-gram in the plurality of documents;

    assigning a novelty score to each of the n-grams in each corresponding document, said novelty score representing and being based on the extracted frequency of the n-gram in the document and the extracted frequency of the n-gram in the plurality of documents;

    determining which of the extracted n-grams are in each identified key term;

    assigning a weight to each key term based on the novelty scores assigned to the extracted n-grams in the key term;

    generating the domain-specific grammar rules for a speech recognition engine, said grammar rules including said key terms in association with respective probabilities based on the weights of the key terms, wherein the key terms define phrases that are likely to be spoken from the plurality of documents, and the grammar rules define which of the phrases are likely to follow others of the phrases with the likelihoods defined by the probabilities;

    determining an importance score for each of said key terms in each document based on how many of the documents include the key term and the frequency of the key term in the document;

    parsing a search query to determine at least one search term wherein said search query is spoken and converted into text data representing the at least one search term by said speech recognition engine;

    matching said at least one search term against the key terms of the documents to select a subset of the key terms and determine matching documents corresponding to the subset of the key terms;

    generating a document fitness value for each matching document based on a subset of the importance scores corresponding to the subset of the key terms;

    ranking said matching documents according to their fitness values; and

    presenting said matching documents according to said ranking.
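
The n-gram steps of the claim (extraction, per-document and collection-wide frequencies, novelty scores, key term weights) can be illustrated with a minimal sketch. The claim does not give formulas, so everything concrete here is an assumption: character trigrams, a novelty score equal to the log ratio of an n-gram's relative frequency in the document to its relative frequency in the whole collection, and a key term weight equal to the mean novelty of the n-grams it contains. The function names `char_ngrams`, `novelty_scores`, and `key_term_weight` are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams; spaces and partial words are kept."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def novelty_scores(docs, n=3):
    """Return one dict per document mapping n-gram -> assumed novelty score."""
    # Frequency of each n-gram in each document.
    per_doc = [Counter(char_ngrams(d, n)) for d in docs]

    # Frequency of each n-gram across the plurality of documents.
    corpus = Counter()
    for counts in per_doc:
        corpus.update(counts)
    corpus_total = sum(corpus.values())

    scores = []
    for counts in per_doc:
        doc_total = sum(counts.values())
        # Assumed formula: log of (relative frequency in document /
        # relative frequency in collection). Not taken from the claim.
        scores.append({
            gram: math.log((c / doc_total) / (corpus[gram] / corpus_total))
            for gram, c in counts.items()
        })
    return scores


def key_term_weight(term, doc_scores, n=3):
    """Assumed weight: mean novelty of the term's n-grams found in the document."""
    grams = [g for g in char_ngrams(term, n) if g in doc_scores]
    if not grams:
        return 0.0
    return sum(doc_scores[g] for g in grams) / len(grams)


docs = [
    "speech recognition grammar",
    "stock market news today",
    "weather news today",
]
scores = novelty_scores(docs)
weight = key_term_weight("speech", scores[0])
```

Under these assumptions, an n-gram that occurs in only one document scores higher there than an n-gram shared across several documents, so a key term built from rare n-grams receives a larger weight. How such weights would be converted into grammar rule probabilities is not specified in the claim and is left out of the sketch.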

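The retrieval steps (importance scores, matching search terms, document fitness, ranking) can be sketched in the same hedged way. The claim only says the importance score depends on how many documents include the key term and on the term's frequency in the document; the TF-IDF-style formula below, the summed fitness value, and the names `importance_scores` and `rank_documents` are assumptions. The sketch also assumes the spoken query has already been converted to text terms by the speech recognition engine.

```python
import math


def importance_scores(doc_terms):
    """doc_terms: list of dicts mapping key term -> frequency in that document."""
    n_docs = len(doc_terms)

    # Count how many documents include each key term.
    doc_freq = {}
    for terms in doc_terms:
        for term in terms:
            doc_freq[term] = doc_freq.get(term, 0) + 1

    # Assumed formula: term frequency times a smoothed inverse document frequency.
    return [
        {t: tf * math.log(1 + n_docs / doc_freq[t]) for t, tf in terms.items()}
        for terms in doc_terms
    ]


def rank_documents(query_terms, importance):
    """Return (doc_id, fitness) pairs for matching documents, best first."""
    fitness = {}
    for doc_id, scores in enumerate(importance):
        # Subset of the document's key terms matched by the search terms.
        matched = [t for t in query_terms if t in scores]
        if matched:
            # Assumed fitness: sum of the importance scores of matched terms.
            fitness[doc_id] = sum(scores[t] for t in matched)
    return sorted(fitness.items(), key=lambda kv: kv[1], reverse=True)


doc_terms = [
    {"interest rate": 3, "bank": 1},
    {"bank": 2},
    {"weather": 4},
]
importance = importance_scores(doc_terms)
ranked = rank_documents(["interest rate", "bank"], importance)
```

Here the first document matches both search terms and the second matches only one, so the first ranks higher; the third document matches nothing and is excluded from the results.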