Method and system for generating grammar rules

US 9,183,204 B2
Filed: 06/23/2014
Issued: 11/10/2015
Est. Priority Date: 10/17/2000
Status: Expired due to Term

Abstract

An information retrieval system, including a natural language parser (3) for parsing documents of a document space (1) to identify key terms of each document based on linguistic structure, and for parsing a search query to determine the search term, a feature extractor (4) for determining an importance score for terms of the document space (1) based on distribution of the terms in the document space (1), an index term generator (5) for generating index terms using the key terms identified by the parser (3) and the extractor (4) and having an importance score above a threshold level, and a query clarifier (16) for selecting from the index terms, on the basis of the search term, index terms for selecting at least one document from the document space (1). A speech recognition engine (12) is used to generate the query, and a bi-gram language module (6) generates grammar rules for the speech recognition engine (12) using the index terms.
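
The index-term step described in the abstract can be pictured with a small sketch. It assumes the parser's key terms arrive as a list and the extractor's importance scores as a dictionary; the function name `select_index_terms`, the sample scores, and the threshold value are hypothetical and are not taken from the patent.

```python
def select_index_terms(key_terms, importance, threshold):
    """Keep parser key terms whose importance score exceeds the threshold.

    key_terms  -- terms identified by the parser (hypothetical input shape)
    importance -- dict mapping term -> importance score from the extractor
    threshold  -- cut-off level; terms at or below it are dropped
    """
    # Terms the extractor never scored are treated as having score 0.0.
    return [t for t in key_terms if importance.get(t, 0.0) > threshold]


# Illustrative values only.
key_terms = ["speech recognition", "grammar rules", "the"]
importance = {"speech recognition": 0.9, "grammar rules": 0.6, "the": 0.1}
index_terms = select_index_terms(key_terms, importance, threshold=0.5)
```

In this sketch a term must be both identified by the parser and scored above the threshold, which is one reading of "using the key terms identified by the parser (3) and the extractor (4)".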

20 Claims

1. A method of generating domain-specific grammar rules using a computer system having data processing logic, the method comprising:
- parsing a plurality of documents stored in a digital document database on computer-accessible storage media to identify key terms of each document based on sentence structure;
  
  extracting a plurality of n-grams from each document, wherein one or more of the n-grams include spaces and partial words;
  
  extracting a frequency of each n-gram in each document;
  
  extracting a frequency of each n-gram in the plurality of documents;
  
  assigning a novelty score to each of the n-grams in each corresponding document, said novelty score representing and being based on the extracted frequency of the n-gram in the document and the extracted frequency of the n-gram in the plurality of documents;
  
  determining which of the extracted n-grams are in each identified key term;
  
  assigning a weight to each key term based on the novelty scores assigned to the extracted n-grams in the key term; and
  
  generating the domain-specific grammar rules for a speech recognition engine, said grammar rules including said key terms in association with respective probabilities based on the weights of the key terms, wherein the key terms define phrases that are likely to be spoken from the plurality of documents, and the grammar rules define which of the phrases are likely to follow others of the phrases with the likelihoods defined by the probabilities.
- - 2. A method as claimed in claim 1, wherein a natural language parser executes said parsing, and said key terms are linguistically important terms of each document.
  - 3. A method as claimed in claim 2, wherein said parser generates key-centered phrase structure frames for sentences of each document, and generates at least one frame relation graph that is parsed to determine the frames representative of the sentences of each document, said frames including said key terms.
  - 4. A method as claimed in claim 1, wherein generating the grammar rules comprises generating a list of phrases including said key terms and said respective weights, and inputting said list as a bi-gram array with said weights representing said probabilities, to generate said grammar rules for said speech recognition engine.
  - 15. A method as claimed in claim 1, wherein the novelty score is determined on the basis of:
    - p_ij, which is the probability of the occurrence of n-gram i in document j determined from the extracted frequency of the n-gram i in the document j;
      
      q_ij, which is the probability of occurrence of the n-gram i elsewhere in said documents determined from the extracted frequency of the n-gram i in the plurality of documents and the extracted frequency of the n-gram i in the document j;
      
      t_ij, which is the probability of occurrence of the n-gram i in said documents determined from the extracted frequency of the n-gram i in the plurality of documents;
      
      S_j, which is the total count of n-grams in the document j; and
      
      S, which is Σ S_j, if p_ij ≥ q_ij.
  - 16. A method as claimed in claim 15, wherein the novelty score is determined to be zero if p_ij < q_ij.
  - 17. A method as claimed in claim 16, wherein the novelty score is determined on the basis of the following:
  - 18. A method as claimed in claim 1, wherein the plurality of n-grams extracted from each document are of the same length n.
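
The quantities named in claims 1, 15 and 16 can be illustrated with a minimal sketch. It assumes character-level n-grams of a fixed length (so that n-grams may contain spaces and partial words, as in claim 18's same-length case) and assumes the simplest reading of how each probability is "determined from" the extracted frequencies: p_ij = f_ij / S_j, q_ij = (F_i - f_ij) / (S - S_j), and t_ij = F_i / S, where f_ij and F_i are the per-document and collection-wide counts. These formulas and all function names are assumptions made for illustration. The novelty score formula itself is not reproduced in the claim text above, so the sketch stops at the inputs and the zero rule of claim 16.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Slide a window of n characters over the text, keeping spaces."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_probabilities(docs, n=3):
    """Return one dict per document mapping n-gram -> (p_ij, q_ij, t_ij)."""
    doc_counts = [Counter(char_ngrams(d, n)) for d in docs]  # f_ij
    totals = Counter()                                       # F_i
    for counts in doc_counts:
        totals.update(counts)
    s_j = [sum(c.values()) for c in doc_counts]  # n-gram count per document
    s = sum(s_j)                                 # n-gram count overall
    result = []
    for j, counts in enumerate(doc_counts):
        rest = s - s_j[j]  # n-grams in all other documents
        probs = {}
        for gram, f_ij in counts.items():
            p = f_ij / s_j[j]
            q = (totals[gram] - f_ij) / rest if rest else 0.0
            t = totals[gram] / s
            probs[gram] = (p, q, t)
        result.append(probs)
    return result


def score_is_zero(p, q):
    """Claim 16: the novelty score is zero when p_ij < q_ij."""
    return p < q


probs = ngram_probabilities(["abcab", "abxab"], n=2)
```

Here the bi-gram "ab" is equally common in both documents, so p and q are equal for it, while "bc" occurs only in the first document and has q equal to zero.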

5. An extraction system for generating domain-specific grammar rules, the extraction system including a computer system having data processing logic configured to provide:
- a parser for parsing a plurality of documents stored in a digital document database on computer-accessible storage media to identify key terms of each document based on sentence structure;
  
  a feature extractor for:
  
  extracting a plurality of n-grams from each document, wherein one or more of the n-grams include spaces and partial words;
  
  extracting a frequency of each n-gram in each document;
  
  extracting a frequency of each n-gram in the plurality of documents;
  
  assigning a novelty score to each of the n-grams in corresponding documents, said novelty score representing and being based on the extracted frequency of the n-gram in the document and the extracted frequency of the n-gram in the plurality of documents,
  
  determining which of the extracted n-grams are in each identified key term, and
  
  assigning a weight to each key term based on the novelty scores assigned to the extracted n-grams in the key term; and
  
  a grammar generator for generating the domain-specific grammar rules for a speech recognition engine, said grammar rules including said key terms in association with respective probabilities based on the weights of the key terms, wherein the key terms define phrases that are likely to be spoken from the plurality of documents, and the grammar rules define which of the phrases are likely to follow others of the phrases with the likelihoods defined by the probabilities.
- - 7. A system as claimed in claim 5, wherein a natural language parser executes said parsing, and said key terms are linguistically important terms of each document.
  - 8. A system as claimed in claim 7, wherein said parser generates key-centered phrase structure frames for sentences of each document, and generates at least one frame relation graph that is parsed to determine the frames representative of the sentences of each document, said frames including said key terms.
  - 9. A system as claimed in claim 5, wherein the novelty score is determined on the basis of
  - 10. A system as claimed in claim 5, wherein generating the grammar rules comprises generating a list of phrases including said key terms and said respective weights, and inputting said list as a bi-gram array with said weights representing said probabilities, to generate said grammar rules for said speech recognition engine.
  - 19. A system as claimed in claim 5, wherein the plurality of n-grams extracted from each document are of the same length n.

6. A machine-readable non-transitory medium having stored thereon instructions for generating domain-specific grammar rules comprising machine executable code which when executed by at least one machine, causes the machine to:
- parse a plurality of documents stored in a digital document database on computer-accessible storage media to identify key terms of each document based on sentence structure;
  
  extract a plurality of n-grams from each document, wherein one or more of the n-grams include spaces and partial words;
  
  extract a frequency of each n-gram in each document;
  
  extract a frequency of each n-gram in the plurality of documents;
  
  assign a novelty score to each of the n-grams in each corresponding document, said novelty score representing and being based on the extracted frequency of the n-gram in the document and the extracted frequency of the n-gram in the plurality of documents;
  
  determine which of the extracted n-grams are in each identified key term;
  
  assign a weight to each key term based on the novelty scores assigned to the extracted n-grams in the key term; and
  
  generate the domain-specific grammar rules for a speech recognition engine, said grammar rules including said key terms in association with respective probabilities based on the weights of the key terms, wherein the key terms define phrases that are likely to be spoken from the plurality of documents, and the grammar rules define which of the phrases are likely to follow others of the phrases with the likelihoods defined by the probabilities.
- - 11. A machine readable medium as claimed in claim 6, wherein a natural language parser executes said parsing, and said key terms are linguistically important terms of each document.
  - 12. A machine readable medium as claimed in claim 11, wherein said parser generates key-centered phrase structure frames for sentences of each document, and generates at least one frame relation graph that is parsed to determine the frames representative of the sentences of each document, said frames including said key terms.
  - 13. A machine readable medium as claimed in claim 6, wherein the novelty score is determined on the basis of
  - 14. A machine readable medium as claimed in claim 6, wherein generating the grammar rules comprises generating a list of phrases including said key terms and said respective weights, and inputting said list as a bi-gram array with said weights representing said probabilities, to generate said grammar rules for said speech recognition engine.
  - 20. A machine-readable non-transitory medium as claimed in claim 6, wherein the plurality of n-grams extracted from each document are of the same length n.
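
The weighting and probability steps shared by the independent claims can be sketched as follows. The claims say only that a key term's weight is "based on" the novelty scores of the n-grams it contains and that the weights represent probabilities. Summing the scores and normalising the weights so they add up to one are assumptions made here for illustration, as are the function names and the sample novelty values. The sketch does not attempt to reproduce any particular speech recognition engine's grammar format.

```python
def key_term_weights(key_terms, novelty, n=3):
    """Weight each key term by summing the novelty of the n-grams inside it.

    novelty -- dict mapping character n-gram -> novelty score (assumed given)
    """
    weights = {}
    for term in key_terms:
        grams = [term[i:i + n] for i in range(len(term) - n + 1)]
        # N-grams without a score contribute nothing to the weight.
        weights[term] = sum(novelty.get(g, 0.0) for g in grams)
    return weights


def to_probabilities(weights):
    """Normalise weights so they sum to one (an assumed convention)."""
    total = sum(weights.values())
    if total == 0:
        return {term: 0.0 for term in weights}
    return {term: w / total for term, w in weights.items()}


# Illustrative novelty scores only.
novelty = {"cal": 2.0, "all": 1.0, "bil": 1.0, "ill": 0.0}
weights = key_term_weights(["call", "bill"], novelty)
phrase_list = sorted(to_probabilities(weights).items(),
                     key=lambda item: item[1], reverse=True)
```

The resulting `phrase_list` pairs each key term with a probability, which is the kind of phrase-and-weight list that claims 4, 10 and 14 describe as the input for generating the grammar rules.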

Current Assignee
Telstra Corporation Limited (Telstra Group Ltd.)
Original Assignee
Telstra Corporation Limited (Telstra Group Ltd.)
Inventors
Jiang, Jason; Starkie, Bradford Craig; Raskutti, Bhavani Laxman
Primary Examiner(s)
Mofiz, Apu
Assistant Examiner(s)
Daye, Chelcie

Application Number

US14/311,979
Publication Number

US 20150019205A1
Time in Patent Office

505 Days
Field of Search

707/750, 704/9
US Class Current

1/1
CPC Class Codes

G06F 16/313   Selection or weighting of t...

G06F 16/3334   Selection or weighting of t...

G06F 16/3344   using natural language anal...

G06F 16/93   Document management systems

G06F 40/205   Parsing

G06F 40/40   Processing or translation o...

G10L 15/00   Speech recognition G10L17/0...
