System and method for speech activated navigation

US 20030078781A1
Filed: 10/24/2001
Published: 04/24/2003
Est. Priority Date: 10/24/2001
Status: Active Grant

4 Assignments

0 Petitions

Abstract

The invention discloses a system and method for speech-activated navigating or browsing via a speech control interface used in a speech-activated multifunctional communications system. In one embodiment, the invention provides an approach to extend speech-activated navigation by linking an output of an open vocabulary recognizer to an Internet search engine in order that a user may have more options to search information related to his spoken commands. In another embodiment, the invention provides a means to enable the user to orally navigate a database via a speech control interface wherein the selections and associated selection criteria are organized into a hierarchical view menu. In another embodiment, the invention provides an approach with high flexibility and accuracy to recognize the user's command using a new grammar structure and a matching score system.

19 Claims

1. A computer readable storage medium encoded with instructions, which when loaded into a digital computational device establishes a speech-activated Web browsing system, the system comprising:
- a speech recognition system which is designed to recognize an input utterance against a list of candidate textual titles;
  
  means for extracting textual titles from a given Web page, each of said textual titles representing a content item which can be rendered on screen when a command associated with said textual title is activated;
  
  means for generating a grammar for said textual titles extracted from said Web page;
  
  means for applying said grammar to said speech recognition system;
  
  wherein said speech recognition system processes the input utterance and determines its confidence level against a pre-set confidence level;
  
  wherein if the confidence level of the input utterance is higher than said pre-set confidence level, then the textual title corresponding to the input utterance is recognized and the command associated with the textual title corresponding to the input utterance is activated;
  
  wherein if the confidence level of the input utterance is lower than said pre-set confidence level, then said speech recognition system passes the input utterance through an open vocabulary recognizer, which is designed to transcribe arbitrary statements or texts that do not match any particular grammar path; and
  
  means for taking an output of said open vocabulary recognizer as an input to an Internet search engine, said Internet search engine returning a Web page containing a number of hits of Internet link options.
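The title-extraction and grammar-generation means of claim 1 can be illustrated with a short Python sketch. It is not the patentee's implementation: it assumes that a "textual title" is the visible text of an `<a href>` link, that the associated command is the link target, and that a grammar can be modeled as a lookup from normalized title to command. `LinkTitleExtractor` and `build_title_grammar` are hypothetical names; a real system would compile the titles into whatever grammar format its recognizer accepts.

```python
from html.parser import HTMLParser


class LinkTitleExtractor(HTMLParser):
    """Collect (textual title, href) pairs for every <a href=...> on a page.

    Assumption: a link's visible text is its "textual title" and its href is
    the command activated when that title is recognized.
    """

    def __init__(self):
        super().__init__()
        self.links = []    # list of (title, href) pairs
        self._href = None  # href of the <a> element currently open, if any
        self._text = []    # text fragments seen inside that <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so "Top  News" and "Top News" are one title.
            title = " ".join("".join(self._text).split())
            if title:
                self.links.append((title, self._href))
            self._href = None


def build_title_grammar(html):
    """Map each normalized title (one grammar path) to the command it triggers."""
    parser = LinkTitleExtractor()
    parser.feed(html)
    parser.close()
    return {title.lower(): href for title, href in parser.links}
```

For example, a page containing `<a href="/weather">Weather <b>Today</b></a>` yields the grammar entry `"weather today" -> "/weather"`; anchors without an `href` are ignored.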

2. A method for extending speech-activated Web browsing using a speech control interface, the method comprising the steps of:
- extracting textual titles from a given Web page, each of said textual titles representing a content item which can be rendered on screen when a command associated with said textual title is activated;
  
  generating a grammar for said extracted textual titles, wherein each textual title is associated with a grammar path;
  
  entering an input utterance to a speech recognition system which is designed to recognize the input utterance against a list of candidate textual titles;
  
  applying said grammar to said speech recognition system;
  
  processing the input utterance and determining its confidence level against a pre-set confidence level;
  
  if the confidence level of the input utterance is higher than said pre-set confidence level, then recognizing the textual title corresponding to the input utterance and activating the command associated with the textual title corresponding to the input utterance;
  
  if the confidence level of the input utterance is lower than said pre-set confidence level, passing said input utterance through an open vocabulary recognizer, which is designed to transcribe arbitrary statements or texts that do not match any particular grammar path;
  
  taking an output of said open vocabulary recognizer as an input to an Internet search engine; and
  
  returning a Web page containing a number of hits of Internet link options.
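The confidence test and open-vocabulary fallback of claim 2 reduce to a small routing decision. The sketch below assumes the grammar-constrained recognizer returns a single `(title, confidence)` pair and that the open-vocabulary recognizer can be called to obtain a free-form transcription; `CONFIDENCE_THRESHOLD`, `SEARCH_ENDPOINT`, and `route_utterance` are hypothetical. Because the claims only cover confidence above or below the preset level, treating an exact tie as a miss is an arbitrary choice here.

```python
from urllib.parse import urlencode

# Hypothetical values: the claims specify neither the threshold nor the engine.
CONFIDENCE_THRESHOLD = 0.6
SEARCH_ENDPOINT = "https://search.example.com/search"


def route_utterance(grammar_result, open_vocab_transcribe, grammar):
    """Decide what to do with one utterance.

    grammar_result: (title, confidence) from the grammar-constrained recognizer.
    open_vocab_transcribe: callable returning a free-form transcription; it is
        only invoked when the grammar match is not confident enough.
    grammar: mapping from title to the command (here, an href) it activates.
    """
    title, confidence = grammar_result
    if confidence > CONFIDENCE_THRESHOLD and title in grammar:
        # Confident match: activate the command tied to the recognized title.
        return ("activate", grammar[title])
    # Low confidence: hand the open-vocabulary transcription to a search engine.
    transcript = open_vocab_transcribe()
    return ("search", SEARCH_ENDPOINT + "?" + urlencode({"q": transcript}))
```

Passing the open-vocabulary recognizer as a callable keeps the more expensive free-form transcription off the common path, where the page grammar already explains the utterance.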

3. A computer readable storage medium encoded with instructions, which when loaded into a digital computational device establishes a speech-activated browsing system, the system comprising:
- a speech recognition system which is used to recognize input utterance against a list of selections;
  
  means for dynamically generating a set of grammars from a hierarchical menu showing said selections and associated selection criteria, each of said grammars reflecting a selection criterion;
  
  a database embodying the same list of selections and the associated selection criteria present in said hierarchical menu, each selection representing an entry in said database, and each entry being associated with a selection criterion that said entry satisfies;
  
  means for applying a particular selection criterion against all entries of said database; and
  
  means for returning a list of database entries that satisfy said particular selection criterion.

4. The system of claim 3, wherein said grammars for each different selection criterion are embedded in a larger grammar or a set of grammars which is constructed in such a manner that either a sequence of spoken commands or a fluent statement may be referenced to all database entries with desired property.

5. The system of claim 4, wherein said sequence of spoken commands or said fluent statement may be referenced to a list of candidate selections if said sequence of spoken commands or said fluent statement does not exactly match a certain selection criterion.
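Claims 3 to 5 describe grammars generated from a hierarchical menu, one per selection criterion, and applied against a database whose entries record the criteria they satisfy. The sketch below assumes a two-level menu, models each grammar as the set of phrases that name a criterion, and uses made-up entries; `MENU`, `DATABASE`, and the three functions are hypothetical illustrations, not the patented system.

```python
# Hypothetical hierarchical menu: top-level category -> selection criteria.
MENU = {
    "genre": ["comedy", "drama", "documentary"],
    "year": ["1990s", "2000s"],
}

# Hypothetical database: each entry lists the selection criteria it satisfies.
DATABASE = [
    {"title": "Film A", "criteria": {"comedy", "1990s"}},
    {"title": "Film B", "criteria": {"drama", "2000s"}},
    {"title": "Film C", "criteria": {"comedy", "2000s"}},
]


def grammars_from_menu(menu):
    """Generate one grammar per selection criterion.

    Here a grammar is just the set of phrases that name the criterion:
    the bare word, or the word prefixed by its menu category.
    """
    grammars = {}
    for category, criteria in menu.items():
        for criterion in criteria:
            grammars[criterion] = {criterion, f"{category} {criterion}"}
    return grammars


def recognize_criterion(utterance, grammars):
    """Return the criterion whose grammar contains the utterance, else None."""
    text = utterance.lower().strip()
    for criterion, phrases in grammars.items():
        if text in phrases:
            return criterion
    return None


def apply_criterion(criterion, database):
    """Apply one criterion against all entries and return the matching titles."""
    return [e["title"] for e in database if criterion in e["criteria"]]
```

Regenerating the grammars from `MENU` whenever the menu changes is what makes them "dynamic" in this sketch: the recognizer only ever listens for criteria the menu currently shows.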

6. In a speech-activated communications system having a speech recognition unit, a central processing unit, and a database containing a list of items, said list of items being represented by a hierarchical menu, each item in said database being referenced to a selection and an associated selection criterion present in said hierarchical menu, a speech-activated browsing system comprising:
- means for generating one or more grammars reflecting different selection criteria which may be applied against entries of said database;
  
  means for applying a particular selection criterion against all entries of said database;
  
  means for returning a list of database entries that satisfy said particular selection criterion;
  
  wherein said grammars for each different selection criterion are embedded in a larger grammar or a set of grammars which is constructed in such a manner that either a sequence of spoken commands or a fluent statement may be referenced to a set of database entries with desired property; and
  
  wherein said sequence of spoken commands or said fluent statement may be referenced to a list of candidate selections if said sequence of spoken commands or said fluent statement does not match a certain selection criterion.
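Claim 6 adds a fallback: a statement that does not match a selection criterion is referenced to a list of candidate selections. The claim does not say how candidates are chosen, so this sketch substitutes string similarity from Python's standard `difflib.get_close_matches`; the criteria list, database entries, and the 0.75 cutoff are hypothetical.

```python
import difflib

# Hypothetical selection criteria and database entries.
CRITERIA = ["comedy", "drama", "documentary", "1990s", "2000s"]
DATABASE = [
    {"title": "Film A", "criteria": {"comedy", "1990s"}},
    {"title": "Film B", "criteria": {"drama", "2000s"}},
    {"title": "Film C", "criteria": {"comedy", "2000s"}},
]


def interpret(statement, criteria=CRITERIA, database=DATABASE):
    """Map a fluent statement to matching entries, or to candidate criteria.

    Words that exactly name a criterion are applied as filters. Words that only
    resemble a criterion (similarity >= 0.75) make the statement ambiguous, and
    the resembling criteria are returned as candidate selections instead.
    """
    matched, candidates = [], set()
    for word in statement.lower().split():
        if word in criteria:
            matched.append(word)
        else:
            candidates.update(difflib.get_close_matches(word, criteria, n=3, cutoff=0.75))
    if candidates:
        return ("candidates", sorted(candidates))
    titles = [e["title"] for e in database if all(c in e["criteria"] for c in matched)]
    return ("entries", titles)
```

A misrecognized "comdy" does not silently return an empty list; the user is offered "comedy" as a candidate selection, which matches the behavior the claim describes.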

7. A method for browsing a hierarchical menu by spoken commands, wherein said hierarchical menu contains a list of selections with associated selection criteria, said method comprising the steps of:
- entering a set of selection criteria by an input utterance;
  
  generating a set of grammars, each of said grammars reflecting a specific selection criterion;
  
  applying said grammars against all entries of a database embodying the same list of selections and the same selection criteria present in said hierarchical menu, wherein each selection represents an entry in said database, and wherein each entry is associated with a selection criterion that said entry satisfies;
  
  returning a list of entries that satisfy said selection criteria;
  
  wherein said grammars for each different selection criterion are embedded in a larger grammar or a set of grammars which is constructed in such a manner that either a sequence of spoken commands or a fluent statement may be referenced to a set of database entries with desired property; and
  
  wherein said sequence of spoken commands or said fluent statement may be referenced into a set of candidate selections if said sequence of spoken commands or said fluent statement does not match a certain selection criterion.
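Claim 7's "larger grammar" lets either a sequence of short spoken commands or one fluent statement resolve to the same set of criteria. One minimal way to model that, assuming each sub-grammar is a list of phrases, is a single regular expression that matches any criterion phrase anywhere in the utterance. The phrase lists, entries, and function names below are hypothetical; a real recognizer would express this in its own grammar format.

```python
import re

# Hypothetical per-criterion sub-grammars: criterion -> phrases expressing it.
SUB_GRAMMARS = {
    "comedy": ["comedy", "comedies", "something funny"],
    "drama": ["drama", "dramas"],
    "2000s": ["2000s", "from the two thousands"],
}

# The "larger grammar": any sub-grammar phrase, found anywhere in the input.
# Longer phrases are tried first so overlapping phrases match greedily.
PHRASE_TO_CRITERION = {p: c for c, phrases in SUB_GRAMMARS.items() for p in phrases}
ALL_PHRASES = sorted(PHRASE_TO_CRITERION, key=len, reverse=True)
LARGER_GRAMMAR = re.compile(r"\b(?:" + "|".join(map(re.escape, ALL_PHRASES)) + r")\b")

# Hypothetical database entries with the criteria each one satisfies.
DATABASE = [
    {"title": "Film A", "criteria": {"comedy", "1990s"}},
    {"title": "Film B", "criteria": {"drama", "2000s"}},
    {"title": "Film C", "criteria": {"comedy", "2000s"}},
]


def parse_criteria(utterances):
    """Collect criteria from one fluent statement or a sequence of commands."""
    text = " ".join(u.lower() for u in utterances)
    return {PHRASE_TO_CRITERION[m.group(0)] for m in LARGER_GRAMMAR.finditer(text)}


def select(utterances, database):
    """Return titles of entries satisfying every criterion that was spoken."""
    criteria = parse_criteria(utterances)
    return [e["title"] for e in database if criteria <= e["criteria"]]
```

Both `["comedy", "2000s"]` (two commands) and `["show me something funny from the two thousands"]` (one statement) parse to `{"comedy", "2000s"}`. If no criterion is recognized, the empty set is a subset of every entry's criteria, so all entries are returned.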

8. A method for browsing a hierarchical menu by spoken commands, wherein said hierarchical menu contains a list of selections with associated selection criteria, said method comprising the steps of:
- entering a sequence of selection criteria by an input utterance;
  
  generating a set of grammars, each of said grammars reflecting a specific selection criterion;
  
  applying a first grammar against all entries of a database embodying the same list of selections and the same selection criteria present in said hierarchical menu, each selection representing an entry in said database, and each entry being associated with a selection criterion that said entry satisfies;
  
  applying a second grammar to the database entries that satisfy the first selection criterion represented by said first grammar;
  
  applying a third grammar to the database entries that satisfy the second selection criterion represented by said second grammar;
  
  repeating the steps of applying grammars until the last grammar is applied;
  
  returning a list of entries that satisfy said selection criteria;
  
  wherein said grammars for each different selection criterion are embedded in a larger grammar or a set of grammars which is constructed in such a manner that either a sequence of spoken commands or a fluent statement may be referenced to a set of database entries with desired property.

9. The method of claim 8, wherein said sequence of spoken commands or said fluent statement may be referenced to a list of candidate selections if said sequence of spoken commands or said fluent statement does not match a certain selection criterion.
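Claim 8 applies the grammars in sequence, each one filtering only the entries that survived the previous one. A minimal sketch, assuming the criteria have already been recognized in spoken order and using hypothetical entries, is a left fold over the criteria:

```python
from functools import reduce

# Hypothetical database entries with the criteria each one satisfies.
DATABASE = [
    {"title": "Film A", "criteria": {"comedy", "1990s", "english"}},
    {"title": "Film B", "criteria": {"drama", "2000s", "french"}},
    {"title": "Film C", "criteria": {"comedy", "2000s", "french"}},
    {"title": "Film D", "criteria": {"comedy", "2000s", "english"}},
]


def narrow(entries, criterion):
    """Apply one grammar's criterion to the entries that survived so far."""
    return [e for e in entries if criterion in e["criteria"]]


def cascade(criteria_sequence, database):
    """Apply the first criterion to every entry, each later one to the survivors."""
    survivors = reduce(narrow, criteria_sequence, database)
    return [e["title"] for e in survivors]
```

The final list equals the intersection of all criteria, but each step searches a smaller set, and the intermediate list after each step can be shown to the user as the menu is descended.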

10. A computer readable storage medium encoded with instructions, which when loaded into a digital computational device establishes a speech-activated browsing system, the system comprising:
- a database storing a list of content items organized by content title, each content title representing a database entry;
  
  a speech recognizer transcribing an utterance into a digital signal representing a textual statement;
  
  means for generating a grammar structure comprising various grammar paths, wherein every single word of each candidate content title that may be recognized is assigned to a distinct grammar path;
  
  means for applying every single word of an output of said speech recognizer against each content title;
  
  means for computing a matching score for each candidate content title;
  
  means for mapping a content title with highest matching score into a command acceptable by a server which delivers the content item represented by said content title with highest matching score.

11. The system of claim 10, wherein every single word may be uttered any number of times and intermixed in order with any other word in said grammar structure.

12. The system of claim 10, wherein said matching score may be defined by one or more parameters to ensure reliability.

13. The system of claim 12, wherein said one or more parameters are selected from a group consisting of:
- number of times that a word appears in a content title;
- relative position that a word appears in a content title;
- relative order that a word appears in a content title;
- length of candidate content title;
- a fraction of words in an output of said speech processor that could match any word in a specific content title;
- or said speech processor's confidence of each word.

14. The system of claim 10, wherein said each candidate content title is selected from the whole database entries or a list of database entries that satisfy a specific criterion which excludes certain content titles.
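Claims 10 to 14 replace whole-title grammars with a word-level grammar: every word of every candidate title is its own path, so a user may say title words in any order and any number of times. The sketch below assumes a simple score (the fraction of recognized words found in a title), simulates the grammar-constrained recognizer by dropping out-of-grammar words, and uses a hypothetical catalogue and command strings; the optional `candidates` argument mirrors the restricted candidate list of claim 14.

```python
import re

# Hypothetical catalogue: content title -> command the content server accepts.
CATALOGUE = {
    "The Evening News": "PLAY channel=7",
    "Evening Jazz Session": "PLAY channel=12",
    "Morning News Briefing": "PLAY channel=3",
}


def words(text):
    """Lower-case word tokens; the same form is used for titles and transcripts."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_word_grammar(titles):
    """Each distinct word of each candidate title is its own grammar path, so
    words can be spoken in any order and any number of times."""
    return {w for title in titles for w in words(title)}


def matching_score(spoken, title):
    """Fraction of the recognized words that also occur in the title."""
    title_words = set(words(title))
    return sum(w in title_words for w in spoken) / len(spoken)


def best_command(transcript, catalogue, candidates=None):
    """Return the command for the best-scoring title, or None if nothing matched.

    candidates optionally restricts the search to a subset of titles.
    """
    titles = list(catalogue if candidates is None else candidates)
    grammar = build_word_grammar(titles)
    # Simulate a recognizer constrained by the grammar: out-of-grammar words drop out.
    spoken = [w for w in words(transcript) if w in grammar]
    if not spoken:
        return None
    best = max(titles, key=lambda t: matching_score(spoken, t))
    return catalogue[best]
```

With this scoring, "news evening" still selects "The Evening News" even though the words are reversed, which is the flexibility claim 11 describes.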

15. A method for browsing a content database by spoken commands, said content database containing a list of content items organized by content title, each content title representing a database entry, said method comprising the steps of:
- generating a grammar comprising various grammar paths, wherein every single word of each candidate content title that may be recognized is assigned to a distinct grammar path;
  
  recognizing an utterance by a speech recognition system that uses said grammar;
  
  computing a matching score for each candidate content title; and
  
  mapping a content title with highest matching score into a command acceptable by a server which delivers the content item represented by said content title with highest matching score.

16. The method of claim 15, wherein said matching score is defined by multiple parameters.

17. The method of claim 15, wherein said matching score is defined by one or more parameters selected from a group consisting of:
- number of times that a word appears in a content title;
- relative position that a word appears in a content title;
- relative order that a word appears in a content title;
- length of candidate content title;
- a fraction of words in an output of said speech recognition system that could match any word in a specific content title;
- or said speech recognizer's confidence of each word.

18. The method of claim 15, wherein said each candidate content title is selected from the entire database entries or a list of database entries that satisfy a specific criterion which excludes certain content titles.
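Claims 16 and 17 allow the matching score to combine several parameters. The following sketch combines three of the listed ones: the confidence-weighted fraction of spoken words found in the title, title length (via the fraction of title words that were spoken), and relative word order. The weights and the exact formulas are assumptions for illustration, not taken from the patent.

```python
# Hypothetical weights: the claims name the parameters, not how to combine them.
WEIGHTS = {"coverage": 0.5, "title_fraction": 0.3, "order": 0.2}


def score(hyp, title):
    """Score one candidate title against a recognizer hypothesis.

    hyp: list of (word, confidence) pairs, in spoken order.
    """
    title_words = title.lower().split()
    if not hyp or not title_words:
        return 0.0

    # Fraction of spoken words found in the title, weighted by confidence.
    total_conf = sum(c for _, c in hyp)
    matched_conf = sum(c for w, c in hyp if w in title_words)
    coverage = matched_conf / total_conf if total_conf else 0.0

    # Fraction of the title's distinct words that were spoken (penalizes long titles).
    spoken = {w for w, _ in hyp}
    title_fraction = len(spoken & set(title_words)) / len(set(title_words))

    # Relative order: share of consecutive matched words that keep title order.
    positions = [title_words.index(w) for w, _ in hyp if w in title_words]
    if len(positions) < 2:
        order = 1.0 if positions else 0.0
    else:
        pairs = list(zip(positions, positions[1:]))
        order = sum(a < b for a, b in pairs) / len(pairs)

    return (WEIGHTS["coverage"] * coverage
            + WEIGHTS["title_fraction"] * title_fraction
            + WEIGHTS["order"] * order)
```

For the hypothesis `[("evening", 0.9), ("news", 0.8)]`, "The Evening News" scores 0.9, ahead of "News Evening Roundup" (same words, reversed order) and "Evening Jazz Session" (only one word matched).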

19. A method for browsing a content database by spoken commands, said content database containing a list of content items organized by content title, each content title representing a database entry, said method comprising the steps of:
- generating a grammar comprising various grammar paths, wherein every single word of a list of candidate content titles that may be recognized is assigned to a distinct grammar path, and wherein all words in said list of candidate content titles are optionally processed into a form that matches the form of words used in said grammar;
  
  matching each key word of an output of a speech recognizer to said list of candidate content titles, said speech recognizer transcribing an audio signal into digital signal representing said spoken commands;
  
  computing a matching score for each candidate content title; and
  
  mapping a content title with highest matching score into a command acceptable by a server which delivers the content represented by said content title with highest matching score;
  
  wherein said matching score is defined by one or more parameters selected from a group consisting of:
  - number of times that a word appears in a content title;
  - relative position that a word appears in a content title;
  - relative order that a word appears in a content title;
  - length of candidate content title;
  - a fraction of words in an output of said speech recognizer that could match any word in a specific content title;
  - or said speech recognizer's confidence of each word.
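Claim 19 also permits processing title words into the form the grammar uses, so that, for example, a title written "Planet Tour 2" can match the spoken words "planet tour two". The sketch below assumes a small digit-to-word table and a stop-word list (both hypothetical), ranks titles by the fraction of spoken key words they contain, and breaks ties in favor of shorter titles, using two of the parameters listed in the claim. The titles are invented examples.

```python
import re

# Hypothetical normalization tables; the claim only says title words may be
# processed into a form matching the words used in the grammar.
NUMBER_WORDS = {"1": "one", "2": "two", "3": "three", "4": "four", "5": "five"}
STOPWORDS = {"the", "a", "an", "of", "and"}


def normalize(text):
    """Lower-case, expand '&', split on non-alphanumerics, spell out small digits."""
    tokens = re.findall(r"[a-z0-9]+", text.lower().replace("&", " and "))
    return [NUMBER_WORDS.get(t, t) for t in tokens]


def best_title(transcript, titles):
    """Pick the title containing the most spoken key words; shorter wins ties."""
    keys = [w for w in normalize(transcript) if w not in STOPWORDS]

    def rank(title):
        title_words = normalize(title)
        hits = sum(1 for w in keys if w in title_words)
        coverage = hits / len(keys) if keys else 0.0
        return (coverage, -len(title_words))

    return max(titles, key=rank)
```

Without normalization, the spoken "two" could never match the written "2", and "Planet Tour 2" would tie with the shorter "Planet Tour".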

Current Assignee
Promptu Systems Corporation
Original Assignee
Promptu Systems Corporation
Inventors
Dubreuil, Jerome; Julia, Luc E.; Bing, Jehan G.

Granted Patent

US 7,222,073 B2
US Class Current

704/270.1
CPC Class Codes

G10L 15/26 Speech to text systems G10L...