Language model selection for speech-to-text conversion

US 9,495,127 B2
Filed: 12/22/2010
Issued: 11/15/2016
Est. Priority Date: 12/23/2009
Status: Active Grant

Abstract

Methods, computer program products and systems are described for converting speech to text. Sound information is received at a computer server system from an electronic device, where the sound information is from a user of the electronic device. A context identifier indicates a context within which the user provided the sound information. The context identifier is used to select, from among multiple language models, a language model appropriate for the context. Speech in the sound information is converted to text using the selected language model. The text is provided for use by the electronic device.
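
The selection step described in the abstract can be pictured as a lookup keyed by the context identifier. The following is a minimal sketch under stated assumptions, not the patented implementation: the `LanguageModel` class, the `select_language_model` function, the example identifiers, and the fallback to a general-purpose model are all hypothetical names and choices introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageModel:
    """Hypothetical stand-in for a trained language model."""
    name: str


# Hypothetical registry mapping a context identifier to a language model.
MODELS = {
    "maps.example.com": LanguageModel("places"),
    "mail.example.com/compose#to": LanguageModel("contact-names"),
}

# Assumed fallback for unknown contexts; the source does not describe one.
DEFAULT_MODEL = LanguageModel("general")


def select_language_model(context_id: str) -> LanguageModel:
    """Return the model registered for the context, else the general model."""
    return MODELS.get(context_id, DEFAULT_MODEL)
```

A server following this pattern would call `select_language_model` with the identifier received alongside the sound information, then hand the returned model to the recognizer.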

233 Citations

21 Claims

1. A method of converting speech to text, comprising:
- generating a language model by analyzing textual content for a first web page to determine a topic of the first web page, determining other pages determined to be directed to the same topic of the first web page, and analyzing content of queries previously submitted to the other pages determined to be directed to the same topic of the first web page, wherein the queries previously submitted to the other pages include queries submitted to respective search capabilities of at least some of the other pages;
  
  receiving, at a computer server system and from an electronic device, sound information from a user of the electronic device, and a context identifier that indicates a context within which the user provided the sound information;
  
  using the context identifier to select the generated language model from among a plurality of language models;
  
  converting speech in the sound information to text using the selected language model; and
  
  providing the text for use by the electronic device.
- View Dependent Claims (2, 3, 4)
- - 2. The method of claim 1, wherein the sound information was provided to the electronic device upon identifying that the user selected a selectable virtual control displayed along with a virtual keyboard interface on the electronic device, the user selection having caused the electronic device to begin listening for spoken input using an application programmed to convert user spoken and typed input into text to be provided to other applications on the electronic device, and wherein an operating system of the electronic device makes available, to the user of the electronic device, the virtual keyboard interface and a mechanism for speaking input for entering data to multiple applications on the electronic device, and provides, to a particular one of the multiple applications determined to represent the context within which the user provided the sound information, text that corresponds to a user input.
  - 3. The method of claim 1, wherein the context identifier identifies a topic for a web page that was being presented by the electronic device when the sound information was input by the user, and wherein using the context identifier to select the generated language model comprises selecting the generated language model based on a match between the topic identified by the context identifier and the topic of the first web page and the other pages.
  - 4. The method of claim 1, wherein the context identifier identifies a web page that was being presented by the electronic device when the sound information was input by the user, and wherein using the context identifier to select the generated language model comprises selecting the generated language model based on a match between the web page that was being presented by the electronic device and the first web page or one of the other pages.
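
As an illustration of how a language model generated from previously submitted queries might influence conversion, the sketch below builds an add-one smoothed bigram model from a list of queries and uses it to choose between candidate transcriptions. The claims do not specify the model type, so the bigram model, the smoothing, and the names `BigramModel` and `pick_transcription` are assumptions; a real recognizer would also combine these scores with acoustic scores, which this sketch omits.

```python
import math
from collections import Counter


class BigramModel:
    """Add-one smoothed bigram model built from a list of text queries."""

    def __init__(self, queries):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for query in queries:
            # Sentence boundary markers let the model score first/last words.
            tokens = ["<s>"] + query.lower().split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def log_prob(self, sentence):
        """Log probability of the sentence under the smoothed bigram model."""
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            numerator = self.bigrams[(prev, word)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def pick_transcription(candidates, model):
    """Return the candidate transcription the model scores highest."""
    return max(candidates, key=model.log_prob)


# Hypothetical queries previously submitted to pages on a shared topic.
topic_queries = ["pizza near me", "best pizza in town", "pizza delivery"]
model = BigramModel(topic_queries)
best = pick_transcription(["pizza near me", "pete's a near me"], model)
```

Here the model trained on topic-related queries prefers the in-topic hypothesis over the acoustically similar alternative. Scores are not length-normalized, so shorter candidates have a mild advantage in this sketch.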

5. A method of converting speech to text, comprising:
- generating a plurality of language models by analyzing textual content for a web page to determine a topic of the web page, determining other pages determined to be directed to the same topic of the web page, and analyzing textual content of queries previously submitted by a plurality of users to the other pages determined to be directed to the same topic of the web page, wherein the queries previously submitted by the plurality of users to the other pages include queries submitted to respective search capabilities of at least some of the other pages;
  
  receiving, at a computer server system and from an electronic device, sound information spoken by a user of the electronic device, and a context identifier of the web page, wherein the web page was being presented by the electronic device when the sound information was spoken by the user;
  
  selecting, using the context identifier and from among the plurality of language models, a language model appropriate for the context identifier;
  
  converting speech in the sound information to text using the selected language model; and
  
  providing the text for use by the electronic device.
- View Dependent Claims (6, 7)
- - 6. The method of claim 5, further comprising selecting the other pages by a clustering analysis on a graph having pages as nodes that are connected to each other by queries for which the other pages are responsive.
  - 7. The method of claim 6, wherein a web page is determined to be responsive to a query if the web page is a top n ranked search result for the query in a set of ranked search results relevant to the query, wherein n is a predetermined integer.
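
Claims 6 and 7 describe a graph whose nodes are pages, connected by queries for which those pages are responsive, where a page counts as responsive if it is a top n result. The sketch below builds such a graph implicitly and groups pages by connected components using union-find. Connected components are a simple stand-in chosen here; the claims do not name a particular clustering algorithm. The function name `cluster_pages` and the input format are hypothetical.

```python
from collections import defaultdict


def cluster_pages(ranked_results, n):
    """Group pages linked by a shared query for which they rank in the top n.

    ranked_results maps each query to its ranked list of page URLs.
    Connected components stand in for the unspecified clustering analysis.
    """
    parent = {}

    def find(page):
        parent.setdefault(page, page)
        while parent[page] != page:
            parent[page] = parent[parent[page]]  # path halving
            page = parent[page]
        return page

    def union(a, b):
        parent[find(a)] = find(b)

    for pages in ranked_results.values():
        responsive = pages[:n]  # top-n pages count as responsive to the query
        for page in responsive:
            find(page)  # register the node even if it has no edges
        for other in responsive[1:]:
            union(responsive[0], other)

    clusters = defaultdict(set)
    for page in parent:
        clusters[find(page)].add(page)
    return list(clusters.values())


# Hypothetical ranked search results for three queries.
results = {
    "pizza recipe": ["a.example", "b.example", "z.example"],
    "pizza dough": ["b.example", "c.example"],
    "car repair": ["d.example"],
}
clusters = cluster_pages(results, n=2)
```

With n set to 2, `z.example` ranks third for its only query and is left out, while the two pizza queries join `a.example`, `b.example`, and `c.example` into one cluster through the shared page `b.example`.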

8. A system comprising:
- a data processing apparatus; and
  
  storage coupled to the data processing apparatus storing code that when executed by the data processing apparatus causes the data processing apparatus to perform operations comprising:
  
  generating a language model by analyzing textual content for a first web page to determine a topic of the first web page, determining other pages determined to be directed to the same topic of the first web page, and analyzing content of queries previously submitted to the other pages determined to be directed to the same topic of the first web page, wherein the queries previously submitted to the other pages include queries submitted to respective search capabilities of at least some of the other pages;
  
  receiving, at a computer server system and from an electronic device, sound information from a user of the electronic device, and a context identifier that indicates a context within which the user provided the sound information;
  
  using the context identifier to select the generated language model from among a plurality of language models;
  
  converting speech in the sound information to text using the selected language model; and
  
  providing the text for use by the electronic device.
- View Dependent Claims (9, 10, 11, 12, 13, 14)
- - 9. The system of claim 8, wherein the context identifier identifies a field of a form in which input on the electronic device is received that corresponds to the sound information.
  - 10. The system of claim 8, wherein the context identifier identifies a web page that was being presented by the electronic device when the sound information was input by the user.
  - 11. The system of claim 8, wherein the sound information was provided to the electronic device upon identifying that the user selected a selectable virtual control displayed along with a virtual keyboard interface on the electronic device, the user selection having caused the electronic device to begin listening for spoken input using an application programmed to convert user spoken and typed input into text to be provided to other applications on the electronic device.
  - 12. The system of claim 11, wherein generating the language model comprises generating the language model by analyzing textual content for queries to which the first web page, and the other pages are responsive.
  - 13. The system of claim 12, wherein determining the other pages comprises selecting other pages that are related to the first web page by a clustering analysis on a graph having pages as nodes that are connected to each other by queries for which the other pages are responsive.
  - 14. The system of claim 13, wherein a web page is determined to be responsive to a query if the web page is a top n ranked search result for the query in a set of ranked search results relevant to the query, wherein n is a predetermined integer.

15. A computer-readable storage device encoded with a computer program product, the computer program product including instructions that, when executed, cause data processing apparatus to perform operations comprising:
- generating a language model by analyzing textual content for a first web page to determine a topic of the first web page, determining other pages determined to be directed to the same topic of the first web page, and analyzing content of queries previously submitted to the other pages determined to be directed to the same topic of the first web page, wherein the queries previously submitted to the other pages include queries submitted to respective search capabilities of at least some of the other pages;
  
  receiving, at a computer server system and from an electronic device, sound information from a user of the electronic device, and a context identifier that indicates a context within which the user provided the sound information;
  
  using the context identifier to select the generated language model from among a plurality of language models;
  
  converting speech in the sound information to text using the selected language model; and
  
  providing the text for use by the electronic device.
- View Dependent Claims (16, 17, 18, 19, 20, 21)
- - 16. The computer-readable storage device of claim 15, wherein the context identifier identifies a field of a form in which input on the electronic device is received, which input corresponds to the sound information.
  - 17. The computer-readable storage device of claim 15, wherein the context identifier identifies a web page that was being presented by the electronic device when the sound information was input by the user.
  - 18. The computer-readable storage device of claim 15, wherein the sound information was provided to the electronic device upon identifying that the user selected a selectable virtual control displayed along with a virtual keyboard interface on the electronic device, the user selection having caused the electronic device to begin listening for spoken input using an application programmed to convert user spoken and typed input into text to be provided to other applications on the electronic device.
  - 19. The computer-readable storage device of claim 18, wherein generating the language model comprises generating the language model by analyzing textual content for queries to which the first web page, and the other pages are responsive.
  - 20. The computer-readable storage device of claim 19, wherein determining the other pages comprises selecting other pages that are related to the first web page by a clustering analysis on a graph having pages as nodes that are connected to each other by queries for which the other pages are responsive.
  - 21. The computer-readable storage device of claim 20, wherein a web page is determined to be responsive to a query if the web page is a top n ranked search result for the query in a set of ranked search results relevant to the query, wherein n is a predetermined integer.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Ballinger, Brandon M.; Schalkwyk, Johan; Cohen, Michael H.; Allauzen, Cyril Georges Luc
Primary Examiner(s): Ortiz Sanchez, Michael
Application Number: US12/976,920
Publication Number: US 20110153324A1
Time in Patent Office: 2,155 Days
Field of Search: 704/231, 704/251, 704/246, 704/250, 704/E15.003, 704/E15.019
US Class Current: 1/1
CPC Class Codes

G06F 3/04886   by partitioning the display...

G06F 3/167   Audio in a user interface, ...

G06F 40/284   Lexical analysis, e.g. toke...

G06F 40/58   Use of machine translation,...

G10L 15/005   Language recognition

G10L 15/18   using natural language mode...

G10L 15/183   using context dependencies,...

G10L 15/197   Probabilistic grammars, e.g...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 15/30   Distributed recognition, e....

G10L 2015/223   Execution procedure of a sp...

G10L 2015/228   of application context
