Automatic language model update

US 10,410,627 B2
Filed: 03/15/2018
Issued: 09/10/2019
Est. Priority Date: 04/03/2006
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise probabilities of a portion of a sound occurrence based on the information. The portion of a sound may include a word. Also, a method for generating a speech recognition model includes receiving at a search engine from a remote device an audio recording and a transcript that substantially represents at least a portion of the audio recording, synchronizing the transcript with the audio recording, extracting one or more letters from the transcript and extracting the associated pronunciation of the one or more letters from the audio recording, and generating a dictionary entry in a pronunciation dictionary.
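The second method in the abstract builds a pronunciation dictionary entry from an audio recording and its transcript. The Python sketch below illustrates the step after the transcript has been synchronized with the audio. It assumes synchronization has already produced word-level and phone-level time spans, for example from a forced aligner that is not shown. It assigns each phone to the word whose span contains the phone's midpoint. The function name, the midpoint rule and the phone symbols are hypothetical, not details taken from the patent.

```python
def build_pronunciation_entries(word_spans, phone_spans, dictionary=None):
    """Add pronunciation entries from a time-aligned transcript (hypothetical sketch).

    `word_spans` is a list of (word, start_sec, end_sec) from the transcript and
    `phone_spans` is a list of (phone, start_sec, end_sec) from the audio.
    """
    # Start a new dictionary unless an existing one is passed in to be extended.
    dictionary = {} if dictionary is None else dictionary
    for word, w_start, w_end in word_spans:
        # Collect, in time order, the phones whose midpoint lies inside this word.
        phones = tuple(
            phone
            for phone, p_start, p_end in sorted(phone_spans, key=lambda s: s[1])
            if w_start <= (p_start + p_end) / 2 < w_end
        )
        if phones:
            # A word may accumulate several observed pronunciations.
            dictionary.setdefault(word.lower(), set()).add(phones)
    return dictionary


entries = build_pronunciation_entries(
    [("Kale", 0.0, 0.3), ("chips", 0.3, 0.7)],
    [("k", 0.0, 0.1), ("ey", 0.1, 0.2), ("l", 0.2, 0.3),
     ("ch", 0.3, 0.4), ("ih", 0.4, 0.5), ("p", 0.5, 0.6), ("s", 0.6, 0.7)],
)
print(entries)  # {'kale': {('k', 'ey', 'l')}, 'chips': {('ch', 'ih', 'p', 's')}}
```

Storing each pronunciation as a set of phone tuples lets the same word keep more than one observed pronunciation.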

90 Citations

17 Claims

1. A computer-implemented method comprising:
- obtaining, by a server-side, updater module of a search system, for each of one or more terms, word count data indicating a number of times that a term has occurred within one or more real-time textual information streams within a predetermined period of time;
  
  after obtaining, for each of the one or more terms, word count data indicating a number of times that the term has occurred within the one or more real-time textual information streams within the predetermined period of time, receiving, by a search engine of the search system, a query including one or more particular terms from a mobile device or a digital assistant device;
  
  in response to receiving, by the search engine of the search system, the query including one or more particular terms from the mobile device or the digital assistant device, transmitting (i) one or more search results associated with the query that are identified by the search engine of the search system, and (ii) particular word count data indicating the number of times that the particular term has occurred within the one or more real-time textual information streams within the predetermined period of time, for use by a speech recognition model trainer that is included on the mobile device or the digital assistant device in updating a language model that is used by an automated speech recognizer that is included on the mobile device or the digital assistant device; and
  
  updating, by the speech recognition model trainer that is included on the mobile device or the digital assistant device, statistical information associated with the language model based at least on the particular word count, to favor one or more words that were received by the search engine within a recent time period over one or more words that were not received by the search engine within the recent time period.

2. The method of claim 1, wherein one or more of the real-time textual information streams comprises recently received search queries.
3. The method of claim 1, wherein one or more of the real-time textual information streams comprises transcriptions of live television broadcasts.
4. The method of claim 1, wherein the one or more terms comprises a proper subset of all of the terms that are determined to occur in the real-time textual information streams within the predetermined period of time.
5. The method of claim 1, comprising using, by the automated speech recognizer that is included on the mobile device or the digital assistant device, the generated, language model in automatically transcribing a subsequently received utterance that includes the particular term.
6. The method of claim 1, comprising monitoring the one or more real-time textual information streams.
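The server-side counting step of claim 1 can be illustrated with a Python sketch that keeps per-term counts over a sliding time window. Dependent claims 2 to 4 and 6 narrow this step to monitoring streams such as recent queries or broadcast transcripts and reporting a subset of terms. The class name, the whitespace tokenization and the `top_terms` helper are assumptions for illustration. `top_terms` is one possible way to report a proper subset of terms, as in claim 4. The sketch also assumes text arrives in non-decreasing timestamp order. It further assumes counts are requested with non-decreasing `now` values, because eviction is permanent.

```python
import time
from collections import Counter, deque


class WindowedTermCounter:
    """Counts terms seen in real-time text streams within a sliding time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, term) pairs, oldest first
        self._counts = Counter()

    def add_text(self, text, timestamp=None):
        """Record each whitespace-separated term, e.g. from a query or caption line."""
        ts = time.time() if timestamp is None else timestamp
        for term in text.lower().split():
            self._events.append((ts, term))
            self._counts[term] += 1

    def _evict(self, now):
        """Permanently drop terms that fell out of the window ending at `now`."""
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            _, term = self._events.popleft()
            self._counts[term] -= 1
            if self._counts[term] == 0:
                del self._counts[term]

    def counts(self, now=None):
        """Return word count data for every term still inside the window."""
        self._evict(time.time() if now is None else now)
        return dict(self._counts)

    def top_terms(self, k, now=None):
        """Return counts for only the k most frequent terms in the window."""
        self._evict(time.time() if now is None else now)
        return dict(self._counts.most_common(k))
```

Several streams, such as a query log and a caption feed, can share one counter by calling `add_text` from each. The deque keeps eviction cheap because terms leave in arrival order. This sketch does not attempt sharding or time bucketing.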

7. A search system comprising:
- a processor configured to execute computer program instructions; and
  
  a computer storage medium encoded with the computer program instructions that, when executed by the processor, cause the system to perform operations comprising:
  
  obtaining, by a server-side, updater module of the search system, for each of one or more terms, word count data indicating a number of times that a term has occurred within one or more real-time textual information streams within a predetermined period of time;
  
  after obtaining, for each of the one or more terms, word count data indicating a number of times that the term has occurred within the one or more real-time textual information streams within the predetermined period of time, receiving, by a search engine of the search system, a query including one or more particular terms from a mobile device or a digital assistant device;
  
  in response to receiving, by the search engine of the search system, the query including one or more particular terms from the mobile device or the digital assistant device, transmitting (i) one or more search results associated with the query that are identified by the search engine of the search system, and (ii) particular word count data indicating the number of times that the particular term has occurred within the one or more real-time textual information streams within the predetermined period of time, for use by a speech recognition model trainer that is included on the mobile device or the digital assistant device in updating a language model that is used by an automated speech recognizer that is included on the mobile device or the digital assistant device; and
  
  updating, by the speech recognition model trainer that is included on the mobile device or the digital assistant device, statistical information associated with the language model based at least on the particular word count, to favor one or more words that were received by the search engine within a recent time period over one or more words that were not received by the search engine within the recent time period.

8. The system of claim 7, wherein one or more of the real-time textual information streams comprises recently received search queries.
9. The system of claim 7, wherein one or more of the real-time textual information streams comprises transcriptions of live television broadcasts.
10. The system of claim 7, wherein the one or more terms comprises a proper subset of all of the terms that are determined to occur in the real-time textual information streams within the predetermined period of time.
11. The system of claim 7, wherein the operations comprise using, by the automated speech recognizer that is included on the mobile device or the digital assistant device, the generated, language model in automatically transcribing a subsequently received utterance that includes the particular term.
12. The system of claim 7, wherein the operations comprise monitoring the one or more real-time textual information streams.
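The exchange recited in claim 7 packages two things in response to a query: (i) search results and (ii) recent word count data for the query's particular terms. The Python sketch below models it as a single function. The function name, the JSON encoding and the whitespace tokenization are hypothetical. The claims do not specify a wire format.

```python
import json


def build_query_response(query, search_results, term_counts):
    """Package search results with recent word count data for the query's terms.

    Hypothetical sketch: `search_results` stands in for whatever the search
    engine returned, and `term_counts` maps each term to its recent count
    (for example, the output of a windowed counter).
    """
    particular_terms = query.lower().split()
    # Send counts only for the particular terms that appear in the query.
    word_counts = {t: term_counts[t] for t in particular_terms if t in term_counts}
    return json.dumps({"results": search_results, "word_counts": word_counts})


print(build_query_response("Earthquake today", ["result-1"], {"earthquake": 2, "update": 1}))
# {"results": ["result-1"], "word_counts": {"earthquake": 2}}
```

Sending only the counts for the query's own terms keeps the payload small. The device receives exactly the data it needs to adjust its model for words the user is likely to say.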

13. A non-transitory computer-readable storage device encoded with a computer program, the computer program comprising instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising:
- obtaining, by a server-side, updater module of a search system, for each of one or more terms, word count data indicating a number of times that a term has occurred within one or more real-time textual information streams within a predetermined period of time;
  
  after obtaining, for each of the one or more terms, word count data indicating a number of times that the term has occurred within the one or more real-time textual information streams within the predetermined period of time, receiving, by a search engine of the search system, a query including one or more particular terms from a mobile device or a digital assistant device;
  
  in response to receiving, by the search engine of the search system, the query including one or more particular terms from the mobile device or the digital assistant device, transmitting (i) one or more search results associated with the query that are identified by the search engine of the search system, and (ii) particular word count data indicating the number of times that the particular term has occurred within the one or more real-time textual information streams within the predetermined period of time, for use by a speech recognition model trainer that is included on the mobile device or the digital assistant device in updating a language model that is used by an automated speech recognizer that is included on the mobile device or the digital assistant device; and
  
  updating, by the speech recognition model trainer that is included on the mobile device or the digital assistant device, statistical information associated with the language model based at least on the particular word count, to favor one or more words that were received by the search engine within a recent time period over one or more words that were not received by the search engine within the recent time period.

14. The device of claim 13, wherein one or more of the real-time textual information streams comprises recently received search queries.
15. The device of claim 13, wherein one or more of the real-time textual information streams comprises transcriptions of live television broadcasts.
16. The device of claim 13, wherein the one or more terms comprises a proper subset of all of the terms that are determined to occur in the real-time textual information streams within the predetermined period of time.
17. The device of claim 13, wherein the operations comprise using, by the automated speech recognizer that is included on the mobile device or the digital assistant device, the generated, language model in automatically transcribing a subsequently received utterance that includes the particular term.
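Two device-side steps can be illustrated with a toy unigram model. The first is updating the language model to favor recently received words. The second is using the updated model to transcribe a later utterance, as in claim 17. The multiplicative boost `1 + alpha * count`, the `alpha` value, the probability floor, and ranking candidates by language-model score alone are all assumptions of this Python sketch. A practical recognizer would combine acoustic and language-model scores.

```python
import math


def boost_recent_words(model, word_counts, alpha=0.1):
    """Scale each word's probability by (1 + alpha * recent count), then renormalize.

    Hypothetical sketch of the on-device update: `model` is a unigram dict of
    word -> probability and `word_counts` is the count data received from the
    server. Words missing from `model` are ignored here for simplicity.
    """
    scaled = {w: p * (1 + alpha * word_counts.get(w, 0)) for w, p in model.items()}
    total = sum(scaled.values())
    return {w: p / total for w, p in scaled.items()}


def pick_transcription(model, candidates, floor=1e-6):
    """Return the candidate with the highest unigram log score under `model`."""
    def score(text):
        # Unseen words get a small floor probability instead of zero.
        return sum(math.log(model.get(w, floor)) for w in text.lower().split())
    return max(candidates, key=score)


model = {"whether": 0.5, "weather": 0.5}
updated = boost_recent_words(model, {"weather": 10})
print(pick_transcription(updated, ["whether", "weather"]))  # weather
```

Here, recent counts for "weather" tip an otherwise even choice between homophones toward the word that has been trending. Renormalizing keeps the model a valid probability distribution after the boost.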

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Cohen, Michael H.; Baluja, Shumeet; Moreno Mengibar, Pedro J.
Primary Examiner(s): Serrou, Abdelali
Application Number: US15/922,154
Publication Number: US 20180204565A1
Time in Patent Office: 544 Days

CPC Class Codes

G10L 15/06   Creation of reference templ...

G10L 15/063   Training

G10L 15/065   Adaptation

G10L 15/187   Phonemic context, e.g. pron...

G10L 15/26   Speech to text systems G10L...

G10L 2015/0635   updating or merging of old ...
