Methods and systems for providing speech recognition systems based on speech recordings logs

US 8,494,853 B1
Filed: 01/04/2013
Issued: 07/23/2013
Est. Priority Date: 01/04/2013
Status: Active Grant

Abstract

Examples of methods and systems for providing speech recognition systems based on speech recordings logs are described. In some examples, a method may be performed by a computing device within a system to generate modified data logs to use as a training data set for an acoustic model for a particular language. A device may receive one or more data logs that comprise at least one or more recordings of spoken queries and transcribe the recordings. Based on comparisons, the device may identify any transcriptions that may be indicative of noise and may remove those transcriptions indicative of noise from the data logs. Further, the device may remove unwanted transcriptions from the data logs and the device may provide the modified data logs as a training data set to one or more acoustic models for particular languages.
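
As a rough illustration of the flow described above, the following Python sketch flags transcriptions that occur unusually often relative to earlier logs, then separates flagged recordings deemed noise from the remainder. It is only a sketch under stated assumptions: the names `LogEntry`, `flag_frequent`, and `split_noise`, the `ratio` and `margin` parameters, and the specific rule of comparing a language-model score against an acoustic-model score are all hypothetical and are not taken from the patent, which does not spell out how the comparison is made.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LogEntry:
    recording_id: str
    transcription: str
    lm_score: float  # hypothetical language-model log-likelihood
    am_score: float  # hypothetical acoustic-model log-likelihood


def flag_frequent(entries, previous_counts, ratio=3.0):
    """Return transcriptions whose count exceeds `ratio` times their count
    in previously transcribed queries (assumed form of the threshold)."""
    counts = Counter(e.transcription for e in entries)
    return {
        text
        for text, count in counts.items()
        if count > ratio * max(previous_counts.get(text, 0), 1)
    }


def split_noise(entries, flagged, margin=5.0):
    """Split entries into (noise, remainder).

    Assumed rule: a flagged recording is treated as noise when the language
    model scores its transcription much higher than the acoustic model does.
    """
    noise, remainder = [], []
    for e in entries:
        if e.transcription in flagged and e.lm_score - e.am_score > margin:
            noise.append(e)
        else:
            remainder.append(e)
    return noise, remainder


# Example: "weather today" spikes relative to the previous logs.
entries = [
    LogEntry("r1", "weather today", -2.0, -20.0),
    LogEntry("r2", "weather today", -2.0, -20.0),
    LogEntry("r3", "weather today", -2.0, -20.0),
    LogEntry("r4", "weather today", -2.0, -3.0),
    LogEntry("r5", "call mom", -4.0, -5.0),
]
flagged = flag_frequent(entries, previous_counts={"weather today": 1})
noise, remainder = split_noise(entries, flagged)
```

In this sketch the `remainder` list plays the role of the modified data log that would be passed on, with its transcriptions, as acoustic-model training data.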

20 Claims

1. A method, comprising:
- receiving one or more data logs, wherein the one or more data logs comprise at least one or more recordings of spoken queries;
  
  transcribing the one or more recordings of spoken queries;
  
  identifying within transcriptions of the one or more recordings of spoken queries transcriptions having an occurrence exceeding a threshold, wherein the threshold is based on a comparison of the transcriptions with previous transcribed queries;
  
  processing, by a computing device, recordings of spoken queries corresponding to the identified transcriptions using both a language model and an acoustic model;
  
  based on a comparison of the processing using the language model with the processing using the acoustic model, identifying, from the one or more data logs, one or more recordings of spoken queries corresponding to transcriptions deemed to be due to noise and a remainder of the one or more recordings of spoken queries;
  
  generating one or more modified data logs including the remainder of the recordings of spoken queries; and
  
  providing the one or more modified data logs and associated transcriptions of the one or more recordings of spoken queries within the one or more modified data logs as a training data set to update one or more acoustic models for particular languages.
- Dependent claims (2, 3, 4, 5, 6, 7):
- - 2. The method of claim 1, further comprising:
    - determining a confidence score based on processing recordings of spoken queries corresponding to the identified transcriptions using both a language model and an acoustic model; and
      
      removing from the one or more data logs given recordings of spoken queries based on the confidence score.
  - 3. The method of claim 1, further comprising identifying the transcriptions based also on a length of the transcriptions.
  - 4. The method of claim 1, further comprising removing one or more transcriptions of the one or more recordings of spoken queries that include one or more numerical sequences.
  - 5. The method of claim 1, further comprising removing transcriptions comprising a uniform resource locator (URL).
  - 6. The method of claim 1, wherein receiving one or more data logs further comprises receiving up to a threshold amount of data logs from one or more applications.
  - 7. The method of claim 1, wherein receiving one or more data logs further comprises receiving an amount of speech queries based on a gender of a user.
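
The removal steps recited in claims 2, 4, and 5 can be sketched as a single filter. This is a minimal illustration, not the patented implementation: the function name `keep_transcription`, the `min_conf` cutoff, and both regular expressions are assumptions chosen for the example, and a "numerical sequence" is assumed here to mean any run of two or more digits.

```python
import re

# Assumed patterns: a URL is anything starting with http(s):// or www.,
# or a bare token ending in a common top-level domain.
URL_RE = re.compile(r"(?:https?://|www\.)\S+|\b[\w-]+\.(?:com|org|net)\b", re.IGNORECASE)
# Assumed definition of a numerical sequence: two or more consecutive digits.
DIGITS_RE = re.compile(r"\d{2,}")


def keep_transcription(text, confidence, min_conf=0.5):
    """Return True if the transcription should stay in the data log."""
    if confidence < min_conf:   # claim 2: remove based on a confidence score
        return False
    if DIGITS_RE.search(text):  # claim 4: remove numerical sequences
        return False
    if URL_RE.search(text):     # claim 5: remove transcriptions with a URL
        return False
    return True


candidates = [
    ("weather today", 0.9),
    ("weather today", 0.2),
    ("call 555 1234", 0.9),
    ("open www.example.com", 0.9),
]
kept = [text for text, conf in candidates if keep_transcription(text, conf)]
```

Applying the checks in one pass keeps the filter cheap; the order of the checks does not change the outcome.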

8. A computer readable medium having stored therein instructions, that when executed by a computing device, cause the computing device to perform functions comprising:
- receiving one or more data logs, wherein the one or more data logs comprise at least one or more recordings of spoken queries;
  
  transcribing the one or more recordings of spoken queries;
  
  identifying within transcriptions of the one or more recordings of spoken queries transcriptions having an occurrence exceeding a threshold, wherein the threshold is based on a comparison of the transcriptions with previous transcribed queries;
  
  processing recordings of spoken queries corresponding to the identified transcriptions using both a language model and an acoustic model;
  
  based on a comparison of the processing using the language model with the processing using the acoustic model, identifying, from the one or more data logs, one or more recordings of spoken queries corresponding to transcriptions deemed to be due to noise and a remainder of the one or more recordings of spoken queries;
  
  generating one or more modified data logs including the remainder of the recordings of spoken queries; and
  
  providing the one or more modified data logs and associated transcriptions of the one or more recordings of spoken queries within the one or more modified data logs as a training data set to update one or more acoustic models for particular languages.
- Dependent claims (9, 10, 11, 12, 13, 14):
- - 9. The computer readable medium of claim 8, wherein the functions further comprise:
    - determining a confidence score based on processing recordings of spoken queries corresponding to the identified transcriptions using both a language model and an acoustic model; and
      
      removing from the one or more data logs given recordings of spoken queries based on the confidence score.
  - 10. The computer readable medium of claim 8, wherein the functions further comprise identifying the transcriptions based also on a length of the transcriptions.
  - 11. The computer readable medium of claim 8, wherein the functions further comprise removing one or more transcriptions of the one or more recordings of spoken queries containing one or more numerical sequences.
  - 12. The computer readable medium of claim 8, wherein the functions further comprise removing transcriptions comprising a uniform resource locator (URL).
  - 13. The computer readable medium of claim 8, wherein the function of receiving one or more data logs further comprises receiving up to a threshold amount of data logs from one or more applications.
  - 14. The computer readable medium of claim 8, wherein the functions further comprise receiving an amount of speech queries based on a gender of a user.

15. A system, comprising:
- at least one processor; and
  
  data storage comprising program instructions executable by the at least one processor to cause the at least one processor to perform functions comprising:
  
  receiving one or more data logs, wherein the one or more data logs comprise at least one or more recordings of spoken queries;
  
  transcribing the one or more recordings of spoken queries;
  
  identifying within transcriptions of the one or more recordings of spoken queries transcriptions having an occurrence exceeding a threshold, wherein the threshold is based on a comparison of the transcriptions with previous transcribed queries;
  
  processing recordings of spoken queries corresponding to the identified transcriptions using both a language model and an acoustic model;
  
  based on a comparison of the processing using the language model with the processing using the acoustic model, identifying, from the one or more data logs, one or more recordings of spoken queries corresponding to transcriptions deemed to be due to noise and a remainder of the one or more recordings of spoken queries;
  
  generating one or more modified data logs including the remainder of the recordings of spoken queries; and
  
  providing the one or more modified data logs and associated transcriptions of the one or more recordings of spoken queries within the one or more modified data logs as a training data set to update one or more acoustic models for particular languages.
- Dependent claims (16, 17, 18, 19, 20):
- - 16. The system of claim 15, wherein the functions further comprise:
    - determining a confidence score based on processing recordings of spoken queries corresponding to the identified transcriptions using both a language model and an acoustic model; and
      
      removing from the one or more data logs given recordings of spoken queries based on the confidence score.
  - 17. The system of claim 15, wherein the functions further comprise identifying the transcriptions based also on a length of the transcriptions.
  - 18. The system of claim 15, wherein the functions further comprise removing one or more transcriptions of the one or more recordings of spoken queries that include one or more numerical sequences.
  - 19. The system of claim 15, wherein the functions further comprise removing transcriptions comprising a uniform resource locator (URL).
  - 20. The system of claim 15, wherein the functions further comprise receiving up to a threshold amount of data logs from one or more applications.
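
Claims 6, 13, and 20 recite receiving up to a threshold amount of data logs from one or more applications. One possible reading, sketched below, is a per-application cap applied as logs arrive. The function name `cap_per_application`, the `(application, recording_id)` pair layout, and the per-application interpretation (rather than a single overall cap) are assumptions made for illustration only.

```python
from collections import defaultdict


def cap_per_application(logs, max_per_app):
    """Keep at most `max_per_app` log entries per application, in arrival order.

    `logs` is an iterable of (application, recording_id) pairs (assumed layout).
    """
    taken = defaultdict(int)
    kept = []
    for app, recording_id in logs:
        if taken[app] < max_per_app:
            taken[app] += 1
            kept.append((app, recording_id))
    return kept


logs = [("search", "a"), ("search", "b"), ("maps", "c"), ("search", "d")]
capped = cap_per_application(logs, max_per_app=2)
```

A cap of this kind would keep any single application from dominating the training set.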

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Weinstein, Eugene; Mengibar, Pedro J. Moreno
Primary Examiner(s): Chawan, Vijay B
Application Number: US13/734,296
Time in Patent Office: 200 Days
Field of Search: 704/235, 704/270.1, 704/270, 704/251, 704/231, 704/277, 704/257, 704/244, 704/255, 704/240, 704/260, 704/275, 704/233, 434/167, 434/308, 715/234
US Class Current: 704/235
CPC Class Codes:
G10L 15/065   Adaptation
G10L 15/32   Multiple recognisers used i...
G10L 2015/0636   Threshold criteria for the ...