Realtime acoustic adaptation using stability measures

US 8,515,750 B1
Filed: 09/19/2012
Issued: 08/20/2013
Est. Priority Date: 06/05/2012
Status: Active Grant

Abstract

Methods, systems, and computer programs encoded on a computer storage medium for real-time acoustic adaptation using stability measures are disclosed. The methods include the actions of receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile. The actions further include receiving a stability measure for a segment of the transcription and determining that the stability measure for the segment satisfies a threshold. Additionally, the actions include triggering an update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment. And the actions include receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile.
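The flow described in the abstract can be sketched as a loop that decodes each portion with the current profile and updates the profile whenever a segment is stable enough. This is a minimal illustration, not the patented implementation: `Segment`, `SpeakerProfile`, `transcribe_session`, the `decode` callback, and the default threshold of 0.9 are all hypothetical names and values, and "satisfies a threshold" is assumed here to mean "greater than or equal to".

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Segment:
    text: str         # a word, sub-word, or group of words
    stability: float  # assumed here to be a probability in [0, 1]


@dataclass
class SpeakerProfile:
    # Hypothetical stand-in for a real profile: it only records which
    # segments were used for adaptation.
    adapted_on: List[str] = field(default_factory=list)

    def update(self, segment: Segment) -> None:
        self.adapted_on.append(segment.text)


# A decoder takes a portion of speech and the current profile and
# returns the segments of its transcription (assumed interface).
Decoder = Callable[[str, SpeakerProfile], List[Segment]]


def transcribe_session(
    portions: Iterable[str],
    decode: Decoder,
    profile: SpeakerProfile,
    threshold: float = 0.9,  # hypothetical value
) -> List[str]:
    """Decode each portion with the profile as it stands at that moment."""
    transcriptions = []
    for portion in portions:
        segments = decode(portion, profile)
        transcriptions.append(" ".join(s.text for s in segments))
        # Stable segments trigger a profile update before the next portion
        # is decoded, so later portions use the updated profile.
        for seg in segments:
            if seg.stability >= threshold:
                profile.update(seg)
    return transcriptions
```

Because the update happens inside the session loop, the second portion is decoded with a profile that already reflects stable segments from the first portion, which is the behavior the abstract describes.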

20 Claims

1. A computer-implemented method comprising:
- receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile;
  
  receiving a stability measure for a segment of the transcription;
  
  determining that the stability measure for the segment satisfies a threshold;
  
  in response to determining that the stability measure for the segment satisfies the threshold, triggering a real-time update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment;
  
  receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile; and
  
  outputting a set of transcriptions comprising the transcription of the first portion of the speech session and the transcription of the second portion of the speech session.
  - 2. The method of claim 1, wherein the segment comprises a word, sub-word, or group of words.
  - 3. The method of claim 1, wherein the speech session is longer than 1 minute in duration.
  - 4. The method of claim 1, wherein the stability measure is based on one or more of an age metric, a right context metric, and a regression.
  - 5. The method of claim 1, wherein triggering an update of the speaker adaptation profile comprises adding the segment to an adaptation queue.
  - 6. The method of claim 1, further comprising:
    - receiving the first portion of the speech session; and
      
      decoding the first portion of the speech session to generate the transcription of the first portion of the speech session using the speaker adaptation profile.
  - 7. The method of claim 1, further comprising determining the stability measure of the segment.
  - 8. The method of claim 1, further comprising:
    - receiving the second portion of the speech session; and
      
      decoding the second portion of the speech session to generate the transcription of the second portion of the speech session using the updated speaker adaptation profile.
  - 9. The method of claim 1, further comprising updating the speaker adaptation profile.
  - 10. The method of claim 9, wherein updating the speaker adaptation profile comprises modifying an acoustic model.
  - 11. The method of claim 1, wherein the stability measure represents a probability.
  - 12. The method of claim 1, wherein the speech session comprises one utterance.
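Claims 4, 5, and 11 together suggest a stability measure that is a probability computed by a regression over an age metric and a right context metric, with stable segments added to an adaptation queue. The sketch below is one possible reading, under stated assumptions: the claims do not define the metrics, so age is taken to be how long a segment has stayed unchanged across partial results and right context to be how much audio has been decoded after it. The logistic form, the coefficients, and the names `SegmentFeatures`, `stability`, and `enqueue_stable` are hypothetical.

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable


@dataclass
class SegmentFeatures:
    text: str
    age_s: float            # assumed: seconds the segment has stayed unchanged
    right_context_s: float  # assumed: seconds of audio decoded after the segment


# Hypothetical coefficients; a real system would fit these on labelled data.
BIAS, W_AGE, W_RIGHT = -3.0, 2.0, 1.5


def stability(f: SegmentFeatures) -> float:
    """Logistic regression over the two metrics, giving a value in (0, 1)."""
    z = BIAS + W_AGE * f.age_s + W_RIGHT * f.right_context_s
    return 1.0 / (1.0 + math.exp(-z))


def enqueue_stable(
    features: Iterable[SegmentFeatures],
    queue: Deque[str],
    threshold: float = 0.9,  # hypothetical value
) -> Deque[str]:
    """Append every segment whose stability meets the threshold to the queue."""
    for f in features:
        if stability(f) >= threshold:
            queue.append(f.text)
    return queue
```

A queue decouples the stability decision from the profile update itself, so a separate worker could drain it while decoding continues.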

13. A computer-implemented method comprising:
- receiving a first portion of a speech session;
  
  decoding the first portion of the speech session to generate a transcription of the first portion of the speech session using a speaker adaptation profile;
  
  identifying a segment of the transcription of the first portion of the speech session;
  
  determining a stability measure of the segment;
  
  determining that the stability measure for the segment satisfies a threshold;
  
  in response to determining that the stability measure for the segment satisfies the threshold, triggering real-time update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment;
  
  receiving a second portion of the speech session;
  
  decoding the second portion of the speech session to generate a transcription of the second portion of the speech session using the updated speaker adaptation profile; and
  
  outputting a set of transcriptions comprising the transcription of the first portion of the speech session and the transcription of the second portion of the speech session.

14. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile;
  
  receiving a stability measure for a segment of the transcription;
  
  determining that the stability measure for the segment satisfies a threshold;
  
  in response to determining that the stability measure for the segment satisfies the threshold, triggering a real-time update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment;
  
  receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile; and
  
  outputting a set of transcriptions comprising the transcription of the first portion of the speech session and the transcription of the second portion of the speech session.
  - 15. The system of claim 14, wherein the speech session is longer than 1 minute in duration.
  - 16. The system of claim 14, wherein the stability measure is based on one or more of an age metric, a right context metric, and a regression.
  - 17. The system of claim 14, further comprising updating the speaker adaptation profile.
  - 18. The system of claim 17, wherein updating the speaker adaptation profile comprises modifying an acoustic model.
  - 19. The system of claim 14, wherein the speech session is one utterance.
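Claims 10 and 18 say only that updating the profile "comprises modifying an acoustic model"; they do not say how. As a familiar example that is swapped in here and not taken from the claims, the sketch below applies a MAP-style update to a single Gaussian mean, interpolating between a speaker-independent prior mean and the feature frames aligned to a stable segment. The function name `map_adapt_mean` and the prior weight `tau` are hypothetical.

```python
from typing import List, Sequence


def map_adapt_mean(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    tau: float = 10.0,  # hypothetical prior weight; larger values trust the prior more
) -> List[float]:
    """Return (tau * prior_mean + sum(frames)) / (tau + n), per dimension."""
    n = len(frames)
    if n == 0:
        # No adaptation data: keep the prior unchanged.
        return list(prior_mean)
    dim = len(prior_mean)
    sums = [sum(frame[d] for frame in frames) for d in range(dim)]
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]
```

With few frames the result stays close to the prior, and it moves toward the speaker's own data as more stable segments arrive, which suits incremental updates within a session.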

20. A computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving a first portion of a speech session;
  
  decoding the first portion of the speech session to generate a transcription of the first portion of the speech session using a speaker adaptation profile;
  
  identifying a segment of the transcription of the first portion of the speech session;
  
  determining a stability measure of the segment;
  
  determining that the stability measure for the segment satisfies a threshold;
  
  in response to determining that the stability measure for the segment satisfies the threshold, triggering real-time update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment;
  
  receiving a second portion of the speech session;
  
  decoding the second portion of the speech session to generate a transcription of the second portion of the speech session using the updated speaker adaptation profile; and
  
  outputting a set of transcriptions comprising the transcription of the first portion of the speech session and the transcription of the second portion of the speech session.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Lei, Xin; Aleksic, Petar
Primary Examiner(s): Chawan, Vijay B
Application Number: US13/622,576
Time in Patent Office: 335 Days
Field of Search: 704/270.1, 704/251, 704/235, 704/243, 704/244, 704/260, 704/270, 704/211, 704/231, 704/254, 704/245, 704/246, 704/233, 704/236, 379/88.02, 379/88.01, 709/231, 709/224
US Class Current: 704/235
CPC Class Codes:
- G10L 15/07   to the speaker
- G10L 15/26   Speech to text systems G10L...
- G10L 17/14   Use of phonemic categorisat...
