Realtime acoustic adaptation using stability measures
Abstract
Methods, systems, and computer programs encoded on a computer storage medium for real-time acoustic adaptation using stability measures are disclosed. The methods include the actions of receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile. The actions further include receiving a stability measure for a segment of the transcription and determining that the stability measure for the segment satisfies a threshold. Additionally, the actions include triggering an update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment. Finally, the actions include receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile.
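The abstract leaves the form of the speaker adaptation profile open. As one hedged illustration, assume the profile is a running per-dimension feature mean used for cepstral mean normalization (CMN); "updating the profile using a portion of speech data that corresponds to the segment" then means folding that segment's feature frames into the mean. The class `SpeakerAdaptationProfile` and its methods are hypothetical names chosen for this sketch, not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class SpeakerAdaptationProfile:
    """Hypothetical profile: a running per-dimension feature mean, used for
    cepstral mean normalization (CMN). Real systems may use other forms,
    such as fMLLR transforms or speaker embeddings."""
    dim: int
    frame_count: int = 0
    mean: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start from a zero mean if no prior statistics were supplied.
        if not self.mean:
            self.mean = [0.0] * self.dim

    def update(self, frames: Sequence[Sequence[float]]) -> None:
        """Fold new feature frames into the running mean (incremental average)."""
        for frame in frames:
            if len(frame) != self.dim:
                raise ValueError("frame dimension does not match profile")
            self.frame_count += 1
            for i, x in enumerate(frame):
                self.mean[i] += (x - self.mean[i]) / self.frame_count

    def normalize(self, frames: Sequence[Sequence[float]]) -> List[List[float]]:
        """Apply the profile: subtract the speaker mean from each frame."""
        return [[x - m for x, m in zip(frame, self.mean)] for frame in frames]


# Example: two 2-dimensional frames taken from a stable segment.
profile = SpeakerAdaptationProfile(dim=2)
profile.update([[1.0, 2.0], [3.0, 4.0]])
print(profile.mean)                      # [2.0, 3.0]
print(profile.normalize([[2.0, 5.0]]))   # [[0.0, 2.0]]
```

The incremental-average update lets the profile absorb new segments one at a time during the session without storing all earlier frames, which suits a real-time setting.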
20 Claims
1. A computer-implemented method comprising:
receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile;
receiving a stability measure for a segment of the transcription;
determining that the stability measure for the segment satisfies a threshold;
in response to determining that the stability measure for the segment satisfies the threshold, triggering a real-time update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment;
receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile; and
outputting a set of transcriptions comprising the transcription of the first portion of the speech session and the transcription of the second portion of the speech session.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
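The control flow of claim 1 can be sketched as follows, under several stated assumptions: the decoder is a hypothetical callable returning a transcription plus segments that already carry a stability measure and their aligned speech frames; "satisfies a threshold" is read as `>=`; and the profile is a toy object that merely accumulates adaptation frames and counts updates. `Segment`, `Profile`, `adapt_on_stable_segments`, `run_session`, and `fake_decode` are all illustrative names, not from the source.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Frames = List[List[float]]


@dataclass
class Segment:
    text: str
    stability: float  # assumed to lie in [0, 1]; higher means more stable
    frames: Frames    # speech data aligned to this segment


@dataclass
class Profile:
    """Hypothetical adaptation profile that simply accumulates adaptation data."""
    adaptation_frames: Frames = field(default_factory=list)
    version: int = 0


# Hypothetical decoder interface: (audio frames, profile) -> (text, segments).
Decoder = Callable[[Frames, Profile], Tuple[str, List[Segment]]]


def adapt_on_stable_segments(profile: Profile, segments: Sequence[Segment],
                             threshold: float) -> int:
    """Update the profile with each segment whose stability satisfies the
    threshold ("satisfies" is assumed to mean >=). Returns segments used."""
    used = 0
    for seg in segments:
        if seg.stability >= threshold:
            profile.adaptation_frames.extend(seg.frames)
            profile.version += 1
            used += 1
    return used


def run_session(first: Frames, second: Frames, decode: Decoder,
                profile: Profile, threshold: float = 0.8) -> List[str]:
    first_text, first_segments = decode(first, profile)   # uses initial profile
    adapt_on_stable_segments(profile, first_segments, threshold)
    second_text, _ = decode(second, profile)              # uses updated profile
    return [first_text, second_text]                      # the output set


# Toy decoder for illustration: reports which profile version it used.
def fake_decode(frames: Frames, profile: Profile) -> Tuple[str, List[Segment]]:
    segments = [Segment("hello", 0.9, frames[:1]), Segment("wurld", 0.4, frames[1:])]
    return f"v{profile.version}:{len(frames)}", segments


profile = Profile()
print(run_session([[0.1], [0.2]], [[0.3]], fake_decode, profile))  # ['v0:2', 'v1:1']
```

Only the stable segment ("hello", stability 0.9) reaches the profile; the unstable one ("wurld", 0.4) is skipped, so a likely misrecognition does not contaminate the adaptation data. In a streaming system the update would presumably run while audio keeps arriving rather than strictly between two calls as shown here.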
13. A computer-implemented method comprising:
receiving a first portion of a speech session;
decoding the first portion of the speech session to generate a transcription of the first portion of the speech session using a speaker adaptation profile;
identifying a segment of the transcription of the first portion of the speech session;
determining a stability measure of the segment;
determining that the stability measure for the segment satisfies a threshold;
in response to determining that the stability measure for the segment satisfies the threshold, triggering real-time update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment;
receiving a second portion of the speech session;
decoding the second portion of the speech session to generate a transcription of the second portion of the speech session using the updated speaker adaptation profile; and
outputting a set of transcriptions comprising the transcription of the first portion of the speech session and the transcription of the second portion of the speech session.
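Claim 13 adds the steps of identifying a segment and determining its stability measure, but the claim text does not define the measure. One plausible reading, offered here only as an assumption, draws on how streaming recognizers revise partial hypotheses: a word is stable if it keeps appearing unchanged across successive partial results. The sketch scores each word by that agreement rate, groups consecutive words that meet the threshold into segments, and takes a segment's stability to be the minimum of its word scores. The functions `word_stability` and `stable_segments` are hypothetical.

```python
from typing import List, Sequence, Tuple


def word_stability(partials: Sequence[Sequence[str]]) -> List[float]:
    """For each word of the latest partial hypothesis, return the fraction of
    partial hypotheses in which the same word appeared at the same position."""
    if not partials:
        return []
    latest = partials[-1]
    scores = []
    for i, word in enumerate(latest):
        agree = sum(1 for hyp in partials if i < len(hyp) and hyp[i] == word)
        scores.append(agree / len(partials))
    return scores


def stable_segments(partials: Sequence[Sequence[str]],
                    threshold: float) -> List[Tuple[int, int, float]]:
    """Identify maximal runs of words whose stability satisfies the threshold.
    Each segment is (start, end_exclusive, stability), where the segment's
    stability measure is taken to be the minimum of its word scores."""
    scores = word_stability(partials)
    segments = []
    start = None
    for i, s in enumerate(scores + [float("-inf")]):  # sentinel closes a final run
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            segments.append((start, i, min(scores[start:i])))
            start = None
    return segments


partials = [["the"], ["the", "cat"], ["the", "cap", "sat"], ["the", "cat", "sat"]]
print(word_stability(partials))         # [1.0, 0.5, 0.5]
print(stable_segments(partials, 0.75))  # [(0, 1, 1.0)]
```

Comparing words by position is a deliberate simplification; a production system might instead align hypotheses by time stamps or derive stability from decoder confidence scores.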
14. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile;
receiving a stability measure for a segment of the transcription;
determining that the stability measure for the segment satisfies a threshold;
in response to determining that the stability measure for the segment satisfies the threshold, triggering a real-time update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment;
receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile; and
outputting a set of transcriptions comprising the transcription of the first portion of the speech session and the transcription of the second portion of the speech session.
Dependent claims: 15, 16, 17, 18, 19
20. A computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving a first portion of a speech session;
decoding the first portion of the speech session to generate a transcription of the first portion of the speech session using a speaker adaptation profile;
identifying a segment of the transcription of the first portion of the speech session;
determining a stability measure of the segment;
determining that the stability measure for the segment satisfies a threshold;
in response to determining that the stability measure for the segment satisfies the threshold, triggering real-time update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment;
receiving a second portion of the speech session;
decoding the second portion of the speech session to generate a transcription of the second portion of the speech session using the updated speaker adaptation profile; and
outputting a set of transcriptions comprising the transcription of the first portion of the speech session and the transcription of the second portion of the speech session.
Specification