Method and apparatus for identifying acoustic background environments based on time and speed to enhance automatic speech recognition

US 8,762,143 B2
Filed: 05/29/2007
Issued: 06/24/2014
Est. Priority Date: 05/29/2007
Status: Active Grant

Abstract

Disclosed are systems, methods, and computer readable media for identifying an acoustic environment of a caller. The method embodiment comprises analyzing acoustic features of a received audio signal from a caller, receiving meta-data information based on a previously recorded time and speed of the caller, classifying a background environment of the caller based on the analyzed acoustic features and the meta-data, selecting an acoustic model matched to the classified background environment from a plurality of acoustic models, and performing speech recognition on the received audio signal using the selected acoustic model.
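
The steps named in the abstract can be sketched as a small pipeline. This is only an illustrative sketch under stated assumptions, not the patented implementation: the feature fields, the thresholds, the rule-based `classify_environment` function, the `ACOUSTIC_MODELS` registry, and the stubbed `recognize` function are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class AcousticFeatures:
    background_energy_db: float  # hypothetical background energy estimate
    snr_db: float                # hypothetical signal-to-noise ratio estimate


@dataclass
class MetaData:
    hour_of_day: int   # previously recorded time (0-23)
    speed_kmh: float   # previously recorded speed of the caller


# Hypothetical registry: one acoustic model name per predefined environment.
ACOUSTIC_MODELS = {
    "office": "am_office",
    "street": "am_street",
    "vehicle": "am_vehicle",
    "home": "am_home",
}


def classify_environment(features: AcousticFeatures, meta: MetaData) -> str:
    """Toy rule-based classifier; every threshold is an assumption."""
    if meta.speed_kmh > 20.0:
        return "vehicle"
    if features.snr_db < 10.0:
        return "street"
    if 9 <= meta.hour_of_day < 17:
        return "office"
    return "home"


def select_acoustic_model(environment: str) -> str:
    """Pick the model generated for the given environment classification."""
    return ACOUSTIC_MODELS[environment]


def recognize(audio: bytes, model_name: str) -> str:
    """Stub standing in for a real speech recognizer."""
    return f"<transcript of {len(audio)} bytes via {model_name}>"
```

A call would chain the three steps: classify the environment, select the matching model, then pass the audio and the model to the recognizer.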

22 Claims

1. A method comprising:
- analyzing acoustic features of a received audio signal from a caller of a communication device;
  
  identifying, based on a previously recorded time and a previously recorded speed of the caller of the communication device, in combination with the acoustic features, a repeating pattern of meta-data associated with the acoustic features;
  
  classifying a background environment of the caller based on the acoustic features and the repeating pattern of meta-data, to yield a background environment classification;
  
  prompting the caller to perform one of: speaking more slowly, speaking more clearly, and moving to a quieter location based on the background environment classification;
  
  selecting an acoustic model matched to the background environment classification from a plurality of acoustic models, each of the plurality of acoustic models being generated for a particular predefined background environment classification;
  
  and performing speech recognition on the received audio signal using the acoustic model.
- - 2. The method of claim 1, wherein the background environment classification comprises one of office, airport, street, vehicle, train and home.
  - 3. The method of claim 2, wherein the background environment is classified based on two levels comprising a first level from the listing of background environments and a second, finer, level based on specific geographic location.
  - 4. The method of claim 3, further comprising:
    - classifying the background environment periodically throughout a call and selecting new acoustic models that match new background environment classifications for the call.
  - 5. The method of claim 4, wherein when a first acoustic model is selected in a call based on the new background environment classifications a second acoustic model is selected to substitute the first, and the method further comprises:
    - starting the second acoustic model with an initial state associated with a previous state where the first acoustic model ended.
  - 6. The method of claim 1, further comprising:
    - classifying a first background environment in a call and thereafter classifying a second background environment; and
      
      transitioning from a first acoustic model associated with the first background environment to a second acoustic model associated with the second background environment by:
      
      starting the second acoustic model at an initial state similar to an ending state of the first acoustic model when the first acoustic model and the second acoustic model have similar structure; and
      
      applying a morphing algorithm to the transition from the first acoustic model to the second acoustic model if the first acoustic model and the second acoustic model have dissimilar structures.
  - 7. The method of claim 6, wherein applying the morphing algorithm further comprises applying the morphing algorithm at a phone level.
  - 8. The method of claim 1, wherein the acoustic features comprise one of estimates of background energy, signal-to-noise ratio, and spectral characteristics of the background environment.
  - 9. The method of claim 1, where the meta-data comprises one of global positioning system coordinates, elevation, automatic number identification information, computing device identification number (comprised of an internet protocol address or MAC address), uniform resource locator address, individual environmental habits, personal profile information, time, and rate of movement.
  - 10. The method of claim 1, wherein the meta-data comprises personal information associated with the caller and comprises probabilities that the caller is in a particular background environment.
  - 11. The method of claim 1, wherein speech recognition is applied to provide speech transcription of the audio signal.
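
Claim 8 names estimates of background energy and signal-to-noise ratio as acoustic features. One simple way such estimates could be computed is sketched below. The estimator is an assumption made for illustration (the claims do not specify one): it treats the quietest fraction of frames as background noise. The function names, the frame length, and the `quiet_fraction` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def estimate_background_and_snr(
    samples: Sequence[float],
    frame_len: int = 160,
    quiet_fraction: float = 0.2,
) -> Tuple[float, float]:
    """Return (background_energy, snr_db).

    Assumption: the quietest `quiet_fraction` of frames contain only
    background noise; the remaining frames contain speech plus noise,
    so the ratio is strictly (speech + noise) / noise.
    """
    energies = sorted(frame_energies(samples, frame_len))
    if len(energies) < 2:
        raise ValueError("need at least two full frames")
    n_quiet = max(1, int(len(energies) * quiet_fraction))
    n_quiet = min(n_quiet, len(energies) - 1)  # keep at least one loud frame
    noise = sum(energies[:n_quiet]) / n_quiet
    loud = sum(energies[n_quiet:]) / (len(energies) - n_quiet)
    eps = 1e-12  # guards against log of zero on digital silence
    snr_db = 10.0 * math.log10((loud + eps) / (noise + eps))
    return noise, snr_db
```

With 8 kHz telephone audio, the default `frame_len` of 160 samples would correspond to 20 ms frames; that sampling rate is also an assumption of this sketch.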

12. A system comprising:
- a processor;
  
  and a computer-readable storage medium having instructions stored which, when executed by the processor, result in the processor performing operations comprising:
  
  analyzing acoustic features of a received audio signal from a caller of a communication device;
  
  identifying, based on a previously recorded time and a previously recorded speed of the caller of the communication device, in combination with the acoustic features, a repeating pattern of meta-data associated with the acoustic features;
  
  classifying a background environment of the caller based on the acoustic features and the repeating pattern of meta-data to yield a background environment classification;
  
  prompting the caller to perform one of: speaking more slowly, speaking more clearly, and moving to a quieter location based on the background environment classification;
  
  selecting an acoustic model matched to the background environment classification from a plurality of acoustic models, each of the plurality of acoustic models being generated for a particular background environment; and
  
  performing speech recognition on the received audio signal using the acoustic model.
- - 13. The system of claim 12, wherein the background environment classification comprises one of office, airport, street, vehicle, train and home.
  - 14. The system of claim 13, wherein the background environment is classified based on two levels comprising a first level from the listing of background environments in claim 13 and a second, finer, level based on specific geographic location.
  - 15. The system of claim 14, the computer-readable storage medium having additional instructions stored which result in the operations further comprising:
    - classifying the background environment periodically throughout a call and selecting new acoustic models that match new background environment classifications for the call.
  - 16. The system of claim 15, wherein when a first acoustic model is selected in a call based on the new background environment classifications a second acoustic model is selected to substitute the first, the computer-readable storage medium has additional instructions stored which result in the operations further comprising:
    - starting the second acoustic model with an initial state associated with a previous state where the first acoustic model ended.
  - 17. The system of claim 12, the computer-readable storage medium having additional instructions which result in the operations further comprising:
    - classifying a first background environment in a call and thereafter classifying a second background environment;
      
      transitioning from a first acoustic model associated with the first background environment to a second acoustic model associated with the second background environment by:
      
      starting the second acoustic model at an initial state similar to an ending state of the first acoustic model when the first acoustic model and the second acoustic model have similar structure; and
      
      applying a morphing algorithm to the transition from the first acoustic model to the second acoustic model when the first acoustic model and the second acoustic model have dissimilar structures.
  - 18. The system of claim 17, wherein applying the morphing algorithm further comprises applying the morphing algorithm at a phone level.
  - 19. The system of claim 12, wherein the acoustic features comprise one of estimates of background energy, signal-to-noise ratio, and spectral characteristics of the background environment.
  - 20. The system of claim 12, where the meta-data comprises one of global positioning system coordinates, elevation, automatic number identification information, computing device identification number (such as an internet protocol address or MAC address), uniform resource locator address, individual environmental habits, personal profile information, time, and rate of movement.
  - 21. The system of claim 12, wherein the meta-data comprises personal information associated with the caller and comprises probabilities that the caller is in a particular background environment.
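
The model transition described in claims 6 and 17 (and the phone-level morphing of claims 7 and 18) can be illustrated with a rough sketch. The claims do not define the morphing algorithm, so the `morph_state` function below is a hypothetical stand-in that maps the ending state to the first state of the same phone in the second model. The `AcousticModel` class and the `phone_index` state-label convention are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AcousticModel:
    name: str
    states: Tuple[str, ...]  # hypothetical labels of the form "<phone>_<index>"


def same_structure(first: AcousticModel, second: AcousticModel) -> bool:
    """Assumption: two models are 'similar' when their state labels match."""
    return first.states == second.states


def morph_state(second: AcousticModel, ending_state: str) -> str:
    """Placeholder for a phone-level morphing algorithm (not from the patent).

    Maps the ending state to the first state of the same phone in the
    second model, falling back to the second model's first state.
    """
    phone = ending_state.split("_")[0]
    for state in second.states:
        if state.split("_")[0] == phone:
            return state
    return second.states[0]


def initial_state_after_switch(
    first: AcousticModel, second: AcousticModel, ending_state: str
) -> str:
    """Choose where the second model starts after a mid-call switch."""
    if ending_state not in first.states:
        raise ValueError(f"{ending_state!r} is not a state of {first.name}")
    if same_structure(first, second):
        # Similar structure: resume at the state where the first model ended.
        return ending_state
    # Dissimilar structure: defer to the (placeholder) morphing step.
    return morph_state(second, ending_state)
```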

22. A computer readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
- analyzing acoustic features of a received audio signal from a caller of a communication device;
  
  identifying, based on a previously recorded time and a previously recorded speed of the caller of the communication device, in combination with the acoustic features, a repeating pattern of meta-data associated with the acoustic features;
  
  classifying a background environment of the caller based on the acoustic features and the repeating pattern of meta-data, to yield a background environment classification;
  
  prompting the caller to perform one of: speaking more slowly, speaking more clearly, and moving to a quieter location based on the background environment classification;
  
  selecting an acoustic model matched to the background environment classification from a plurality of acoustic models, each of the plurality of acoustic models being generated for a particular background environment classification;
  
  and performing speech recognition on the received audio signal using the acoustic model.
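
The "repeating pattern of meta-data" based on previously recorded time and speed, together with the environment probabilities mentioned in claims 10 and 21, might be approximated by counting past observations. The following is a minimal sketch under assumptions: the record layout, the speed buckets and their thresholds, and the function names are all hypothetical and do not come from the patent.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# Hypothetical record: (hour_of_day, speed_kmh, environment label)
Record = Tuple[int, float, str]
Pattern = DefaultDict[Tuple[int, str], Counter]


def speed_bucket(speed_kmh: float) -> str:
    """Coarse speed buckets; the thresholds are assumptions."""
    if speed_kmh < 2.0:
        return "stationary"
    if speed_kmh < 10.0:
        return "walking"
    return "riding"


def build_pattern(history: Iterable[Record]) -> Pattern:
    """Count environments observed for each (hour, speed bucket) pair."""
    pattern: Pattern = defaultdict(Counter)
    for hour, speed, env in history:
        pattern[(hour, speed_bucket(speed))][env] += 1
    return pattern


def environment_prior(
    pattern: Pattern, hour: int, speed_kmh: float
) -> Dict[str, float]:
    """Empirical probability of each environment for this time and speed.

    Returns an empty dict when the history holds no matching records.
    """
    counts = pattern.get((hour, speed_bucket(speed_kmh)))
    if not counts:
        return {}
    total = sum(counts.values())
    return {env: n / total for env, n in counts.items()}
```

A classifier could combine such a prior with the acoustic features, for example by weighting per-environment acoustic scores; how the two sources are combined is left open here.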

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property II LP (AT&T, Inc.)
Inventors: Gilbert, Mazin
Primary Examiner(s): Kazeminezhad, Farzad
Application Number: US11/754,814
Publication Number: US 20080300871A1
Time in Patent Office: 2,583 Days
Field of Search: 704/246, 704/270.1, 704/278
US Class Current: 704/233

CPC Class Codes

G10L 15/065   Adaptation
G10L 15/07   to the speaker
G10L 15/08   Speech classification or se...
G10L 15/20   Speech recognition techniqu...
G10L 15/26   Speech to text systems G10L...
G10L 15/30   Distributed recognition, e....
G10L 21/0216   characterised by the method...
G11B 27/034   on discs G11B27/036, G11B27...
