Service oriented speech recognition for in-vehicle automated interaction and in-vehicle user interfaces requiring minimal cognitive driver processing for same

US 9,224,394 B2
Filed: 03/23/2010
Issued: 12/29/2015
Est. Priority Date: 03/24/2009
Status: Active Grant

8 Assignments

0 Petitions

Abstract

A system and method for implementing a server-based speech recognition system for multi-modal automated interaction in a vehicle includes receiving, by a vehicle driver, audio prompts by an on-board human-to-machine interface and a response with speech to complete tasks such as creating and sending text messages, web browsing, navigation, etc. This service-oriented architecture is utilized to call upon specialized speech recognizers in an adaptive fashion. The human-to-machine interface enables completion of a text input task while driving a vehicle in a way that minimizes the frequency of the driver's visual and mechanical interactions with the interface, thereby eliminating unsafe distractions during driving conditions. After the initial prompting, the typing task is followed by a computerized verbalization of the text. Subsequent interface steps can be visual in nature, or involve only sound.

39 Citations

29 Claims

1. A method for implementing an interactive automated system, comprising:
- using a telematics processing system located in proximity to a person, receiving in a first interaction an indication of intent from the person and, thereafter, in a second interaction that is separate from the first interaction, receiving from the person a spoken utterance associated with the indicated intent, wherein the spoken utterance associated with the indicated intent matches intended text of a type of intent available to the person;
  
  processing the spoken utterance of the second interaction using the processing system;
  
  transmitting the processed speech information to a remote data center using a wireless link;
  
  analyzing the transmitted processed speech information to scale and end-point the speech utterance;
  
  based upon the indicated intent, selecting at least one optimal speech recognition engine from a set of speech recognition engines;
  
  converting the analyzed speech information into packet data format;
  
  using an internet-protocol transport network, transporting the packet speech information to the selected at least one optimal speech recognition engine for translation of the converted speech information into text format;
  
  retrieving recognition results and an associated confidence score from the selected at least one optimal speech recognition engine;
  
  continuing an automated dialog with the person if the confidence score meets or exceeds a predetermined threshold for the best match; and
  
  selecting at least one alternative speech recognition engine to translate the converted speech information into text format if the confidence score is low such that it is below a predetermined threshold for the best match.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
  - 2. The method of claim 1, wherein the at least one alternative speech recognition engine is agent-assisted.
  - 3. The method of claim 1, wherein the selected at least one optimal speech recognition engine is not local.
  - 4. The method of claim 1, wherein the automated dialog is continued with the person prior to, or subsequent to, receiving the recognition results in an asynchronous manner.
  - 5. The method of claim 1, wherein the automated dialog is continued with the person subsequent to receiving the recognition results in a synchronous manner.
  - 6. The method of claim 1, further comprising logging the packet data and recognition results for subsequent analysis.
  - 7. The method of claim 1, wherein the processing system is located on-board a vehicle.
  - 8. The method of claim 7, wherein vehicle location information is also transported with the packet speech information to the selected at least one optimal speech recognition engine.
  - 9. The method of claim 8, further comprising logging the vehicle location information for subsequent analysis.
  - 10. The method of claim 1, wherein the intent includes at least one of:
    - texting;
    - browsing;
    - navigation; and
    - social networking.
  - 11. The method of claim 1, wherein the indication of intent is non-verbal.
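
The selection and fallback steps of claim 1 can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Engine` callable type, the `recognize` function, the registry layout, the stub engines, and the threshold value of 0.6 (the claim says only that the threshold is "predetermined").

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical type: an "engine" is any callable mapping audio bytes to
# (text, confidence in [0, 1]). Real engines would be remote services.
Engine = Callable[[bytes], Tuple[str, float]]


@dataclass
class RecognitionOutcome:
    text: str
    confidence: float
    engine_name: str
    accepted: bool  # True if the confidence met or exceeded the threshold


def recognize(
    intent: str,
    audio: bytes,
    engines_by_intent: Dict[str, List[Tuple[str, Engine]]],
    threshold: float = 0.6,  # assumed value, not taken from the patent
) -> RecognitionOutcome:
    """Try the engines registered for an intent, in order of preference."""
    candidates = engines_by_intent.get(intent)
    if not candidates:
        raise ValueError(f"no engines registered for intent {intent!r}")

    best: Optional[RecognitionOutcome] = None
    for name, engine in candidates:
        text, score = engine(audio)
        outcome = RecognitionOutcome(text, score, name, score >= threshold)
        if outcome.accepted:
            return outcome  # the automated dialog can continue
        if best is None or score > best.confidence:
            best = outcome  # remember the best low-confidence result
    return best  # every engine fell below the threshold


def navigation_engine(audio: bytes) -> Tuple[str, float]:
    return ("one two three main street", 0.42)  # stub: low confidence


def agent_assisted_engine(audio: bytes) -> Tuple[str, float]:
    return ("123 Main Street", 0.97)  # stub for an agent-assisted transcription


REGISTRY = {
    "navigation": [
        ("navigation-asr", navigation_engine),
        ("agent-assisted", agent_assisted_engine),
    ],
}

result = recognize("navigation", b"\x00\x01", REGISTRY)
print(result.engine_name, result.text)
```

In this sketch the first engine for the "navigation" intent scores below the threshold, so the alternative engine is consulted, loosely mirroring the agent-assisted fallback of claim 2. A real system would also have to decide what the dialog does when no engine reaches the threshold, which the sketch leaves to the caller via the `accepted` flag.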

12. A method for implementing an interactive automated system, comprising:
- using a telematics processing system located on-board a vehicle, receiving in a first interaction an indication of intent from a vehicle driver and, thereafter, in a second interaction that is separate from the first interaction, receiving from the vehicle driver a spoken utterance associated with the indicated intent wherein the spoken utterance associated with the indicated intent matches intended text of a type of intent available to the vehicle driver;
  
  processing the spoken utterance of the second interaction using the processing system;
  
  transmitting the processed speech information to a remote data center using a wireless link;
  
  analyzing the transmitted processed speech information to scale and end-point the speech utterance;
  
  based upon the indicated intent, selecting at least one optimal speech recognition engine from a set of speech recognition engines;
  
  converting the analyzed speech information into packet data format;
  
  using an internet-protocol transport network, transporting the packet speech information and vehicle location information to the selected at least one optimal speech recognition for translation of the converted speech information into text format;
  
  retrieving recognition results and an associated confidence score from the selected at least one optimal speech recognition engine;
  
  continuing an automated dialog with the vehicle driver if the confidence score meets or exceeds a pre-determined threshold for the best match; and
  
  selecting at least one alternative speech recognition engine that is agent-assisted to translate the converted speech information into text format if the confidence score is low such that it is below a pre-determined threshold for the best match.
- Dependent Claims (13, 14, 15, 16, 17, 18)
  - 13. The method of claim 12, wherein the selected at least one optimal speech recognition engine is not local.
  - 14. The method of claim 12, wherein the automated dialog is continued with the vehicle driver prior to, or subsequent to, receiving the recognition results in an asynchronous manner.
  - 15. The method of claim 12, wherein the automated dialog is continued with the vehicle driver subsequent to receiving the recognition results in a synchronous manner.
  - 16. The method of claim 12, wherein the intent includes at least one of:
    - texting;
    - browsing;
    - navigation; and
    - social networking.
  - 17. The method of claim 12, further comprising logging the packet data, recognition results, and vehicle location information for subsequent analysis.
  - 18. The method of claim 12, wherein the indication of intent is non-verbal.
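
Claim 12 transports packetized speech together with vehicle location information, but it does not define a wire format. The following sketch shows one possible framing under stated assumptions: the 8-byte header layout, the 320-byte payload size, the JSON metadata record, and the function names `packetize`, `reassemble`, and `session_metadata` are all hypothetical.

```python
import json
import struct
from typing import Iterator, List

# Assumed header: session id (uint16), sequence number (uint32),
# payload length (uint16), all in network byte order.
HEADER = struct.Struct("!HIH")


def packetize(session_id: int, audio: bytes, payload_size: int = 320) -> Iterator[bytes]:
    """Split raw audio bytes into sequence-numbered packets."""
    for seq, start in enumerate(range(0, len(audio), payload_size)):
        payload = audio[start:start + payload_size]
        yield HEADER.pack(session_id, seq, len(payload)) + payload


def reassemble(packets: List[bytes]) -> bytes:
    """Restore the audio from packets that may arrive out of order."""
    chunks = []
    for packet in packets:
        _session_id, seq, length = HEADER.unpack_from(packet)
        chunks.append((seq, packet[HEADER.size:HEADER.size + length]))
    return b"".join(payload for _, payload in sorted(chunks))


def session_metadata(session_id: int, intent: str, lat: float, lon: float) -> bytes:
    """Encode the intent and vehicle location that accompany the speech packets."""
    record = {"session": session_id, "intent": intent, "lat": lat, "lon": lon}
    return json.dumps(record, sort_keys=True).encode("utf-8")


audio = bytes(range(256)) * 5              # 1280 bytes of placeholder audio
packets = list(packetize(7, audio))        # four 320-byte payloads
meta = session_metadata(7, "navigation", 32.78, -96.80)

assert reassemble(list(reversed(packets))) == audio
print(len(packets), json.loads(meta)["intent"])
```

Sequence numbers let the receiving side restore ordering over a transport that does not guarantee it. Keeping location in a separate metadata record also makes it straightforward to log alongside the recognition results, as claim 17 describes.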

19. An interactive automated speech recognition system, comprising:
- a telematics processing system located in proximity to a person;
  
  a remote data center;
  
  a wireless link that transmits processed speech information from the processing system to the remote data center, wherein the processing system;
  
  receives in a first interaction an indication of intent from the person and, thereafter, in a second interaction that is separate from the first interaction, receives from the person a spoken utterance associated with the indicated intent wherein the spoken utterance associated with the indicated intent matches intended text of a type of intent available to the person; and
  
  processes the spoken utterance of the second interaction and transmits the processed speech information to the remote data center using the wireless link, wherein the transmitted processed speech information is analyzed to scale and end-point the speech utterance and is converted into packet data format;
  
  at least one optimal speech recognition engine selected to translate the converted speech information into text format, the at least one optimal speech recognition engine being selected from a set of speech recognition engines based upon the indicated intent;
  
  an internet protocol transport network that transports the converted speech information to the selected at least one optimal speech recognition engine; and
  
  wherein the at least one optimal speech recognition engine produces recognition results and an associated confidence score and, based upon the confidence score;
  
  an automated dialog is continued with the person if the confidence score meets or exceeds a pre-determined threshold for the best match;
  
  or at least one alternative speech recognition engine is selected to translate the converted speech information into text format if the confidence score is low such that it is below a pre-determined threshold for the best match.
- Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27, 28, 29)
  - 20. The system of claim 19, wherein the at least one alternative speech recognition engine is agent-assisted.
  - 21. The system of claim 19, wherein the selected at least one optimal speech recognition engine is not local.
  - 22. The system of claim 19, wherein the automated dialog is continued with the person prior to, or subsequent to, receiving the recognition results in an asynchronous manner.
  - 23. The system of claim 19, wherein the automated dialog is continued with the person subsequent to receiving the recognition results in a synchronous manner.
  - 24. The system of claim 19, wherein the packet data and recognition results are logged for subsequent analysis.
  - 25. The system of claim 19, wherein the processing system is located on-board a vehicle.
  - 26. The system of claim 25, wherein vehicle location information is also transported with the packet speech information to the selected at least one optimal speech recognition engine.
  - 27. The system of claim 26, wherein the vehicle location information is logged for subsequent analysis.
  - 28. The system of claim 19, wherein the intent includes at least one of:
    - texting;
    - browsing;
    - navigation; and
    - social networking.
  - 29. The system of claim 19, wherein the indication of intent is non-verbal.
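
The claims state that the transmitted speech is analyzed "to scale and end-point" the utterance without saying how. As a rough illustration only, the sketch below uses peak normalization for scaling and a frame-energy threshold for end-pointing. Both techniques, the function names, and the numeric defaults (frame size 160, energy threshold 0.01, target peak 0.9) are assumptions and are not drawn from the patent.

```python
from typing import List, Optional, Tuple


def scale_to_peak(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Apply a single gain so the loudest sample reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def find_endpoints(
    samples: List[float],
    frame_size: int = 160,           # assumed frame length in samples
    energy_threshold: float = 0.01,  # assumed mean-square energy threshold
) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices spanning the active frames, or None."""
    active = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(s * s for s in frame) / len(frame)
        if energy >= energy_threshold:
            active.append(start)
    if not active:
        return None
    return active[0], min(active[-1] + frame_size, len(samples))


# Synthetic utterance: 320 silent samples, 480 "speech" samples, 160 silent.
speech = [0.25 if i % 2 == 0 else -0.25 for i in range(480)]
signal = [0.0] * 320 + speech + [0.0] * 160

scaled = scale_to_peak(signal)
endpoints = find_endpoints(scaled)
print(endpoints)
```

Trimming leading and trailing silence before recognition reduces the amount of audio sent to the engines. Production end-pointers typically add hangover time and noise-floor tracking, which this sketch omits.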

Current Assignee: Sirius XM Connected Vehicle Services Inc. (Liberty Media Corporation)
Original Assignee: Sirius XM Connected Vehicle Services Inc. (Liberty Media Corporation)
Inventors: Schalk, Thomas Barton; Saenz, Leonel; Burch, Barry
Primary Examiner(s): ADESANYA, OLUJIMI A
Application Number: US12/729,573
Publication Number: US 20100250243A1
Time in Patent Office: 2,107 Days
Field of Search: 704/270.1
US Class Current: 1/1
CPC Class Codes:
G06F 16/9537   Spatial or temporal depende...
G06F 3/167   Audio in a user interface, ...
G10L 15/08   Speech classification or se...
G10L 15/22   Procedures used during a sp...
G10L 15/30   Distributed recognition, e....
