Service oriented speech recognition for in-vehicle automated interaction and in-vehicle user interfaces requiring minimal cognitive driver processing for same
Abstract
A system and method for implementing a server-based speech recognition system for multi-modal automated interaction in a vehicle includes receiving, by a vehicle driver, audio prompts by an on-board human-to-machine interface and a response with speech to complete tasks such as creating and sending text messages, web browsing, navigation, etc. This service-oriented architecture is utilized to call upon specialized speech recognizers in an adaptive fashion. The human-to-machine interface enables completion of a text input task while driving a vehicle in a way that minimizes the frequency of the driver's visual and mechanical interactions with the interface, thereby eliminating unsafe distractions during driving conditions. After the initial prompting, the typing task is followed by a computerized verbalization of the text. Subsequent interface steps can be visual in nature, or involve only sound.
29 Claims
1. A method for implementing an interactive automated system, comprising:
using a telematics processing system located in proximity to a person, receiving in a first interaction an indication of intent from the person and, thereafter, in a second interaction that is separate from the first interaction, receiving from the person a spoken utterance associated with the indicated intent, wherein the spoken utterance associated with the indicated intent matches intended text of a type of intent available to the person;
processing the spoken utterance of the second interaction using the processing system;
transmitting the processed speech information to a remote data center using a wireless link;
analyzing the transmitted processed speech information to scale and end-point the speech utterance;
based upon the indicated intent, selecting at least one optimal speech recognition engine from a set of speech recognition engines;
converting the analyzed speech information into packet data format;
using an internet-protocol transport network, transporting the packet speech information to the selected at least one optimal speech recognition engine for translation of the converted speech information into text format;
retrieving recognition results and an associated confidence score from the selected at least one optimal speech recognition engine;
continuing an automated dialog with the person if the confidence score meets or exceeds a predetermined threshold for the best match; and
selecting at least one alternative speech recognition engine to translate the converted speech information into text format if the confidence score is low such that it is below a predetermined threshold for the best match.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
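The selection and fallback steps of claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation: the engine registry, the stub engines, the `Recognition` type, the 0-to-1 confidence scale, and the 0.6 threshold are all hypothetical names and values chosen for illustration. The sketch assumes that the engines registered for an intent are ordered from preferred to alternative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical engine interface: packetized audio in, (text, confidence) out.
Engine = Callable[[bytes], Tuple[str, float]]


@dataclass
class Recognition:
    text: str
    confidence: float
    engine_name: str
    accepted: bool  # True when confidence met or exceeded the threshold


def recognize_with_fallback(
    intent: str,
    audio: bytes,
    engines_by_intent: Dict[str, List[Tuple[str, Engine]]],
    threshold: float = 0.6,  # assumed value; the claims only say "predetermined"
) -> Recognition:
    """Try the engines registered for the intent in order.

    The first result whose confidence meets or exceeds the threshold is
    returned. If none does, the highest-confidence result is returned with
    accepted=False so the caller can decide what to do next.
    """
    candidates = engines_by_intent.get(intent, [])
    if not candidates:
        raise KeyError(f"no speech recognition engine registered for {intent!r}")

    best = None
    for name, engine in candidates:
        text, confidence = engine(audio)
        result = Recognition(text, confidence, name, confidence >= threshold)
        if result.accepted:
            return result
        if best is None or confidence > best.confidence:
            best = result
    return best


# Stub engines standing in for real recognizers (illustrative only).
def stub_primary_engine(audio: bytes) -> Tuple[str, float]:
    return ("main street", 0.42)


def stub_alternative_engine(audio: bytes) -> Tuple[str, float]:
    return ("1 main street", 0.91)
```

With the stubs above registered under a `"navigation"` intent, the primary engine's low score causes the alternative engine to be consulted, and its result is accepted. A caller would continue the automated dialog only when `accepted` is true.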
12. A method for implementing an interactive automated system, comprising:
using a telematics processing system located on-board a vehicle, receiving in a first interaction an indication of intent from a vehicle driver and, thereafter, in a second interaction that is separate from the first interaction, receiving from the vehicle driver a spoken utterance associated with the indicated intent wherein the spoken utterance associated with the indicated intent matches intended text of a type of intent available to the vehicle driver;
processing the spoken utterance of the second interaction using the processing system;
transmitting the processed speech information to a remote data center using a wireless link;
analyzing the transmitted processed speech information to scale and end-point the speech utterance;
based upon the indicated intent, selecting at least one optimal speech recognition engine from a set of speech recognition engines;
converting the analyzed speech information into packet data format;
using an internet-protocol transport network, transporting the packet speech information and vehicle location information to the selected at least one optimal speech recognition for translation of the converted speech information into text format;
retrieving recognition results and an associated confidence score from the selected at least one optimal speech recognition engine;
continuing an automated dialog with the vehicle driver if the confidence score meets or exceeds a pre-determined threshold for the best match; and
selecting at least one alternative speech recognition engine that is agent-assisted to translate the converted speech information into text format if the confidence score is low such that it is below a pre-determined threshold for the best match.
Dependent claims: 13, 14, 15, 16, 17, 18
19. An interactive automated speech recognition system, comprising:
a telematics processing system located in proximity to a person;
a remote data center;
a wireless link that transmits processed speech information from the processing system to the remote data center, wherein the processing system:
receives in a first interaction an indication of intent from the person and, thereafter, in a second interaction that is separate from the first interaction, receives from the person a spoken utterance associated with the indicated intent wherein the spoken utterance associated with the indicated intent matches intended text of a type of intent available to the person; and
processes the spoken utterance of the second interaction and transmits the processed speech information to the remote data center using the wireless link, wherein the transmitted processed speech information is analyzed to scale and end-point the speech utterance and is converted into packet data format;
at least one optimal speech recognition engine selected to translate the converted speech information into text format, the at least one optimal speech recognition engine being selected from a set of speech recognition engines based upon the indicated intent;
an internet protocol transport network that transports the converted speech information to the selected at least one optimal speech recognition engine; and
wherein the at least one optimal speech recognition engine produces recognition results and an associated confidence score and, based upon the confidence score:
an automated dialog is continued with the person if the confidence score meets or exceeds a pre-determined threshold for the best match; or
at least one alternative speech recognition engine is selected to translate the converted speech information into text format if the confidence score is low such that it is below a pre-determined threshold for the best match.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27, 28, 29
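The claims recite analyzing the speech to "scale and end-point" the utterance and converting it into packet data format, but the text above does not say how. The Python sketch below shows one possible reading under stated assumptions: scaling is taken to mean peak normalization, end-pointing is taken to mean trimming leading and trailing low-energy frames, and the packet layout (a sequence number and sample count followed by 16-bit PCM) is invented for illustration. The function names, frame length, and thresholds are hypothetical.

```python
import struct
from typing import List


def scale(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Peak-normalize samples (floats in [-1, 1]) to target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # all silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def end_point(
    samples: List[float],
    frame_len: int = 160,            # assumed: 20 ms at 8 kHz
    energy_threshold: float = 0.01,  # assumed mean-square energy cutoff
) -> List[float]:
    """Trim to the span from the first to the last frame above the threshold."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    active = [
        i for i, frame in enumerate(frames)
        if sum(s * s for s in frame) / len(frame) > energy_threshold
    ]
    if not active:
        return []  # no speech detected
    start = active[0] * frame_len
    end = min(len(samples), (active[-1] + 1) * frame_len)
    return samples[start:end]


def packetize(samples: List[float], samples_per_packet: int = 160) -> List[bytes]:
    """Split samples into packets: 2-byte sequence number, 2-byte count, 16-bit PCM."""
    packets = []
    for seq, i in enumerate(range(0, len(samples), samples_per_packet)):
        chunk = samples[i:i + samples_per_packet]
        # Convert to 16-bit integers, clamping to the valid range.
        pcm = [max(-32768, min(32767, int(round(s * 32767)))) for s in chunk]
        header = struct.pack("!HH", seq, len(pcm))
        packets.append(header + struct.pack(f"!{len(pcm)}h", *pcm))
    return packets
```

A caller would chain the three steps, for example `packetize(end_point(scale(samples)))`, before handing the packets to whatever transport carries them to the selected recognition engine.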
Specification