Apparatus and methods for implementing voice enabling applications in a converged voice and data network environment

US 20040107108A1
Filed: 08/25/2003
Published: 06/03/2004
Est. Priority Date: 02/26/2001
Status: Active Grant

Abstract

Human speech is transported through a voice and data converged Internet network to recognize its content, verify the identity of the speaker, or to verify the content of a spoken phrase by utilizing the Internet protocol to transmit voice packets. The voice data (4) entered is processed and transmitted in the same way as Internet data packets over converged voice and data IP networks. A voice-enabled application sends a message (5), which is decoded by the speech API (2), and the appropriate control and synchronization information is issued (7) to the data preparation module (9) and to the speech engine (3). Standard voice over IP includes a speech compression algorithm and the use of RTP (Real Time Protocol), enabling additional processing of the human voice anywhere in the network to perform speaker verification, with or without the knowledge of the speaker.
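
The message flow in the abstract (application message decoded by the speech API, control information issued to the data preparation module and the speech engine) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `SpeechAPI` and `Recorder` classes, the `"<COMMAND> <session-id>"` wire format, and the command names do not come from the patent, which does not specify a message syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Recorder:
    """Hypothetical stand-in for a module; it only records control messages."""
    name: str
    received: list = field(default_factory=list)

    def control(self, command: str, session: str) -> None:
        self.received.append((command, session))


class SpeechAPI:
    """Hypothetical speech API: decodes one message, then notifies both modules."""

    COMMANDS = {"VERIFY", "RECOGNIZE", "TRAIN"}  # assumed command set

    def __init__(self, data_prep: Recorder, engine: Recorder) -> None:
        self.data_prep = data_prep
        self.engine = engine

    def handle(self, message: str) -> None:
        # Assumed wire format: "<COMMAND> <session-id>"
        parts = message.split(maxsplit=1)
        if len(parts) != 2 or parts[0] not in self.COMMANDS:
            raise ValueError(f"unrecognized message: {message!r}")
        command, session = parts
        # Issue control information to the data preparation module first,
        # then tell the speech engine which function to run.
        self.data_prep.control("START_CAPTURE", session)
        self.engine.control(command, session)


prep = Recorder("data_prep")
engine = Recorder("speech_engine")
api = SpeechAPI(prep, engine)
api.handle("VERIFY session-42")
```

After the call, each stand-in module holds exactly one control message for `session-42`, which mirrors the one-message-in, two-notifications-out pattern the abstract describes.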

17 Claims

1. A method of implementing voice-enabled applications in a converged voice and data network environment, the method comprising of the steps:
- a. entering human voice data into the converged voice and data network;
  
  b. converting the voice data into a digital format for speech processing at a later time;
  
  c. providing a control mechanism to interface at least one network-based voice enabled application with at least one speech processing function using a speech application interface;
  
  d. performing at least one speech processing function using the voice of the end user; and
  
  e. taking an action with respect to the end user depending on the result from the speech processing of the voice data, said action being taken from the group consisting of allowing the end user to enter into a secured transaction, awarding the end user with a prize, penalizing the end user by disallowing entry, and providing feedback to the end user to communicate the result.
- - 2. The method of claim 1, wherein said step of entering the human voice data into the network is performed by entering a human voice from a telephone device, transmitting the voice through the Public Switched Telephone Network, and converting it into Voice over Internet Protocol (VoIP) packets by a VoIP Gateway.
  - 3. The method of claim 1, wherein said step of entering the human voice data into the network is performed by entering a human voice signal from a telephone device without the voice going through a Public Switched Telephone Network and converting the voice signal into VoIP packets by an Internet access device.
  - 4. The method of claim 1, wherein said step of entering the human voice data into the network is performed by entering a human voice signal from a computer connected to the Internet using a microphone connected to a sound card or a USB device.
  - 5. The method of claim 1, wherein said step of entering the human voice data into the network is performed by entering a human voice signal from specialized hardware that functions as a combination of an Internet computer and an Internet access device.
  - 6. The method of claim 5, further comprising performing at least one additional function selected from the group consisting of a dial up modem, a DSL modem, a cable modem, said function being used to enter the human voice data into the Internet.
  - 7. The method of claim 1, wherein said step of entering the human voice data into the network is accomplished by a wireless connection selected from the group consisting of a mobile phone, a hand held device, a pocket PC, a personal tablet device, and a specialized hardware for a wireless Internet connection which may or may not perform additional functions.
  - 8. The method of claim 1, wherein said step of providing a control mechanism to interface the network voice-enabled application with at least one speech processing function using a speech application interface, further comprises using a command and response message-based protocol for communication between the speech processing function and the converged voice and data network.
  - 9. The method of claim 1, wherein said step of performing at least one speech processing function using the voice of the end user is accomplished in a non-intrusive manner in a way that is transparent to the end user.
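
Step e of claim 1 selects one action from a closed group based on the speech processing result. A minimal sketch of that selection follows. The numeric score, the `accept` and `reject` thresholds, and the `purpose` parameter are all assumptions introduced for illustration; the claim names only the four actions, not how a result maps onto them.

```python
from enum import Enum


class Action(Enum):
    """The four actions listed in step e of claim 1."""
    ALLOW_SECURED_TRANSACTION = "allow secured transaction"
    AWARD_PRIZE = "award prize"
    DISALLOW_ENTRY = "disallow entry"
    PROVIDE_FEEDBACK = "provide feedback"


def choose_action(score: float, purpose: str,
                  accept: float = 0.8, reject: float = 0.4) -> Action:
    """Map a hypothetical match score in [0, 1] to one of the claimed actions.

    `purpose` is an assumed application setting: "contest" awards a prize on
    success, anything else allows a secured transaction.
    """
    if score >= accept:
        if purpose == "contest":
            return Action.AWARD_PRIZE
        return Action.ALLOW_SECURED_TRANSACTION
    if score < reject:
        return Action.DISALLOW_ENTRY
    # Inconclusive result: report back to the user rather than decide.
    return Action.PROVIDE_FEEDBACK
```

The middle band between the two thresholds is a design choice of this sketch: it leaves room for the kind of follow-up dialog described in claim 14 instead of forcing an accept or reject decision on a marginal score.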

10. A method of implementing voice-enabled applications in a converged voice and data network environment, comprising:
- a. entering human voice data from a speaker into the converged voice and data network for later processing and acoustic matching;
  
  b. non-intrusively processing the voice data from an end user while the user is speaking into the voice and data network;
  
  c. preparing the voice data for the speech processing function with a front-end processing module to separate the pauses in the speech from the voice data and utilizing a voice feature extraction module to ready the speech for a processing algorithm from a speech engine;
  
  d. synchronizing generated control information with a data exchange between a data preparation module and the speech engine through a speech application interface;
  
  e. processing the voice features by the speech engine to perform at least one combination of the speech processing and pattern recognition algorithms implemented by the speech engine;
  
  f. providing feedback to the end user to communicate the result from the speech processing; and
  
  g. taking an action responsive to the result from the processing of the voice.
- - 11. The method of claim 10, wherein the human voice is entered by a data preparation module implemented on a device selected from the group consisting of a computer, and a specialized hardware, for buffering the data and passing it directly to the data preparation module without transmission over the network and a speech engine implemented as a part of a centralized server.
  - 12. The method of claim 10, further comprising the implementation of a network speech appliance on the computer, or on the specialized hardware, with the speech application interface communicating with the voice-enabled application on the network using a message-based mechanism.
  - 13. The method of claim 10, further comprising the implementation of the network speech appliance on a wireless device with the speech application interface communicating with the voice-enabled application on the network using the message-based mechanism.
  - 14. The method of claim 10, wherein said step of taking an action depending on the result from the speech processing of the voice further comprises:
    - a. a dialog with the user with the aim of acquiring additional voice data from the user, if the first voice data entry does not secure sufficient accuracy from the speech processing algorithms;
      
      b. a dialog with the user with the aim of acquiring additional voice data from the user in order to perform additional speech processing using a different algorithm from the speech engine for a separate purpose if desired;
      
      c. indicating to the user the result of the speech processing operation; and
      
      d. repeating steps a, b and c above, if the voice enabled application requires it.
  - 15. The method of claim 10, wherein the step of providing the feedback to the end user is accomplished by a method selected from the group consisting of:
    - a. using a graphical interface on a screen of a device selected from the group consisting of a computer, a mobile phone, a personal handheld device and a specialized hardware, said graphical interface for displaying written messages as prompts or results from the speech processing operation;
      
      b. using audio feedback by playing the dialog messages to the end user over a telephone device; and
      
      c. using a combination of a graphical interface from a computer screen with an audio feedback from a VoIP telephone connection.
  - 16. The method of claim 10, wherein said step of processing the voice features by the speech engine further may be accomplished by performing a function selected from the group consisting of training a voiceprint for a new user, performing voice ID verification or identification of a user, performing speech recognition, and performing spoken text verification.
  - 17. The method of claim 16, wherein a combination of the steps can be implemented simultaneously using the same sample of the voice of the user by performing voice ID verification on a prompted text that varies with each prompt and simultaneously performing spoken text verification to ensure the end user is saying the asked-for sequence of words.
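
Step c of claim 10 calls for a front-end module that separates pauses from the voice data before feature extraction. One common way to do this is a per-frame energy threshold, sketched below. This is an illustrative technique chosen here, not the method the patent specifies; the function names, the frame length, and the threshold value are assumptions.

```python
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def drop_pauses(samples: Sequence[float], frame_len: int = 4,
                threshold: float = 0.01) -> List[List[float]]:
    """Split samples into fixed-length frames and keep the energetic ones.

    Frames whose mean squared amplitude is at or below `threshold` are
    treated as pauses and discarded. A trailing partial frame is ignored.
    """
    voiced = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = list(samples[start:start + frame_len])
        if frame_energy(frame) > threshold:
            voiced.append(frame)
    return voiced


# Silence, one burst of signal, then silence again.
samples = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4
voiced_frames = drop_pauses(samples)
```

Only the middle frame survives in this example, since its energy (0.25) exceeds the threshold while the silent frames have zero energy. The surviving frames are what a feature extraction module would then receive.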

Current Assignee: Elizabeth A. Rohwer
Original Assignee: Elizabeth A. Rohwer
Inventors: Rohwer, Elizabeth A
Granted Patent: US 7,805,310 B2
US Class Current: 704/275

CPC Class Codes

G10L 15/22   Procedures used during a sp...

G10L 15/30   Distributed recognition, e....

H04M 2201/41   using speaker recognition

H04M 2201/42   Graphical user interfaces

H04M 7/006   Networks other than PSTN/IS...
