Communications device responsive to spoken commands and methods of using same

US 5,749,072 A
Filed: 12/28/1995
Issued: 05/05/1998
Est. Priority Date: 06/03/1994
Status: Expired due to Fees

Abstract

A communications device (20) that is responsive to voice commands is provided. The communications device (20) can be a two-way radio, cellular telephone, PDA, or pager. The communications device (20) includes an interface (22) for allowing a user to access a communications channel according to a control signal and a speech-recognition system (24) for producing the control signal in response to a voice command. The speech-recognition system (24) includes a feature extractor (26) and one or more classifiers (28) that use polynomial discriminant functions.

352 Citations

24 Claims

1. A communications device, comprising:
- an interface for allowing a user to access a communications channel according to a control signal; and
  
  a speech-recognition system for producing the control signal in response to a spoken command, the speech-recognition system including:
  
  a feature extractor for extracting a plurality of features from the spoken command; and
  
  a classifier for generating a discriminant signal according to a polynomial expansion having a form ##EQU4## wherein x_j represents the plurality of features, y represents the discriminant signal, w_i represents a coefficient, g_ji represents an exponent, and i, j, m and n are integers;
  
  wherein the control signal is based on the discriminant signal.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The communications device of claim 1, wherein the polynomial expansion has a form ##EQU5## wherein a₀ represents a zero-order coefficient, b_i represents a first-order coefficient, and c_ij represents a second-order coefficient.
  - 3. The communications device of claim 1, wherein the interface includes a device selected from a group consisting of:
    - a two-way radio, a telephone, a PDA, and a pager.
  - 4. The communications device of claim 1, wherein the spoken command is a word selected from a group consisting of a digit between 0-9, "page", "send", and "help".
  - 5. The communications device of claim 1, wherein the speech-recognition system further comprises:
    - a pre-processor, operatively associated with the feature extractor, for transforming an audio signal using signal processing techniques into a sequence of data vectors that represent the spoken command and from which the plurality of features are extracted.
  - 6. The communications device of claim 1, wherein the plurality of features are selected from a group consisting of:
    - cepstral coefficients, first-order derivatives of cepstral coefficients, and word-level features.
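
The claims refer to the polynomial expansions only through placeholders (##EQU4##, ##EQU5##) where the equation images are not reproduced. A form consistent with the symbol definitions in claims 1 and 2 might be written as below. The summation and product limits, and the range of the double sum, are assumptions and are not taken from the text above.

```latex
% General expansion (claim 1): one weighted product term per index i.
% Index limits are assumed, not taken from the listing.
y = \sum_{i=0}^{m} w_i \prod_{j=0}^{n} x_j^{\,g_{ji}}

% Second-order expansion (claim 2): zero-, first-, and second-order terms.
y = a_0 + \sum_{i=0}^{n} b_i x_i + \sum_{i=0}^{n} \sum_{j=0}^{n} c_{ij}\, x_i x_j
```

Under this reading, the second-order form is the general form restricted to terms whose exponents g_ji sum to at most 2.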

7. A communications device, comprising:
- a pre-processor for transforming an audio signal into a sequence of data vectors;
  
  extraction means for extracting a plurality of feature frames from the sequence of data vectors;
  
  a plurality of classifiers for generating a plurality of discriminant signals, each of the plurality of classifiers designating a different spoken command and generating a discriminant signal according to a polynomial expansion having a form ##EQU6## wherein x_j represents a feature frame, y represents the discriminant signal, w_i represents a coefficient, g_ji represents an exponent, and i, j, m and n are integers;
  
  an accumulator for generating a plurality of accumulated discriminant signals, the accumulator generating each of the plurality of accumulated discriminant signals by summing ones of the plurality of discriminant signals produced by a respective one of the plurality of classifiers;
  
  a selector for selecting a largest accumulated discriminant signal from the plurality of accumulated discriminant signals; and
  
  a two-way audio interface for transmitting and receiving data across a communications channel according to a control signal, the control signal being a function of the largest accumulated discriminant signal.
- Dependent claims (8, 9, 10, 11, 12, 13):
  - 8. The communications device of claim 7, wherein the extraction means includes:
    - a feature extractor for extracting a sequence of feature frames from the sequence of data vectors; and
      
      a speech activity detector for selecting from the sequence of feature frames the plurality of feature frames representing a spoken command.
  - 9. The communications device of claim 7, wherein the extraction means includes:
    - a speech activity detector for selecting from the sequence of data vectors a vector sub-sequence representing a spoken command; and
      
      a feature extractor for extracting a plurality of feature frames from the vector sub-sequence.
  - 10. The communications device of claim 7, wherein the polynomial expansion has a form ##EQU7## wherein a₀ represents a zero-order coefficient, b_i represents a first-order coefficient, and c_ij represents a second-order coefficient.
  - 11. The communications device of claim 7, wherein the two-way audio interface includes a device selected from a group consisting of:
    - a two-way radio, a telephone, a PDA, and a pager.
  - 12. The communications device of claim 7, wherein the audio signal represents a spoken command selected from a group consisting of a digit between 0-9, "page", "send", and "help".
  - 13. The communications device of claim 7, wherein each of the plurality of feature frames includes a plurality of features selected from a group consisting of:
    - cepstral coefficients, first-order derivatives of cepstral coefficients, and word-level features.
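
The classifier, accumulator, and selector stages of claim 7 can be illustrated with a minimal Python sketch. It assumes each classifier is described by a weight list and an exponent table (one row of exponents per term), and that feature frames are plain tuples of numbers. The function names `discriminant` and `recognize`, and the data layout, are hypothetical and are not taken from the patent.

```python
from math import prod


def discriminant(frame, weights, exponents):
    """Evaluate y = sum_i w_i * prod_j x_j ** g_ji for one feature frame.

    `exponents[i][j]` plays the role of g_ji (assumed layout).
    """
    return sum(
        w * prod(x ** g for x, g in zip(frame, row))
        for w, row in zip(weights, exponents)
    )


def recognize(frames, classifiers):
    """Accumulate per-command discriminants over all frames, pick the largest.

    `classifiers` maps a command label to a (weights, exponents) pair,
    one classifier per spoken command.
    """
    totals = {command: 0.0 for command in classifiers}
    for frame in frames:
        for command, (weights, exponents) in classifiers.items():
            # Accumulator: sum this classifier's outputs across frames.
            totals[command] += discriminant(frame, weights, exponents)
    # Selector: the command with the largest accumulated discriminant.
    best = max(totals, key=totals.get)
    return best, totals
```

For example, with frames `[(1.0, 2.0), (2.0, 1.0)]`, a "send" classifier computing `0.5*x1 + x2`, and a "help" classifier computing `x1*x2`, the accumulated values are 4.5 and 4.0, so "send" is selected.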

14. A two-way handheld communications device, comprising:
- a microphone for generating an audio signal;
  
  an A/D converter for digitizing the audio signal to produce a digitized audio signal;
  
  a pre-processor for transforming the digitized audio signal into a sequence of data vectors;
  
  a speech activity detector for producing a vector sub-sequence representing a spoken command, the speech activity detector continuously receiving the sequence of data vectors and including in the vector sub-sequence those of the sequence of data vectors having an energy-level that exceeds a background noise threshold;
  
  a feature extractor for extracting a sequence of feature frames from the vector sub-sequence;
  
  a plurality of classifiers for generating a plurality of discriminant signals, each of the plurality of classifiers designating a different spoken command and generating a discriminant signal according to a polynomial expansion having a form ##EQU8## wherein x_j represents a feature frame, y represents the discriminant signal, w_i represents a coefficient, g_ji represents an exponent, and i, j, m and n are integers;
  
  a plurality of accumulators for generating a plurality of accumulated discriminant signals, each of the accumulators summing ones of the plurality of discriminant signals produced by a respective one of the plurality of classifiers;
  
  a selector for selecting a largest accumulated discriminant signal from the plurality of accumulated discriminant signals; and
  
  a two-way audio interface for transmitting and receiving data across a radio channel according to a control signal, the control signal being a function of the largest accumulated discriminant signal.
- Dependent claims (15, 16, 17, 18, 19):
  - 15. The two-way handheld communications device of claim 14, wherein the polynomial expansion has a form ##EQU9## wherein a₀ represents a zero-order coefficient, b_i represents a first-order coefficient, and c_ij represents a second-order coefficient.
  - 16. The two-way handheld communications device of claim 14, wherein the two-way audio interface includes a device selected from a group consisting of:
    - a two-way radio, a telephone, a PDA, and a pager.
  - 17. The two-way handheld communications device of claim 14, wherein the spoken command is a word selected from a group consisting of a digit between 0-9, "page", "send", and "help".
  - 18. The two-way handheld communications device of claim 14, wherein the speech activity detector detects boundaries of the spoken command by determining energy-level transitions across the background noise threshold.
  - 19. The two-way handheld communications device of claim 18, wherein the speech activity detector associates an end-of-word boundary with a negative energy-level transition if the energy-level remains below the background noise threshold during a subsequent predetermined interval.
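
One possible reading of the speech activity detector in claims 14, 18, and 19 is sketched below in Python. It assumes a per-vector energy value is already available, treats the "predetermined interval" as a count of consecutive frames (`hangover`), and keeps short dips inside a word. The function name `detect_command` and its parameters are hypothetical.

```python
def detect_command(energies, threshold, hangover):
    """Return (start, end) frame indices of the first detected word, or None.

    start: first frame whose energy exceeds `threshold`
           (a positive energy-level transition).
    end:   one past the last above-threshold frame, confirmed only after the
           energy has stayed at or below `threshold` for `hangover`
           consecutive frames (the end-of-word rule of claim 19).
    """
    start = None
    quiet = 0  # consecutive frames at or below threshold since last speech
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= hangover:
                return start, i - quiet + 1
    if start is None:
        return None
    # Input ended before the quiet interval elapsed; trim trailing quiet frames.
    return start, len(energies) - quiet
```

With energies `[0.1, 0.2, 0.9, 0.8, 0.1, 0.7, 0.1, 0.1, 0.1, 0.9]`, a threshold of 0.5, and a hangover of 3 frames, the one-frame dip at index 4 does not end the word, and the detected span is frames 2 through 5.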

20. A method for controlling access to a communications channel, comprising the following steps:
- receiving a spoken command;
  
  extracting a plurality of features from the spoken command;
  
  generating a discriminant signal based on a polynomial expansion having a form ##EQU10## wherein x_j represents the plurality of features, y represents the discriminant signal, w_i represents a coefficient, g_ji represents an exponent, and i, j, m and n are integers; and
  
  accessing the communications channel according to the discriminant signal.
- Dependent claims (21, 22, 23, 24):
  - 21. The method of claim 20, wherein the step of generating includes the following sub-step:
    - basing the discriminant signal on a second-order polynomial expansion having a form ##EQU11## wherein a₀ represents a zero-order coefficient, b_i represents a first-order coefficient, and c_ij represents a second-order coefficient.
  - 22. The method of claim 20, further comprising the following step:
    - selecting the spoken command from a group consisting of a digit between 0-9, "page", "send", and "help".
  - 23. The method of claim 20, further comprising the step of:
    - transforming an audio signal using signal processing techniques into a sequence of data vectors that represent the spoken command and from which the plurality of features are extracted.
  - 24. The method of claim 20, wherein the step of extracting includes the following sub-step:
    - generating the plurality of features selected from a group consisting of:
      
      cepstral coefficients, first-order derivatives of cepstral coefficients, and word-level features.
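
The second-order expansion named in claim 21 could be evaluated as in the following Python sketch. It assumes the double sum runs over every (i, j) pair and that `c` is a square table of second-order coefficients; both are assumptions, since the equation itself is not reproduced in the listing. The function name `second_order_discriminant` is hypothetical.

```python
def second_order_discriminant(x, a0, b, c):
    """Compute y = a0 + sum_i b_i*x_i + sum_i sum_j c_ij*x_i*x_j.

    x:  list of features
    a0: zero-order coefficient
    b:  first-order coefficients, one per feature
    c:  second-order coefficients, c[i][j] for each feature pair
    """
    linear = sum(b_i * x_i for b_i, x_i in zip(b, x))
    quadratic = sum(
        c[i][j] * x[i] * x[j]
        for i in range(len(x))
        for j in range(len(x))
    )
    return a0 + linear + quadratic
```

For `x = [1.0, 2.0]`, `a0 = 0.5`, `b = [1.0, -1.0]`, and `c = [[0.0, 0.5], [0.0, 0.25]]`, the linear part is -1.0 and the quadratic part is 2.0, giving y = 1.5.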

Current Assignee: Motorola, Inc. (Motorola Solutions, Inc.)
Original Assignee: Motorola, Inc. (Motorola Solutions, Inc.)
Inventors: Levendel, Gil E.; Wang, Shay-Ping Thomas; Mazurkiewicz, Theodore
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Dorvil, Richemond
Application Number: US08/579,714
Time in Patent Office: 859 Days
Field of Search: 395/2.41, 395/2.79, 395/2.84, 395/2.62, 395/2.6, 395/2.64, 395/2.63, 395/2.65, 395/2.66, 395/2.49, 704/275, 704/245, 704/270, 704/253, 704/255, 704/251, 704/254, 704/256, 704/257, 704/239, 704/240
US Class Current: 704/275

CPC Class Codes
G06F 18/2453   non-linear, e.g. polynomial...
G06N 3/045   Combinations of networks
G10L 15/02   Feature extraction for spee...
G10L 15/063   Training
G10L 15/10   using distance or distortio...
G10L 15/16   using artificial neural net...
G10L 15/26   Speech to text systems G10L...
G10L 2015/223   Execution procedure of a sp...
