Method and Apparatus for Automatically Determining Speaker Characteristics for Speech-Directed Advertising or Other Enhancement of Speech-Controlled Devices or Services
Abstract
In addition to conveying primary information, human speech also conveys information concerning the speaker's gender, age, socioeconomic status, accent, language spoken, emotional state, or other personal characteristics, which is referred to as secondary information. Disclosed herein are means for the automatic discovery of such secondary information and for its use to direct other aspects of the behavior of a controlled system. One embodiment of the invention comprises an improved method to determine, with high reliability, the gender of an adult speaker. A further embodiment of the invention comprises the use of this information to display a gender-appropriate advertisement to the user of an information retrieval system that uses a cell phone as the input and output device. The invention is not limited to gender: such secondary information can include, for example, information concerning the speaker's age, socioeconomic status, accent, language spoken, emotional state, or other personal characteristics.
13 Claims
1. A method for automatically determining speaker characteristics, comprising the steps of:
using a spoken utterance to specify an input or action, wherein text that corresponds to said spoken utterance, and its associated meaning or interpretation, comprises primary information conveyed by said utterance;
using said spoken utterance to convey non-text information concerning any of a speaker's gender, age, socioeconomic status, accent, language spoken, emotional state, or other personal characteristics, wherein said non-text information comprises secondary information; and
using said primary information and said secondary information to direct behavior of a controlled system. (Dependent claims: 2, 3.)
4. An apparatus for automatically determining speaker characteristics, comprising:
a speech input device;
a primary information extraction module for receiving utterances from said speech input device and comprising an automatic speech recognition (ASR) module;
a secondary information extraction module for receiving utterances from said speech input device and comprising an automatic speech characteristics (ASC) module that estimates or extracts explicit or implicit speech indicators of interest; and
a controlled system for using primary and secondary information extracted, respectively, by said ASR and ASC, to produce a system action or response as a system output. (Dependent claims: 5, 6, 7, 8.)
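As a minimal sketch, the apparatus of claim 4 can be wired together as a pipeline. The module names and return values below are hypothetical stand-ins; the patent does not specify an implementation:

```python
# Sketch of the claimed apparatus: speech input feeds both an ASR module
# (primary information) and an ASC module (secondary information), and a
# controlled system combines the two. All names and values are illustrative.

def asr_module(utterance_audio):
    # Placeholder: a real ASR system would return the recognized text.
    return "show me running shoes"

def asc_module(utterance_audio):
    # Placeholder: a real ASC system would estimate speaker traits
    # (gender, age, accent, emotional state, ...) from the signal.
    return {"gender": "female", "age_group": "adult"}

def controlled_system(primary, secondary):
    # Combine the recognized request (primary) with estimated speaker
    # traits (secondary) to select a system action, e.g. a targeted ad.
    if secondary.get("gender") == "female":
        ad = "women's running shoes"
    else:
        ad = "men's running shoes"
    return {"query": primary, "advertisement": ad}

audio = b"..."  # raw speech samples from the speech input device
response = controlled_system(asr_module(audio), asc_module(audio))
```

Here the gender-appropriate advertisement of the abstract falls out of a single dictionary lookup on the secondary information; a real system would route the same two signals into whatever behavior the controlled system supports.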
9. An apparatus for automatically associating speaker characteristics with speaker behaviors, comprising:
a speech input device;
a secondary information extraction module for receiving utterances from said speech input device and comprising an automatic speech characteristics (ASC) module that estimates or extracts explicit or implicit speech indicators of interest; and
a learning module for recording both said secondary information and user behavior, and for analyzing said secondary information and user behavior to determine relationships between speech and said behavior, and/or speech and speaker personal characteristics. (Dependent claim: 10.)
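A minimal sketch of the claimed learning module, assuming secondary information arrives as a dictionary of speaker traits and user behavior as a simple event label (both hypothetical representations):

```python
from collections import Counter, defaultdict

# Sketch of the claim-9 learning module: it records pairs of (secondary
# information, observed user behavior) and tallies how often each trait
# value co-occurs with each behavior, exposing the strongest association.

class LearningModule:
    def __init__(self):
        self.counts = defaultdict(Counter)

    def record(self, secondary_info, behavior):
        # Record one observation: each (trait, value) pair is credited
        # with the behavior that followed it.
        for trait, value in secondary_info.items():
            self.counts[(trait, value)][behavior] += 1

    def most_common_behavior(self, trait, value):
        # Analyze the recordings: return the behavior most often seen
        # for a given trait value, or None if never observed.
        tally = self.counts[(trait, value)]
        return tally.most_common(1)[0][0] if tally else None

lm = LearningModule()
lm.record({"gender": "male"}, "clicked_sports_ad")
lm.record({"gender": "male"}, "clicked_sports_ad")
lm.record({"gender": "male"}, "ignored_ad")
```

A production system would replace the raw co-occurrence counts with a proper statistical model, but the record/analyze split mirrors the claim's two functions.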
11. A method for gender classification based upon speech, comprising the steps of:
processing an utterance (utt) as a sequence of frames;
classifying each frame of speech as voiced (V), unvoiced (U), or silence (S), with unvoiced and silence frames discarded;
using an autocorrelation algorithm to extract a pitch estimate for every frame in the utterance;
to obtain an estimate for the utt's pitch frequency (F0), histogramming the F0 values for the frames in the utt and selecting the greatest peak, or mode, of the histogram; and
comparing pitch with a threshold to decide if the speech is from a male or female speaker.
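The steps of claim 11 can be sketched as follows. The 16 kHz sample rate, the energy-based voicing test, and the 160 Hz male/female threshold are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

# Sketch of the claim-11 pipeline: frame the utterance, discard
# non-voiced frames, estimate per-frame F0 by autocorrelation, take the
# mode of the F0 histogram, and threshold it to decide gender.

def frame_pitch_autocorr(frame, sr=16000, fmin=60.0, fmax=400.0):
    # Pitch estimate from the highest autocorrelation peak within the
    # lag range corresponding to plausible speech F0 values.
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

def classify_gender(signal, sr=16000, frame_len=512, hop=256,
                    energy_thresh=1e-4, f0_thresh=160.0):
    f0s = []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len]
        # Crude voicing test standing in for the V/U/S classifier:
        # low-energy frames are treated as unvoiced/silence and discarded.
        if np.mean(frame ** 2) < energy_thresh:
            continue
        f0s.append(frame_pitch_autocorr(frame, sr))
    # Histogram the per-frame F0 values and take the greatest peak (mode).
    hist, edges = np.histogram(f0s, bins=34, range=(60.0, 400.0))
    mode_f0 = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])
    return "male" if mode_f0 < f0_thresh else "female"
```

On synthetic tones, a 120 Hz signal classifies as male and a 220 Hz signal as female; real speech would of course require the proper V/U/S classifier the claim names.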
12. A method for gender classification based upon speech, comprising the steps of:
performing center-clipping on each of a plurality of signal frames;
keeping a first half of a Fast Fourier Transform (FFT)-size buffer of a resulting center-clipped frame and zeroing out a second half of said buffer;
taking a forward FFT;
computing a squared magnitude of said FFT;
taking an inverse FFT of a squared magnitude spectrum to effect frame autocorrelation;
searching for a highest peak in said autocorrelation;
classifying each frame of speech as voiced (V), unvoiced (U), or silence (S), with unvoiced and silence frames discarded;
if voiced, finding pitch for said frame from the peak's position;
incorporating the pitch in a histogram;
determining pitch for an entire utterance by employing a histogram method; and
comparing pitch with a threshold to decide if the speech is from a male or female speaker.
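The per-frame portion of claim 12 (center-clipping, half-buffer zeroing, forward FFT, squared magnitude, inverse FFT, peak search) can be sketched as below; the FFT size, center-clipping fraction, and F0 search range are illustrative assumptions:

```python
import numpy as np

# Per-frame pitch extraction as in claim 12: center-clip the frame, then
# compute its autocorrelation via the power spectrum (forward FFT,
# squared magnitude, inverse FFT) and locate the highest peak.

def center_clip(frame, fraction=0.3):
    # Zero samples whose magnitude falls below a fraction of the frame
    # peak; this suppresses formant structure that can confuse the
    # autocorrelation pitch peak.
    c = fraction * np.max(np.abs(frame))
    return np.where(frame > c, frame - c,
                    np.where(frame < -c, frame + c, 0.0))

def frame_pitch_fft(frame, sr=16000, fft_size=1024, fmin=60.0, fmax=400.0):
    clipped = center_clip(frame)
    # Keep the frame in the first half of the FFT buffer, zero the second
    # half, so the inverse transform yields a linear autocorrelation.
    buf = np.zeros(fft_size)
    buf[:fft_size // 2] = clipped[:fft_size // 2]
    spectrum = np.fft.fft(buf)              # forward FFT
    power = np.abs(spectrum) ** 2           # squared magnitude
    autocorr = np.real(np.fft.ifft(power))  # inverse FFT -> autocorrelation
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(autocorr[lo:hi]))  # highest peak in F0 range
    return sr / lag
```

The histogram-over-frames and threshold steps are identical to claim 11; only the per-frame autocorrelation differs, here computed in the frequency domain.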
13. A method for discriminating between adults and children by spoken utterances, comprising the steps of:
analyzing a given utterance on a frame-by-frame basis;
classifying each frame of speech as voiced (V), unvoiced (U), or silence (S), with unvoiced and silence frames discarded;
providing two probability distribution functions (pdfs) of the distribution of spectral peaks, one for adults and one for children;
dividing an utterance in question into frames, where unvoiced and silence frames are discarded;
for each frame, computing a Hamming-windowed FFT, and then computing a squared magnitude of each FFT coefficient;
finding a spectral peak index of a maximum of said coefficients within said FFT;
from a sequence of spectral peak indices, computing the log-probability of said utterance under each of said two pdfs, and computing their difference

Δ = log P_utterance(child) − log P_utterance(adult); and   (3)

comparing said difference quantity Δ with an experimentally determined threshold to classify said utterance: child if Δ exceeds the threshold, adult if it does not.
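The claim-13 procedure can be sketched as below, assuming the two pdfs are supplied as discrete probability vectors indexed by FFT bin (their training is outside the claim) and that only voiced frames are passed in:

```python
import numpy as np

# Sketch of the claim-13 adult/child discriminator. For each voiced
# frame: Hamming window, FFT, squared magnitudes, spectral peak index;
# then accumulate the log-probability difference of the peak index under
# the child and adult pdfs, and threshold the total.

def classify_age_group(frames, pdf_child, pdf_adult, threshold=0.0,
                       eps=1e-12):
    delta = 0.0
    for frame in frames:  # voiced frames only; U/S frames pre-discarded
        windowed = frame * np.hamming(len(frame))
        power = np.abs(np.fft.rfft(windowed)) ** 2  # squared magnitudes
        k = int(np.argmax(power))                   # spectral peak index
        # Delta = log P_utterance(child) - log P_utterance(adult),
        # accumulated as a sum of per-frame log-probability differences.
        delta += np.log(pdf_child[k] + eps) - np.log(pdf_adult[k] + eps)
    return "child" if delta > threshold else "adult"
```

The zero threshold used here is the illustrative default; the claim calls for an experimentally determined value.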
Specification