System and method for detection and analysis of speech

US 8,078,465 B2
Filed: 01/23/2008
Issued: 12/13/2011
Est. Priority Date: 01/23/2007
Status: Active Grant


Abstract

Certain aspects and embodiments of the present invention are directed to systems and methods for monitoring and analyzing the language environment and the development of a key child. A key child's language environment and language development can be monitored without placing artificial limitations on the key child's activities or requiring a third party observer. The language environment can be analyzed to identify words, vocalizations, or other noises directed to or spoken by the key child, independent of content. The analysis can include the number of responses between the child and another, such as an adult, and the number of words spoken by the child and/or another, independent of content of the speech. One or more metrics can be determined based on the analysis and provided to assist in improving the language environment and/or tracking language development of the key child.
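
The response count mentioned in the abstract can be illustrated with a small sketch. Claim 4 describes a conversational turn as a sound from the adult or key child together with a response to it, read here as a response from the other speaker. The `Segment` type, the source labels, the pairing rule, and the 5-second `max_gap` are assumptions made for illustration; none of them is taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source: str   # hypothetical labels: "key_child", "adult", "noise", ...
    start: float  # seconds from the start of the recording
    end: float


def count_conversational_turns(segments, max_gap=5.0):
    """Count initiation/response pairs between the adult and the key child.

    Assumption: a response must come from the other speaker and start within
    max_gap seconds of the end of the initiating sound. Segments from other
    sources are ignored.
    """
    speakers = {"key_child", "adult"}
    turns = 0
    pending = None  # initiating sound that is still waiting for a response
    for seg in sorted(segments, key=lambda s: s.start):
        if seg.source not in speakers:
            continue
        if (pending is not None
                and seg.source != pending.source
                and seg.start - pending.end <= max_gap):
            turns += 1
            pending = None  # the pair is consumed
        else:
            pending = seg
    return turns


recording = [
    Segment("adult", 0.0, 1.0),
    Segment("key_child", 1.5, 2.0),   # response to the adult: turn 1
    Segment("key_child", 3.0, 4.0),
    Segment("adult", 4.5, 5.0),       # response to the child: turn 2
    Segment("noise", 5.0, 6.0),
    Segment("adult", 20.0, 21.0),
    Segment("key_child", 30.0, 31.0), # too late to count as a response
]
print(count_conversational_turns(recording))
```

Consuming each pair after it is counted keeps a single response from being credited to two turns; a different pairing rule would give different counts.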

93 Citations

26 Claims

1. A method comprising:
- capturing an audio recording from a language environment of a key child, segmenting the audio recording into a plurality of segments;
  
  identifying a segment ID for each of the plurality of segments, the segment ID identifying a source for audio in the segment, wherein segmenting the audio recording into the plurality of segments and identifying the segment ID for each of the plurality of segments comprises using a Minimum Duration Gaussian Mixture Model (MD-GMM), and wherein the segments identified using the MD-GMM are at least a minimum duration D, and any segments with a duration longer than 2*D are broken down into several segments with a duration between D and 2*D;
  
  estimating key child segment characteristics based in part on at least one of the plurality of key child segments, wherein the key child segment characteristics are estimated independent of content of the plurality of key child segments, wherein the content is the meaning of the plurality of key child segments;
  
  determining at least one metric associated with the language environment using the key child segment characteristics; and
  
  outputting the at least one metric.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13):
  - 2. The method of claim 1, further comprising:
    - identifying a plurality of adult segments from the plurality of segments, each of the plurality of adult segments having the adult as the segment ID;
      
      estimating adult segment characteristics based in part on at least one of a plurality of adult segments, wherein the adult segment characteristics are estimated independent of content of the plurality of adult segments; and
      
      wherein determining at least one metric associated with the language environment comprises using the adult segment characteristics.
  - 3. The method of claim 2 wherein adult segment characteristics comprise at least one of:
    - a word count;
      
      a duration of speech;
      
      a vocalization count; and
      
      a parentese count.
  - 4. The method of claim 2 wherein the at least one metric comprises at least one of:
    - number of key child vocalizations in a pre-set time period;
      
      number of conversational turns, wherein the conversational turns comprise a sound from one of the adult or key child and a response to the sound from one of the adult or key child; and
      
      number of adult words directed to the key child in a pre-set time period.
  - 5. The method of claim 1 wherein using the MD-GMM comprises:
    - performing a first segmentation and a first segment ID using a first MD-GMM, the first MD-GMM comprising a plurality of models;
      
      generating a second MD-GMM by modifying at least one of the plurality of models; and
      
      segmenting the audio recording into the plurality of segments and identifying the segment ID for each of the plurality of segments using the second MD-GMM.
  - 6. The method of claim 5 wherein the plurality of models comprises a key child model, an electronic device model, and an adult model, wherein:
    - the key child model comprises criteria associated with sounds from a child;
      
      the electronic device model comprises criteria associated with sounds from an electronic device; and
      
      the adult model comprises criteria associated with sounds from adults.
  - 7. The method of claim 6, further comprising at least one of:
    - modifying the key child model using an age-dependent key child model, wherein the age-dependent key child model comprises criteria associated with sounds from children of a plurality of ages;
      
      modifying the electronic device model;
      
      modifying at least one of the key child model and the adult model using a loudness/clearness detection model, wherein the loudness/clearness detection model comprises a Likelihood Ratio Test; and
      
      modifying at least one of the key child model and the adult model using a parentese model, wherein the parentese model comprises complexity levels associated with sounds of adults.
  - 8. The method of claim 1, further comprising:
    - classifying each of the plurality of key child segments into one of:
      
      vocalizations;
      
      cries;
      
      vegetative sounds; and
      
      fixed signal sounds; and
      
      wherein the key child segment characteristics are estimated using only key child segments classified into at least one of vocalizations and cries.
  - 9. The method of claim 8 wherein classifying each of the plurality of key child segments comprises using at least one of rule-based analysis and statistical processing.
  - 10. The method of claim 1 wherein key child segment characteristics comprise at least one of:
    - duration of cries;
      
      number of squeals;
      
      number of growls;
      
      presence of canonical syllables;
      
      number of canonical syllables;
      
      presence of repetitive babbles;
      
      number of repetitive babbles;
      
      presence of protophones;
      
      number of protophones;
      
      duration of protophones;
      
      presence of phoneme-like sounds;
      
      number of phoneme-like sounds;
      
      duration of phoneme-like sounds;
      
      presence of phonemes;
      
      number of phonemes;
      
      duration of phonemes;
      
      word count; and
      
      vocalization count.
  - 12. The method of claim 1 wherein the MD-GMM comprises a key child model;
    - modifying the key child model using an age-dependent key child model; and
      
      wherein segmenting the audio recording into the plurality of segments and identifying the segment ID for at least one of the plurality of segments using the MD-GMM comprises using the MD-GMM comprising the modified key child model.
  - 13. The method of claim 12 wherein the age-dependent key child model comprises:
    - a first model group comprising characteristics of sounds of children of a first age; and
      
      a second model group comprising characteristics of sounds of children of a second age.
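
The duration constraint recited in claim 1 (every segment at least D long, and anything longer than 2*D broken into pieces between D and 2*D) can be sketched as follows. The function name and the choice of equal-length pieces are assumptions; the claim states only the duration bounds, not how cut points are chosen.

```python
import math


def split_long_segment(start, end, min_dur):
    """Split [start, end) into pieces whose durations lie between D and 2*D.

    Hypothetical helper: equal-length pieces are an assumption made for this
    sketch. min_dur plays the role of D in the claim.
    """
    duration = end - start
    if duration < min_dur:
        raise ValueError("segment is shorter than the minimum duration D")
    if duration <= 2 * min_dur:
        return [(start, end)]
    # Smallest piece count that keeps every piece at or below 2*D.
    # Because duration > 2*D here, each piece also stays above D.
    n = math.ceil(duration / (2 * min_dur))
    step = duration / n
    bounds = [start + i * step for i in range(n)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


pieces = split_long_segment(0.0, 5.0, min_dur=1.0)
for piece_start, piece_end in pieces:
    print(f"{piece_start:.3f} -> {piece_end:.3f}")
```

Using the smallest possible number of pieces is one of several valid choices; any cut that keeps each piece within the stated bounds would satisfy the same constraint.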

11. A method comprising:
- capturing an audio recording from a language environment of a key child;
  
  segmenting the audio recording into a plurality of segments and identifying a segment ID for at least one of the plurality of segments using a Minimum Duration Gaussian Mixture Model (MD-GMM), the segment ID identifying a key child, wherein the segments identified using the MD-GMM are at least a minimum duration D, and any segments with a duration longer than 2*D are broken down into several segments with a duration between D and 2*D;
  
  estimating key child segment characteristics based in part on the at least one of the plurality of segments, wherein the key child segment characteristics are estimated independent of content of the plurality of segments, wherein the content is the meaning of the plurality of key child segments;
  
  determining at least one metric associated with the language environment using the key child segment characteristics; and
  
  outputting the at least one metric, wherein the key child characteristics comprise a number of vowels and a number of consonants in the at least one of the plurality of segments, wherein the determining at least one metric associated with the language environment using the key child segment characteristics comprises comparing the number of vowels and number of consonants in the at least one of the plurality of segments to attributes associated with a native language of the key child to determine a total number of words spoken by the key child.
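
Claim 11 does not say which native-language attributes are used or how the comparison yields a word total. Purely as an illustration, the sketch below assumes the attributes are average vowels per word and average consonants per word, and averages the two resulting estimates. The `LanguageAttributes` type, the formula, and the numeric values are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageAttributes:
    # Hypothetical per-language averages. Real values would have to be
    # measured from reference data for the child's native language.
    vowels_per_word: float
    consonants_per_word: float


def estimate_word_count(n_vowels, n_consonants, attrs):
    """Estimate a word total from vowel and consonant counts.

    Assumed formula (not from the patent): divide each count by the
    language's per-word average and take the mean of the two estimates.
    """
    by_vowels = n_vowels / attrs.vowels_per_word
    by_consonants = n_consonants / attrs.consonants_per_word
    return round((by_vowels + by_consonants) / 2)


# Placeholder attribute values, chosen only to make the arithmetic easy to follow.
example_language = LanguageAttributes(vowels_per_word=1.5, consonants_per_word=2.0)
print(estimate_word_count(30, 44, example_language))
```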

14. A system comprising:
- a recorder adapted to capture audio recordings from a language environment of a key child;
  
  a processor-based device, wherein the recorder provides the audio recordings to the processor-based device, and the processor-based device comprising an application having an audio engine adapted to segment the audio recording into a plurality of segments and identify a segment ID for each of the plurality of segments, wherein at least one of the plurality of segments is associated with a key child segment ID, wherein the audio engine segments the audio recording and identifies a segment ID for each of the plurality of segments using a Minimum Duration Gaussian Mixture Model (MD-GMM), and wherein the segments identified using the MD-GMM are at least a minimum duration D, and any segments with a duration longer than 2*D are broken down into several segments with a duration between D and 2*D, the audio engine being further adapted to:
  
  estimate key child segment characteristics based on the at least one of the plurality of segments, wherein the audio engine estimates key child segment characteristics independent of content of the at least one of the plurality of segments, wherein the content is the meaning of the plurality of key child segments;
  
  determine at least one metric associated with the language environment using the key child segment characteristics; and
  
  output the at least one metric to an output device.
- Dependent claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26):
  - 15. The system of claim 14 wherein the audio engine uses the MD-GMM by:
    - performing a first segmentation and a first segment ID using a first MD-GMM, the first MD-GMM comprising a plurality of models;
      
      generating a second MD-GMM by modifying at least one of the plurality of models; and
      
      segmenting the audio recording into the plurality of segments and identifying the segment ID for each of the plurality of segments using the second MD-GMM.
  - 16. The system of claim 15 wherein the plurality of models comprise a key child model, an electronic device model, and an adult model.
  - 17. The system of claim 16, further comprising at least one of:
    - the audio engine adapted to modify the key child model using an age-dependent key child model, the age-dependent key child model comprising:
      
      a first model group comprising characteristics of sounds of children of a first age; and
      
      a second model group comprising characteristics of sounds of children of a second age;
      
      the audio engine adapted to modify the electronic device model, the electronic device model comprising criteria associated with sounds generated by an electronic device;
      
      the audio engine adapted to modify at least one of the key child model and the adult model using a loudness/clearness detection model, the loudness/clearness detection model comprising a Likelihood Ratio Test; and
      
      the audio engine adapted to modify at least one of the key child model and the adult model using a parentese model, the parentese model comprising a complexity level of speech associated with adult sounds.
  - 18. The system of claim 14 wherein the audio engine uses the MD-GMM by:
    - scoring each of the plurality of segments using log-likelihood scoring and a plurality of models; and
      
      analyzing the scored plurality of segments to assign the segment ID to each of the plurality of segments.
  - 19. The system of claim 14 wherein the MD-GMM comprises a plurality of models, each model comprising criteria associated with sounds and sources of sounds, the plurality of models comprising at least one of:
    - a key child model comprising criteria associated with sounds from the key child;
      
      an adult model comprising criteria associated with sounds from an adult;
      
      a noise model comprising criteria associated with sounds attributable to noise;
      
      an electronic device model comprising criteria associated with sounds from an electronic device;
      
      an other child model comprising criteria associated with sounds from a child other than the key child;
      
      an age-dependent key child model comprising criteria associated with sounds from key children of a plurality of ages; and
      
      a parentese model comprising a complexity level of characteristics of sounds of adults.
  - 20. The system of claim 19 wherein the audio engine is adapted to:
    - use the other child model to identify at least one of the plurality of segments comprising sounds from a child other than the key child; and
      
      assign an other child segment ID to the identified at least one of the plurality of segments.
  - 21. The system of claim 19 wherein the audio engine is adapted to:
    - use the noise model to identify at least one of the plurality of segments comprising sounds from noise; and
      
      assign a noise segment ID to the identified at least one of the plurality of segments.
  - 22. The system of claim 19 wherein the audio engine is adapted to:
    - use the key child model to identify at least one of the plurality of segments comprising sounds with characteristics associated with the sounds from the key child; and
      
      assign the key child segment ID to the identified at least one of the plurality of segments.
  - 23. The system of claim 19 wherein the audio engine is adapted to:
    - use the adult model to identify at least one of the plurality of segments comprising sounds from an adult; and
      
      assign an adult segment ID to the identified at least one of the plurality of segments.
  - 24. The system of claim 19 wherein the audio engine is adapted to:
    - use the electronic model to identify at least one of the plurality of segments comprising sounds having criteria associated with electronic device sounds, the criteria associated with electronic device sounds comprising at least one of:
      
      duration longer than a pre-set period; and
      
      a series of segments having a pre-set source pattern; and
      
      assign a noise segment ID to the identified at least one of the plurality of segments.
  - 25. The system of claim 19 wherein the age-dependent key child model comprises:
    - a first model group comprising criteria of sounds of children of a first age; and
      
      a second model group comprising criteria of sounds of children of a second age; and
      
      wherein the audio engine is adapted to:
      
      select one of the first model group and the second model group based on information associated with the key child;
      
      use the selected model group to identify at least one of the plurality of segments comprising sounds having characteristics of the selected model group; and
      
      assign the key child segment ID to the identified at least one of the plurality of segments.
  - 26. The system of claim 19 wherein the audio engine is adapted to:
    - use the parentese model to identify at least one of the plurality of segments comprising sounds having the complexity level of characteristics; and
      
      assign an adult segment ID to the identified at least one of the plurality of segments.
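
The log-likelihood scoring recited in claim 18 can be sketched roughly as below: each candidate source has its own Gaussian mixture model, a segment's feature frames are scored under every model, and the best-scoring model supplies the segment ID. The diagonal-covariance form, the one-dimensional toy features, and the model parameters are assumptions for illustration; the claims do not specify features or model parameters, and the minimum-duration constraint is not modeled here.

```python
import math


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) components.
    """
    total = 0.0
    for frame in frames:
        component_logs = []
        for weight, means, variances in gmm:
            log_p = math.log(weight)
            for x, mu, var in zip(frame, means, variances):
                log_p += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
            component_logs.append(log_p)
        # Log-sum-exp over the components for numerical stability.
        peak = max(component_logs)
        total += peak + math.log(sum(math.exp(c - peak) for c in component_logs))
    return total


def assign_segment_id(frames, models):
    """Score a segment under every model and return the best label and all scores."""
    scores = {name: gmm_log_likelihood(frames, gmm) for name, gmm in models.items()}
    return max(scores, key=scores.get), scores


# Toy one-dimensional models with made-up parameters.
toy_models = {
    "key_child": [(0.6, [5.0], [1.0]), (0.4, [6.0], [0.5])],
    "adult": [(1.0, [0.0], [1.0])],
}
label, scores = assign_segment_id([[4.8], [5.1], [5.3]], toy_models)
print(label)
```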

Current Assignee: LENA Foundation
Original Assignee: LENA Foundation
Inventors: Paul, Terrance; Xu, Dongxin; Yapenel, Umit; Gray, Sharmistha
Primary Examiner(s): Wozniak, James S.
Assistant Examiner(s): Kovacek, David M.

Application Number: US12/018,647
Publication Number: US 20080235016A1
Time in Patent Office: 1,420 Days
Field of Search: 704/1-9, 704/231-257, 704/270-271, 704/E17.001-E17.016, 704/E15.001-E15.05, 704/E11.001-E11.007
US Class Current: 704/254
CPC Class Codes:
G06N 20/00   Machine learning
G10L 15/14   using statistical models, e...
G10L 25/78   Detection of presence or ab...
