System and method for distinguishing source from unconstrained acoustic signals emitted thereby in context agnostic manner

US 9,558,762 B1
Filed: 07/03/2012
Issued: 01/31/2017
Est. Priority Date: 07/03/2011
Status: Active Grant

Abstract

A system and method are provided for distinguishing between a plurality of sources based upon unconstrained acoustic signals captured therefrom. A spectrographic transformation is applied to time-captured segments of acoustic signals to generate a spectral vector for each. A selectively executed sparse decomposition includes in a training system mode simultaneous sparse approximation upon a joint corpus of spectral vectors for a plurality of acoustic signal segments from distinct sources. At least one sparse decomposition is executed for each spectral vector in terms of a representative set of decomposition atoms. Discriminant reduction executes during the training system mode to down-select from the representative set an optimal combination of atoms for cooperatively distinguishing acoustic signals emitted by different distinct sources. Classification is subsequently executed upon the sparse decomposition of an input acoustic signal segment unit to discover a degree of correlation for the input acoustic signal segment relative to each distinct source.
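
As an illustration of the spectrographic transformation named in the abstract, the following minimal Python sketch turns a captured signal into log-magnitude spectral vectors with a short-time Fourier transform. It is a sketch under stated assumptions, not the patented implementation: the frame length, hop size, Hann window, log scaling, and all function names are hypothetical, and each windowed frame is treated here as one time-captured segment.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency bins of a naive DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectral_vectors(signal, frame_len=64, hop=32):
    """Return one log-magnitude spectral vector per windowed frame."""
    window = hann(frame_len)
    vectors = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        vectors.append([math.log(m + 1e-9) for m in dft_magnitudes(frame)])
    return vectors


# Synthetic test tone: 128 Hz sampled at 1024 Hz, 256 samples.
rate = 1024
tone = [math.sin(2 * math.pi * 128 * t / rate) for t in range(256)]
vecs = spectral_vectors(tone)

# With 64-sample frames the bin spacing is 16 Hz, so 128 Hz lands in bin 8.
peak_bin = max(range(len(vecs[0])), key=lambda k: vecs[0][k])
print(len(vecs), len(vecs[0]), peak_bin)
```

The naive DFT keeps the example dependency-free; a production system would normally use an FFT routine instead.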

57 Citations

32 Claims

1. A system for distinguishing between a plurality of acoustic sources generating sound in the form of unconstrained acoustic signals captured therefrom, comprising:
- a transformation unit applying a spectrographic transformation upon each time-captured segment of unconstrained acoustic signal generated by one of a plurality of distinct acoustic sources, said transformation unit generating a spectral vector for each said segment;
  
  a sparse decomposition unit coupled to said transformation unit, said sparse decomposition unit selectively executing in at least a training system mode a simultaneous sparse approximation upon a joint corpus of spectral vectors for a plurality of unconstrained acoustic signal segments from at least a subset of the distinct acoustic sources, at least one of said spectral vectors generated by the spectrographic transformation, said sparse decomposition unit generating at least one sparse decomposition defined in a multi-dimensional space for each said spectral vector in terms of a representative set of decomposition atoms;
  
  a discriminant reduction unit coupled to said sparse decomposition unit, said discriminant reduction unit being executable during the training system mode to down-select from said representative set of decomposition atoms an optimal combination of atoms for cooperatively distinguishing acoustic signals emitted by different ones of the distinct acoustic sources; and
  
  a classification unit coupled to said sparse decomposition unit, said classification unit being executable in a classification system mode to:
  
  project a spectral vector of an input acoustic signal segment onto said multi-dimensional space to generate a sparse decomposition therefor as a coefficient weighted sum of said representative set of decomposition atoms,
  
  discover for said sparse decomposition of an input acoustic signal segment a degree of similarity relative to each of the distinct acoustic sources, and
  
  determine one of the distinct acoustic sources to have generated the input acoustic signal segment as sound, according to the degree of similarity.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
- - 2. The system as recited in claim 1, wherein said discriminant reduction unit includes a Support Vector Machine (SVM) portion programmably implemented therein, said SVM portion pair-wise comparing the distinct acoustic sources in sparse decomposition to selectively determine said optimal combination of atoms for each said pair-wise comparison.
  - 3. The system as recited in claim 2, wherein said SVM portion determines for each said pair-wise comparison of sources a two-dimensional decision subspace defined by a corresponding pair of optimal atoms;
    - and, said classification unit executes a non-parametric voting process iteratively mapping corresponding portions of said input acoustic signal segment sparse decomposition to each said decision subspace.
  - 4. The system as recited in claim 3, wherein at least one said acoustic signal segment is of known distinct acoustic source prior to initiation of the training system mode, said sparse decomposition and discriminant reduction units thereby executing in the training system mode to identify a distinct class corresponding to the known distinct acoustic source.
  - 5. The system as recited in claim 3, wherein none of said acoustic signal segments is of known distinct acoustic source prior to initiation of the training system mode, said sparse decomposition and discriminant reduction units thereby executing in the training system mode to cluster together similar ones of said segments.
  - 6. The system as recited in claim 3, wherein a plurality of sub-segments are delineated within each said segment;
    - wherein said sparse decomposition unit generates over each said sub-segment a parametric mean of said sparse decompositions, each said sub-segment parametric mean being defined in terms of said representative set of decomposition atoms; and
      
      wherein said simultaneous sparse approximation and parametric mean are carried out according to a greedy adaptive decomposition (GAD) process.
  - 7. The system as recited in claim 6, wherein:
    - said spectrographic transformation includes a Short-Time-Fourier-Transform (STFT) process, and said spectral vectors are defined in a time-frequency domain; and
      
      said sparse decompositions are each defined in a cepstral-frequency domain as a coefficient weighted sum of said representative set of atoms.
  - 8. The system as recited in claim 7, wherein said segments of acoustic signals include time-captured audio recordings of speech, and the distinct acoustic sources include individual speakers.
  - 9. The system as recited in claim 8, wherein said GAD process references a Gabor type dictionary for representation of said sparse decomposition as a sparse adaptive tiling of a C-F plane.
  - 10. The system as recited in claim 7, wherein said segments of acoustic signals include time-captured audio recordings of speech;
    - and, the distinct acoustic sources include distinct groups of speakers, each distinct group of speakers having a predetermined shared attribute selected from the group consisting of:
      
      common language, common gender, common ethnicity, common idiosyncrasies, common verbal tendencies, and common exhibited stress level.
  - 11. The system as recited in claim 7, wherein said segments of acoustic signals include time-captured audio recordings of non-verbal sounds emitted by sources selected from the group consisting of:
    - non-human creatures, inanimate objects, machinery, and natural phenomena.
  - 12. The system as recited in claim 7, wherein at least one of the transformation unit, sparse decomposition unit, discriminant reduction unit, or classification unit is implemented as part of a mobile communication device, and wherein the mobile communication device captures parts or all of the unconstrained acoustic signals.
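
The simultaneous sparse approximation recited in claim 1 can be pictured with a small greedy pursuit that picks, at each step, the single atom that best explains all signals in the joint corpus at once. The sketch below is an assumption-laden stand-in: it uses a plain simultaneous matching pursuit over a hypothetical unit-norm dictionary, not the greedy adaptive decomposition (GAD) process or Gabor dictionary named in claims 6 and 9, whose details this record does not give. All names and data are illustrative.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def simultaneous_pursuit(signals, dictionary, n_atoms):
    """Greedily choose atoms shared by every signal in the corpus.

    signals    : list of spectral vectors (the joint corpus)
    dictionary : list of unit-norm atoms (assumed, hypothetical)
    n_atoms    : number of atoms to select
    Returns the chosen atom indices, per-signal coefficients, and residuals.
    """
    residuals = [list(s) for s in signals]
    chosen = []
    coeffs = [[] for _ in signals]
    for _ in range(n_atoms):
        # Score each unused atom by its total correlation with all residuals.
        best = max(
            (j for j in range(len(dictionary)) if j not in chosen),
            key=lambda j: sum(abs(dot(r, dictionary[j])) for r in residuals),
        )
        chosen.append(best)
        atom = dictionary[best]
        # Every signal gets its own coefficient on the shared atom.
        for i, r in enumerate(residuals):
            c = dot(r, atom)
            coeffs[i].append(c)
            residuals[i] = [x - c * a for x, a in zip(r, atom)]
    return chosen, coeffs, residuals


# Toy corpus: three signals that share structure on atoms 0 and 3.
dictionary = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [2 ** -0.5, 2 ** -0.5, 0.0, 0.0],
]
signals = [
    [3.0, 0.0, 0.0, 1.0],
    [2.0, 0.0, 0.0, -1.0],
    [4.0, 0.0, 0.0, 0.5],
]
chosen, coeffs, residuals = simultaneous_pursuit(signals, dictionary, n_atoms=2)
print(chosen, coeffs)
```

Because the atoms are selected jointly, every signal ends up expressed in the same representative set of atoms, which is what makes the coefficient vectors comparable across sources.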

13. A method for distinguishing between a plurality of acoustic sources generating sound in the form of unconstrained acoustic signals captured therefrom, comprising the steps of:
- applying a spectrographic transformation upon a plurality of time-captured segments of unconstrained acoustic signals to generate a spectral vector for each said segment, said unconstrained acoustic signals generated by one of a plurality of distinct acoustic sources;
  
  selectively executing in a processor a sparse decomposition of each said spectral vector, said sparse decomposition including in a training system mode a simultaneous sparse approximation upon a joint corpus of spectral vectors for a plurality of unconstrained acoustic signal segments from at least a subset of the distinct acoustic sources, at least one of said spectral vectors generated by the spectrographic transformation, executing at least one sparse decomposition defined in a multi-dimensional space for each said spectral vector in terms of a representative set of decomposition atoms;
  
  executing discriminant reduction in a processor during the training system mode to down-select from said representative set of decomposition atoms an optimal combination of atoms for cooperatively distinguishing acoustic signals emitted by different ones of the distinct acoustic sources; and
  
  executing classification upon said sparse decomposition of an input acoustic signal segment during a classification system mode, said classification including executing a processor to:
  
  project a spectral vector of said input acoustic signal segment onto said multi-dimensional space to generate a sparse decomposition therefor as a coefficient weighted sum of said representative set of decomposition atoms,
  
  discover a degree of similarity for said input acoustic signal segment relative to each of the distinct acoustic sources, and
  
  determine one of the distinct acoustic sources to have generated the input acoustic signal segment as sound, according to the degree of similarity.
- Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
- - 14. The method as recited in claim 13, wherein said discriminant reduction includes carrying out a Support Vector Machine (SVM) process pair-wise comparing the distinct acoustic sources in sparse decomposition to selectively determine said optimal combination of atoms for each said pair-wise comparison.
  - 15. The method as recited in claim 14, wherein said SVM process determines for each said pair-wise comparison of sources a two-dimensional decision subspace defined by a corresponding pair of optimal atoms;
    - and, said classification includes a non-parametric voting process iteratively mapping corresponding portions of said input acoustic signal segment sparse decomposition to each said decision subspace.
  - 16. The method as recited in claim 15, wherein at least one said acoustic signal segment is of known distinct acoustic source prior to initiation of the training system mode, said sparse decomposition and discriminant reduction units thereby executing in the training system mode to identify a distinct class corresponding to the known distinct acoustic source.
  - 17. The method as recited in claim 15, wherein none of said acoustic signal segments is of known distinct acoustic source prior to initiation of the training system mode, said sparse decomposition and discriminant reduction units thereby executing in the training system mode to cluster together similar ones of said segments.
  - 18. The method as recited in claim 15, wherein a plurality of sub-segments are delineated within each said segment;
    - wherein a parametric mean of said sparse decompositions over each said sub-segment is generated, each said sub-segment parametric mean being defined in terms of said representative set of decomposition atoms; and
      
      wherein said simultaneous sparse approximation and parametric mean are carried out according to a greedy adaptive decomposition (GAD) process.
  - 19. The method as recited in claim 18, wherein:
    - said spectrographic transformation includes a Short-Time-Fourier-Transform (STFT) process, and said spectral vectors are defined in a time-frequency domain; and
      
      said sparse decompositions are each defined in a cepstral-frequency domain to generate a coefficient-weighted sum of said representative set of atoms.
  - 20. The method as recited in claim 19, wherein said segments of acoustic signals include time-captured audio recordings of speech, and the distinct acoustic sources include individual speakers.
  - 21. The method as recited in claim 20, wherein said GAD process references a Gabor type dictionary for representation of said sparse decomposition as a sparse adaptive tiling of a C-F plane.
  - 22. The method as recited in claim 19, wherein said segments of acoustic signals include time-captured audio recordings of speech;
    - and, the distinct acoustic sources include distinct groups of speakers, each distinct group of speakers having a predetermined shared attribute selected from the group consisting of:
      
      common language, common gender, common ethnicity, common idiosyncrasies, common verbal tendencies, and common exhibited stress level.
  - 23. The method as recited in claim 19, wherein said segments of acoustic signals include time-captured audio recordings of non-verbal sounds emitted by sources selected from the group consisting of:
    - non-human creatures, inanimate objects, machinery, and natural phenomena.
  - 24. The method as recited in claim 19, wherein the capturing of at least one of the input unconstrained acoustic signals is performed by a mobile communication device, and wherein at least one of the unit operations of spectrographic transformation, sparse decomposition, discriminant reduction, and classification is performed by said mobile communication device.
  - 25. A non-transitory computer readable medium storing a computer program that when executed causes a processor to perform the method of claim 19.
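
Claims 14 and 15 describe a pair-wise comparison of sources that yields, for each pair, a two-dimensional decision subspace defined by a pair of atoms, followed by a non-parametric vote across those subspaces. The sketch below shows the shape of that idea only. It deliberately swaps the Support Vector Machine for a simpler stand-in: a Fisher-style separation score to choose each atom pair and a nearest-centroid rule to cast each vote. All function names, labels, and coefficient values are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mean(vals):
    return sum(vals) / len(vals)


def centroid(rows, atoms):
    """Mean coefficient of each listed atom over a source's training rows."""
    return [mean([r[a] for r in rows]) for a in atoms]


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def spread(rows, atoms, center):
    """Mean squared distance of the rows from their centroid."""
    return mean([sq_dist([r[a] for a in atoms], center) for r in rows])


def best_atom_pair(rows_a, rows_b, n_atoms):
    """Pick the atom pair whose 2-D subspace best separates two sources."""
    def score(pair):
        ca, cb = centroid(rows_a, pair), centroid(rows_b, pair)
        within = spread(rows_a, pair, ca) + spread(rows_b, pair, cb)
        return sq_dist(ca, cb) / (within + 1e-9)
    return max(combinations(range(n_atoms), 2), key=score)


def train(training):
    """training maps a source label to its list of coefficient vectors."""
    n_atoms = len(next(iter(training.values()))[0])
    model = {}
    for a, b in combinations(sorted(training), 2):
        pair = best_atom_pair(training[a], training[b], n_atoms)
        model[(a, b)] = (pair, centroid(training[a], pair), centroid(training[b], pair))
    return model


def classify(model, coeffs):
    """Each pair-wise subspace casts one vote; the most-voted source wins."""
    votes = Counter()
    for (a, b), (pair, ca, cb) in model.items():
        point = [coeffs[i] for i in pair]
        votes[a if sq_dist(point, ca) <= sq_dist(point, cb) else b] += 1
    return votes.most_common(1)[0][0]


# Toy sparse-decomposition coefficients over four atoms for three sources.
training = {
    "s1": [[5.0, 0.1, 0.0, 1.0], [5.2, -0.1, 0.1, 1.1]],
    "s2": [[0.0, 4.0, 0.1, 1.0], [0.1, 4.2, 0.0, 0.9]],
    "s3": [[0.1, 0.0, 3.0, 1.0], [0.0, 0.1, 3.1, 1.1]],
}
model = train(training)
first = classify(model, [4.9, 0.0, 0.2, 1.0])
second = classify(model, [0.0, 0.1, 2.9, 1.0])
print(first, second)
```

With three sources there are three pair-wise subspaces, so a source that wins both of its comparisons takes the vote regardless of how the unrelated pair decides.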

26. A system for distinguishing a source from unconstrained acoustic signals captured thereby in context-agnostic manner, comprising:
- (a) a transformation unit applying a Short-Time-Fourier-Transform (STFT) process upon each time-captured segment of unconstrained acoustic signal generated by one of a plurality of distinct acoustic sources, said transformation unit generating a spectral vector defined in a time-frequency plane for each said segment;
  
  (b) a training unit coupled to said transformation unit, said training unit including:
  
  a cepstral decomposition portion executing a simultaneous sparse approximation upon a joint corpus of spectral vectors for a plurality of unconstrained acoustic signal segments from at least a subset of the distinct acoustic sources, at least one of said spectral vectors generated by the STFT process, said simultaneous sparse approximation including a greedy adaptive decomposition (GAD) process referencing a Gabor dictionary, said cepstral decomposition portion generating for each said spectral vector in said joint corpus at least one cepstral decomposition defined on a cepstral-frequency plane as a coefficient weighted sum of a representative set of decomposition atoms; and
  
  a discriminant reduction portion coupled to said cepstral decomposition portion, said discriminant reduction portion being executable to down-select from said representative set of decomposition atoms an optimal combination of atoms for cooperatively distinguishing acoustic signals emitted by different ones of the distinct acoustic sources;
  
  (c) a classification unit coupled to said transformation unit, said classification unit including:
  
  a cepstral projection portion projecting a spectral vector of an input acoustic signal segment onto said cepstral-frequency plane to generate a cepstral decomposition therefor as a coefficient weighted sum of said representative set of decomposition atoms; and
  
  a classification decision portion coupled to said cepstral projection portion, said classification decision portion being executable to discover for said cepstral decomposition of said input acoustic signal segment a degree of similarity relative to each of the distinct acoustic sources, and to thereby determine one of the distinct acoustic sources to have generated the input acoustic signal segment as sound, according to the degree of similarity.
- Dependent Claims (27, 28, 29)
- - 27. The system as recited in claim 26, wherein said segments of acoustic signals include time-captured audio recordings of speech, and the distinct acoustic sources include individual speakers.
  - 28. The system as recited in claim 27, wherein said discriminant reduction portion includes a Support Vector Machine (SVM) part programmably implemented therein, said SVM part pair-wise comparing the distinct acoustic sources in cepstral decomposition to selectively determine said optimal combination of atoms for each said pair-wise comparison.
  - 29. The system as recited in claim 28, wherein said SVM part determines for each said pair-wise comparison of sources a two-dimensional decision subspace defined by a corresponding pair of optimal atoms;
    - and, said classification decision portion executes a non-parametric voting process iteratively mapping corresponding portions of said input acoustic signal segment cepstral decomposition to each said decision subspace.

30. A system for distinguishing between a plurality of acoustic sources generating sound in the form of unconstrained acoustic signals captured therefrom, comprising:
- a transformation unit applying a spectrographic transformation upon each time-captured segment of unconstrained acoustic signal generated by one of a plurality of distinct acoustic sources, said transformation unit generating a spectral vector for each said segment;
  
  a sparse decomposition unit coupled to said transformation unit, said sparse decomposition unit selectively executing in at least a training system mode a simultaneous sparse approximation upon a joint corpus of spectral vectors for a plurality of unconstrained acoustic signal segments from at least a subset of the distinct acoustic sources, at least one of said spectral vectors generated by the spectrographic transformation, said sparse decomposition unit generating at least one sparse decomposition defined on a two-dimensional plane for each said spectral vector in terms of a representative set of decomposition atoms;
  
  a discriminant reduction unit coupled to said sparse decomposition unit, said discriminant reduction unit being executable during the training system mode to down-select from said representative set of decomposition atoms an optimal combination of atoms for cooperatively distinguishing acoustic signals emitted by different ones of the distinct acoustic sources based on characteristics of the acoustic signals independent of contextually-determined data content; and
  
  a classification unit coupled to said sparse decomposition unit, said classification unit being executable in a classification system mode to:
  
  project a spectral vector of an input acoustic signal segment onto said two-dimensional plane to generate a sparse decomposition therefor as a coefficient weighted sum of said representative set of decomposition atoms,
  
  discover for said sparse decomposition of an input acoustic signal segment a degree of similarity between the representative set of decomposition atoms of the signal segment and the optimal combination of atoms of each of the distinct acoustic sources, and
  
  determine one of the distinct acoustic sources to have generated the input acoustic signal segment as sound, according to the degree of similarity.
- Dependent Claims (31, 32)
- - 31. The system as recited in claim 30, wherein:
    - said discriminant reduction unit includes a Support Vector Machine (SVM) portion programmably implemented therein, said SVM portion pair-wise comparing the distinct acoustic sources in sparse decomposition to selectively determine said optimal combination of atoms for each said pair-wise comparison; and
      
      said SVM portion determines for each said pair-wise comparison of sources a two-dimensional decision subspace defined by a corresponding pair of optimal atoms; and
      
      said classification unit executes a non-parametric voting process iteratively mapping corresponding portions of said input acoustic signal segment sparse decomposition to each said decision subspace.
  - 32. The system as recited in claim 31, wherein said simultaneous sparse approximation and parametric mean are carried out according to a greedy adaptive decomposition (GAD) process.
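
The classification-mode steps shared by the independent claims (project an input spectral vector onto the trained atoms, then discover a degree of similarity to each source) can be sketched as follows. This is an illustrative assumption, not the claimed method: the projection is a fixed-order matching pursuit, which is an exact projection only when the atoms are orthonormal, and cosine similarity against a hypothetical per-source coefficient profile stands in for the unspecified similarity measure. All names and numbers are made up for the example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(vector, atoms):
    """Express a vector as a coefficient-weighted sum of the given atoms.

    Atoms are visited in a fixed order and each contribution is removed
    from the residual before the next atom is considered.
    """
    residual = list(vector)
    coeffs = []
    for atom in atoms:
        c = dot(residual, atom)
        coeffs.append(c)
        residual = [x - c * a for x, a in zip(residual, atom)]
    return coeffs


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def most_similar(coeffs, profiles):
    """Return the best-matching source label and every similarity score."""
    scores = {label: cosine(coeffs, p) for label, p in profiles.items()}
    return max(scores, key=scores.get), scores


# Two trained atoms and one hypothetical coefficient profile per source.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
profiles = {"speaker_a": [1.0, 0.1], "speaker_b": [0.1, 1.0]}

coeffs = project([2.0, 0.3, 0.5], atoms)
label, scores = most_similar(coeffs, profiles)
print(coeffs, label)
```

The component of the input that lies outside the span of the trained atoms (here the third coordinate) is simply left in the residual and does not influence the decision.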

Current Assignee: Reality Analytics, Inc. (Renesas Electronics Corporation)
Original Assignee: Reality Analytics, Inc. (Renesas Electronics Corporation)
Inventors: Sieracki, Jeffrey M.
Primary Examiner(s): Sirjani, Fariba
Application Number: US13/541,592
Time in Patent Office: 1,673 Days
Field of Search: 704/237
US Class Current: 1/1
CPC Class Codes
- G06F 2218/08   Feature extraction
- G06V 40/33   based only on signature ima...
- G10L 13/00   Speech synthesis; Text to s...
- G10L 15/00   Speech recognition G10L17/0...
- G10L 15/02   Feature extraction for spee...
- G10L 15/10   using distance or distortio...
- G10L 17/00   Speaker identification or v...
- G10L 17/02   Preprocessing operations, e...
- G10L 17/04   Training, enrolment or mode...
- G10L 17/06   Decision making techniques;...
- G10L 2021/02087   the noise being separate sp...
- G10L 21/0272   Voice signal separating
- G10L 25/00   Speech or voice analysis te...
