Language informed source separation

US 8,843,364 B2
Filed: 02/29/2012
Issued: 09/23/2014
Est. Priority Date: 02/29/2012
Status: Active Grant

Abstract

Methods and systems for non-negative hidden Markov modeling of signals are described. For example, techniques disclosed herein may be applied to signals emitted by one or more sources. The modeling may be constrained according to high level information. In some embodiments, methods and systems may enable the separation of a signal's various components. As such, the systems and methods disclosed herein may find a wide variety of applications. In audio-related fields, for example, these techniques may be useful in music recording and processing, source separation/extraction, noise reduction, teaching, automatic transcription, electronic games, audio search and retrieval, and many other applications.

30 Citations

20 Claims

1. A non-transitory computer-readable storage medium storing program instructions, the program instructions being computer-executable to implement:
- for a first source, generating a model for each word of a plurality of words, each model including:
  - a plurality of dictionaries, each of the plurality of dictionaries including one or more spectral components; and
  - probabilities of transition between the plurality of dictionaries; and
- constraining the models according to high level information that defines valid transitions, the constrained models being usable to perform source separation on a sound mixture that includes multiple sources.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The non-transitory computer-readable storage medium of claim 1, wherein the high level information is a language model that defines a corpus of words and a plurality of valid sequences of the words of the corpus.
  - 3. The non-transitory computer-readable storage medium of claim 1, wherein said generating the model for each word includes performing a non-negative hidden Markov technique.
  - 4. The non-transitory computer-readable storage medium of claim 1, wherein the program instructions are further computer-executable to implement combining the models into a single source dependent model, wherein said constraining the models includes constraining transitions between the models of the single source dependent model according to the high level information.
  - 5. The non-transitory computer-readable storage medium of claim 1, wherein the program instructions are further computer-executable to implement:
    - for a second source, generating another model for each word of the plurality of words; and
    - constraining the other models according to the high level information.
  - 6. The non-transitory computer-readable storage medium of claim 5, wherein the program instructions are further computer-executable to implement combining the models and the other models into a single composite model.
  - 7. The non-transitory computer-readable storage medium of claim 6, wherein said performing source separation includes:
    - receiving the sound mixture that includes the first and second sources;
    - receiving the single composite model; and
    - for each time frame of the sound mixture, estimating a weight of each of the first and second sources in the sound mixture based on the single composite model.
  - 8. The non-transitory computer-readable storage medium of claim 6, wherein the program instructions are further computer-executable to implement pruning the single composite model according to a threshold.
  - 9. The non-transitory computer-readable storage medium of claim 1, wherein said generating the model of each word is based on multiple instances of the respective word.
  - 10. The non-transitory computer-readable storage medium of claim 1, wherein a portion of a given word of the plurality of words is represented by a linear combination of one or more spectral components of one of the respective word's corresponding dictionaries.
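
Claims 1 and 10 describe a per-word model made of several dictionaries of spectral components plus transition probabilities, with a portion of a word represented as a linear combination of the components of one dictionary. The following is a minimal Python sketch of that idea, not the patented implementation. The names `WordModel` and `reconstruct_frame` and the list-of-lists layout are hypothetical, and the non-negativity check on the weights is an assumption suggested by the non-negative modeling mentioned in the abstract.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class WordModel:
    """Hypothetical container for one word's model (illustrative names only)."""
    word: str
    # dictionaries[d][k] is the k-th spectral component of dictionary d.
    dictionaries: List[List[Vector]]
    # transitions[i][j] is the probability of moving from dictionary i to j.
    transitions: List[List[float]]


def reconstruct_frame(dictionary: List[Vector], weights: Vector) -> Vector:
    """Return the weighted sum of one dictionary's spectral components."""
    if len(weights) != len(dictionary):
        raise ValueError("need exactly one weight per spectral component")
    if any(w < 0.0 for w in weights):
        # Assumption: weights are kept non-negative, as in non-negative models.
        raise ValueError("weights must be non-negative")
    n_bins = len(dictionary[0])
    return [
        sum(w * component[f] for w, component in zip(weights, dictionary))
        for f in range(n_bins)
    ]


# Toy dictionary with two spectral components over three frequency bins.
toy_dictionary = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]
frame = reconstruct_frame(toy_dictionary, [2.0, 4.0])
print(frame)  # [2.0, 2.0, 2.0]
```

In this sketch a frame is explained by a single dictionary at a time, and the `transitions` field of `WordModel` would govern which dictionary is active in the next frame.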

11. A non-transitory computer-readable storage medium storing program instructions, the program instructions being computer-executable to implement:
- receiving a sound mixture including a first source and a second source;
- receiving a model including:
  - a first plurality of dictionaries corresponding to a first source, the first plurality of dictionaries including multiple dictionaries for each word of a plurality of words;
  - a first transition matrix corresponding to the first source, the transition matrix including probabilities of transition among the first plurality of dictionaries, at least some of the probabilities of transition are based on high level information that defines valid transitions;
  - a second plurality of dictionaries corresponding to the second source, the second plurality of dictionaries including multiple other dictionaries for each word of the plurality of words; and
  - a second transition matrix corresponding to the second source, the second transition matrix including probabilities of transition among the second plurality of dictionaries, at least some of the probabilities of transition in the second transition matrix being based on the high level information; and
- calculating contributions to the sound mixture from respective plurality of dictionaries for each of the first and second sources, said calculating is based on the model.
- Dependent Claims (12, 13, 14, 15)
  - 12. The non-transitory computer-readable storage medium of claim 11, wherein said estimating is performed for each time frame of the sound mixture.
  - 13. The non-transitory computer-readable storage medium of claim 11, wherein said calculating a contribution of the first plurality of dictionaries and a contribution of the second plurality of dictionaries to the sound mixture, wherein the high level information is a language model that defines valid grammar.
  - 14. The non-transitory computer-readable storage medium of claim 11, wherein the model is a non-negative factorial hidden Markov model.
  - 15. The non-transitory computer-readable storage medium of claim 11, wherein the program instructions are further computer-executable to implement:
    - generating a mask for the first source based on the estimated contributions from the first source's respective dictionaries; and
    - applying each mask to the sound mixture to separate the respective source from the sound mixture.
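
Claim 15 recites generating a mask from a source's estimated contributions and applying it to the sound mixture. The claim does not say what kind of mask is used, so the sketch below assumes a ratio (soft) mask computed per time-frequency bin, which is one common choice. The function names `ratio_mask` and `apply_mask` and the `[frame][bin]` layout are hypothetical.

```python
from typing import List

# Magnitude spectrogram indexed as [time frame][frequency bin].
Spectrogram = List[List[float]]


def ratio_mask(target: Spectrogram, others: List[Spectrogram]) -> Spectrogram:
    """Share of each bin attributed to the target source (assumed ratio mask)."""
    mask = []
    for t, frame in enumerate(target):
        row = []
        for f, value in enumerate(frame):
            total = value + sum(other[t][f] for other in others)
            # Bins with no estimated energy from any source get a zero mask.
            row.append(value / total if total > 0.0 else 0.0)
        mask.append(row)
    return mask


def apply_mask(mask: Spectrogram, mixture: Spectrogram) -> Spectrogram:
    """Multiply the mixture by the mask, bin by bin."""
    return [
        [m * x for m, x in zip(mask_row, mix_row)]
        for mask_row, mix_row in zip(mask, mixture)
    ]


# Toy estimates for two sources over one frame and two frequency bins.
first_estimate = [[1.0, 3.0]]
second_estimate = [[1.0, 1.0]]
mixture = [[4.0, 8.0]]

first_mask = ratio_mask(first_estimate, [second_estimate])
print(first_mask)                       # [[0.5, 0.75]]
print(apply_mask(first_mask, mixture))  # [[2.0, 6.0]]
```

With ratio masks, the masks of all sources sum to one in every bin that has estimated energy, so the separated outputs add back up to the mixture in those bins.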

16. A method, comprising:
- for each source of a plurality of sources, generating a plurality of word level models, each word level model corresponding to a respective one word of a plurality of words, each word level model including:
  - a plurality of dictionaries, each of the plurality of dictionaries including one or more spectral components, and
  - probabilities of transition between the dictionaries;
- for each source, combining the word level models into a single source specific model; and
- constraining the single source specific models according to high level information that defines valid transitions, the constrained single source specific models being usable to perform source separation on a sound mixture that includes multiple sources.
- Dependent Claims (17, 18, 19, 20)
  - 17. The method of claim 16, wherein the high level information is a language model that defines a corpus of words and a plurality of valid sequences of the words of the corpus.
  - 18. The method of claim 16, wherein said generating the plurality of word level models includes performing a non-negative hidden Markov technique.
  - 19. The method of claim 16, wherein each word level model is based on multiple instances of the corresponding respective word.
  - 20. The method of claim 16, wherein said constraining the single source specific models includes constraining transitions between word level models in the single source dependent model according to the high level information.
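
Claims 16 and 20 describe combining word level models into a single source specific model and constraining the transitions between word level models according to a language model. One way to picture this is a large transition matrix whose diagonal blocks are the per-word matrices and whose cross-word entries are nonzero only for word pairs the language model allows. The sketch below rests on assumptions not stated in the claims: the language model is reduced to a set of valid word pairs, a word is left only from its last dictionary and entered at its first, and `exit_prob` is an arbitrary illustrative value. All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Matrix = List[List[float]]
State = Tuple[str, int]  # (word, dictionary index within that word)


def combine_word_models(
    word_transitions: Dict[str, Matrix],
    valid_pairs: Set[Tuple[str, str]],
    exit_prob: float = 0.1,
) -> Tuple[List[State], Matrix]:
    """Build one transition matrix over the dictionaries of every word."""
    states = [(w, i) for w, m in word_transitions.items() for i in range(len(m))]
    index = {state: k for k, state in enumerate(states)}
    n = len(states)
    combined = [[0.0] * n for _ in range(n)]

    # Copy each word's own transition probabilities onto the block diagonal.
    for word, matrix in word_transitions.items():
        for i, row in enumerate(matrix):
            for j, p in enumerate(row):
                combined[index[(word, i)]][index[(word, j)]] = p

    # Allow cross-word transitions only for pairs the language model permits.
    for a, b in valid_pairs:
        last_of_a = index[(a, len(word_transitions[a]) - 1)]
        first_of_b = index[(b, 0)]
        combined[last_of_a][first_of_b] += exit_prob

    # Renormalize so that every row is again a probability distribution.
    for row in combined:
        total = sum(row)
        if total > 0.0:
            for j in range(n):
                row[j] /= total
    return states, combined


# Toy example: "set" may be followed by "red", but not the other way round.
states, T = combine_word_models(
    {"set": [[0.5, 0.5], [0.0, 1.0]], "red": [[1.0]]},
    {("set", "red")},
)
print(states)   # [('set', 0), ('set', 1), ('red', 0)]
print(T[2][0])  # 0.0
```

Transitions that the language model does not permit stay at exactly zero, so a decoder working with the combined matrix can never follow an invalid word sequence.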

Current Assignee: Adobe Inc.
Original Assignee: Adobe Systems Incorporated (Adobe Inc.)
Inventors: Mysore, Gautham J., Smaragdis, Paris
Primary Examiner(s): Chawan, Vijay B
Assistant Examiner(s): Shin, Seong-Ah A
Application Number: US13/408,934
Publication Number: US 20130226558A1
Time in Patent Office: 937 Days
Field of Search: 704/226, 704/238, 704/240, 704/256, 704/200, 704/10
US Class Current: 704/10
CPC Class Codes: G10L 21/028 using properties of sound s...