Sound feature priority alignment
Abstract
Sound feature priority alignment techniques are described. In one or more implementations, features of sound data are identified from a plurality of recordings. Values are calculated for frames of the sound data from the plurality of recordings. The values are based on similarity of the frames of the sound data from the plurality of recordings to each other, the similarity based on the identified features and a priority that is assigned based on the identified features of respective frames. The sound data from the plurality of recordings is then aligned based at least in part on the calculated values.
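The claims below build the frame-to-frame similarity from a normalized inner product of compared frames. A minimal sketch of that computation (the frame feature vectors and array shapes are assumptions for illustration, not specified by the patent):

```python
import numpy as np

def similarity_matrix(frames_a, frames_b):
    """Compare every frame of one recording to every frame of another
    using the normalized inner product (cosine similarity) of their
    feature vectors.  frames_a is (n, d), frames_b is (m, d); the
    result is an (n, m) matrix of similarities in [-1, 1]."""
    a = frames_a / np.linalg.norm(frames_a, axis=1, keepdims=True)
    b = frames_b / np.linalg.norm(frames_b, axis=1, keepdims=True)
    return a @ b.T
```

Comparing a recording against itself yields 1.0 on the diagonal, which is a quick sanity check for any feature representation plugged into this sketch.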
20 Claims
1. A method implemented by one or more computing devices, the method comprising:

    identifying features of sound data from a plurality of recordings;

    calculating values for individual frames of the sound data from the plurality of recordings, each of the values based on a similarity value scaled by a priority value, wherein:

        the similarity value corresponds to a similarity determined using a similarity matrix built by comparing a first set of individual frames of a first of the plurality of recordings to a second set of individual frames of a second of the plurality of recordings, wherein the similarity matrix represents a comparison between the first set of individual frames and the second set of individual frames computed using a normalized inner product of compared frames; and

        the priority value is assigned to individual frames of the sound data based on the identified features of the sound data as the identified features occur in the respective individual frames of the sound data, the priority value based on how speech characteristics of a particular frame compare to speech characteristics of other individual frames of the sound data; and

    aligning the sound data from the plurality of recordings based at least in part on the calculated values that are based on the similarity value scaled by the priority value, wherein individual frames of the sound data from the plurality of recordings are more likely to be aligned between different recordings when the priority value assigned to the individual frames of the sound data based on the identified features is higher than priority values assigned to other individual frames; and

    outputting the aligned sound data.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 19.
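Claim 1's core computation — scaling frame similarity by per-frame priority and then aligning on the scaled values — might be realized as follows. The outer-product scaling and the DTW-style dynamic program are plausible readings of the claim language, not a method the patent mandates:

```python
import numpy as np

def priority_scaled_values(sim, pri_a, pri_b):
    """Scale each similarity entry by the priorities of the two frames
    it compares, so high-priority frames dominate the alignment."""
    return sim * np.outer(pri_a, pri_b)

def align(values):
    """Find a monotonic frame-to-frame alignment path that maximizes
    the accumulated priority-scaled similarity (a DTW-style dynamic
    program; one plausible realization, ties broken toward the
    diagonal)."""
    n, m = values.shape
    acc = np.full((n, m), -np.inf)
    acc[0, 0] = values[0, 0]
    for i in range(n):
        for j in range(m):
            if (i, j) == (0, 0):
                continue
            prev = [acc[i - 1, j - 1] if i and j else -np.inf,
                    acc[i - 1, j] if i else -np.inf,
                    acc[i, j - 1] if j else -np.inf]
            acc[i, j] = values[i, j] + max(prev)
    # Backtrack from the end, preferring the diagonal step on ties.
    path, (i, j) = [(n - 1, m - 1)], (n - 1, m - 1)
    while (i, j) != (0, 0):
        cands = []
        if i and j:
            cands.append((acc[i - 1, j - 1], (i - 1, j - 1)))
        if i:
            cands.append((acc[i - 1, j], (i - 1, j)))
        if j:
            cands.append((acc[i, j - 1], (i, j - 1)))
        i, j = max(cands, key=lambda c: c[0])[1]
        path.append((i, j))
    return path[::-1]
```

With an identity similarity matrix the recovered path is the diagonal, i.e. frame k of one recording aligns to frame k of the other.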
11. A system comprising:

    at least one module implemented at least partially in hardware and configured to identify features of sound data; and

    one or more modules implemented at least partially in hardware and configured to automatically generate sound feature rules, the sound feature rules generated based on the identified features of the sound data,

        the one or more modules further configured to use the sound feature rules to assign a priority value to individual frames of the sound data based on features that are identified in the frames of the sound data,

        the one or more modules further configured to use the priority value to scale a similarity value of frames of different recordings to determine an alignment of sound data between the different recordings,

        the similarity value determined using a similarity matrix built by comparing a first set of individual frames of a first of the plurality of recordings to a second set of individual frames of a second of the plurality of recordings, wherein the similarity matrix represents a comparison between the first set of individual frames and the second set of individual frames computed using a normalized inner product of compared frames,

        the sound feature rules specifying that frames of the sound data having a higher energy are assigned a higher priority value for alignment than frames of the sound data having a lower energy.

Dependent claims: 12, 13, 18.
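Claim 11's sound feature rule — frames with higher energy receive a higher priority value for alignment — could be sketched as below. The two-level priorities and the median threshold are illustrative assumptions; the claim only requires that higher-energy frames outrank lower-energy ones:

```python
import numpy as np

def energy_priority(frames, low=1.0, high=2.0):
    """Assign a higher priority value to frames whose energy exceeds
    the median frame energy (the threshold and the two priority
    levels are assumptions, not from the patent)."""
    energy = np.sum(frames ** 2, axis=1)  # per-frame energy
    return np.where(energy > np.median(energy), high, low)
```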
14. A computing device comprising:

    one or more processors; and

    one or more computer-readable storage media having instructions stored thereon that, responsive to execution by the one or more processors of the computing device, cause the computing device to perform operations comprising:

        aligning sound data from different recordings of a plurality of recordings to generate aligned sound data, the aligning performed based at least in part using sound feature rules to assign a priority value to individual frames in the sound data based on features that are identified in the individual frames of the sound data and scaling a similarity value by the priority value, the similarity value corresponding to a similarity determined using a similarity matrix built by comparing a first set of individual frames of a first of the plurality of recordings to a second set of individual frames of a second of the plurality of recordings, wherein the similarity matrix represents a comparison between the first set of individual frames and the second set of individual frames computed using a normalized inner product of compared frames, the sound feature rules configured such that priority values are increased to place higher importance on frames of the sound data with phrase onsets; and

        outputting the generated aligned sound data.

Dependent claims: 15, 16, 17, 20.
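Claim 14 increases priority values to place higher importance on frames with phrase onsets. One illustrative way to apply such a rule — the energy-jump onset heuristic and the constants here are assumptions, not the patent's detection method:

```python
import numpy as np

def boost_onsets(priorities, frame_energy, boost=1.5, ratio=4.0):
    """Multiply the priority of frames whose energy jumps sharply
    relative to the preceding frame, treating the jump as a phrase
    onset following near-silence (heuristic and constants assumed)."""
    p = np.asarray(priorities, dtype=float).copy()
    e = np.asarray(frame_energy, dtype=float)
    for t in range(1, len(e)):
        if e[t] > ratio * e[t - 1]:
            p[t] *= boost
    return p
```

The boosted priorities would then feed the similarity scaling of claim 1, steering the alignment toward matching phrase onsets across recordings.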