Speaker dependent voiced sound pattern detection thresholds
Abstract
Various implementations disclosed herein include a training module configured to determine a set of detection normalization threshold values associated with speaker dependent voiced sound pattern (VSP) detection. In some implementations, a method includes obtaining segment templates characterizing a concurrent segmentation of a first subset of a plurality of vocalization instances of a VSP, wherein each segment template provides a stochastic characterization of how a particular portion of the VSP is vocalized by a particular speaker; generating a noisy segment matrix using a second subset of the plurality of vocalization instances of the VSP, wherein the noisy segment matrix includes one or more noisy copies of segment representations of the second subset; scoring segments from the noisy segment matrix against the segment templates; and determining detection normalization threshold values at two or more known SNR levels for at least one particular noise type based on a function of the scoring.
16 Claims
1. A method of determining a set of detection normalization threshold values associated with speaker dependent voiced sound pattern (VSP) detection, the method comprising:
converting, at one or more audio sensors, an audible signal into electronic audible signal data;
obtaining, from the electronic audible signal data, a common set of segment templates characterizing a concurrent segmentation of a first subset of a plurality of vocalization instances of a VSP, wherein each segment template provides a stochastic characterization of how a particular portion of the VSP is vocalized by a particular speaker, wherein at least a subset of the first subset of the plurality of vocalization instances are divided into the same number of segments as one another;
synthesizing a noisy segment matrix using a second subset of the plurality of vocalization instances of the VSP, wherein the noisy segment matrix includes one or more noisy copies of segment representations of the second subset of the plurality of vocalization instances of the VSP;
scoring segments from the noisy segment matrix against the common set of segment templates, wherein utilizing the common set of segment templates for scoring the segments reduces resource utilization associated with scoring the segments;
synthesizing detection normalization threshold values at two or more known SNR levels for at least one particular noise type based on a function of the scoring; and
outputting the detection normalization threshold values to a non-transitory memory through an output device.
Dependent claims: 2-15.
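The pipeline recited in claim 1 can be illustrated with a toy sketch. All function names, the per-dimension Gaussian template model, the white-noise model, and the 5th-percentile threshold rule are assumptions for illustration; the claim does not specify any of them.

```python
import math
import random

def score_segment(segment, template):
    # Log-likelihood of a feature vector under a per-dimension Gaussian
    # template: a hypothetical stand-in for the claimed stochastic
    # characterization of how one portion of the VSP is vocalized.
    return sum(
        -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))
        for x, (mu, sd) in zip(segment, template)
    )

def noisy_copy(segment, snr_db, rng):
    # Additive white noise scaled to a target SNR (one possible noise type).
    power = sum(x * x for x in segment) / len(segment)
    sd = math.sqrt(power / 10 ** (snr_db / 10))
    return [x + rng.gauss(0.0, sd) for x in segment]

def detection_thresholds(templates, segments, snr_levels, n_copies=20, seed=0):
    # Score noisy copies of held-out segment representations against the
    # common templates and take, per SNR level, the 5th-percentile score
    # as the detection normalization threshold (an assumed rule).
    rng = random.Random(seed)
    thresholds = {}
    for snr in snr_levels:
        scores = sorted(
            score_segment(noisy_copy(seg, snr, rng), tpl)
            for seg, tpl in zip(segments, templates)
            for _ in range(n_copies)
        )
        thresholds[snr] = scores[int(0.05 * (len(scores) - 1))]
    return thresholds
```

Here the "noisy segment matrix" is simply the set of noisy copies, and the threshold at each SNR is the score below which only 5% of true-speaker matches fall; a real implementation would use a richer acoustic model and multiple noise types.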
16. A system provided to determine a set of detection normalization threshold values associated with speaker dependent voiced sound pattern (VSP) detection, the system comprising:
one or more audio sensors configured to convert an audible signal into electronic audible signal data;
a processor; and
a non-transitory memory including instructions which, when executed by the processor, cause the system to:
synthesize, based on the electronic audible signal data, match probabilities as a function of one or more statistical similarity characterizations between noisy copies of segment representations and common segment templates, wherein the common segment templates characterize a concurrent segmentation of a first subset of a plurality of vocalization instances of a VSP, wherein at least a subset of the first subset of the plurality of vocalization instances are divided into the same number of segments as one another, and each of the segment representations is associated with a second subset of the plurality of vocalization instances of the VSP, wherein utilizing the common segment templates for synthesizing the match probabilities reduces resource utilization associated with synthesizing the match probabilities;
synthesize unbiased scores from raw score match probabilities at a number of signal-to-noise (SNR) levels of at least one particular noise type;
synthesize detection normalization threshold values at two or more known SNR levels for the at least one particular noise type based on the unbiased scores; and
output the detection normalization threshold values to the non-transitory memory through an output device.
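The "unbiased scores" limitation of claim 16 can be read as per-SNR normalization of raw match scores, so that scores gathered at different SNR levels become directly comparable before thresholds are synthesized. A minimal sketch, assuming z-normalization (the claim does not pin down the exact normalization):

```python
import statistics

def unbias_scores(raw_scores_by_snr):
    # Remove the per-SNR mean and scale by the per-SNR spread so that
    # raw match scores from different SNR levels share a common scale.
    # (One plausible reading of "unbiased scores"; assumed, not specified.)
    unbiased = {}
    for snr, scores in raw_scores_by_snr.items():
        mu = statistics.fmean(scores)
        sd = statistics.pstdev(scores) or 1.0  # guard against zero spread
        unbiased[snr] = [(s - mu) / sd for s in scores]
    return unbiased
```

With this normalization, a single threshold rule can be applied across SNR levels, and the per-SNR means and spreads themselves capture the SNR-dependent bias that is being removed.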
Specification