Method and device for detecting speech patterns and errors when practicing fluency shaping techniques

US 10,188,341 B2
Filed: 12/29/2015
Issued: 01/29/2019
Est. Priority Date: 12/31/2014
Status: Active Grant

1 Assignment

0 Petitions

Abstract

A method and device for detecting errors when practicing fluency shaping exercises are presented. The method includes receiving a set of initial energy levels; setting a set of thresholds to their respective initial values; receiving a voice production of a user practicing a fluency shaping exercise; analyzing the received voice production to compute a set of energy levels composing the voice production; detecting at least one speech-related error based on the computed set of energy levels, the set of initial energy levels, and the set of thresholds, wherein the detection of the at least one speech-related error is with respect to the fluency shaping exercise being practiced by the user; and upon detection of the at least one speech-related error, generating feedback indicating the at least one detected speech-related error.

31 Citations

37 Claims

1. A method for detecting errors when practicing fluency shaping exercises, comprising:
- receiving a set of initial energy levels;
  
  setting each threshold of a set of thresholds to a respective predetermined initial value;
  
  receiving a voice production of a user practicing a fluency shaping exercise;
  
  analyzing the received voice production to compute a set of energy levels composing the voice production;
  
  detecting at least one speech-related error based on the computed set of energy levels, the set of initial energy levels, and the set of thresholds, wherein the detection of the at least one speech-related error is with respect to the fluency shaping exercise being practiced by the user;
  
  wherein the set of initial energy levels includes at least one of:
  
  a normal speech energy level, a silence energy level, and a calibration energy level,
  
  upon detection of the at least one speech-related error, generating visual feedback indicating the at least one detected speech-related error with respect to the received voice production, and
  
  performing an audio calibration process for a computing device of the user to set the normal speech energy level, the silence energy level, and the calibration energy level, wherein the voice production is captured on the computing device of the user,
  
  wherein processing the received voice production further comprises:
  
  sampling the received voice production to create voice samples;
  
  buffering the voice samples to create voice chunks;
  
  converting the voice chunks from a time domain to a frequency domain;
  
  extracting spectrum features from each of the frequency domain voice chunks, wherein the spectrum features include at least dominant frequencies, wherein each dominant frequency corresponds to a voice chunk;
  
  computing, for each voice chunk, the energy level of the corresponding dominant frequency; and
  
  determining, for each voice chunk, an energy level of the voice chunk based on the energy level of the corresponding dominant frequency.
- - 2. The method of claim 1, wherein the set of initial energy levels includes a calibration energy level, wherein detecting the at least one speech-related error further comprises:
    - checking if a percentage of a total of computed energy levels above the calibration energy level is below an initial value of a too soft threshold, wherein the too soft threshold is one of the set of the thresholds; and
      
      detecting a too soft error voice production when the percentage is below the initial value of the too soft threshold.
  - 3. The method of claim 2, wherein detecting the at least one speech-related error further comprises:
    - checking if the percentage of the total of computed energy levels above the calibration energy level is above an initial value of a too loud threshold, wherein the too loud threshold is one of the set of thresholds; and
      
      detecting a too loud error voice production when the percentage is above the initial value of the too loud threshold.
  - 4. The method of claim 3, wherein the set of initial energy levels includes a normal speech energy level and a silence energy level, wherein detecting the at least one speech-related error further comprises:
    - computing an energy difference between the normal speech energy level and the silence energy level;
      
      comparing the energy difference to the too loud threshold and the too soft threshold; and
      
      detecting a syllable transition error when the computed difference is below the too loud threshold and the too soft threshold.
  - 5. The method of claim 1, wherein the set of initial energy levels includes a normal speech energy level, wherein detecting the at least one speech-related error further comprises:
    - determining a maximum energy level out of the measured energy levels of the voice production;
      
      checking if the maximum energy level is above an initial value of an intense peak threshold, wherein an initial value of the intense peak threshold is set respective of the normal speech energy level, wherein the intense peak threshold is one of the set of the thresholds; and
      
      detecting an intense peak threshold error voice production when the maximum energy level is above the initial value of an intense peak threshold.
  - 6. The method of claim 1, wherein the set of initial energy levels includes a normal speech energy level and a silence energy level, further comprising:
    - computing an energy difference between each two consecutive energy levels;
      
      comparing the energy difference to an initial value of an onset slope threshold, wherein an initial value of the onset slope threshold is set respective of the silence energy level and a maximum energy level, wherein the onset slope threshold is one of the set of the thresholds; and
      
      detecting an un-gradual slope gentle onset speech error, when the computed energy difference is above the initial value of the onset slope threshold.
  - 7. The method of claim 1, further comprising:
    - determining a maximum energy level out of the measured energy levels of the voice production;
      
      comparing the energy difference to an initial value of an onset amplitude threshold, wherein the onset amplitude threshold is one of the set of thresholds;
      
      detecting a high amplitude gentle onset speech error, when the maximum energy level is above the initial value of the onset amplitude threshold.
  - 8. The method of claim 7, wherein the detected speech-related error further includes any one of:
    - a too-long gentle onset and a concave gentle onset.
  - 9. The method of claim 1, wherein the detected speech-related error further includes any one of:
    - a too-long gentle offset, an un-gradual slope gentle offset, a high amplitude gentle offset, and a concave gentle offset.
  - 10. The method of claim 1, wherein the set of initial energy levels includes a silence energy level, further comprising:
    - determining a maximum energy level out of the measured energy levels of the voice production;
      
      computing an energy difference between each two consecutive energy levels from the silence energy level to the maximum energy level; and
      
      detecting a volume control speech error, when the energy difference is negative.
  - 11. The method of claim 1, wherein the set of initial energy levels includes a silence energy level, further comprising:
    - determining a maximum energy level out of the measured energy levels of the voice production;
      
      computing an energy difference between each two consecutive energy levels from the maximum energy level to the silence energy level; and
      
      detecting a volume control speech error, when the energy difference is positive.
  - 12. The method of claim 1, further comprising:
    - checking if a first number of the computed energy levels out of a total of computed energy levels is above an initial value of a soft peak threshold, wherein the soft peak threshold is one of the set of the thresholds; and
      
      detecting a soft peak error voice production when the first number of the computed energy levels is above the soft peak threshold.
  - 13. The method of claim 1, further comprising:
    - measuring a speech rate respective of the analysis; and
      
      detecting a speech rate error when the measured speech rate is below an initial value of a rate threshold, wherein the rate threshold is one of the set of thresholds and set to indicate a normal speech rate.
  - 14. The method of claim 1, further comprising:
    - sending the generated feedback to a computing device of the user for display.
  - 15. The method of claim 14, wherein generating the feedback further comprises:
    - coloring the voice production using at least a first color and a second color, wherein the first color represents a loud sound produced by the user and the second color represents a soft sound produced by the user.
  - 16. The method of claim 1, wherein the at least one exercise includes a sequence of voice productions.
  - 17. The method of claim 1, further comprising:
    - generating a report summarizing the execution of the voice production throughout a current therapy session; and
      
      saving the report.
  - 18. The method of claim 1, wherein the fluency shaping exercise is being practiced during a speech disorder therapy, the speech disorder therapy is used for at least one of:
    - stuttering, cluttering, and diction.
  - 19. A non-transitory computer readable medium having stored thereon instructions for causing one or more processing units to execute the method according to claim 1.
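
The signal-processing steps recited in claim 1 (sampling, buffering into chunks, converting each chunk to the frequency domain, finding the dominant frequency, and taking its energy as the chunk's energy level) can be illustrated with a minimal Python sketch. This is only one possible reading of the claim language, not the patentee's implementation. The function names, the naive DFT, the decision to skip the DC bin, and the energy measure (squared magnitude divided by chunk length) are all assumptions introduced for illustration.

```python
import cmath
import math


def chunk_signal(samples, chunk_size):
    """Buffer samples into consecutive fixed-size chunks (a trailing partial chunk is dropped)."""
    return [samples[i:i + chunk_size]
            for i in range(0, len(samples) - chunk_size + 1, chunk_size)]


def dft_magnitudes(chunk):
    """Naive DFT: magnitude of each non-negative frequency bin (0 .. N/2)."""
    n = len(chunk)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(chunk))
        mags.append(abs(acc))
    return mags


def chunk_energy(chunk, sample_rate):
    """Return (dominant frequency in Hz, energy at that frequency) for one chunk."""
    mags = dft_magnitudes(chunk)
    # Assumption: ignore the DC bin (k = 0) so a constant offset is not picked as dominant.
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    freq = k * sample_rate / len(chunk)
    # Assumption: energy measure is squared magnitude normalised by chunk length.
    energy = mags[k] ** 2 / len(chunk)
    return freq, energy


def energy_levels(samples, sample_rate, chunk_size):
    """Energy level of every chunk, taken from the chunk's dominant frequency."""
    return [chunk_energy(c, sample_rate)[1]
            for c in chunk_signal(samples, chunk_size)]


# Hypothetical input: a 200 Hz tone sampled at 1600 Hz, loud for one chunk, then soft.
sample_rate = 1600
chunk_size = 64
tone = [(1.0 if t < chunk_size else 0.25) * math.sin(2 * math.pi * 200 * t / sample_rate)
        for t in range(2 * chunk_size)]
levels = energy_levels(tone, sample_rate, chunk_size)
print(levels)
```

In this toy input the first chunk yields a higher energy level than the second, which is the kind of per-chunk series the dependent claims then compare against thresholds. A production system would more likely use an FFT library and windowing; the naive DFT is used here only to keep the sketch dependency-free.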

20. A device for detecting errors when practicing fluency shaping exercises, comprising:
- a processing unit; and
  
  a memory, the memory containing instructions that, when executed by the processing unit, configure the device to:
  
  receive a set of initial energy levels, wherein the set of initial energy levels includes at least one of:
  
  a normal speech energy level, a silence energy level, and a calibration energy level;
  
  set each threshold of a set of thresholds to a respective predetermined initial value;
  
  receive a voice production of a user practicing a fluency shaping exercise;
  
  analyze the received voice production to compute a set of energy levels composing the voice production;
  
  detect at least one speech-related error based on the computed set of energy levels, the set of initial energy levels, and the set of thresholds, wherein the detection of the at least one speech-related error is with respect to the fluency shaping exercise being practiced by the user; and
  
  upon detection of at least one speech-related error, generate visual feedback indicating the at least one detected speech-related error with respect to the received voice production,
  
  wherein the device is further configured to:
  
  perform an audio calibration process for a computing device of the user to set the normal speech energy level, the silence energy level, and the calibration energy level, wherein the voice production is captured on a computing device of the user, and
  
  wherein the device is further configured to:
  
  sample the received voice production to create voice samples;
  
  buffer the voice samples to create voice chunks;
  
  convert the voice chunks from a time domain to a frequency domain;
  
  extract spectrum features from each of the frequency domain voice chunks, wherein the spectrum features include at least dominant frequencies, wherein each dominant frequency corresponds to a voice chunk;
  
  compute, for each voice chunk, the energy level of the corresponding dominant frequency; and
  
  determine, for each voice chunk, an energy level of the voice chunk based on the energy level of the corresponding dominant frequency.
- - 21. The device of claim 20, wherein the set of initial energy levels includes a calibration energy level, wherein the device is further configured to:
    - check if a percentage of a total of computed energy levels above the calibration energy level is below an initial value of a too-soft threshold, wherein the too-soft threshold is one of the set of thresholds; and
      
      detect a too-soft error voice production when the percentage is below the initial value of the too-soft threshold.
  - 22. The device of claim 20, wherein the device is further configured to:
    - check if a percentage of a total of computed energy levels above the calibration energy level is above an initial value of a too-loud threshold, wherein the too-loud threshold is one of the set of the thresholds; and
      
      detect a too-loud error voice production when the percentage is above the initial value of the too-loud threshold.
  - 23. The device of claim 22, wherein the set of initial energy levels includes a normal speech energy level and a silence energy level, wherein the device is further configured to:
    - compute an energy difference between the normal energy level and the silence energy level;
      
      compare the energy difference to the too-loud threshold and the too soft threshold; and
      
      detect a syllable transition error when the computed difference is below the too-loud threshold and the too-soft threshold.
  - 24. The device of claim 20, wherein the set of initial energy levels includes a normal speech energy level, wherein the device is further configured to:
    - determine a maximum energy level out of the measured energy levels of the voice production;
      
      check if the maximum energy level is above an initial value of an intense peak threshold, wherein an initial value of the intense peak threshold is set respective of the normal energy level, wherein the intense peak threshold is one of the set of the thresholds; and
      
      detect an intense peak threshold error voice production when the maximum energy level is above the initial value of an intense peak threshold.
  - 25. The device of claim 20, wherein the set of initial energy levels includes a normal speech energy level and a silence energy level, wherein the device is further configured to:
    - compute an energy difference between each two consecutive energy levels;
      
      compare the energy difference to an initial value of an onset slope threshold, wherein an initial value of the onset slope threshold is set respective of the silence energy level and a maximum energy level, wherein the onset slope threshold is one of the set of the thresholds; and
      
      detect an un-gradual slope gentle onset speech error, when the computed energy difference is above the initial value of the onset slope threshold.
  - 26. The device of claim 20, wherein the device is further configured to:
    - determine a maximum energy level out of the measured energy levels of the voice production;
      
      compare the energy difference to an initial value of an onset amplitude threshold, wherein the onset amplitude threshold is one of the set of the thresholds;
      
      detect a high amplitude gentle onset speech error, when the maximum energy level is above the initial value of the onset amplitude threshold.
  - 27. The device of claim 26, wherein the detected speech-related error further includes any one of:
    - a too long gentle onset and a concave gentle onset.
  - 28. The device of claim 20, wherein the detected speech-related error further includes any one of:
    - a too long gentle offset, an un-gradual slope gentle offset, a high amplitude gentle offset, and a concave gentle offset.
  - 29. The device of claim 20, wherein the set of initial energy levels includes a silence energy level, wherein the device is further configured to:
    - determine a maximum energy level out of the measured energy levels of the voice production;
      
      compute an energy difference between each two consecutive energy levels from the silence energy level to the maximum energy level; and
      
      detect a volume control speech error when the energy difference is negative.
  - 30. The device of claim 20, wherein the set of initial energy levels includes a silence energy level, wherein the device is further configured to:
    - determine a maximum energy level out of the measured energy levels of the voice production;
      
      compute an energy difference between each two consecutive energy levels from the maximum energy level to the silence energy level; and
      
      detect a volume control speech error, when the energy difference is positive.
  - 31. The device of claim 20, wherein the device is further configured to:
    - check if a first number of the computed energy levels out of a total of computed energy levels is above an initial value of a soft peak threshold, wherein the soft peak threshold is one of the set of the thresholds; and
      
      detect a soft peak error voice production when the first number of the computed energy levels is above the soft peak threshold.
  - 32. The device of claim 20, wherein the device is further configured to:
    - measure a speech rate respective of the analysis; and
      
      detect a speech rate error when the measured speech rate is below an initial value of a rate threshold, wherein the rate threshold is one of the set of thresholds and set to indicate a normal speech rate.
  - 33. The device of claim 20, wherein the device is further configured to:
    - send the generated feedback to a computing device of the user for display.
  - 34. The device of claim 33, wherein generating the feedback further comprises:
    - coloring the voice production using at least a first color and a second color, wherein the first color represents a loud sound produced by the user and the second color represents a soft sound produced by the user.
  - 35. The device of claim 20, wherein the at least one exercise includes a sequence of voice productions.
  - 36. The device of claim 20, wherein the device is further configured to:
    - generate a report summarizing the execution of the voice production throughout the current therapy session; and
      
      save the report.
  - 37. The device of claim 20, wherein the fluency shaping exercise is being practiced during a speech disorder therapy, the speech disorder therapy is used for at least one of:
    - stuttering, cluttering, and diction.
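
Several of the threshold checks in the dependent claims (too soft and too loud in claims 21 and 22, intense peak in claim 24, un-gradual onset slope in claim 25, and volume control in claims 29 and 30, mirroring method claims 2 to 11) can be sketched as simple comparisons over the per-chunk energy levels. The sketch below is a hedged illustration under stated assumptions, not the claimed implementation. The dictionary keys, the numeric threshold values, the choice to evaluate the onset slope only on the rise up to the peak, and the error labels are all hypothetical.

```python
def fraction_above(levels, reference):
    """Share of computed energy levels that lie above a reference level."""
    return sum(1 for e in levels if e > reference) / len(levels)


def detect_errors(levels, initial, thresholds):
    """Return labels for the errors found in one voice production.

    levels     -- per-chunk energy levels of the voice production
    initial    -- hypothetical dict with a 'calibration' energy level
    thresholds -- hypothetical dict of initial threshold values
    """
    errors = []

    # Too soft / too loud: percentage of levels above the calibration level.
    share = fraction_above(levels, initial["calibration"])
    if share < thresholds["too_soft"]:
        errors.append("too soft")
    if share > thresholds["too_loud"]:
        errors.append("too loud")

    # Intense peak: maximum energy level above the intense peak threshold.
    peak_index = max(range(len(levels)), key=levels.__getitem__)
    if levels[peak_index] > thresholds["intense_peak"]:
        errors.append("intense peak")

    # Differences between each two consecutive levels, split at the peak.
    rise = levels[:peak_index + 1]
    fall = levels[peak_index:]
    rise_steps = [b - a for a, b in zip(rise, rise[1:])]
    fall_steps = [b - a for a, b in zip(fall, fall[1:])]

    # Assumption: the onset slope check is applied to the rising part only.
    if any(d > thresholds["onset_slope"] for d in rise_steps):
        errors.append("un-gradual gentle onset")

    # Volume control: energy drops while rising, or rises while falling.
    if any(d < 0 for d in rise_steps) or any(d > 0 for d in fall_steps):
        errors.append("volume control")

    return errors


# Hypothetical calibration result and initial threshold values.
initial = {"calibration": 2.0, "normal": 4.0}
thresholds = {
    "too_soft": 0.3,
    "too_loud": 0.8,
    "intense_peak": 2 * initial["normal"],  # assumed relation to the normal level
    "onset_slope": 3.5,
}

smooth = detect_errors([1, 2, 4, 7, 5, 3, 1], initial, thresholds)
abrupt = detect_errors([1, 6, 5, 9, 4, 1], initial, thresholds)
quiet = detect_errors([1, 1, 1, 3, 1, 1, 1, 1, 1, 1], initial, thresholds)
print(smooth, abrupt, quiet)
```

The claims tie the intense peak threshold to the normal speech energy level and the onset slope threshold to the silence and maximum energy levels without giving formulas, so the concrete values above are placeholders that a calibration step would have to supply.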

Current Assignee: Novotalk, Ltd.
Original Assignee: Novotalk, Ltd.
Inventors: Rot, Moshe; Rothschild, Lilach; Lerner, Smadar
Primary Examiner(s): HONG, THOMAS J
Application Number: US14/982,230
Publication Number: US 20160183868A1
Time in Patent Office: 1,127 Days
Field of Search: 434185
US Class Current
CPC Class Codes

A61B 5/0022   Monitoring a patient using ...

A61B 5/4803   Speech analysis specially a...

A61B 5/486   Bio-feedback A61B5/375 take...

A61B 5/7282   Event detection, e.g. detec...

A61B 5/742   using visual displays A61B5...

A61B 5/7465   Arrangements for interactiv...

G09B 19/04   Speaking with audible prese...

G09B 5/02   with visual presentation of...

G09B 7/00   Electrically-operated teach...

G10L 25/66   for extracting parameters r...

G16H 20/30   relating to physical therap...

G16H 20/40   relating to mechanical, rad...

G16H 40/63   for local operation

G16H 40/67   for remote operation

G16Z 99/00   Subject matter not provided...

H04L 65/1069   Session establishment or de...

H04L 67/10   in which an application is ...
