Systems and methods of interpreting speech data

US 10,510,344 B2
Filed: 12/07/2018
Issued: 12/17/2019
Est. Priority Date: 06/05/2014
Status: Active Grant

Abstract

Methods and systems are provided for interpreting speech data. A method and system for recognizing speech involves a filter module to generate a set of processed audio data based on raw audio data; a translation module to provide a set of translation results for the raw audio data; and a decision module to select the text data that represents the raw audio data. A method for minimizing noise in audio signals received by a microphone array is also described. A method and system for automatic entry of data into one or more data fields involves receiving processed audio data and operating a processing module to: search in a trigger dictionary for a field identifier that corresponds to the trigger identifier; identify a data field associated with a data field identifier corresponding to the field identifier; and provide content data associated with the trigger identifier to the identified data field.
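
The filter, translation, and decision stages named in the abstract can be pictured as a fan-out pipeline in which the same raw audio passes through several different filters and each filtered copy is translated separately. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the two filters (a moving-average low-pass and a first-difference high-pass), the `fake_translator` stub, and every function name are hypothetical and introduced only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Samples = List[float]
Result = Tuple[str, float]  # (text data, confidence level in [0, 1])


def low_pass(samples: Sequence[float], window: int = 3) -> Samples:
    """Moving-average low-pass filter (an illustrative choice of filter)."""
    out: Samples = []
    for i in range(len(samples)):
        start = max(0, i - window + 1)
        chunk = samples[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def high_pass(samples: Sequence[float]) -> Samples:
    """First-difference high-pass filter (an illustrative choice of filter)."""
    out: Samples = []
    previous = 0.0
    for s in samples:
        out.append(s - previous)
        previous = s
    return out


def fake_translator(processed: Sequence[float]) -> List[Result]:
    """Hypothetical stand-in for a speech recognition engine.

    It always returns the text "hello" and derives a confidence level from
    the mean absolute amplitude, purely so the pipeline has something to pass on.
    """
    energy = sum(abs(s) for s in processed) / max(len(processed), 1)
    return [("hello", round(min(energy, 1.0), 2))]


def run_pipeline(
    raw: Sequence[float],
    filters: Sequence[Callable[[Sequence[float]], Samples]],
    translate: Callable[[Sequence[float]], List[Result]],
) -> List[List[Result]]:
    """Apply every filter to the same raw audio and translate each output.

    Returns one set of translation results per filter, ready to be handed
    to a decision stage.
    """
    return [translate(f(raw)) for f in filters]


if __name__ == "__main__":
    raw_audio = [0.0, 1.0, 0.0, 1.0]
    for result_set in run_pipeline(raw_audio, [low_pass, high_pass], fake_translator):
        print(result_set)
```

Keeping the filters and the translator as plain callables means additional filters can be added to the list without touching the pipeline function.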

18 Claims

1. An uncontrolled environment-based speech recognition system comprising:
- one or more audio data filters to each generate a set of processed audio data based on raw audio data received from one or more computing devices, the one or more audio data filters comprises:
  
  a first audio data filter to apply a first filter process to the raw audio data to generate a first processed audio data, and
  
  a second audio data filter to apply a second filter process to the raw audio data to generate a second processed audio data,
  
  the first audio data filter being different from the second audio data filter, the one or more audio data filters comprising at least one audio data filter appropriate for the uncontrolled environment;
  
  a translator, operable by a processor, to provide:
  
  a first set of translation results based on the first processed audio data for the raw audio data, each translation result of the first set of translation results comprising a first text data and a first confidence level associated with that first text data; and
  
  a second set of translation results based on the second processed audio data for the raw audio data, each translation result of the second set of translation results comprising a second text data and a second confidence level associated with that second text data; and
  
  in response to receiving the first and second sets of translation results, a decision controller is operable by the processor to identify at least one translation result to represent the raw audio data, the decision controller is operable to:
  
  identify at least one translation result that includes the text data associated with the confidence level that exceeds a confidence threshold;
  
  determine whether the identified at least one translation result comprises more than one translation result;
  
  in response to determining the identified at least one translation result comprises more than one translation result, determine an occurrence frequency for each text data of the identified at least one translation result and select the text data based on the occurrence frequency, the occurrence frequency representing a number of times that the text data appears in the set of translation results; and
  
  generate an output signal associated with the identification of the at least one translation result.
- Dependent claims (2, 3, 4, 5, 6, 7, 15, 16):
  - 2. The speech recognition system of claim 1, wherein:
    - the confidence threshold comprises a set of confidence thresholds, the set of confidence thresholds including a first confidence threshold and at least one subsequent confidence threshold that is lower than the first confidence threshold; and
      
      the decision controller is operable to:
      
      determine that none of the text data is associated with the respective confidence level that exceeds the first confidence threshold; and
      
      determine whether any text data is associated with the respective confidence level that exceeds the at least one subsequent confidence threshold.
  - 3. The speech recognition system of claim 2, wherein the decision controller is operable to:
    - determine that none of the text data is associated with the respective confidence level that exceeds the at least one subsequent confidence threshold; and
      
      indicate additional processing is required to translate the raw audio data.
  - 4. The speech recognition system of claim 3, wherein:
    - the at least one subsequent confidence threshold comprises a first subsequent confidence threshold and a second subsequent confidence threshold that is lower than the first subsequent confidence threshold; and
      
      the decision controller is operable to:
      
      determine that none of the text data is associated with a confidence level that exceeds the first subsequent confidence threshold;
      
      determine that at least one text data is associated with a confidence level that exceeds the second subsequent confidence threshold; and
      
      indicate additional processing on the at least one text data is required to translate the raw audio data.
  - 5. The speech recognition system of claim 1, wherein the decision controller is further operable to:
    - select the text data based on the confidence level and the text data associated with a highest occurrence frequency.
  - 6. The speech recognition system of claim 1, wherein each set of translation results comprises two or more text data and each text data is associated with a respective confidence level.
  - 7. The speech recognition system of claim 1, wherein the decision controller is further operable to:
    - determine a trigger identifier associated with the identified at least one translation result;
      
      search in the trigger dictionary for a field identifier that corresponds to the trigger identifier;
      
      identify, from one or more data fields of an electronic form, a data field associated with a data field identifier corresponding to the field identifier; and
      
      provide the text data of the identified at least one translation result to the identified data field.
  - 15. The speech recognition system of claim 1, wherein each of the first audio data filter and the second audio data filter comprises at least one of a blind source filter, a phase shift filter, a subtract spectrum filter, a comb filter, a low pass filter, a high pass filter, and a band pass filter.
  - 16. The speech recognition system of claim 2, wherein the at least one subsequent confidence threshold is within a range of 40% to 75%.
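
The decision logic recited in claims 1 to 5 can be sketched in Python as follows. This is a hedged illustration, not the patented implementation. The names `Decision`, `pick_by_frequency`, and `decide`, the threshold values (0.9 and 0.75 as accepting thresholds, 0.4 as a lowest threshold that only flags text for further processing, the latter two taken from the 40% to 75% range of claim 16), and the tie-breaking rule are all assumptions. In particular, the claims do not say what happens when a text exceeds an intermediate threshold; this sketch simply accepts it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Result = Tuple[str, float]  # (text data, confidence level in [0, 1])


@dataclass
class Decision:
    text: Optional[str]      # selected text data, or None if nothing qualified
    needs_processing: bool   # True when additional processing is indicated


def pick_by_frequency(candidates: Sequence[Result], pooled: Sequence[Result]) -> str:
    """Select the candidate text that appears most often across all results.

    Ties on occurrence frequency are broken by the highest confidence level
    (an assumed rule, loosely following claim 5).
    """
    counts = Counter(text for text, _ in pooled)
    best_conf: Dict[str, float] = {}
    for text, conf in candidates:
        best_conf[text] = max(conf, best_conf.get(text, 0.0))
    return max(best_conf, key=lambda t: (counts[t], best_conf[t]))


def decide(
    result_sets: Sequence[Sequence[Result]],
    accept_thresholds: Sequence[float] = (0.9, 0.75),  # assumed values
    review_threshold: float = 0.4,                     # assumed value
) -> Decision:
    """Walk the thresholds from highest to lowest and pick a text."""
    pooled: List[Result] = [r for rs in result_sets for r in rs]

    for threshold in accept_thresholds:
        candidates = [r for r in pooled if r[1] > threshold]
        if len(candidates) == 1:
            return Decision(candidates[0][0], needs_processing=False)
        if candidates:
            # More than one result qualifies: fall back on occurrence frequency.
            return Decision(pick_by_frequency(candidates, pooled), needs_processing=False)

    # Nothing exceeded an accepting threshold; try the lowest threshold.
    candidates = [r for r in pooled if r[1] > review_threshold]
    if candidates:
        return Decision(pick_by_frequency(candidates, pooled), needs_processing=True)
    return Decision(None, needs_processing=True)


if __name__ == "__main__":
    first_set = [("turn left", 0.95), ("turn lift", 0.60)]
    second_set = [("turn left", 0.92), ("burn left", 0.93)]
    print(decide([first_set, second_set]))
```

Pooling both result sets before counting reflects the idea that a text produced by several differently filtered copies of the same audio is more likely to be correct than one produced only once.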

8. A computer-implemented method of operating an uncontrolled environment-based recognition system for recognizing speech, the method comprising:
- operating one or more audio data filters to each generate a set of processed audio data based on raw audio data received from one or more computing devices, the one or more audio data filters comprises:
  
  a first audio data filter to apply a first filter process to the raw audio data to generate a first processed audio data, and
  
  a second audio data filter to apply a second filter process to the raw audio data to generate a second processed audio data,
  
  the first audio data filter being different from the second audio data filter, the one or more audio data filters comprising at least one audio data filter appropriate for the uncontrolled environment;
  
  operating a translator to provide:
  
  a first set of translation results based on the first processed audio data for the raw audio data, each translation result of the first set of translation results comprising a first text data and a first confidence level associated with that first text data; and
  
  a second set of translation results based on the second processed audio data for the raw audio data, each translation result of the second set of translation results comprising a second text data and a second confidence level associated with that second text data; and
  
  in response to receiving the first and second sets of translation results, operating a decision controller to identify at least one translation result to represent the raw audio data, wherein the decision controller is operable to:
  
  identify at least one translation result that includes the text data associated with the confidence level that exceeds a confidence threshold;
  
  determine whether the identified at least one translation result comprises more than one translation result;
  
  in response to determining the identified at least one translation result comprises more than one translation result, determine an occurrence frequency for each text data of the identified at least one translation result and select the text data based on the occurrence frequency, the occurrence frequency representing a number of times that the text data appears in the set of translation results; and
  
  generate an output signal associated with the identification of the at least one translation result.
- Dependent claims (9, 10, 11, 12, 13, 14, 17, 18):
  - 9. The computer-implemented method of claim 8, wherein:
    - the confidence threshold comprises a set of confidence thresholds, the set of confidence thresholds including a first confidence threshold and at least one subsequent confidence threshold that is lower than the first confidence threshold; and
      
      the method further comprises operating the decision controller to:
      
      determine that none of the text data is associated with the respective confidence level that exceeds the first confidence threshold; and
      
      determine whether any text data is associated with the respective confidence level that exceeds the at least one subsequent confidence threshold.
  - 10. The computer-implemented method of claim 9 comprises operating the decision controller to:
    - determine that none of the text data is associated with the respective confidence level that exceeds the at least one subsequent confidence threshold; and
      
      indicate additional processing is required to translate the raw audio data.
  - 11. The computer-implemented method of claim 9, wherein:
    - the at least one subsequent confidence threshold comprises a first subsequent confidence threshold and a second subsequent confidence threshold that is lower than the first subsequent confidence threshold; and
      
      the method further comprises operating the decision controller to:
      
      determine that none of the text data is associated with a confidence level that exceeds the first subsequent confidence threshold;
      
      determine that at least one text data is associated with a confidence level that exceeds the second subsequent confidence threshold; and
      
      indicate additional processing on the at least one text data is required to translate the raw audio data.
  - 12. The computer-implemented method of claim 8 comprises operating the decision controller to:
    - select the text data based on the confidence level and the text data associated with a highest occurrence frequency.
  - 13. The computer-implemented method of claim 8, wherein each set of translation results comprises two or more text data and each text data is associated with a respective confidence level.
  - 14. The computer-implemented method of claim 8 comprises operating the decision controller to:
    - determine a trigger identifier associated with the selected translation result;
      
      search in the trigger dictionary for a field identifier that corresponds to the trigger identifier;
      
      identify, from one or more data fields of an electronic form, a data field associated with a data field identifier corresponding to the field identifier; and
      
      provide the text data of the identified at least one translation result to the identified data field.
  - 17. The computer-implemented method of claim 8, wherein each of the first audio data filter and the second audio data filter comprises at least one of a blind source filter, a phase shift filter, a subtract spectrum filter, a comb filter, a low pass filter, a high pass filter, and a band pass filter.
  - 18. The computer-implemented method of claim 8, wherein the at least one subsequent confidence threshold is within a range of 40% to 75%.
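
The trigger-dictionary lookup recited in claims 7 and 14 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation. The claims do not specify how a trigger identifier is determined from the text, so this sketch assumes the recognized text begins with a trigger phrase and treats the remainder as the content for the field. The dictionary entries, the field identifiers, and the names `find_trigger` and `fill_form` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical trigger dictionary: trigger identifier -> field identifier.
TRIGGER_DICTIONARY: Dict[str, str] = {
    "first name": "first_name",
    "city": "city",
}


def find_trigger(text: str, triggers: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (trigger, remaining content) if the text starts with a trigger.

    Longer triggers are tried first so that a trigger that is a prefix of
    another trigger does not shadow it.
    """
    lowered = text.lower()
    for trigger in sorted(triggers, key=len, reverse=True):
        if lowered == trigger or lowered.startswith(trigger + " "):
            return trigger, text[len(trigger):].strip()
    return None


def fill_form(text: str, form: Dict[str, str], triggers: Dict[str, str]) -> bool:
    """Write the content into the form field matched through the dictionary.

    `form` maps data field identifiers to their current values. Returns True
    when a field was filled and False when no trigger or field matched.
    """
    match = find_trigger(text, triggers)
    if match is None:
        return False
    trigger, content = match
    field_id = triggers[trigger]   # search the trigger dictionary
    if field_id not in form:       # identify the corresponding data field
        return False
    form[field_id] = content       # provide the text to that field
    return True


if __name__ == "__main__":
    form = {"first_name": "", "city": ""}
    fill_form("First name Janet", form, TRIGGER_DICTIONARY)
    print(form)
```

Returning a boolean rather than raising lets a caller decide how to handle speech that contains no recognized trigger.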

Current Assignee: Interdev Technologies Inc. (Valsef Group)
Original Assignee: Interdev Technologies Inc. (Valsef Group)
Inventors: Rice, Janet M.; Liang, Peng; Kuehn, Terence W.
Primary Examiner(s): Sharma, Neeraj
Application Number: US16/212,772
Publication Number: US 20190214000A1
Time in Patent Office: 375 Days
Field of Search: None

CPC Class Codes

G10L 15/08   Speech classification or se...
G10L 15/20   Speech recognition techniqu...
G10L 15/26   Speech to text systems G10L...
G10L 19/26   Pre-filtering or post-filte...
G10L 21/02   Speech enhancement, e.g. no...
G10L 25/93   Discriminating between voic...
H04R 29/006   Microphone matching
