Fast out-of-vocabulary search in automatic speech recognition systems

US 10,290,301 B2
Filed: 01/09/2017
Issued: 05/14/2019
Est. Priority Date: 12/29/2012
Status: Active Grant

Abstract

A method including: receiving, on a computer system, a text search query, the query including one or more query words; generating, on the computer system, for each query word in the query, one or more anchor segments within a plurality of speech recognition processed audio files, the one or more anchor segments identifying possible locations containing the query word; post-processing, on the computer system, the one or more anchor segments, the post-processing including: expanding the one or more anchor segments; sorting the one or more anchor segments; and merging overlapping ones of the one or more anchor segments; and searching, on the computer system, the post-processed one or more anchor segments for instances of at least one of the one or more query words using a constrained grammar.
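
The post-processing named in the abstract (expand, sort, merge) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Segment` type, the fixed padding values, and the clamping of start times at zero are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A candidate time span in one audio file (hypothetical type)."""
    file_id: str
    start: float  # seconds
    end: float    # seconds


def expand(seg: Segment, pad_before: float, pad_after: float) -> Segment:
    # Widen the anchor on both sides; clamping at 0.0 is an assumption.
    return Segment(seg.file_id, max(0.0, seg.start - pad_before), seg.end + pad_after)


def merge_overlapping(segments: list[Segment]) -> list[Segment]:
    """Sort by file and start time, then merge spans that overlap in time."""
    merged: list[Segment] = []
    for seg in sorted(segments, key=lambda s: (s.file_id, s.start)):
        if merged and merged[-1].file_id == seg.file_id and seg.start <= merged[-1].end:
            last = merged[-1]
            # Keep the earlier start and the later of the two end times.
            merged[-1] = Segment(last.file_id, last.start, max(last.end, seg.end))
        else:
            merged.append(seg)
    return merged


def post_process(anchors: list[Segment], pad_before: float = 0.5,
                 pad_after: float = 0.5) -> list[Segment]:
    return merge_overlapping([expand(a, pad_before, pad_after) for a in anchors])


anchors = [
    Segment("a.wav", 5.0, 5.5),
    Segment("a.wav", 1.0, 1.25),
    Segment("a.wav", 1.5, 2.0),
    Segment("b.wav", 1.0, 1.25),
]
segments_to_search = post_process(anchors)
```

Merging after expansion keeps the constrained-grammar pass from decoding the same stretch of audio twice when two anchors sit close together.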

128 Citations

20 Claims

1. A method comprising:
- receiving, on a computer system, a text search query;
  
  searching, on the computer system, a plurality of speech recognition processed audio files for instances of words of the text search query, the speech recognition processed audio files being associated with metadata including representations of one or more words detected in the audio files and one or more sub-words detected in the audio files, the metadata being generated by a speech recognition engine in accordance with a vocabulary, the searching comprising:
  
  identifying one or more query words from the text search query, the one or more identified query words not being in the vocabulary, each of the one or more query words comprising one or more sub-words;
  
  identifying segments of the speech recognition processed audio files, each of the segments comprising audio data, the segments being more likely than other portions of the audio file to include at least one of the identified query words, by searching the metadata for instances of the sub-words of the one or more query words to identify one or more anchor segments of the audio files and expanding the one or more anchor segments, each of the anchor segments including a start time and an end time within the audio file, the segments of the audio files comprising the anchor segments;
  
  performing speech recognition on the audio data of the identified segments for instances of the one or more identified query words using a constrained grammar comprising the one or more identified query words not in the vocabulary; and
  
  returning one or more search results comprising one or more audio files corresponding to segments containing instances of at least one of the identified query words.
  - 2. The method of claim 1, wherein the identifying segments of the speech recognition processed audio files comprises, for each identified query word:
    - generating a search list of the one or more sub-words of the identified query word;
      
      searching the metadata to identify one or more audio files containing at least one of the one or more sub-words to identify the one or more anchor segments corresponding to the one or more sub-words;
      
      post-processing the one or more anchor segments, the post-processing comprising:
      
      sorting the one or more anchor segments by start time; and
      
      merging the one or more anchor segments that overlap in time; and
      
      returning the post-processed anchor segments as the identified segments.
  - 3. The method of claim 2, wherein the metadata further includes a confidence measure for each of the one or more words as detected within the audio file, and wherein the one or more anchor segments corresponding to the one or more sub-words have confidence measures below a threshold.
  - 4. The method of claim 2, wherein the expanding the one or more anchor segments comprises:
    - for each identified query word in the text search query:
      
      counting a first number of characters in the text search query before the query word and a second number of characters after the query word;
      
      multiplying the first number of characters by an average character duration of the audio file containing the anchor segment to obtain a first expansion amount; and
      
      multiplying the second number of characters by the average character duration to obtain a second expansion amount; and
      
      for each anchor segment, each anchor segment being identified by an anchor word, a start time, and an end time:
      
      subtracting the first expansion amount and a first constant expansion duration from the start time; and
      
      adding the second expansion amount and a second constant expansion duration to the end time.
  - 5. The method of claim 2, wherein the merging the one or more anchor segments that overlap in time comprises:
    - identifying a first anchor segment of a particular audio file, the first anchor segment having a first start time and a first end time;
      
      identifying a second anchor segment of the particular audio file, the second anchor segment having a second start time and a second end time, the second start time being after the first start time and before the first end time; and
      
      returning a merged anchor segment having a merged start time equal to the first start time and a merged end time equal to the second end time.
  - 6. The method of claim 2, wherein the metadata further comprises a phoneme transcription, and wherein the searching the metadata to identify one or more audio files containing at least one of the one or more sub-words comprises:
    - converting the identified query word to phonemes; and
      
      searching the phoneme transcription for the phonemes of the identified query word.
  - 7. The method of claim 1, wherein the constrained grammar comprises the one or more identified query words of the text search query.
  - 8. The method of claim 1, wherein the performing speech recognition on the identified segments comprises computing one or more event confidence levels, each of the event confidence levels corresponding to a confidence that a segment of the identified segments contains a particular one of the identified query words.
  - 9. The method of claim 1, further comprising:
    - applying a utility function to each of the one or more identified segments to compute one or more corresponding segment utility values; and
      
      sorting the one or more segments in accordance with the one or more segment utility values.
  - 10. The method of claim 9, wherein the searching the one or more identified segments only searches the one or more identified segments having best segment utility values of the one or more segment utility values.
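
The expansion arithmetic of claim 4 can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, spaces are counted as characters, only the first occurrence of the query word is considered, the average character duration is passed in rather than derived from the audio file, the 0.25 s constants are arbitrary, and the start time is clamped at zero.

```python
def expansion_amounts(query: str, word: str, avg_char_duration: float) -> tuple[float, float]:
    """Return (first_expansion, second_expansion) in seconds for one query word."""
    idx = query.index(word)                       # first occurrence only (assumption)
    chars_before = idx                            # characters preceding the word
    chars_after = len(query) - idx - len(word)    # characters following the word
    return chars_before * avg_char_duration, chars_after * avg_char_duration


def expand_anchor(start: float, end: float,
                  first_amount: float, second_amount: float,
                  const_before: float = 0.25, const_after: float = 0.25) -> tuple[float, float]:
    """Subtract from the start and add to the end, as recited in claim 4."""
    new_start = max(0.0, start - first_amount - const_before)  # clamp is an assumption
    new_end = end + second_amount + const_after
    return new_start, new_end


query = "reset my Zyxel router"
before, after = expansion_amounts(query, "Zyxel", avg_char_duration=0.0625)
expanded = expand_anchor(10.0, 10.5, before, after)
```

With 9 characters before "Zyxel" and 7 after, the anchor at 10.0 to 10.5 s widens to 9.1875 to 11.1875 s, which leaves room for the rest of the query phrase on either side of the anchor.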

11. A system comprising:
- a processor; and
  
  memory, the memory having instructions that, when executed by the processor, cause the processor to:
  
  receive a text search query;
  
  search a plurality of speech recognition processed audio files for instances of words of the text search query, the speech recognition processed audio files being associated with metadata including representations of one or more words detected in the audio files and one or more sub-words detected in the audio files, the metadata being generated by a speech recognition engine in accordance with a vocabulary, the searching comprising:
  
  identifying one or more query words from the text search query, the one or more identified query words not being in the vocabulary, each of the one or more query words comprising one or more sub-words;
  
  identifying segments of the speech recognition processed audio files, each of the segments comprising audio data, the segments being more likely than other portions of the audio file to include at least one of the identified query words, by searching the metadata for instances of the sub-words of the one or more query words to identify one or more anchor segments of the audio files and expanding the one or more anchor segments, each of the anchor segments including a start time and an end time within the audio file, the segments of the audio files comprising the anchor segments;
  
  performing speech recognition on the audio data of the identified segments for instances of the identified query words using a constrained grammar comprising the one or more identified query words not in the vocabulary; and
  
  returning one or more search results comprising one or more audio files corresponding to segments containing instances of at least one of the identified query words.
  - 12. The system of claim 11, wherein the identifying segments of the speech recognition processed audio files comprises, for each identified query word:
    - generating a search list of the one or more sub-words of the identified query word;
      
      searching the metadata to identify one or more audio files containing at least one of the one or more sub-words to identify the one or more anchor segments corresponding to the one or more sub-words;
      
      post-processing the one or more anchor segments, the post-processing comprising:
      
      sorting the one or more anchor segments by start time; and
      
      merging the one or more anchor segments that overlap in time; and
      
      returning the post-processed anchor segments as the identified segments.
  - 13. The system of claim 12, wherein the metadata further includes a confidence measure for the word as detected within the audio file, and wherein the one or more anchor segments corresponding to the one or more sub-words have confidence measures below a threshold.
  - 14. The system of claim 12, wherein the instructions for expanding the one or more anchor segments comprise instructions that, when executed by the processor, cause the processor to:
    - for each identified query word in the text search query:
      
      count a first number of characters in the text search query before the query word and a second number of characters after the query word;
      
      multiply the first number of characters by an average character duration of the audio file containing the anchor segment to obtain a first expansion amount; and
      
      multiply the second number of characters by the average character duration to obtain a second expansion amount; and
      
      for each anchor segment, each anchor segment being identified by an anchor word, a start time, and an end time:
      
      subtract the first expansion amount and a first constant expansion duration from the start time; and
      
      add the second expansion amount and a second constant expansion duration to the end time.
  - 15. The system of claim 12, wherein the instructions for merging the one or more anchor segments that overlap in time comprise instructions that, when executed by the processor, cause the processor to:
    - identify a first anchor segment of a particular audio file, the first anchor segment having a first start time and a first end time;
      
      identify a second anchor segment of the particular audio file, the second anchor segment having a second start time and a second end time, the second start time being after the first start time and before the first end time; and
      
      return a merged anchor segment having a merged start time equal to the first start time and a merged end time equal to the second end time.
  - 16. The system of claim 12, wherein the metadata further comprises a phoneme transcription, and wherein the instructions for searching the metadata to identify one or more audio files containing at least one of the one or more sub-words comprise instructions that, when executed by the processor, cause the processor to:
    - convert the identified query word to phonemes; and
      
      search the phoneme transcription for the phonemes of the identified query word.
  - 17. The system of claim 11, wherein the constrained grammar comprises the one or more identified query words of the text search query.
  - 18. The system of claim 11, wherein the instructions for performing speech recognition on the identified segments comprise instructions that, when executed by the processor, cause the processor to compute one or more event confidence levels, each of the event confidence levels corresponding to a confidence that a segment of the identified segments contains a particular one of the identified query words.
  - 19. The system of claim 11, wherein the instructions further comprise instructions that, when executed by the processor, cause the processor to:
    - apply a utility function to each of the one or more identified segments to compute one or more corresponding segment utility values; and
      
      sort the one or more identified segments in accordance with the one or more segment utility values.
  - 20. The system of claim 19, wherein the instructions for searching the one or more identified segments only search the one or more identified segments having best segment utility values of the one or more segment utility values.
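
The utility-based ranking of claims 9, 10, 19 and 20 can be sketched as below. The claims do not define the utility function, so `hits_per_second` is a purely hypothetical stand-in, as are the `RankedSegment` type, its `anchor_hits` field, and the `top_k` cutoff used to pick the "best" segments.

```python
from typing import Callable, NamedTuple


class RankedSegment(NamedTuple):
    """Identified segment plus a count of anchor hits (hypothetical field)."""
    file_id: str
    start: float
    end: float
    anchor_hits: int


def hits_per_second(seg: RankedSegment) -> float:
    # Hypothetical utility: favor short segments that contain many anchor hits.
    duration = seg.end - seg.start
    return seg.anchor_hits / duration if duration > 0 else 0.0


def select_best(segments: list[RankedSegment],
                utility: Callable[[RankedSegment], float],
                top_k: int) -> list[RankedSegment]:
    """Sort segments by descending utility and keep only the top_k."""
    ranked = sorted(segments, key=utility, reverse=True)
    return ranked[:top_k]


candidates = [
    RankedSegment("a.wav", 0.0, 4.0, 2),   # utility 0.5
    RankedSegment("b.wav", 0.0, 2.0, 3),   # utility 1.5
    RankedSegment("c.wav", 1.0, 2.0, 1),   # utility 1.0
]
best = select_best(candidates, hits_per_second, top_k=2)
```

Restricting the constrained-grammar pass to the highest-utility segments trades some recall for a bounded search cost, which is the trade-off claims 10 and 20 describe.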

Current Assignee: Genesys Telecommunications Laboratories Incorporated (Genesys Cloud Services Incorporated)
Original Assignee: Genesys Telecommunications Laboratories Incorporated (Genesys Cloud Services Incorporated)
Inventors: Lev-Tov, Amir; Faizakof, Avraham; Konig, Yochai
Primary Examiner(s): Le, Thuykhanh
Application Number: US15/402,070
Publication Number: US 20170186422A1
Time in Patent Office: 855 Days
Field of Search: None
US Class Current
CPC Class Codes

G06F 16/242   Query formulation

G06F 16/638   Presentation of query results

G06F 16/685   using automatically derived...

G10L 15/02   Feature extraction for spee...

G10L 15/08   Speech classification or se...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/19   Grammatical context, e.g. d...

G10L 15/30   Distributed recognition, e....
