FAST OUT-OF-VOCABULARY SEARCH IN AUTOMATIC SPEECH RECOGNITION SYSTEMS

US 20170186422A1
Filed: 01/09/2017
Published: 06/29/2017
Est. Priority Date: 12/29/2012
Status: Active Grant

Abstract

A method including: receiving, on a computer system, a text search query, the query including one or more query words; generating, on the computer system, for each query word in the query, one or more anchor segments within a plurality of speech recognition processed audio files, the one or more anchor segments identifying possible locations containing the query word; post-processing, on the computer system, the one or more anchor segments, the post-processing including: expanding the one or more anchor segments; sorting the one or more anchor segments; and merging overlapping ones of the one or more anchor segments; and searching, on the computer system, the post-processed one or more anchor segments for instances of at least one of the one or more query words using a constrained grammar.
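
The post-processing named in the abstract (expand, sort, merge overlapping anchor segments) can be illustrated with a minimal sketch. The `Segment` type, the `post_process` name, the fixed `pad` value, and the clamping of start times at zero are assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    file_id: str   # hypothetical identifier of the audio file
    start: float   # start time in seconds
    end: float     # end time in seconds


def post_process(segments, pad=1.0):
    """Expand, sort, and merge anchor segments (pad is an assumed constant)."""
    # Expand each segment on both sides; clamping at zero is an assumption.
    expanded = [
        Segment(s.file_id, max(0.0, s.start - pad), s.end + pad)
        for s in segments
    ]
    # Sort by file, then by start time, so overlapping segments are adjacent.
    expanded.sort(key=lambda s: (s.file_id, s.start))
    merged = []
    for seg in expanded:
        last = merged[-1] if merged else None
        # Merge when the next segment starts before the previous one ends.
        if last is not None and last.file_id == seg.file_id and seg.start < last.end:
            merged[-1] = Segment(last.file_id, last.start, max(last.end, seg.end))
        else:
            merged.append(seg)
    return merged


anchors = [
    Segment("a.wav", 10.0, 11.0),
    Segment("a.wav", 12.0, 13.0),
    Segment("a.wav", 30.0, 31.0),
    Segment("b.wav", 12.5, 13.0),
]
result = post_process(anchors, pad=1.0)
```

With these inputs the first two segments of `a.wav` overlap after expansion and collapse into one segment from 9.0 s to 14.0 s, while the other two segments are kept separate. Taking the maximum of the two end times is a design choice of this sketch so that a segment fully contained in another does not shorten the merged result.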

39 Claims

1-19. (canceled)

20. A method comprising:
- receiving, on a computer system, a text search query;
  
  searching, on the computer system, a plurality of speech recognition processed audio files for instances of words of the text search query, the speech recognition processed audio files being associated with metadata including representations of one or more words detected in the audio files, the metadata being generated by a speech recognition engine in accordance with a vocabulary, the searching comprising:
  
  identifying one or more query words from the text search query, the one or more identified query words not being in the vocabulary;
  
  identifying segments of the speech recognition processed audio files, the segments being more likely than other portions of the audio file to include at least one of the identified query words;
  
  performing speech recognition on the identified segments for instances of the one or more identified query words using a constrained grammar; and
  
  returning one or more search results comprising one or more audio files corresponding to segments containing instances of at least one of the identified query words.
  - 21. The method of claim 20, wherein the metadata includes representation of one or more sub-words detected in the audio files, and wherein the identifying segments of the speech recognition processed audio files comprises, for each identified query word:
    - generating a search list of one or more sub-words of the identified query word;
      
      searching the metadata to identify one or more audio files containing at least one of the one or more sub-words to identify one or more anchor segments corresponding to the one or more sub-words, each of the anchor segments including a start time and an end time within the audio file;
      
      post-processing the one or more anchor segments, the post-processing comprising:
      
      expanding the one or more anchor segments;
      
      sorting the one or more anchor segments by start time; and
      
      merging the one or more anchor segments that overlap in time; and
      
      returning the post-processed anchor segments as the identified segments.
  - 22. The method of claim 21, wherein the metadata further includes a confidence measure for each of the one or more words as detected within the audio file, and wherein the one or more anchor segments corresponding to the one or more sub-words have confidence measures below a threshold.
  - 23. The method of claim 21, wherein the expanding the one or more anchor segments comprises:
    - for each identified query word in the text search query:
      
      counting a first number of characters in the text search query before the query word and a second number of characters after the query word;
      
      multiplying the first number of characters by an average character duration of the audio file containing the anchor segment to obtain a first expansion amount; and
      
      multiplying the second number of characters by the average character duration to obtain a second expansion amount; and
      
      for each anchor segment, each anchor segment being identified by an anchor word, a start time, and an end time:
      
      subtracting the first expansion amount and a first constant expansion duration from the start time; and
      
      adding the second expansion amount and a second constant expansion duration to the end time.
  - 24. The method of claim 21, wherein the merging the one or more anchor segments that overlap in time comprises:
    - identifying a first anchor segment of a particular audio file, the first anchor segment having a first start time and a first end time;
      
      identifying a second anchor segment of the particular audio file, the second anchor segment having a second start time and a second end time, the second start time being after the first start time and before the first end time; and
      
      returning a merged anchor segment having a merged start time equal to the first start time and a merged end time equal to the second end time.
  - 25. The method of claim 21, wherein the metadata further comprises a phoneme transcription, and wherein the searching the metadata to identify one or more audio files containing at least one of the one or more sub-words comprises:
    - converting the identified query word to phonemes; and
      
      searching the phoneme transcription for the phonemes of the identified query word.
  - 26. The method of claim 20, wherein the constrained grammar comprises the one or more identified query words of the text search query.
  - 27. The method of claim 20, wherein the performing speech recognition on the identified segments comprises computing one or more event confidence levels, each of the event confidence levels corresponding to a confidence that a segment of the identified segments contains a particular one of the identified query words.
  - 28. The method of claim 20, further comprising:
    - applying a utility function to each of the one or more identified segments to compute one or more corresponding segment utility values; and
      
      sorting the one or more segments in accordance with the one or more segment utility values.
  - 29. The method of claim 28, wherein the searching the one or more identified segments only searches the one or more identified segments having best segment utility values of the one or more segment utility values.
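
The expansion arithmetic recited in claim 23 can be worked through with a small sketch. The function names, the constant durations of 0.5 s, the choice to count spaces as characters, and the clamping of the start time at zero are assumptions for illustration only; the claim does not fix any of them.

```python
def expansion_amounts(query, query_word, avg_char_duration):
    """Return (first, second) expansion amounts in seconds for query_word.

    Characters, including spaces (an assumption), are counted before and
    after the query word within the full text search query.
    """
    idx = query.find(query_word)
    if idx < 0:
        raise ValueError("query word not found in query")
    chars_before = idx
    chars_after = len(query) - (idx + len(query_word))
    return chars_before * avg_char_duration, chars_after * avg_char_duration


def expand_anchor(start, end, first, second, const_before=0.5, const_after=0.5):
    """Widen one anchor segment; the constant durations are assumed values."""
    new_start = max(0.0, start - first - const_before)  # clamp is an assumption
    new_end = end + second + const_after
    return new_start, new_end


# Hypothetical query in which "genesys" is the out-of-vocabulary word.
first, second = expansion_amounts("please call genesys support", "genesys", 0.0625)
window = expand_anchor(20.0, 20.5, first, second)
```

Here 12 characters precede the query word and 8 follow it, so with an average character duration of 0.0625 s the expansion amounts are 0.75 s and 0.5 s, and an anchor at 20.0 s to 20.5 s widens to 18.75 s to 21.5 s.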

30. A system comprising:
- a processor; and
  
  memory, the memory having instructions that, when executed by the processor, cause the processor to:
  
  receive a text search query;
  
  search a plurality of speech recognition processed audio files for instances of words of the text search query, the speech recognition processed audio files being associated with metadata including representations of one or more words detected in the audio files, the metadata being generated by a speech recognition engine in accordance with a vocabulary, the searching comprising:
  
  identifying one or more query words from the text search query, the one or more identified query words not being in the vocabulary;
  
  identifying segments of the speech recognition processed audio files, the segments being more likely than other portions of the audio file to include at least one of the identified query words;
  
  performing speech recognition on the identified segments for instances of the identified query words using a constrained grammar; and
  
  returning one or more search results comprising one or more audio files corresponding to segments containing instances of at least one of the identified query words.
  - 31. The system of claim 30, wherein the metadata includes representation of one or more sub-words detected in the audio files, and wherein the identifying segments of the speech recognition processed audio files comprises, for each identified query word:
    - generating a search list of one or more sub-words of the identified query word;
      
      searching the metadata to identify one or more audio files containing at least one of the one or more sub-words to identify one or more anchor segments corresponding to the one or more sub-words, each of the anchor segments including a start time and an end time within the audio file;
      
      post-processing the one or more anchor segments, the post-processing comprising:
      
      expanding the one or more anchor segments;
      
      sorting the one or more anchor segments by start time; and
      
      merging the one or more anchor segments that overlap in time; and
      
      returning the post-processed anchor segments as the identified segments.
  - 32. The system of claim 31, wherein the metadata further includes a confidence measure for the word as detected within the audio file, and wherein the one or more anchor segments corresponding to the one or more sub-words have confidence measures below a threshold.
  - 33. The system of claim 31, wherein the instructions for expanding the one or more anchor segments comprise instructions that, when executed by the processor, cause the processor to:
    - for each identified query word in the text search query:
      
      count a first number of characters in the text search query before the query word and a second number of characters after the query word;
      
      multiply the first number of characters by an average character duration of the audio file containing the anchor segment to obtain a first expansion amount; and
      
      multiply the second number of characters by the average character duration to obtain a second expansion amount; and
      
      for each anchor segment, each anchor segment being identified by an anchor word, a start time, and an end time:
      
      subtract the first expansion amount and a first constant expansion duration from the start time; and
      
      add the second expansion amount and a second constant expansion duration to the end time.
  - 34. The system of claim 31, wherein the instructions for merging the one or more anchor segments that overlap in time comprise instructions that, when executed by the processor, cause the processor to:
    - identify a first anchor segment of a particular audio file, the first anchor segment having a first start time and a first end time;
      
      identify a second anchor segment of the particular audio file, the second anchor segment having a second start time and a second end time, the second start time being after the first start time and before the first end time; and
      
      return a merged anchor segment having a merged start time equal to the first start time and a merged end time equal to the second end time.
  - 35. The system of claim 31, wherein the metadata further comprises a phoneme transcription, and wherein the instructions for searching the metadata to identify one or more audio files containing at least one of the one or more sub-words comprise instructions that, when executed by the processor, cause the processor to:
    - convert the identified query word to phonemes; and
      
      search the phoneme transcription for the phonemes of the identified query word.
  - 36. The system of claim 30, wherein the constrained grammar comprises the one or more identified query words of the text search query.
  - 37. The system of claim 30, wherein the instructions for performing speech recognition on the identified segments comprise instructions that, when executed by the processor, cause the processor to compute one or more event confidence levels, each of the event confidence levels corresponding to a confidence that a segment of the identified segments contains a particular one of the identified query words.
  - 38. The system of claim 30, wherein the instructions further comprise instructions that, when executed by the processor, cause the processor to:
    - apply a utility function to each of the one or more identified segments to compute one or more corresponding segment utility values; and
      
      sort the one or more identified segments in accordance with the one or more segment utility values.
  - 39. The system of claim 38, wherein the instructions for searching the one or more identified segments only search the one or more identified segments having best segment utility values of the one or more segment utility values.
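
The utility-based ranking of claims 38 and 39 (and the parallel claims 28 and 29) can be sketched as follows. The claims do not define the utility function, so the one used here (anchor hits per second of audio) is purely hypothetical, as are the dictionary keys, the `top_n` parameter, and the reading of "best" as "highest".

```python
def select_best_segments(segments, utility, top_n):
    """Sort segments by utility, highest first, and keep the top_n best."""
    ranked = sorted(segments, key=utility, reverse=True)
    return ranked[:top_n]


def hits_per_second(segment):
    """Hypothetical utility: sub-word hits divided by segment duration."""
    return segment["hits"] / (segment["end"] - segment["start"])


candidates = [
    {"file": "a.wav", "start": 9.0, "end": 14.0, "hits": 2},
    {"file": "a.wav", "start": 29.0, "end": 32.0, "hits": 1},
    {"file": "b.wav", "start": 11.5, "end": 14.0, "hits": 3},
]
# Only the selected segments would then be passed to constrained-grammar
# recognition, which bounds the cost of the search.
best = select_best_segments(candidates, hits_per_second, top_n=2)
```

Restricting the constrained-grammar pass to the highest-utility segments trades some recall for speed; how many segments to keep is left open by the claims.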

Current Assignee
Genesys Telecommunications Laboratories Incorporated (Genesys Cloud Services Incorporated)
Original Assignee
Genesys Telecommunications Laboratories Incorporated (Genesys Cloud Services Incorporated)
Inventors
Faizakof, Avraham; Lev-Tov, Amir; Konig, Yochai

Granted Patent

US 10,290,301 B2
CPC Class Codes

G06F 16/242   Query formulation

G06F 16/638   Presentation of query results

G06F 16/685   using automatically derived...

G10L 15/02   Feature extraction for spee...

G10L 15/08   Speech classification or se...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/19   Grammatical context, e.g. d...

G10L 15/30   Distributed recognition, e....
