Dynamic adaptation of language models and semantic tracking for automatic speech recognition

US 9,858,923 B2
Filed: 09/24/2015
Issued: 01/02/2018
Est. Priority Date: 09/24/2015
Status: Active Grant

Abstract

Generally, this disclosure provides systems, devices, methods and computer readable media for adaptation of language models and semantic tracking to improve automatic speech recognition (ASR). A system for recognizing phrases of speech from a conversation may include an ASR circuit configured to transcribe a user's speech to a first estimated text sequence, based on a generalized language model. The system may also include a language model matching circuit configured to analyze the first estimated text sequence to determine a context and to select a personalized language model (PLM), from a plurality of PLMs, based on that context. The ASR circuit may further be configured to re-transcribe the speech based on the selected PLM to generate a lattice of paths of estimated text sequences, wherein each of the paths of estimated text sequences comprise one or more words and an acoustic score associated with each of the words.
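
The context-matching step between the two transcription passes can be illustrated with a minimal sketch. It assumes, purely for illustration, that each PLM domain is summarized by a small keyword set and that "context" is measured by word overlap with the first-pass transcript; the names `tokenize`, `select_plm` and `domain_vocab` are hypothetical and do not come from the patent, which does not specify how the match is computed.

```python
from collections import Counter

def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    words = (w.strip(".,!?;:").lower() for w in text.split())
    return [w for w in words if w]

def select_plm(first_pass_text, domain_vocab):
    """Pick the domain whose keyword set overlaps most with the transcript.

    Assumption: keyword overlap stands in for the context analysis;
    ties are broken alphabetically by domain name.
    """
    counts = Counter(tokenize(first_pass_text))

    def overlap(domain):
        return sum(counts[w] for w in domain_vocab[domain])

    return max(sorted(domain_vocab), key=overlap)

# Hypothetical domains, each standing in for one personalized language model.
domain_vocab = {
    "cooking": {"recipe", "oven", "flour", "bake"},
    "travel": {"flight", "hotel", "airport", "booking"},
}
selected = select_plm(
    "I need to book a flight and a hotel near the airport", domain_vocab
)
```

In a two-pass arrangement, the selected domain would determine which PLM the second transcription pass uses.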

15 Claims

1. A system for recognizing phrases of speech from a conversation, said system comprising:
- an information gathering circuit to collect textual information associated with a user using general knowledge sources in combination with user textual information sources including electronic documents, emails, text messages and social media communications;
  
  a text clustering circuit to analyze said collected textual information and organize that information into clusters;
  
  a knowledge domain generation circuit to generate domains of knowledge based upon each cluster and to map those domains to a plurality of personalized language models (PLM) in an offline mode prior to automatic speech recognition (ASR) operation to generate said plurality of PLMs;
  
  a first ASR circuit to initially transcribe speech, of a user of said system, to a first estimated text sequence, based on a generalized language model;
  
  a language model matching circuit to analyze said first estimated text sequence to determine a context and to select a PLM, from said plurality of PLMs, based on said context;
  
  a second ASR circuit to re-transcribe said speech based on said selected PLM to generate a lattice of paths of estimated text sequences, wherein each of said paths of estimated text sequences comprise one or more words and an acoustic score associated with each of said words; and
  
  a semantic analysis circuit to select one of said paths of estimated text sequences, from said lattice, as a currently recognized phrase of speech from said conversation, wherein the semantic analysis circuit includes:
  
  a semantic distance calculation circuit to estimate a semantic distance between each of said paths of estimated text sequences to one or more previously recognized phrases of speech from said conversation, and
  
  a conditional random field (CRF) classifier circuit to rank each of said paths of estimated text sequences based on contextual relationships between said words in said paths, and
  
  wherein the semantic analysis circuit selects one of said paths of estimated text sequences, from said lattice, as a currently recognized phrase of speech from said conversation, based on said semantic distance and subsequent said CRF ranking.
- Dependent claims (2, 3, 4, 5)
  - 2. The system of claim 1, further comprising a lattice pruning circuit to remove a subset of said paths of estimated text sequences based on a comparison of said acoustic scores to a threshold value.
  - 3. The system of claim 1, wherein the information gathering circuit, the text clustering circuit and the knowledge domain circuit are included in a PLM generation circuit, and wherein the PLM generation circuit is configured to:
    - analyze the textual information sources associated with said user;
      
      organize said textual information from said sources into clusters based on a measurement of content similarity between said sources, to generate domains of knowledge based on said clusters; and
      
      map said domains into said plurality of PLMs.
  - 4. The system of claim 3, wherein said PLM generation circuit operates in an offline mode prior to execution of said ASR circuit.
  - 5. The system of claim 1, further comprising an insight extraction circuit to analyze said current and previously recognized phrases of speech from said conversation and to generate a summary of said conversation, extract keywords from said conversation, perform a translation of said conversation or extract action requests taken by said user during said conversation.
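
The PLM generation steps recited in claim 3 (cluster sources by content similarity, derive domains, map domains to language models) can be sketched as follows. This is an illustrative sketch only: it assumes bag-of-words vectors, cosine similarity, a greedy single-pass clustering with an arbitrary threshold, and a unigram distribution as a stand-in for a language model. None of these choices, nor the names `cluster_documents` and `unigram_model`, are stated in the patent.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Represent a document as word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def cluster_documents(docs, threshold=0.3):
    """Greedy clustering: join the most similar cluster or start a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "docs": [str]}
    for doc in docs:
        vec = bag_of_words(doc)
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["docs"].append(doc)
            best["centroid"].update(vec)  # accumulate counts into the centroid
        else:
            clusters.append({"centroid": Counter(vec), "docs": [doc]})
    return clusters

def unigram_model(cluster):
    """Map a cluster (domain) to word probabilities, a toy stand-in for a PLM."""
    total = sum(cluster["centroid"].values())
    return {w: n / total for w, n in cluster["centroid"].items()}

# Hypothetical user text sources (emails, messages, documents).
docs = [
    "flight to paris booked hotel confirmed",
    "bake the bread in a hot oven",
    "hotel near paris airport flight delayed",
    "bread recipe needs flour and a hot oven",
]
clusters = cluster_documents(docs)
plms = [unigram_model(c) for c in clusters]
```

Because this step runs offline, before recognition starts, a slower but more accurate clustering method could be substituted without affecting recognition latency.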

6. A method for recognizing phrases of speech from a conversation, said method comprising:
- collecting, by an information gathering circuit, textual information associated with a participant using general knowledge sources in combination with participant textual information sources including electronic documents, emails, text messages and social media communications;
  
  analyzing collected textual information by a text clustering circuit;
  
  organizing said textual information into clusters by said text clustering circuit;
  
  generating, by a knowledge domain generation circuit, domains of knowledge based upon said clusters;
  
  mapping, by said knowledge domain generating circuit, said knowledge domains to a plurality of personalized language models (PLMs) in an offline mode prior to automatic speech recognition (ASR) operation to generate said plurality of PLMs;
  
  transcribing speech, of a participant in said conversation, to a first estimated text sequence, by a first ASR circuit, said transcription based on a generalized language model;
  
  analyzing said first estimated text sequence to determine a context;
  
  selecting a PLM, from said plurality of PLMs, based on said context;
  
  re-transcribing said speech, by said ASR circuit, based on said selected PLM, to generate a lattice of paths of estimated text sequences, wherein each of said paths of estimated text sequences comprise one or more words and an acoustic score associated with each of said words;
  
  estimating a semantic distance between each of said paths of estimated text sequences to one or more previously recognized phrases of speech from said conversation;
  
  ranking each of said paths of estimated text sequences based on determining contextual relationships between said words in said paths; and
  
  selecting one of said paths of estimated text sequences, from said lattice, as a currently recognized phrase of speech from said conversation, based on said semantic distance and subsequent said ranking.
- Dependent claims (7, 8, 9, 10)
  - 7. The method of claim 6, further comprising removing a subset of said paths of estimated text sequences based on a comparison of said acoustic scores to a threshold value.
  - 8. The method of claim 6, wherein said plurality of PLMs are generated by:
    - analyzing the textual information sources associated with said participant;
      
      organizing said textual information from said sources into clusters based on a measurement of content similarity between said sources;
      
      generating domains of knowledge based on said clusters; and
      
      mapping said domains into said plurality of PLMs.
  - 9. The method of claim 8, wherein said PLM generation is performed in an offline mode prior to execution of said ASR circuit.
  - 10. The method of claim 6, further comprising analyzing said current and previously recognized phrases of speech from said conversation to generate a summary of said conversation, extract keywords from said conversation, perform a translation of said conversation or extract action requests taken by said participant during said conversation.
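
The pruning step of claim 7 can be sketched as below. The claim only recites "a comparison of said acoustic scores to a threshold value"; this sketch assumes one possible reading, in which a path is removed when the mean of its per-word acoustic scores falls below the threshold. The function name `prune_lattice`, the score scale and the example lattice are hypothetical.

```python
def prune_lattice(paths, threshold):
    """Keep paths whose mean per-word acoustic score meets the threshold.

    Each path is a list of (word, acoustic_score) pairs.
    Assumption: higher scores mean a better acoustic match.
    """
    kept = []
    for path in paths:
        scores = [score for _, score in path]
        if scores and sum(scores) / len(scores) >= threshold:
            kept.append(path)
    return kept

# Hypothetical lattice with two competing paths for the same audio.
lattice = [
    [("recognize", 0.9), ("speech", 0.8)],
    [("wreck", 0.4), ("a", 0.6), ("nice", 0.5), ("beach", 0.3)],
]
pruned = prune_lattice(lattice, threshold=0.6)
```

Other readings, such as dropping a path when any single word scores below the threshold, would fit the claim language equally well.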

11. At least one non-transitory computer-readable storage medium having instructions stored thereon which when executed by a processor result in the following operations for recognizing phrases of speech, said operations comprising:
- collecting, by an information gathering circuit, textual information associated with a participant using general knowledge sources in combination with participant textual information sources including electronic documents, emails, text messages and social media communications;
  
  analyzing said collected textual information by a text clustering circuit;
  
  organizing, by said text clustering circuit, said textual information into clusters;
  
  generating, by a knowledge domain generation circuit, domains of knowledge based upon said clusters;
  
  mapping, by said knowledge domain generation circuit, said knowledge domains to a plurality of personalized language models (PLMs) in an offline mode prior to automatic speech recognition (ASR) operation to generate said plurality of PLMs;
  
  transcribing speech, of a participant in said conversation, to a first estimated text sequence, by a first ASR circuit, said transcription based on a generalized language model;
  
  analyzing said first estimated text sequence to determine a context;
  
  selecting a PLM, from said plurality of PLMs, based on said context; and
  
  re-transcribing said speech, by a second ASR circuit, based on said selected PLM, to generate a lattice of paths of estimated text sequences, wherein each of said paths of estimated text sequences comprise one or more words and an acoustic score associated with each of said words and
  
  estimating a semantic distance between each of said paths of estimated text sequences to one or more previously recognized phrases of speech from said conversation;
  
  ranking each of said paths of estimated text sequences based on determining contextual relationships between said words in said paths; and
  
  selecting one of said paths of estimated text sequences, from said lattice, as a currently recognized phrase of speech from said conversation, based on said semantic distance and subsequent said ranking.
- Dependent claims (12, 13, 14, 15)
  - 12. The computer-readable storage medium of claim 11, further comprising removing a subset of said paths of estimated text sequences based on a comparison of said acoustic scores to a threshold value.
  - 13. The computer-readable storage medium of claim 11, wherein said plurality of PLMs are generated by:
    - analyzing the textual information sources associated with said participant;
      
      organizing said textual information from said sources into clusters based on a measurement of content similarity between said sources;
      
      generating domains of knowledge based on said clusters; and
      
      mapping said domains into said plurality of PLMs.
  - 14. The computer-readable storage medium of claim 13, wherein said PLM generation is performed in an offline mode prior to execution of said ASR circuit.
  - 15. The computer-readable storage medium of claim 11, further comprising analyzing said current and previously recognized phrases of speech from said conversation to generate a summary of said conversation, extract keywords from said conversation, perform a translation of said conversation or extract action requests taken by said participant during said conversation.
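
The final selection recited in the independent claims (semantic distance to earlier phrases, followed by a ranking on contextual relationships between words) can be sketched in two stages. This is a rough sketch under stated assumptions: Jaccard distance over word sets stands in for the semantic distance, and a hand-written bigram score table stands in for the CRF classifier, which is not implemented here. Acoustic scores are omitted for brevity, and `select_phrase`, `bigram_score` and `keep` are hypothetical names.

```python
def jaccard_distance(words, history_words):
    """1 - |A intersect B| / |A union B| over word sets; 0.0 if both are empty."""
    a, b = set(words), set(history_words)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def select_phrase(paths, history, bigram_score, keep=2):
    """Choose a path using semantic distance first, then a context ranking."""
    history_words = [w for phrase in history for w in phrase.split()]
    # Stage 1: keep the `keep` candidates closest to the conversation so far.
    nearest = sorted(paths, key=lambda p: jaccard_distance(p, history_words))[:keep]

    # Stage 2: rank survivors by adjacent-word scores (stand-in for a CRF).
    def context_score(path):
        return sum(bigram_score.get(pair, 0.0) for pair in zip(path, path[1:]))

    return max(nearest, key=context_score)

# Hypothetical conversation history and candidate paths from the lattice.
history = ["book a flight to paris", "which hotel is near the airport"]
paths = [
    ["the", "flight", "leaves", "at", "noon"],
    ["the", "fright", "leaves", "at", "noon"],
    ["a", "light", "sleeves", "at", "noon"],
]
bigram_score = {
    ("the", "flight"): 1.0,
    ("flight", "leaves"): 0.8,
    ("leaves", "at"): 0.5,
}
best = select_phrase(paths, history, bigram_score)
```

A production system would replace both stand-ins, for example with embedding-based distances and a trained sequence model; the sketch only shows the order of the two stages.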

Current Assignee
Intel Corporation
Original Assignee
Intel Corporation
Inventors
Wasserblat, Moshe; Pereg, Oren; Assayag, Michel; Sivak, Alexander; Taite, Shahar; Rider, Tomer
Primary Examiner(s)
Sirjani, Fariba

Application Number

US14/864,456
Publication Number

US 20170092266A1
Time in Patent Office

831 Days
Field of Search

None
US Class Current
CPC Class Codes

G10L 15/063   Training

G10L 15/10   using distance or distortio...

G10L 15/14   using statistical models, e...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/183   using context dependencies,...

G10L 15/32   Multiple recognisers used i...

G10L 2015/0631   Creating reference template...

G10L 2015/085   Methods for reducing search...
