DYNAMIC ADAPTATION OF LANGUAGE MODELS AND SEMANTIC TRACKING FOR AUTOMATIC SPEECH RECOGNITION

US 20170092266A1
Filed: 09/24/2015
Published: 03/30/2017
Est. Priority Date: 09/24/2015
Status: Active Grant

Abstract

Generally, this disclosure provides systems, devices, methods and computer readable media for adaptation of language models and semantic tracking to improve automatic speech recognition (ASR). A system for recognizing phrases of speech from a conversation may include an ASR circuit configured to transcribe a user's speech to a first estimated text sequence, based on a generalized language model. The system may also include a language model matching circuit configured to analyze the first estimated text sequence to determine a context and to select a personalized language model (PLM), from a plurality of PLMs, based on that context. The ASR circuit may further be configured to re-transcribe the speech based on the selected PLM to generate a lattice of paths of estimated text sequences, wherein each of the paths of estimated text sequences comprise one or more words and an acoustic score associated with each of the words.
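
The PLM selection step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the context of the first estimated text sequence is a bag of words and that each PLM is summarized by word counts; the names `determine_context`, `select_plm` and the example PLMs are hypothetical and do not come from the patent, which does not specify how the context is matched.

```python
from collections import Counter


def determine_context(text: str) -> Counter:
    """Represent the context of a first-pass transcript as a bag of lowercase words."""
    return Counter(text.lower().split())


def select_plm(first_pass_text: str, plms: dict[str, Counter]) -> str:
    """Return the name of the PLM whose word counts overlap most with the context.

    Assumption: overlap of word counts stands in for the context matching
    performed by the language model matching circuit.
    """
    context = determine_context(first_pass_text)

    def overlap(plm_counts: Counter) -> int:
        # A Counter returns 0 for missing words, so unseen words add nothing.
        return sum(min(count, plm_counts[word]) for word, count in context.items())

    return max(plms, key=lambda name: overlap(plms[name]))


# Hypothetical PLMs, each summarized by the words typical of its domain.
plms = {
    "travel": Counter({"flight": 5, "hotel": 4, "booking": 3}),
    "finance": Counter({"invoice": 5, "budget": 4, "quarter": 3}),
}
selected = select_plm("please check the hotel booking for my flight", plms)
```

In a full system the selected PLM would then be handed back to the ASR stage for the second transcription pass; that stage is outside the scope of this sketch.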

27 Claims

1. A system for recognizing phrases of speech from a conversation, said system comprising:
- an automatic speech recognition (ASR) circuit to transcribe speech, of a user of said system, to a first estimated text sequence, based on a generalized language model;
  
  a language model matching circuit to analyze said first estimated text sequence to determine a context and to select a personalized language model (PLM), from a plurality of PLMs, based on said context; and
  
  said ASR circuit further to re-transcribe said speech based on said selected PLM to generate a lattice of paths of estimated text sequences, wherein each of said paths of estimated text sequences comprise one or more words and an acoustic score associated with each of said words.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The system of claim 1, further comprising a lattice pruning circuit to remove a subset of said paths of estimated text sequences based on a comparison of said acoustic scores to a threshold value.
  - 3. The system of claim 1, further comprising a semantic distance calculation circuit to estimate a semantic distance between each of said paths of estimated text sequences to one or more previously recognized phrases of speech from said conversation.
  - 4. The system of claim 3, further comprising a conditional random field (CRF) classifier circuit to rank each of said paths of estimated text sequences based on contextual relationships between said words in said paths.
  - 5. The system of claim 4, further comprising a semantic analysis circuit to select one of said paths of estimated text sequences, from said lattice, as a currently recognized phrase of speech from said conversation, based on said semantic distance and said CRF ranking.
  - 6. The system of claim 1, further comprising a PLM generation circuit to analyze textual information sources associated with said user;
    - to organize said textual information from said sources into clusters based on a measurement of content similarity between said sources, to generate domains of knowledge based on said clusters; and
      
      to map said domains into said plurality of PLMs.
  - 7. The system of claim 6, wherein said textual information sources comprise electronic documents, emails, text messages or social media communications.
  - 8. The system of claim 6, wherein said PLM generation circuit operates in an offline mode prior to execution of said ASR circuit.
  - 9. The system of claim 5, further comprising an insight extraction circuit to analyze said current and previously recognized phrases of speech from said conversation and to generate a summary of said conversation, extract keywords from said conversation, perform a translation of said conversation or extract action requests taken by said user during said conversation.
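
The lattice pruning of claim 2 and the semantic-distance-based selection of claims 3 and 5 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: a path is pruned when its mean per-word acoustic score falls below the threshold, higher scores are taken to be better, and Jaccard distance over word sets stands in for the semantic distance. The CRF ranking of claim 4 is omitted. All names (`Path`, `prune`, `semantic_distance`, `select_path`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Path:
    words: list[str]
    acoustic_scores: list[float]  # one score per word; higher is better (assumption)


def prune(lattice: list[Path], threshold: float) -> list[Path]:
    """Keep only paths whose mean acoustic score reaches the threshold."""
    return [
        p for p in lattice
        if sum(p.acoustic_scores) / len(p.acoustic_scores) >= threshold
    ]


def semantic_distance(path: Path, history: list[str]) -> float:
    """Jaccard distance between a path's words and previously recognized phrases.

    Assumption: word-set overlap is a stand-in for a real semantic measure.
    """
    path_words = set(path.words)
    history_words = {w for phrase in history for w in phrase.lower().split()}
    union = path_words | history_words
    if not union:
        return 0.0
    return 1.0 - len(path_words & history_words) / len(union)


def select_path(lattice: list[Path], history: list[str], threshold: float) -> Optional[Path]:
    """Prune the lattice, then pick the surviving path closest to the history."""
    survivors = prune(lattice, threshold)
    if not survivors:
        return None
    return min(survivors, key=lambda p: semantic_distance(p, history))


lattice = [
    Path(["book", "a", "flight"], [0.9, 0.8, 0.9]),
    Path(["look", "a", "fright"], [0.4, 0.8, 0.3]),
    Path(["cook", "a", "flight"], [0.7, 0.8, 0.9]),
]
best = select_path(lattice, history=["i need to book a hotel"], threshold=0.6)
```

Here the second path is removed by pruning, and the first path is preferred over the third because it shares more words with the earlier phrase.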

10. A method for recognizing phrases of speech from a conversation, said method comprising:
- transcribing speech, of a participant in said conversation, to a first estimated text sequence, by an automatic speech recognition (ASR) circuit, said transcription based on a generalized language model;
  
  analyzing said first estimated text sequence to determine a context;
  
  selecting a personalized language model (PLM), from a plurality of PLMs, based on said context; and
  
  re-transcribing said speech, by said ASR circuit, based on said selected PLM, to generate a lattice of paths of estimated text sequences, wherein each of said paths of estimated text sequences comprise one or more words and an acoustic score associated with each of said words.
- Dependent claims (11, 12, 13, 14, 15, 16, 17, 18):
  - 11. The method of claim 10, further comprising removing a subset of said paths of estimated text sequences based on a comparison of said acoustic scores to a threshold value.
  - 12. The method of claim 10, further comprising estimating a semantic distance between each of said paths of estimated text sequences to one or more previously recognized phrases of speech from said conversation.
  - 13. The method of claim 12, further comprising determining contextual relationships between said words in said paths and ranking each of said paths of estimated text sequences based on said contextual relationships.
  - 14. The method of claim 13, further comprising selecting one of said paths of estimated text sequences, from said lattice, as a currently recognized phrase of speech from said conversation, based on said semantic distance and said ranking.
  - 15. The method of claim 10, further comprising generating said PLMs by:
    - analyzing textual information sources associated with said participant;
      
      organizing said textual information from said sources into clusters based on a measurement of content similarity between said sources;
      
      generating domains of knowledge based on said clusters; and
      
      mapping said domains into said plurality of PLMs.
  - 16. The method of claim 15, wherein said textual information sources comprise electronic documents, emails, text messages or social media communications.
  - 17. The method of claim 15, wherein said PLM generation is performed in an offline mode prior to execution of said ASR circuit.
  - 18. The method of claim 14, further comprising analyzing said current and previously recognized phrases of speech from said conversation to generate a summary of said conversation, extract keywords from said conversation, perform a translation of said conversation or extract action requests taken by said participant during said conversation.
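
The PLM generation steps of claim 15 (analyze sources, cluster by content similarity, derive domains, map domains into PLMs) can be sketched as below. The sketch assumes cosine similarity over bag-of-words counts as the similarity measurement, a greedy single-pass clustering, and a unigram probability table as the PLM; none of these choices is stated in the claims, and the function names and the `min_similarity` parameter are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase whitespace-separated words in a textual source."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of words."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_sources(sources: list[str], min_similarity: float) -> list[Counter]:
    """Greedy clustering: merge each source into the first similar-enough cluster."""
    clusters: list[Counter] = []
    for text in sources:
        bag = bag_of_words(text)
        for cluster in clusters:
            if cosine(bag, cluster) >= min_similarity:
                cluster.update(bag)  # the cluster accumulates the counts of its members
                break
        else:
            clusters.append(Counter(bag))
    return clusters


def build_plms(sources: list[str], min_similarity: float = 0.3) -> list[dict[str, float]]:
    """Treat each cluster as a domain and map it to a unigram probability table."""
    plms = []
    for cluster in cluster_sources(sources, min_similarity):
        total = sum(cluster.values())
        plms.append({word: count / total for word, count in cluster.items()})
    return plms


sources = [
    "flight booking confirmed",
    "hotel booking confirmed",
    "quarterly budget review",
    "budget review meeting",
]
plms = build_plms(sources)
```

With these four sources the sketch yields two domains, one about travel bookings and one about budget reviews. Because this step depends only on stored text, it fits the offline mode described in claim 17.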

19. At least one computer-readable storage medium having instructions stored thereon which when executed by a processor result in the following operations for recognizing phrases of speech from a conversation, said operations comprising:
- transcribing speech, of a participant in said conversation, to a first estimated text sequence, by an automatic speech recognition (ASR) circuit, said transcription based on a generalized language model;
  
  analyzing said first estimated text sequence to determine a context;
  
  selecting a personalized language model (PLM), from a plurality of PLMs, based on said context; and
  
  re-transcribing said speech, by said ASR circuit, based on said selected PLM, to generate a lattice of paths of estimated text sequences, wherein each of said paths of estimated text sequences comprise one or more words and an acoustic score associated with each of said words.
- Dependent claims (20, 21, 22, 23, 24, 25, 26, 27):
  - 20. The computer-readable storage medium of claim 19, further comprising removing a subset of said paths of estimated text sequences based on a comparison of said acoustic scores to a threshold value.
  - 21. The computer-readable storage medium of claim 19, further comprising estimating a semantic distance between each of said paths of estimated text sequences to one or more previously recognized phrases of speech from said conversation.
  - 22. The computer-readable storage medium of claim 21, further comprising determining contextual relationships between said words in said paths and ranking each of said paths of estimated text sequences based on said contextual relationships.
  - 23. The computer-readable storage medium of claim 22, further comprising selecting one of said paths of estimated text sequences, from said lattice, as a currently recognized phrase of speech from said conversation, based on said semantic distance and said ranking.
  - 24. The computer-readable storage medium of claim 19, further comprising generating said PLMs by:
    - analyzing textual information sources associated with said participant;
      
      organizing said textual information from said sources into clusters based on a measurement of content similarity between said sources;
      
      generating domains of knowledge based on said clusters; and
      
      mapping said domains into said plurality of PLMs.
  - 25. The computer-readable storage medium of claim 24, wherein said textual information sources comprise electronic documents, emails, text messages or social media communications.
  - 26. The computer-readable storage medium of claim 24, wherein said PLM generation is performed in an offline mode prior to execution of said ASR circuit.
  - 27. The computer-readable storage medium of claim 23, further comprising analyzing said current and previously recognized phrases of speech from said conversation to generate a summary of said conversation, extract keywords from said conversation, perform a translation of said conversation or extract action requests taken by said participant during said conversation.
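
Of the insight extraction operations listed in claims 9, 18 and 27, keyword extraction is the simplest to illustrate. The sketch below assumes frequency counting over the recognized phrases with a small stopword list; the claims do not say how keywords are chosen, and `STOPWORDS` and `extract_keywords` are hypothetical names.

```python
from collections import Counter

# Assumption: a tiny illustrative stopword list, not an exhaustive one.
STOPWORDS = frozenset(
    {"a", "an", "the", "to", "of", "and", "for", "i", "we", "is", "on", "in", "please"}
)


def extract_keywords(phrases: list[str], top_n: int = 3) -> list[str]:
    """Return the most frequent non-stopword words across the recognized phrases."""
    counts = Counter(
        word
        for phrase in phrases
        for word in phrase.lower().split()
        if word not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(top_n)]


phrases = [
    "please book the flight to boston",
    "the flight leaves on monday",
    "book a hotel in boston",
]
keywords = extract_keywords(phrases)
```

The input here plays the role of the current and previously recognized phrases of the conversation.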

Current Assignee
Intel Corporation
Original Assignee
Intel Corporation
Inventors
WASSERBLAT, MOSHE; PEREG, OREN; ASSAYAG, MICHEL; SIVAK, ALEXANDER; TAITE, SHAHAR; RIDER, TOMER

Granted Patent

US 9,858,923 B2
US Class Current

1/1
CPC Class Codes

G10L 15/063   Training

G10L 15/10   using distance or distortio...

G10L 15/14   using statistical models, e...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/183   using context dependencies,...

G10L 15/32   Multiple recognisers used i...

G10L 2015/0631   Creating reference template...

G10L 2015/085   Methods for reducing search...
