Dynamic adaptation of language models and semantic tracking for automatic speech recognition
First Claim
1. A system for recognizing phrases of speech from a conversation, said system comprising:
an information gathering circuit to collect textual information associated with a user using general knowledge sources in combination with user textual information sources including electronic documents, emails, text messages and social media communications;
a text clustering circuit to analyze said collected textual information and organize that information into clusters;
a knowledge domain generation circuit to generate domains of knowledge based upon each cluster and to map those domains to a plurality of personalized language models (PLM) in an offline mode prior to automatic speech recognition (ASR) operation to generate said plurality of PLMs;
a first ASR circuit to initially transcribe speech, of a user of said system, to a first estimated text sequence, based on a generalized language model;
a language model matching circuit to analyze said first estimated text sequence to determine a context and to select a PLM, from said plurality of PLMs, based on said context;
a second ASR circuit to re-transcribe said speech based on said selected PLM to generate a lattice of paths of estimated text sequences, wherein each of said paths of estimated text sequences comprises one or more words and an acoustic score associated with each of said words; and
a semantic analysis circuit to select one of said paths of estimated text sequences, from said lattice, as a currently recognized phrase of speech from said conversation, wherein the semantic analysis circuit includes:
a semantic distance calculation circuit to estimate a semantic distance between each of said paths of estimated text sequences to one or more previously recognized phrases of speech from said conversation, and
a conditional random field (CRF) classifier circuit to rank each of said paths of estimated text sequences based on contextual relationships between said words in said paths, and
wherein the semantic analysis circuit selects one of said paths of estimated text sequences, from said lattice, as a currently recognized phrase of speech from said conversation, based on said semantic distance and subsequent said CRF ranking.
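Claim 1 orders the semantic analysis in two stages: a semantic distance from each lattice path to earlier phrases, then a CRF ranking. The claim does not define either measure, so the following is only a minimal sketch of that ordering under stated assumptions: bag-of-words cosine distance stands in for semantic distance, the minimum distance over the conversation history is used as the aggregate, and `rank_fn` is a pluggable placeholder where a trained CRF scorer would go. The example ranker `acoustic_total` is NOT a CRF, and all names (`select_path`, `semantic_distance`, `keep`) are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A lattice path is modeled here as a list of (word, acoustic_score) pairs.
Path = List[Tuple[str, float]]


def bag_of_words(words: Sequence[str]) -> Counter:
    """Lower-cased term counts; a crude stand-in for a semantic representation."""
    return Counter(w.lower() for w in words)


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity; returns 1.0 when either vector is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def semantic_distance(path: Path, history: Sequence[str]) -> float:
    """Smallest distance from the path's words to any previously recognized phrase."""
    if not history:
        return 0.0  # no history yet: treat every path as equally close
    vec = bag_of_words([w for w, _ in path])
    return min(cosine_distance(vec, bag_of_words(p.split())) for p in history)


def select_path(lattice: Sequence[Path], history: Sequence[str],
                rank_fn: Callable[[Path], float], keep: int = 3) -> Path:
    """Shortlist the `keep` semantically closest paths, then pick the best by rank_fn."""
    shortlist = sorted(lattice, key=lambda p: semantic_distance(p, history))[:keep]
    return max(shortlist, key=rank_fn)


def acoustic_total(path: Path) -> float:
    """Placeholder ranker (sum of acoustic scores); a real system would use a CRF here."""
    return sum(score for _, score in path)
```

With the history "book a flight to boston", `select_path(lattice, history, acoustic_total, keep=1)` returns the "flight to boston" path even if a "fight two bostin" path has the higher acoustic total; widening the shortlist to `keep=2` lets the placeholder ranker pick the acoustically stronger but semantically unrelated path, which is why the distance filter runs first.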
1 Assignment
0 Petitions
Abstract
Generally, this disclosure provides systems, devices, methods and computer readable media for adaptation of language models and semantic tracking to improve automatic speech recognition (ASR). A system for recognizing phrases of speech from a conversation may include an ASR circuit configured to transcribe a user's speech to a first estimated text sequence, based on a generalized language model. The system may also include a language model matching circuit configured to analyze the first estimated text sequence to determine a context and to select a personalized language model (PLM), from a plurality of PLMs, based on that context. The ASR circuit may further be configured to re-transcribe the speech based on the selected PLM to generate a lattice of paths of estimated text sequences, wherein each of the paths of estimated text sequences comprises one or more words and an acoustic score associated with each of the words.
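The two-pass flow above hinges on choosing a PLM from the first-pass transcript, but the abstract does not say how the context is matched. As one illustrative assumption, the sketch below scores the first-pass text under each candidate model and keeps the most likely one. The models are toy add-one-smoothed unigram models; `UnigramLM` and `select_plm` are hypothetical names, not the patent's components.

```python
import math
from collections import Counter
from typing import Sequence


class UnigramLM:
    """Add-one-smoothed unigram model; a toy stand-in for a personalized language model."""

    def __init__(self, name: str, training_text: Sequence[str]) -> None:
        self.name = name
        self.counts = Counter(w.lower() for line in training_text for w in line.split())
        self.total = sum(self.counts.values())

    def log_prob(self, words: Sequence[str], vocab_size: int) -> float:
        # Laplace smoothing gives unseen words a small non-zero probability.
        return sum(math.log((self.counts[w.lower()] + 1) / (self.total + vocab_size))
                   for w in words)


def select_plm(first_pass_text: str, plms: Sequence[UnigramLM]) -> UnigramLM:
    """Pick the PLM under which the first-pass transcript is most likely."""
    words = first_pass_text.split()
    # A shared vocabulary size keeps the smoothed scores comparable across models.
    vocab = {w.lower() for w in words}
    for lm in plms:
        vocab.update(lm.counts)
    return max(plms, key=lambda lm: lm.log_prob(words, len(vocab)))
```

For example, with a "travel" model trained on "book a flight to boston" and a "cooking" model trained on "preheat the oven", a first pass of "flight to boston" selects the travel model, whose vocabulary would then guide the second, re-transcribing pass.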
67 Citations
15 Claims
6. A method for recognizing phrases of speech from a conversation, said method comprising:
collecting, by an information gathering circuit, textual information associated with a participant using general knowledge sources in combination with participant textual information sources including electronic documents, emails, text messages and social media communications;
analyzing said collected textual information by a text clustering circuit;
organizing said textual information into clusters by said text clustering circuit;
generating, by a knowledge domain generation circuit, domains of knowledge based upon said clusters;
mapping, by said knowledge domain generation circuit, said knowledge domains to a plurality of personalized language models (PLMs) in an offline mode prior to automatic speech recognition (ASR) operation to generate said plurality of PLMs;
transcribing speech, of a participant in said conversation, to a first estimated text sequence, by a first ASR circuit, said transcription based on a generalized language model;
analyzing said first estimated text sequence to determine a context;
selecting a PLM, from said plurality of PLMs, based on said context;
re-transcribing said speech, by said ASR circuit, based on said selected PLM, to generate a lattice of paths of estimated text sequences, wherein each of said paths of estimated text sequences comprises one or more words and an acoustic score associated with each of said words;
estimating a semantic distance between each of said paths of estimated text sequences to one or more previously recognized phrases of speech from said conversation;
ranking each of said paths of estimated text sequences based on determining contextual relationships between said words in said paths; and
selecting one of said paths of estimated text sequences, from said lattice, as a currently recognized phrase of speech from said conversation, based on said semantic distance and subsequent said ranking.
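Claim 6 recites an offline stage (collect text, cluster it, derive knowledge domains, map them to PLMs) without naming a clustering algorithm. As a swapped-in technique chosen only for illustration, the sketch below runs spherical k-means over bag-of-words vectors, initialized deterministically from the first k documents, and then pools each cluster's word counts as the raw material for one PLM. `cluster_documents` and `build_domains` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Vec = Dict[str, float]


def normalize(vec: Vec) -> Vec:
    """Scale a sparse vector to unit length (empty vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)


def dot(a: Vec, b: Vec) -> float:
    """Sparse dot product, iterating over the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def cluster_documents(docs: Sequence[str], k: int, iters: int = 10) -> List[int]:
    """Spherical k-means over bag-of-words vectors; returns a cluster id per document."""
    vecs = [normalize(dict(Counter(d.lower().split()))) for d in docs]
    centroids = [dict(v) for v in vecs[:k]]  # assumption: init from the first k documents
    labels = [0] * len(vecs)
    for _ in range(iters):
        # Assignment step: nearest centroid by cosine similarity (vectors are unit length).
        new_labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vecs]
        # Update step: each centroid becomes the normalized sum of its members.
        for c in range(k):
            acc: Vec = {}
            for v, lab in zip(vecs, new_labels):
                if lab == c:
                    for t, x in v.items():
                        acc[t] = acc.get(t, 0.0) + x
            if acc:
                centroids[c] = normalize(acc)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels


def build_domains(docs: Sequence[str], labels: Sequence[int]) -> Dict[int, Counter]:
    """Pool word counts per cluster; each pool could train one personalized LM."""
    domains: Dict[int, Counter] = {}
    for doc, lab in zip(docs, labels):
        domains.setdefault(lab, Counter()).update(doc.lower().split())
    return domains
```

Because initialization uses the first k documents, the order of the input matters in this sketch; a production system would more plausibly use seeded or k-means++ initialization and richer text features.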
11. At least one non-transitory computer-readable storage medium having instructions stored thereon which when executed by a processor result in the following operations for recognizing phrases of speech, said operations comprising:
collecting, by an information gathering circuit, textual information associated with a participant using general knowledge sources in combination with participant textual information sources including electronic documents, emails, text messages and social media communications;
analyzing said collected textual information by a text clustering circuit;
organizing, by said text clustering circuit, said textual information into clusters;
generating, by a knowledge domain generation circuit, domains of knowledge based upon said clusters;
mapping, by said knowledge domain generation circuit, said knowledge domains to a plurality of personalized language models (PLMs) in an offline mode prior to automatic speech recognition (ASR) operation to generate said plurality of PLMs;
transcribing speech, of a participant in said conversation, to a first estimated text sequence, by a first ASR circuit, said transcription based on a generalized language model;
analyzing said first estimated text sequence to determine a context;
selecting a PLM, from said plurality of PLMs, based on said context;
re-transcribing said speech, by a second ASR circuit, based on said selected PLM, to generate a lattice of paths of estimated text sequences, wherein each of said paths of estimated text sequences comprises one or more words and an acoustic score associated with each of said words;
estimating a semantic distance between each of said paths of estimated text sequences to one or more previously recognized phrases of speech from said conversation;
ranking each of said paths of estimated text sequences based on determining contextual relationships between said words in said paths; and
selecting one of said paths of estimated text sequences, from said lattice, as a currently recognized phrase of speech from said conversation, based on said semantic distance and subsequent said ranking.
Specification