Dialogue System Incorporating Unique Speech to Text Conversion Method for Meaningful Dialogue Response
Abstract
A real-time dialogue system is provided that transcribes spoken text with a sub-second delay by keeping track of word timings and word accuracy. The system uses a grammar or a list of keywords, together with a statistical language model, to produce the transcripts. In addition, the system uses a deep-neural-network-based i-vector system to continuously analyze the audio, assess its quality, and identify additional metadata such as the gender, language, accent, age, emotion, and identity of an end user, in order to enhance the response. The conversational dialogue system robustly identifies specific user commands or intents while otherwise allowing natural conversation, without switching between grammar-based and natural-language modes.
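The combination of a grammar (or keyword list) with a statistical language model can be illustrated with a minimal sketch. It is not the patented implementation: the training sentences, the `GRAMMAR` table, the intent labels, and all function names are hypothetical, and an add-one-smoothed bigram model stands in for whatever language model a real system would use. The semantic engine here picks the most probable hypothesis from the speech-to-text output, then checks it against the grammar; if no grammar phrase matches, the utterance is treated as ordinary conversation.

```python
import math
from collections import Counter

# Hypothetical toy data; none of these values come from the patent.
TRAINING_SENTENCES = [
    "turn on the light",
    "turn off the light",
    "what is the weather today",
]
GRAMMAR = {  # keyword phrase -> intent label (assumed names)
    "turn on": "DEVICE_ON",
    "turn off": "DEVICE_OFF",
}

def train_bigram_model(sentences):
    """Count history words and bigrams for an add-one-smoothed bigram model."""
    history, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(words)
        history.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return history, bigrams, len(vocab)

def log_prob(sentence, model):
    """Log-probability of a sentence under the smoothed bigram model."""
    history, bigrams, vocab_size = model
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (history[prev] + vocab_size))
        for prev, cur in zip(words, words[1:])
    )

def semantic_engine(hypotheses, model):
    """Pick the most probable hypothesis, then look for a grammar phrase in it."""
    transcript = max(hypotheses, key=lambda h: log_prob(h, model))
    padded = f" {transcript} "  # pad so phrases match on word boundaries only
    intent = next(
        (label for phrase, label in GRAMMAR.items() if f" {phrase} " in padded),
        None,  # no command found: treat as natural conversation
    )
    return transcript, intent

MODEL = train_bigram_model(TRAINING_SENTENCES)
```

Calling `semantic_engine(["torn on the light", "turn on the light"], MODEL)` selects the second hypothesis and tags it with the assumed `DEVICE_ON` intent, while an utterance that matches no grammar phrase comes back with an intent of `None`. Both paths go through the same function, which mirrors the idea of not switching between grammar-based and natural-language modes.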
19 Claims
1. A system for providing real-time transcripts of spoken text, the system comprising:
a speech to text engine for converting an input speech of an end user into a text input, wherein the text input comprises one or more sequences of recognized word strings or a word lattice in text form; and
a semantic engine to receive the text input for producing one or more transcripts using a language model and extracting semantic meanings for said one or more transcripts;
wherein the semantic engine utilizes a grammar model and the language model to extract a meaning for said one or more transcripts.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A method for providing real-time transcripts of spoken text, the method comprising:
converting, by a speech to text engine, an input speech of an end user into a text input, the text input comprises one or more sequence of recognized word strings and confusions in text form; and
receiving, by a semantic engine, the text input for producing one or more transcripts using a language model and extracting semantic meanings for said one or more transcript;
wherein the semantic engine utilizes a grammar model and the language model to extract meaning for said one or more transcripts.
Dependent claims: 15, 16, 17, 18, 19
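One way to picture a text input made of "recognized word strings and confusions" is a confusion network: a sequence of slots, each holding alternative words with posterior probabilities. The sketch below assumes that representation, which the claim itself does not specify; the slot contents, probabilities, and the `n_best` helper are all hypothetical. It expands the slots into candidate transcripts and ranks them by the product of their word posteriors.

```python
import heapq
import itertools
import math

def n_best(confusion_slots, n=3):
    """Return the n highest-scoring (score, transcript) pairs.

    confusion_slots is a list of slots; each slot is a list of
    (word, posterior) alternatives for one position in the utterance.
    """
    paths = []
    for combo in itertools.product(*confusion_slots):
        score = math.prod(p for _, p in combo)      # joint posterior of the path
        transcript = " ".join(w for w, _ in combo)  # words in slot order
        paths.append((score, transcript))
    return heapq.nlargest(n, paths)

# Hypothetical recognizer output for one utterance.
SLOTS = [
    [("turn", 0.9), ("torn", 0.1)],
    [("on", 1.0)],
    [("the", 1.0)],
    [("light", 0.7), ("lied", 0.3)],
]
```

With these made-up posteriors, `n_best(SLOTS, 2)` ranks "turn on the light" first and "turn on the lied" second. Exhaustive expansion grows exponentially with the number of slots, so a practical system would more likely use a beam search or rescore the paths with a language model; the brute-force loop is kept here only for clarity.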
Specification