SYSTEM AND METHOD FOR AUTOMATIC SPEECH TO TEXT CONVERSION
Abstract
Speech recognition is performed in near-real-time and improved by exploiting events and event sequences, by employing machine learning techniques including boosted classifiers, ensembles, detectors, and cascades, and by using perceptual clusters. Speech recognition is also improved using tandem processing. An automatic punctuator injects punctuation into recognized text streams.
156 Citations
20 Claims
1. A speech recognition engine comprising:

an acoustical analyzer for receiving and digitizing a speech encoding signal;
an event extractor for extracting events from said speech signal, wherein said events comprise patterns in said speech signal which are highly relevant in speech recognition; and
a speech recognition module coupled to said event extractor, wherein said speech recognition module uses said events to initiate at least one action in response to detected content.

(Dependent claims: 2-14)
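The three claimed elements form a pipeline: digitize the signal, extract events, act on detected content. As a minimal illustration only (the patent specifies no implementation; every class name and the amplitude-threshold "event" below are hypothetical), the arrangement can be sketched in Python:

```python
# Illustrative sketch of the claim-1 pipeline; all names are hypothetical.

class AcousticalAnalyzer:
    """Receives a speech encoding signal and digitizes it."""
    def digitize(self, signal):
        # Stand-in for an ADC: quantize floats in [-1, 1] to 16-bit ints.
        return [max(-32768, min(32767, int(x * 32767))) for x in signal]

class EventExtractor:
    """Extracts 'events': patterns in the signal relevant to recognition."""
    def extract(self, samples, threshold=8000):
        # Toy event: sample indices whose amplitude crosses a threshold.
        return [i for i, s in enumerate(samples) if abs(s) > threshold]

class SpeechRecognitionModule:
    """Uses extracted events to initiate an action on detected content."""
    def recognize(self, events):
        return "content-detected" if events else "silence"

analyzer, extractor, recognizer = AcousticalAnalyzer(), EventExtractor(), SpeechRecognitionModule()
samples = analyzer.digitize([0.0, 0.9, -0.8, 0.1])
events = extractor.extract(samples)
action = recognizer.recognize(events)
```

A real event extractor would detect perceptually salient patterns (bursts, voicing onsets, formant transitions) rather than a bare amplitude threshold.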
15. A method of speech recognition comprising:

training weak classifiers based on training examples;
constructing an ensemble of weak detectors;
receiving a speech signal;
digitizing said speech signal;
processing said speech signal using said ensemble of weak detectors, thereby recognizing the presence of at least one event, wherein said event comprises a pattern in said speech signal which is highly relevant in speech recognition; and
processing said events to recognize speech.

(Dependent claims: 16-18)
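The training and detection steps of claim 15 resemble boosting. As a hypothetical sketch only (AdaBoost-style weak threshold detectors over a toy scalar feature such as frame energy; the patent does not disclose this particular algorithm):

```python
import math

# Hypothetical sketch: train weak threshold classifiers on labeled
# examples, combine them into a weighted ensemble of detectors, then
# use the ensemble's vote to flag an event in a feature value.

def train_ensemble(examples, labels, rounds=3):
    """Each weak detector is a threshold test (feature >= thr -> +1)."""
    n = len(examples)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Pick the threshold with the lowest weighted error.
        best = None
        for thr in sorted(set(examples)):
            err = sum(w for x, y, w in zip(examples, labels, weights)
                      if (1 if x >= thr else -1) != y)
            if best is None or err < best[1]:
                best = (thr, err)
        thr, err = best
        err = max(err, 1e-10)                      # avoid log(0) on a perfect round
        alpha = 0.5 * math.log((1 - err) / err)    # detector's vote weight
        ensemble.append((thr, alpha))
        # Reweight: emphasize examples this detector got wrong.
        weights = [w * math.exp(-alpha * y * (1 if x >= thr else -1))
                   for x, y, w in zip(examples, labels, weights)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return ensemble

def detect_event(ensemble, feature):
    """Weighted vote of the weak detectors: True = event present."""
    score = sum(alpha * (1 if feature >= thr else -1) for thr, alpha in ensemble)
    return score > 0

# Toy features labeled +1 = event, -1 = background.
ens = train_ensemble([0.1, 0.2, 0.7, 0.9], [-1, -1, 1, 1])
hit = detect_event(ens, 0.8)   # high-energy frame, classified by the vote
```

In the claimed method, such an ensemble would run over frames of the digitized speech signal, and the detected events would then feed the downstream recognition step.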
19. A method for operating two or more speech recognition systems in tandem, wherein said two or more speech recognition systems perform detection and analysis of a speech signal at overlapping time intervals, said method comprising:

configuring the time intervals to be used in each speech recognition engine, wherein said intervals are reconfigurable;
configuring the overlap of the intervals, wherein the overlap is reconfigurable, and wherein the overlap is set to reflect the most information-rich portions of said speech signal;
routing said detection and analysis between said speech recognition engines;
weighting the results of said speech recognition engines, wherein a higher weight is given to results taken from the middle of the interval, and generating at least two opinions as to the identity of words within a single time interval; and
determining which opinion from the at least two opinions better estimates a textual representation of said speech signal.
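The middle-of-interval weighting and opinion selection of claim 19 can be illustrated with a toy sketch (all numbers, names, and the triangular weighting function below are hypothetical assumptions, not the patented implementation):

```python
# Hypothetical sketch of claim 19: two engines analyze overlapping time
# intervals; each per-word "opinion" is weighted higher the closer the
# word sits to the middle of that engine's interval, and the opinion
# with the highest weighted confidence wins.

def middle_weight(position, interval_start, interval_end):
    """Triangular weight: peaks at the interval midpoint, zero at the edges."""
    mid = (interval_start + interval_end) / 2
    half = (interval_end - interval_start) / 2
    return max(0.0, 1.0 - abs(position - mid) / half)

def tandem_decide(opinions):
    """opinions: (word, word_time, interval_start, interval_end, confidence)."""
    def score(op):
        word, t, start, end, conf = op
        return conf * middle_weight(t, start, end)
    return max(opinions, key=score)[0]

# Engine A's interval is 0-2 s; engine B's overlapping interval is 1-3 s.
# Both emit an opinion about the word spoken around t = 1.9 s.
opinion_a = ("ship",  1.9, 0.0, 2.0, 0.80)   # t = 1.9 is near A's interval edge
opinion_b = ("sheep", 1.9, 1.0, 3.0, 0.80)   # t = 1.9 is near B's interval middle
word = tandem_decide([opinion_a, opinion_b])
```

Because the word falls near the middle of engine B's interval but at the edge of engine A's, B's opinion receives the higher weight and is selected.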
20. A speech recognition engine comprising:

an acoustic analyzer for receiving and digitizing a speech signal in the form of a digital utterance;
a speech recognition module coupled to said acoustic analyzer, wherein said speech recognition module converts said digital utterance into at least one text stream;
an automatic punctuation engine coupled with a database containing training data, wherein said automatic punctuation engine includes at least one statistical processor for adding punctuation to said text stream using said training data, in the form of statistical-based punctuated text;
a rule-based punctuator coupled with a lexical rule database, wherein said rule-based punctuator adds punctuation to said text stream using rules from said lexical rule database, in the form of rule-based punctuated text; and
a decision module for determining whether said rule-based punctuated text or said statistical-based punctuated text produces a better punctuated result.
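The two punctuators and the decision module of claim 20 can be sketched as follows; the statistical model, the lexical rules, and the decision module's scoring are all toy stand-ins, since the patent does not specify them:

```python
# Hypothetical sketch of claim 20: a statistical punctuator and a
# rule-based punctuator each punctuate the same text stream; a decision
# module scores both candidates and keeps the better result.

RULE_LEXICON = {"however", "therefore"}   # toy lexical rule database

def statistical_punctuate(words, boundary_probs):
    """Insert a period wherever the trained model's boundary probability is high."""
    return " ".join(w + ("." if p > 0.5 else "")
                    for w, p in zip(words, boundary_probs))

def rule_based_punctuate(words):
    """Apply lexical rules: comma after connectives, period at the end."""
    out = [w + ("," if w in RULE_LEXICON else "") for w in words]
    out[-1] += "."
    return " ".join(out)

def decide(candidates):
    """Decision module: prefer the candidate whose sentences are
    reasonably sized (a crude well-formedness proxy)."""
    def score(text):
        sentences = [s for s in text.split(".") if s.strip()]
        return -sum(abs(len(s.split()) - 6) for s in sentences)
    return max(candidates, key=score)

words = ["we", "tried", "it", "however", "it", "failed"]
stat = statistical_punctuate(words, [0, 0, 0.9, 0, 0, 0.9])
rule = rule_based_punctuate(words)
best = decide([stat, rule])
```

In the claimed engine the decision module would compare the two punctuated streams with a learned or heuristic quality measure; the sentence-length proxy here merely makes the selection step concrete.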
Specification