Adaptive self-trained computer engines with associated databases and methods of use thereof
Abstract
In some embodiments, the present invention provides an exemplary computer system which includes at least the following components: an adaptive self-trained computer engine programmed, during a training stage, to electronically receive initial speech audio data generated by a microphone of a computing device; dynamically segment the initial speech audio data and the corresponding initial text into a plurality of user phonemes; dynamically associate a plurality of first timestamps with the plurality of user-specific subject-specific phonemes; and, during a transcription stage, electronically receive to-be-transcribed speech audio data of at least one user; dynamically split the to-be-transcribed speech audio data into a plurality of to-be-transcribed speech audio segments; dynamically assign each timestamped to-be-transcribed speech audio segment to a particular core of a multi-core processor; and dynamically transcribe, in parallel, the plurality of timestamped to-be-transcribed speech audio segments based on a user-specific subject-specific speech training model.
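The transcription stage described in the abstract (split the audio into timestamped segments, assign each segment to a core, transcribe in parallel, reassemble by timestamp) can be sketched as follows. The segment length, the `transcribe_segment` stand-in, and the placeholder text it returns are illustrative assumptions, not the patent's implementation.

```python
# A minimal sketch of the parallel transcription stage; the
# transcribe_segment() stand-in substitutes for the trained
# user-specific speech model.
from multiprocessing import Pool
import os

def split_into_segments(audio, segment_len):
    """Split raw audio samples into fixed-length segments keyed by
    start index, which serves as the segment's time stamp."""
    return [(i, audio[i:i + segment_len])
            for i in range(0, len(audio), segment_len)]

def transcribe_segment(timestamped):
    """Stand-in for a per-segment call into the trained speech model."""
    start, samples = timestamped
    return (start, f"<text {start}..{start + len(samples)}>")

def transcribe_parallel(audio, segment_len=4):
    """Fan segments out across cores, then reassemble by time stamp."""
    segments = split_into_segments(audio, segment_len)
    with Pool(processes=os.cpu_count()) as pool:  # one worker per core
        results = pool.map(transcribe_segment, segments)
    # Sorting by time stamp restores the original order of the speech.
    return [text for _, text in sorted(results)]

if __name__ == "__main__":
    print(transcribe_parallel(list(range(10))))
```

Keying each segment by its start offset is what lets the segments be transcribed out of order on separate cores and still be reassembled into a coherent transcript.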
20 Claims
1. A computer system, comprising:
a processor operable:
to differentiate between multiple speakers in an audio stream of speech based in part on frequencies in the audio stream, wherein said audio stream of speech comprises audible speech of at least a first speaker and a second speaker,
to convert the audio stream into text, and
to generate time stamps in the audio stream to associate the text with the audio stream; and
a machine learning module implemented by one or more processors:
to access pre-learned phonemes,
to identify the first speaker in the audio stream based on the pre-learned phonemes,
to locate a portion of the text associated with the first speaker based on the time stamps,
to segment the text associated with the first speaker into text phonemes,
to correct the text associated with the first speaker in real-time by comparing the text phonemes with phonetically-similar letter pairs of the pre-learned phonemes and applying one or more filters to the text to generate a clean transcript, and
to execute a transaction based on the clean transcript.
Dependent claims: 2, 3, 4, 5, 6, 7.
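The claim's frequency-based speaker differentiation can be illustrated with a toy sketch: frames whose estimated dominant frequency falls below a threshold are attributed to one speaker, the rest to the other. The zero-crossing pitch estimate, the 165 Hz threshold, and the two-speaker assumption are all invented for illustration; the claim itself does not specify a method.

```python
# Toy frequency-based speaker differentiation: a crude pitch estimate
# per frame, then a threshold split into two speakers.
def dominant_frequency(frame, sample_rate):
    """Estimate pitch via zero-crossing rate (a crude stand-in for
    spectral analysis of the audio stream)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a < 0) != (b < 0))
    return crossings * sample_rate / (2 * len(frame))

def label_speakers(frames, sample_rate, threshold_hz=165.0):
    """Label each frame 'speaker1' or 'speaker2'; the frame index acts
    as a coarse time stamp tying the label back to the audio stream."""
    return [(idx,
             "speaker1" if dominant_frequency(f, sample_rate) < threshold_hz
             else "speaker2")
            for idx, f in enumerate(frames)]
```

Production diarization systems cluster learned voiceprints rather than thresholding a single pitch estimate, but the sketch shows why frequency content alone can separate two sufficiently different voices.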
8. A method, comprising:
differentiating between multiple speakers in an audio stream of speech based in part on frequencies in the audio stream, wherein said audio stream of speech comprises audible speech of at least a first speaker and a second speaker;
converting the audio stream into text;
generating time stamps in the audio stream to associate the text with the audio stream;
accessing pre-learned phonemes;
identifying the first speaker in the audio stream based on the pre-learned phonemes;
locating a portion of the text associated with the first speaker based on the time stamps;
segmenting the text associated with the first speaker into text phonemes;
correcting the text associated with the first speaker in real-time by comparing the text phonemes with phonetically-similar letter pairs of the pre-learned phonemes and applying one or more filters to the text to generate a clean transcript; and
executing a transaction based on the clean transcript.
Dependent claims: 9, 10, 11, 12, 13, 14.
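The correcting step, comparing text phonemes against phonetically-similar letter pairs and filtering against known vocabulary, can be sketched as below. The pair table and vocabulary are invented examples, not the patent's pre-learned phoneme inventory.

```python
# Illustrative phonetic correction: out-of-vocabulary words are
# rewritten using phonetically-similar letter pairs, then filtered
# against a known vocabulary.
SIMILAR_PAIRS = [("ph", "f"), ("ck", "k"), ("ee", "ea")]
VOCABULARY = {"fish", "speak", "clean", "transcript"}

def candidates(word):
    """Yield spellings reachable by swapping phonetically-similar pairs."""
    yield word
    for a, b in SIMILAR_PAIRS:
        if a in word:
            yield word.replace(a, b)
        if b in word:
            yield word.replace(b, a)

def correct(text):
    """Replace each out-of-vocabulary word with the first
    phonetically-similar in-vocabulary spelling, if any exists."""
    out = []
    for word in text.split():
        fixed = next((c for c in candidates(word) if c in VOCABULARY), word)
        out.append(fixed)
    return " ".join(out)
```

For example, "phish" maps to "fish" via the ph/f pair, while an in-vocabulary word passes through unchanged; the vocabulary check plays the role of the claim's "one or more filters".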
15. A non-transitory computer readable medium comprising instructions that, when executed by a multi-core processor, direct the multi-core processor to:
differentiate between multiple speakers in an audio stream of speech based in part on frequencies in the audio stream, wherein said audio stream of speech comprises audible speech of at least a first speaker and a second speaker;
convert the audio stream into text;
generate time stamps in the audio stream to associate the text with the audio stream;
access pre-learned phonemes;
identify the first speaker in the audio stream based on the pre-learned phonemes;
locate a portion of the text associated with the first speaker based on the time stamps;
segment the text associated with the first speaker into text phonemes;
correct the text associated with the first speaker in real-time by comparing the text phonemes with phonetically-similar letter pairs of the pre-learned phonemes and applying one or more filters to the text to generate a clean transcript; and
execute a transaction based on the clean transcript.
Dependent claims: 16, 17, 18, 19, 20.
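The locating step, isolating the portion of the text associated with the first speaker using the time stamps, reduces to an interval lookup. The (time, word) and (start, end, speaker) structures below are assumed for illustration; the claims do not prescribe a representation.

```python
# Locating a speaker's portion of the transcript: each word carries a
# time stamp, and speaker turns are (start, end, speaker) intervals.
def locate_speaker_text(words, turns, speaker):
    """Return the words whose time stamps fall inside the given
    speaker's turns."""
    return [w for t, w in words
            if any(start <= t < end
                   for start, end, s in turns if s == speaker)]
```

The time stamps generated alongside the text are what make this lookup possible: without them there is no way to tie a span of transcript back to a span of audio attributed to a particular speaker.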
Specification