Adaptive self-trained computer engines with associated databases and methods of use thereof
Abstract
In some embodiments, the present invention provides an exemplary computer system which includes at least the following components: an adaptive self-trained computer engine programmed, during a training stage, to electronically receive initial speech audio data generated by a microphone of a computing device; dynamically segment the initial speech audio data and the corresponding initial text into a plurality of user phonemes; dynamically associate a plurality of first timestamps with the plurality of user-specific subject-specific phonemes; and, during a transcription stage, electronically receive to-be-transcribed speech audio data of at least one user; dynamically split the to-be-transcribed speech audio data into a plurality of to-be-transcribed speech audio segments; dynamically assign each timestamped to-be-transcribed speech audio segment to a particular core of a multi-core processor; and dynamically transcribe, in parallel, the plurality of timestamped to-be-transcribed speech audio segments based on a user-specific subject-specific speech training model.
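The transcription stage described in the abstract (split the audio into timestamped segments, assign each segment to a core, transcribe in parallel, reassemble by timestamp) can be sketched as follows. The segment length, the `transcribe_segment` stand-in, and the placeholder text it returns are illustrative assumptions, not the patent's implementation.

```python
# A minimal sketch of the parallel transcription stage; the
# transcribe_segment() stand-in substitutes for the trained
# user-specific speech model.
from multiprocessing import Pool
import os

def split_into_segments(audio, segment_len):
    """Split raw audio samples into fixed-length segments keyed by
    start index, which serves as the segment's time stamp."""
    return [(i, audio[i:i + segment_len])
            for i in range(0, len(audio), segment_len)]

def transcribe_segment(timestamped):
    """Stand-in for a per-segment call into the trained speech model."""
    start, samples = timestamped
    return (start, f"<text {start}..{start + len(samples)}>")

def transcribe_parallel(audio, segment_len=4):
    """Fan segments out across cores, then reassemble by time stamp."""
    segments = split_into_segments(audio, segment_len)
    with Pool(processes=os.cpu_count()) as pool:  # one worker per core
        results = pool.map(transcribe_segment, segments)
    # Sorting by time stamp restores the original order of the speech.
    return [text for _, text in sorted(results)]

if __name__ == "__main__":
    print(transcribe_parallel(list(range(10))))
```

Keying each segment by its start offset is what lets the segments be transcribed out of order on separate cores and still be reassembled into a coherent transcript.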
20 Claims
1. A computer system, comprising:
a processor operable:
to differentiate between multiple speakers in an audio stream of speech based in part on frequencies in the audio stream, wherein said audio stream of speech comprises audible speech of at least a first speaker and a second speaker,
to convert the audio stream into text, and
to generate time stamps in the audio stream to associate the text with the audio stream; and
a machine learning module implemented by one or more processors:
to access pre-learned phonemes,
to identify the first speaker in the audio stream based on the pre-learned phonemes,
to locate a portion of the text associated with the first speaker based on the time stamps,
to segment the text associated with the first speaker into text phonemes,
to correct the text associated with the first speaker in real-time by comparing the text phonemes with phonetically-similar letter pairs of the pre-learned phonemes and applying one or more filters to the text to generate a clean transcript, and
to execute a transaction based on the clean transcript.
Dependent claims: 2, 3, 4, 5, 6, 7.
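The claim's frequency-based speaker differentiation can be illustrated with a toy sketch: frames whose estimated dominant frequency falls below a threshold are attributed to one speaker, the rest to the other. The zero-crossing pitch estimate, the 165 Hz threshold, and the two-speaker assumption are all invented for illustration; the claim itself does not specify a method.

```python
# Toy frequency-based speaker differentiation: a crude pitch estimate
# per frame, then a threshold split into two speakers.
def dominant_frequency(frame, sample_rate):
    """Estimate pitch via zero-crossing rate (a crude stand-in for
    spectral analysis of the audio stream)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a < 0) != (b < 0))
    return crossings * sample_rate / (2 * len(frame))

def label_speakers(frames, sample_rate, threshold_hz=165.0):
    """Label each frame 'speaker1' or 'speaker2'; the frame index acts
    as a coarse time stamp tying the label back to the audio stream."""
    return [(idx,
             "speaker1" if dominant_frequency(f, sample_rate) < threshold_hz
             else "speaker2")
            for idx, f in enumerate(frames)]
```

Production diarization systems cluster learned voiceprints rather than thresholding a single pitch estimate, but the sketch shows why frequency content alone can separate two sufficiently different voices.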
8. A method, comprising:
differentiating between multiple speakers in an audio stream of speech based in part on frequencies in the audio stream, wherein said audio stream of speech comprises audible speech of at least a first speaker and a second speaker;
converting the audio stream into text;
generating time stamps in the audio stream to associate the text with the audio stream;
accessing pre-learned phonemes;
identifying the first speaker in the audio stream based on the pre-learned phonemes;
locating a portion of the text associated with the first speaker based on the time stamps;
segmenting the text associated with the first speaker into text phonemes;
correcting the text associated with the first speaker in real-time by comparing the text phonemes with phonetically-similar letter pairs of the pre-learned phonemes and applying one or more filters to the text to generate a clean transcript; and
executing a transaction based on the clean transcript.
Dependent claims: 9, 10, 11, 12, 13, 14.
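The correcting step, comparing text phonemes against phonetically-similar letter pairs and filtering against known vocabulary, can be sketched as below. The pair table and vocabulary are invented examples, not the patent's pre-learned phoneme inventory.

```python
# Illustrative phonetic correction: out-of-vocabulary words are
# rewritten using phonetically-similar letter pairs, then filtered
# against a known vocabulary.
SIMILAR_PAIRS = [("ph", "f"), ("ck", "k"), ("ee", "ea")]
VOCABULARY = {"fish", "speak", "clean", "transcript"}

def candidates(word):
    """Yield spellings reachable by swapping phonetically-similar pairs."""
    yield word
    for a, b in SIMILAR_PAIRS:
        if a in word:
            yield word.replace(a, b)
        if b in word:
            yield word.replace(b, a)

def correct(text):
    """Replace each out-of-vocabulary word with the first
    phonetically-similar in-vocabulary spelling, if any exists."""
    out = []
    for word in text.split():
        fixed = next((c for c in candidates(word) if c in VOCABULARY), word)
        out.append(fixed)
    return " ".join(out)
```

For example, "phish" maps to "fish" via the ph/f pair, while an in-vocabulary word passes through unchanged; the vocabulary check plays the role of the claim's "one or more filters".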
15. A non-transitory computer readable medium comprising instructions that, when executed by a multi-core processor, direct the multi-core processor to:
differentiate between multiple speakers in an audio stream of speech based in part on frequencies in the audio stream, wherein said audio stream of speech comprises audible speech of at least a first speaker and a second speaker;
convert the audio stream into text;
generate time stamps in the audio stream to associate the text with the audio stream;
access pre-learned phonemes;
identify the first speaker in the audio stream based on the pre-learned phonemes;
locate a portion of the text associated with the first speaker based on the time stamps;
segment the text associated with the first speaker into text phonemes;
correct the text associated with the first speaker in real-time by comparing the text phonemes with phonetically-similar letter pairs of the pre-learned phonemes and applying one or more filters to the text to generate a clean transcript; and
execute a transaction based on the clean transcript.
Dependent claims: 16, 17, 18, 19, 20.
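The locating step, isolating the portion of the text associated with the first speaker using the time stamps, reduces to an interval lookup. The (time, word) and (start, end, speaker) structures below are assumed for illustration; the claims do not prescribe a representation.

```python
# Locating a speaker's portion of the transcript: each word carries a
# time stamp, and speaker turns are (start, end, speaker) intervals.
def locate_speaker_text(words, turns, speaker):
    """Return the words whose time stamps fall inside the given
    speaker's turns."""
    return [w for t, w in words
            if any(start <= t < end
                   for start, end, s in turns if s == speaker)]
```

The time stamps generated alongside the text are what make this lookup possible: without them there is no way to tie a span of transcript back to a span of audio attributed to a particular speaker.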
Specification