Automated training of a user audio profile using transcribed medical record recordings
Abstract
An automated system is provided to build a user audio profile for a natural or continuous language speech to text dictation/transcription system. The system uses previously recorded audio files that may already have been transcribed. A previously recorded audio file is split into a plurality of smaller audio files, each about 15 seconds in length. The smaller audio files are either matched to the corresponding portions of the transcribed text (e.g., small text files) or transcribed. All or a selection of the small audio files and small text files are linked as training pairs. In certain embodiments, both the text and the audio of a training pair may be edited. The training pairs are submitted to the server to build the initial user audio profile.
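The splitting step described in the abstract can be illustrated with a minimal sketch. It assumes the pause times in the recording are already known (for example from an endpointing pass) and cuts at the pause nearest each 15-second mark, falling back to a hard cut when no pause is close enough. The function name `split_at_pauses`, the `tolerance` parameter, and the representation of pauses as a list of seconds are assumptions for illustration, not details taken from the patent.

```python
def split_at_pauses(total, pauses, target=15.0, tolerance=5.0):
    """Return (start, end) times, in seconds, of segments of about `target` length.

    total     -- duration of the big audio file in seconds
    pauses    -- times (seconds) where a pause was detected (hypothetical input)
    tolerance -- how far from the ideal cut point a pause may be and still be used
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if target <= tolerance:
        raise ValueError("target must exceed tolerance so every cut moves forward")

    bounds = []
    start = 0.0
    # Keep cutting while more than one target-length segment remains.
    while total - start > target:
        ideal = start + target
        # Pauses close enough to the ideal cut point, and inside the recording.
        nearby = [p for p in pauses if abs(p - ideal) <= tolerance and p < total]
        # Prefer the nearest pause; otherwise cut exactly at the ideal point.
        cut = min(nearby, key=lambda p: abs(p - ideal)) if nearby else ideal
        bounds.append((start, cut))
        start = cut
    # Whatever is left becomes the final (possibly shorter) segment.
    bounds.append((start, total))
    return bounds
```

For a 40-second recording with pauses at 14 and 31 seconds, this yields three segments that end on the pauses rather than mid-word; with no pauses it degrades to fixed 15-second cuts.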
19 Claims
1. A method performed on at least one processor for training a user audio profile without requiring a user to read known text, comprising the steps of:
selecting a pre-recorded big audio file;
automatically generating a plurality of little audio files from the pre-recorded big audio file;
obtaining a big text file corresponding to the big audio file;
generating a plurality of little text files using the big text file and endpointing metadata, wherein there is one little text file for each of the plurality of little audio files;
creating a plurality of audio-text training pairs by linking each one of the plurality of little audio files with the corresponding one of the plurality of little text files;
selecting at least one of the plurality of audio-text training pairs to train the user audio profile; and
transmitting the at least one of the plurality of audio-text training pairs to a speech to text engine to train the user audio profile.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
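The text-splitting and linking steps of claim 1 might be sketched as follows. The claim does not define the format of the endpointing metadata, so this sketch assumes it supplies a start time for each word of the big text file and the (start, end) bounds of each little audio file. The names `TrainingPair` and `build_pairs` and the `little_000.wav` naming scheme are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPair:
    audio_name: str  # name of the little audio file (hypothetical naming scheme)
    text: str        # contents of the corresponding little text file


def build_pairs(timed_words, bounds, stem="little"):
    """Create one audio-text training pair per little audio file.

    timed_words -- list of (word, start_seconds) for the big text file (assumed format)
    bounds      -- list of (start, end) seconds, one entry per little audio file
    """
    pairs = []
    for index, (start, end) in enumerate(bounds):
        # A word belongs to the segment in which it starts being spoken.
        words = [word for word, t in timed_words if start <= t < end]
        pairs.append(TrainingPair(f"{stem}_{index:03d}.wav", " ".join(words)))
    return pairs
```

Because each segment yields exactly one pair, the one-to-one correspondence between little text files and little audio files holds by construction, even when a segment contains no words.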
14. An apparatus comprising:
a processor, wherein the processor is operatively coupled to a speech to text engine;
a memory operatively coupled to the processor; and
a display operatively coupled to the processor and the memory;
wherein the memory is configured to store audio and text files,
wherein the processor is configured to fetch a big audio file from the memory and create a plurality of little audio files,
wherein the processor is configured to obtain a big text file corresponding to the big audio file,
wherein the processor is configured to generate a plurality of little text files using the big text file and endpointing metadata,
wherein the processor is configured to link the plurality of little audio files and the plurality of little text files to create a plurality of audio-text training pairs that are displayed on the display, and
wherein the processor is configured to transmit the plurality of audio-text training pairs to the speech to text engine for training a user audio profile.
Dependent claims: 15, 16, 17
18. A non-transitory computer program product storable in a memory and executable by a computer comprising a computer usable medium including computer readable code embodied therein for processing data to allow training of a user audio profile, the computer usable medium comprising:
code adapted to be executed by a processor configured to select a big audio file;
code adapted to be executed by a processor configured to generate a plurality of little audio files from the big audio file;
code adapted to be executed by a processor configured to obtain a big text file corresponding to the big audio file;
code adapted to be executed by a processor configured to generate a plurality of little text files using the big text file and endpointing metadata, wherein there is one little text file for each of the plurality of little audio files;
code adapted to be executed by a processor configured to create a plurality of audio-text training pairs by linking the plurality of little audio files with the corresponding plurality of little text files;
code adapted to be executed by a processor configured to display the plurality of audio-text training pairs;
code adapted to be executed by a processor configured to select at least one of the plurality of audio-text training pairs to train the user audio profile; and
code adapted to be executed by a processor configured to transmit the at least one of the plurality of audio-text training pairs to a speech to text engine to train the user audio profile.
Dependent claims: 19
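The final two elements of claim 18, selecting training pairs and transmitting them to the speech to text engine, could look roughly like the sketch below. The claim does not say how pairs are selected or how the engine is reached, so both are assumptions here: selection keeps pairs whose text has at least `min_words` words, and transmission goes through a caller-supplied `send` callable standing in for whatever interface the engine actually exposes.

```python
from typing import Callable, Iterable

# (little audio file name, little text file contents); an assumed representation.
Pair = tuple[str, str]


def select_pairs(pairs: Iterable[Pair], min_words: int = 3) -> list[Pair]:
    """Keep pairs with enough text to be useful for training (assumed criterion)."""
    return [pair for pair in pairs if len(pair[1].split()) >= min_words]


def transmit(pairs: Iterable[Pair], send: Callable[[Pair], None]) -> int:
    """Pass each selected pair to `send`, a hypothetical stand-in for the engine.

    Returns the number of pairs transmitted.
    """
    count = 0
    for pair in pairs:
        send(pair)
        count += 1
    return count
```

In an interactive embodiment the selection could instead come from a user reviewing the displayed pairs; injecting `send` as a parameter keeps the sketch independent of any particular engine or transport.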
Specification