Automated identification of verbal records using boosted classifiers to improve a textual transcript
First Claim
1. A computerized method for automated identification of audio records to improve a textual transcript, the method comprising the steps of:
selecting a plurality of audio records from a database, each of the plurality of audio records comprising supporting information;
obtaining a first speech recognition output of each of the plurality of audio records using a pooled language model;
obtaining a second speech recognition output of each of the plurality of audio records using a balanced language model;
processing each of the audio records into a feature vector comprising the first speech recognition output, the second speech recognition output, and a plurality of audio record features selected from a group consisting of a number of silences, a noise threshold, a number of silences per second, a total duration of silence per second, an average amplitude in the audio record, a standard deviation of the amplitude in the audio record, a number of long silences per second, and a total duration of long silences per second;
creating a plurality of basic classifiers based on the feature vector and selected from a group consisting of language classifiers, decision tree classifiers, and k-nearest neighbor classifiers;
evaluating each of the plurality of basic classifiers;
creating a plurality of boosted classifiers, each of the plurality of boosted classifiers being a combination of the basic classifiers;
testing the performance of each of the boosted classifiers on a test set of training vectors, and determining which of the boosted classifiers performed the best;
adding one of the plurality of basic classifiers to the boosted classifier, based on a first vector weight;
testing the performance of the boosted classifiers;
adjusting the first vector weight, testing the performance of each of the boosted classifiers on the test set of training vectors, and determining which of the boosted classifiers performs the best;
selecting a best boosted classifier; and
saving the best boosted classifier and supporting structures.
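The add-test-reweight loop recited in the claim can be sketched as a greedy ensemble search. This is a minimal illustration, not the patent's implementation: decision stumps stand in for the claimed basic classifiers, and the candidate-selection and weight-adjustment strategy is an assumption about how the claimed steps fit together.

```python
# Hypothetical sketch of the claimed boosting loop: evaluate basic
# classifiers, greedily add the one that most improves a weighted-vote
# ensemble on a test set of training vectors, and adjust its vector weight.

def stump(feature_idx, threshold):
    """A basic classifier: thresholds one feature (e.g. silences per second)."""
    return lambda x: 1 if x[feature_idx] > threshold else -1

def accuracy(clf, data):
    """Fraction of (vector, label) pairs the classifier gets right."""
    return sum(1 for x, y in data if clf(x) == y) / len(data)

def boosted(ensemble):
    """Combine (weight, classifier) pairs into a weighted majority vote."""
    def predict(x):
        s = sum(w * c(x) for w, c in ensemble)
        return 1 if s >= 0 else -1
    return predict

def build_boosted(basics, test_set, rounds=3):
    """Greedily grow the boosted classifier, keeping the best performer."""
    ensemble = []
    for _ in range(rounds):
        # test each candidate addition on the test set; keep the best
        best = max(basics,
                   key=lambda c: accuracy(boosted(ensemble + [(1.0, c)]), test_set))
        # adjust the vector weight over a small grid, again keeping the best
        w = max((0.5, 1.0, 2.0),
                key=lambda w: accuracy(boosted(ensemble + [(w, best)]), test_set))
        ensemble.append((w, best))
    return boosted(ensemble)
```

A usage example: with vectors of (silences per second, average amplitude) and stumps on each feature, `build_boosted` selects the stump that separates the labeled records and saves it (with its weight) as the best boosted classifier.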
Abstract
In at least one exemplary embodiment of an automated document identification and language dictation recognition system, the system comprises a database capable of receiving a plurality of verbal records, each verbal record comprising at least one identifier and at least one verbal feature, and a processor operably coupled to the database, where the processor has and executes a software program. The processor is operational to identify a subset of the plurality of verbal records from the database, extract at least one verbal feature from the identified records, analyze the at least one verbal feature of the subset, process the subset using the analyzed feature according to at least one reasoning approach, generate a processed verbal record from the processed subset, and deliver the processed verbal record to a recipient. The processor is further operational to extract features for a pool of training documents, turn each transcription job into a feature vector usable by a traditional classifier, create classifiers with different parameters in order to explore the best possible strategy, evaluate the performance of all classifiers, create a boosting classifier, calculate performance statistics, and operate the automatic document identifier for all documents.
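The abstract's step of turning each transcription job into a feature vector usable by a traditional classifier might look like the following sketch. The word-agreement metric between the two recognizer outputs and the field layout are illustrative assumptions, not the patent's definitions.

```python
# Hypothetical sketch: build one flat feature vector per transcription job
# from two speech-recognition outputs (pooled vs. balanced language model)
# plus precomputed audio-derived features.

def word_agreement(hyp_a, hyp_b):
    """Fraction of distinct words shared by the two recognizer outputs."""
    a, b = set(hyp_a.split()), set(hyp_b.split())
    return len(a & b) / max(len(a | b), 1)

def job_to_vector(pooled_text, balanced_text, audio_feats):
    """Flatten one job into a vector a traditional classifier can consume."""
    return [
        len(pooled_text.split()),            # length of pooled-LM output
        len(balanced_text.split()),          # length of balanced-LM output
        word_agreement(pooled_text, balanced_text),
        *audio_feats,                        # e.g. silence and amplitude stats
    ]
```

High agreement between the two language models is one plausible signal that a transcript is reliable; low agreement flags records for review.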
16 Claims
1. A computerized method for automated identification of audio records to improve a textual transcript (independent claim; recited in full above). Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system for automated identification of audio records to improve a textual transcript, the system comprising:
a database capable of receiving a plurality of audio records, each of the plurality of audio records comprising supporting information;
a processor operably coupled to the database and configured to:
obtain a first speech recognition output of each of the plurality of audio records using a pooled language model;
obtain a second speech recognition output of each of the plurality of audio records using a balanced language model;
process each of the audio records into a feature vector comprising the first speech recognition output, the second speech recognition output, and a plurality of audio record features selected from a group consisting of a type of device used to dictate the audio record, a day the audio record was submitted, a time the audio record was submitted, a duration of the audio record, a number of silences, a noise threshold, a number of silences per second, a total duration of silence per second, an average amplitude in the audio record, a standard deviation of the amplitude in the audio record, a number of long silences per second, and a total duration of long silences per second;
create a plurality of basic classifiers based at least in part on the feature vector;
evaluate each of the plurality of basic classifiers;
create a plurality of boosted classifiers, each of the plurality of boosted classifiers being a combination of the basic classifiers;
test the performance of each of the boosted classifiers on a test set of training vectors, and determine which of the boosted classifiers performed the best;
add one of the plurality of basic classifiers to the boosted classifier, based on a first vector weight;
test the performance of the boosted classifiers;
adjust the first vector weight, test the performance of each of the boosted classifiers on the test set of training vectors, and determine which of the boosted classifiers performs the best;
select a best boosted classifier; and
save the best boosted classifier and supporting structures. Dependent claims: 10, 11, 12, 13, 14, 15, 16
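The signal-derived features named in claim 9 (silence counts and durations, amplitude mean and standard deviation, long silences) can be computed from per-frame amplitudes as sketched below. The frame rate, noise threshold, and "long silence" cutoff are illustrative assumptions; the claim names the features but not their parameter values.

```python
# Hypothetical sketch of the claim's audio-record features, computed from a
# list of per-frame amplitude values. A "silence" is a run of consecutive
# frames below the noise threshold; "long" silences exceed an assumed length.

from statistics import mean, pstdev

def audio_features(amplitudes, frame_rate=100, noise_threshold=0.05,
                   long_silence_frames=50):
    """Return the signal-derived features of claim 9 as a dict."""
    # group consecutive sub-threshold frames into silence runs
    runs, current = [], 0
    for a in amplitudes:
        if a < noise_threshold:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    duration_sec = len(amplitudes) / frame_rate
    long_runs = [r for r in runs if r >= long_silence_frames]
    return {
        "num_silences": len(runs),
        "silences_per_sec": len(runs) / duration_sec,
        "silence_duration_per_sec": sum(runs) / frame_rate / duration_sec,
        "avg_amplitude": mean(amplitudes),
        "std_amplitude": pstdev(amplitudes),
        "long_silences_per_sec": len(long_runs) / duration_sec,
        "long_silence_duration_per_sec": sum(long_runs) / frame_rate / duration_sec,
    }
```

The remaining features in the claim's group (device type, submission day and time, record duration) come from the record's supporting information rather than the signal itself.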
Specification