AUTOMATED DOCUMENT IDENTIFICATION AND LANGUAGE DICTATION RECOGNITION SYSTEMS AND METHODS FOR USING THE SAME

US 20170294190A1
Filed: 06/15/2017
Published: 10/12/2017
Est. Priority Date: 10/05/2010
Status: Active Grant

Abstract

In at least one exemplary embodiment of an automated document identification and language dictation recognition system, the system comprises a database capable of receiving a plurality of verbal records, each verbal record comprising at least one identifier and at least one verbal feature, and a processor operably coupled to the database, where the processor has and executes a software program. The processor is operable to identify a subset of the plurality of verbal records from the database, extract at least one verbal feature from the identified records, analyze the at least one verbal feature of the subset, process the subset using the analyzed feature according to at least one reasoning approach, generate a processed verbal record using the processed subset, and deliver the processed verbal record to a recipient. The processor is further operable to extract features for a pool of training documents, turn each transcription job into a feature vector that can be used by a traditional classifier, create classifiers with different parameters in order to explore the best possible strategy, evaluate the performance of all classifiers, create a boosting classifier, calculate performance statistics, and operate the automatic document identifier on all documents.
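The abstract speaks of turning each transcription job into a feature vector, and claim 2 names several audio-derived features (number of silences, total silence duration, average amplitude, and so on). The following is a minimal sketch of how such features might be computed from raw amplitude samples. The function name, the dictionary keys, the default noise threshold of 0.02, and the 1.0 second cutoff for a "long" silence are all assumptions made for illustration; the source does not specify them.

```python
import math
from typing import Dict, List


def silence_features(samples: List[float], sample_rate: int,
                     noise_threshold: float = 0.02,
                     long_silence_sec: float = 1.0) -> Dict[str, float]:
    """Turn one dictation's amplitude samples into a flat feature dictionary.

    Assumed definition: a silence is a maximal run of samples whose
    absolute amplitude stays below noise_threshold.
    """
    if not samples or sample_rate <= 0:
        raise ValueError("need at least one sample and a positive sample rate")

    # Collect the length (in samples) of every run below the noise threshold.
    runs: List[int] = []
    current = 0
    for s in samples:
        if abs(s) < noise_threshold:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)

    duration = len(samples) / sample_rate
    silence_secs = [r / sample_rate for r in runs]
    long_secs = [d for d in silence_secs if d >= long_silence_sec]

    # Population mean and standard deviation of the absolute amplitude.
    amplitudes = [abs(s) for s in samples]
    mean_amp = sum(amplitudes) / len(amplitudes)
    variance = sum((a - mean_amp) ** 2 for a in amplitudes) / len(amplitudes)

    return {
        "duration": duration,
        "num_silences": float(len(runs)),
        "total_silence": sum(silence_secs),
        "noise_threshold": noise_threshold,
        "silences_per_sec": len(runs) / duration,
        "silence_per_sec": sum(silence_secs) / duration,
        "avg_amplitude": mean_amp,
        "std_amplitude": math.sqrt(variance),
        "long_silences_per_sec": len(long_secs) / duration,
        "long_silence_per_sec": sum(long_secs) / duration,
    }
```

Non-audio features from claim 2, such as the dictation device type or the submission day and time, would come from the record's metadata and are left out of this sketch.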

18 Claims

1. A computerized method for automated identification of verbal records to improve a textual transcript, the method comprising the steps of:
- selecting a plurality of verbal records from a database, the each of the plurality of verbal records comprising supporting information;
  
  processing the each of the verbal records into a feature vector comprising a plurality of verbal record feature vectors;
  
  creating a plurality of basic classifiers;
  
  evaluating the each of the plurality of basic classifiers;
  
  creating a plurality of boosted classifiers, the each of the plurality of boosted classifiers being a combination of the each of the plurality of basic classifiers;
  
  testing the performance of the each of the boosted classifiers on a test set of training vectors, and determining which of the each of the boosted classifiers performed the best;
  
  adding one of the plurality of basic classifiers to the boosted classifier, based on a first vector weight;
  
  testing the performance of the boosted classifiers;
  
  adjusting the first vector weight and testing the performance of the each of the boosted classifiers on the test set of training vectors, and determining which of the each of the boosted classifiers performs the best;
  
  selecting a best boosted classifier; and
  
  saving the best boosted classifier and supporting structures.
  - 2. The method of claim 1, wherein the verbal record feature vectors are selected from a group consisting of speech recognition output using a balanced language model, speech recognition output using a pooled language model, the type of device used to dictate the verbal record, the day the verbal record was submitted, the time the verbal record was submitted, the duration of the verbal record, the number of silences, the total duration of silences, the noise threshold, the number of silences per second, the total duration of silence per second, the average amplitude in the verbal record, the standard deviation of the amplitude in the verbal record, the number of long silences per second, and the total duration of long silences per second.
  - 3. The method of claim 1, further comprising processing the plurality of records using the best boosted classifier;
  - 4. The method of claim 1, wherein the basic classifiers are selected from a group consisting of language classifiers, decision tree classifiers, and k-nearest classifiers.
  - 5. The method of claim 4, wherein the language classifier receives input speech recognition output, solely as input.
  - 6. The method of claim 4, wherein the decision tree classifier receives non-speech recognition features as input.
  - 7. The method of claim 4, wherein the k-nearest neighbor classifier receives non-speech recognition features, the features supporting the feature vector for distance calculation in a k-nearest algorithm.
  - 8. The method of claim 1, wherein evaluating the each of the plurality of basic classifiers, further comprises testing the each of the plurality of basic classifiers on a test set of training vectors.
  - 9. The method of claim 8, further comprising the step of taking into account any changing weights of the each of the basic classifiers and the corresponding training vectors.
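Claim 1 recites building boosted classifiers from a pool of basic classifiers, adding a basic classifier according to a vector weight, and adjusting that weight between tests; claim 9 adds that changing weights are taken into account. One conventional loop with this shape is discrete AdaBoost. The sketch below uses AdaBoost as a stand-in and should not be read as the claimed procedure itself. The names `boost`, `predict`, and `weighted_error` are hypothetical, and the basic classifiers are assumed to be callables that return +1 or -1.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Classifier = Callable[[Vector], int]  # assumed to return +1 or -1


def weighted_error(clf: Classifier, weights: List[float],
                   vectors: List[Vector], labels: List[int]) -> float:
    """Sum of the weights of the training vectors that clf misclassifies."""
    return sum(w for w, x, y in zip(weights, vectors, labels) if clf(x) != y)


def boost(pool: List[Classifier], vectors: List[Vector], labels: List[int],
          rounds: int = 5) -> List[Tuple[float, Classifier]]:
    """Greedily combine basic classifiers from pool, AdaBoost style."""
    n = len(vectors)
    weights = [1.0 / n] * n  # one weight per training vector
    ensemble: List[Tuple[float, Classifier]] = []

    for _ in range(rounds):
        # Evaluate every basic classifier under the current vector weights.
        best = min(pool, key=lambda c: weighted_error(c, weights, vectors, labels))
        err = weighted_error(best, weights, vectors, labels)
        if err >= 0.5:  # no better than chance: stop adding classifiers
            break
        clamped = max(err, 1e-10)  # avoid division by zero on a perfect fit
        alpha = 0.5 * math.log((1.0 - clamped) / clamped)
        ensemble.append((alpha, best))
        if err <= 1e-10:  # a perfect classifier leaves nothing to reweight
            break

        # Raise the weight of misclassified vectors, lower the rest, renormalize.
        weights = [w * math.exp(-alpha * y * best(x))
                   for w, x, y in zip(weights, vectors, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble: List[Tuple[float, Classifier]], x: Vector) -> int:
    """Weighted vote of the selected basic classifiers."""
    score = sum(alpha * clf(x) for alpha, clf in ensemble)
    return 1 if score >= 0 else -1
```

In practice the returned ensemble would be scored on held-out vectors and the best-performing configuration saved, which corresponds loosely to the "selecting" and "saving" steps of claim 1.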

10. A system for automated identification of verbal records to improve a textual transcript, the system comprising:
- a database capable of receiving a plurality of verbal records, the each of the plurality of verbal records comprising supporting information;
  
  a processor operably coupled to the database, and configured to;
  
  process the each of the verbal records into a feature vector comprising a plurality of verbal record feature vectors;
  
  create a plurality of basic classifiers;
  
  evaluate the each of the plurality of basic classifiers;
  
  create a plurality of boosted classifiers, the each of the plurality of boosted classifiers being a combination of the each of the plurality of basic classifiers;
  
  test the performance of the each of the boosted classifiers on a test set of training vectors, and determining which of the each of the boosted classifiers performed the best;
  
  add one of the plurality of basic classifiers to the boosted classifier, based on a first vector weight;
  
  test the performance of the boosted classifiers;
  
  adjust the first vector weight and testing the performance of the each of the boosted classifiers on the test set of training vectors, and determining which of the each of the boosted classifiers performs the best;
  
  select a best boosted classifier; and
  
  save the best boosted classifier and supporting structures.
  - 11. The system of claim 10, wherein the verbal record feature vectors are selected from a group consisting of speech recognition output using a balanced language model, speech recognition output using a pooled language model, the type of device used to dictate the verbal record, the day the verbal record was submitted, the time the verbal record was submitted, the duration of the verbal record, the number of silences, the total duration of silences, the noise threshold, the number of silences per second, the total duration of silence per second, the average amplitude in the verbal record, the standard deviation of the amplitude in the verbal record, the number of long silences per second, and the total duration of long silences per second.
  - 12. The system of claim 10, wherein the processor is further configured to process the plurality of records using the best boosted classifier;
  - 13. The system of claim 10, wherein the processor is further configured to select the basic classifiers from a group consisting of language classifiers, decision tree classifiers, and k-nearest classifiers.
  - 14. The system of claim 13, wherein the processor is further configured to receive speech recognition output, solely as input.
  - 15. The system of claim 13, wherein the processor is further configured to receive non-speech recognition features as input.
  - 16. The system of claim 13, wherein the processor is further configured to receive non-speech recognition features, the features supporting the feature vector for distance calculation in a k-nearest algorithm.
  - 17. The system of claim 10, wherein the processor is further configured to test the each of the plurality of basic classifiers on a test set of training vectors.

18. The system of claim 19, wherein the processor is further configured to take into account any changing weights of the each of the basic classifiers and the corresponding training vectors.
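Claims 7 and 16 refer to non-speech-recognition features supporting a distance calculation in a k-nearest algorithm. A minimal sketch of such a classifier follows, assuming Euclidean distance and a simple majority vote; the claims specify neither. The function name `knn_predict` and the string labels are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(train: List[Tuple[Sequence[float], str]],
                query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training vectors closest to query."""
    if not train or k < 1:
        raise ValueError("need at least one training vector and k >= 1")
    # Sort training vectors by Euclidean distance to the query and keep k.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because the features in claim 2 have very different scales (seconds, counts, amplitudes), a real system would likely normalize them before computing distances; that step is omitted here.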

Current Assignee
InfraWare, Inc.
Original Assignee
InfraWare, Inc.
Inventors
Lindle, Nathan; Mahurin, Nick

Granted Patent

US 10,224,036 B2

CPC Class Codes

G01L 15/00   Devices or apparatus for me...

G06F 18/214   Generating training pattern...

G06F 18/24147   Distances to closest patter...

G06F 18/28   Determining representative ...

G06F 40/149   Adaptation of the text data...

G06F 40/205   Parsing

G06N 20/00   Machine learning

G10L 15/00   Speech recognition G10L17/0...

G10L 15/02   Feature extraction for spee...

G10L 15/1822   Parsing for meaning underst...

G10L 15/26   Speech to text systems G10L...

G10L 15/30   Distributed recognition, e....

G10L 2015/025   Phonemes, fenemes or fenone...
