Systems and methods for filtering dictated and non-dictated sections of documents

US 8,036,889 B2
Filed: 02/27/2006
Issued: 10/11/2011
Est. Priority Date: 02/27/2006
Status: Active Grant

Abstract

A system and method for filtering documents to determine section boundaries between dictated and non-dictated text are disclosed. The system and method identify the portions of a text report that correspond to an original dictation and, conversely, the portions that are not part of the original dictation. They compare tokenized and normalized forms of the original dictation and the final report, determine mismatches between the two forms, and apply machine-learning techniques to identify document headers, footers, page turns, macros, and lists automatically and accurately.
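The comparison step described in the abstract can be illustrated with a minimal sketch. It assumes a simple lowercase word tokenizer as the normalization and uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner; the patent does not name a specific alignment algorithm, and the helper names `normalize` and `label_misaligned` are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def label_misaligned(asr_tokens, report_tokens):
    """Return one flag per report token: True if it has no aligned ASR counterpart."""
    labels = [True] * len(report_tokens)
    matcher = SequenceMatcher(a=asr_tokens, b=report_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Every report token inside a matching block is aligned with the dictation.
        for j in range(block.b, block.b + block.size):
            labels[j] = False
    return labels


# Illustrative inputs only: recognizer output and a final report with an added header.
asr = normalize("patient presents with chest pain and shortness of breath")
report = normalize(
    "ACME Clinic Page 1 of 2. Patient presents with chest pain and shortness of breath."
)
labels = label_misaligned(asr, report)
```

In this made-up example the six header tokens are flagged as misaligned and the nine dictated tokens are not.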

30 Claims

1. A method for filtering dictated and non-dictated sections of documents, the method comprising steps of:
- gathering speech recognition output and a first set of corresponding documents;
- conforming at least one associated document from the first set of corresponding documents to a selected speech recognition format;
- comparing the speech recognition output and the at least one associated document;
- determining, using a processing unit, long homogeneous sequences of misaligned tokens from the speech recognition output and the at least one associated document;
- detecting boundaries between dictated and non-dictated sections in the at least one associated document; and
- annotating the at least one associated document with the boundaries.
- Dependent claims (2-14):
  - 2. The method according to claim 1, wherein the conforming step further comprises pre-processing the at least one associated document.
  - 3. The method according to claim 1, wherein the comparing step further comprises performing label smoothing on the speech recognition output and the at least one associated document.
  - 4. The method according to claim 3, wherein the label smoothing is performed using a sliding average.
  - 5. The method according to claim 4, wherein the label smoothing is performed using a window size of 3.
  - 6. The method according to claim 1, wherein the determining long homogeneous sequences of misaligned tokens comprises detecting formatting anchors.
  - 7. The method according to claim 1, wherein the detecting step further comprises identifying end points of the determined long homogeneous sequences of misaligned tokens.
  - 8. The method according to claim 1, further comprising outputting the dictated sections to at least one automatic speech recognition process.
  - 9. The method according to claim 8, wherein the at least one automatic speech recognition process is selected from the group consisting of language model identification, language model adaptation, acoustic model adaptation, automatic error correction, and speaker evaluation.
  - 10. The method according to claim 1, further comprising creating classification models in order to distinguish between dictated and non-dictated sections of text in the at least one associated document.
  - 11. The method according to claim 10, further comprising, based on the classification models, categorizing text of a second set of documents to identify dictated and non-dictated sections of text within at least one document from the second set of documents.
  - 12. The method according to claim 11, further comprising outputting dictated sections of the at least one document from the second set of documents to an automatic speech recognition process.
  - 13. The method according to claim 12, wherein the first set of documents does not equal the second set of documents.
  - 14. The method according to claim 1, wherein determining long homogeneous sequences of misaligned tokens comprises:
    - based on the comparing, labeling at least some tokens in the at least one associated document as misaligned tokens; and
    - identifying sequences of a predetermined number or more of consecutive tokens in the at least one associated document that are labeled as misaligned tokens as the long homogeneous sequences of misaligned tokens.
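
Claims 3 to 5 and 14 can be read together as a smoothing pass followed by a run search. The sketch below is one possible reading, not the patented implementation: the 0.5 majority threshold, the minimum run length of 4, and the function names `smooth_labels` and `long_runs` are all assumptions, since the claims specify only a sliding average, a window size of 3, and "a predetermined number" of consecutive tokens.

```python
def smooth_labels(labels, window=3):
    """Smooth boolean misalignment labels with a centered sliding average.

    A token is kept as misaligned when more than half of the labels in its
    window are misaligned (the 0.5 threshold is an assumption).
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo, hi = max(0, i - half), min(len(labels), i + half + 1)
        neighborhood = labels[lo:hi]
        smoothed.append(sum(neighborhood) / len(neighborhood) > 0.5)
    return smoothed


def long_runs(labels, min_length):
    """Return (start, end) index pairs, end exclusive, for runs of True of at least min_length."""
    runs, start = [], None
    for i, flag in enumerate(labels):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the document.
    if start is not None and len(labels) - start >= min_length:
        runs.append((start, len(labels)))
    return runs


# Illustrative labels: True means the token was labeled as misaligned.
raw = [True, True, False, True, True, False, False, True, False, False]
smoothed = smooth_labels(raw, window=3)
runs = long_runs(smoothed, min_length=4)
```

Here smoothing fills the isolated aligned token at index 2 and drops the isolated misaligned token at index 7, leaving a single long run covering indices 0 through 4.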

15. A system for filtering dictated and non-dictated sections of electronic documents to determine dictated and non-dictated text in the documents, the system comprising:
- a central processing unit;
- a computer code operatively associated with the central processing unit, the computer code including:
  - a first set of instructions configured to gather speech recognition output and a first set of documents corresponding to the speech recognition output;
  - a second set of instructions configured to conform at least one associated document from the first set of corresponding documents to a selected speech recognition format;
  - a third set of instructions configured to compare the speech recognition output and the at least one associated document;
  - a fourth set of instructions configured to determine long homogeneous sequences of misaligned tokens from the speech recognition output and the at least one associated document;
  - a fifth set of instructions configured to detect boundaries between dictated and non-dictated sections in the at least one associated document; and
  - a sixth set of instructions configured to annotate the at least one associated document with the boundaries.
- Dependent claims (16-28):
  - 16. The system according to claim 15, wherein the second set of instructions further comprises instructions to pre-process the at least one associated document.
  - 17. The system according to claim 15, wherein the third set of instructions further comprises instructions to perform label smoothing on the speech recognition output and the at least one associated document.
  - 18. The system according to claim 17, wherein the label smoothing is performed using a sliding average.
  - 19. The system according to claim 18, wherein the label smoothing is performed using a window size of 3.
  - 20. The system according to claim 15, wherein the fourth set of instructions further comprises instructions to detect formatting anchors.
  - 21. The system according to claim 15, wherein the fifth set of instructions further comprises instructions to identify end points of the determined long homogeneous sequences of misaligned tokens.
  - 22. The system according to claim 15, wherein the computer code further includes a seventh set of instructions configured to output the dictated sections to at least one automatic speech recognition process.
  - 23. The system according to claim 22, wherein the at least one automatic speech recognition process is selected from the group consisting of language model identification, language model adaptation, acoustic model adaptation, smart rewrite and speaker evaluation.
  - 24. The system according to claim 15, wherein the computer code includes an eighth set of instructions configured to create classification models in order to distinguish between dictated and non-dictated sections of text in the at least one associated document.
  - 25. The system according to claim 24, wherein the computer code includes a ninth set of instructions configured to, based on the classification models, categorize text of a second set of documents to identify dictated and non-dictated sections of text within at least one document from the second set of documents.
  - 26. The system according to claim 25, wherein the computer code includes a tenth set of instructions configured to output dictated sections of the at least one document from the second set of documents to an automatic speech recognition process.
  - 27. The system according to claim 26, wherein the first set of documents does not equal the second set of documents.
  - 28. The system according to claim 15, wherein the fourth set of instructions comprises instructions to:
    - based on the comparing, label at least some tokens in the at least one associated document as misaligned tokens; and
    - identify sequences of a predetermined number or more of consecutive tokens in the at least one associated document that are labeled as misaligned tokens as the long homogeneous sequences of misaligned tokens.

29. A method for identifying dictated and non-dictated sections of at least one document, the method comprising:
- comparing, using a processing unit, speech recognition output and at least one associated document to label at least some tokens in the at least one associated document as misaligned tokens;
- identifying at least one sequence of a predetermined number or more of consecutive tokens in the at least one associated document that are labeled as misaligned tokens;
- based at least in part on the at least one identified sequence, identifying at least one boundary between at least one dictated section and at least one non-dictated section in the at least one associated document; and
- annotating the at least one associated document with the at least one boundary.

30. A system for identifying dictated and non-dictated sections of at least one document, the system comprising:
- a central processing unit; and
- a computer code operatively associated with the central processing unit, the computer code including instructions to cause the central processing unit to:
  - compare speech recognition output and at least one associated document to label at least some tokens in the at least one associated document as misaligned tokens;
  - identify at least one sequence of a predetermined number or more of consecutive tokens in the at least one associated document that are labeled as misaligned tokens;
  - based at least in part on the at least one identified sequence, identify at least one boundary between at least one dictated section and at least one non-dictated section in the at least one associated document; and
  - annotate the at least one associated document with the at least one boundary.
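
The boundary and annotation steps of claims 29 and 30 might look like the following sketch. It assumes the long misaligned sequences are already available as (start, end) token index pairs with an exclusive end, treats their end points as the section boundaries, and marks them with hypothetical `<non-dictated>` tags; the claims do not prescribe any annotation format.

```python
def annotate(tokens, runs, open_tag="<non-dictated>", close_tag="</non-dictated>"):
    """Insert boundary markers around each (start, end) run of non-dictated tokens.

    The tag strings are illustrative placeholders, not a format taken from the patent.
    """
    starts = {start for start, _ in runs}
    ends = {end for _, end in runs}
    out = []
    for i, token in enumerate(tokens):
        if i in ends:
            out.append(close_tag)  # a non-dictated section ended just before this token
        if i in starts:
            out.append(open_tag)   # a non-dictated section begins at this token
        out.append(token)
    if len(tokens) in ends:
        out.append(close_tag)      # a run that reaches the end of the document
    return " ".join(out)


# Illustrative input: the first four tokens form a header that was never dictated.
tokens = ["acme", "clinic", "page", "1", "patient", "has", "pain"]
annotated = annotate(tokens, [(0, 4)])
```

The text outside the markers is what a downstream process, such as the language model or acoustic model adaptation named in claims 9 and 23, would receive as dictated material.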

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Lapshina, Larissa; Rechea, Bernardo; Carus, Alwin B.
Primary Examiner(s): Lerner, Martin
Application Number: US11/362,646
Publication Number: US 20070203707A1
Time in Patent Office: 2,052 Days
Field of Search: 704/234, 704/235, 704/241, 704/243, 704/244, 704/255, 715/230, 715/727
US Class Current: 704/235
CPC Class Codes:
G06F 40/103   Formatting, i.e. changing o...
G10L 15/18   using natural language mode...
G10L 15/22   Procedures used during a sp...
