Method and apparatus for processing the output of a speech recognition engine

US 6,961,700 B2
Filed: 03/18/2002
Issued: 11/01/2005
Est. Priority Date: 09/24/1996
Status: Expired due to Fees

Abstract

Data processing apparatus for receiving recognition data from a speech recognition engine and its corresponding dictated audio data where the recognition data includes recognized words or characters. A display displays the recognized words or characters and the recognized words or characters are stored as a file together with the corresponding audio data. The recognized words or characters can be processed and link data is formed to link the position of the words or characters in the file and the position of the corresponding audio component in the audio data.
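
The link data described in the abstract can be pictured with a minimal sketch. It assumes word-level granularity and one audio identifier per text position; the `LinkedText` class and its method names are hypothetical and are not taken from the patent, which does not prescribe a data layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LinkedText:
    """Words plus, per word position, an optional audio identifier (hypothetical layout)."""
    words: List[str] = field(default_factory=list)
    audio_ids: List[Optional[int]] = field(default_factory=list)

    def append_recognized(self, word: str, audio_id: int) -> None:
        # A recognized word arrives together with the identifier of its audio component.
        self.words.append(word)
        self.audio_ids.append(audio_id)

    def insert_typed(self, position: int, word: str) -> None:
        # Typed text has no dictated audio, so its link entry is None.
        self.words.insert(position, word)
        self.audio_ids.insert(position, None)

    def move(self, src: int, dst: int) -> None:
        # Moving a word carries its audio link along, so the link data stays current.
        word = self.words.pop(src)
        audio_id = self.audio_ids.pop(src)
        self.words.insert(dst, word)
        self.audio_ids.insert(dst, audio_id)

    def playback_order(self, start: int, end: int) -> List[int]:
        # Audio identifiers for the selected positions, in text order, skipping unlinked words.
        return [a for a in self.audio_ids[start:end] if a is not None]
```

Keeping the two lists in lockstep is only one possible design; it makes "update the link data after editing" a matter of applying the same list operation to both.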

219 Citations

31 Claims

1. A data processing arrangement comprising:
- a data processing apparatus, the data processing apparatus comprising;
  
  input means for receiving recognition data from a speech recognition engine and audio data, said recognition data including a string of recognized characters and audio identifiers identifying audio components corresponding to a character component of the recognized characters;
  
  processing means for receiving and processing the input recognized characters to at least one of replace, insert, move and position the recognized characters to form a processed character string;
  
  link means for forming link data linking the audio identifiers to the character component positions in the character string, and for updating said link data after processing to maintain the link between the audio identifiers and the character component positions in the processed character string;
  
  storage means for storing said recognition data and audio data received from said input means, and for storing said link data;
  
  display means for displaying the characters received and processed by said processing means;
  
  user operable selection means for selecting characters in the displayed characters for audio playback, where said link data identifies any selected audio components, if present, which are linked to the selected characters; and
  
  audio playback means for playing back the selected audio components in the order of the character component positions in the character string or the processed character string; and
  
  an editor work station comprising;
  
  data reading means for reading the characters, link data, and audio data from said data processing apparatus;
  
  editor processing means for processing the characters;
  
  editor link means for linking the audio data to the character component position using the link data;
  
  editor display means for displaying the characters being processed;
  
  editor correction means for selecting and correcting any displayed characters which have been incorrectly recognized;
  
  editor audio playback means for playing back any audio component corresponding to the selected characters to aid correction;
  
  editor speech recognition update means for storing the corrected characters and the audio identifier for the audio component corresponding to the corrected character in a character correction file; and
  
  data transfer means for transferring the character correction file to said data processing apparatus for later updating of models used by said speech recognition engine;
  
  said data processing apparatus including correction file reading means for reading said character correction file to pass the data contained therein to said speech recognition engine for the updating of the models used by said speech recognition engine.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
  - 2. A data processing arrangement as claimed in claim 1 wherein said recognition data includes alternative characters, said editor display means including means to display a choice list comprising the alternative characters, and said editor correcting means including means to select one of the alternative characters or to enter a new character.
  - 3. The data processing arrangement as claimed in claim 2 including editor contextual update means operable by a user to select recognized characters which are to be used to provide contextual correcting parameters to said speech recognition engine of said data processing apparatus, and to store said contextual correcting parameters in a contextual correction file;
    - said data transfer means being responsive to the contextual correction file to transfer the contextual correction file to said data processing apparatus for later updating of models used by said speech recognition engine;
      
      said correction file reading means of said data processing apparatus being responsive to the contextual correction file to read the contextual correction file to pass the data contained therein to said speech recognition engine.
  - 4. The data processing arrangement as claimed in claim 3 wherein said recognition data includes a likelihood indicator for each character in the character string indicating the likelihood that the character is correct, and said link data includes the likelihood indicators, said editor work station including editor automatic error detection means for detecting possible errors in recognition of characters in the recognized characters by scanning the likelihood indicators in said recognition data for the characters and detecting if the likelihood indicator for a character is below a likelihood threshold, whereby said editor display means highlights characters having a likelihood indicator below the likelihood threshold;
    - editor selection means for selecting a character to replace an incorrectly recognized character highlighted in the text; and
      
      second editor correction means for replacing the incorrectly recognized character with the selected character to correct the recognized characters.
  - 5. The data processing arrangement as claimed in claim 4 wherein said data processing apparatus includes file storage means for storing the recognized characters in a file;
    - means for selectively disabling one of the receipt of the recognized characters by said processing means and the recognition of speech by said speech recognition engine for a period of time;
      
      means for storing the received audio data during said period of time in said storage means as an audio message associated with the file; and
      
      storage reading means for reading said file for input to said processing means, and for reading said audio message for playback by said audio playback means;
      
      said editor work station including audio message reading means for reading the audio message associated with characters being processed by said editor processing means for playback by said editor audio playback means.
  - 6. The data processing arrangement as claimed in claim 5 wherein said audio message reading means is controllable by a user to read said audio message at any time the associated characters are being processed by said editor processing means.
  - 7. A data processing arrangement as claimed in claim 1 including editor contextual update means operable by a user to select recognized characters which are to be used to provide contextual correcting parameters to said speech recognition engine of said data processing apparatus, and to store said contextual correcting parameters in a contextual correction file;
    - said data transfer means being responsive to the contextual correction file to transfer the contextual correction file to said data processing apparatus for later updating of models used by said speech recognition engine;
      
      said correction file reading means of said data processing apparatus being responsive to the contextual correction file to read the contextual correction file to pass the data contained therein to said speech recognition engine.
  - 8. The data processing arrangement as claimed in claim 7 wherein said recognition data includes a likelihood indicator for each character in the character string indicating the likelihood that the character is correct, and said link data includes the likelihood indicators, said editor work station including editor automatic error detection means for detecting possible errors in recognition of characters in the recognized characters by scanning the likelihood indicators in said recognition data for the characters and detecting if the likelihood indicator for a character is below a likelihood threshold, whereby said editor display means highlights characters having a likelihood indicator below the likelihood threshold;
    - editor selection means for selecting a character to replace an incorrectly recognized character highlighted in the text; and
      
      second editor correction means for replacing the incorrectly recognized character with the selected character to correct the recognized characters.
  - 9. The data processing arrangement as claimed in claim 8 wherein said data processing apparatus includes file storage means for storing the recognized characters in a file;
    - means for selectively disabling one of the receipt of the recognized characters by said processing means and the recognition of speech by said speech recognition engine for a period of time;
      
      means for storing the received audio data during said period of time in said storage means as an audio message associated with the file; and
      
      storage reading means for reading said file for input to said processing means, and for reading said audio message for playback by said audio playback means;
      
      said editor work station including audio message reading means for reading the audio message associated with characters being processed by said editor processing means for playback by said editor audio playback means.
  - 10. The data processing arrangement as claimed in claim 9 wherein said audio message reading means is controllable by a user to read said audio message at any time the associated characters are being processed by said editor processing means.
  - 11. A data processing arrangement as claimed in claim 1 wherein said recognition data includes a likelihood indicator for each character in the character string indicating the likelihood that the character is correct, and said link data includes the likelihood indicators, said editor work station including editor automatic error detection means for detecting possible errors in recognition of characters in the recognized characters by scanning the likelihood indicators in said recognition data for the characters and detecting if the likelihood indicator for a character is below a likelihood threshold, whereby said editor display means highlights characters having a likelihood indicator below the likelihood threshold;
    - editor selection means for selecting a character to replace an incorrectly recognized character highlighted in the text; and
      
      second editor correction means for replacing the incorrectly recognized character with the selected character to correct the recognized characters.
  - 12. A data processing arrangement as claimed in claim 1 wherein said data processing apparatus includes file storage means for storing the recognized characters in a file;
    - means for selectively disabling one of the receipt of the recognized characters by said processing means and the recognition of speech by said speech recognition engine for a period of time;
      
      means for storing the received audio data during said period of time in said storage means as an audio message associated with the file; and
      
      storage reading means for reading said file for input to said processing means, and for reading said audio message for playback by said audio playback means;
      
      said editor work station including audio message reading means for reading the audio message associated with characters being processed by said editor processing means for playback by said editor audio playback means.
  - 13. A data processing arrangement as claimed in claim 12 wherein said audio message reading means is controllable by a user to read said audio message at any time the associated characters are being processed by said editor processing means.
  - 14. An editor work station for use with the data processing arrangement as claimed in claim 1, said editor work station comprising:
    - data reading means for reading the characters, link data, and audio data from said data processing apparatus;
      
      editor processing means for processing characters;
      
      editor link means for linking the audio data to the character component position using the link data;
      
      editor display means for displaying the read characters;
      
      editor correction means for selecting and correcting any displayed characters which have been incorrectly recognized;
      
      editor audio playback means for playing back any audio component corresponding to the selected characters to aid correction;
      
      editor speech recognition update means for storing the corrected character and the audio identifier for the audio component corresponding to the corrected character in a character correction file; and
      
      data transfer means for transferring the character correction file to said data processing apparatus for later updating of models used by said speech recognition engine.
  - 15. An editor work station as claimed in claim 14 wherein said recognition data includes alternative characters, said editor display means including means to display a choice list comprising the alternative characters, and said editor correcting means including means to select one of the alternative characters or to enter a new character.
  - 16. An editor work station as claimed in claim 14 including editor contextual update means operable by a user to select recognized characters which are to be used to provide contextual correcting parameters to said speech recognition engine of said data processing apparatus, and to store said contextual correcting parameters in a contextual correction file;
    - said data transfer means being responsive to the contextual correction file to transfer the contextual correction file to said data processing apparatus for later updating of models used by said speech recognition engine;
      
      said correction file reading means of said data processing apparatus being responsive to the contextual correction file to read the contextual correction file to pass the data contained therein to said speech recognition engine.
  - 17. An editor work station as claimed in claim 14 wherein said recognition data includes a likelihood indicator for each character in the character string indicating the likelihood that the character is correct, and said link data includes the likelihood indicators, said editor work station including editor automatic error detection means for detecting possible errors in recognition of characters in the recognized characters by scanning the likelihood indicators in said recognition data for the characters and detecting if the likelihood indicator for a character is below a likelihood threshold, wherein said editor display means highlights characters having a likelihood indicator below the likelihood threshold;
    - editor selection means for selecting a character to replace an incorrectly recognized character highlighted in the character string; and
      
      second editor correction means for replacing the incorrectly recognized character with the selected character to correct the recognized characters.
  - 18. A data processing arrangement as claimed in claim 1 comprising a plurality of said data processing apparatus connected to a network, and at least one editor work station, wherein each editor work station can access and edit stored characters and audio data on a plurality of said data processing apparatus.
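
The automatic error detection recited in claims 4, 8, 11 and 17 amounts to scanning likelihood indicators against a threshold and highlighting what falls below it. The following is a sketch only: the `RecognizedWord` type, the 0.0 to 1.0 likelihood scale, and the bracket notation standing in for on-screen highlighting are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class RecognizedWord(NamedTuple):
    text: str
    likelihood: float          # assumed scale: 0.0 (unlikely) to 1.0 (certain)
    alternatives: List[str]    # alternative candidates for a choice list (claim 2)


def flag_possible_errors(words: List[RecognizedWord], threshold: float) -> List[int]:
    """Return positions whose likelihood indicator is below the threshold."""
    return [i for i, w in enumerate(words) if w.likelihood < threshold]


def render_highlighted(words: List[RecognizedWord], flagged: List[int]) -> str:
    """Mark flagged words with brackets, standing in for on-screen highlighting."""
    marked = set(flagged)
    return " ".join(f"[{w.text}]" if i in marked else w.text
                    for i, w in enumerate(words))
```

An editor front end could then offer `alternatives` for each flagged position as the choice list, falling back to free text entry when none of the candidates is right.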

19. A method of processing data comprising:
- at an author work station, carrying out a method including;
  
  receiving recognition data including a string of recognized characters and audio identifiers identifying audio components corresponding to a character component of the recognized characters;
  
  storing the received audio data;
  
  inputting the recognized characters to a processor for the processing of the characters to at least one of replace, insert, move and position the characters to form a processed character string;
  
  forming link data linking the audio identifiers to the character component positions in the character string and updating said link data after processing to maintain the link between the audio identifiers and the character component positions in the processed character string;
  
  displaying the characters input to and processed by the processor;
  
  selecting displayed characters for audio playback, whereby said link data identifies any selected audio components, if present, which are linked to the selected characters; and
  
  playing back the selected audio components in the order of the character component positions in the character string or processed character string;
  
  wherein the recognized characters, the link data and the audio data are stored; and
  
  at an editor work station, obtaining the stored characters, link data and audio data from the author work station;
  
  inputting the characters into a processor;
  
  linking the audio data to the character component positions using the link data;
  
  displaying the characters being processed;
  
  selecting any displayed characters which have been incorrectly recognized;
  
  playing back any audio component corresponding to the selected characters to aid correction;
  
  correcting the incorrectly recognized characters;
  
  storing the corrected characters and the audio identifier for the audio component corresponding to the corrected character in a character correction file; and
  
  transferring the character correction file to the author work station for later updating of models used by said speech recognition engine;
  
  wherein, at a later time, said character correction file is read at said author work station to pass the data contained therein to said speech recognition engine for updating of said models.
- Dependent Claims (20, 21, 22, 23, 24, 25)
  - 20. A method as claimed in claim 19 wherein said recognition data includes alternative characters, the correcting step at said editor work station, comprising the steps of displaying a choice list comprising the alternative characters, and selecting one of the alternative characters or entering a new character.
  - 21. A method as claimed in claim 19 including the steps at said editor work station of selecting recognized characters which are to be used to provide contextual correcting parameters to said speech recognition engine at said author work station;
    - storing said contextual correcting parameters in a contextual correction file; and
      
      transferring said contextual correction file to said author work station for later updating of models used by said speech recognition engine; and
      
      at said author work station, at a later time, reading the transferred contextual correction file and passing the data contained therein to said speech recognition engine.
  - 22. A method as claimed in claim 19 wherein said recognition data includes a likelihood indicator for each character in the character string indicating the likelihood that the character is correct, the method including the steps at said editor work station of automatically detecting possible errors in recognition of characters by scanning the likelihood indicators for the characters;
    - detecting if the likelihood indicator for a character is below a likelihood threshold, whereby characters having a likelihood indicator below the likelihood threshold are displayed highlighted;
      
      selecting a character to replace an incorrectly recognized character highlighted in the character string; and
      
      replacing the incorrectly recognized character with the selected character to correct the characters.
  - 23. A method as claimed in claim 19 wherein the method includes:
    - at said author work station, storing the characters as a file;
      
      selectively disabling one of the importation of recognized characters into the processor and the recognition of speech by said speech recognition engine for a period of time;
      
      storing the received audio data during said period of time as an audio message associated with the file;
      
      at a later time, reading said file for input to the processor; and
      
      at said editor work station, reading the audio message associated with the file being processed by the processor, and playing back the read audio message.
  - 24. A method as claimed in claim 23 wherein the audio message can be read and played back at any time said file is open in the processor.
  - 25. A method as claimed in claim 19 including the step of allowing a user of the editor work station to playback the audio data for the most recent passage of dictated characters.
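
The character correction file of claim 19 pairs each corrected text with the audio identifier of the component it was dictated as, is written at the editor work station, and is read later at the author work station. The sketch below assumes a JSON Lines file and hypothetical field names (`corrected`, `audio_id`); the claims do not specify a file format, and handing the entries to a speech recognition engine is left out because it depends on the engine.

```python
import json
from typing import Dict, List


def write_correction_file(path: str, corrections: List[Dict]) -> None:
    # Editor side: one JSON object per line, each pairing corrected text
    # with the identifier of the audio component it corresponds to.
    with open(path, "w", encoding="utf-8") as fh:
        for entry in corrections:
            fh.write(json.dumps(entry) + "\n")


def read_correction_file(path: str) -> List[Dict]:
    # Author side, at a later time: read the entries back before passing
    # them on for model updating.
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

A line-oriented format is chosen here only because corrections can then be appended one at a time as the editor works.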

26. A data processing arrangement comprising:
- data processing apparatus comprising;
  
  input means for receiving recognition data from a speech recognition engine and corresponding audio data, said recognition data including a string of recognized characters and audio identifiers identifying audio components corresponding to character components of the recognized characters;
  
  link means for forming link data linking the audio identifiers to the character component positions in the character string;
  
  storage means for storing said audio data received from said input means, said link data, and said recognized characters; and
  
  display means for displaying the recognized characters; and
  
  an editor work station comprising;
  
  data reading means for obtaining the characters, link data, and audio data from said data processing apparatus;
  
  editor processing means for processing the characters;
  
  editor link means for linking the audio data to the character component position using the link data;
  
  editor display means for displaying the characters being processed;
  
  editor correction means for selecting and correcting any displayed characters which have been incorrectly recognized;
  
  editor audio playback means for playing back any audio component corresponding to the selected characters to aid correction;
  
  editor speech recognition update means for storing the corrected characters and the audio identifier for the audio component corresponding to the corrected character in a character correction file; and
  
  data transfer means for transferring the character correction file to said data processing apparatus for later updating of models used by said speech recognition engine;
  
  said data processing apparatus including correction file reading means for reading said character correction file to pass the data contained therein to said speech recognition engine.
- Dependent Claims (27, 28, 29)
  - 27. A data processing arrangement as claimed in claim 26 wherein said data processing apparatus includes processing means for receiving and processing the input recognized characters to replace, insert, move and/or position the recognized characters;
    - user operable selection means for selecting characters in the displayed characters for audio playback, where said link data identifies any selected audio components, if present, which are linked to the selected characters; and
      
      audio playback means for playing back the selected audio components in the order of the character component positions in the character string.
  - 28. An editor work station for use with the data processing arrangement as claimed in claim 26, said editor work station comprising:
    - data reading means for reading the characters, link data, and audio data from said data processing apparatus;
      
      editor processing means for processing characters;
      
      editor link means for linking the audio data to the character component position using the link data;
      
      editor display means for displaying the read characters;
      
      editor correction means for selecting and correcting any displayed characters which have been incorrectly recognized;
      
      editor audio playback means for playing back any audio component corresponding to the selected characters to aid correction;
      
      editor speech recognition update means for storing the corrected character and the audio identifier for the audio component corresponding to the corrected character in a character correction file; and
      
      data transfer means for transferring the character correction file to said data processing apparatus for later updating of models used by said speech recognition engine.
  - 29. Data processing apparatus for use with the data processing arrangement of claim 26, said data processing apparatus comprising:
    - input means for receiving recognition data and corresponding audio data from a speech recognition engine, said recognition data including a string of recognized characters and audio identifiers identifying audio components corresponding to character components of the recognised characters;
      
      link means for forming link data linking the audio identifiers to the character component positions in the character string;
      
      storage means for storing said audio data received from said input means, said link data, and said characters;
      
      display means for displaying the recognized characters; and
      
      correction file reading means for reading said character correction file and for passing the data contained therein to said speech recognition engine.

30. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining a string of characters and audio data to store the characters and the audio data, the instructions comprising instructions for causing the processor to receive the signals from a speech recognition engine;
- causing the processor to generate an image of the characters on a display;
  
  causing the processor to store the characters as a file;
  
  causing the processor to selectively disable one of the display and storage of the characters and the speech recognition engine for a period of time; and
  
  causing the processor to store the received audio data during said period of time as an audio message associated with the file.
- Dependent Claims (31)
  - 31. A computer usable medium as claimed in claim 30 including instructions for causing the processor to read the stored characters and audio signal;
    - causing the processor to generate an image of the characters for display; and
      
      causing the processor to send the audio data to an audio play back device.
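
Claims 30 and 31 describe disabling recognition for a period and keeping the audio received in that period as an audio message associated with the file. A minimal sketch of that routing follows; the `DictationSession` class, its attributes, and the idea of passing the recognized text alongside each audio chunk are assumptions for illustration, not the patent's implementation.

```python
from typing import List


class DictationSession:
    """Hypothetical session: routes audio either to the document or to an audio message."""

    def __init__(self) -> None:
        self.recognition_enabled = True
        self.text: List[str] = []             # recognized text stored as the file
        self.audio_message: List[bytes] = []  # audio kept while recognition is off

    def on_audio(self, chunk: bytes, recognized: str = "") -> None:
        if self.recognition_enabled:
            # Normal dictation: the engine's output goes into the document.
            if recognized:
                self.text.append(recognized)
        else:
            # Recognition disabled: keep the raw audio as a message tied to the file.
            self.audio_message.append(chunk)
```

For example, an author could switch `recognition_enabled` off, dictate an instruction for the editor, and switch it back on; the instruction would then be available for playback without appearing in the text.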

Current Assignee
Allvoice Developments US LLC
Original Assignee
Allvoice Computing PLC (Allvoice Developments US LLC)
Inventors
Mitchell, John C.; Daniel, Nicholas John; Corbett, Steven Norman; Heard, Alan James
Primary Examiner(s)
MCFADDEN, SUSAN IRIS

Application Number

US10/100,546
Publication Number

US 20020099542A1
Time in Patent Office

1,324 Days
Field of Search

704/235, 704/270, 704/275, 704/244
US Class Current

704/235
CPC Class Codes

G06F 40/137   Hierarchical processing, e....

G06F 40/58   Use of machine translation,...

G10L 15/22   Procedures used during a sp...

G10L 15/28   Constructional details of s...

G10L 2015/225   Feedback of the input speech
