Method and apparatus for processing the output of a speech recognition engine

US 20020099542A1
Filed: 03/18/2002
Published: 07/25/2002
Est. Priority Date: 09/24/1996
Status: Active Grant

Abstract

A data processing apparatus is disclosed for receiving recognition data from a speech recognition engine and its corresponding dictated audio data, where the recognition data includes recognized words or characters. A display displays the recognized words or characters, and the recognized words or characters are stored as a file together with the corresponding audio data. The recognized words or characters can be processed, and link data is formed to link the position of the words or characters in the file and the position of the corresponding audio component in the audio data.

78 Claims

1. Data processing apparatus comprising:
- input means for receiving recognition data from a speech recognition engine and audio data, said recognition data including a string of recognised characters and audio identifiers identifying audio components corresponding to a character component of the recognised characters;
  
  storage means for storing said audio data received from said input means;
  
  processing means for receiving and processing the input recognised characters to at least one of replace, insert, move and position the recognised characters to form a processed character string;
  
  link means for forming link data linking the audio identifiers to the character component positions in the character string and for updating said link data after processing to maintain the link between the audio identifiers and the character component positions in the processed character string;
  
  display means for displaying the characters received and processed by said processing means;
  
  user operable selection means for selecting characters in the displayed characters for audio playback, where said link data identifies any selected audio components, if present, which are linked to the selected characters; and
  
  audio playback means for playing back the selected audio components in the order of the character component positions in the character string or the processed character string.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 18, 20, 21, 57)
- - 2. Data processing apparatus as claimed in claim 1 wherein said storage means stores the characters, the link data and the audio data, and said storage reading means for reading the stored characters into said processing means and for reading the stored link data for use by said processing means and said link means, whereby said user operable selection means can select displayed characters for audio playback and said audio playback means reads and plays back the audio components corresponding to the selected characters.
  - 3. Data processing apparatus as claimed in claim 1 or claim 2 including user operable correction means for selecting and correcting any displayed recognised characters which have been incorrectly recognised;
    - correction audio playback means for controlling said audio playback means to play back any audio component corresponding to the selected characters to aid correction; and
      
      speech recognition update means for sending the corrected characters and the audio identifier for the audio component corresponding to the corrected character to the speech recognition engine.
  - 4. Data processing apparatus as claimed in claim 3 wherein said recognition data includes alternative characters, said display means including means to display a choice list comprising the alternative characters, and said selecting and correcting means including means to select one of the alternative characters or to enter a new character.
  - 5. Data processing apparatus as claimed in any preceding claim wherein said link means comprises memory means storing a list of character locations in the character string and positions of the corresponding audio components in the audio data.
  - 6. Data processing apparatus as claimed in claim 5 wherein said character string is formed of a plurality of separately dictated passages of characters, the apparatus including audio storage means storing said audio data for each dictated passage of characters in a separate file, and said memory means storing a list identifying the files and positions in the files of the audio components in said audio data corresponding to the character locations in the character string.
  - 7. Data processing apparatus as claimed in any preceding claim wherein said recognition data includes recognition status indicators to indicate whether each recognised character is a character finally selected as recognised by said speech recognition engine or a character which is the most likely at that time but which is still being recognised by said speech recognition engine, the apparatus including status detection means for detecting said recognition status indicators, and display control means to control said display means to display characters which are still being recognised differently to characters which have been recognised, said link means being responsive to said recognition status indicators to link the recognised characters to the corresponding audio component in the audio data.
  - 8. Data processing apparatus as claimed in any preceding claim including contextual update means operable by a user to select recognised characters which are to be used to provide contextual correcting parameters to said speech recognition engine, and to send said contextual correcting parameters to said speech recognition engine.
  - 9. Data processing apparatus as claimed in any preceding claim wherein said recognition data includes a likelihood indicator for each character in the character string indicating the likelihood that the character is correct, and said link means stores the likelihood indicators, the apparatus including automatic error detection means for detecting possible errors in recognition of characters in the recognised characters by scanning the likelihood indicators in said link means for the recognised characters and detecting if the likelihood indicator for a character is below a likelihood threshold, whereby said display means highlights the character having a likelihood indicator below the likelihood threshold;
    - second user operable selection means for selecting a character to replace an incorrectly recognised character highlighted in the recognised characters; and
      
      correction means for replacing the incorrectly recognised character with the selected character to correct the recognised characters.
  - 10. Data processing apparatus as claimed in any preceding claim including file storage means for storing the recognised characters in a file;
    - means for selectively disabling one of the receipt of the recognised characters by said processing means and the recognition of speech by said speech recognition engine for a period of time;
      
      means for storing the received audio data during said period of time in said storage means as an audio message associated with the file; and
      
      storage reading means for reading said file for input to said processing means, and for reading said audio message for playback by said audio playback means.
  - 11. Data processing apparatus as claimed in claim 10 wherein said storage reading means is controllable by a user to read said audio message at any time after said file has been input to said processing means until said processing means is no longer processing said file.
  - 12. Data processing apparatus as claimed in any preceding claim wherein said user operable selection means is operative to allow a user to select to playback the audio data for the most recent passage of dictated characters, or to select characters and play back the corresponding audio components.
  - 15. A data processing arrangement as claimed in claim 13 or claim 14 including editor contextual update means operable by a user to select recognised characters which are to be used to provide contextual correcting parameters to said speech recognition engine of said data processing apparatus, and to store said contextual correcting parameters in a contextual correction file;
    - said data transfer means being responsive to the contextual correction file to transfer the contextual correction file to said data processing apparatus for later updating of models used by said speech recognition engine;
      
      said correction file reading means of said data processing apparatus being responsive to the contextual correction file to read the contextual correction file to pass the data contained therein to said speech recognition engine.
  - 18. A data processing arrangement as claimed in claim 17 wherein said audio message reading means is controllable by a user to read said audio message at any time the associated characters are being processed by said editor processing means.
  - 20. An editor work station as claimed in claim 19 wherein said recognition data includes alternative characters, said editor display means including means to display a choice list comprising the alternative characters, and said editor correcting means including means to select one of the alternative characters or to enter a new character.
  - 21. An editor work station as claimed in claim 19 or claim 20 including editor contextual update means operable by a user to select recognised characters which are to be used to provide contextual correcting parameters to said speech recognition engine of said data processing apparatus, and to store said contextual correcting parameters in a contextual correction file;
    - said data transfer means being responsive to the contextual correction file to transfer the contextual correction file to said data processing apparatus for later updating of models used by said speech recognition engine;
      
      said correction file reading means of said data processing apparatus being responsive to the contextual correction file to read the contextual correction file to pass the data contained therein to said speech recognition engine.
  - 57. A data processing apparatus as claimed in claim 1 including storage means for storing the characters, the link data and the audio data.
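
The link data of claim 1 can be pictured as a small table that maps character positions to audio identifiers and is adjusted whenever the text is edited. The following is a minimal sketch under stated assumptions, not the implementation described in the patent: the `Link` record, the function names, and the rule that a link overlapping a deleted span is simply dropped are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Link:
    """Hypothetical link record: one character component and its audio."""
    start: int     # position of the component in the character string
    length: int    # number of characters in the component
    audio_id: int  # identifier of the linked audio component


def shift_for_insert(links: List[Link], pos: int, count: int) -> List[Link]:
    """Return links updated for `count` characters inserted at `pos`.

    Components starting at or after `pos` move right; earlier ones are
    left unchanged (an insertion inside a component is not split here).
    """
    return [replace(l, start=l.start + count) if l.start >= pos else l
            for l in links]


def shift_for_delete(links: List[Link], pos: int, count: int) -> List[Link]:
    """Return links updated for `count` characters deleted at `pos`.

    Assumption: a link that overlaps the deleted span is dropped.
    """
    end = pos + count
    out = []
    for l in links:
        if l.start + l.length <= pos:        # entirely before the deletion
            out.append(l)
        elif l.start >= end:                 # entirely after: shift left
            out.append(replace(l, start=l.start - count))
    return out


def audio_for_selection(links: List[Link], sel_start: int, sel_end: int) -> List[int]:
    """Audio identifiers linked to the selection [sel_start, sel_end),
    ordered by character position rather than by dictation order."""
    hits = [l for l in links
            if l.start < sel_end and l.start + l.length > sel_start]
    return [l.audio_id for l in sorted(hits, key=lambda l: l.start)]


# "the cat sat", deliberately stored out of text order
links = [Link(8, 3, 12), Link(0, 3, 10), Link(4, 3, 11)]
links = shift_for_insert(links, 4, 4)        # text becomes "the big cat sat"
print(audio_for_selection(links, 8, 15))     # [11, 12]
```

The typed word "big" has no link, which matches the "if present" wording of the claim: selected characters without linked audio contribute nothing to playback.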

13. A data processing arrangement comprising:
- a data processing apparatus, the data processing apparatus comprising;
  
  input means for receiving recognition data from a speech recognition engine and audio data, said recognition data including a string of recognised characters and audio identifiers identifying audio components corresponding to a character component of the recognised characters;
  
  processing means for receiving and processing the input recognised characters to at least one of replace, insert, move and position the recognised characters to form a processed character string;
  
  link means for forming link data linking the audio identifiers to the character component positions in the character string, and for updating said link data after processing to maintain the link between the audio identifiers and the character component positions in the processed character string;
  
  storage means for storing said recognition data and audio data received from said input means, and for storing said link data;
  
  display means for displaying the characters received and processed by said processing means;
  
  user operable selection means for selecting characters in the displayed characters for audio playback, where said link data identifies any selected audio components, if present, which are linked to the selected characters; and
  
  audio playback means for playing back the selected audio components in the order of the character component positions in the character string or the processed character string; and
  
  an editor work station comprising;
  
  data reading means for reading the characters, link data, and audio data from said data processing apparatus;
  
  editor processing means for processing the characters;
  
  editor link means for linking the audio data to the character component position using the link data;
  
  editor display means for displaying the characters being processed;
  
  editor correction means for selecting and correcting any displayed characters which have been incorrectly recognised;
  
  editor audio playback means for playing back any audio component corresponding to the selected characters to aid correction;
  
  editor speech recognition update means for storing the corrected characters and the audio identifier for the audio component corresponding to the corrected character in a character correction file; and
  
  data transfer means for transferring the character correction file to said data processing apparatus for later updating of models used by said speech recognition engine;
  
  said data processing apparatus including correction file reading means for reading said character correction file to pass the data contained therein to said speech recognition engine for the updating of the models used by said speech recognition engine.
- View Dependent Claims (14, 16, 17, 19, 22, 24, 25, 26, 28, 30, 35, 38, 39, 41, 42)
- - 14. A data processing arrangement as claimed in claim 13 wherein said recognition data includes alternative characters, said editor display means including means to display a choice list comprising the alternative characters, and said editor correcting means including means to select one of the alternative characters or to enter a new character.
  - 16. A data processing arrangement as claimed in any one of claims 13 to 15 wherein said recognition data includes a likelihood indicator for each character in the character string indicating the likelihood that the character is correct, and said link data includes the likelihood indicators, said editor work station including editor automatic error detection means for detecting possible errors in recognition of characters in the recognised characters by scanning the likelihood indicators in said recognition data for the characters and detecting if the likelihood indicator for a character is below a likelihood threshold, whereby said editor display means highlights characters having a likelihood indicator below the likelihood threshold;
    - editor selection means for selecting a character to replace an incorrectly recognised character highlighted in the text; and
      
      second editor correction means for replacing the incorrectly recognised character with the selected character to correct the recognised characters.
  - 17. A data processing arrangement as claimed in any one of claims 13 to 16 wherein said data processing apparatus includes file storage means for storing the recognised characters in a file;
    - means for selectively disabling one of the receipt of the recognised characters by said processing means and the recognition of speech by said speech recognition engine for a period of time;
      
      means for storing the received audio data during said period of time in said storage means as an audio message associated with the file; and
      
      storage reading means for reading said file for input to said processing means, and for reading said audio message for playback by said audio playback means;
      
      said editor work station including audio message reading means for reading the audio message associated with characters being processed by said editor processing means for playback by said editor audio playback means.
  - 19. An editor work station for use with the data processing arrangement as claimed in any one of claims 13 to 18, said editor work station comprising:
    - data reading means for reading the characters, link data, and audio data from said data processing apparatus;
      
      editor processing means for processing characters;
      
      editor link means for linking the audio data to the character component position using the link data;
      
      editor display means for displaying the read characters;
      
      editor correction means for selecting and correcting any displayed characters which have been incorrectly recognised;
      
      editor audio playback means for playing back any audio component corresponding to the selected characters to aid correction;
      
      editor speech recognition update means for storing the corrected character and the audio identifier for the audio component corresponding to the corrected character in a character correction file; and
      
      data transfer means for transferring the character correction file to said data processing apparatus for later updating of models used by said speech recognition engine.
  - 22. An editor work station as claimed in any one of claims 19 to 21 wherein said recognition data includes a likelihood indicator for each character in the character string indicating the likelihood that the character is correct, and said link data includes the likelihood indicators, said editor work station including editor automatic error detection means for detecting possible errors in recognition of characters in the recognised characters by scanning the likelihood indicators in said recognition data for the characters and detecting if the likelihood indicator for a character is below a likelihood threshold, whereby said editor display means highlights characters having a likelihood indicator below the likelihood threshold;
    - editor selection means for selecting a character to replace an incorrectly recognised character highlighted in the character string; and
      
      second editor correction means for replacing the incorrectly recognised character with the selected character to correct the recognised characters.
  - 24. A method as claimed in claim 23 wherein the characters, the link data and the audio data is stored, the method including the step of reading the stored characters into the processor and reading the stored link data, whereby any of the read characters can be selected for audio playback, the read back data links the selected read characters to any corresponding stored audio data, and corresponding audio data is read and played back.
  - 25. A method as claimed in claim 23 or claim 24 including the steps of selecting any displayed characters which have been incorrectly recognised, playing back any audio component corresponding to the selected characters to aid correction, correcting the incorrectly recognised characters, and sending the corrected characters and audio identifier for the audio component corresponding to the corrected character to the speech recognition engine.
  - 26. A method as claimed in claim 25 wherein said recognition data includes alternative characters;
    - the method includes the step of displaying a choice list when any displayed characters have been selected for correction, said choice list comprising said alternative characters; and
      
      said correcting step comprises selecting one of the alternative characters or inputting a new character.
  - 28. A method as claimed in claim 27 wherein said recognised character string is formed of a plurality of separately dictated passages of characters, the method including the steps of storing said audio data for each dictated passage of characters in separate files, said link data including a list identifying the files and positions in the files of the audio components in said audio data corresponding to the character locations in the characters.
  - 30. A method as claimed in claim 25 or claim 26 including the steps of selecting recognised characters which are to be used to provide contextual correcting parameters to said speech recognition engine, and sending the contextual correcting parameters to said speech recognition engine.
  - 35. A method of processing data comprising the steps of:
    - at an author work station, carrying out the method as claimed in claim 23 wherein the recognised characters, the link data and the audio data are stored; and
      
      at an editor work station, obtaining the stored characters, link data and audio data from the author work station;
      
      inputting the characters into a processor;
      
      linking the audio data to the character component positions using the link data;
      
      displaying the characters being processed;
      
      selecting any displayed characters which have been incorrectly recognised;
      
      playing back any audio component corresponding to the selected characters to aid correction;
      
      correcting the incorrectly recognised characters;
      
      storing the corrected characters and the audio identifier for the audio component corresponding to the corrected character in a character correction file; and
      
      transferring the character correction file to the author work station for later updating of models used by said speech recognition engine;
      
      wherein, at a later time, said character correction file is read at said author work station to pass the data contained therein to said speech recognition engine for updating of said models.
  - 38. A method as claimed in any one of claims 35 to 37 wherein said recognition data includes a likelihood indicator for each character in the character string indicating the likelihood that the character is correct, the method including the steps at said editor work station of automatically detecting possible errors in recognition of characters by scanning the likelihood indicators for the characters;
    - detecting if the likelihood indicator for a character is below a likelihood threshold, whereby characters having a likelihood indicator below the likelihood threshold are displayed highlighted;
      
      selecting a character to replace an incorrectly recognised character highlighted in the character string; and
      
      replacing the incorrectly recognised character with the selected character to correct the characters.
  - 39. A method as claimed in any one of claims 35 to 38 wherein the method includes the steps of:
    - at said author work station, storing the characters as a file;
      
      selectively disabling one of the importation of recognised characters into the processor and the recognition of speech by said speech recognition engine for a period of time;
      
      storing the received audio data during said period of time as an audio message associated with the file;
      
      at a later time, reading said file for input to the processor; and
      
      at said editor work station, reading the audio message associated with the file being processed by the processor, and playing back the read audio message.
  - 41. A method as claimed in any one of claims 35 to 40 including the step of allowing a user of the editor work station to playback the audio data for the most recent passage of dictated characters.
  - 42. A data processing arrangement as claimed in any one of claims 13 to 18 comprising a plurality of said data processing apparatus connected to a network, and at least one editor work station, wherein each editor work station can access and edit stored characters and audio data on a plurality of said data processing apparatus.
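
The character correction file of claims 13, 19 and 35 carries corrections from the editor work station back to the author's machine, where they are read later and passed to the speech recognition engine. The sketch below illustrates that round trip only. The JSON-lines file layout, the field names, and the `update_engine` callback are assumptions made for illustration; the claims do not specify a file format or an engine interface.

```python
import json
from pathlib import Path
from typing import Callable


def append_correction(path: str, corrected: str, audio_id: int) -> None:
    """Editor side: store a corrected character string together with the
    identifier of the audio component it corresponds to."""
    record = {"corrected": corrected, "audio_id": audio_id}
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def replay_corrections(path: str,
                       update_engine: Callable[[str, int], None]) -> int:
    """Author side, at a later time: read the transferred file and hand
    each record to the engine. `update_engine` is a hypothetical stand-in
    for whatever model-update call the engine provides.
    Returns the number of corrections passed on."""
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                update_engine(record["corrected"], record["audio_id"])
                count += 1
    return count
```

Keeping the audio identifier next to each corrected string is what lets the engine pair the right text with the original utterance when its models are updated.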

23. A data processing method comprising the steps of:
- receiving recognition data from a speech recognition engine and audio data, said recognition data including a string of recognised characters and audio identifiers identifying audio components corresponding to a character component of the recognised characters;
  
  storing the received audio data;
  
  inputting the recognised characters to a processor for the processing of the characters to at least one of replace, insert, move and position the characters to form a processed character string;
  
  forming link data linking the audio identifiers to the character component positions in the character string and updating said link data after processing to maintain the link between the audio identifiers and the character component positions in the processed character string;
  
  displaying the characters input to and processed by the processor;
  
  selecting displayed characters for audio playback, whereby said link data identifies any selected audio components, if present, which are linked to the selected characters; and
  
  playing back the selected audio components in the order of the character component positions in the character string or processed character string.
- View Dependent Claims (27, 29, 31, 32, 33, 34, 36, 37, 40)
- - 27. A method as claimed in any one of claims 23 to 26 wherein said link data comprises a list of character locations in the characters and positions of the corresponding audio components in the audio data.
  - 29. A method as claimed in any one of claims 23 to 28 wherein said recognition data includes recognition status indicators to indicate whether each recognised character is a character finally selected as recognised by said speech recognition engine or a character which is the most likely at that time but which is still being recognised by said speech recognition engine, the method including the steps of detecting said recognition status indicators, displaying characters which are still being recognised differently to the characters which have been recognised, and forming said link data by linking the positions of the recognised characters in the characters to the positions of the corresponding audio components in the audio data.
  - 31. A method as claimed in any one of claims 23 to 30 wherein said recognition data includes a likelihood indicator for each character in the character string indicating the likelihood that the character is correct, the method including the steps of detecting possible errors in recognition of characters in the character string by scanning the likelihood indicators for the recognised characters, and detecting if the likelihood indicator for a character is below a likelihood threshold;
    - highlighting the character having a likelihood indicator below the likelihood threshold;
      
      if the highlighted character is an incorrectly recognised character, selecting a character to replace an incorrectly recognised character highlighted in the characters; and
      
      replacing the incorrectly recognised character with the selected character to correct the characters.
  - 32. A method as claimed in any one of claims 23 to 31 including the steps of storing the characters as a file;
    - selectively disabling one of the importation of recognised characters into the processor and the recognition of speech by said speech recognition engine for a period of time;
      
      storing the received audio data during said period of time as an audio message associated with the file;
      
      at a later time, reading said file for input to the processor; and
      
      allowing a user to select whether to read and playback said audio message associated with said file.
  - 33. A method as claimed in claim 32 wherein said audio message can be read and played back at any time said file is open in the processor.
  - 34. A method as claimed in any one of claims 23 to 33 including the step of allowing a user to select to playback the audio data for the most recent passage of dictated characters.
  - 36. A method as claimed in claim 35 wherein said recognition data includes alternative characters, the correcting step at said editor work station, comprising the steps of displaying a choice list comprising the alternative characters, and selecting one of the alternative characters or entering a new character.
  - 37. A method as claimed in claim 35 or claim 36 including the steps at said editor work station of selecting recognised characters which are to be used to provide contextual correcting parameters to said speech recognition engine at said author work station;
    - storing said contextual correcting parameters in a contextual correction file; and
      
      transferring said contextual correction file to said author work station for later updating of models used by said speech recognition engine; and
      
      at said author work station, at a later time, reading the transferred contextual correction file and passing the data contained therein to said speech recognition engine.
  - 40. A method as claimed in claim 39 wherein the audio message can be read and played back at any time said file is open in the processor.
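
The automatic error detection of claim 31 amounts to scanning the likelihood indicators and highlighting any character component whose indicator falls below a threshold. A minimal sketch follows, assuming likelihoods are floats between 0 and 1 and that highlighting is shown by square brackets; both are illustrative assumptions, as are the function names.

```python
from typing import List, Sequence


def flag_low_likelihood(likelihoods: Sequence[float], threshold: float) -> List[int]:
    """Indices of components whose likelihood is below the threshold."""
    return [i for i, p in enumerate(likelihoods) if p < threshold]


def render_highlighted(words: Sequence[str], flagged: Sequence[int]) -> str:
    """Join the words, wrapping flagged ones in brackets as a stand-in
    for on-screen highlighting."""
    marked = set(flagged)
    return " ".join(f"[{w}]" if i in marked else w
                    for i, w in enumerate(words))


words = ["the", "cat", "sat"]
likelihoods = [0.95, 0.40, 0.80]
flagged = flag_low_likelihood(likelihoods, threshold=0.5)
print(render_highlighted(words, flagged))   # the [cat] sat
```

A flagged component is only a candidate error; per the claim, the user still decides whether to replace it.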

43. Data processing apparatus comprising means for receiving recognition data from a speech recognition engine and corresponding audio data;
- the recognition data including recognised characters;
  
  display means for displaying the recognised characters;
  
  storage means for storing the recognised characters as a file;
  
  means for selectively disabling one of the display and storage of the recognised characters and the speech recognition engine for a period of time; and
  
  means for storing the received audio data during said period of time in said storage means as an audio message associated with the file.
- View Dependent Claims (44, 45, 47, 48)
- - 44. Data processing apparatus as claimed in claim 43 including reading means for reading the file for display on said display means and for reading said audio message associated with the file;
    - and audio play back means for playing back the read audio message.
  - 45. Data processing apparatus comprising means for reading a file and associated audio message stored using the data processing apparatus of claim 43, display means for displaying the file, and audio playback means for playing back the audio message.
  - 47. Data processing apparatus as claimed in claim 46 including reading means for reading the file for display on said display means and for reading the corresponding audio data;
    - and audio playback means for playing back the read audio data.
  - 48. Data processing apparatus comprising means for reading a file and corresponding audio data stored using the data processing apparatus of claim 46, display means for displaying the file, and audio playback means for playing back the read audio data.
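
Claim 43 describes temporarily disabling recognition so that incoming audio is kept as an audio message associated with the file instead of being turned into text. One way to picture this is a session object with a switch, sketched below. The class name, the `recognise` callable, and the use of raw byte chunks are assumptions for illustration only.

```python
from typing import Callable, List


class DictationFile:
    """Hypothetical session: routes audio either to recognition or to an
    audio message stored alongside the recognised text."""

    def __init__(self, recognise: Callable[[bytes], str]) -> None:
        self.recognise = recognise          # stand-in for the engine
        self.text: List[str] = []           # recognised characters
        self.audio_message = bytearray()    # audio kept while disabled
        self.recognition_enabled = True

    def feed(self, chunk: bytes) -> None:
        """Handle one chunk of received audio data."""
        if self.recognition_enabled:
            self.text.append(self.recognise(chunk))
        else:
            # Recognition is disabled for this period: keep the raw
            # audio as a message associated with the file.
            self.audio_message.extend(chunk)
```

A later reader of the file could then display `text` and play back `audio_message`, as in claims 44 and 45.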

46. Data processing apparatus comprising means for receiving data from a speech recognition engine and corresponding audio data, the recognition data including recognised characters;
- display means for displaying the recognised characters;
  
  storage means for storing the recognised characters as a file and for storing the corresponding audio data.

49. Data processing apparatus comprising means for receiving recognition data from a speech recognition engine and corresponding audio data, said recognition data including recognised characters representing the recognised characters and audio identifier identifying the audio component corresponding to a character in the recognised characters;
- storage means for storing said audio data and the recognised characters;
  
  display means for displaying the recognised characters received from said speech recognition means or retrieved from said storage means;
  
  user operable selection and correction means for selecting and correcting any displayed recognised characters;
  
  audio playback means for playing back any audio component corresponding to the selected characters to aid correction; and
  
  speech recognition update means for sending the corrected character and the audio identifier for the audio component corresponding to the corrected character to the speech recognition engine.

50. Data correction apparatus comprising means for receiving recognition data from a speech recognition engine, said recognition data including recognised characters representing the most likely characters, and a likelihood indicator for each character indicating the likelihood that the character is correct;
- display means for displaying the recognised characters;
  
  automatic error detection means for detecting possible errors in recognition of characters in the recognised characters by scanning the likelihood indicators for the recognised characters and detecting if the likelihood indicator for a character is below a likelihood threshold, whereby said display means highlights at least the first, if any, character having a likelihood indicator below the likelihood threshold;
  
  user operable selection means for selecting a character to replace an incorrectly recognised character highlighted in the recognised characters; and
  
  correction means for replacing the incorrectly recognised character with the selected character to correct the recognised characters.
- View Dependent Claims (51)
- - 51. Data processing apparatus as claimed in claim 50 including likelihood threshold adjustment means operable by a user to adjust and set the likelihood threshold to a desired level.
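
As a rough illustration of the error detection recited in claims 50 and 51, the Python sketch below scans per-character likelihood indicators against a user-adjustable threshold and reports which entries a display would highlight. The `RecognisedChar` class, the 0-to-1 likelihood scale, and the function names are assumptions made for this example; the claims do not specify any data layout.

```python
from dataclasses import dataclass


@dataclass
class RecognisedChar:
    text: str          # recognised character or word
    likelihood: float  # assumed scale: 0.0 (unlikely) to 1.0 (certain)


def find_low_likelihood(chars, threshold):
    """Return the indices of entries whose likelihood is below the threshold."""
    return [i for i, c in enumerate(chars) if c.likelihood < threshold]


def correct(chars, index, replacement):
    """Replace a flagged entry with the user-selected replacement.

    Giving the replacement a likelihood of 1.0 is an assumption of this sketch.
    """
    chars[index] = RecognisedChar(replacement, 1.0)


words = [
    RecognisedChar("the", 0.95),
    RecognisedChar("cat", 0.40),
    RecognisedChar("sat", 0.88),
    RecognisedChar("hat", 0.55),
]

# Claim 51: the threshold is a user-adjustable setting.
flagged = find_low_likelihood(words, threshold=0.6)
first_to_highlight = flagged[0] if flagged else None
```

Lowering the threshold flags fewer entries, which corresponds to the trade-off the adjustable threshold of claim 51 gives the user.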

52. A computer usable medium having computer readable instructions stored therein for causing a processor in a data processing apparatus to process signals defining a string of characters and corresponding audio data to display the characters and selectively play the audio data, the instructions comprising instructions for:
- a) causing the processor to receive the signals from a speech recognition engine, the recognition signals including recognised characters and audio identifier identifying the audio components corresponding to character components in the recognised characters;
  
  b) causing the processor to process the signals to manipulate the characters;
  
  c) causing the processor to process the signals to form link data linking the audio identifier to the character component positions in the character string;
  
  d) causing the processor to generate an image of the characters on a display;
  
  e) causing the processor to receive a selection signal generated by a user and to identify any audio components corresponding to the selected characters; and
  
  f) causing the processor to send the identified audio components in the order of the character component positions in the characters to an audio play back device.
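
Steps c), e), and f) of claim 52 can be pictured with a small Python sketch: link data maps each audio identifier to a character position, and a selection is resolved to audio identifiers ordered by position. The record layout (`start`, `length`, `audio_id`), the single-space word separator, and the function names are assumptions for illustration only, not the format used in the patent.

```python
def build_links(recognised):
    """Build link data from (word, audio_id) pairs.

    Words are assumed to be joined by single spaces in the displayed string.
    """
    links = []
    position = 0
    for word, audio_id in recognised:
        links.append({"start": position, "length": len(word), "audio_id": audio_id})
        position += len(word) + 1  # skip the separating space
    return links


def audio_for_selection(links, sel_start, sel_end):
    """Return audio identifiers overlapping [sel_start, sel_end), in text order."""
    hits = [
        link for link in links
        if link["start"] < sel_end and link["start"] + link["length"] > sel_start
    ]
    hits.sort(key=lambda link: link["start"])
    return [link["audio_id"] for link in hits]


# Displayed string: "hello big world"
links = build_links([("hello", 7), ("big", 8), ("world", 9)])
to_play = audio_for_selection(links, 6, 15)  # selection covering "big world"
```

Selected text with no linked audio (for example, typed rather than dictated characters) simply yields an empty list, matching the "any audio components" wording of step e).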

53. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining a string of characters and audio data to store the characters and the audio data, the instructions comprising instructions for a) causing the processor to receive the signals from a speech recognition engine;
- b) causing the processor to generate an image of the characters on a display;
  
  c) causing the processor to store the characters as a file;
  
  d) causing the processor to selectively disable one of the display and storage of the characters and the speech recognition engine for a period of time; and
  
  e) causing the processor to store the audio signal for the period of time as an audio message associated with the file.
- View Dependent Claims (54)
- - 54. A computer usable medium as claimed in claim 53 including instructions for a) causing the processor to read the stored characters and audio signal;
    - b) causing the processor to generate an image of the characters for display; and
      
      c) causing the processor to send the audio signal to an audio play back device.

55. A computer usable medium having computer readable instructions stored therein for causing a processor in a data processing apparatus to process signals defining a string of characters and corresponding audio data to store the characters and the audio data, the instructions comprising instructions for:
- a) causing the processor to receive the signals from a speech recognition engine;
  
  b) causing the processor to generate an image of the characters for display; and
  
  c) causing the processor to store the characters as a file and to store the corresponding audio signal.
- View Dependent Claims (58, 60, 61, 62)
- - 58. An editor work station for editing the text stored by the data processing apparatus of claim 57, the editor work station comprising reading means for reading the characters, link data, and audio data;
    - editor processing means for processing the characters;
      
      editor link means for linking the audio data to the character component positions using the link data;
      
      editor display means for displaying the characters being processed;
      
      editor correction means for selecting and correcting any displayed characters which have been incorrectly recognised;
      
      editor audio playback means for playing back any audio component corresponding to the selected characters to aid correction;
      
      editor speech recognition update means for storing the corrected characters and the audio identifier for the audio component corresponding to the corrected characters in a character correction file for later reading by the speech recognition engine of said data processing apparatus to update models used by said speech recognition engine; and
      
      writing means for storing the correct characters and link data and the audio data.
  - 60. A data processing arrangement as claimed in claim 59 wherein said data processing apparatus includes processing means for receiving and processing the input recognised characters to replace, insert, move and/or position the recognised characters;
    - user operable selection means for selecting characters in the displayed characters for audio playback, where said link data identifies any selected audio components, if present, which are linked to the selected characters; and
      
      audio playback means for playing back the selected audio components in the order of the character component positions in the character string.
  - 61. An editor work station for use with the data processing arrangement as claimed in claim 59, said editor work station comprising:
    - data reading means for reading the characters, link data, and audio data from said data processing apparatus;
      
      editor processing means for processing characters;
      
      editor link means for linking the audio data to the character component position using the link data;
      
      editor display means for displaying the read characters;
      
      editor correction means for selecting and correcting any displayed characters which have been incorrectly recognised;
      
      editor audio playback means for playing back any audio component corresponding to the selected characters to aid correction;
      
      editor speech recognition update means for storing the corrected character and the audio identifier for the audio component corresponding to the corrected character in a character correction file; and
      
      data transfer means for transferring the character correction file to said data processing apparatus for later updating of models used by said speech recognition engine.
  - 62. Data processing apparatus for use with the data processing arrangement of claim 59, said data processing apparatus comprising:
    - input means for receiving recognition data and corresponding audio data from a speech recognition engine, said recognition data including a string of recognised characters and audio identifiers identifying audio components corresponding to character components of the recognised characters;
      
      link means for forming link data linking the audio identifiers to the character component positions in the character string;
      
      storage means for storing said audio data received from said input means, said link data, and said characters;
      
      display means for displaying the recognised characters; and
      
      correction file reading means for reading said character correction file and for passing the data contained therein to said speech recognition engine.

56. A computer usable medium having computer readable instructions stored therein for causing a processor in a data processing apparatus to process signals defining a string of characters and corresponding audio data from a speech recognition engine to update the models used by speech recognition engine, the instructions comprising instructions for:
- a) causing the processor to receive the characters, audio data, and audio identifiers from the speech recognition engine, said audio identifier identifying audio components corresponding to components in the characters;
  
  b) causing the processor to store the audio data and the characters, in a storage device;
  
  c) causing the processor to generate an image for display of the characters received from the speech recognition engine or retrieved from the storage device;
  
  d) causing the processor to receive a selection signal generated by a user to select characters which have been incorrectly recognised by the speech recognition engine;
  
  e) causing the processor to retrieve any audio component from the storage device corresponding to the selected characters and to send the retrieved audio to an audio play back device;
  
  f) causing the processor to receive corrected characters input by a user and to replace the incorrect characters with the corrected characters; and
  
  g) causing the processor to send the corrected characters and the audio identifier for the audio component corresponding to the corrected characters to the speech recognition engine for the correction of models used by the speech recognition engine.

59. A data processing arrangement comprising:
- data processing apparatus comprising;
  
  input means for receiving recognition data from a speech recognition engine and corresponding audio data, said recognition data including a string of recognised characters and audio identifiers identifying audio components corresponding to character components of the recognised characters;
  
  link means for forming link data linking the audio identifiers to the character component positions in the character string;
  
  storage means for storing said audio data received from said input means, said link data, and said recognised characters; and
  
  display means for displaying the recognised characters; and
  
  an editor work station comprising;
  
  data reading means for obtaining the characters, link data, and audio data from said data processing apparatus;
  
  editor processing means for processing the characters;
  
  editor link means for linking the audio data to the character component position using the link data;
  
  editor display means for displaying the characters being processed;
  
  editor correction means for selecting and correcting any displayed characters which have been incorrectly recognised;
  
  editor audio playback means for playing back any audio component corresponding to the selected characters to aid correction;
  
  editor speech recognition update means for storing the corrected characters and the audio identifier for the audio component corresponding to the corrected character in a character correction file; and
  
  data transfer means for transferring the character correction file to said data processing apparatus for later updating of models used by said speech recognition engine;
  
  said data processing apparatus including correction file reading means for reading said character correction file to pass the data contained therein to said speech recognition engine.
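
The correction-file round trip in claim 59 (the editor work station stores corrections, the file is transferred, and the data processing apparatus reads it and passes the contents to the engine) might be sketched as follows. The JSON Lines file format, the `FakeEngine` class, and its `update_model` method are hypothetical stand-ins; the claim does not define a file format or an engine interface.

```python
import json
import os
import tempfile


def append_correction(path, corrected_text, audio_id):
    """Editor side: record a corrected character string and its audio identifier."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"text": corrected_text, "audio_id": audio_id}) + "\n")


def read_corrections(path):
    """Apparatus side: read every record from the correction file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


class FakeEngine:
    """Hypothetical stand-in for a speech recognition engine."""

    def __init__(self):
        self.updates = []

    def update_model(self, text, audio_id):
        self.updates.append((text, audio_id))


def apply_corrections(path, engine):
    """Pass each stored correction to the engine for later model updating."""
    for record in read_corrections(path):
        engine.update_model(record["text"], record["audio_id"])


with tempfile.TemporaryDirectory() as tmp:
    correction_file = os.path.join(tmp, "corrections.jsonl")
    append_correction(correction_file, "their", 12)
    append_correction(correction_file, "two", 31)
    engine = FakeEngine()
    apply_corrections(correction_file, engine)
```

Keeping the corrections in a separate file lets the editor work station operate without direct access to the engine, which is the separation the claim describes.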

63. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining a string of characters and audio data to store the characters and the audio data, the instructions comprising instructions for a) causing the processor to receive the signals from a speech recognition engine;
- b) causing the processor to generate an image of the characters on a display;
  
  c) causing the processor to store the characters as a file;
  
  d) causing the processor to selectively disable one of the display and storage of the characters and the speech recognition engine for a period of time; and
  
  e) causing the processor to store the received audio data during said period of time as an audio message associated with the file.
- View Dependent Claims (64)
- - 64. A computer usable medium as claimed in claim 63 including instructions for a) causing the processor to read the stored characters and audio signal;
    - b) causing the processor to generate an image of the characters for display; and
      
      c) causing the processor to send the audio data to an audio play back device.

65. Data processing apparatus comprising input means for inputting audio data;
- means for receiving recognition data from a speech recognition engine, said recognition data including recognised characters corresponding to input audio data;
  
  storage means for storing the recognised characters in a file and for storing the audio data; and
  
  user operable selection means for selecting one of the recognised characters and corresponding audio data for storage in said storage means, or the audio data for which there are no corresponding recognised characters for storage in said storage means in association with a file of recognised characters.
- View Dependent Claims (66, 68, 70)
- - 66. Data processing apparatus according to claim 65 including means for reading the file of recognised characters and the audio data associated with the file for which there are no corresponding recognised characters, and audio output means for audibly outputting the read audio data associated with the file.
  - 68. A data processing method according to claim 67 including the step of reading the file of recognition data and the audio data associated with the file for which there is no corresponding recognition data, and audibly outputting the read audio data associated with the file.
  - 70. Speech recognition apparatus according to claim 69 including means for reading one of the stored files of recognised characters for visible output on said visible output means, and for reading any audio message associated with the read file;
    - and audible output means for audibly outputting any read audio message.

67. A data processing method comprising the steps of:
- inputting audio data; and
  
  selecting one of receiving recognition data from a speech recognition engine and storing the recognition data in a file, said recognition data including recognised characters corresponding to input audio data, or storing the input audio data for which there is no corresponding recognition data in association with a file of recognition data.

69. Speech recognition apparatus comprising:
- input means for inputting speech data;
  
  recogniser means for receiving input speech data and for selectively performing speech recognition to generate recognised characters;
  
  output means for visibly outputting the recognised characters;
  
  storage means for storing the input speech data and recognised characters;
  
  storage control means for controlling said storage means to store recognised characters from said recogniser means corresponding to a portion of input speech data as a file when said recogniser means is operating, and for controlling said storage means to store a portion of the input speech data in association with a file of recognised characters as an audio message when said recogniser means is not operating.
- View Dependent Claims (71)
- - 71. Speech recognition apparatus comprising means for reading a file of recognised characters and any associated audio message stored using the apparatus of claim 69;
    - output means for visibly outputting the recognised characters of the read file; and
      
      audible output means for audibly outputting any read audio message.

72. A speech recognition method comprising the steps of inputting speech data;
- selectively performing speech recognition on input speech data to generate the recognised characters;
  
  visibly outputting the recognised characters;
  
  storing recognised characters corresponding to a portion of input speech data as a file when speech recognition is performed; and
  
  storing a portion of the input speech data in association with a file of recognised characters as an audio message when speech recognition is not performed.
- View Dependent Claims (73, 74)
- - 73. A speech recognition method according to claim 72 including the steps of reading one of the stored files of recognised characters for visible output, and reading any audio message associated with the read file;
    - and audibly outputting any read audio message.
  - 74. A speech recognition method comprising the steps of reading a file of recognised characters and any associated audio message stored using the method of claim 72;
    - visibly outputting the recognised characters of the read file; and
      
      audibly outputting any read audio message.
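
The selective behaviour of claim 72 can be sketched in Python: speech that arrives while recognition is active is stored as recognised characters, and speech that arrives while recognition is off is stored as an audio message associated with the same file. The `DictationFile` class, the segment tuples, and the `recognise` callable are assumptions of this example, and grouping consecutive unrecognised segments into one message is a design choice of the sketch rather than something the claim requires.

```python
from dataclasses import dataclass, field


@dataclass
class DictationFile:
    characters: list = field(default_factory=list)      # recognised text
    audio_messages: list = field(default_factory=list)  # raw audio kept as messages


def process(segments, recognise, doc):
    """Route each (audio_bytes, recognition_on) segment into the file."""
    pending = bytearray()
    for audio, recognition_on in segments:
        if recognition_on:
            if pending:
                # Close the audio message collected while recognition was off.
                doc.audio_messages.append(bytes(pending))
                pending.clear()
            doc.characters.append(recognise(audio))
        else:
            pending.extend(audio)
    if pending:
        doc.audio_messages.append(bytes(pending))
    return doc


def toy_recogniser(audio):
    """Toy recogniser used only for this example."""
    return audio.decode().upper()


segments = [(b"a", True), (b"b", False), (b"c", False), (b"d", True)]
doc = process(segments, toy_recogniser, DictationFile())
```

Reading the file back, as in claims 73 and 74, would then amount to displaying `doc.characters` and playing each entry of `doc.audio_messages`.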

75. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining recognition data from a speech recognition engine and audio data to store the recognition data and the audio data, the instructions comprising instructions for a) causing the processor to receive audio data signals;
- b) causing the processor to receive recognition data signals from a speech recognition engine; and
  
  c) selectively causing the processor to store the recognition data signals in a file and to store corresponding audio data signals in storage means, or to store the audio data signals for which there is no corresponding recognition data signals in association with a file of recognition data signals.
- View Dependent Claims (76)
- - 76. A computer usable medium according to claim 75 including instructions for a) causing the processor to read the stored recognition data signals and audio data signals;
    - and b) causing the processor to generate an audible output using the read audio data signals.

77. A computer usable medium having computer readable instructions stored therein for causing a processor in a speech recognition apparatus to process signal defining recognised characters and audio data to store the recognised characters and audio data, the instructions comprising instructions for a) causing the processor to receive input speech data;
- b) causing the processor to perform speech recognition on the input speech data to generate recognised characters;
  
  c) causing the processor to visibly output the recognised characters;
  
  d) causing the processor to store the input speech data and recognised characters, in storage means;
  
  e) causing the processor to control said storage means to store recognised characters corresponding to a portion of input speech data as a file when speech recognition is being carried out, and to control said storage means to store a portion of the input speech data in association with a file of recognised characters as an audio message when speech recognition is not being carried out.
- View Dependent Claims (78)
- - 78. A computer usable medium according to claim 77 including instructions for a) causing the processor to read one of the stored files of recognised characters and any associated audio message;
    - b) causing the processor to visibly output the read recognised characters; and
      
      c) causing the processor to audibly output any read audio message.

Current Assignee: Allvoice Developments US LLC
Original Assignee: Allvoice Computing PLC (Allvoice Developments US LLC)
Inventors: Mitchell, John C.; Daniel, Nicholas John; Corbett, Steven Norman; Heard, Alan James

Granted Patent: US 6,961,700 B2
US Class (Current): 704/231
CPC Class Codes:
- G06F 40/137   Hierarchical processing, e....
- G06F 40/58   Use of machine translation,...
- G10L 15/22   Procedures used during a sp...
- G10L 15/28   Constructional details of s...
- G10L 2015/225   Feedback of the input speech
