Automated proofreading using interface linking recognized words to their audio data while text is being changed

US 5,799,273 A
Filed: 09/27/1996
Issued: 08/25/1998
Est. Priority Date: 09/24/1996
Status: Expired due to Term

Abstract

Data processing apparatus is disclosed for receiving recognition data from a speech recognition engine and its corresponding dictated audio data where the recognition data includes recognised words or characters. A display displays the recognised words or characters and the recognised words or characters are stored as a file together with the corresponding audio data. Link data is formed to link the position of the words or characters in the file and the position of the corresponding audio component in the audio data. The recognised words or characters can be processed without losing the audio data.
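
The link data described in the abstract can be pictured with a minimal sketch. Everything in it is an illustrative assumption rather than the patent's implementation: the `Link` record, the use of character offsets for word positions, and millisecond offsets for audio components are all hypothetical. It shows how a selection of text could be mapped to audio components and played back in the order of the word positions, even after the words have been rearranged.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One hypothetical link-data entry: a word position and its audio component."""
    word_pos: int        # character offset of the word in the text (assumed unit)
    audio_start_ms: int  # start of the audio component (assumed unit)
    audio_end_ms: int    # end of the audio component (assumed unit)


def audio_for_selection(links, sel_start, sel_end):
    """Return audio segments for words in [sel_start, sel_end), ordered by word position."""
    hits = sorted(
        (link for link in links if sel_start <= link.word_pos < sel_end),
        key=lambda link: link.word_pos,
    )
    return [(link.audio_start_ms, link.audio_end_ms) for link in hits]


# Dictated as "sat the cat", then rearranged in the text to "the cat sat".
links = [
    Link(word_pos=8, audio_start_ms=0, audio_end_ms=400),     # "sat"
    Link(word_pos=0, audio_start_ms=400, audio_end_ms=700),   # "the"
    Link(word_pos=4, audio_start_ms=700, audio_end_ms=1200),  # "cat"
]

# Selecting "the cat" (offsets 0 to 8) yields the two matching audio segments.
segments = audio_for_selection(links, 0, 8)
```

Because the lookup sorts by word position rather than by dictation order, playback follows the edited text, which is the behaviour the abstract attributes to the link data.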

78 Claims

1. Data processing apparatus comprising:
- input means for receiving recognition data from a speech recognition engine and corresponding audio data, said recognition data including a string of recognised words and audio identifiers identifying audio components corresponding to each recognised word;
  
  storage means for storing said audio data received from said input means;
  
  interface application program means comprising means for receiving the input recognised words, means for placing the recognised words into positions in text in a processing application program means to allow the processing of the recognised words to change the positions of the recognised words to form a processed word string, means for determining the positions of the recognised words in said processing application program means, means for monitoring changes in the positions of the recognised words, and means for forming link data linking the audio data to the recognised words, said link data comprising the audio identifiers and the determined positions of corresponding recognised words, said interface application program means including means for updating said link data in response to monitored changes in positions of the recognised words;
  
  display means for displaying the recognised words received and processed by said processing application program means;
  
  user operable selection means for selecting at least one word in the displayed words, said interface application program means including means for identifying any audio components, if present, which are linked to the at least one selected word; and
  
  audio playback means for playing back any identified audio components in the order of the word positions in the word string or the processed word string.
- - 2. Data processing apparatus as claimed in claim 1 wherein said storage means also stores the recognised words and the link data, the apparatus including storage reading means for reading the stored recognised words into said processing application program means and for reading the stored link data for use by said interface application program means.
  - 3. Data processing apparatus as claimed in claim 1 including user operable correction means for selecting a displayed recognised word which has been incorrectly recognised;
    - correction audio playback means for controlling said audio playback means to play back any audio component corresponding to the selected word to aid correction; and
      
      speech recognition update means for sending the corrected word and the audio identifier for the audio component corresponding to the corrected word to the speech recognition engine.
  - 4. Data processing apparatus as claimed in claim 3 wherein said recognition data includes alternative words, said display means including means to display a choice list comprising the alternative words, and said selecting and correcting means includes means to select one of the alternative words or to enter a new word.
  - 5. Data processing apparatus as claimed in claim 1 wherein said audio identifiers comprise a list of positions of the corresponding audio components in the audio data.
  - 6. Data processing apparatus as claimed in claim 5 wherein said word string is formed of a plurality of separately dictated passages of words, said storage means stores said audio data for each dictated passage of words in a separate file, and said memory means stores a list identifying the files and positions in the files of the audio components in said audio data corresponding to the word locations in the word string.
  - 7. Data processing apparatus as claimed in claim 1 wherein said recognition data includes recognition status indicators to indicate whether each recognised word is a word finally selected as recognised by said speech recognition engine or a word which is the most likely at that time but which is still being recognised by said speech recognition engine, the apparatus including status detection means for detecting said recognition status indicators, and display control means to control said display means to display words which are still being recognised differently to words which have been recognised, said interface application program means being responsive to said recognition status indicators to link the recognised words to the corresponding audio component in the audio data.
  - 8. Data processing apparatus as claimed in claim 1 including contextual update means operable by a user to select recognised words which are to be used to provide contextual correcting parameters to said speech recognition engine, and to send said contextual correcting parameters to said speech recognition engine.
  - 9. Data processing apparatus as claimed in claim 1 wherein said recognition data includes a likelihood indicator for each word in the word string indicating the likelihood that the word is correct, and said link means stores the likelihood indicators, the apparatus including automatic error detection means for detecting possible errors in recognition of words in the recognised words by scanning the likelihood indicators in said link means for the recognised words and detecting if the likelihood indicator for a word is below a likelihood threshold, whereby said display means highlights the word having a likelihood indicator below the likelihood threshold;
    - second user operable selection means for selecting a word to replace an incorrectly recognised word highlighted in the recognised words; and
      
      correction means for replacing the incorrectly recognised word with the selected word to correct the recognised words.
  - 10. Data processing apparatus as claimed in claim 1 including file storage means for storing the recognised words in a file;
    - means for selectively disabling one of the receipt of the recognised words by said processing application program means and the recognition of speech by said speech recognition engine for a period of time, means for storing the audio data for the period of time in said storage means as an audio message associated with the file; and
      
      storage reading means for reading said file for input to said processing application program means, and for reading said audio message for playback by said audio playback means.
  - 11. Data processing apparatus as claimed in claim 10 wherein said storage reading means is controllable by a user to read said audio message at any time after said file has been input to said processing application program means until said processing application program means is no longer processing said file.
  - 12. Data processing apparatus as claimed in claim 1 wherein said user operable selection means is operative to allow a user to select to playback the audio data for the most recent passage of dictated words, or to select words and play back the corresponding audio components.
  - 13. Data processing apparatus as claimed in claim 1 wherein said interface application program means is operative to determine and monitor the positions of the recognised words by determining and monitoring the position of a first letter of each of the recognised words in text of said processing application program means, and said link data comprises the audio identifiers and the determined positions of the first letter of corresponding recognised words.
  - 14. Data processing apparatus as claimed in claim 1 further comprising processing means operative under the control of a computer operating system, wherein said interface application program means comprises an interface application program implemented from within said computer operating system, said processing application program means comprises a processing application program implemented from within said computer operating system, and said interface application program is operative to determine and monitor the positions of the recognised words using operating system functions communicated via the computer operating system.
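
Claims 1 and 13 describe updating the link data when monitored word positions change, with positions tracked by the first letter of each word. The following is a rough sketch of one way such an update could work, not the patented mechanism itself. The `apply_edit` function, the dictionary layout (first-letter offset mapped to an audio identifier), and the rule of dropping a link when a word's first letter is deleted are all assumptions made for illustration.

```python
def apply_edit(links, edit_pos, deleted, inserted):
    """Return updated link data after an edit to the text.

    links    -- dict mapping first-letter offset -> audio identifier (assumed layout)
    edit_pos -- offset at which the edit starts
    deleted  -- number of characters removed at edit_pos
    inserted -- number of characters added at edit_pos
    """
    delta = inserted - deleted
    updated = {}
    for pos, audio_id in links.items():
        if pos < edit_pos:
            # Word starts before the edit: position unchanged.
            updated[pos] = audio_id
        elif pos >= edit_pos + deleted:
            # Word starts after the edited span: shift by the net length change.
            updated[pos + delta] = audio_id
        # Otherwise the word's first letter was deleted, so its link is dropped
        # (an assumption of this sketch).
    return updated


# "the cat sat": first letters at offsets 0, 4, 8 with audio identifiers 0, 1, 2.
links = {0: 0, 4: 1, 8: 2}

# Typing "big " (4 characters) before "cat" shifts the later words.
after_insert = apply_edit(links, edit_pos=4, deleted=0, inserted=4)

# Deleting "cat " (4 characters at offset 4) removes that word's link.
after_delete = apply_edit(links, edit_pos=4, deleted=4, inserted=0)
```

Typed text such as "big " never receives a link in this sketch, which matches the claim language that audio components are identified only "if present" for a selected word.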

15. A data processing arrangement comprising:
- a data processing apparatus, the data processing apparatus comprising;
  
  input means for receiving recognition data from a speech recognition engine and corresponding audio data, said recognition data including a string of recognised words and audio identifiers identifying audio components corresponding to each recognised word;
  
  interface application program means comprising means for receiving the input recognised words, means for placing the recognised words into positions in text in a processing application program means to allow the processing of the recognised words to change the positions of the recognised words to form a processed word string, means for determining the positions of the recognised words in said processing application program means, means for monitoring changes in the positions of the recognised words, and means for forming link data linking the audio data to the recognised words, said link data comprising the audio identifiers and the determined positions of corresponding recognised words, said interface application program means including means for updating said link data in response to monitored changes in positions of the recognised words;
  
  storage means for storing said recognition data and audio data received from said input means, and for storing said link data;
  
  display means for displaying the recognised words received and processed by said processing application program means;
  
  user operable selection means for selecting at least one word in the displayed words, said interface application program means including means for identifying any audio components, if present, which are linked to the at least one selected word; and
  
  audio playback means for playing back any identified audio components in the order of the word positions in the word string or the processed word string; and
  
  an editor work station comprising;
  
  data reading means for reading the words, link data, and audio data from said data processing apparatus;
  
  editor processing means for processing the words;
  
  editor link means for linking the audio data to the word positions using the link data;
  
  editor display means for displaying the words being processed;
  
  editor correction means for selecting and correcting any displayed words which have been incorrectly recognised;
  
  editor audio playback means for playing back an audio component corresponding to any selected words to aid correction;
  
  editor speech recognition update means for storing the corrected words and the audio identifier for the audio component corresponding to the corrected word in a word correction file; and
  
  data transfer means for transferring the word correction file to said data processing apparatus for later updating of models used by said speech recognition engine;
  
  said data processing apparatus including correction file reading means for reading said word correction file to pass the data contained therein to said speech recognition engine for the updating of the models used by said speech recognition engine.
- - 16. A data processing arrangement as claimed in claim 15 wherein said recognition data includes alternative words, said editor display means including means to display a choice list comprising the alternative words, and said editor correcting means includes means to select one of the alternative words or to enter a new word.
  - 17. A data processing arrangement as claimed in claim 15 including editor contextual update means operable by a user to select recognised words which are to be used to provide contextual correcting parameters to said speech recognition engine of said data processing apparatus, and to store said contextual correcting parameters in a contextual correction file;
    - said data transfer means being responsive to the contextual correction file to transfer the contextual correction file to said data processing apparatus for later updating of models used by said speech recognition engine;
      
      said correction file reading means of said data processing apparatus being responsive to the contextual correction file to read the contextual correction file to pass the data contained therein to said speech recognition engine.
  - 18. A data processing arrangement as claimed in claim 15 wherein said recognition data includes a likelihood indicator for each word in the word string indicating the likelihood that the word is correct, and said link data includes the likelihood indicators, said editor work station including editor automatic error detection means for detecting possible errors in recognition of words in the recognised words by scanning the likelihood indicators in said recognition data for the words and detecting if the likelihood indicator for a word is below a likelihood threshold, whereby said editor display means highlights words having a likelihood indicator below the likelihood threshold;
    - editor selection means for selecting a word to replace an incorrectly recognised word highlighted in the text; and
      
      second editor correction means for replacing the incorrectly recognised word with the selected word to correct the recognised words.
  - 19. A data processing arrangement as claimed in claim 15 wherein said data processing apparatus includes file storage means for storing the recognised words in a file;
    - means for selectively disabling one of the receipt of the recognised words by said processing application program means and the recognition of speech by said speech recognition engine for a period of time;
      
      means for storing the audio data during the period of time in said storage means as an audio message associated with the file; and
      
      storage reading means for reading said file for input to said processing application program means, and for reading said audio message for playback by said audio playback means;
      
      said editor work station including audio message reading means for reading the audio message associated with words being processed by said editor processing means for playback by said editor audio playback means.
  - 20. A data processing arrangement as claimed in claim 19 wherein said audio message reading means is controllable by a user to read said audio message at any time the associated words are being processed by said editor processing means.
  - 21. An editor work station for use with the data processing arrangement as claimed in claim 15, said editor work station comprising:
    - data reading means for reading the words, link data, and audio data from said data processing apparatus;
      
      editor processing means for processing words;
      
      editor link means for linking the audio data to the word positions using the link data;
      
      editor display means for displaying the read words;
      
      editor correction means for selecting and correcting any displayed words which have been incorrectly recognised;
      
      editor audio playback means for playing back any audio component corresponding to the selected words to aid correction;
      
      editor speech recognition update means for storing the corrected word and the audio identifier for the audio component corresponding to the corrected word in a character correction file; and
      
      data transfer means for transferring the word correction file to said data processing apparatus for later updating of models used by said speech recognition engine.
  - 22. An editor work station as claimed in claim 21 wherein said recognition data includes alternative words, said editor display means including means to display a choice list comprising the alternative words, and said editor correcting means includes means to select one of the alternative words or to enter a new word.
  - 23. An editor work station as claimed in claim 21 including editor contextual update means operable by a user to select recognised words which are to be used to provide contextual correcting parameters to said speech recognition engine of said data processing apparatus, and to store said contextual correcting parameters in a contextual correction file;
    - said data transfer means being responsive to the contextual correction file to transfer the contextual correction file to said data processing apparatus for later updating of models used by said speech recognition engine;
      
      said correction file reading means of said data processing apparatus being responsive to the contextual correction file to read the contextual correction file to pass the data contained therein to said speech recognition engine.
  - 24. An editor work station as claimed in claim 21 wherein said recognition data includes a likelihood indicator for each word in the word string indicating the likelihood that the word is correct, and said link data includes the likelihood indicators, said editor work station including editor automatic error detection means for detecting possible errors in recognition of words in the recognised words by scanning the likelihood indicators in said recognition data for the words and detecting if the likelihood indicator for a word is below a likelihood threshold, whereby said editor display means highlights characters having a likelihood indicator below the likelihood threshold;
    - editor selection means for selecting a word to replace an incorrectly recognised word highlighted in the word string; and
      
      second editor correction means for replacing the incorrectly recognised word with the selected word to correct the recognised word.
  - 25. A data processing arrangement as claimed in claim 15 comprising a plurality of said data processing apparatus connected to a network, and at least one editor work station, wherein each editor work station can access and edit stored words and audio data on a plurality of said data processing apparatus.
  - 26. A data processing arrangement as claimed in claim 15 wherein said interface application program means is operative to determine and monitor the positions of the recognised words by determining and monitoring the position of a first letter of each of the recognised words in text of said processing application program means, and said link data comprises the audio identifiers and the determined positions of the first letter of corresponding recognised words.
  - 27. Data processing arrangement as claimed in claim 15 further comprising processing means operative under the control of a computer operating system, wherein said interface application program means comprises an interface application program implemented from within said computer operating system, said processing application program means comprises a processing application program implemented from within said computer operating system, and said interface application program is operative to determine and monitor the positions of the recognised words using operating system functions communicated via the computer operating system.
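
Claim 15 has the editor work station store each corrected word together with its audio identifier in a word correction file, which is later read at the data processing apparatus and passed to the speech recognition engine. The sketch below illustrates that round trip under stated assumptions: the JSON file format, the function names `write_corrections` and `read_corrections`, and the integer audio identifiers are hypothetical, since the claims do not specify a format.

```python
import json
import os
import tempfile


def write_corrections(path, corrections):
    """Editor side: store (corrected word, audio identifier) pairs in a correction file."""
    records = [{"word": word, "audio_id": audio_id} for word, audio_id in corrections]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)


def read_corrections(path):
    """Author side: read the correction file back as (word, audio identifier) pairs."""
    with open(path, encoding="utf-8") as f:
        return [(rec["word"], rec["audio_id"]) for rec in json.load(f)]


# Round trip through a temporary file standing in for the transferred correction file.
path = os.path.join(tempfile.mkdtemp(), "corrections.json")
write_corrections(path, [("recognise", 17), ("speech", 42)])
restored = read_corrections(path)
```

In the claimed arrangement the restored pairs would then be handed to the speech recognition engine to update its models; that step is omitted here because the engine interface is not described in this section.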

28. A data processing method comprising:
- receiving recognition data from a speech recognition engine and corresponding audio data in an interface application program, said recognition data including a string of recognised words and audio identifiers identifying audio components corresponding to each recognised word;
  
  storing the audio data;
  
  inputting the recognised words into a processing application program which places the words in positions in the application, and which processes the recognised words such that positions of the recognised words are changed to form a processed word string;
  
  using the interface application program to determine the positions of the recognised words in the processing application program, monitor changes in the positions of the recognised words, and to form link data linking the audio data to the recognised words, said link data comprising the audio identifiers and the determined positions of corresponding recognised words, said link data being updated in response to monitored changes in positions of the recognised words;
  
  displaying the recognised words input to and processed by the processor application;
  
  selecting at least one displayed word, whereby said link data identifies any audio components, if present, which are linked to the at least one selected word; and
  
  playing back any selected audio components in the order of the word positions in the word string.
- - 29. A method as claimed in claim 28 wherein the words and the link data are also stored, the method including:
    - reading the stored words into the processor application program and reading the stored link data.
  - 30. A method as claimed in claim 28 including:
    - selecting any displayed words which have been incorrectly recognised, playing back an audio component corresponding to any selected words to aid correction, correcting the incorrectly recognised words, and sending the corrected word and audio identifier for the audio component corresponding to the corrected word to the speech recognition engine.
  - 31. A method as claimed in claim 30 wherein said recognition data includes alternative words, the method includes displaying a choice list when any displayed words have been selected for correction, said choice list comprising said alternative words;
    - and said correction of the incorrectly recognised words comprises selecting one of the alternative words or inputting a new word.
  - 32. A method as claimed in claim 30 including:
    - selecting recognised words which are to be used to provide contextual correcting parameters to said speech recognition engine, and sending the contextual correcting parameters to said speech recognition engine.
  - 33. A method as claimed in claim 28 wherein said audio identifiers comprise a list of positions of the corresponding audio components in the audio data.
  - 34. A method as claimed in claim 33 wherein said word string is formed of a plurality of separately dictated passages of words, the method including:
    - storing said audio data for each dictated passage of words in separate files, said link data including a list identifying the files and positions in the files of the audio components in said audio data corresponding to the word locations in the word string.
  - 35. A method as claimed in claim 28 wherein said recognition data includes recognition status indicators to indicate whether each recognised word is a word finally selected as recognised by said speech recognition engine or a word which is the most likely at that time but which is still being recognised by said speech recognition engine, the method including:
    - detecting said recognition status indicators, displaying words which are still being recognised differently to the words which have been recognised, and forming said link data by linking the positions of the recognised words in the word string to the positions of the corresponding audio components in the audio data.
  - 36. A method as claimed in claim 28 wherein said recognition data includes a likelihood indicator for each word in the word string indicating the likelihood that the word is correct, the method including:
    - detecting possible errors in recognition of words in the word string by scanning the likelihood indicators for the recognised words, and detecting if the likelihood indicator for a word is below a likelihood threshold;
      
      highlighting the word having a likelihood indicator below the likelihood threshold;
      
      if the highlighted word is an incorrectly recognised word, selecting a word to replace an incorrectly recognised word highlighted in the recognised words; and
      
      replacing the incorrectly recognised word with the selected word to correct the recognised words.
  - 37. A method as claimed in claim 28 including:
    - storing the recognised words as a file;
      
      selectively disabling one of the importation of recognised words into the processor application program and the recognition of speech by said speech recognition engine for a period of time;
      
      storing the audio data during the period of time as an audio message associated with the file;
      
      at a later time, reading said file for input to the processor application program; and
      
      allowing a user to select whether to read and playback said audio message associated with said file.
  - 38. A method as claimed in claim 37 wherein said audio message can be read and played back at any time said file is open in the processor.
  - 39. A method as claimed in claim 28 including allowing a user to select to playback the audio data for the most recent passage of dictated words.
  - 40. A method of processing data comprising:
    - at an author work station, carrying out the method as claimed in claim 28 wherein the recognised words, the link data and the audio data are stored; and
      
      at an editor work station, obtaining the stored recognised words, link data and audio data from the author work station;
      
      inputting the recognised words into a processor application program;
      
      linking the audio data to the word positions using the link data;
      
      displaying the words being processed;
      
      selecting any displayed words which have been incorrectly recognised;
      
      playing back any audio component corresponding to the selected words to aid correction;
      
      correcting the incorrectly recognised words;
      
      storing the corrected word and the audio identifier for the audio component corresponding to the corrected word in a word correction file; and
      
      transferring the word correction file to the author work station for later updating of models used by said speech recognition engine;
      
      wherein, at a later time, said word correction file is read at said author work station to pass the data contained therein to said speech recognition engine for updating of said models.
  - 41. A method as claimed in claim 40 wherein said recognition data includes alternative words, and the correction of the incorrectly recognised words at said editor work station, comprises:
    - displaying a choice list comprising the alternative words, and selecting one of the alternative words or entering a new word.
  - 42. A method as claimed in claim 40 including at said editor work station:
    - selecting recognised words which are to be used to provide contextual correcting parameters to said speech recognition engine at said author work station;
      
      storing said contextual correcting parameters in a contextual correction file; and
      
      transferring said contextual correction file to said author work station for later updating of models used by said speech recognition engine; and
      
      at said author work station, at a later time, reading the transferred contextual correction file and passing the data contained therein to said speech recognition engine.
  - 43. A method as claimed in claim 40 wherein said recognition data includes a likelihood indicator for each word in the word string indicating the likelihood that the word is correct, the method including at said editor work station:
    - automatically detecting possible errors in recognition of words by scanning the likelihood indicators for the words;
      
      detecting if the likelihood indicator for a word is below a likelihood threshold, whereby words having a likelihood indicator below the likelihood threshold are displayed highlighted;
      
      selecting a word to replace an incorrectly recognised word highlighted in the word string; and
      
      replacing the incorrectly recognised word with the selected word to correct the recognised words.
  - 44. A method as claimed in claim 40 wherein the method includes:
    - at said author work station, storing the words as a file;
      
      selectively disabling one of the importation of recognised words into the processor application program and the recognition of speech by said speech recognition engine for a period of time;
      
      storing the audio data for the period of time as an audio message associated with the file;
      
      at a later time, reading said file for input to the processor application program; and
      
      at said editor work station, reading the audio message associated with the file being processed by the processor application program, and playing back the read audio message.
  - 45. A method as claimed in claim 44 wherein the audio message can be read and played back at any time said file is open in the processor application program.
  - 46. A method as claimed in claim 40 including allowing a user of the editor work station to playback the audio data for the most recent passage of dictated words.
  - 47. A method as claimed in claim 40 wherein the positions of the recognised words are determined and monitored by determining and monitoring the position of a first letter of each of the recognised words in text of the processing application program, and said link data comprises the audio identifiers and the determined positions of the first letter of corresponding recognised words.
  - 48. A method as claimed in claim 40 wherein the interface application program and the processing application program are both implemented from within a computer operating system, and the positions of the recognised words in said processing application program are determined and monitored using operating system functions communicated via the computer operating system.
  - 49. A data processing method as claimed in claim 28 wherein the positions of the recognised words are determined and monitored by determining and monitoring the position of a first letter of each of the recognised words in text of the processing application program, and said link data comprises the audio identifiers and the determined positions of the first letter of corresponding recognised words.
  - 50. A method as claimed in claim 28 wherein the interface application program and the processing application program are both implemented from within a computer operating system, and the positions of the recognised words in said processing application program are determined and monitored using operating system functions communicated via the computer operating system.
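
The likelihood-threshold steps of claim 43 can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `RecognisedWord`, `flag_possible_errors`, `render_highlighted` and `replace_word` are hypothetical, the 0-to-1 likelihood scale is an assumption (the claim does not fix one), and asterisks stand in for display highlighting.

```python
from dataclasses import dataclass


@dataclass
class RecognisedWord:
    text: str
    likelihood: float  # assumed scale of 0..1; the claim does not specify one


def flag_possible_errors(words, threshold):
    """Scan the likelihood indicators and return indices below the threshold."""
    return [i for i, w in enumerate(words) if w.likelihood < threshold]


def render_highlighted(words, flagged):
    """Surround flagged words with asterisks as a stand-in for highlighting."""
    flagged = set(flagged)
    return " ".join(
        f"*{w.text}*" if i in flagged else w.text for i, w in enumerate(words)
    )


def replace_word(words, index, replacement):
    """Replace an incorrectly recognised word with the word the user selected."""
    # Assumption: a user-chosen word is treated as fully trusted (likelihood 1.0).
    words[index] = RecognisedWord(replacement, 1.0)


words = [
    RecognisedWord("the", 0.98),
    RecognisedWord("whether", 0.41),
    RecognisedWord("is", 0.95),
    RecognisedWord("fine", 0.90),
]
flagged = flag_possible_errors(words, threshold=0.5)
print(render_highlighted(words, flagged))  # the *whether* is fine
replace_word(words, flagged[0], "weather")
```

After the replacement, scanning again with the same threshold flags nothing, which matches the claim's sequence of detect, highlight, select and replace.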

51. A computer usable medium having computer readable instructions stored therein for causing a processor in a data processing apparatus to process recognition signals defining a string of recognised words and corresponding audio data signals to display the words and selectively play the audio data, the instructions comprising instructions for:
- a) causing the processor to receive the recognition signals from a speech recognition engine and the audio data signals, the recognition signals including a string of recognised words and audio identifiers identifying audio components corresponding to each recognised word;
  
  b) causing the processor to store the audio data;
  
  c) causing the processor to implement an interface application program which receives the recognised words and places the words in positions in a processing application program which can process the recognised words such that the positions of the recognised words are changed to form a processed word string;
  
  d) causing the processor to implement the interface application program to determine the positions of the recognised words in the processing application program and to monitor changes in the positions of the recognised words;
  
  e) causing the processor to implement the interface application program to form link data linking the audio data to the recognized words, wherein said link data comprises the audio identifiers and the determined positions of corresponding recognised words, and to update said link data in response to monitored changes in positions of the recognised words;
  
  f) causing the processor to generate an image of the recognised words on a display;
  
  g) causing the processor to receive a selection signal generated by a user for selecting at least one word and to identify audio components corresponding to the at least one selected word; and
  
  h) causing the processor to send the identified audio components in the order of the word positions in the word string to an audio play back device.
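
Steps d), e), g) and h) of claim 51 describe link data that pairs audio identifiers with word positions and is updated as the text changes. The following Python sketch shows one way such a table might behave. It is an illustration under stated assumptions, not the patented implementation: `Link`, `LinkTable` and their methods are hypothetical names, positions are taken to be character offsets of each word's first letter (as in claims 47 and 49), and the stored word length is an added convenience that the claims do not mention.

```python
from dataclasses import dataclass


@dataclass
class Link:
    audio_id: int  # audio identifier for one recognised word
    position: int  # offset of the word's first letter in the text
    length: int    # word length in characters (an added assumption)


class LinkTable:
    """Hypothetical link data kept by the interface, outside the editor."""

    def __init__(self):
        self.links = []

    def add(self, audio_id, position, length):
        self.links.append(Link(audio_id, position, length))

    def on_insert(self, at, count):
        # Characters inserted at offset `at`: words at or after it move right.
        for link in self.links:
            if link.position >= at:
                link.position += count

    def on_delete(self, at, count):
        # Characters [at, at + count) deleted: later words move left,
        # words overlapping the deleted range lose their audio link.
        end = at + count
        kept = []
        for link in self.links:
            if link.position >= end:
                link.position -= count
                kept.append(link)
            elif link.position + link.length <= at:
                kept.append(link)
        self.links = kept

    def audio_for_selection(self, start, end):
        # Audio identifiers for words overlapping [start, end), in text order.
        hits = [l for l in self.links
                if l.position < end and l.position + l.length > start]
        return [l.audio_id for l in sorted(hits, key=lambda l: l.position)]


table = LinkTable()
table.add(audio_id=7, position=6, length=5)   # "world"
table.add(audio_id=3, position=0, length=5)   # "hello"
table.on_insert(at=6, count=4)                # user types "big " before "world"
print(table.audio_for_selection(0, 15))       # [3, 7]
```

Typed text such as "big " has no entry in the table, so selecting it returns no audio, which corresponds to the "if present" wording used elsewhere in the claims. The sketch does not handle an insertion in the middle of a linked word.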

52. Data processing apparatus comprising:
- input means for receiving recognition data and corresponding audio data from a speech recognition engine, said recognition data including a string of recognised characters and audio identifiers identifying audio components corresponding to a character component of the recognised characters;
  
  storage means for storing said audio data received from said input means;
  
  processing means for receiving and processing the input recognised characters to at least one of replace, insert move and position the recognised characters to form a processed character string;
  
  link means for forming link data linking the audio identifiers to the character component positions in the character string and for updating said link data after processing to maintain the link between the audio identifiers and the character component positions in the processed character string;
  
  display means for displaying the characters received by said processing means;
  
  user operable selection means for selecting characters in the displayed characters for audio playback, where said link data identifies any selected audio components, if present, which are linked to the selected characters;
  
  audio playback means for playing back the selected audio components in the order of the character component positions in the character string or the processed character string;
  
  file storage means for storing the recognised characters in a file;
  
  means for selectively disabling one of the receipt of the recognised characters by said processing means and the recognition of speech by said speech recognition engine for a period of time, means for storing the audio data for the period of time in said storage means as an audio message associated with the file; and
  
  storage reading means for reading said file for input to said processing means, and for reading said audio message for playback by said audio playback means.
- View Dependent Claims (53)
  - 53. Data processing apparatus as claimed in claim 52 wherein said storage reading means is controllable by a user to read said audio message at any time after said file has been input to said processing means until said processing means is no longer processing said file.
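
The audio message feature of claims 52 and 53 (disable the receipt of recognised characters for a period, store the audio from that period with the file, and read it back later) can be sketched as follows. This is an assumption-laden illustration: `DictationFile`, `DictationSession` and their methods are hypothetical, audio is modelled as raw byte chunks, and the normal storage and linking of audio for imported words is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class DictationFile:
    words: list = field(default_factory=list)
    # Audio captured while importation was disabled, one entry per period.
    audio_messages: list = field(default_factory=list)


class DictationSession:
    def __init__(self):
        self.file = DictationFile()
        self.importing = True
        self._pending = []

    def disable_import(self):
        """Start a period in which speech is recorded but not imported as text."""
        self.importing = False
        self._pending = []

    def enable_import(self):
        """End the period and attach the recorded audio to the file."""
        if not self.importing and self._pending:
            self.file.audio_messages.append(b"".join(self._pending))
        self._pending = []
        self.importing = True

    def on_speech(self, audio_chunk, recognised_word):
        if self.importing:
            self.file.words.append(recognised_word)
        else:
            self._pending.append(audio_chunk)


session = DictationSession()
session.on_speech(b"\x01", "Dear")
session.disable_import()
session.on_speech(b"\x02\x03", "please")  # spoken note, kept as audio only
session.on_speech(b"\x04", "check")
session.enable_import()
session.on_speech(b"\x05", "Sir")
print(session.file.words, session.file.audio_messages)
```

A reader of the file can then play any entry of `audio_messages` while the file is open, which is the behaviour claim 53 adds.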

54. A data processing arrangement comprising:
- a data processing apparatus, the data processing apparatus comprising;
  
  input means for receiving recognition data and corresponding audio data from a speech recognition engine, said recognition data including a string of recognised characters and audio identifiers identifying audio components corresponding to a character component of the recognised characters;
  
  processing means for receiving and processing the input recognised characters to at least one of replace, insert move and position the recognised characters to form a processed character string;
  
  link means for forming link data linking the audio identifiers to the character component positions in the character string and for updating said link data after processing to maintain the link between the audio identifiers and the character component positions in the processed character string;
  
  storage means for storing said recognition data and audio data received from said input means, and for storing said link data;
  
  display means for displaying the characters received by said processing means;
  
  user operable selection means for selecting characters in the displayed characters for audio playback, where said link data identifies any selected audio components, if present, which are linked to the selected characters; and
  
  audio playback means for playing back the selected audio components in the order of the character component positions in the character string or the processed character string;
  
  file storage means for storing the recognised characters in a file;
  
  means for selectively disabling one of the receipt of the recognised characters by said processing means and the recognition of speech by said speech recognition engine for a period of time with means for storing the audio data for the period of time in said storage means as an audio message associated with the document;
  
  storage reading means for reading said document for input to said processing means, and for reading said audio message for playback by said audio playback means; and
  
  an editor work station comprising;
  
  data reading means for reading the characters, link data, and audio data from said data processing apparatus;
  
  editor processing means for processing the characters;
  
  editor link means for linking the audio data to the character component position using the link data;
  
  editor display means for displaying the characters being processed;
  
  editor correction means for selecting and correcting any displayed characters which have been incorrectly recognised;
  
  editor audio playback means for playing back any audio component corresponding to the selected characters to aid correction;
  
  editor speech recognition update means for storing the corrected characters and the audio identifier for the audio component corresponding to the corrected character in a character correction file;
  
  data transfer means for transferring the character correction file to said data processing apparatus for later updating of models used by said speech recognition engine; and
  
  audio message reading means for reading the audio message associated with characters being processed by said editor processing means for playback by said editor audio playback means;
  
  said data processing apparatus including correction file reading means for reading said character correction file to pass the data contained therein to said speech recognition engine for the updating of the models used by said speech recognition engine.
- View Dependent Claims (55)
  - 55. A data processing arrangement as claimed in claim 54 wherein said audio message reading means is controllable by a user to read said audio message at any time the associated characters are being processed by said editor processing means.
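
Claim 54 has the editor work station store each corrected character string together with the audio identifier of the corresponding audio component in a character correction file, which is later read at the data processing apparatus and passed to the speech recognition engine. A minimal Python sketch of that round trip follows. The JSON layout, the function names and the `update_model` callback are all assumptions made for illustration; the patent does not specify a file format or an engine API.

```python
import json


def record_correction(corrections, audio_id, corrected_text):
    """Editor side: note the corrected text and the audio it belongs to."""
    corrections.append({"audio_id": audio_id, "corrected": corrected_text})


def write_correction_file(corrections):
    """Editor side: serialise the corrections for transfer (JSON is assumed)."""
    return json.dumps(corrections)


def apply_correction_file(payload, update_model):
    """Author side: read the file and pass each entry to the engine's hook."""
    entries = json.loads(payload)
    for entry in entries:
        update_model(entry["audio_id"], entry["corrected"])
    return len(entries)


corrections = []
record_correction(corrections, audio_id=17, corrected_text="weather")
payload = write_correction_file(corrections)

received = []
apply_correction_file(payload, lambda audio_id, text: received.append((audio_id, text)))
print(received)  # [(17, 'weather')]
```

Passing the audio identifier rather than the audio itself keeps the correction file small, on the assumption that the author's machine still holds the stored audio data.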

56. A data processing method comprising:
- receiving recognition data and corresponding audio data from a speech recognition engine, said recognition data including recognised characters and audio identifiers identifying audio components corresponding to text components in the recognised text;
  
  storing the audio data;
  
  inputting the recognised characters to a processor for the processing of the characters to at least one of replace, insert move and position the characters to form a processed character string;
  
  forming link data linking the audio identifiers to the character component positions in the characters and updating said link data after processing to maintain the link between the audio identifiers and the character component positions in the processed character string;
  
  displaying the characters input to the processor;
  
  selecting displayed characters for audio playback, whereby said link data identifies any selected audio components, if present, which are linked to the selected characters;
  
  playing back the selected audio components in the order of the character component positions in the character string;
  
  storing the characters as a file;
  
  selectively disabling one of the importation of recognised characters into the processor and the recognition of speech by said speech recognition engine for a period of time;
  
  storing the audio data for the period of time as an audio message associated with the file;
  
  at a later time, reading said file for input to the processor; and
  
  allowing a user to select whether to read and playback said audio message associated with said file.
- View Dependent Claims (57)
  - 57. A method as claimed in claim 56 wherein said audio message can be read and played back at any time said file is open in the processor.

58. A method of processing data comprising:
- at an author work station;
  
  receiving recognition data and corresponding audio data from a speech recognition engine, said recognition data including recognised characters and audio identifiers identifying audio components corresponding to text components in the recognised text;
  
  storing the audio data;
  
  inputting the recognised characters to a processor for the processing of the characters to at least one of replace, insert move and position the characters to form a processed character string;
  
  forming link data linking the audio identifiers to the character component positions in the characters and updating said link data after processing to maintain the link between the audio identifiers and the character component positions in the processed character string;
  
  displaying the characters input to the processor;
  
  selecting displayed characters for audio playback, whereby said link data identifies any selected audio components, if present, which are linked to the selected characters; and
  
  playing back the selected audio components in the order of the character component positions in the character string;
  
  wherein the characters, the link data, and the audio data are stored; and
  
  at an editor work station;
  
  obtaining the stored characters, link data and audio data from the author work station;
  
  inputting the characters into a processor;
  
  linking the audio data to the character component positions using the link data;
  
  displaying the characters being processed;
  
  selecting any displayed characters which have been incorrectly recognised;
  
  playing back any audio component corresponding to the selected characters to aid correction;
  
  correcting the incorrectly recognised characters;
  
  storing the corrected characters and the audio identifier for the audio component corresponding to the corrected character in a character correction file; and
  
  transferring the character correction file to the author work station for later updating of models used by said speech recognition engine;
  
  wherein, at a later time, said character correction file is read at said author work station to pass the data contained therein to said speech recognition engine for updating of said models;
  
  wherein, at said author work station, storing the characters as a file;
  
  selectively disabling one of the importation of recognised characters into the processor and the recognition of speech by said speech recognition engine for a period of time;
  
  storing the audio data for the period of time as an audio message associated with the file; and
  
  at a later time, reading said file for input to the processor; and
  
  at said editor work station, reading the audio message associated with the file being processed by the processor, and playing back the read audio message.
- View Dependent Claims (59)
  - 59. A method as claimed in claim 58 wherein the audio message can be read and played back at any time said file is open in the processor.

60. A universal speech-recognition interface that enables operative coupling of a speech-recognition engine to at least any one of a plurality of different computer-related applications, the universal speech-recognition interface comprising:
- input means for receiving speech-recognition data including recognised words;
  
  output means for outputting the recognised words into at least any one of the plurality of different computer-related applications to allow processing of the recognised words as input text; and
  
  audio playback means for playing audio data associated with the recognised words.
- View Dependent Claims (61, 62, 63)
  - 61. The universal speech-recognition interface of claim 60, further comprising:
    - means, independent of the one computer-related application, for forming link data linking a portion of the audio data to at least one of the recognised words independently of the one computer-related application, the link data comprising;
      
      one or more audio identifiers which link a portion of the audio data to one or more recognised words; and
      
      one or more position identifiers which link the recognised words to corresponding positions within the one computer-related application; and
      
      means, independent of the one computer-related application, for updating the position identifiers in response to changes in positions of the recognised words within the one computer-related application.
  - 62. The universal speech-recognition interface of claim 60 further comprising:
    - user operable selection means for selecting one or more of the recognised words in the one computer-related application, wherein the audio playback means is responsive to the selection means to playback audio data associated with the one or more recognised words.
  - 63. The universal speech-recognition interface of claim 60 wherein the plurality of different computer-related applications includes a wordprocessing application and at least one of a spreadsheet processing application, an electronic-mail application, a presentation application, and a computer-aided-design application.

64. A speech-recognition interface that enables operative coupling of a speech-recognition engine to a computer-related application, the interface comprising:
- input means for receiving speech-recognition data including recognised words;
  
  output means for outputting the recognised words into a computer-related application to allow processing of the recognised words as input text, including changing positions of the recognised words; and
  
  means, independent of the computer-related application, for determining positions of the recognised words in the computer-related application.
- View Dependent Claims (65, 66, 67, 68)
  - 65. The speech-recognition interface of claim 64, further comprising:
    - means, independent of the computer-related application, for monitoring changes in positions of the recognised words in the computer-related application.
  - 66. The speech-recognition interface of claim 64, further comprising:
    - means, independent of the computer-related application, for forming link data linking a portion of the audio data to at least one of the recognised words independently of the computer-related application, the link data comprising;
      
      one or more audio identifiers which link a portion of the audio data to one or more recognised words; and
      
      one or more position identifiers which link the recognised words to corresponding positions within the computer-related application; and
      
      means, independent of the computer-related application, for updating the position identifiers in response to changes in positions of the recognised words within the computer-related application.
  - 67. The speech-recognition interface of claim 64, further comprising:
    - audio playback means for playing audio data associated with the recognised words.
  - 68. The universal speech-recognition interface of claim 67, further comprising:
    - user operable selection means for selecting one or more of the recognised words in the computer-related application, wherein the audio playback means is responsive to the selection means to playback audio data associated with the one or more recognised words.

69. Data processing apparatus comprising:
- input means for receiving recognition data from a speech recognition engine and corresponding audio data, said recognition data including a string of recognised words and audio identifiers identifying audio components corresponding to each of the recognised words;
  
  processing means for implementing an interface application program which receives the input recognised words, inputs the recognised words into a processing application program to process the input recognised words to cause the recognised words to be moved, and forms link data linking the audio data to the recognised words, said link data comprising the audio identifiers and information identifying the corresponding recognised words;
  
  display means for displaying the words received and processed by said processing application program;
  
  user operable selection means for selectively identifying a word in the displayed words, wherein said interface application program is operative to compare the identity of the selected word with said link data to identify any corresponding audio component; and
  
  audio playback means for playing back any identified corresponding audio component.
- View Dependent Claims (70)
  - 70. Data processing apparatus as claimed in claim 69 including storage means for storing said link data, and said audio data.

71. A data processing method comprising:
- inputting recognition data from a speech recognition engine and corresponding audio data, said recognition data including a string of recognised words and audio identifiers identifying audio components corresponding to each of the recognised words;
  
  inputting the recognised words to a processor implementing an interface application program to receive the input recognised words, to pass the recognised words to a processing application program for processing the recognised words to cause the recognised words to be moved, and to form link data linking the audio data to the recognised words, said link data comprising the audio identifiers and information identifying the corresponding recognised words;
  
  displaying the recognised words input to and processed by the processor application program;
  
  selectively identifying a word in the displayed words;
  
  using the interface application program to compare the identity of the selected word with said link data to identify any corresponding audio component; and
  
  playing back any identified corresponding audio component.
- View Dependent Claims (72)
  - 72. A method as claimed in claim 71 including storing the audio data and the link data.
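
Claims 69 to 72 link audio to "information identifying the corresponding recognised words" rather than to positions, and find the audio by comparing the identity of the selected word with the link data. One possible reading is sketched below in Python. The integer word identity, the `place_words` and `audio_for_word` helpers and the list-based document are hypothetical; the claims do not say what form the identifying information takes.

```python
def place_words(recognised):
    """Give each recognised word a hypothetical identity and link it to its audio."""
    document = []   # list of (word_id, text) in display order
    link_data = {}  # word_id -> audio identifier
    for word_id, (text, audio_id) in enumerate(recognised):
        document.append((word_id, text))
        link_data[word_id] = audio_id
    return document, link_data


def audio_for_word(link_data, word_id):
    """Compare the selected word's identity with the link data."""
    return link_data.get(word_id)  # None when no audio component is linked


document, link_data = place_words([("send", 100), ("it", 101), ("today", 102)])
document.insert(0, document.pop(2))  # the editor moves "today" to the front
selected_id, selected_text = document[0]
print(selected_text, audio_for_word(link_data, selected_id))  # today 102
```

Because the lookup key is the word's identity, moving a word does not require any update to the link data in this reading, in contrast with the position-based link data of the earlier claims.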

73. A computer usable medium having computer readable instructions stored therein for causing a processor in a data processing apparatus to process recognition signals defining a string of recognised words and corresponding audio data to display the words and selectively play the audio data, the instructions comprising instructions for:
- a) causing the processor to input the recognition signals from a speech recognition engine and the audio data, the recognition signals including a string of recognised words and audio identifiers identifying audio components corresponding to each recognised word;
  
  b) causing the processor to implement an interface application program to receive the input recognised words and to input the recognised words into a processing application program to process the recognised words to cause the recognised words to be relatively moved;
  
  c) causing the processor to implement the interface application program to form link data linking the audio data to the recognised words, said link data comprising the audio identifiers and information identifying the corresponding recognised words;
  
  d) causing the processor to generate an image of the recognised words on a display;
  
  e) causing the processor to receive a selection signal generated by a user for selectively identifying a word in the displayed words;
  
  f) causing the processor to implement the interface application program to compare the identity of the selected word with said link data to identify any corresponding audio component; and
  
  g) causing the processor to send the identified corresponding audio component to an audio playback device.
- View Dependent Claims (74)
  - 74. A computer usable medium as claimed in claim 73 wherein the instructions include instructions for causing the processor to store said link data and said audio data.

75. Data processing apparatus comprising:
- input means for receiving recognition data from a speech recognition engine and corresponding audio data, said recognition data including a string of recognised words and audio identifiers including audio components corresponding to each recognised word;
  
  storage means for storing the audio data received from said input means;
  
  processing means operative under the control of an operating system to implement a first application program which receives the input recognised words in text positions, and which processes the recognised words such that the positions of the recognised words are changed to form a processed word string, and a second application program which determines the positions of and monitors changes in the positions of the recognised words in said first application program using operating system functions communicated via the computer operating system, and which forms link data linking the audio data to the recognised words and updates said link data in response to monitored changes in the positions of the recognised words, said link data comprising the audio identifiers and the determined positions of corresponding recognised words;
  
  display means for displaying the recognised words;
  
  user operable selection means for selecting at least one word in the displayed words, wherein said second application program is operative to identify any selected audio components, if present, which are linked to the at least one selected word; and
  
  audio playback means for playing back any selected audio component.
- View Dependent Claims (76)
  - 76. Data processing apparatus as claimed in claim 75 including means operable by a user to allow the selection of said second application program from amongst a plurality of application programs implementable within the computer operating system.

77. A data processing method comprising:
- inputting recognition data from a speech recognition engine and corresponding audio data, said recognition data including a string of recognised words and audio identifiers identifying audio components corresponding to each of the recognised words;
  
  storing the audio;
  
  implementing a first application program within a computer operating system to receive the input recognised words in text positions, and to process the recognised words such that the positions of the recognised words are changed to form a processed word string;
  
  implementing a second application program from within the computer operating system to determine the positions of the recognised words and monitor changes in the positions of the recognised words in the first application program using operating system functions communicated via the computer operating system, to form link data linking the audio data to the recognised words, and to update the link data in response to monitored changes in the positions of the recognised words, wherein said link data comprises the audio identifiers and the determined positions of corresponding recognised words;
  
  displaying the recognised words;
  
  selecting at least one word in the displayed words, wherein the second application program identifies any selected audio components, if present, which are linked to the at least one selected word; and
  
  playing back any selected audio component.
- View Dependent Claims (78)
  - 78. A method as claimed in claim 77 including selecting the second application program from amongst a plurality of possible application programs implementable within the computer operating system.

Current Assignee: Allvoice Developments US LLC
Original Assignee: Allvoice Computing PLC (Allvoice Developments US LLC)
Inventors: Mitchell, John C.; Daniel, Nicholas John; Corbett, Steven Norman; Heard, Alan James
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Smits, Talivaldis Ivars

Application Number: US08/720,373
Time in Patent Office: 697 Days
Field of Search: 395/2.44, 395/2.84, 395/2.53, 395/2.79, 395/2.1, 704/201, 704/235, 704/244, 704/270, 704/275
US Class Current: 704/235

CPC Class Codes

G06F 3/16   Sound input; Sound output s...

G06F 3/167   Audio in a user interface, ...

G06F 40/216   using statistical methods

G06F 40/279   Recognition of textual enti...

G06F 40/56   Natural language generation

G10L 15/22   Procedures used during a sp...
