Word tagging and editing system for speech recognition
Abstract
A word tagging and editing system for speech recognition receives recognized speech text from a speech recognition engine and creates tagging information that follows the speech text as it is received by a word processing program or other program. The body of text to be edited in the word processing program may be selected, cut, pasted, and otherwise manipulated, and the tags follow the speech text. A user may select a word, and the tag information is used to point to a sound bite within the audio data file created initially by the speech recognition engine. The sound bite may be replayed to the user through a speaker. As a practical result, the user may confirm the correctness of a particular recognized word in real time while editing text in the word processor. If the recognition is manually corrected, the correction information may be supplied to the engine for use in updating the user profile of the user who dictated the recognized audio. Particular tagging approaches are employed depending on the particular word processor being used.
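The way a tag can point to a sound bite may be easier to see in a small sketch. The abstract does not say how a tag is encoded, so the millisecond offsets, the raw mono PCM layout, and the names `Tag`, `TaggedWord`, and `sound_bite` below are all assumptions made for illustration, not the format used by the system described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    # Hypothetical encoding: offsets of the word's sound bite in the audio data.
    start_ms: int
    end_ms: int


@dataclass(frozen=True)
class TaggedWord:
    text: str
    tag: Tag


def sound_bite(audio: bytes, tag: Tag,
               sample_rate: int = 8000, bytes_per_sample: int = 2) -> bytes:
    """Return the slice of raw mono PCM audio that the tag points to.

    Assumes uncompressed audio with a whole number of bytes per millisecond.
    """
    bytes_per_ms = sample_rate * bytes_per_sample // 1000
    return audio[tag.start_ms * bytes_per_ms:tag.end_ms * bytes_per_ms]


# One second of placeholder audio (16000 bytes at 8 kHz, 16-bit mono).
audio = bytes(i % 256 for i in range(16000))
word = TaggedWord("report", Tag(start_ms=100, end_ms=350))

# The selected word's tag resolves to the bytes a playback facility would play.
bite = sound_bite(audio, word.tag)
```

Playing the returned bytes through a speaker is left out, since it depends on the platform's audio interface.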
42 Claims
1. An apparatus for editing text resulting from speech recognition, said apparatus comprising an audio file, a word processing file, a word processor, and a playback facility, the playback facility further comprising a speaker:
the audio file comprising audio information;
the word processing file comprising text words and tag information for each of the text words, the tag information linking text words to respective audio information;
the word processor having user inputs and operatively coupled with the word processing file and disposed to select ones of the text words and tag information at a first location;
the word processor responsive to a first user input for copying selected text words and tag information, and responsive to a second user input for inserting said selected text words and tag information into the word processing file at a second location differing from the first location;
the word processor further responsive to a third user input for selecting ones of the text words and tag information at the second location;
the apparatus further comprising a playback facility responsive to a fourth user input and to the selected tag information at the second location, for identifying audio information in the audio file linked to the text words selected at the second location, and for playing back the audio information via the speaker.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
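The copy, insert, select, and identify sequence recited in claim 1 can be sketched as follows. This is only an illustration under assumptions: the claim does not specify how tag information is stored, so `TaggedWord`, the `(start_ms, end_ms)` tag, and the helper functions are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TaggedWord:
    text: str
    audio_span: Tuple[int, int]  # hypothetical tag: (start_ms, end_ms) in the audio file


def copy_words(doc: List[TaggedWord], start: int, end: int) -> List[TaggedWord]:
    """First user input: copy the selected words together with their tags."""
    return list(doc[start:end])


def insert_words(doc: List[TaggedWord], at: int,
                 clip: List[TaggedWord]) -> List[TaggedWord]:
    """Second user input: insert the copied words and tags at a second location."""
    return doc[:at] + clip + doc[at:]


doc = [
    TaggedWord("please", (0, 400)),
    TaggedWord("call", (400, 700)),
    TaggedWord("home", (700, 1100)),
]

clip = copy_words(doc, 2, 3)      # select and copy "home" at the first location
doc = insert_words(doc, 0, clip)  # insert it at the start of the document

# Third and fourth user inputs: select the pasted word and look up its audio.
selected = doc[0]
span_to_play = selected.audio_span  # still points at the original sound bite
```

Because the tag travels inside each copied item, the pasted word resolves to the same audio as the word it was copied from. The cut variant of claim 9 would additionally delete the source range after copying.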
9. An apparatus for editing text resulting from speech recognition, said apparatus comprising an audio file, a word processing file, a word processor, and a playback facility, the playback facility further comprising a speaker:
the audio file comprising audio information;
the word processing file comprising text words and tag information for each of the text words, the tag information linking text words to respective audio information;
the word processor having user inputs and operatively coupled with the word processing file and disposed to select ones of the text words and tag information at a first location;
the word processor responsive to a first user input for copying and deleting selected text words and tag information, and responsive to a second user input for inserting said selected text words and tag information into the word processing file at a second location differing from the first location;
the word processor further responsive to a third user input for selecting ones of the text words and tag information at the second location;
the apparatus further comprising a playback facility responsive to a fourth user input and to the selected tag information at the second location, for identifying audio information in the audio file linked to the text words selected at the second location, and for playing back the audio information via the speaker.
Dependent claims: 10, 11, 12, 13, 14, 15, 16.
17. An apparatus for editing text resulting from speech recognition, said apparatus comprising an audio file, a word processing file, a mirror file, a word processor, a mirror application, and a playback facility, the playback facility further comprising a speaker:
the audio file comprising audio information;
the word processing file comprising text words;
the mirror file comprising tag information for each of the text words, the tag information linking text words to respective audio information;
the word processor having user inputs and operatively coupled with the word processing file and disposed to select ones of the text words at a first location;
the word processor responsive to a first user input for copying selected text words, and responsive to a second user input for inserting said selected text words into the word processing file at a second location differing from the first location;
the mirror application responsive to the user inputs and operatively coupled with the mirror file and disposed to select ones of the tag information corresponding to the text words at the first location, the mirror application responsive to the first user input for copying tag information corresponding to the selected text words, and responsive to the second user input for inserting said tag information into the mirror file at a second location differing from the first location;
the word processor further responsive to a third user input for selecting ones of the text words at the second location;
the mirror application responsive to the third user input for selecting tag information corresponding to the ones of the text words at the second location;
the apparatus further comprising a playback facility responsive to a fourth user input and to the selected tag information at the second location, for identifying audio information in the audio file linked to the text words selected at the second location, and for playing back the audio information via the speaker.
Dependent claims: 18.
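The mirror-file arrangement of claim 17 keeps the tags outside the word processing file and applies each edit to both files in step. The sketch below is a simplification under assumptions: the claim describes the word processor and the mirror application as separate components responding to the same user inputs, whereas here a single hypothetical class, `MirroredDocument`, holds both lists for brevity, and the `(start_ms, end_ms)` tag format is invented for illustration.

```python
from typing import List, Tuple

Tag = Tuple[int, int]  # hypothetical tag: (start_ms, end_ms) in the audio file


class MirroredDocument:
    """Plain text words plus a parallel 'mirror' list of tags, edited together."""

    def __init__(self, words: List[str], tags: List[Tag]) -> None:
        if len(words) != len(tags):
            raise ValueError("mirror must hold exactly one tag per text word")
        self.words = list(words)  # stands in for the word processing file
        self.tags = list(tags)    # stands in for the mirror file
        self._clip_words: List[str] = []
        self._clip_tags: List[Tag] = []

    def copy(self, start: int, end: int) -> None:
        # First user input: the word processor copies text and the
        # mirror application copies the corresponding tags.
        self._clip_words = self.words[start:end]
        self._clip_tags = self.tags[start:end]

    def paste(self, at: int) -> None:
        # Second user input: both files receive the insert at the same position.
        self.words[at:at] = self._clip_words
        self.tags[at:at] = self._clip_tags

    def tags_for(self, start: int, end: int) -> List[Tag]:
        # Third user input: selecting words also selects their mirrored tags.
        return self.tags[start:end]


doc = MirroredDocument(["send", "the", "report"],
                       [(0, 300), (300, 450), (450, 900)])
doc.copy(2, 3)   # copy "report"
doc.paste(0)     # insert it at the start
pasted_tags = doc.tags_for(0, 1)
```

Keeping the two lists the same length after every edit is what lets a word's position in the text identify its tag in the mirror.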
19. An apparatus for editing text resulting from speech recognition, said apparatus comprising an audio file, a word processing file, a mirror file, a word processor, a mirror application, and a playback facility, the playback facility further comprising a speaker:
the audio file comprising audio information;
the word processing file comprising text words;
the mirror file comprising tag information for each of the text words, the tag information linking text words to respective audio information;
the word processor having user inputs and operatively coupled with the word processing file and disposed to select ones of the text words at a first location;
the word processor responsive to a first user input for copying and deleting selected text words, and responsive to a second user input for inserting said selected text words into the word processing file at a second location differing from the first location;
the mirror application responsive to the user inputs and operatively coupled with the mirror file and disposed to select ones of the tag information corresponding to the text words at the first location, the mirror application responsive to the first user input for copying and deleting tag information corresponding to the selected text words, and responsive to the second user input for inserting said tag information into the mirror file at a second location differing from the first location;
the word processor further responsive to a third user input for selecting ones of the text words at the second location;
the mirror application responsive to the third user input for selecting tag information corresponding to the ones of the text words at the second location;
the apparatus further comprising a playback facility responsive to a fourth user input and to the selected tag information at the second location, for identifying audio information in the audio file linked to the text words selected at the second location, and for playing back the audio information via the speaker.
Dependent claims: 20.
21. A method of editing text resulting from speech recognition for use with an audio file, a word processing file, a word processor, a playback facility, and a speaker, the method comprising the steps of:
establishing, within the word processing file, text words and tag information for each of the text words, the tag information linking text words to respective audio information;
selecting ones of the text words and tag information at a first location;
copying the selected text words and tag information;
inserting said selected text words and tag information into the word processing file at a second location differing from the first location;
selecting ones of the text words and tag information at the second location;
identifying audio information in the audio file linked to the text words selected at the second location; and
playing back the audio information via the speaker.
Dependent claims: 22, 23, 24, 25, 26, 27, 28.
29. A method of editing text resulting from speech recognition for use with an audio file, a word processing file, a word processor, a playback facility, and a speaker, the method comprising the steps of:
establishing, within the word processing file, text words and tag information for each of the text words, the tag information linking text words to respective audio information;
selecting ones of the text words and tag information at a first location;
copying the selected text words and tag information;
deleting the selected text words and tag information at the first location;
inserting said selected text words and tag information into the word processing file at a second location differing from the first location;
selecting ones of the text words and tag information at the second location;
identifying audio information in the audio file linked to the text words selected at the second location; and
playing back the audio information via the speaker.
Dependent claims: 30, 31, 32, 33, 34, 35, 36.
37. A method for editing text resulting from speech recognition, the method used with an audio file, a word processing file, a mirror file, a word processor, a mirror application, a playback facility, and a speaker, the method comprising the steps of:
storing recognized text in the word processing file;
storing in the mirror file tag information for each of the text words, the tag information linking text words to respective audio information;
selecting ones of the text words at a first location;
copying the selected text words;
inserting said selected text words into the word processing file at a second location differing from the first location;
select ones of the tag information corresponding to the text words at the first location;
copying tag information corresponding to the selected text words;
inserting said tag information into the mirror file at a second location differing from the first location;
selecting ones of the text words at the second location;
selecting tag information corresponding to the ones of the text words at the second location;
identifying audio information in the audio file linked to the text words selected at the second location; and
playing back the audio information via the speaker.
Dependent claims: 38.
39. A method for editing text resulting from speech recognition, the method used with an audio file, a word processing file, a mirror file, a word processor, a mirror application, a playback facility, and a speaker, the method comprising the steps of:
storing recognized text in the word processing file;
storing in the mirror file tag information for each of the text words, the tag information linking text words to respective audio information;
selecting ones of the text words at a first location;
copying the selected text words;
deleting the selected text words;
inserting said selected text words into the word processing file at a second location differing from the first location;
selecting ones of the tag information corresponding to the text words at the first location;
copying the tag information corresponding to the text words at the first location;
deleting the tag information corresponding to the text words at the first location;
inserting said tag information into the mirror file at a second location differing from the first location;
selecting ones of the text words at the second location;
selecting tag information corresponding to the ones of the text words at the second location;
identifying audio information in the audio file linked to the text words selected at the second location; and
playing back the audio information via the speaker.
Dependent claims: 40.
41. A speech recognition method for use with a speech recognition engine having an enrollment mode and a recognition mode and with a digital tape recorder having a microphone, the method comprising the steps of:
operatively coupling the digital tape recorder with the engine;
configuring the digital tape recorder so that audio information received at its microphone is communicated via the operative coupling to the engine;
performing the enrollment;
disconnecting the recorder from the engine;
dictating into the microphone of the recorder thereby recording dictated audio information therein;
operatively coupling the recorder with the engine;
playing back the audio information from the recorder and communicating said information via the operative coupling to the engine; and
recognizing the information within the engine.
Dependent claims: 42.
Specification