Voice recognition apparatus and method
Abstract
A voice recognition apparatus and method processes a voice audio stream. As sounds in the voice audio stream are identified that correspond to defined words, the voice recognition system writes the text for the words to an output file. If a sound is encountered that is not recognized as a defined word, a visual marker is placed in the output file to mark the location, and a corresponding audio clip is generated and correlated to the visual marker. When the output file is displayed, any sounds not recognized as defined words are represented by an icon that represents an audio clip. If the user cannot determine from the context what the missing word or phrase is, the user may click on the audio icon, which causes the stored audio clip to be played. In this manner a user can dictate into a voice recognition system with complete confidence that any unrecognized words or phrases will be preserved in their original audio format so the user can later listen and enter the missing information into the document. In a second embodiment, the voice recognition apparatus processes digital audio information and reduces the size of the digital audio information by replacing portions of the digital audio information with corresponding text, while leaving any portion that does not correspond to a defined word.
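The first embodiment described in the abstract can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the patented implementation: the `Segment` and `Transcript` types, the `transcribe` function, the `[AUDIO:n]` marker syntax, and the sample vocabulary are all hypothetical names introduced here, and real acoustic matching is replaced by a set lookup on segments that are assumed to arrive pre-labeled.

```python
from dataclasses import dataclass, field

# Hypothetical vocabulary standing in for the "plurality of defined words".
DEFINED_WORDS = {"please", "call", "tomorrow"}


@dataclass
class Segment:
    """One chunk of the voice audio stream (assumed to be pre-segmented)."""
    samples: bytes  # raw audio for this chunk
    guess: str      # best-guess label from a hypothetical recognizer


@dataclass
class Transcript:
    """Stand-in for the output file: ordered tokens plus linked audio clips."""
    tokens: list = field(default_factory=list)  # words and markers, in order
    clips: dict = field(default_factory=dict)   # marker id -> stored audio

    def render(self) -> str:
        return " ".join(self.tokens)


def transcribe(stream):
    """Write text for defined words; keep everything else as an audio clip."""
    out = Transcript()
    for seg in stream:
        if seg.guess in DEFINED_WORDS:
            out.tokens.append(seg.guess)
        else:
            # Unrecognized sound: store the clip and leave a marker in its place.
            clip_id = len(out.clips) + 1
            out.clips[clip_id] = seg.samples
            out.tokens.append(f"[AUDIO:{clip_id}]")
    return out


stream = [
    Segment(b"\x01\x02", "please"),
    Segment(b"\x03\x04", "call"),
    Segment(b"\x05\x06\x07", "zbrgl"),  # not a defined word
    Segment(b"\x08\x09", "tomorrow"),
]
result = transcribe(stream)
print(result.render())
```

Because each marker sits in the token list at the point where the unrecognized sound occurred, a viewer could render it as an icon and play `result.clips[n]` when the user clicks it.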
37 Claims
1. An apparatus comprising:
at least one processor;
a memory coupled to the at least one processor; and
a voice recognition processor executed by the at least one processor, the voice recognition processor processing a voice audio stream looking for a plurality of defined words and generating an output file that includes text corresponding to the plurality of defined words, the output file further including at least one audio marker that is linked to at least one portion of the voice audio stream that does not correspond to the plurality of defined words.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. An apparatus comprising:
at least one processor;
a memory coupled to the at least one processor;
a voice recognition processor executed by the at least one processor, the voice recognition processor comprising:
a plurality of defined words;
a digital audio processor that processes a voice audio stream looking for the plurality of defined words;
a text generator that generates text in an output file for portions of the voice audio stream that correspond to any of the plurality of defined words; and
a digital audio editor that creates an audio clip from the voice audio stream for each portion of the voice audio stream that does not correspond to any of the plurality of defined words, wherein the digital audio editor creates an audio marker that is placed in the output file at a position that identifies the position of each audio clip relative to text generated by the text generator.
Dependent claims: 10, 11
12. An apparatus comprising:
at least one processor;
a memory coupled to the at least one processor;
digital audio information residing in the memory that corresponds to a voice audio stream;
a voice recognition processor executed by the at least one processor, the voice recognition processor comprising:
a plurality of defined words;
a digital audio processor that processes the digital audio information looking for the plurality of defined words;
a digital audio compressor that reduces the size of the digital audio information by replacing at least one portion of the digital audio information with text corresponding to at least one of the plurality of defined words.
13. A method for processing a voice audio stream comprising:
processing the voice audio stream looking for a plurality of defined words;
generating an output file that includes text corresponding to the plurality of defined words and that includes at least one audio marker that is linked to a portion of the voice audio stream for each portion of the voice audio stream that does not correspond to the plurality of defined words.
Dependent claims: 14, 15, 16, 17, 18
19. A method for processing a voice audio stream comprising:
processing a voice audio stream looking for a plurality of defined words;
generating text in an output file for portions of the voice audio stream that correspond to any of the plurality of defined words;
creating an audio clip from the voice audio stream for each portion of the voice audio stream that does not correspond to any of the plurality of defined words; and
creating an audio marker that is placed in the output file at a position that identifies the position of each audio clip relative to text in the output file.
Dependent claim: 20
21. A method for reducing the size of digital voice audio information comprising:
processing the digital voice audio information looking for a plurality of defined words; and
replacing at least one portion of the digital audio information with text corresponding to at least one of the plurality of defined words.
22. A method for visually indicating to a user the efficiency of converting digital voice audio information to text, the method comprising:
processing the digital voice audio information looking for a plurality of defined words;
replacing at least one portion of the digital audio information with text corresponding to at least one of the plurality of defined words;
calculating the efficiency from the proportion of replaced digital audio information to total digital audio information; and
displaying the efficiency to the user.
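The size reduction and efficiency calculation recited in claims 21 and 22 can be illustrated with a short sketch. This is only one possible reading: the `compress` function and its `(audio_bytes, word_or_None)` input format are hypothetical, recognition is assumed to have happened already, and "proportion" is taken to mean replaced audio bytes divided by total audio bytes.

```python
def compress(segments):
    """Replace recognized audio with text and report the replaced proportion.

    segments: list of (audio_bytes, word_or_None) pairs, where word_or_None is
    the defined word matched by a hypothetical recognizer, or None if no
    defined word matched.
    """
    output = []    # mixed sequence: text for matches, raw audio otherwise
    replaced = 0   # audio bytes that were swapped for text
    total = 0      # all audio bytes seen
    for audio, word in segments:
        total += len(audio)
        if word is not None:
            output.append(word)
            replaced += len(audio)
        else:
            output.append(audio)  # leave unrecognized audio untouched
    efficiency = replaced / total if total else 0.0
    return output, efficiency


segments = [
    (b"\x00" * 400, "hello"),
    (b"\x00" * 100, None),      # no defined word matched
    (b"\x00" * 500, "world"),
]
mixed, efficiency = compress(segments)
print(f"{efficiency:.0%} of the audio was replaced by text")
```

Here 900 of 1000 audio bytes are replaced, so the displayed efficiency is 90%. Measuring by duration instead of bytes would be an equally plausible choice; the claim language does not fix the unit.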
23. A computer-readable program product comprising:
(A) a voice recognition processor that processes a voice audio stream looking for a plurality of defined words, the voice recognition processor generating an output file that includes text corresponding to the plurality of defined words, the output file further including at least one audio marker that is linked to at least one portion of the voice audio stream that does not correspond to the plurality of defined words; and
(B) signal bearing media bearing the voice recognition processor.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31
32. A computer-readable program product comprising:
(A) a voice recognition processor comprising:
a plurality of defined words;
a digital audio processor that processes a voice audio stream looking for the plurality of defined words;
a text generator that generates text in an output file for portions of the voice audio stream that correspond to any of the plurality of defined words; and
a digital audio editor that creates an audio clip from the voice audio stream for each portion of the voice audio stream that does not correspond to any of the plurality of defined words, wherein the digital audio editor creates an audio marker that is placed in the output file at a position that identifies the position of each audio clip relative to text generated by the text generator; and
(B) signal bearing media bearing the voice recognition processor.
Dependent claims: 33, 34, 35, 36
37. A computer-readable program product comprising:
(A) a voice recognition processor comprising:
a plurality of defined words;
a digital audio processor that processes digital voice audio information looking for the plurality of defined words;
a digital audio compressor that reduces the size of the digital voice audio information by replacing at least one portion of the digital voice audio information with text corresponding to at least one of the plurality of defined words; and
(B) signal bearing media bearing the voice recognition processor.
Specification