Voice recognition apparatus and method

US 20030046071A1
Filed: 09/06/2001
Published: 03/06/2003
Est. Priority Date: 09/06/2001
Status: Abandoned Application

Abstract

A voice recognition apparatus and method processes a voice audio stream. As sounds in the voice audio stream are identified that correspond to defined words, the voice recognition system writes the text for the words to an output file. If a sound is encountered that is not recognized as a defined word, a visual marker is placed in the output file to mark the location, and a corresponding audio clip is generated and correlated to the visual marker. When the output file is displayed, any sounds not recognized as defined words are represented by an icon that represents an audio clip. If the user cannot determine from the context what the missing word or phrase is, the user may click on the audio icon, which causes the stored audio clip to be played. In this manner a user can dictate into a voice recognition system with complete confidence that any unrecognized words or phrases will be preserved in their original audio format so the user can later listen and enter the missing information into the document. In a second embodiment, the voice recognition apparatus processes digital audio information and reduces the size of the digital audio information by replacing portions of the digital audio information with corresponding text, while leaving any portion that does not correspond to a defined word.
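The first embodiment can be illustrated with a minimal sketch. It assumes a recognizer has already split the stream into timed segments, each either matched to a defined word or left unmatched; the names `Segment`, `AudioMarker`, `build_output`, and `render` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Segment:
    """One portion of the voice audio stream (hypothetical structure)."""
    start: float          # seconds from the start of the stream
    end: float
    word: Optional[str]   # None when no defined word was recognized


@dataclass
class AudioMarker:
    """Placeholder in the output that links back to an audio clip."""
    clip_id: int
    start: float
    end: float


OutputItem = Union[str, AudioMarker]


def build_output(segments: List[Segment]) -> Tuple[List[OutputItem], List[AudioMarker]]:
    """Emit text for recognized segments and a marker for each unrecognized one."""
    output: List[OutputItem] = []
    clips: List[AudioMarker] = []
    for seg in segments:
        if seg.word is not None:
            output.append(seg.word)
        else:
            marker = AudioMarker(clip_id=len(clips), start=seg.start, end=seg.end)
            clips.append(marker)
            output.append(marker)
    return output, clips


def render(output: List[OutputItem]) -> str:
    """Show markers as a bracketed tag standing in for the audio icon."""
    return " ".join(
        item if isinstance(item, str) else f"[audio:{item.clip_id}]"
        for item in output
    )


segments = [
    Segment(0.0, 0.4, "send"),
    Segment(0.4, 0.9, None),      # unrecognized sound
    Segment(0.9, 1.2, "today"),
]
output, clips = build_output(segments)
print(render(output))  # send [audio:0] today
```

In this sketch the marker carries the time range of the unrecognized portion, so a viewer could look up and play the matching audio when the user selects it.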

37 Claims

1. An apparatus comprising:
- at least one processor;
  
  a memory coupled to the at least one processor; and
  
  a voice recognition processor executed by the at least one processor, the voice recognition processor processing a voice audio stream looking for a plurality of defined words and generating an output file that includes text corresponding to the plurality of defined words, the output file further including at least one audio marker that is linked to at least one portion of the voice audio stream that does not correspond to the plurality of defined words.
- - 2. The apparatus of claim 1 wherein the voice recognition processor, when a defined word is found in the voice audio stream, replaces in the output file the defined word in the voice audio stream with text corresponding to the defined word.
  - 3. The apparatus of claim 1 wherein the voice recognition processor generates an audio clip for at least one portion of the voice audio stream that contains sounds that do not correlate to any defined word, and wherein each audio marker in the output file is linked to a corresponding audio clip.
  - 4. The apparatus of claim 3 wherein the voice recognition processor determines how much of the voice audio stream is included in each audio clip according to user-defined preferences.
  - 5. The apparatus of claim 3 wherein the voice recognition processor plays an audio clip when the corresponding audio marker is selected by a user.
  - 6. The apparatus of claim 5 wherein the voice recognition processor determines how much of the corresponding audio clip is played according to user-defined preferences.
  - 7. The apparatus of claim 1 wherein the voice audio stream comprises digital audio information.
  - 8. The apparatus of claim 1 wherein the voice recognition processor displays a clarity meter that visually indicates to a user the efficiency of the voice recognition processor in converting the voice audio stream to text.
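Claims 4 and 6 leave open what the user-defined preferences look like. One possible reading, sketched here purely as an assumption, is a pair of padding values that extend each clip before and after the unrecognized portion so the listener hears some surrounding context. The function `clip_bounds` and its parameters `lead_in` and `lead_out` are hypothetical.

```python
from typing import Tuple


def clip_bounds(start: float, end: float, stream_length: float,
                lead_in: float = 0.5, lead_out: float = 0.5) -> Tuple[float, float]:
    """Widen an unrecognized span by user-chosen padding, clamped to the stream.

    All values are in seconds. lead_in and lead_out stand in for the
    "user-defined preferences" of claims 4 and 6 (an assumption).
    """
    clip_start = max(0.0, start - lead_in)
    clip_end = min(stream_length, end + lead_out)
    return clip_start, clip_end


print(clip_bounds(2.0, 3.0, 10.0, lead_in=1.0, lead_out=0.25))  # (1.0, 3.25)
print(clip_bounds(0.2, 1.0, 1.3))                               # (0.0, 1.3)
```

The same calculation could serve both claims: once when the clip is cut from the stream, and again at playback if the user wants to hear less than the stored clip.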

9. An apparatus comprising:
- at least one processor;
  
  a memory coupled to the at least one processor;
  
  a voice recognition processor executed by the at least one processor, the voice recognition processor comprising:
  
  a plurality of defined words;
  
  a digital audio processor that processes a voice audio stream looking for the plurality of defined words;
  
  a text generator that generates text in an output file for portions of the voice audio stream that correspond to any of the plurality of defined words; and
  
  a digital audio editor that creates an audio clip from the voice audio stream for each portion of the voice audio stream that does not correspond to any of the plurality of defined words, wherein the digital audio editor creates an audio marker that is placed in the output file at a position that identifies the position of each audio clip relative to text generated by the text generator.
- - 10. The apparatus of claim 9 wherein the voice recognition processor plays an audio clip when the corresponding audio marker is selected by a user during the display of the output file to a user.
  - 11. The apparatus of claim 9 wherein the voice recognition processor displays a clarity meter that visually indicates to a user the efficiency of the voice recognition processor in converting the voice audio stream to text.

12. An apparatus comprising:
- at least one processor;
  
  a memory coupled to the at least one processor;
  
  digital audio information residing in the memory that corresponds to a voice audio stream;
  
  a voice recognition processor executed by the at least one processor, the voice recognition processor comprising:
  
  a plurality of defined words;
  
  a digital audio processor that processes the digital audio information looking for the plurality of defined words;
  
  a digital audio compressor that reduces the size of the digital audio information by replacing at least one portion of the digital audio information with text corresponding to at least one of the plurality of defined words.

13. A method for processing a voice audio stream comprising:
- processing the voice audio stream looking for a plurality of defined words;
  
  generating an output file that includes text corresponding to the plurality of defined words and that includes at least one audio marker that is linked to a portion of the voice audio stream for each portion of the voice audio stream that does not correspond to the plurality of defined words.
- - 14. The method of claim 13 further comprising:
    - when one of the plurality of defined words is found in the voice audio stream, replacing in the output file the portion of the voice audio stream that corresponds with the defined word with text corresponding to the defined word.
  - 15. The method of claim 13 further comprising:
    - generating an audio clip for at least one portion of the voice audio stream that contains sounds that do not correlate to any defined word; and
      
      linking each audio marker in the output file to a corresponding audio clip.
  - 16. The method of claim 15 further comprising:
    - determining how much of the voice audio stream to include in each audio clip according to user-defined preferences.
  - 17. The method of claim 15 further comprising playing an audio clip when the corresponding audio marker is selected by a user.
  - 18. The method of claim 17 further comprising determining how much of the corresponding audio clip is played according to user-defined preferences.

19. A method for processing a voice audio stream comprising:
- processing a voice audio stream looking for a plurality of defined words;
  
  generating text in an output file for portions of the voice audio stream that correspond to any of the plurality of defined words;
  
  creating an audio clip from the voice audio stream for each portion of the voice audio stream that does not correspond to any of the plurality of defined words; and
  
  creating an audio marker that is placed in the output file at a position that identifies the position of each audio clip relative to text in the output file.
- - 20. The method of claim 19 further comprising playing an audio clip when the corresponding audio marker is selected by a user during the display of the output file to the user.

21. A method for reducing the size of digital voice audio information comprising:
- processing the digital voice audio information looking for a plurality of defined words; and
  
  replacing at least one portion of the digital audio information with text corresponding to at least one of the plurality of defined words.
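The size reduction in claim 21 can be sketched as follows, assuming the audio is an uncompressed byte buffer with a fixed byte rate and that recognition results arrive as ordered, non-overlapping spans. The names `compress` and `total_size` and the chunk layout are hypothetical, not taken from the application.

```python
from typing import List, Optional, Tuple

Span = Tuple[float, float, Optional[str]]   # (start_s, end_s, word or None)
Chunk = Tuple[str, bytes]                   # ("text" | "audio", payload)


def compress(audio: bytes, spans: List[Span], bytes_per_second: int) -> List[Chunk]:
    """Replace recognized spans with their text; keep unrecognized audio as-is."""
    chunks: List[Chunk] = []
    for start, end, word in spans:
        if word is not None:
            # A recognized word is stored as a few bytes of text.
            chunks.append(("text", word.encode("utf-8")))
        else:
            lo = int(start * bytes_per_second)
            hi = int(end * bytes_per_second)
            chunks.append(("audio", audio[lo:hi]))
    return chunks


def total_size(chunks: List[Chunk]) -> int:
    """Payload size in bytes, ignoring any container overhead."""
    return sum(len(payload) for _, payload in chunks)


# One second of silence at 16,000 bytes/s (for example 8 kHz, 16-bit mono).
audio = bytes(16000)
spans = [(0.0, 0.5, "hello"), (0.5, 1.0, None)]
chunks = compress(audio, spans, bytes_per_second=16000)
print(len(audio), "->", total_size(chunks))  # 16000 -> 8005
```

Here the recognized half second shrinks from 8,000 bytes of audio to 5 bytes of text, while the unrecognized half is preserved unchanged.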

22. A method for visually indicating to a user the efficiency of converting digital voice audio information to text, the method comprising:
- processing the digital voice audio information looking for a plurality of defined words;
  
  replacing at least one portion of the digital audio information with text corresponding to at least one of the plurality of defined words;
  
  calculating the efficiency from the proportion of replaced digital audio information to total digital audio information; and
  
  displaying the efficiency to the user.
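The efficiency calculation in claim 22 is a simple ratio. The sketch below measures the "proportion of replaced digital audio information to total" by duration, which is an assumption (byte counts would work equally well), and displays it as a text bar standing in for the clarity meter. The names `conversion_efficiency` and `clarity_meter` are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[float, float, Optional[str]]   # (start_s, end_s, word or None)


def conversion_efficiency(spans: List[Span]) -> float:
    """Fraction of the audio (by duration) that was replaced with text."""
    total = sum(end - start for start, end, _ in spans)
    if total == 0:
        return 0.0
    replaced = sum(end - start for start, end, word in spans if word is not None)
    return replaced / total


def clarity_meter(efficiency: float, width: int = 20) -> str:
    """Render the efficiency as a fixed-width text bar with a percentage."""
    filled = round(efficiency * width)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {efficiency:.0%}"


spans = [(0, 3, "three seconds of recognized speech"), (3, 4, None)]
eff = conversion_efficiency(spans)
print(clarity_meter(eff))  # [###############-----] 75%
```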

23. A computer-readable program product comprising:
- (A) a voice recognition processor that processes a voice audio stream looking for a plurality of defined words, the voice recognition processor generating an output file that includes text corresponding to the plurality of defined words, the output file further including at least one audio marker that is linked to at least one portion of the voice audio stream that does not correspond to the plurality of defined words; and
  
  (B) signal bearing media bearing the voice recognition processor.
- - 24. The computer-readable program product of claim 23 wherein the signal bearing media comprises recordable media.
  - 25. The computer-readable program product of claim 23 wherein the signal bearing media comprises transmission media.
  - 26. The computer-readable program product of claim 23 wherein the voice recognition processor, when a defined word is found in the voice audio stream, replaces in the output file the defined word in the voice audio stream with text corresponding to the defined word.
  - 27. The computer-readable program product of claim 23 wherein the voice recognition processor generates an audio clip for at least one portion of the voice audio stream that contains sounds that do not correlate to any defined word, and wherein each audio marker in the output file is linked to a corresponding audio clip.
  - 28. The computer-readable program product of claim 27 wherein the voice recognition processor determines how much of the voice audio stream is included in each audio clip according to user-defined preferences.
  - 29. The computer-readable program product of claim 27 wherein the voice recognition processor plays an audio clip when the corresponding audio marker is selected by a user.
  - 30. The computer-readable program product of claim 29 wherein the voice recognition processor determines how much of the corresponding audio clip is played according to user-defined preferences.
  - 31. The computer-readable program product of claim 23 wherein the voice recognition processor displays a clarity meter that visually indicates to a user the efficiency of the voice recognition processor in converting the voice audio stream to text.

32. A computer-readable program product comprising:
- (A) a voice recognition processor comprising:
  
  a plurality of defined words;
  
  a digital audio processor that processes a voice audio stream looking for the plurality of defined words;
  
  a text generator that generates text in an output file for portions of the voice audio stream that correspond to any of the plurality of defined words; and
  
  a digital audio editor that creates an audio clip from the voice audio stream for each portion of the voice audio stream that does not correspond to any of the plurality of defined words, wherein the digital audio editor creates an audio marker that is placed in the output file at a position that identifies the position of each audio clip relative to text generated by the text generator; and
  
  (B) signal bearing media bearing the voice recognition processor.
- - 33. The computer-readable program product of claim 32 wherein the signal bearing media comprises recordable media.
  - 34. The computer-readable program product of claim 32 wherein the signal bearing media comprises transmission media.
  - 35. The computer-readable program product of claim 32 wherein the voice recognition processor plays an audio clip when the corresponding audio marker is selected by a user during the display of the output file to a user.
  - 36. The computer-readable program product of claim 32 wherein the voice recognition processor displays a clarity meter that visually indicates to a user the efficiency of the voice recognition processor in converting the voice audio stream to text.

37. A computer-readable program product comprising:
- (A) a voice recognition processor comprising:
  
  a plurality of defined words;
  
  a digital audio processor that processes digital voice audio information looking for the plurality of defined words;
  
  a digital audio compressor that reduces the size of the digital voice audio information by replacing at least one portion of the digital voice audio information with text corresponding to at least one of the plurality of defined words; and
  
  (B) signal bearing media bearing the voice recognition processor.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Wyman, Blair
Application Number: US09/947,987
Publication Number: US 20030046071A1
US Class Current: 704/235
CPC Class Codes:
G10L 15/22 Procedures used during a sp...
G10L 2015/225 Feedback of the input speech
