Combined speech recognition and sound recording

US 20050159957A1
Filed: 12/05/2004
Published: 07/21/2005
Est. Priority Date: 09/05/2001
Status: Active Grant

Abstract

A handheld device with both large-vocabulary speech recognition and audio recording allows users to switch between at least two of the following three modes: (1) recording audio without corresponding speech recognition; (2) recording with speech recognition; and (3) speech recognition without audio recording. A handheld device with both large-vocabulary speech recognition and audio recording enables a user to select a portion of previously recorded sound and have speech recognition performed upon it. A system enables a user to search for a text label associated with portions of unrecognized recorded sound by uttering the label's words. A large-vocabulary system allows users to switch between playing back recorded audio and speech recognition with a single input, with successive audio playbacks automatically starting slightly before the end of the prior playback. Also disclosed is a cell phone that supports both large-vocabulary speech recognition and audio recording and playback.

255 Citations

21 Claims

1. A hand held computing device for performing large vocabulary speech recognition comprising:
- one or more processing devices;
  
  memory readable by the processing devices;
  
  a microphone or audio input for providing an electronic signal representing a sound input;
  
  a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
  
  programming recorded in one or more of the memory devices including;
  
  speech recognition programming for performing large vocabulary speech recognition that responds to the electronic representations of the sound of a sequence of one or more utterances received from the microphone or microphone input by producing a text output corresponding to the one or more words recognized as corresponding to the utterances; and
  
  audio recording programming for recording an electronically readable representation of said sound in one or more of said memory devices; and
  
  audio playback programming for playing back said recorded sound representation and providing a corresponding audio signal to said speaker or audio output;
  
  wherein the device's programming has instructions for enabling a user to select between two of the following three possible modes of recording sound input as it is received;
  
  a first mode that places text output in response to speech recognition of said sound input into a user navigable document at a current cursor location, without a representation of a recording of said sound input;
  
  a second mode that places a representation of a recording of said sound input into said user navigable document at said cursor without text responding to speech recognition of said sound input; and
  
  a third mode that places text output in response to speech recognition of said sound input into the user navigable document at the current cursor location, with the words of the text output themselves representing the portions of a recording of the sound input from which each such word has been recognized; and
  
  wherein the audio playback programming includes instructions for enabling a user to select to play recorded sound represented by the sound representations placed in the document by the second and third recording modes by having the cursor located on such representations when in a playback mode.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
- - 2. A device as in claim 1 wherein the device's instructions enable a user to switch back and forth between the second mode and either the first or third mode with less than one second's delay for each switch.
  - 3. A device as in claim 1 wherein the device's programming further includes instructions for enabling a user to select a portion of audio recorded without corresponding recognition to have speech recognition performed on the selected portion of audio recording so as to produce a text output corresponding to the selected audio.
  - 4. A device as in claim 1 wherein the device's programming further includes instructions for enabling a user to select a sub-portion of text output by speech recognition in the third mode that has recorded sound associated with its words and to have the recorded sound associated with the selected text removed.
  - 5. A device as in claim 1 wherein the device's programming further includes instructions for enabling a user to select a sub-portion of text output by speech recognition in the third mode that has recorded sound associated with its words and to have the selected text removed and to replace its location in the document with the type of representation of the recorded sound produced by recording in the second mode.
  - 6. A device as in claim 1 wherein the representations of sound placed in the document by the second recording mode are audiographic representations that vary in length as a function of the duration of the respective portions of recorded sound they represent.
  - 7. A computing device as in claim 1 wherein the device is a handheld device.
  - 8. A computing device as in claim 7 wherein the device is a cell phone.
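
The three recording modes of claim 1 can be pictured as three ways of inserting items into a cursor-addressed document. The following is a minimal Python sketch under stated assumptions, not the patented implementation: `Mode`, `Item`, `Document`, `record`, `playable_audio`, and the `recognize` callback that returns `(word, start_s, end_s)` tuples are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional, Tuple

Span = Tuple[float, float]  # (start_s, end_s) within the stored recording


class Mode(Enum):
    TEXT_ONLY = 1        # first mode: recognized text, no recording kept
    AUDIO_ONLY = 2       # second mode: recording marker, no recognition
    TEXT_WITH_AUDIO = 3  # third mode: each word also stands for its audio


@dataclass
class Item:
    text: Optional[str]    # None for a pure audio marker
    audio: Optional[Span]  # None when no recording is attached


@dataclass
class Document:
    items: List[Item] = field(default_factory=list)
    cursor: int = 0

    def insert(self, item: Item) -> None:
        # New items always land at the current cursor location.
        self.items.insert(self.cursor, item)
        self.cursor += 1


def record(doc: Document, mode: Mode, span: Span,
           recognize: Callable[[Span], List[Tuple[str, float, float]]]) -> None:
    """Place the result of one sound input into the document."""
    if mode is Mode.AUDIO_ONLY:
        doc.insert(Item(text=None, audio=span))
        return
    for word, start_s, end_s in recognize(span):
        audio = (start_s, end_s) if mode is Mode.TEXT_WITH_AUDIO else None
        doc.insert(Item(text=word, audio=audio))


def playable_audio(doc: Document, index: int) -> Optional[Span]:
    """Audio a playback mode could play with the cursor on this item."""
    return doc.items[index].audio
```

In this sketch only items created in the second and third modes carry an audio span, which mirrors the claim's statement that playback is selected by placing the cursor on representations produced by those two modes.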

9. A hand held computing device for performing large vocabulary speech recognition comprising:
- one or more processing devices;
  
  memory readable by the processing devices;
  
  a microphone or audio input for providing an electronic signal representing a sound input;
  
  a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
  
  programming recorded in one or more of the memory devices including;
  
  speech recognition programming for performing large vocabulary speech recognition that responds to the electronic representations of the sound of a sequence of one or more utterances received from the microphone or microphone input by producing a text output corresponding to the one or more words recognized as corresponding to the utterances; and
  
  audio recording programming for recording an electronically readable representation of said sound in one or more of said memory devices; and
  
  audio playback programming for playing back said recorded sound representation and providing a corresponding audio signal to said speaker or audio output;
  
  wherein the device's programming further includes instructions for enabling a user to select a portion of audio recorded without corresponding recognition and to have speech recognition performed on the selected portion of audio recording so as to produce a text output corresponding to the selected audio.
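
Claim 9 describes running recognition after the fact on a user-selected portion of a recording. One way to picture this, assuming the recording is held as a sequence of PCM samples at a known sample rate, is the sketch below. The function name `recognize_selection` and the `recognize` callback are hypothetical; the claim does not specify how audio is stored or how the recognizer is invoked.

```python
from typing import Callable, Sequence


def recognize_selection(samples: Sequence[int], sample_rate_hz: int,
                        start_s: float, end_s: float,
                        recognize: Callable[[Sequence[int]], str]) -> str:
    """Run a recognizer over only the selected part of a recording."""
    if not 0 <= start_s < end_s:
        raise ValueError("selection must satisfy 0 <= start_s < end_s")
    # Convert the selected time range to sample indices, clamped to the data.
    first = int(start_s * sample_rate_hz)
    last = min(len(samples), int(end_s * sample_rate_hz))
    return recognize(samples[first:last])
```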

10. A hand held computing device for performing large vocabulary speech recognition comprising:
- one or more processing devices;
  
  memory readable by the processing devices;
  
  a microphone or audio input for providing an electronic signal representing a sound input;
  
  a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
  
  programming recorded in one or more of the memory devices including;
  
  speech recognition programming for performing large vocabulary speech recognition that responds to the electronic representations of the sound of a sequence of one or more utterances received from the microphone or microphone input by producing a text output corresponding to the one or more words recognized as corresponding to the utterances; and
  
  audio recording programming for recording an electronically readable representation of said sound in one or more of said memory devices; and
  
  audio playback programming for playing back said recorded sound representation and providing a corresponding audio signal to said speaker or audio output;
  
  wherein said device's programming further includes instructions for;
  
  enabling a user to associate recorded portions of text output by said speech recognition with portions of the recorded sound representation that have not previously been labeled by voice;
  
  enabling a user to select to cause text output by said speech recognition to be used as a text search string; and
  
  performing a search for recorded text output that matches the search string;
  
  whereby the user can select to find a portion of recorded sound representation by searching for its associated recorded text.
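
The label-and-search behavior of claim 10 can be sketched as a phrase match between a recognized search string and the text labels attached to recorded clips. This is an illustrative Python sketch only: `Clip`, `find_clips`, and the whole-word, case-insensitive phrase matching rule are assumptions made here, since the claim does not say how the match is performed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    start_s: float
    end_s: float
    label: str = ""  # recognized text the user attached to this audio


def _words(text: str) -> List[str]:
    return text.lower().split()


def find_clips(clips: List[Clip], spoken_query: str) -> List[Clip]:
    """Return clips whose label contains the query as a contiguous phrase."""
    query = _words(spoken_query)
    if not query:
        return []
    hits = []
    for clip in clips:
        words = _words(clip.label)
        # Slide a window of the query's length across the label's words.
        last_start = len(words) - len(query)
        if any(words[i:i + len(query)] == query for i in range(last_start + 1)):
            hits.append(clip)
    return hits
```

Here `spoken_query` stands for the text output the recognizer produced when the user uttered the label's words.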

11. A computing device for performing large vocabulary speech recognition comprising:
- one or more processing devices;
  
  memory readable by the processing devices;
  
  a microphone or audio input for providing an electronic signal representing a sound input;
  
  a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
  
  programming recorded in one or more of the memory devices including;
  
  speech recognition programming for performing large vocabulary speech recognition that responds to the electronic representations of the sound of a sequence of one or more utterances received from the microphone or microphone input by producing a text output corresponding to the one or more words recognized as corresponding to the utterances; and
  
  audio recording programming for recording an electronically readable representation of said sound in one or more of said memory devices;
  
  audio playback programming for playing back said recorded sound representation and providing a corresponding audio signal to said speaker or audio output; and
  
  instructions for switching back and forth between said audio playback and said speech recognition with one user input causing each such switch, with successive audio playbacks starting slightly before the end of the prior playback.
- View Dependent Claims (12)
- - 12. A computing device as in claim 11 wherein said instructions for switching back and forth between said audio playback and said speech recognition make both such switches in response to a user selection of the same input device.
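
The single-input toggle of claims 11 and 12, with each playback resuming slightly before the previous one ended, can be sketched as a small state machine. The class `PlaybackToggle`, its field names, and the 0.5 second back-up value are assumptions for illustration; the claims say only "slightly before" and give no figure.

```python
from dataclasses import dataclass


@dataclass
class PlaybackToggle:
    duration_s: float            # total length of the recording
    backup_s: float = 0.5        # assumed overlap; not specified in the claims
    position_s: float = 0.0      # where the previous playback stopped
    start_s: float = 0.0         # where the current playback began
    mode: str = "recognition"    # "recognition" or "playback"

    def press(self, played_s: float = 0.0) -> float:
        """Handle the one toggle input.

        Entering playback returns the position playback starts from.
        Leaving playback takes how long the audio ran (played_s) and
        returns the position at which it stopped.
        """
        if self.mode == "recognition":
            self.mode = "playback"
            # Back up a little so the listener regains context.
            self.start_s = max(0.0, self.position_s - self.backup_s)
            return self.start_s
        self.mode = "recognition"
        self.position_s = min(self.duration_s, self.start_s + played_s)
        return self.position_s
```

With a ten second recording, pressing once starts playback at 0.0, pressing again after three seconds stops at 3.0 and returns to recognition, and the next press resumes playback at 2.5.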

13. A computing device that functions as a cell phone comprising:
- a user perceivable output device;
  
  a set of phone keys including at least a standard twelve key phone key pad;
  
  one or more processing devices;
  
  memory readable by the processing devices;
  
  a microphone or audio input from which said telephone can receive electronic representations of sound;
  
  a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
  
  transmitting and receiving circuitry;
  
  programming recorded in the memory including;
  
  telephone programming having instructions for performing telephone functions including making and receiving calls; and
  
  speech recognition programming for performing large vocabulary speech recognition that responds to the electronic representations of the sound of a sequence of one or more utterances received from the microphone or microphone input by producing a text output corresponding to the one or more words recognized as corresponding to the utterances; and
  
  audio recording programming for recording an electronically readable representation of said sound in one or more of said memory devices;
  
  audio playback programming for playing back said recorded sound representation and providing a corresponding audio signal to said speaker or audio output.
- View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21)
- - 14. A computing device as in claim 13 wherein said play programming includes instructions for:
    - enabling a user to select a sub-portion of said recorded sound representation; and
      
      enabling a user to select to play a selected sub-portion of said sound representation to the other side of a cellular telephone call.
  - 15. A computing device as in claim 13 wherein said recording programming includes instructions for:
    - enabling a user to select to record an electronically readable representation of one or both sides of a cellular phone conversation.
  - 16. A computing device as in claim 13 wherein the device's programming further includes instructions for enabling a user to associate recorded portions of text output by said speech recognition with portions of the recorded sound representation that have not previously been labeled by voice.
  - 17. A computing device as in claim 16 wherein the device's programming further includes instructions for:
    - enabling a user to select to cause text output by said speech recognition to be used as a text search string; and
      
      performing a search for recorded text output corresponding to said search string;
      
      whereby said user can select to find a portion of recorded sound representation by searching for its associated recorded text.
  - 18. A computing device as in claim 13 wherein the device's programming further includes instructions for enabling a user to select a sub-portion of said recorded sound representation which had not previously been recognized and to have said large vocabulary speech recognition performed upon said selected sub-portion.
  - 19. A computing device as in claim 18 wherein:
    - said speech recognition programming includes instructions for performing speech recognition at different levels of quality, with the higher quality recognition taking more time to recognize a given length of sound; and
      
      said instructions for enabling a user to select to have speech recognition performed on a selected sub-portion of recorded sound includes instructions for enabling the selected recorded sound to be recognized at said higher quality.
  - 20. A computing device as in claim 18 wherein said speech recognition programming includes instructions for:
    - marking the time alignment between individual recognized words in text output by said speech recognition and the portions of the recorded sound associated with each recognized word in said text; and
      
      enabling a user to select a sequence of one or more words and to have the recorded sound associated with those words played back.
  - 21. A computing device as in claim 13 wherein the device's programming further includes instructions for switching back and forth between audio playback and speech recognition, with successive audio playbacks starting slightly before the end of the prior playback.
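
The word-level time alignment of claim 20 amounts to storing, for each recognized word, the stretch of recording it came from, so that a selected run of words maps to one audio span for playback. The sketch below is a hedged illustration: `AlignedWord` and `span_for_selection` are hypothetical names, and the index-based selection is an assumption about how a user's word selection might be represented.

```python
from typing import List, NamedTuple, Tuple


class AlignedWord(NamedTuple):
    text: str
    start_s: float  # where this word begins in the recording
    end_s: float    # where this word ends in the recording


def span_for_selection(words: List[AlignedWord],
                       first: int, last: int) -> Tuple[float, float]:
    """Audio span covering words[first] through words[last], inclusive."""
    if not 0 <= first <= last < len(words):
        raise IndexError("selection is outside the recognized text")
    return (words[first].start_s, words[last].end_s)
```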

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Voice Signal Technologies Incorporated (Microsoft Corporation)
Inventors: Roth, Daniel L.; Johnston, David F.; Cohen, Jordan R.; Porter, Edward W.
Granted Patent: US 7,505,911 B2
US Class Current: 704/276
CPC Class Codes:

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 2015/225   Feedback of the input speech
