Combined speech recognition and sound recording
Abstract
A handheld device with both large-vocabulary speech recognition and audio recording allows users to switch between at least two of the following three modes: (1) recording audio without corresponding speech recognition; (2) recording with speech recognition; and (3) speech recognition without audio recording. A handheld device with both large-vocabulary speech recognition and audio recording enables a user to select a portion of previously recorded sound and have speech recognition performed upon it. A system enables a user to search for a text label associated with portions of unrecognized recorded sound by uttering the label's words. A large-vocabulary system allows users to switch between playing back recorded audio and speech recognition with a single input, with each successive audio playback automatically starting slightly before the end of the prior playback. Finally, a cell phone supports both large-vocabulary speech recognition and audio recording and playback.
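The three recording modes and the cursor-based playback described above can be made concrete with a small document model. This is a minimal sketch under stated assumptions: the patent does not specify any data structures, so `RecordingMode`, `Word`, `AudioClip`, `Document` and their methods are hypothetical names, audio is represented only by (start, end) offsets into a recording, and recognizer output is assumed to arrive as (word, span) pairs. The `find_label` helper illustrates the label search over unrecognized audio, taking an already-recognized query string as input.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Sequence, Tuple, Union

Span = Tuple[int, int]  # hypothetical: (start, end) offsets into the recording, e.g. in ms


class RecordingMode(Enum):
    TEXT_ONLY = 1        # recognized text inserted, no recording kept
    AUDIO_ONLY = 2       # recording placeholder inserted, no recognition
    TEXT_WITH_AUDIO = 3  # recognized words, each linked to its own audio span


@dataclass
class Word:
    text: str
    audio_span: Optional[Span] = None  # None for TEXT_ONLY words


@dataclass
class AudioClip:
    audio_span: Span
    label: Optional[str] = None  # optional text label for unrecognized audio


@dataclass
class Document:
    items: List[Union[Word, AudioClip]] = field(default_factory=list)
    cursor: int = 0  # index of the item under the cursor / insertion point

    def insert_utterance(self, mode: RecordingMode,
                         words: Sequence[Tuple[str, Span]], span: Span) -> None:
        """Insert one utterance at the cursor in the given mode.

        words: (text, span) pairs from a recognizer; ignored in AUDIO_ONLY.
        span: extent of the whole utterance in the recording.
        """
        if mode is RecordingMode.TEXT_ONLY:
            new = [Word(text) for text, _ in words]
        elif mode is RecordingMode.AUDIO_ONLY:
            new = [AudioClip(span)]
        else:
            new = [Word(text, s) for text, s in words]
        self.items[self.cursor:self.cursor] = new
        self.cursor += len(new)

    def span_at_cursor(self) -> Optional[Span]:
        """Audio to play in playback mode when the cursor is on an item."""
        if 0 <= self.cursor < len(self.items):
            return self.items[self.cursor].audio_span
        return None

    def find_label(self, query: str) -> Optional[int]:
        """Index of the first AudioClip whose label matches a recognized query."""
        wanted = query.lower().split()
        for i, item in enumerate(self.items):
            if (isinstance(item, AudioClip) and item.label is not None
                    and item.label.lower().split() == wanted):
                return i
        return None
```

For example, dictating "hello world" in the third mode and then recording an unrecognized clip in the second mode yields two `Word` items carrying their own spans followed by one `AudioClip`. Placing the cursor on "world" then yields that word's span for playback. A text-only insertion produces words whose `audio_span` is `None`, so nothing is played for them.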
255 Citations
21 Claims
1. A hand held computing device for performing large vocabulary speech recognition comprising:
one or more processing devices;
memory readable by the processing devices;
a microphone or audio input for providing an electronic signal representing a sound input;
a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
programming recorded in one or more of the memory devices including:
speech recognition programming for performing large vocabulary speech recognition that responds to the electronic representations of the sound of a sequence of one or more utterances received from the microphone or microphone input by producing a text output corresponding to the one or more words recognized as corresponding to the utterances; and
audio recording programming for recording an electronically readable representation of said sound in one or more of said memory devices; and
audio playback programming for playing back said recorded sound representation and providing a corresponding audio signal to said speaker or audio output;
wherein the device's programming has instructions for enabling a user to select between two of the following three possible modes of recording sound input as it is received:
a first mode that places text output in response to speech recognition of said sound input into a user navigable document at a current cursor location, without a representation of a recording of said sound input;
a second mode that places a representation of a recording of said sound input into said user navigable document at said cursor without text responding to speech recognition of said sound input; and
a third mode that places text output in response to speech recognition of said sound input into the user navigable document at the current cursor location, with the words of the text output themselves representing the portions of a recording of the sound input from which each such word has been recognized; and
wherein the audio playback programming includes instructions for enabling a user to select to play recorded sound represented by the sound representations placed in the document by the second and third recording modes by having the cursor located on such representations when in a playback mode.
9. A hand held computing device for performing large vocabulary speech recognition comprising:
one or more processing devices;
memory readable by the processing devices;
a microphone or audio input for providing an electronic signal representing a sound input;
a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
programming recorded in one or more of the memory devices including:
speech recognition programming for performing large vocabulary speech recognition that responds to the electronic representations of the sound of a sequence of one or more utterances received from the microphone or microphone input by producing a text output corresponding to the one or more words recognized as corresponding to the utterances; and
audio recording programming for recording an electronically readable representation of said sound in one or more of said memory devices; and
audio playback programming for playing back said recorded sound representation and providing a corresponding audio signal to said speaker or audio output;
wherein the device's programming further includes instructions for enabling a user to select a portion of audio recorded without corresponding recognition and to have speech recognition performed on the selected portion of audio recording so as to produce a text output corresponding to the selected audio.
10. A hand held computing device for performing large vocabulary speech recognition comprising:
one or more processing devices;
memory readable by the processing devices;
a microphone or audio input for providing an electronic signal representing a sound input;
a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
programming recorded in one or more of the memory devices including:
speech recognition programming for performing large vocabulary speech recognition that responds to the electronic representations of the sound of a sequence of one or more utterances received from the microphone or microphone input by producing a text output corresponding to the one or more words recognized as corresponding to the utterances; and
audio recording programming for recording an electronically readable representation of said sound in one or more of said memory devices; and
audio playback programming for playing back said recorded sound representation and providing a corresponding audio signal to said speaker or audio output;
wherein said device's programming further includes instructions for:
enabling a user to associate recorded portions of text output by said speech recognition with portions of the recorded sound representation that have not previously been labeled by voice;
enabling a user to select to cause text output by said speech recognition to be used as a text search string; and
performing a search for recorded text output that matches the search string;
whereby the user can select to find a portion of recorded sound representation by searching for its associated recorded text.
11. A computing device for performing large vocabulary speech recognition comprising:
one or more processing devices;
memory readable by the processing devices;
a microphone or audio input for providing an electronic signal representing a sound input;
a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
programming recorded in one or more of the memory devices including:
speech recognition programming for performing large vocabulary speech recognition that responds to the electronic representations of the sound of a sequence of one or more utterances received from the microphone or microphone input by producing a text output corresponding to the one or more words recognized as corresponding to the utterances; and
audio recording programming for recording an electronically readable representation of said sound in one or more of said memory devices;
audio playback programming for playing back said recorded sound representation and providing a corresponding audio signal to said speaker or audio output; and
instructions for switching back and forth between said audio playback and said speech recognition with one user input causing each such switch, with successive audio playbacks starting slightly before the end of the prior playback.
13. A computing device that functions as a cell phone comprising:
a user perceivable output device;
a set of phone keys including at least a standard twelve key phone key pad;
one or more processing devices;
memory readable by the processing devices;
a microphone or audio input from which said telephone can receive electronic representations of sound;
a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
transmitting and receiving circuitry;
programming recorded in the memory including:
telephone programming having instructions for performing telephone functions including making and receiving calls; and
speech recognition programming for performing large vocabulary speech recognition that responds to the electronic representations of the sound of a sequence of one or more utterances received from the microphone or microphone input by producing a text output corresponding to the one or more words recognized as corresponding to the utterances; and
audio recording programming for recording an electronically readable representation of said sound in one or more of said memory devices;
audio playback programming for playing back said recorded sound representation and providing a corresponding audio signal to said speaker or audio output.
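Claim 11's single-input switch between playback and recognition, with each playback resuming slightly before the point where the previous one stopped, can be sketched as a two-state machine. This is a minimal sketch, not the patented implementation. The class name `PlaybackRecognitionToggle`, the `toggle` method, and the 500 ms back-up distance are assumptions; the claim says only "slightly before". Recognition itself is stubbed out, and the caller is assumed to report the playback position at the moment of each press.

```python
from typing import List, Tuple


class PlaybackRecognitionToggle:
    """Hypothetical single-input switch between playback and recognition."""

    def __init__(self, backup_ms: int = 500) -> None:
        self.backup_ms = backup_ms   # assumed overlap; the claim says only "slightly"
        self.mode = "recognition"
        self.start_ms = 0            # where the current playback began
        self.stop_ms = 0             # where the most recent playback ended
        self.playbacks: List[Tuple[int, int]] = []  # (start, stop) history

    def toggle(self, position_ms: int = 0) -> str:
        """Handle one press of the switch input and return the new mode.

        position_ms: playback position at the moment of the press; only
        used when leaving playback mode.
        """
        if self.mode == "recognition":
            # Resume slightly before the point where the last playback stopped
            # (assumed rationale: the user re-hears a little context).
            self.start_ms = max(0, self.stop_ms - self.backup_ms)
            self.mode = "playback"
        else:
            # Stop playback; a real device would start the recognizer here.
            self.stop_ms = position_ms
            self.playbacks.append((self.start_ms, position_ms))
            self.mode = "recognition"
        return self.mode
```

With a 500 ms back-up, a first playback that runs from 0 to 4000 ms is followed, after one press to dictate and another to resume, by a playback that starts at 3500 ms. Clamping with `max(0, ...)` keeps the very first playback from starting before the beginning of the recording.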
Specification