Voice tagging, voice annotation, and speech recognition for portable devices with optional post processing
Abstract
A media capture device has an audio input receptive of user speech relating to a media capture activity in close temporal relation to the media capture activity. A plurality of focused speech recognition lexica respectively relating to media capture activities are stored on the device, and a speech recognizer recognizes the user speech based on a selected one of the focused speech recognition lexica. A media tagger tags captured media with generated speech recognition text, and a media annotator annotates the captured media with a sample of the user speech that is suitable for input to a speech recognizer. Tagging and annotating are based on close temporal relation between receipt of the user speech and capture of the captured media. Annotations may be converted to tags during post processing, employed to edit a lexicon using letter-to-sound rules and spelled word input, or matched directly to speech to retrieve captured media.
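The "close temporal relation" test that drives both tagging and annotating can be sketched as a nearest-in-time match. This is only an illustration: the `CapturedMedia` type, the `closest_media` helper, and the 10-second window are hypothetical and are not taken from the abstract, which does not specify a threshold.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CapturedMedia:
    """Hypothetical record for one captured item (photo, clip, ...)."""
    name: str
    captured_at: float                                      # capture time, in seconds
    tags: List[str] = field(default_factory=list)           # recognition text
    annotations: List[bytes] = field(default_factory=list)  # raw speech samples


def closest_media(media: List[CapturedMedia],
                  speech_at: float,
                  window_s: float = 10.0) -> Optional[CapturedMedia]:
    """Return the item captured nearest to the speech, or None if no item
    falls within the assumed window of `window_s` seconds."""
    best: Optional[CapturedMedia] = None
    best_gap = window_s
    for item in media:
        gap = abs(item.captured_at - speech_at)
        if gap <= best_gap:
            best, best_gap = item, gap
    return best


shots = [CapturedMedia("a", 100.0), CapturedMedia("b", 130.0)]
target = closest_media(shots, speech_at=104.0)
if target is not None:
    target.tags.append("sunset")            # tag: text from the recognizer
    target.annotations.append(b"\x00\x01")  # annotation: the speech sample itself
```

Speech that arrives outside the window of every item is left unassociated rather than attached to the wrong capture.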
35 Claims
1. A media capture device, comprising:
a media capture mechanism;
an audio input receptive of user speech relating to a media capture activity in close temporal relation to the media capture activity;
a plurality of focused speech recognition lexica respectively relating to media capture activities;
a speech recognizer adapted to recognize the user speech based on a selected one of the focused speech recognition lexica;
a media tagger adapted to tag captured media with text generated by said speech recognizer based on close temporal relation between receipt of recognized user speech and capture of the captured media; and
a media annotator adapted to annotate the captured media with a sample of the user speech that is suitable for input to a speech recognizer based on close temporal relation between receipt of the user speech and capture of the captured media.
Dependent claims: 2-10.
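One way to picture recognition "based on a selected one of the focused speech recognition lexica" is to filter a recognizer's candidate transcripts against the lexicon for the current activity. The sketch below is a toy stand-in, not the claimed recognizer: the `LEXICA` contents, the activity names, and the n-best list format are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical focused lexica, one per media capture activity.
LEXICA: Dict[str, Set[str]] = {
    "vacation": {"beach", "sunset", "hotel"},
    "birthday": {"cake", "candles", "party"},
}


def recognize(nbest: List[Tuple[str, float]], activity: str) -> Optional[str]:
    """Pick the best-scoring candidate whose words all belong to the
    lexicon selected for `activity`; return None if none qualifies.

    `nbest` is an assumed list of (transcript, score) pairs such as an
    upstream acoustic front end might produce.
    """
    lexicon = LEXICA[activity]
    in_vocab = [
        (score, text)
        for text, score in nbest
        if all(word in lexicon for word in text.split())
    ]
    if not in_vocab:
        return None
    return max(in_vocab)[1]


candidates = [("peach sunset", 0.9), ("beach sunset", 0.8), ("cake", 0.1)]
vacation_text = recognize(candidates, "vacation")
birthday_text = recognize(candidates, "birthday")
```

Narrowing the vocabulary this way lets a lower-scoring but in-lexicon candidate win over an out-of-lexicon one, which is the practical benefit of a focused lexicon on a small device.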
11. A media tagging system, comprising:
a portable media capture device adapted to capture media, to receive user speech in close temporal relation to a media capture activity, and adapted to annotate captured media with a sample of the user speech that is suitable for input to a speech recognizer based on close temporal relation between receipt of the user speech and capture of the captured media; and
a post processor adapted to receive annotations from the device, perform speech recognition on the annotations, and tag related captured media with text generated during speech recognition performed on the annotations.
Dependent claims: 12-17.
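The post processor's role, turning stored speech annotations into text tags after the fact, might look like the following. The `post_process` function, the mapping from media identifiers to samples, and the injected `recognize` callable are hypothetical; the claim does not prescribe any particular interface.

```python
from typing import Callable, Dict, List


def post_process(annotations: Dict[str, List[bytes]],
                 recognize: Callable[[bytes], str]) -> Dict[str, List[str]]:
    """Run recognition over every speech sample received from the device
    and return the resulting tags, keyed by the related media item.

    Samples for which the recognizer returns an empty string are skipped.
    """
    tags: Dict[str, List[str]] = {}
    for media_id, samples in annotations.items():
        texts = [recognize(sample) for sample in samples]
        tags[media_id] = [text for text in texts if text]
    return tags


def fake_recognizer(sample: bytes) -> str:
    """Stand-in for a real recognizer: treats the bytes as UTF-8 text."""
    return sample.decode("utf-8")


received = {"img1": [b"beach", b""], "img2": []}
new_tags = post_process(received, fake_recognizer)
```

Because the recognizer is passed in, the post processor can use a larger model or lexicon than the portable device could host.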
18. A media tagging method for use with a media capture device, comprising:
capturing media with the media capture device during a media capture activity conducted by a user of the device;
receiving user speech via an audio input of the device in close temporal relation to the media capture activity;
annotating captured media by storing the captured media in memory of the device in association with a sample of the user speech that is suitable for input to a speech recognizer;
recognizing the user speech with a speech recognizer of the device employing a focused speech recognition lexicon relating to the media capture activity; and
tagging captured media with recognition text generated during recognition of the user speech by storing the captured media in memory of the device in association with the recognition text.
Dependent claims: 19-35.
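The method steps above can be read as a single on-device flow in which one stored record associates the media with both the speech sample and the recognition text. The sketch below assumes an in-memory dictionary as the device memory; `MediaRecord`, `capture_and_tag`, and the recognizer callable are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class MediaRecord:
    """Hypothetical stored record for one captured item."""
    media: bytes
    speech_sample: Optional[bytes] = None    # annotation: recognizer-ready audio
    recognition_text: Optional[str] = None   # tag: text from on-device recognition


def capture_and_tag(store: Dict[str, MediaRecord],
                    media_id: str,
                    media: bytes,
                    speech: bytes,
                    recognize: Callable[[bytes], str]) -> MediaRecord:
    """Store the media, then annotate it with the speech sample and tag it
    with the recognition text, all under one record."""
    record = MediaRecord(media=media)
    store[media_id] = record                       # capturing and storing
    record.speech_sample = speech                  # annotating
    record.recognition_text = recognize(speech)    # recognizing and tagging
    return record


device_memory: Dict[str, MediaRecord] = {}
saved = capture_and_tag(device_memory, "img1", b"\x00", b"sunset",
                        lambda sample: sample.decode("utf-8"))
```

Keeping the raw sample alongside the text means a wrong on-device tag can later be corrected by re-running recognition on the stored annotation.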
Specification