Voice tagging, voice annotation, and speech recognition for portable devices with optional post processing

US 20050075881A1
Filed: 10/02/2003
Published: 04/07/2005
Est. Priority Date: 10/02/2003
Status: Active Grant


2 Assignments

0 Petitions

Abstract

A media capture device has an audio input receptive of user speech relating to a media capture activity in close temporal relation to the media capture activity. A plurality of focused speech recognition lexica respectively relating to media capture activities are stored on the device, and a speech recognizer recognizes the user speech based on a selected one of the focused speech recognition lexica. A media tagger tags captured media with generated speech recognition text, and a media annotator annotates the captured media with a sample of the user speech that is suitable for input to a speech recognizer. Tagging and annotating are based on close temporal relation between receipt of the user speech and capture of the captured media. Annotations may be converted to tags during post processing, employed to edit a lexicon using letter-to-sound rules and spelled word input, or matched directly to speech to retrieve captured media.
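The abstract's central mechanism, attaching speech to whichever capture it is closest to in time, can be illustrated with a minimal Python sketch. All names here (`SpeechEvent`, `CapturedMedia`, `attach_speech`) and the 5-second window are hypothetical: the patent text above does not say how "close temporal relation" is measured, so a nearest-timestamp rule with a fixed threshold is only an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpeechEvent:
    timestamp: float        # assumed: seconds on the same clock as captures
    audio: bytes            # speech sample kept as the annotation
    text: Optional[str]     # recognizer output, or None if recognition failed


@dataclass
class CapturedMedia:
    media_id: str
    timestamp: float
    tags: List[str] = field(default_factory=list)
    annotations: List[bytes] = field(default_factory=list)


def attach_speech(media: List[CapturedMedia], event: SpeechEvent,
                  window_s: float = 5.0) -> Optional[CapturedMedia]:
    """Attach speech to the capture nearest in time, if within window_s.

    window_s is a hypothetical threshold standing in for "close temporal
    relation"; the claims do not give a value.
    """
    if not media:
        return None
    nearest = min(media, key=lambda m: abs(m.timestamp - event.timestamp))
    if abs(nearest.timestamp - event.timestamp) > window_s:
        return None
    nearest.annotations.append(event.audio)  # annotate: store the raw sample
    if event.text:
        nearest.tags.append(event.text)      # tag: store the recognition text
    return nearest
```

For example, speech at t = 12 s attaches to a photo captured at t = 10 s, while speech at t = 35 s is more than 5 s from photos at 10 s and 60 s and is not attached. The audio sample is kept even when `text` is None, which matches the abstract's point that annotations may be converted to tags later during post processing.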

119 Citations

35 Claims

1. A media capture device, comprising:
- a media capture mechanism;
- an audio input receptive of user speech relating to a media capture activity in close temporal relation to the media capture activity;
- a plurality of focused speech recognition lexica respectively relating to media capture activities;
- a speech recognizer adapted to recognize the user speech based on a selected one of the focused speech recognition lexica;
- a media tagger adapted to tag captured media with text generated by said speech recognizer based on close temporal relation between receipt of recognized user speech and capture of the captured media; and
- a media annotator adapted to annotate the captured media with a sample of the user speech that is suitable for input to a speech recognizer based on close temporal relation between receipt of the user speech and capture of the captured media.
2. The device of claim 1, further comprising an input receptive of a user identity, wherein said speech recognizer is adapted to recognize user speech based on the user identity.

3. The device of claim 2, wherein said speech recognizer is adapted to employ focused lexica based on the user identity.

4. The device of claim 1, wherein said speech recognizer is adapted to select a lexicon based on the user speech and a predefined heuristic relating to voice tags associated with the lexica.

5. The device of claim 1, further comprising a user interface adapted to permit a user to navigate between and select a lexicon.

6. The device of claim 1, further comprising a media retrieval mechanism adapted to retrieve captured media from memory of the device by matching a tag of the captured media to recognition text generated from user speech received and recognized during a retrieval mode of the device.

7. The device of claim 1, further comprising a media retrieval mechanism adapted to retrieve captured media from memory of the device by matching an annotation of the captured media to user speech received during a retrieval mode of the device using sound similarity metrics to align an annotation with a spoken query.

8. The device of claim 1, further comprising a lexicon editor adapted to supplement a lexicon based on an annotation, letter to sound rules, and user speech corresponding to spelled word input received and recognized during a lexicon edit mode of the device.

9. The device of claim 1, further comprising an external data interface adapted to transmit annotations to a post processor having greater speech recognition capabilities than said device.

10. The device of claim 1, further comprising:
- an external data interface receptive of lexicon contents; and
- a lexicon editor adapted to store the lexicon contents in device memory.
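Claim 7 recites "sound similarity metrics to align an annotation with a spoken query" without naming a metric. Dynamic time warping (DTW) is one common way to align two utterances of different lengths, so the sketch below swaps it in as an illustration, not as the patented method. It assumes the annotations and the query have already been converted into sequences of equal-dimension acoustic feature vectors; that front end, and the names `dtw_distance` and `retrieve`, are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one acoustic feature vector (assumed fixed dimension)


def frame_dist(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: List[Frame], annotation: List[Frame]) -> float:
    """Dynamic-time-warping cost of aligning a spoken query with an annotation."""
    if not query or not annotation:
        return math.inf
    n, m = len(query), len(annotation)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(query[i - 1], annotation[j - 1])
            # extend the cheapest path: step in query only, annotation only, or both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # normalise by n + m so longer annotations are not penalised just for length
    return cost[n][m] / (n + m)


def retrieve(query: List[Frame], annotations: Dict[str, List[Frame]]) -> str:
    """Return the media id whose annotation aligns best with the query."""
    return min(annotations, key=lambda media_id: dtw_distance(query, annotations[media_id]))
```

Because DTW lets one frame of the query match several frames of the annotation, a slowly spoken annotation of the same word can still align at low cost, which is the property a spoken-query retrieval mode needs.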

11. A media tagging system, comprising:
- a portable media capture device adapted to capture media, to receive user speech in close temporal relation to a media capture activity, and adapted to annotate captured media with a sample of the user speech that is suitable for input to a speech recognizer based on close temporal relation between receipt of the user speech and capture of the captured media; and
- a post processor adapted to receive annotations from the device, perform speech recognition on the annotations, and tag related captured media with text generated during speech recognition performed on the annotations.
12. The system of claim 11, comprising a source of predefined, focused lexica relating to media capture activities and adapted to communicate focused lexica to said media capture device according to device type over a communications network.

13. The system of claim 11, comprising a source of predefined, focused lexica relating to media capture activities and adapted to communicate focused lexica to said post-processor over a communications network.

14. The system of claim 11, comprising a lexicon editor provided to at least one of the device and the post processor and adapted to customize a focused lexicon for a user of the device.

15. The system of claim 11, comprising a mapping module adapted to convert textual tags associated with captured media to alternative textual tags based on predetermined criteria relating to a media capture activity.

16. The system of claim 11, wherein said device is adapted to perform a relatively limited amount of speech recognition on the annotation compared to an amount of speech recognition performed by said post-processor, the relatively limited amount being limited in at least one of time and search space due to at least one of relatively lower processing power and relatively lower memory capacity of said device, and to tag related captured media with recognition text generated during the relatively limited amount of speech recognition.

17. The system of claim 11, wherein said post-processor is receptive of captured media from said device, and is adapted to organize the captured media according to at least one of annotations and textual tags associated with the captured media, including clustering at least one of annotations and textual tags based on at least one of acoustic similarity measures and semantic similarity measures.
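Claim 15's mapping module converts recognized tags into alternative tags according to criteria tied to the capture activity, but the criteria themselves are not stated. The sketch below assumes the simplest reading: a per-activity lookup table. The table contents, `TAG_MAPS`, and `map_tags` are all hypothetical illustrations.

```python
from typing import Dict, List

# Hypothetical mapping tables keyed by media capture activity. The claims leave
# the "predetermined criteria" unspecified; these entries are illustrative only.
TAG_MAPS: Dict[str, Dict[str, str]] = {
    "birthday": {"cake": "Birthday cake", "candles": "Blowing out candles"},
    "vacation": {"beach": "Beach", "hotel": "Accommodation"},
}


def map_tags(tags: List[str], activity: str) -> List[str]:
    """Convert recognized tags to alternative tags for the given activity."""
    table = TAG_MAPS.get(activity, {})
    mapped: List[str] = []
    for tag in tags:
        alternative = table.get(tag.lower(), tag)  # unmapped tags pass through
        if alternative not in mapped:              # de-duplicate, keep order
            mapped.append(alternative)
    return mapped
```

Letting unmapped tags pass through unchanged means the module never discards recognition output; it only rewrites tags it has a rule for.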

18. A media tagging method for use with a media capture device, comprising:
- capturing media with the media capture device during a media capture activity conducted by a user of the device;
- receiving user speech via an audio input of the device in close temporal relation to the media capture activity;
- annotating captured media by storing the captured media in memory of the device in association with a sample of the user speech that is suitable for input to a speech recognizer;
- recognizing the user speech with a speech recognizer of the device employing a focused speech recognition lexicon relating to the media capture activity; and
- tagging captured media with recognition text generated during recognition of the user speech by storing the captured media in memory of the device in association with the recognition text.
19. The method of claim 18, further comprising selecting a focused speech recognition lexicon relating to the media capture activity from a plurality of focused lexica relating to media capture activities that are stored in memory of the device.

20. The method of claim 19, wherein said step of selecting the focused speech recognition lexicon is based on the user speech and a predefined heuristic relating to voice tags associated with the lexica.

21. The method of claim 19, wherein said step of selecting the focused speech recognition lexicon is based on user navigation of the lexica via a user interface of the device.

22. The method of claim 18, further comprising receiving a user identity, wherein said step of recognizing the user speech is based on the user identity.

23. The method of claim 22, further comprising selecting, based on the user identity, a focused speech recognition lexicon relating to the media capture activity from a plurality of focused lexica relating to media capture activities that are stored in memory of the device.

24. The method of claim 18, further comprising retrieving captured media from memory of the device by matching a tag of the captured media to recognition text generated from user speech received and recognized during a retrieval mode of the device.

25. The method of claim 18, further comprising retrieving captured media from memory of the device by matching an annotation of the captured media to user speech received during a retrieval mode of the device using sound similarity metrics to align an annotation with a spoken query.

26. The method of claim 18, further comprising supplementing a lexicon stored in device memory based on an annotation, letter to sound rules, and user speech corresponding to spelled word input received and recognized during a lexicon edit mode of the device.

27. The method of claim 18, further comprising receiving lexicon contents and storing the lexicon contents in device memory.

28. The method of claim 18, further comprising transferring annotations from the device to a post processor having greater speech recognition capability than the device.

29. The method of claim 28, further comprising:
- performing speech recognition on annotations received from the device; and
- tagging related captured media with text generated during speech recognition performed on the annotations.

30. The method of claim 28, comprising transferring focused lexica from a source of predefined, focused lexica to the post processor.

31. The method of claim 18, comprising transferring focused lexica from a source of predefined, focused lexica to the device.

32. The method of claim 18, comprising customizing a focused lexicon for a user of the device.

33. The method of claim 18, comprising converting textual tags associated with captured media to alternative textual tags based on predetermined criteria relating to a media capture activity.

34. The method of claim 18, further comprising organizing the captured media according to textual tags associated with the captured media, including clustering textual tags based on semantic similarity measures.

35. The method of claim 18, further comprising organizing the captured media according to annotations associated with the captured media, including clustering annotations based on acoustic similarity measures.
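Claims 19 and 20 select a focused lexicon "based on the user speech and a predefined heuristic relating to voice tags associated with the lexica", without defining the heuristic. The sketch below assumes one plausible rule: switch lexica when an utterance begins with a lexicon's voice tag, otherwise pick the lexicon whose vocabulary best covers the utterance, and keep the current lexicon if nothing matches. The `Lexicon` type, `select_lexicon`, and the rule itself are hypothetical, and the utterance is assumed to already be recognition text.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Lexicon:
    voice_tag: str          # spoken keyword associated with this lexicon
    words: FrozenSet[str]   # focused vocabulary for one capture activity


def select_lexicon(utterance: str, lexica: Dict[str, Lexicon],
                   current: Optional[Lexicon] = None) -> Optional[Lexicon]:
    """Hypothetical heuristic for choosing a focused lexicon from user speech.

    1. If the first word is a lexicon's voice tag, switch to that lexicon.
    2. Otherwise pick the lexicon whose vocabulary covers the most words.
    3. If nothing overlaps, keep the current lexicon.
    """
    words = utterance.lower().split()
    if not words:
        return current
    if words[0] in lexica:               # lexica is keyed by voice tag
        return lexica[words[0]]
    spoken = set(words)
    best = max(lexica.values(), key=lambda lx: len(lx.words & spoken), default=None)
    if best is None or not best.words & spoken:
        return current
    return best
```

Falling back to the current lexicon, rather than to none, keeps recognition focused when the user says something the heuristic cannot classify.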

Current Assignee
Panasonic Intellectual Property Corporation of America (Panasonic Holdings Corporation)
Original Assignee
Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors
Rigazio, Luca; Junqua, Jean-Claude; Boman, Robert; Nguyen, Patrick

Granted Patent

US 7,324,943 B2
US Class Current

704/270
CPC Class Codes

G06F 16/7844 using original textual cont...

G10L 15/26 Speech to text systems
