Method and apparatus for voice searching for stored content using uniterm discovery
First Claim
1. In an electronic device, a method comprising:
- storing, by the electronic device, content, wherein said content includes one or more of text, images, audio, videos, and multimedia content;
- tagging, by the electronic device, the content with an audio tag;
- receiving, by the electronic device, a voice query to retrieve content stored on the device;
- completing, by the electronic device, a voice-to-voice search utilizing uniterms of the audio tag and a phoneme latent lattice model generated from the voice query to identify audio tags tagged to stored content, which audio tags provide one or more uniterms that score within the phoneme lattice model; and
- outputting, by the electronic device, retrieved content associated with the identified audio tags having uniterms that score within the phoneme lattice model, wherein the retrieved content is outputted in an order corresponding to an order in which the uniterms are structured within the voice query;
- wherein said completing further comprises;
  - generating, by the electronic device, one or more first phoneme lattices from audio tags;
  - determining, by the electronic device, one or more best paths from the one or more first phoneme lattices;
  - extracting, by the electronic device, one or more uniterms from the one or more first phoneme lattices;
  - storing, by the electronic device, the one or more uniterms and the one or more best paths in a uniterm index database; and
  - re-associating, by the electronic device, the one or more uniterms with corresponding stored content with the associated audio tag from which the uniterm was generated; and
- wherein extracting one or more uniterms comprises;
  - generating, by the electronic device, a next latent statistical lattice model from the one or more phoneme lattices generated from the audio tags;
  - extracting, by the electronic device, phoneme strings with a length that is at least equal to a pre-set minimum length from the phoneme lattices as the one or more best paths;
  - scoring, by the electronic device, the one or more best paths against the next latent statistical lattice model; and
  - identifying, by the electronic device, a preset number of best strings as the uniterms selected to represent the phoneme lattice.
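The uniterm extraction steps at the end of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It rests on several assumptions that do not come from the claim: a phoneme lattice is simplified to a list of slots, each mapping a phoneme to a posterior probability; the statistical lattice model is approximated by an add-one-smoothed bigram model built from expected counts; and the names `bigram_model`, `candidate_strings`, `score`, `extract_uniterms`, `top_k` and `max_len` are all hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def bigram_model(lattices):
    """Build an add-one-smoothed phoneme bigram model from expected lattice counts."""
    pair_counts = defaultdict(float)
    left_counts = defaultdict(float)
    vocab = set()
    for lattice in lattices:
        for slot in lattice:
            vocab.update(slot)
        # Adjacent slots contribute posterior-weighted (expected) bigram counts.
        for a_slot, b_slot in zip(lattice, lattice[1:]):
            for (a, pa), (b, pb) in product(a_slot.items(), b_slot.items()):
                pair_counts[(a, b)] += pa * pb
                left_counts[a] += pa * pb

    def prob(a, b):
        return (pair_counts.get((a, b), 0.0) + 1.0) / (left_counts.get(a, 0.0) + len(vocab))

    return prob


def candidate_strings(lattice, min_len, max_len, top_k=2):
    """Enumerate strings spanning min_len..max_len slots, using the top_k phonemes per slot."""
    tops = [sorted(slot, key=slot.get, reverse=True)[:top_k] for slot in lattice]
    found = set()
    for start in range(len(tops)):
        for length in range(min_len, max_len + 1):
            if start + length > len(tops):
                break
            found.update(product(*tops[start:start + length]))
    return found


def score(string, prob):
    """Mean log bigram probability of a phoneme string under the model."""
    logs = [math.log(prob(a, b)) for a, b in zip(string, string[1:])]
    return sum(logs) / len(logs)


def extract_uniterms(lattice, prob, n_best, min_len=2, max_len=4):
    """Return the n_best highest-scoring candidate strings as uniterms."""
    if min_len < 2:
        raise ValueError("min_len must be at least 2 for bigram scoring")
    ranked = sorted(candidate_strings(lattice, min_len, max_len),
                    key=lambda s: (-score(s, prob), s))
    return ranked[:n_best]


# Toy lattice for one audio tag (made-up posteriors, for illustration only).
tag_lattice = [{"k": 0.9, "g": 0.1}, {"ae": 0.8, "eh": 0.2}, {"t": 0.7, "d": 0.3}]
model = bigram_model([tag_lattice])
uniterms = extract_uniterms(tag_lattice, model, n_best=2)
print(uniterms)
```

In this sketch `min_len` plays the role of the pre-set minimum length and `n_best` the preset number of best strings. A real phoneme lattice is a directed graph with timing information, so an actual system would enumerate paths rather than slot products.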
Abstract
A method, system, and communication device enable voice-to-voice searching and ordered content retrieval via audio tags assigned to individual content items; the tags yield uniterms that are matched against components of a voice query. The method includes storing content and tagging at least one content item with an audio tag. The method further includes receiving a voice query to retrieve content stored on the device. When the voice query is received, the method completes a voice-to-voice search in which uniterms of the audio tags are scored against the phoneme latent lattice model generated from the voice query, identifying matching terms within the audio tags and the corresponding stored content. The retrieved content associated with the identified audio tags having uniterms that score within the phoneme lattice model is output in an order corresponding to the order in which the uniterms are structured within the voice query.
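The ordered retrieval described in the abstract can be sketched as follows. This is a simplified illustration, not the method of the patent: scoring uniterms against the query's phoneme lattice model is replaced here by exact matching against a single best phoneme string for the query, and the names `find`, `ordered_results`, `uniterm_index` and the file names are hypothetical.

```python
def find(seq, sub):
    """Return the first index at which sub occurs contiguously in seq, or -1."""
    n, m = len(seq), len(sub)
    for i in range(n - m + 1):
        if tuple(seq[i:i + m]) == tuple(sub):
            return i
    return -1


def ordered_results(query_phonemes, uniterm_index):
    """Return content IDs whose uniterms occur in the query, in query order."""
    first_pos = {}
    for uniterm, content_ids in uniterm_index.items():
        pos = find(query_phonemes, uniterm)
        if pos < 0:
            continue  # this uniterm does not appear in the query
        for cid in content_ids:
            # Keep the earliest query position at which the content matched.
            first_pos[cid] = min(pos, first_pos.get(cid, pos))
    return sorted(first_pos, key=lambda cid: (first_pos[cid], cid))


# Assumed index: uniterm (phoneme tuple) -> content tagged with it.
uniterm_index = {
    ("d", "ao", "g"): ["photo_dog.jpg"],
    ("b", "iy", "ch"): ["video_beach.mp4"],
    ("k", "ae", "t"): ["photo_cat.jpg"],
}
# Assumed best phoneme string for the spoken query "beach dog".
query = ["b", "iy", "ch", "d", "ao", "g"]
results = ordered_results(query, uniterm_index)
print(results)
```

Content whose uniterm appears earlier in the query is listed first, which mirrors the ordering requirement stated in the abstract. Content whose uniterms do not appear in the query is not returned.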
14 Claims
1. In an electronic device, a method comprising the steps set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A device comprising:
- a processor;
- an audio input device for receiving audio data including audio tags and voice queries;
- a storage mechanism for storing content and corresponding audio tags;
- an output mechanism for outputting stored content based on a search of tags associated with the content; and
- a voice search and content ordering (VSCO) utility executing on the processor and having logic for completing the following functions;
  - storing content, wherein said content includes one or more of text, images, audio, videos, and multimedia content;
  - tagging the content with an audio tag;
  - receiving a voice query to retrieve content stored on the device;
  - triggering completion of a voice-to-voice search utilizing uniterms of the audio tag and a phoneme latent lattice model generated from the voice query to identify audio tags tagged to stored content, which audio tags provide one or more uniterms that score within the phoneme lattice model; and
  - outputting retrieved content associated with the identified audio tags having uniterms that score within the phoneme lattice model, wherein the retrieved content is outputted in an order corresponding to an order in which the uniterms are structured within the voice query;
- wherein said logic of the VSCO utility for triggering completion of a voice-to-voice search further comprises functional logic for performing the functions of;
  - generating one or more first phoneme lattices from audio tags;
  - determining one or more best paths from the one or more first phoneme lattices;
  - extracting one or more uniterms from the one or more first phoneme lattices;
  - storing the one or more uniterms and the one or more best paths in a uniterm index database; and
  - re-associating the one or more uniterms with corresponding, stored content with the associated audio tag from which the uniterm was generated; and
- wherein said logic for extracting one or more uniterms comprises logic for performing the functions of;
  - generating a next latent statistical lattice model from the one or more phoneme lattices generated from the audio tags;
  - extracting phoneme strings with a length that is at least equal to a pre-set minimum length from the phoneme lattices as the one or more best paths;
  - scoring the one or more best paths against the next latent statistical lattice model; and
  - identifying a preset number of best strings as the uniterms selected to represent the phoneme lattice.
Dependent claims: 9, 10, 11, 12, 13, 14
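The uniterm index database and the re-association step recited in this claim can be pictured with a small in-memory structure. The sketch below is only one possible layout under stated assumptions: the class name `UnitermIndex`, its fields and its methods are hypothetical, and a real device would presumably use persistent storage rather than Python dictionaries.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UnitermIndex:
    # uniterm (phoneme tuple) -> IDs of content whose audio tag produced it
    by_uniterm: dict = field(default_factory=lambda: defaultdict(set))
    # content ID -> best phoneme paths decoded from its audio tag
    best_paths: dict = field(default_factory=dict)

    def add_tag(self, content_id, uniterms, best_paths):
        """Store a tag's uniterms and best paths, linked back to its content."""
        self.best_paths[content_id] = [tuple(p) for p in best_paths]
        for uniterm in uniterms:
            self.by_uniterm[tuple(uniterm)].add(content_id)

    def content_for(self, uniterm):
        """Return the content IDs associated with a uniterm, sorted for stable output."""
        return sorted(self.by_uniterm.get(tuple(uniterm), ()))


index = UnitermIndex()
index.add_tag("photo_cat.jpg",
              uniterms=[("k", "ae", "t")],
              best_paths=[("k", "ae", "t"), ("g", "ae", "t")])
index.add_tag("video_cat.mp4",
              uniterms=[("k", "ae", "t"), ("p", "l", "ey")],
              best_paths=[("k", "ae", "t", "p", "l", "ey")])
print(index.content_for(("k", "ae", "t")))
```

Keeping the mapping from uniterm to content IDs lets a search step resolve each matched uniterm back to the stored content whose audio tag generated it.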
Specification