Method and apparatus for voice searching for stored content using uniterm discovery

US 8,015,005 B2
Filed: 02/15/2008
Issued: 09/06/2011
Est. Priority Date: 02/15/2008
Status: Active Grant

Abstract

A method, system and communication device enable voice-to-voice searching and ordered content retrieval via audio tags assigned to individual content items; the tags yield uniterms that are matched against components of a voice query. The method includes storing content and tagging at least one content item with an audio tag. The method further includes receiving a voice query to retrieve content stored on the device. When the voice query is received, the method completes a voice-to-voice search in which uniterms of the audio tags are scored against the phoneme latent lattice model generated from the voice query, identifying matching terms within the audio tags and the corresponding stored content. The retrieved content associated with the identified audio tags whose uniterms score within the phoneme lattice model is output in an order corresponding to the order in which the uniterms are structured within the voice query.
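
The ordered-retrieval idea in the abstract can be illustrated with a minimal Python sketch. It assumes the voice query has already been reduced to a sequence of matched uniterms (written here as space-separated phoneme strings) and that a hypothetical `tag_index` maps each uniterm to the content tagged with it. The exact-match lookup is a simplification for illustration; the patent scores uniterms against a phoneme lattice model instead.

```python
from typing import Dict, List, Sequence


def order_by_query(query_uniterms: Sequence[str],
                   tag_index: Dict[str, List[str]]) -> List[str]:
    """Return content ids in the order their uniterms occur in the query.

    query_uniterms: uniterms matched in the voice query, in spoken order
                    (hypothetical representation: "b iy ch").
    tag_index:      hypothetical map from uniterm to tagged content ids.
    """
    ordered: List[str] = []
    seen = set()
    for uniterm in query_uniterms:              # query order drives output order
        for content_id in tag_index.get(uniterm, []):
            if content_id not in seen:          # emit each content item once
                seen.add(content_id)
                ordered.append(content_id)
    return ordered
```

With this sketch, speaking the same two tags in the opposite order returns the same content in the opposite order, which is the behavior the abstract describes.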

14 Claims

1. In an electronic device, a method comprising:
- storing, by the electronic device, content, wherein said content includes one or more of text, images, audio, videos, and multimedia content;
  
  tagging, by the electronic device, the content with an audio tag;
  
  receiving, by the electronic device, a voice query to retrieve content stored on the device;
  
  completing, by the electronic device, a voice-to-voice search utilizing uniterms of the audio tag and a phoneme latent lattice model generated from the voice query to identify audio tags tagged to stored content, which audio tags provide one or more uniterms that score within the phoneme lattice model; and
  
  outputting, by the electronic device, retrieved content associated with the identified audio tags having uniterms that score within the phoneme lattice model, wherein the retrieved content is outputted in an order corresponding to an order in which the uniterms are structured within the voice query;
  
  wherein said completing further comprises:
  
  generating, by the electronic device, one or more first phoneme lattices from audio tags;
  
  determining, by the electronic device, one or more best paths from the one or more first phoneme lattices;
  
  extracting, by the electronic device, one or more uniterms from the one or more first phoneme lattices;
  
  storing, by the electronic device, the one or more uniterms and the one or more best paths in a uniterm index database; and
  
  re-associating, by the electronic device, the one or more uniterms with corresponding stored content with the associated audio tag from which the uniterm was generated; and
  
  wherein extracting one or more uniterms comprises:
  
  generating, by the electronic device, a next latent statistical lattice model from the one or more phoneme lattices generated from the audio tags;
  
  extracting, by the electronic device, phoneme strings with a length that is at least equal to a pre-set minimum length from the phoneme lattices as the one or more best paths;
  
  scoring, by the electronic device, the one or more best paths against the next latent statistical lattice model; and
  
  identifying, by the electronic device, a preset number of best strings as the uniterms selected to represent the phoneme lattice.
- Dependent claims (2, 3, 4, 5, 6, 7)
- - 2. The method of claim 1 further comprising:
    - retrieving, by the electronic device, one or more keywords from the voice query;
      
      identifying, by the electronic device, and maintaining a keyword order of the one or more keywords within the voice query as an output order for the retrieved content; and
      
      when the retrieved content is identified, outputting, by the electronic device, the retrieved content in the keyword order.
  - 3. The method of claim 1 wherein said generating further comprises:
    - forwarding, by the electronic device, the voice query to a speech recognizer, which speech recognizer evaluates received audio and generates one or more phoneme lattices from the received audio; and
      
      generating, by the electronic device, the phoneme lattice from the received audio;
      
      wherein the statistical latent lattice model represents an application of a series of statistical probabilities to the phoneme lattice.
  - 4. The method of claim 1 wherein said completing further comprises:
    - generating, by the electronic device, a latent statistical lattice model from one or more second phoneme lattices generated from the voice query;
      
      scoring, by the electronic device, a plurality of uniterms from the first phoneme lattices against the latent statistical lattice model to determine a set of best scoring uniterms; and
      
      retrieving, by the electronic device, content associated with the set of best scoring uniterms as a response to the voice query.
  - 5. The method of claim 4 wherein said scoring further comprises:
    - performing, by the electronic device, a coarse search of the statistical latent lattice model with the plurality of uniterms and the one or more best paths to generate a plurality of coarse search candidates; and
      
      performing, by the electronic device, a fine search on the coarse search candidates, which fine search involves comparison of the coarse search candidates against the phoneme lattice generated from the voice query to generate a fine search output from among the coarse search candidates.
  - 6. The method of claim 5:
    - wherein performing the coarse search further comprises:
      
      retrieving, by the electronic device, a uniterm phoneme tree from a uniterm index database, wherein the uniterm phoneme tree is a tree that includes substantially all the uniterms discovered from the audio database;
      
      scoring, by the electronic device, the uniterms of the uniterm phoneme tree against the statistical latent lattice model, wherein a statistical probability of a match of the uniterms and branches of the uniterm phoneme tree to the latent lattice model is provided;
      
      evaluating, by the electronic device, a resulting score to determine which branches of the uniterm phoneme tree are the top branches, having one of a highest score relative to other branches and a score above a pre-set minimum score; and
      
      identifying, by the electronic device, the top branches as a result of the coarse search, representing coarse search candidates for utilization as inputs for performing the fine search; and
      
      wherein performing the fine search further comprises:
      
      matching, by the electronic device, the top branches resulting from the coarse search against the one or more second phoneme lattices of the voice query; and
      
      outputting, by the electronic device, a top set of audio segments resulting from the fine search as the response to the voice query.
  - 7. The method of claim 1 wherein the audio tag is a voice identifier.
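
The uniterm extraction sub-steps of claim 1 (build a statistical model from the tag's phoneme lattice, extract phoneme strings of at least a minimum length, score them, keep a preset number of best strings) might be sketched in Python as follows. The lattice encoding, the posterior-weighted bigram table standing in for the "latent statistical lattice model", the `max_len` cap, and the average-log-probability score are all assumptions made for illustration; the claims do not specify them.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical lattice encoding: node -> [(phoneme, next_node, posterior), ...]
Lattice = Dict[int, List[Tuple[str, int, float]]]
Uniterm = Tuple[str, ...]


def bigram_model(lattice: Lattice) -> Dict[Tuple[str, str], float]:
    """Assumed stand-in for the statistical model: normalized,
    posterior-weighted counts of adjacent phoneme pairs in the lattice."""
    weights: Dict[Tuple[str, str], float] = defaultdict(float)
    for edges in lattice.values():
        for a, mid, pa in edges:
            for b, _, pb in lattice.get(mid, []):
                weights[(a, b)] += pa * pb
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def phoneme_strings(lattice: Lattice, min_len: int, max_len: int) -> Set[Uniterm]:
    """Enumerate phoneme strings along lattice paths with
    min_len <= length <= max_len (min_len is assumed to be >= 1)."""
    found: Set[Uniterm] = set()

    def walk(node: int, prefix: Uniterm) -> None:
        if len(prefix) >= min_len:
            found.add(prefix)
        if len(prefix) == max_len:
            return
        for ph, nxt, _ in lattice.get(node, []):
            walk(nxt, prefix + (ph,))

    for start in lattice:
        walk(start, ())
    return found


def score(string: Uniterm, model: Dict[Tuple[str, str], float],
          floor: float = 1e-6) -> float:
    """Average log-probability of the string's bigrams under the model."""
    logs = [math.log(model.get(p, floor)) for p in zip(string, string[1:])]
    return sum(logs) / len(logs) if logs else math.log(floor)


def discover_uniterms(lattice: Lattice, min_len: int = 2,
                      max_len: int = 4, top_n: int = 3) -> List[Uniterm]:
    """Score candidate strings against the model and keep the top_n."""
    model = bigram_model(lattice)
    candidates = phoneme_strings(lattice, min_len, max_len)
    return sorted(candidates, key=lambda s: (-score(s, model), s))[:top_n]
```

Storing the resulting uniterms in an index and re-associating them with the tagged content, as the claim also recites, is omitted here; a dictionary from uniterm to content identifiers would be one simple way to do it.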

8. A device comprising:
- a processor;
  
  an audio input device for receiving audio data including audio tags and voice queries;
  
  a storage mechanism for storing content and corresponding audio tags;
  
  an output mechanism for outputting stored content based on a search of tags associated with the content; and
  
  a voice search and content ordering (VSCO) utility executing on the processor and having logic for completing the following functions:
  
  storing content, wherein said content includes one or more of text, images, audio, videos, and multimedia content;
  
  tagging the content with an audio tag;
  
  receiving a voice query to retrieve content stored on the device;
  
  triggering completion of a voice-to-voice search utilizing uniterms of the audio tag and a phoneme latent lattice model generated from the voice query to identify audio tags tagged to stored content, which audio tags provide one or more uniterms that score within the phoneme lattice model; and
  
  outputting retrieved content associated with the identified audio tags having uniterms that score within the phoneme lattice model, wherein the retrieved content is outputted in an order corresponding to an order in which the uniterms are structured within the voice query;
  
  wherein said logic of the VSCO utility for triggering completion of a voice-to-voice search further comprises functional logic for performing the functions of:
  
  generating one or more first phoneme lattices from audio tags;
  
  determining one or more best paths from the one or more first phoneme lattices;
  
  extracting one or more uniterms from the one or more first phoneme lattices;
  
  storing the one or more uniterms and the one or more best paths in a uniterm index database; and
  
  re-associating the one or more uniterms with corresponding stored content with the associated audio tag from which the uniterm was generated; and
  
  wherein said logic for extracting one or more uniterms comprises logic for performing the functions of:
  
  generating a next latent statistical lattice model from the one or more phoneme lattices generated from the audio tags;
  
  extracting phoneme strings with a length that is at least equal to a pre-set minimum length from the phoneme lattices as the one or more best paths;
  
  scoring the one or more best paths against the next latent statistical lattice model; and
  
  identifying a preset number of best strings as the uniterms selected to represent the phoneme lattice.
- Dependent claims (9, 10, 11, 12, 13, 14)
- - 9. The device of claim 8 further comprising:
    - a uniterm discovery and search (UDS) engine executing on the processor and having functional logic for completing the following functions:
      
      generating one or more first phoneme lattices from audio data stored within an audio database;
      
      determining one or more best paths from the one or more first phoneme lattices;
      
      extracting one or more uniterms from the one or more first phoneme lattices; and
      
      storing the one or more uniterms and the one or more best paths in a uniterm index database;
      
      wherein said triggering activates the UDS engine to complete the voice-to-voice search.
  - 10. The device of claim 9 wherein said functional logic for generating further comprises logic for performing the functions of:
    - forwarding the voice query to a speech recognizer, which speech recognizer evaluates received audio and generates one or more phoneme lattices from the received audio; and
      
      generating the phoneme lattice from the received audio;
      
      wherein the statistical latent lattice model represents an application of a series of statistical probabilities to the phoneme lattice.
  - 11. The device of claim 8 wherein said functional logic for completing further comprises logic for performing the functions of:
    - generating a latent statistical lattice model from one or more second phoneme lattices generated from the voice query;
      
      scoring a plurality of uniterms from the first phoneme lattices against the latent statistical lattice model to determine a set of best scoring uniterms; and
      
      retrieving content associated with the set of best scoring uniterms as a response to the voice query.
  - 12. The device of claim 11 wherein the functional logic for scoring comprises logic for performing the functions of:
    - performing a coarse search of the statistical latent lattice model with the plurality of uniterms and the one or more best paths to generate a plurality of coarse search candidates; and
      
      performing a fine search on the coarse search candidates, which fine search involves comparison of the coarse search candidates against the phoneme lattice generated from the voice query to generate a fine search output from among the coarse search candidates.
  - 13. The device of claim 12:
    - wherein the functional logic for performing the coarse search further comprises logic for:
      
      retrieving a uniterm phoneme tree from a uniterm index database, wherein the uniterm phoneme tree is a tree that includes substantially all the uniterms discovered from the audio database;
      
      scoring the uniterms of the uniterm phoneme tree against the statistical latent lattice model, wherein a statistical probability of a match of the uniterms and branches of the uniterm phoneme tree to the latent lattice model is provided;
      
      evaluating a resulting score to determine which branches of the uniterm phoneme tree are the top branches, having one of a highest score relative to other branches and a score above a pre-set minimum score; and
      
      identifying the top branches as a result of the coarse search, representing coarse search candidates for utilization as inputs for performing the fine search; and
      
      wherein the functional logic for performing the fine search further comprises logic for:
      
      matching the top branches resulting from the coarse search against the one or more second phoneme lattices of the voice query; and
      
      outputting a top set of audio segments resulting from the fine search as the response to the voice query.
  - 14. The device of claim 8 wherein the audio tag is a voice identifier and the device is a mobile communication device.
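
The coarse and fine search stages recited in claims 5, 6, 12 and 13 could look roughly like the Python sketch below. It assumes the uniterm phoneme tree is a nested-dictionary trie, that the query's statistical model is approximated by per-phoneme maximum posteriors, and that the fine search scores a candidate by the best product of edge posteriors along a query-lattice path spelling it. None of these choices come from the patent text; they are placeholders that keep the example small.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical lattice encoding: node -> [(phoneme, next_node, posterior), ...]
Lattice = Dict[int, List[Tuple[str, int, float]]]
Uniterm = Tuple[str, ...]


def build_tree(uniterms: List[Uniterm]) -> dict:
    """Uniterm phoneme tree as a trie; the key '$' marks a complete uniterm."""
    root: dict = {}
    for u in uniterms:
        node = root
        for ph in u:
            node = node.setdefault(ph, {})
        node["$"] = u
    return root


def unigram_model(lattice: Lattice) -> Dict[str, float]:
    """Assumed cheap query model: highest posterior seen for each phoneme."""
    model: Dict[str, float] = {}
    for edges in lattice.values():
        for ph, _, p in edges:
            model[ph] = max(model.get(ph, 0.0), p)
    return model


def coarse_search(tree: dict, model: Dict[str, float],
                  min_score: float, floor: float = 1e-3) -> List[Uniterm]:
    """Score tree branches by summed log-probability and prune weak ones."""
    hits: List[Tuple[float, Uniterm]] = []

    def walk(node: dict, logp: float) -> None:
        if logp < min_score:        # scores only decrease, so drop the branch
            return
        for key, child in node.items():
            if key == "$":
                hits.append((logp, child))
            else:
                walk(child, logp + math.log(model.get(key, floor)))

    walk(tree, 0.0)
    return [u for _, u in sorted(hits, key=lambda h: (-h[0], h[1]))]


def fine_score(lattice: Lattice, uniterm: Uniterm) -> float:
    """Best product of posteriors over lattice paths that spell the uniterm."""
    def match(node: int, i: int) -> float:
        if i == len(uniterm):
            return 1.0
        return max((p * match(nxt, i + 1)
                    for ph, nxt, p in lattice.get(node, []) if ph == uniterm[i]),
                   default=0.0)

    return max((match(start, 0) for start in lattice), default=0.0)


def search(query_lattice: Lattice, tree: dict,
           min_score: float = -3.0, top_k: int = 2) -> List[Uniterm]:
    """Coarse pass over the tree, then fine rescoring against the lattice."""
    candidates = coarse_search(tree, unigram_model(query_lattice), min_score)
    scored = [(fine_score(query_lattice, u), u) for u in candidates]
    ranked = sorted(scored, key=lambda x: (-x[0], x[1]))
    return [u for s, u in ranked if s > 0.0][:top_k]
```

The coarse pass ignores phoneme order, so a candidate can survive it and still be rejected by the fine pass, which checks the candidate against actual paths in the query lattice.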

Current Assignee: Google Technology Holdings LLC (Alphabet Inc.)
Original Assignee: Motorola Mobility, Inc. (Lenovo Group Ltd.)
Inventors: Ma, Changxue
Primary Examiner(s): Vo; Huyen X.
Application Number: US12/032,258
Publication Number: US 20090210226A1
Time in Patent Office: 1,299 Days
Field of Search: 704/253, 704/254, 704/249, 704/231, 704/270, 704/270.1, 704/235, 704/243, 704/244, 704/236, 704/257, 707/708
US Class Current: 704/236
CPC Class Codes:
G10L 15/26   Speech to text systems G10L...
G10L 2015/025   Phonemes, fenemes or fenone...
G10L 2015/088   Word spotting