Method and apparatus for uniterm discovery and voice-to-voice search on mobile device

US 8,019,604 B2
Filed: 12/21/2007
Issued: 09/13/2011
Est. Priority Date: 12/21/2007
Status: Expired due to Fees

4 Assignments

0 Petitions

Abstract

A method, system and communication device for enabling uniterm discovery from audio content and voice-to-voice searching of audio content stored on a device using discovered uniterms. A received audio/voice input signal is sent to a uniterm discovery and search (UDS) engine within the device. The audio data may be associated with other content that is also stored within the device. The UDS engine retrieves a number of uniterms from the audio data and associates the uniterms with the stored content. When a voice search is initiated at the device, the UDS engine generates a statistical latent lattice model from the voice query and scores the uniterms from the audio database against the latent lattice model. Following a further refinement, the best group of uniterms is determined, and segments of the stored audio data and/or other content corresponding to the best group of uniterms are output.
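
The query-time flow described in the abstract (build a model from the voice query, score the stored uniterms against it, and return the content linked to the best group) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation: the query's phoneme lattice is approximated by its N-best paths with posterior probabilities, the statistical latent lattice model is replaced by a simpler expected-occurrence score (the posterior mass of query paths that contain the uniterm), and `index`, `retrieve`, and the content ids are hypothetical.

```python
# Hypothetical index: uniterm (tuple of phonemes) -> ids of associated content.
def contains(path, term):
    """True if the phoneme tuple `term` occurs contiguously in `path`."""
    n = len(term)
    return any(tuple(path[i:i + n]) == term for i in range(len(path) - n + 1))


def query_scores(query_nbest, uniterms):
    """Posterior probability that each uniterm occurs in the spoken query,
    estimated from (phoneme path, posterior) pairs for the query lattice.
    A simplified stand-in for scoring against a latent lattice model."""
    return {u: sum(p for path, p in query_nbest if contains(path, u)) for u in uniterms}


def retrieve(query_nbest, index, top_k=2, min_score=0.0):
    """Return content ids linked to the best-scoring uniterms, best first."""
    scores = query_scores(query_nbest, index)
    best = sorted((u for u in scores if scores[u] > min_score),
                  key=lambda u: (-scores[u], u))[:top_k]
    results = []
    for u in best:
        for content_id in index[u]:
            if content_id not in results:  # de-duplicate, keep rank order
                results.append(content_id)
    return results


index = {
    ("h", "e", "l", "o"): ["memo_01.wav", "contact:Helen"],
    ("w", "er", "l", "d"): ["memo_02.wav"],
}
query = [(("h", "e", "l", "o", "w"), 0.6), (("h", "a", "l", "o"), 0.4)]
print(retrieve(query, index))  # ['memo_01.wav', 'contact:Helen']
```

Exact containment is deliberately strict in this sketch; the claims add coarse and fine search stages that would tolerate recognition errors.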

18 Claims

1. In an electronic device, a method comprising:
- generating, by the electronic device, one or more first phoneme lattices from audio data stored within an audio database;
  
  determining, by the electronic device, one or more best paths from the one or more first phoneme lattices;
  
  extracting, by the electronic device, one or more uniterms from the one or more first phoneme lattices; and
  
  storing, by the electronic device, the one or more uniterms and the one or more best paths in a uniterm index database;
  
  wherein extracting one or more uniterms comprises:
  
  generating, by the electronic device, a next latent statistical lattice model from the one or more phoneme lattices generated from the audio data;
  
  extracting, by the electronic device, phoneme strings with a length that is at least equal to a pre-set minimum length from the one or more phoneme lattices as candidates for the one or more uniterms;
  
  scoring, by the electronic device, the candidates for the one or more uniterms against the next latent statistical lattice model; and
  
  identifying, by the electronic device, a preset number of candidates with best scores as the one or more uniterms selected to represent the phoneme lattice.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method of claim 1 further comprising:
    - receiving, by the electronic device, a voice query for retrieval of stored content;
      
      generating, by the electronic device, a latent statistical lattice model from one or more second phoneme lattices generated from the voice query;
      
      scoring, by the electronic device, a plurality of uniterms against the latent statistical lattice model to determine a set of best scoring uniterms; and
      
      retrieving, by the electronic device, content associated with the set of best scoring uniterms as a response to the voice query.
  - 3. The method of claim 1 further comprising:
    - storing, by the electronic device, the one or more uniterms in a uniterms phoneme tree structure; and
      
      forwarding, by the electronic device, the uniterms phoneme tree structure and the one or more best paths to a coarse search function that scores the one or more uniterms of the uniterms phoneme tree structure against the statistical latent lattice model.
  - 4. The method of claim 2 wherein generating one or more first phoneme lattices further comprises forwarding the audio data from the audio database to a speech recognizer, which speech recognizer evaluates received audio and generates the one or more phoneme lattices from the received audio.
  - 5. The method of claim 2 wherein said generating further comprises:
    - forwarding, by the electronic device, the voice query to a speech recognizer, which speech recognizer evaluates received audio and generates one or more phoneme lattices from the received audio; and
      
      generating, by the electronic device, the phoneme lattice from the received audio;
      
      wherein the statistical latent lattice model represents an application of a series of statistical probabilities to the phoneme lattice.
  - 6. The method of claim 2 wherein said scoring further comprises:
    - performing, by the electronic device, a coarse search of the statistical latent lattice model with the plurality of uniterms and the one or more best paths to generate a plurality of coarse search candidates; and
      
      performing, by the electronic device, a fine search on the coarse search candidates, which fine search involves comparison of the coarse search candidates against the phoneme lattice generated from the voice query to generate a fine search output from among the coarse search candidates.
  - 7. The method of claim 6 wherein performing the coarse search further comprises:
    - retrieving, by the electronic device, a uniterm phoneme tree from a uniterm index database, wherein the uniterm phoneme tree is a tree that includes substantially all the uniterms discovered from the audio database;
      
      scoring, by the electronic device, the uniterms of the uniterm phoneme tree against the statistical latent lattice model, wherein a statistical probability of a match of the uniterms and branches of the uniterm phoneme tree to the latent lattice model is provided;
      
      evaluating, by the electronic device, a resulting score to determine which branches of the uniterm phoneme tree are the top branches, having one of a highest score relative to other branches and a score above a pre-set minimum score; and
      
      identifying, by the electronic device, the top branches as a result of the coarse search, representing coarse search candidates for utilization as inputs for performing the fine search.
  - 8. The method of claim 7 wherein performing the fine search further comprises:
    - matching, by the electronic device, the top branches resulting from the coarse search against the one or more second phoneme lattices of the voice query; and
      
      outputting, by the electronic device, a top set of audio segments resulting from the fine search as the response to the voice query.
  - 9. The method of claim 8 wherein outputting the top set of audio segments further comprises:
    - retrieving, by the electronic device, non-audio content associated with the top set of audio segments; and
      
      outputting, by the electronic device, the non-audio content as the response to the voice query.
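
The uniterm extraction steps of claim 1 (candidate phoneme strings of at least a preset minimum length, scored against a model built from the audio's phoneme lattices, with a preset number of best scorers kept) might look like the following sketch. All of these are hypothetical assumptions: each lattice is approximated by (phoneme tuple, posterior) N-best pairs, the "next latent statistical lattice model" is stood in for by a posterior-weighted phoneme bigram model with add-one smoothing, and the `max_len` cap on candidate length is added only to bound the search.

```python
import math
from collections import defaultdict


def build_model(lattices):
    """Posterior-weighted phoneme bigram counts pooled over all lattices.
    Hypothetical stand-in for the 'next latent statistical lattice model';
    each lattice is a list of (phoneme tuple, posterior) N-best pairs."""
    counts, context, vocab = defaultdict(float), defaultdict(float), set()
    for lattice in lattices:
        for phones, post in lattice:
            vocab.update(phones)
            for a, b in zip(phones, phones[1:]):
                counts[(a, b)] += post
                context[a] += post
    return counts, context, len(vocab)


def score(model, phones):
    """Average add-one-smoothed bigram log-probability of a phoneme string."""
    counts, context, v = model
    logps = [math.log((counts.get((a, b), 0.0) + 1.0) / (context.get(a, 0.0) + v))
             for a, b in zip(phones, phones[1:])]
    return sum(logps) / len(logps)


def discover_uniterms(lattices, min_len=3, max_len=6, top_n=5):
    """Extract phoneme strings of length >= min_len as candidates, score them
    against the pooled model, and keep the top_n as uniterms. Also returns the
    best (highest-posterior) path of each lattice for the index."""
    if min_len < 2:
        raise ValueError("bigram scoring needs candidates of length >= 2")
    model = build_model(lattices)
    candidates = set()
    for lattice in lattices:
        for phones, _ in lattice:
            for i in range(len(phones)):
                for j in range(i + min_len, min(i + max_len, len(phones)) + 1):
                    candidates.add(tuple(phones[i:j]))
    ranked = sorted(candidates, key=lambda c: (-score(model, c), c))
    best_paths = [max(lattice, key=lambda pair: pair[1])[0] for lattice in lattices]
    return ranked[:top_n], best_paths


lattices = [
    [(("h", "e", "l", "o", "w", "er", "l", "d"), 0.7),
     (("h", "a", "l", "o", "w", "er", "l", "d"), 0.3)],
    [(("s", "e", "h", "e", "l", "o"), 1.0)],
]
uniterms, best_paths = discover_uniterms(lattices, top_n=3)
print(uniterms[0])  # ('h', 'e', 'l', 'o')
```

Averaging the log-probability per bigram keeps longer candidates from being penalized merely for their length; other normalizations are equally plausible.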
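
Claim 5 describes the statistical latent lattice model as the application of a series of statistical probabilities to the phoneme lattice. One standard way to attach probabilities to a lattice is the forward-backward algorithm, which turns arc scores into arc posteriors and expected phoneme counts. The sketch below uses it purely as an illustration, since the claims do not say which computation is used; the lattice format (numbered nodes in topological order, arcs carrying log scores) is an assumption.

```python
import math
from collections import defaultdict


def log_add(a, b):
    """log(exp(a) + exp(b)), safe when either argument is -inf."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def arc_posteriors(num_nodes, arcs):
    """Forward-backward over a phoneme lattice.

    arcs: (src, dst, phoneme, log_score) with src < dst; node 0 is the start
    and node num_nodes - 1 the end, so node ids are a topological order.
    Returns per-arc posteriors and expected phoneme counts."""
    alpha = [-math.inf] * num_nodes
    beta = [-math.inf] * num_nodes
    alpha[0] = 0.0
    beta[num_nodes - 1] = 0.0
    # Forward pass: all arcs into a node are processed before arcs leaving it.
    for src, dst, _, w in sorted(arcs, key=lambda arc: arc[0]):
        alpha[dst] = log_add(alpha[dst], alpha[src] + w)
    # Backward pass: all arcs out of a node are processed before arcs into it.
    for src, dst, _, w in sorted(arcs, key=lambda arc: arc[1], reverse=True):
        beta[src] = log_add(beta[src], w + beta[dst])
    total = alpha[num_nodes - 1]  # log of the summed score of all full paths
    posteriors = [math.exp(alpha[s] + w + beta[d] - total) for s, d, _, w in arcs]
    expected = defaultdict(float)
    for (_, _, phone, _), post in zip(arcs, posteriors):
        expected[phone] += post
    return posteriors, dict(expected)


arcs = [
    (0, 1, "h", math.log(3.0)),
    (0, 1, "f", math.log(1.0)),
    (1, 2, "e", math.log(2.0)),
    (1, 2, "a", math.log(2.0)),
]
posteriors, expected = arc_posteriors(3, arcs)
print([round(p, 2) for p in posteriors])  # [0.75, 0.25, 0.5, 0.5]
```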
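
Claims 6 and 13 split scoring into a coarse search that shortlists candidates and a fine search that compares the shortlist against the query's phoneme lattice. A minimal sketch of that two-stage pattern follows. The coarse score (posterior-weighted bigram overlap) and the fine score (edit distance against the best-matching substring of the query's best path, rather than the full lattice) are swapped-in simplifications, and all function names are hypothetical.

```python
def bigrams(seq):
    """Set of adjacent phoneme pairs in a sequence."""
    return set(zip(seq, seq[1:]))


def substring_edit_distance(pattern, text):
    """Minimum edit distance between `pattern` and any contiguous substring
    of `text` (free start and end positions in `text`)."""
    prev = [0] * (len(text) + 1)  # empty pattern matches anywhere at cost 0
    for i, p in enumerate(pattern, 1):
        cur = [i] + [0] * len(text)
        for j, t in enumerate(text, 1):
            cur[j] = min(prev[j] + 1,             # drop a pattern phoneme
                         cur[j - 1] + 1,          # skip an extra text phoneme
                         prev[j - 1] + (p != t))  # match or substitute
        prev = cur
    return min(prev)


def coarse_search(query_nbest, uniterms, keep=3):
    """Cheap pass: posterior-weighted share of each uniterm's bigrams that
    also appear in the query's N-best paths; keep the `keep` best."""
    def score(u):
        ub = bigrams(u)
        return sum(p * len(ub & bigrams(path)) / len(ub) for path, p in query_nbest)
    return sorted(uniterms, key=lambda u: (-score(u), u))[:keep]


def fine_search(query_nbest, candidates, max_dist=1):
    """Costlier pass: approximate match of each candidate against the
    query's best path; keep candidates within max_dist edits, best first."""
    best_path = max(query_nbest, key=lambda pair: pair[1])[0]
    scored = sorted((substring_edit_distance(u, best_path), u) for u in candidates)
    return [u for dist, u in scored if dist <= max_dist]


uniterms = [("h", "e", "l", "o"), ("w", "er", "l", "d"), ("k", "a", "t"), ("d", "o", "g")]
query = [(("s", "e", "h", "e", "l", "o", "w"), 0.7), (("s", "e", "h", "a", "l", "o", "w"), 0.3)]
shortlist = coarse_search(query, uniterms)
print(fine_search(query, shortlist))  # [('h', 'e', 'l', 'o')]
```

The coarse pass only intersects small sets, so it can run over every uniterm; the quadratic edit-distance pass is reserved for the short list it produces.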
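
For the output step of claims 8 and 15, a matched uniterm has to be turned into playable audio segments. The sketch below assumes, beyond what the claims state, that each stored best path keeps a start and end time per phoneme. `Phone`, `find_segments`, and `top_audio_segments` are hypothetical names, and matching is exact for brevity.

```python
from typing import NamedTuple


class Phone(NamedTuple):
    """One phoneme of a stored best path with its time alignment (seconds)."""
    label: str
    start: float
    end: float


def find_segments(uniterm, best_paths):
    """(recording_id, start, end) for each exact occurrence of the uniterm
    in the time-aligned best paths."""
    n = len(uniterm)
    hits = []
    for rec_id, phones in best_paths.items():
        labels = tuple(p.label for p in phones)
        for i in range(len(labels) - n + 1):
            if labels[i:i + n] == tuple(uniterm):
                hits.append((rec_id, phones[i].start, phones[i + n - 1].end))
    return hits


def top_audio_segments(ranked_uniterms, best_paths, max_segments=3):
    """Collect segments for uniterms already ranked best-first by the fine
    search, de-duplicate them, and return the top set."""
    out = []
    for u in ranked_uniterms:
        for seg in find_segments(u, best_paths):
            if seg not in out:
                out.append(seg)
    return out[:max_segments]


best_paths = {
    "memo1": [Phone("h", 0.0, 0.1), Phone("e", 0.1, 0.2), Phone("l", 0.2, 0.3), Phone("o", 0.3, 0.5)],
    "memo2": [Phone("s", 0.0, 0.1), Phone("h", 0.1, 0.2), Phone("e", 0.2, 0.3),
              Phone("l", 0.3, 0.4), Phone("o", 0.4, 0.6)],
}
print(top_audio_segments([("h", "e", "l", "o")], best_paths))
# [('memo1', 0.0, 0.5), ('memo2', 0.1, 0.6)]
```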

10. A device comprising:
- a processor;
  
  an audio input device for receiving audio data including voice input data and voice queries;
  
  a storage mechanism for storing content including the audio data; and
  
  a uniterm discovery and search (UDS) engine executing on the processor and having functional components for completing the following functions:
  
  generating one or more first phoneme lattices from audio data stored within an audio database;
  
  determining one or more best paths from the one or more first phoneme lattices;
  
  extracting one or more uniterms from the one or more first phoneme lattices; and
  
  storing the one or more uniterms and the one or more best paths in a uniterm index database;
  
  wherein the functional component for extracting one or more uniterms further performs the functions of:
  
  generating a next latent statistical lattice model from the one or more phoneme lattices generated from the audio data;
  
  extracting phoneme strings with a length that is at least equal to a pre-set minimum length from the one or more phoneme lattices as candidates for the one or more uniterms;
  
  scoring the candidates for the one or more uniterms against the next latent statistical lattice model;
  
  identifying a preset number of candidates with best scores as the one or more uniterms selected to represent the phoneme lattice;
  
  storing the one or more uniterms in a uniterms phoneme tree structure; and
  
  forwarding the uniterms phoneme tree structure and the one or more best paths to a coarse search function that scores the one or more uniterms of the uniterms phoneme tree structure against the statistical latent lattice model.
- Dependent claims (11, 12, 13, 14, 15, 16, 17, 18)
  - 11. The device of claim 10 said UDS engine further comprising functional components for performing the functions of:
    - receiving a voice query for retrieval of stored content;
      
      generating a latent statistical lattice model from one or more second phoneme lattices generated from the voice query;
      
      scoring a plurality of uniterms against the latent statistical lattice model to determine a set of best scoring uniterms; and
      
      retrieving content associated with the set of best scoring uniterms as a response to the voice query.
  - 12. The device of claim 11 wherein said functional component for generating further performs the functions of:
    - forwarding the voice query to a speech recognizer, which speech recognizer evaluates received audio and generates one or more phoneme lattices from the received audio; and
      
      generating the phoneme lattice from the received audio;
      
      wherein the statistical latent lattice model represents an application of a series of statistical probabilities to the phoneme lattice.
  - 13. The device of claim 11 wherein said functional component for scoring further performs the functions of:
    - performing a coarse search of the statistical latent model with the plurality of uniterms and the one or more best paths to generate a plurality of coarse search candidates; and
      
      performing a fine search on the coarse search candidates, which fine search involves comparison of the coarse search candidates against the phoneme lattice generated from the voice query to generate a fine search output from among the coarse search candidates.
  - 14. The device of claim 13 wherein the functional component for performing the coarse search further performs the functions of:
    - retrieving a uniterm phoneme tree from a uniterm index database, wherein the uniterm phoneme tree is a tree that includes substantially all the uniterms discovered from the audio database;
      
      scoring the uniterms of the uniterm phoneme tree against the statistical latent lattice model, wherein a statistical probability of a match of the uniterms and branches of the uniterm phoneme tree to the latent lattice model is provided;
      
      evaluating a resulting score to determine which branches of the uniterm phoneme tree are the top branches, having one of a highest score relative to other branches and a score above a pre-set minimum score; and
      
      identifying the top branches as a result of the coarse search, representing coarse search candidates for utilization as inputs for performing the fine search.
  - 15. The device of claim 14 wherein the functional component for performing the fine search further performs the functions of:
    - matching the top branches resulting from the coarse search against the one or more second phoneme lattices of the voice query; and
      
      outputting a top set of audio segments resulting from the fine search as the response to the voice query.
  - 16. The device of claim 15 wherein the functional component for outputting the top set of audio segments further performs the functions of:
    - retrieving non-audio content associated with the top set of audio segments; and
      
      outputting the non-audio content as the response to the voice query.
  - 17. The device of claim 10 wherein the functional component for generating one or more first phoneme lattices further performs the function of forwarding the audio data from the audio database to a speech recognizer, which speech recognizer evaluates received audio and generates the one or more phoneme lattices from the received audio.
  - 18. The device of claim 10 wherein the device is a mobile communication device.
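
Claim 10 stores the uniterms in a uniterms phoneme tree structure and passes it to a coarse search, and claim 14 keeps the branches whose scores are highest and above a preset minimum. A prefix tree (trie) is a natural reading of that structure, because uniterms sharing a prefix are scored once along the shared branch. The sketch below is illustrative only: the nested-dict trie, the `END` marker, and the add-one-smoothed query model are assumptions, not details taken from the patent.

```python
import math
from collections import defaultdict

END = "$"  # hypothetical marker key: a uniterm ends at this node


def build_uniterm_tree(uniterms):
    """Phoneme prefix tree (trie): uniterms sharing a prefix share nodes."""
    root = {}
    for u in uniterms:
        node = root
        for ph in u:
            node = node.setdefault(ph, {})
        node[END] = u
    return root


def query_model(query_nbest):
    """Add-one-smoothed unigram/bigram log-probabilities from the query's
    (phoneme path, posterior) pairs; a stand-in for the latent lattice model."""
    uni, big, ctx, vocab = defaultdict(float), defaultdict(float), defaultdict(float), set()
    for path, p in query_nbest:
        vocab.update(path)
        for ph in path:
            uni[ph] += p
        for a, b in zip(path, path[1:]):
            big[(a, b)] += p
            ctx[a] += p
    v = len(vocab) + 1  # one extra slot for phonemes unseen in the query
    total = sum(uni.values())

    def logp(prev, ph):
        if prev is None:
            return math.log((uni.get(ph, 0.0) + 1.0) / (total + v))
        return math.log((big.get((prev, ph), 0.0) + 1.0) / (ctx.get(prev, 0.0) + v))
    return logp


def coarse_tree_search(root, logp, min_score=-6.0, top_k=3):
    """Depth-first scoring of tree branches. Log-probabilities are <= 0, so a
    prefix below min_score cannot recover and its whole subtree is pruned."""
    hits, stack = [], [(root, None, 0.0)]
    while stack:
        node, prev, score = stack.pop()
        for ph, child in node.items():
            if ph == END:
                hits.append((score, child))
            else:
                s = score + logp(prev, ph)
                if s >= min_score:
                    stack.append((child, ph, s))
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [u for _, u in hits[:top_k]]


tree = build_uniterm_tree([("h", "e", "l", "o"), ("h", "e", "l", "p"), ("k", "a", "t")])
logp = query_model([(("h", "e", "l", "o"), 1.0)])
print(coarse_tree_search(tree, logp, min_score=-5.0))  # [('h', 'e', 'l', 'o')]
```

Because every added log-probability is at most zero, a branch score can only fall as the search descends, which is what makes pruning at `min_score` safe.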

Current Assignee: Google Technology Holdings LLC (Alphabet Inc.)
Original Assignee: Motorola Mobility, Inc. (Lenovo Group Ltd.)
Inventors: Ma, Changxue
Primary Examiner(s): Vo, Huyen X.
Application Number: US11/962,866
Publication Number: US 20090164218A1
Time in Patent Office: 1,362 Days
Field of Search: 704/253, 704/254, 704/249, 704/255, 704/231, 704/235, 704/242, 704/243, 704/245
US Class Current: 704/254
CPC Class Codes:
G06F 16/632   Query formulation
G06F 16/685   using automatically derived...
G10L 15/02   Feature extraction for spee...
