Method and apparatus for performing speech keyword retrieval

US 9,355,637 B2
Filed: 02/11/2015
Issued: 05/31/2016
Est. Priority Date: 08/19/2013
Status: Active Grant

Abstract

A method and an apparatus are provided for retrieving a keyword from speech. The apparatus configures at least two types of language models in a model file, where each type of language model includes a recognition model and a corresponding decoding model. The apparatus extracts a speech feature from the to-be-processed speech data; performs language matching on the extracted speech feature by using the recognition models in the model file one by one, and determines a recognition model based on a language matching rate; determines the decoding model corresponding to that recognition model; decodes the extracted speech feature by using the determined decoding model, and obtains a word recognition result after the decoding; and matches a keyword in a keyword dictionary against the word recognition result, and outputs a matched keyword.
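
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `LanguageModel` class, the `retrieve_keywords` function, and the callables standing in for recognition and decoding models are all hypothetical names introduced here, and feature extraction is assumed to have already produced a list of feature vectors.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set

Feature = Sequence[float]  # one feature vector (assumed representation)


@dataclass
class LanguageModel:
    """One language model: a recognition model paired with its decoding model."""
    name: str
    # Hypothetical recognition model: returns a language matching rate.
    recognize: Callable[[List[Feature]], float]
    # Hypothetical decoding model: returns the recognized words in order.
    decode: Callable[[List[Feature]], List[str]]


def retrieve_keywords(
    features: List[Feature],
    models: List[LanguageModel],
    keyword_dictionary: Set[str],
) -> List[str]:
    # Apply every recognition model and keep the one with the highest matching rate.
    best = max(models, key=lambda m: m.recognize(features))
    # Decode with the decoding model paired with the selected recognition model.
    words = best.decode(features)
    # Output the recognized words that also appear in the keyword dictionary.
    return [w for w in words if w in keyword_dictionary]
```

Pairing the two models in one object keeps the lookup of the "corresponding decoding model" trivial once the best recognition model is known; a real system would load both from the model file rather than pass callables.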

15 Claims

1. A method for retrieving keyword from speech, comprising:
- configuring, by an apparatus comprising a processor circuitry, language models, wherein at least two of the language models each comprises a recognition model and a decoding model that corresponds to the recognition model;
  
  extracting a speech feature, by the apparatus, from to-be-processed speech data;
  
  determining, by the apparatus, which of the recognition models has a highest language matching rate by performing, using the recognition models, language matching on the extracted speech feature;
  
  identifying, by the apparatus, the decoding model which corresponds to the determined recognition model;
  
  decoding, by the apparatus, the extracted speech feature by using the identified decoding model, and obtaining a word recognition result; and
  
  matching, by the apparatus, a keyword in a keyword dictionary and the word recognition result with each other, and outputting a matched keyword on a display of the apparatus.
- - 2. The method according to claim 1, further comprising:
    - training and creating a new recognition model and a new decoding model, both the new recognition model and the new decoding model corresponding to a new dialect; and
      
      adding a new language model into a model file, wherein the new language model comprises the new recognition model and the new decoding model.
  - 3. The method according to claim 1, wherein extracting the speech feature from the to-be-processed speech data comprises:
    - performing speech waveform processing on the to-be-processed speech data, and obtaining a speech feature sequence by extracting the speech feature sequence from a speech waveform.
  - 4. The method according to claim 1, wherein decoding the extracted speech feature by using the identified decoding model comprises:
    - searching for an optimal matching path for the extracted speech feature by using the identified decoding model, and obtaining a word net as the word recognition result, wherein the word net comprises a start node, an end node, and an intermediate node between the start node and the end node, and each node represents a word corresponding to a period of time.
  - 5. The method according to claim 4, wherein matching the keyword in the keyword dictionary and the word recognition result with each other comprises:
    - performing a minimum error alignment operation on the word net, and generating a confusion network, wherein the confusion network performs sequencing according to time, and gives a word recognition result and a probability of the word recognition result during a period of time; and
      
      matching a keyword in the keyword dictionary and the word recognition result in the confusion network with each other, and determining a matched word recognition result as the matched keyword.
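
The keyword matching step of claim 5 can be sketched as follows, assuming the confusion network has already been built. The `Slot` structure, the `match_keywords` function, and the `min_prob` threshold are assumptions made for illustration; the claim only states that the network is ordered by time and gives word recognition results with probabilities for each period. The minimum error alignment that produces the network is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Slot:
    """One time period of a confusion network with its competing words."""
    start: float                  # start time in seconds (assumed unit)
    end: float                    # end time in seconds
    candidates: Dict[str, float]  # word -> probability within this period


def match_keywords(
    network: List[Slot],
    keywords: Set[str],
    min_prob: float = 0.0,  # hypothetical threshold, not part of the claim
) -> List[Tuple[str, float, float, float]]:
    hits = []
    # Visit the slots in time order, as the confusion network is sequenced by time.
    for slot in sorted(network, key=lambda s: s.start):
        for word, prob in slot.candidates.items():
            # A candidate counts as a matched keyword if it is in the dictionary.
            if word in keywords and prob >= min_prob:
                hits.append((word, slot.start, slot.end, prob))
    return hits
```

Because every slot keeps its lower-probability alternatives, a keyword can still be found when it was not the top hypothesis; the threshold is one possible way to trade recall against false alarms.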

6. An apparatus for performing speech keyword retrieval, comprising a processor and a non-transitory storage medium accessible to the processor, the non-transitory storage medium configured to store units comprising:
- a file configuring unit that configures language models, and at least two of the language models each comprises a recognition model and a decoding model that corresponds to the recognition model;
  
  a feature extracting unit that extracts a speech feature from to-be-processed speech data, and sends the extracted speech feature to a language recognition unit;
  
  the language recognition unit that determines which of the recognition models has a highest language matching rate by performing, using the recognition models, language matching on the extracted speech feature;
  
  identifies the decoding model that corresponds to the determined recognition model, and sends the extracted speech feature to a decoding unit;
  
  the decoding unit that decodes the extracted speech feature by using the identified decoding model, obtains a word recognition result, and sends the word recognition result to a keyword search unit; and
  
  the keyword search unit that matches a keyword in a keyword dictionary and the word recognition result with each other, and outputs a matched keyword on a display of the apparatus.
- - 7. The apparatus according to claim 6, further comprising a language extension unit that trains and creates a new recognition model and a new decoding model, and adds a language model into a model file, wherein the language model comprises the new recognition model and the new decoding model, wherein both the new recognition model and the new decoding model correspond to a new dialect.
  - 8. The apparatus according to claim 6, wherein the feature extracting unit comprises a feature extracting module that performs speech waveform processing on the to-be-processed speech data, and obtains a speech feature sequence by extracting the speech feature sequence from a speech waveform.
  - 9. The apparatus according to claim 6, wherein the decoding unit comprises a path search module that searches for an optimal matching path for the speech feature, and obtains a word net as the word recognition result, wherein the word net comprises a start node, an end node, and an intermediate node between the start node and the end node, and each node represents a word corresponding to a period of time.
  - 10. The apparatus according to claim 9, wherein the keyword search unit comprises a confusion network generating module and a keyword matching module;
    - the confusion network generating module performs a minimum error alignment operation on the word net of the optimal matching path, and generates a confusion network, wherein the confusion network performs sequencing according to time, and gives a word recognition result and a probability of the word recognition result during a period of time; and
      
      the keyword matching module matches a keyword in the keyword dictionary and the word recognition result in the confusion network with each other, and determines a matched word recognition result as a matched keyword.
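
The language extension of claim 7 amounts to adding a new recognition model and decoding model pair to the model file. A minimal sketch, assuming the model file can be represented as an in-memory mapping keyed by dialect name, might look like this. `ModelFile` and its methods are hypothetical names, and the training of the new models is not shown.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class ModelFile:
    """In-memory stand-in for the model file (assumed representation)."""
    # dialect name -> (recognition model, decoding model)
    models: Dict[str, Tuple[Any, Any]] = field(default_factory=dict)

    def add_language_model(
        self, dialect: str, recognition_model: Any, decoding_model: Any
    ) -> None:
        # Refuse to silently overwrite an existing dialect entry.
        if dialect in self.models:
            raise ValueError(f"language model for {dialect!r} already exists")
        # Store both models together so they stay paired.
        self.models[dialect] = (recognition_model, decoding_model)

    def decoding_model_for(self, dialect: str) -> Any:
        # Return the decoding model paired with the dialect's recognition model.
        return self.models[dialect][1]
```

Storing the pair under one key means a newly added dialect becomes available to language matching and decoding without changes to the rest of the flow.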

11. A phone for performing speech keyword retrieval, comprising a processor and a non-transitory storage medium accessible to the processor, the phone configured to perform acts comprising:
- configuring language models, and at least two of the language models each comprises a recognition model and a decoding model that corresponds to the recognition model;
  
  extracting a speech feature from to-be-processed speech data, and sending the extracted speech feature;
  
  determining which of the recognition models has a highest language matching rate by performing, using the recognition models, language matching on the extracted speech feature;
  
  identifying the decoding model that corresponds to the determined recognition model, and sending the extracted speech feature;
  
  decoding the extracted speech feature by using the identified decoding model, obtaining a word recognition result, and sending the word recognition result; and
  
  matching a keyword in a keyword dictionary and the word recognition result with each other, and outputting a matched keyword on a display of the phone.
- - 12. The phone according to claim 11, the acts further comprising training and creating a new recognition model and a new decoding model, and adding a language model into a model file, wherein the language model comprises the new recognition model and the new decoding model, wherein both the new recognition model and the new decoding model correspond to a new dialect.
  - 13. The phone according to claim 12, the acts further comprising performing speech waveform processing on the to-be-processed speech data, and obtaining a speech feature sequence by extracting the speech feature sequence from a speech waveform.
  - 14. The phone according to claim 11, wherein decoding the extracted speech feature comprises:
    - searching for an optimal matching path for the speech feature, and obtaining a word net as the word recognition result, wherein the word net comprises a start node, an end node, and an intermediate node between the start node and the end node, and each node represents a word corresponding to a period of time.
  - 15. The phone according to claim 14, wherein the acts further comprise:
    - performing a minimum error alignment operation on the word net of the optimal matching path, and generating a confusion network, wherein the confusion network performs sequencing according to time, and gives a word recognition result and a probability of the word recognition result during a period of time; and
      
      matching a keyword in the keyword dictionary and the word recognition result in the confusion network with each other, and determining a matched word recognition result as the matched keyword.
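
The word net of claims 4, 9, and 14 can be pictured as a directed graph whose nodes each carry a word and a time period. The sketch below shows one possible representation and a simple best-path search over it. The `Node` fields, the per-node `score` (treated here as a log-score, higher is better), and the `best_path` function are assumptions for illustration; the claims do not specify how paths are scored. The search assumes intermediate nodes have positive duration so that sorting by time visits predecessors first, and it raises `KeyError` if the end node is unreachable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A word net node: one word hypothesis over a period of time."""
    word: str
    start: float
    end: float
    score: float = 0.0  # hypothetical log-score of this hypothesis
    successors: List[str] = field(default_factory=list)  # ids of following nodes


def best_path(nodes: Dict[str, Node], start_id: str, end_id: str) -> List[str]:
    # Process nodes in time order so each node is final before its successors.
    order = sorted(nodes, key=lambda n: (nodes[n].start, nodes[n].end))
    # best[node id] = (total score of the best path so far, that path's node ids)
    best: Dict[str, Tuple[float, List[str]]] = {
        start_id: (nodes[start_id].score, [start_id])
    }
    for node_id in order:
        if node_id not in best:
            continue  # not reachable from the start node
        total, path = best[node_id]
        for nxt in nodes[node_id].successors:
            candidate = total + nodes[nxt].score
            if nxt not in best or candidate > best[nxt][0]:
                best[nxt] = (candidate, path + [nxt])
    # Return the words along the highest-scoring path from start to end.
    return [nodes[n].word for n in best[end_id][1]]
```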

Current Assignee: Tencent Technology Shenzhen Company Limited (Tencent Holdings Limited)
Original Assignee: Tencent Technology Shenzhen Company Limited (Tencent Holdings Limited)
Inventors: Ma, Jianxiong; Li, Lu; Lu, Li; Zhang, Xiang; Yue, Shuai; Rao, Feng; Wang, Eryu; Kong, Linghui
Primary Examiner(s): YEN, ERIC L
Application Number: US14/620,000
Publication Number: US 20150154955A1
Time in Patent Office: 475 Days

Field of Search
US Class Current: 1/1
CPC Class Codes:
G10L 15/08   Speech classification or se...
G10L 15/18   using natural language mode...
G10L 15/28   Constructional details of s...
G10L 15/32   Multiple recognisers used i...
G10L 2015/088   Word spotting
