Method and system for automatic speech recognition

US 9,472,190 B2
Filed: 04/28/2014
Issued: 10/18/2016
Est. Priority Date: 01/30/2013
Status: Active Grant



Abstract

A method of recognizing speech is provided that includes generating a decoding network that includes a primary sub-network and a classification sub-network. The primary sub-network includes a classification node corresponding to the classification sub-network. The classification sub-network corresponds to a group of uncommon words. A speech input is received and decoded by instantiating a token in the primary sub-network and passing the token through the primary network. When the token reaches the classification node, the method includes transferring the token to the classification sub-network and passing the token through the classification sub-network. When the token reaches an accept node of the classification sub-network, the method includes returning a result of the token passing through the classification sub-network to the primary sub-network. The result includes one or more words in the group of uncommon words. A string corresponding to the speech input is output that includes the one or more words.
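The token flow described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the patented implementation: the `Network` and `Token` types, the `#names` marker for a classification node, and the phone symbols are all assumptions made for the example, and no acoustic or language-model scores are modeled.

```python
from dataclasses import dataclass


@dataclass
class Network:
    # arcs[state] -> list of (label, output_word, next_state).
    # A label starting with "#" marks a classification node (hypothetical
    # convention) and names the sub-network the token is transferred to.
    arcs: dict
    start: int
    accept: set


@dataclass(frozen=True)
class Token:
    state: int
    pos: int            # number of input phones consumed so far
    words: tuple = ()   # words emitted along the token's path


def pass_token(net, token, phones, subnets):
    """Yield every token that reaches an accept node of `net`."""
    if token.state in net.accept:
        yield token
    for label, out, nxt in net.arcs.get(token.state, []):
        if label.startswith("#"):
            # Classification node: transfer the token to the sub-network.
            sub = subnets[label[1:]]
            entry = Token(sub.start, token.pos)
            for done in pass_token(sub, entry, phones, subnets):
                # Accept node reached: return the result to the primary network.
                resumed = Token(nxt, done.pos, token.words + done.words)
                yield from pass_token(net, resumed, phones, subnets)
        elif token.pos < len(phones) and phones[token.pos] == label:
            words = token.words + ((out,) if out else ())
            moved = Token(nxt, token.pos + 1, words)
            yield from pass_token(net, moved, phones, subnets)


def decode(primary, subnets, phones):
    """Return the first word string that consumes all phones, else None."""
    for tok in pass_token(primary, Token(primary.start, 0), phones, subnets):
        if tok.pos == len(phones):
            return " ".join(tok.words)
    return None


# Toy primary network: "call <name>", where <name> is a classification node.
primary = Network(
    arcs={0: [("k", "", 1)], 1: [("ao", "", 2)], 2: [("l", "call", 3)],
          3: [("#names", "", 4)]},
    start=0, accept={4},
)
# Toy classification sub-network holding a group of uncommon words (names).
names = Network(
    arcs={0: [("zh", "", 1), ("l", "", 3)], 1: [("ao", "Zhao", 2)],
          3: [("i", "Li", 2)]},
    start=0, accept={2},
)
subnets = {"names": names}

print(decode(primary, subnets, ["k", "ao", "l", "zh", "ao"]))
```

Keeping the names in a separate `Network` means the group of uncommon words can be swapped or extended without rebuilding the primary network, which is the separation the abstract describes.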

25 Citations


15 Claims

1. A method of recognizing speech, comprising:
- generating a decoding network for decoding speech input, the decoding network comprising a primary sub-network and one or more classification sub-networks, wherein:
  - the primary sub-network includes a plurality of classification nodes, each classification node corresponding to a respective classification sub-network of the one or more classification sub-networks, wherein each respective classification sub-network is distinct from the primary sub-network; and
  - each classification sub-network of the one or more classification sub-networks corresponds to a group of uncommon words;
- receiving a speech input; and
- decoding the speech input by:
  - instantiating a token corresponding to the speech input in the primary sub-network;
  - passing the token through the primary sub-network;
  - when the token reaches a respective classification node of the plurality of classification nodes, transferring the token to the corresponding classification sub-network;
  - passing the token through the corresponding classification sub-network;
  - when the token reaches an accept node of the classification sub-network, returning a result of the token passing through the classification sub-network to the primary sub-network, wherein the result includes one or more words in the group of uncommon words corresponding to the classification sub-network;
- outputting a string corresponding to the speech input that includes the one or more words.

2. The method of claim 1, wherein the returned result is a respective result in a plurality of possible token-passing results through the classification sub-network, the returned result having a higher rollback probability than any other result in the plurality of possible token passing results through the classification sub-network.
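As a rough illustration of the selection in claim 2, the sketch below picks one result out of several candidate token-passing results. The listing does not define how the "rollback probability" is computed, so the example simply assumes each candidate already carries a log-probability score; the `SubResult` type and the candidate values are hypothetical.

```python
import math
from typing import NamedTuple, Optional, Sequence


class SubResult(NamedTuple):
    words: tuple      # words recognized inside the classification sub-network
    log_prob: float   # assumed score standing in for the rollback probability


def best_result(results: Sequence[SubResult]) -> Optional[SubResult]:
    """Return the candidate with the highest score, or None if there are none."""
    if not results:
        return None
    return max(results, key=lambda r: r.log_prob)


# Hypothetical candidates produced by passing a token through a names sub-network.
candidates = [
    SubResult(("Zhou",), math.log(0.3)),
    SubResult(("Zhao",), math.log(0.6)),
    SubResult(("Joe",), math.log(0.1)),
]
print(best_result(candidates).words)
```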

3. The method of claim 1, wherein:
- transferring the token to the corresponding classification sub-network further includes preserving one or more phones obtained prior to the token reaching the classification node as a starting index for the classification sub-network; and
- returning the result of the token passing through the classification sub-network to the primary sub-network includes preserving one or more phones obtained prior to the token reaching the accept node of the classification sub-network as a returning index for the primary decoding sub-network.
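One possible reading of claim 3 is that the phones seen just before a network boundary act as a lookup key for where the token continues on the other side. The sketch below assumes that reading; the entry tables, their string values, and the `keep` parameter are hypothetical and not taken from the patent.

```python
def boundary_index(phones, keep=1):
    """Return the last `keep` phones (keep >= 1) as a hashable index."""
    return tuple(phones[-keep:])


# Hypothetical entry points of a names sub-network, keyed by the phone(s)
# preserved from the primary network (the "starting index").
NAME_ENTRIES = {("l",): "names/after-l", ("t",): "names/after-t"}

# Hypothetical re-entry points of the primary network, keyed by the phone(s)
# preserved from the sub-network path (the "returning index").
PRIMARY_REENTRIES = {("ao",): "primary/after-ao", ("i",): "primary/after-i"}


def transfer(history, table, keep=1):
    """Preserve the boundary phones and look up where the token continues."""
    index = boundary_index(history, keep)
    return index, table.get(index)


# Entering the sub-network after the phones of "call" (k ao l).
print(transfer(["k", "ao", "l"], NAME_ENTRIES))
# Returning to the primary network after the phones of "Zhao" (zh ao).
print(transfer(["zh", "ao"], PRIMARY_REENTRIES))
```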

4. The method of claim 1, wherein the decoding network is a weighted finite state transducer.
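For readers unfamiliar with the term in claim 4, a weighted finite state transducer maps an input symbol sequence to an output sequence while accumulating a weight along the path. The sketch below is a minimal, deterministic toy with weights treated as costs that add up along a path; the `Arc` type and the example arcs are assumptions, and it does not represent any particular toolkit's API.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    ilabel: str       # input symbol (here, a phone)
    olabel: str       # output symbol (a word), "" for no output
    weight: float     # cost of taking the arc, e.g. a negative log probability
    nextstate: int


def transduce(arcs, start, finals, inputs):
    """Follow the first matching arc per input symbol.

    Returns (outputs, total_weight), or None if the input is not accepted.
    """
    state, outputs, total = start, [], 0.0
    for sym in inputs:
        match = next((a for a in arcs.get(state, []) if a.ilabel == sym), None)
        if match is None:
            return None
        if match.olabel:
            outputs.append(match.olabel)
        total += match.weight
        state = match.nextstate
    return (outputs, total) if state in finals else None


# Toy transducer mapping the phones k, ao, l to the word "call".
ARCS = {
    0: [Arc("k", "", 0.5, 1)],
    1: [Arc("ao", "", 0.25, 2)],
    2: [Arc("l", "call", 0.25, 3)],
}
print(transduce(ARCS, 0, {3}, ["k", "ao", "l"]))
```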

5. The method of claim 1, wherein the one or more classification sub-networks include a medical terminology sub-network, a personal names sub-network, a place names sub-network, and a computer terminology sub-network.

6. An electronic device, comprising:
- one or more processors;
- memory; and
- one or more programs, wherein the one or more programs are stored in memory and configured to be executed by the one or more processors, the one or more programs including an operating system and instructions that when executed by the one or more processors cause the electronic device to:
  - generate a decoding network for decoding speech input, the decoding network comprising a primary sub-network and one or more classification sub-networks, wherein:
    - the primary sub-network includes a plurality of classification nodes, each classification node corresponding to a respective classification sub-network of the one or more classification sub-networks, wherein each respective classification sub-network is distinct from the primary sub-network; and
    - each classification sub-network of the one or more classification sub-networks corresponds to a group of uncommon words;
  - receive a speech input; and
  - decode the speech input by:
    - instantiating a token corresponding to the speech input in the primary sub-network;
    - passing the token through the primary sub-network;
    - when the token reaches a respective classification node of the plurality of classification nodes, transferring the token to the corresponding classification sub-network;
    - passing the token through the corresponding classification sub-network;
    - when the token reaches an accept node of the classification sub-network, returning a result of the token passing through the classification sub-network to the primary sub-network, wherein the result includes one or more words in the group of uncommon words corresponding to the classification sub-networks;
  - output a string corresponding to the speech input that includes the one or more words.

7. The electronic device of claim 6, wherein the returned result is a respective result in a plurality of possible token-passing results through the classification sub-network, the returned result having a higher rollback probability than any other result in the plurality of possible token passing results through the classification sub-network.

8. The electronic device of claim 6, wherein:
- transferring the token to the corresponding classification sub-network further includes preserving one or more phones obtained prior to the token reaching the classification node as a starting index for the classification sub-network; and
- returning the result of the token passing through the classification sub-network to the primary sub-network includes preserving one or more phones obtained prior to the token reaching the accept node of the classification sub-network as a returning index for the primary decoding sub-network.

9. The electronic device of claim 6, wherein the decoding network is a weighted finite state transducer.

10. The electronic device of claim 6, wherein the one or more classification sub-networks include a medical terminology sub-network, a personal names sub-network, a place names sub-network, and a computer terminology sub-network.

11. A non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by an electronic device with one or more processors and memory, cause the electronic device to:
- generate a decoding network for decoding speech input, the decoding network comprising a primary sub-network and one or more classification sub-networks, wherein:
  - the primary sub-network includes a plurality of classification nodes, each classification node corresponding to a respective classification sub-network of the one or more classification sub-networks, wherein each respective classification sub-network is distinct from the primary sub-network; and
  - each classification sub-network of the one or more classification sub-networks corresponds to a group of uncommon words;
- receive a speech input; and
- decode the speech input by:
  - instantiating a token corresponding to the speech input in the primary sub-network;
  - passing the token through the primary sub-network;
  - when the token reaches a respective classification node of the plurality of classification nodes, transferring the token to the corresponding classification sub-network;
  - passing the token through the corresponding classification sub-network;
  - when the token reaches an accept node of the classification sub-network, returning a result of the token passing through the classification sub-network to the primary sub-network, wherein the result includes one or more words in the group of uncommon words corresponding to the classification sub-network;
- output a string corresponding to the speech input that includes the one or more words.

12. The non-transitory computer readable storage medium of claim 11, wherein the returned result is a respective result in a plurality of possible token-passing results through the classification sub-network, the returned result having a higher rollback probability than any other result in the plurality of possible token passing results through the classification sub-network.

13. The non-transitory computer readable storage medium of claim 11, wherein:
- transferring the token to the corresponding classification sub-network further includes preserving one or more phones obtained prior to the token reaching the classification node as a starting index for the classification sub-network; and
- returning the result of the token passing through the classification sub-network to the primary sub-network includes preserving one or more phones obtained prior to the token reaching the accept node of the classification sub-network as a returning index for the primary decoding sub-network.

14. The non-transitory computer readable storage medium of claim 11, wherein the decoding network is a weighted finite state transducer.

15. The non-transitory computer readable storage medium of claim 11, wherein the one or more classification sub-networks include a medical terminology sub-network, a personal names sub-network, a place names sub-network, and a computer terminology sub-network.


Current Assignee: Tencent Technology Company Limited (Tencent Holdings Limited)
Original Assignee: Tencent Technology Company Limited (Tencent Holdings Limited)
Inventors: Yue, Shuai; Lu, Li; Zhang, Xiang; Xie, Dadong; Chen, Bo; Rao, Feng
Primary Examiner(s): Baker, Matthew
Application Number: US14/263,958
Publication Number: US 20140236591A1
Time in Patent Office: 904 Days
Field of Search
US Class Current: 1/1
CPC Class Codes:
G10L 15/083 Recognition networks G10L15...
G10L 15/193 Formal grammars, e.g. finit...
