Speech recognition method and system based on user personalized information

US 9,564,127 B2
Filed: 12/20/2013
Issued: 02/07/2017
Est. Priority Date: 12/28/2012
Status: Active Grant

Abstract

The present invention relates to a speech recognition method and system based on user personalized information. The method comprises the following steps: receiving a speech signal; decoding the speech signal according to a basic static decoding network to obtain a decoding path on each active node in the basic static decoding network, wherein the basic static decoding network is a decoding network associated with a basic name language model; if a decoding path enters a name node in the basic static decoding network, network extending is carried out on the name node according to an affiliated static decoding network of a user, wherein the affiliated static decoding network is a decoding network associated with a name language model of a particular user; and returning a recognition result after the decoding is completed. The recognition accuracy rate of user personalized information in continuous speech recognition may be raised by using the present invention.
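The splicing step described above can be pictured with a toy, word-level sketch. It assumes a hypothetical `$NAME` placeholder marking the name node, hand-written log-probabilities, and an already-transcribed word sequence standing in for acoustic scoring; none of these names or values come from the patent, and a real decoder would operate on acoustic units rather than words.

```python
from typing import Dict, List, Optional, Tuple

NAME_SLOT = "$NAME"  # hypothetical placeholder word marking a name node

# Hypothetical basic network: word paths with illustrative log-probabilities.
BASE_PATHS: List[Tuple[List[str], float]] = [
    (["call", NAME_SLOT], -1.0),
    (["text", NAME_SLOT, "now"], -1.5),
]

# Hypothetical affiliated networks, one per user: name -> log-probability.
USER_NAME_NETWORKS: Dict[str, Dict[str, float]] = {
    "user-a": {"alice": -0.7, "bob": -0.7},
    "user-b": {"carol": -0.1},
}


def expand_paths(user_id: str) -> List[Tuple[List[str], float]]:
    """Replace every name slot with the names from the user's own network."""
    names = USER_NAME_NETWORKS.get(user_id, {})
    expanded: List[Tuple[List[str], float]] = []
    for words, score in BASE_PATHS:
        partial: List[Tuple[List[str], float]] = [([], score)]
        for word in words:
            if word == NAME_SLOT:
                # The path has reached a name node: branch once per user name.
                partial = [
                    (prefix + [name], s + name_score)
                    for prefix, s in partial
                    for name, name_score in names.items()
                ]
            else:
                partial = [(prefix + [word], s) for prefix, s in partial]
        expanded.extend(partial)
    return expanded


def recognize(observed: List[str], user_id: str) -> Optional[str]:
    """Return the best-scoring expanded path that matches the observed words."""
    matches = [(w, s) for w, s in expand_paths(user_id) if w == observed]
    if not matches:
        return None
    best_words, _ = max(matches, key=lambda item: item[1])
    return " ".join(best_words)
```

In this sketch the same basic network serves every user, and only the small per-user name table changes, which mirrors the motivation for keeping the two networks separate.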

16 Claims

1. A method, comprising:
- receiving, by a processing device, a speech signal;
- decoding, by the processing device, the speech signal according to a basic static decoding network to obtain a decoding path on each active node in the basic static decoding network, wherein the basic static decoding network is formed by extending words in a basic name language model into corresponding acoustic units, and wherein the basic name language model comprises a first statistical probability between two common words and a second statistical probability between a common word and a name;
- generating, by the processing device, a user-specific name language model comprising a third statistical probability between the name and a user identifier;
- building, by the processing device, an affiliated static decoding network associated with the user-specific name language model by extending words in the user-specific name language model into corresponding acoustic units, wherein building the affiliated static decoding network further comprises:
  - setting a first pronunciation of a first word at a beginning of a sentence in the user-specific name language model to a first virtual pronunciation;
  - setting a second pronunciation of a second word at an end of the sentence in the user-specific name language model to a second virtual pronunciation; and
  - extending a special pronunciation unit on an outgoing arc of a node corresponding to the beginning of the sentence and an incoming arc of the node corresponding to the end of the sentence to obtain the affiliated static decoding network associated with the user-specific name language model; and
- responsive to identifying that a decoding path enters a name node in the basic static decoding network, extending, by the processing device, an extra network associated with the name node according to the affiliated static decoding network; and
- returning, by the processing device, a recognition result after the decoding is completed, wherein a recognition accuracy rate for names is improved.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method according to claim 1, further comprising:
    - determining the affiliated static decoding network one of before or after the speech signal is decoded according to the basic static decoding network.
  - 3. The method according to claim 2, wherein determining the affiliated static decoding network further comprises:
    - determining an identity of the user according to a feature of the speech signal, and then determining the affiliated static decoding network according to the identity of the user; or
    - determining the identity of the user according to at least one of an equipment code or an account number associated with the user, and then determining the affiliated static decoding network according to the identity of the user.
  - 4. The method according to claim 1, further comprising:
    - generating the basic name language model; and
    - building the basic static decoding network associated with the basic name language model.
  - 5. The method according to claim 4, wherein generating the basic name language model further comprises:
    - respectively collecting a name database and a training corpus for a language model;
    - statistically analyzing conventional words and associated relationships between the conventional words and name words according to the name database and the training corpus for the language model to generate a statistical result; and
    - generating the basic name language model according to the statistical result.
  - 6. The method according to claim 5, wherein statistically analyzing conventional words and the associated relationships between the conventional words and name words according to the name database and the training corpus for language model further comprises:
    - performing name detection in the training corpus according to names in the name database;
    - replacing all specific names in the training corpus with a unified virtual name to update the training corpus; and
    - statistically analyzing conventional words and the associated relationships between the conventional words and name words according to the updated training corpus.
  - 7. The method according to claim 6, wherein generating the basic static decoding network further comprises:
    - providing a virtual pronunciation for the virtual name to allow the virtual name to participate in static network extension of an acoustic model as a common word;
    - determining special nodes in the extended static network according to the virtual pronunciation, wherein the special nodes comprise a node which enters a name unit and an ending node of the name unit; and
    - extending a virtual pronunciation unit on an incoming arc or outgoing arc of the special node to obtain the basic static decoding network associated with the basic name language model.
  - 8. The method according to claim 4, wherein generating the user-specific name language model further comprises:
    - extracting a name from name-association information associated with the user, and recording the name as a name entry;
    - setting a word frequency probability for the name entry; and
    - generating the user-specific name language model according to a word frequency probability of the name entry.
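The corpus preparation recited in claims 5 and 6 (detect names, replace them with one unified virtual name, then gather statistics) can be sketched as follows. This is a minimal illustration, assuming single-token names, a hypothetical `$NAME` token, and unsmoothed maximum-likelihood bigram estimates; the claims do not specify the estimator or the token.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

VIRTUAL_NAME = "$NAME"  # hypothetical unified virtual name token


def replace_names(sentence: List[str], name_db: Set[str]) -> List[str]:
    """Swap every word found in the name database for the virtual name."""
    return [VIRTUAL_NAME if word in name_db else word for word in sentence]


def bigram_probabilities(
    corpus: Iterable[List[str]], name_db: Set[str]
) -> Dict[Tuple[str, str], float]:
    """Estimate P(next | previous) on the name-normalized corpus (no smoothing)."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + replace_names(sentence, name_db) + ["</s>"]
        history_counts.update(tokens[:-1])           # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))
    return {
        pair: count / history_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


if __name__ == "__main__":
    corpus = [["call", "alice"], ["call", "bob"], ["call", "home"]]
    probs = bigram_probabilities(corpus, {"alice", "bob"})
    # Both names pool their counts under the single virtual name.
    print(probs[("call", VIRTUAL_NAME)])
```

Pooling all names under one token is what lets the basic model learn word-to-name statistics without having to know any particular user's names.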

9. A speech recognition system based on user personalized information, comprising:
- a memory; and
- a processing device, communicatively coupled to the memory, to:
  - receive a speech signal;
  - decode the speech signal according to a basic static decoding network to obtain a decoding path on each active node in the basic static decoding network, wherein the basic static decoding network is formed by extending words in a basic name language model into corresponding acoustic units, and wherein the basic name language model comprises a first statistical probability between two common words and a second statistical probability between a common word and a name;
  - generate a user-specific name language model comprising a third statistical probability between the name and a user identifier;
  - build an affiliated static decoding network associated with the user-specific name language model by extending words in the user-specific name language model into corresponding acoustic units, wherein to build the affiliated static decoding network, the processing device is further to:
    - set a first pronunciation of a first word at a beginning of a sentence in the user-specific name language model to a first virtual pronunciation;
    - set a second pronunciation of a second word at an end of the sentence in the user-specific name language model to a second virtual pronunciation; and
    - extend a special pronunciation unit on an outgoing arc of a node corresponding to the beginning of the sentence and an incoming arc of the node corresponding to the end of the sentence to obtain the affiliated static decoding network associated with the user-specific name language model; and
  - responsive to identifying that a decoding path enters a name node in the basic static decoding network, extend an extra network associated with the name node according to the affiliated static decoding network; and
  - return a recognition result after the decoding is completed, wherein a recognition accuracy rate for names is improved.
- Dependent claims (10, 11, 12, 13, 14, 15, 16):
  - 10. The system according to claim 9, wherein to determine the affiliated static decoding network, the processing device is further to determine the affiliated static decoding network one of before or after the speech signal is decoded according to the basic static decoding network.
  - 11. The system according to claim 10, wherein to determine the affiliated static decoding network, the processing device is further to:
    - determine an identity of the user according to a feature of the speech signal, and then determine the affiliated static decoding network according to the identity of the user; or
    - determine the identity of the user according to at least one of an equipment code or an account number associated with the user, and then determine the affiliated static decoding network according to the identity of the user.
  - 12. The system according to claim 9, wherein the processing device is further to:
    - generate the basic name language model; and
    - build the basic static decoding network associated with the basic name language model.
  - 13. The system according to claim 12, wherein to generate the basic name language model, the processing device is further to:
    - respectively collect a name database and a training corpus for a language model;
    - statistically analyze conventional words and associated relationships between the conventional words and name words according to the name database and the training corpus for the language model to generate a statistical result; and
    - generate the basic name language model according to the statistical result.
  - 14. The system according to claim 13, wherein to analyze conventional words and the associated relationships, the processing device is further to:
    - perform name detection in the training corpus according to names in the name database;
    - replace all specific names in the training corpus with a unified virtual name to update the training corpus; and
    - statistically analyze conventional words and the associated relationships between the conventional words and name words according to the updated training corpus.
  - 15. The system according to claim 14, wherein to generate the basic static decoding network, the processing device is further to:
    - provide a virtual pronunciation for the virtual name to allow the virtual name to participate in static network extension of an acoustic model as a common word;
    - determine special nodes in the extended static network according to the virtual pronunciation, wherein the special nodes comprise a node which enters a name unit and an ending node of the name unit; and
    - extend a virtual pronunciation unit on an incoming arc or outgoing arc of the special node to obtain the basic static decoding network associated with the basic name language model.
  - 16. The system according to claim 12, wherein to generate the user-specific name language model, the processing device is further to:
    - extract a name from name-association information associated with the user, and record the name as a name entry;
    - set a word frequency probability for the name entry; and
    - generate the user-specific name language model according to a word frequency probability of the name entry.
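Claims 11 and 16 together suggest a small per-user pipeline: resolve the user from an equipment code or account number, then build that user's name model from name entries. The sketch below is one possible reading. The uniform probability, the lowercase normalization, the lookup order, and the `registry` mapping are all assumptions made for illustration; the claims only require that some word frequency probability be set for each name entry.

```python
from typing import Dict, Iterable, Optional


def build_user_name_model(contact_names: Iterable[str]) -> Dict[str, float]:
    """Record each distinct name as an entry and give it a uniform probability.

    Uniform weighting is an assumption; the claims do not fix the value.
    """
    entries = sorted({name.strip().lower() for name in contact_names if name.strip()})
    if not entries:
        return {}
    probability = 1.0 / len(entries)
    return {name: probability for name in entries}


def resolve_user(
    device_code: Optional[str],
    account_number: Optional[str],
    registry: Dict[str, str],
) -> Optional[str]:
    """Look up a user identity by account number first, then by device code.

    `registry` is a hypothetical mapping from either key type to a user id.
    """
    for key in (account_number, device_code):
        if key is not None and key in registry:
            return registry[key]
    return None


if __name__ == "__main__":
    registry = {"dev-1": "user-a"}
    user_id = resolve_user("dev-1", None, registry)
    model = build_user_name_model(["Alice", "bob ", "alice", ""])
    print(user_id, model)
```

A deployed system would likely weight entries by how often the user contacts each name; the uniform choice here only keeps the example short.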

Current Assignee: Iflytek Co., Ltd.
Original Assignee: Iflytek Co., Ltd.
Inventors: Pan, Qinghua; He, Tingting; Hu, Guoping; Hu, Yu; Liu, Qingfeng
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Kim, Jonathan C
Application Number: US14/655,946
Publication Number: US 20150348542A1
Time in Patent Office: 1,145 Days
Field of Search: None
US Class Current: 1/1
CPC Class Codes:
- G06F 3/167   Audio in a user interface, ...
- G10L 15/083   Recognition networks G10L15...
- G10L 15/14   using statistical models, e...
- G10L 15/18   using natural language mode...
- G10L 15/183   using context dependencies,...
- G10L 15/187   Phonemic context, e.g. pron...
- G10L 15/197   Probabilistic grammars, e.g...
- G10L 19/00   Speech or audio signals ana...
- G10L 2015/227   of the speaker; Human-fact...
