Chinese speech recognition system and method
First Claim
1. A Chinese speech recognition system comprising:
a language model storage device containing a plurality of language models, including a factored language model;
a hierarchical prosodic model comprising a plurality of prosodic models, including a prosodic break model, a prosodic state model, a syllable prosodic-acoustic model and a syllable-juncture prosodic-acoustic model;
a speech recognition device receiving a speech signal, recognizing said speech signal and outputting a word lattice; and
a rescorer connected with said language model storage device, said hierarchical prosodic model and said speech recognition device, receiving said word lattice, rescoring and reranking word arcs of said word lattice according to said prosodic break model, said prosodic state model, said syllable prosodic-acoustic model and said syllable-juncture prosodic-acoustic model, and outputting a language tag, a prosodic tag and a phonetic segmentation tag corresponding to said speech signal.
Abstract
A Chinese speech recognition system and method are disclosed. First, a speech signal is received and recognized to output a word lattice. Next, the word arcs of the word lattice are rescored and reranked using a prosodic break model, a prosodic state model, a syllable prosodic-acoustic model, a syllable-juncture prosodic-acoustic model and a factored language model, so as to output a language tag, a prosodic tag and a phonetic segmentation tag corresponding to the speech signal. The invention performs rescoring in two stages to improve the recognition rate of basic speech information, and labels the language tag, prosodic tag and phonetic segmentation tag to provide prosodic structure and language information for downstream voice conversion and voice synthesis.
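The rescoring and reranking of word arcs described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes each word arc already carries one log-score per model and that these scores are combined as a weighted sum, which is a common choice for lattice rescoring but is not stated in the text above. The names `WordArc`, `arc_score`, `rescore_and_rerank`, `best_path`, the model keys and the weights are all hypothetical, and the sketch does not produce the language, prosodic or phonetic segmentation tags.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical keys: one log-score per model named in the abstract.
MODELS = (
    "factored_lm",
    "prosodic_break",
    "prosodic_state",
    "syllable_prosodic_acoustic",
    "syllable_juncture_prosodic_acoustic",
)


@dataclass
class WordArc:
    word: str
    start: int  # lattice node where the arc begins
    end: int    # lattice node where the arc ends
    log_scores: Dict[str, float]  # per-model log-scores (assumed precomputed)


def arc_score(arc: WordArc, weights: Dict[str, float]) -> float:
    """Weighted sum of per-model log-scores (an assumed combination rule)."""
    return sum(w * arc.log_scores.get(name, 0.0) for name, w in weights.items())


def rescore_and_rerank(arcs: List[WordArc], weights: Dict[str, float]) -> List[WordArc]:
    """Return the arcs ordered from highest to lowest combined score."""
    return sorted(arcs, key=lambda a: arc_score(a, weights), reverse=True)


def best_path(arcs: List[WordArc], weights: Dict[str, float],
              start: int, goal: int) -> List[str]:
    """Best-scoring word sequence from start to goal.

    Assumes the lattice is acyclic and node ids are topologically ordered
    (every arc has start < end), so one pass over arcs sorted by start
    node is sufficient.
    """
    best = {start: (0.0, [])}  # node -> (best score so far, words on that path)
    for arc in sorted(arcs, key=lambda a: a.start):
        if arc.start not in best:
            continue  # node not reachable from the start node
        score = best[arc.start][0] + arc_score(arc, weights)
        if arc.end not in best or score > best[arc.end][0]:
            best[arc.end] = (score, best[arc.start][1] + [arc.word])
    return best[goal][1]


# Toy lattice with two competing arcs for the first word (made-up scores).
weights = {name: 1.0 for name in MODELS}
arcs = [
    WordArc("金天", 0, 1, {"factored_lm": -4.0, "prosodic_break": -0.25}),
    WordArc("今天", 0, 1, {"factored_lm": -1.0, "prosodic_break": -0.5}),
    WordArc("天氣", 1, 2, {"factored_lm": -2.0, "prosodic_break": -0.5}),
]
print(best_path(arcs, weights, start=0, goal=2))
```

In this sketch, changing a model's weight changes how strongly that model influences both the arc ranking and the chosen path; in practice such weights would have to be tuned on held-out data.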
24 Claims
1. A Chinese speech recognition system comprising:
a language model storage device containing a plurality of language models, including a factored language model;
a hierarchical prosodic model comprising a plurality of prosodic models, including a prosodic break model, a prosodic state model, a syllable prosodic-acoustic model and a syllable-juncture prosodic-acoustic model;
a speech recognition device receiving a speech signal, recognizing said speech signal and outputting a word lattice; and
a rescorer connected with said language model storage device, said hierarchical prosodic model and said speech recognition device, receiving said word lattice, rescoring and reranking word arcs of said word lattice according to said prosodic break model, said prosodic state model, said syllable prosodic-acoustic model and said syllable juncture prosodic-acoustic model, and outputting a language tag, a prosodic tag and a phonetic segmentation tag corresponding to said speech signal.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A Chinese speech recognition method comprising steps:
receiving a speech signal, recognizing said speech signal and outputting a word lattice by a speech recognition device; and
receiving said word lattice, rescoring word arcs of said word lattice according to a prosodic break model, a prosodic state model, a syllable prosodic-acoustic model, a syllable-juncture prosodic-acoustic model and a factored language model stored in a language model storage device, reranking said word arcs, and outputting a language tag, a prosodic tag and a phonetic segmentation tag by a rescorer.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
Specification