LANGUAGE MODEL CREATION APPARATUS, LANGUAGE MODEL CREATION METHOD, SPEECH RECOGNITION APPARATUS, SPEECH RECOGNITION METHOD, AND RECORDING MEDIUM
First Claim
1. A language model creation apparatus comprising an arithmetic processing unit which reads out input text data saved in a storage unit and creates an N-gram language model, said arithmetic processing unit comprising:
a frequency counting unit which counts occurrence frequencies in the input text data for respective words or word chains contained in the input text data;
a context diversity calculation unit which calculates, for the respective words or word chains, diversity indices each indicating diversity of words capable of preceding a word or word chain;
a frequency correction unit which calculates corrected occurrence frequencies by correcting occurrence frequencies of the respective words or word chains based on the diversity indices of the respective words or word chains; and
an N-gram language model creation unit which creates an N-gram language model based on the corrected occurrence frequencies of the respective words or word chains.
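The claim does not fix the form of the diversity index. The following is a minimal Python sketch, assuming whitespace-tokenized input. It takes the index to be the number of distinct word types seen immediately before each word or word chain, which is the continuation count familiar from Kneser-Ney smoothing. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def count_frequencies(tokens, n):
    """Frequency counting: occurrences of every n-word chain in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def diversity_indices(tokens, n):
    """Context diversity (assumed index): number of distinct word types seen
    immediately before each n-word chain; 0 if the chain only opens the text."""
    contexts = defaultdict(set)
    for i in range(len(tokens) - n + 1):
        words = contexts[tuple(tokens[i:i + n])]  # creates an empty set on first sight
        if i > 0:
            words.add(tokens[i - 1])
    return {chain: len(words) for chain, words in contexts.items()}


tokens = "the cat sat on the mat the cat ran".split()
print(count_frequencies(tokens, 1)[("cat",)])  # 2
print(diversity_indices(tokens, 1)[("cat",)])  # 1
```

In this toy text `cat` occurs twice but is always preceded by `the`, so its diversity index is only 1. By contrast, `the` (preceded by `on` and `mat`) scores 2.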
Abstract
A frequency counting unit (15A) counts occurrence frequencies (14B) in input text data (14A) for respective words or word chains contained in the input text data (14A). A context diversity calculation unit (15B) calculates, for the respective words or word chains, diversity indices (14C) each indicating the context diversity of a word or word chain. A frequency correction unit (15C) corrects the occurrence frequencies (14B) of the respective words or word chains based on the diversity indices (14C) of the respective words or word chains. An N-gram language model creation unit (15D) creates an N-gram language model (14E) based on the corrected occurrence frequencies (14D) obtained for the respective words or word chains.
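The abstract does not say how the diversity indices (14C) adjust the counts (14B). One hypothetical rule, shown only as an illustration, interpolates linearly between the raw count and the diversity index using a weight `lam`. Both the rule and the parameter name are assumptions.

```python
def correct_frequencies(counts, diversities, lam=0.5):
    """Hypothetical correction rule: (1 - lam) * count + lam * diversity.
    lam = 0 keeps the raw counts; lam = 1 keeps only the diversity index
    (a Kneser-Ney-style continuation count)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return {
        chain: (1.0 - lam) * count + lam * diversities.get(chain, 0)
        for chain, count in counts.items()
    }


counts = {("the",): 3, ("cat",): 2}
diversities = {("the",): 2, ("cat",): 1}
print(correct_frequencies(counts, diversities))  # {('the',): 2.5, ('cat',): 1.5}
```

With `lam=0.5`, a chain that follows few distinct words (here `cat`) loses relatively more weight than one that appears in varied contexts.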
16 Claims
1. A language model creation apparatus comprising an arithmetic processing unit which reads out input text data saved in a storage unit and creates an N-gram language model,
said arithmetic processing unit comprising:
a frequency counting unit which counts occurrence frequencies in the input text data for respective words or word chains contained in the input text data;
a context diversity calculation unit which calculates, for the respective words or word chains, diversity indices each indicating diversity of words capable of preceding a word or word chain;
a frequency correction unit which calculates corrected occurrence frequencies by correcting occurrence frequencies of the respective words or word chains based on the diversity indices of the respective words or word chains; and
an N-gram language model creation unit which creates an N-gram language model based on the corrected occurrence frequencies of the respective words or word chains.
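A distinct-type count treats a predecessor seen once the same as one seen many times. The diversity index could instead reflect how evenly the preceding words are spread. One alternative is the Shannon entropy of the preceding-word distribution. This measure is swapped in for illustration and is not stated in the claim. The function name is hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy_diversity(tokens, n):
    """Assumed alternative index: entropy (bits) of the distribution of words
    immediately preceding each n-word chain. Chains that only open the text
    have no preceding word and are omitted."""
    contexts = defaultdict(Counter)
    for i in range(1, len(tokens) - n + 1):
        contexts[tuple(tokens[i:i + n])][tokens[i - 1]] += 1
    indices = {}
    for chain, prev_counts in contexts.items():
        total = sum(prev_counts.values())
        indices[chain] = -sum((c / total) * math.log2(c / total)
                              for c in prev_counts.values())
    return indices


tokens = "the cat sat on the mat the cat ran".split()
print(entropy_diversity(tokens, 1)[("the",)])  # 1.0
```

Here `the` is preceded once each by `on` and `mat`, giving 1 bit. `cat` is always preceded by `the`, giving 0.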
10. A language model creation method of causing an arithmetic processing unit which reads out input text data saved in a storage unit and creates an N-gram language model, to execute
a frequency counting step of counting occurrence frequencies in the input text data for respective words or word chains contained in the input text data,
a context diversity calculation step of calculating, for the respective words or word chains, diversity indices each indicating diversity of words capable of preceding a word or word chain,
a frequency correction step of calculating corrected occurrence frequencies by correcting occurrence frequencies of the respective words or word chains based on the diversity indices of the respective words or word chains, and
an N-gram language model creation step of creating an N-gram language model based on the corrected occurrence frequencies of the respective words or word chains.
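For the N-gram language model creation step, the claim does not fix the estimator. The following minimal sketch therefore makes an assumption: it turns corrected bigram frequencies into conditional probabilities by plain relative-frequency normalization. A practical model would normally add discounting and back-off to lower orders, which this sketch omits. The function name is hypothetical.

```python
from collections import defaultdict


def create_bigram_model(corrected):
    """Turn corrected bigram frequencies {(w1, w2): f} into P(w2 | w1)
    by relative-frequency normalization (no discounting or back-off)."""
    totals = defaultdict(float)
    for (w1, _), f in corrected.items():
        totals[w1] += f
    return {
        (w1, w2): f / totals[w1]
        for (w1, w2), f in corrected.items()
        if totals[w1] > 0  # skip histories whose corrected mass is zero
    }


model = create_bigram_model({("the", "cat"): 1.5, ("the", "mat"): 0.5, ("cat", "sat"): 1.0})
print(model[("the", "cat")])  # 0.75
```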
11. (canceled)
14. (canceled)
15. A recording medium recording a program for causing a computer including an arithmetic processing unit which reads out input text data saved in a storage unit and creates an N-gram language model, to execute, by using the arithmetic processing unit,
a frequency counting step of counting occurrence frequencies in the input text data for respective words or word chains contained in the input text data,
a context diversity calculation step of calculating, for the respective words or word chains, diversity indices each indicating diversity of words capable of preceding a word or word chain,
a frequency correction step of calculating corrected occurrence frequencies by correcting occurrence frequencies of the respective words or word chains based on the diversity indices of the respective words or word chains, and
an N-gram language model creation step of creating an N-gram language model based on the corrected occurrence frequencies of the respective words or word chains.
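As an illustration of such a program, the sketch below chains the four steps for the unigram case in a single hypothetical function, `create_unigram_model`. It assumes whitespace tokenization, a distinct-predecessor diversity index, linear-interpolation correction with weight `lam`, and relative-frequency estimation. None of these choices is specified by the claim.

```python
from collections import Counter, defaultdict


def create_unigram_model(text, lam=0.5):
    """End-to-end sketch of the four claimed steps (unigram case, assumed choices)."""
    tokens = text.split()
    # Frequency counting step: raw occurrence counts.
    counts = Counter(tokens)
    # Context diversity calculation step: distinct immediately preceding words.
    preceding = defaultdict(set)
    for prev, word in zip(tokens, tokens[1:]):
        preceding[word].add(prev)
    # Frequency correction step: hypothetical linear interpolation.
    corrected = {w: (1 - lam) * c + lam * len(preceding[w]) for w, c in counts.items()}
    # Language model creation step: normalize corrected frequencies.
    total = sum(corrected.values())
    if total == 0:
        return {}  # nothing to normalize (e.g. empty input)
    return {w: f / total for w, f in corrected.items()}


model = create_unigram_model("the cat sat on the mat the cat ran")
print(model["the"], model["cat"])  # 0.3125 0.1875
```

Raw counts would give `the` and `cat` probabilities of 3/9 and 2/9. After correction, `cat` drops further because it only ever follows `the`.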
Specification