Method and system for enhancing a speech database
Abstract
A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, modifying the identified segments in the primary speech database using selected mappings, enhancing the primary speech database by substituting the modified segments for the corresponding identified database segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
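The five steps named in the abstract (label, identify, modify via mappings, substitute, store) can be sketched as a small pipeline. This is only an illustrative sketch, not the patented implementation: the `Segment` type, all function names, the toy data, and the choice to "modify" a segment by swapping in audio from the secondary database are assumptions made for the example.

```python
import io
import json
from dataclasses import dataclass, asdict, replace


@dataclass(frozen=True)
class Segment:
    unit: str              # hypothetical unit label, e.g. a phone or half-phone
    audio: tuple           # placeholder for waveform samples
    variant: bool = False  # True if the unit has varying pronunciations


def label_files(files):
    """Step 1 (stubbed): turn (unit, samples) pairs into labeled segments."""
    return [Segment(unit, tuple(samples)) for unit, samples in files]


def identify_variants(segments, variant_units):
    """Step 2: flag segments whose unit is known to vary in pronunciation."""
    return [replace(s, variant=s.unit in variant_units) for s in segments]


def modify(segment, mapping, secondary):
    """Step 3 (assumed behavior): follow the mapping into the secondary
    database and use the audio found there; leave the segment unchanged
    when no mapping or no secondary entry exists."""
    target = mapping.get(segment.unit)
    if target is None or target not in secondary:
        return segment
    return replace(segment, audio=tuple(secondary[target]))


def enhance(primary_files, variant_units, mapping, secondary):
    """Steps 1-4: substitute modified segments for the identified ones."""
    flagged = identify_variants(label_files(primary_files), variant_units)
    return [modify(s, mapping, secondary) if s.variant else s for s in flagged]


def store(database, fp):
    """Step 5: serialize the enhanced database as JSON to a file-like object."""
    json.dump([asdict(s) for s in database], fp)


# Toy data, invented for illustration only.
primary = [("t", [0.1, 0.2]), ("ae", [0.3]), ("r", [0.4])]
secondary = {"r_alt": [0.9, 0.8]}
mapping = {"r": "r_alt"}

enhanced = enhance(primary, {"r"}, mapping, secondary)
buf = io.StringIO()
store(enhanced, buf)
```

In this sketch only the segment labeled `r` is flagged and replaced; the other segments pass through untouched, which mirrors the idea that the primary database is enhanced in place rather than rebuilt.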
20 Claims
1. A method comprising:
labeling, via a processor, audio speech files in a primary speech database, to yield labeled audio speech files;
identifying segments in the labeled audio speech files that have varying pronunciations within a language, to yield identified segments, wherein the identified segments comprise at least one of phones, half-phones, half-phonemes, demi-syllables, and polyphones;
creating modified segments by modifying the identified segments in the primary speech database using selected mappings to an offline secondary speech database in the language of the primary speech database;
enhancing the primary speech database by substituting the modified segments for the identified segments in the primary speech database, to yield an enhanced primary speech database; and
storing the enhanced primary speech database for use in speech synthesis.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A non-transitory computer-readable storage medium having stored instructions which, when executed by a computing device, cause the computing device to perform a method comprising:
labeling audio speech files in a primary speech database, to yield labeled audio speech files;
identifying segments in the labeled audio speech files that have varying pronunciations within a same language, to yield identified segments, wherein the identified segments comprise at least one of phones, half-phones, half-phonemes, demi-syllables, and polyphones;
creating modified segments by modifying the identified segments in the primary speech database using selected mappings to an offline secondary speech database in the language of the primary speech database;
enhancing the primary speech database by substituting the modified segments for the identified segments in the primary speech database, to yield an enhanced primary speech database; and
storing the enhanced primary speech database for use in speech synthesis.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A system that enhances a speech database for speech synthesis, comprising:
a processor;
a primary speech database in a language; and
a computer-readable medium to store instructions which, when executed by the processor, perform a method comprising:
labeling audio speech files in the primary speech database, to yield labeled audio speech files;
identifying segments in the labeled audio speech files that have varying pronunciations within the language, to yield identified segments, wherein the identified segments comprise at least one of phones, half-phones, half-phonemes, demi-syllables, and polyphones;
creating modified segments by modifying the identified segments in the primary speech database using selected mappings to an offline secondary speech database in the language of the primary speech database, to yield modified segments;
enhancing the primary speech database by substituting the modified segments for the identified segments in the primary speech database, to yield an enhanced primary speech database; and
storing the enhanced primary speech database for use in speech synthesis.
Dependent claims: 16, 17, 18, 19, 20.
Specification