Systems and methods for adding punctuations by detecting silences in a voice using plurality of aggregate weights which obey a linear relationship
First Claim
1. A method for modifying a voice file comprising a plurality of words, the method comprising:
applying a language model to the voice file as a whole, the language model comprising a plurality of feature units, a preliminary punctuation state, and a preliminary weight of the preliminary punctuation state, each of the feature units including a word or phrase, a part of speech or sentence element of the word or phrase, the application of the language model to the voice file as a whole identifying in the voice file as a whole one or more first feature units of the plurality of feature units;
generating a first aggregate weight R1 based on a combination of the preliminary weights of the preliminary punctuation states corresponding to the identified one or more first feature units;
detecting silences in the voice file;
dividing the voice file into multiple segments based on at least the detected silences;
identifying in the segments one or more second feature units of the plurality of feature units;
applying the language model to the segments, the application of the language model to the segments for generating a second aggregate weight R2 including a combination of the preliminary weights of the preliminary punctuation states corresponding to the identified one or more second feature units;
generating a third aggregate weight R3 determined according to R3 = a × R1 + (1 − a) × R2, where 0 < a < 1; and
modifying the voice file so as to include one or more final punctuations based on at least the third aggregate weight R3.
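The weighted combination recited in the final steps can be illustrated with a minimal sketch. It assumes, purely for illustration, that each aggregate weight is held as a mapping from a candidate punctuation state to a score at one word boundary, and that the final punctuation is the highest-scoring state; the claim does not fix either choice. The names `combine_weights` and `pick_punctuation` are hypothetical.

```python
from typing import Dict


def combine_weights(r1: Dict[str, float], r2: Dict[str, float], a: float) -> Dict[str, float]:
    """Compute R3 = a * R1 + (1 - a) * R2 for every candidate punctuation state."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must satisfy 0 < a < 1")
    states = set(r1) | set(r2)
    # A state missing from one side is treated as weight 0.0 (an assumption).
    return {s: a * r1.get(s, 0.0) + (1.0 - a) * r2.get(s, 0.0) for s in states}


def pick_punctuation(r3: Dict[str, float]) -> str:
    """Choose the state with the largest combined weight (assumed decision rule)."""
    return max(r3, key=r3.get)


# R1: weights from the voice file taken as a whole (illustrative numbers).
r1 = {"none": 0.5, "comma": 0.25, "period": 0.25}
# R2: weights from the silence-delimited segments (illustrative numbers).
r2 = {"none": 0.0, "comma": 0.25, "period": 0.75}

r3 = combine_weights(r1, r2, a=0.5)
final_state = pick_punctuation(r3)
```

With a close to 1 the whole-file weights dominate; with a close to 0 the segment weights dominate.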
Abstract
Systems and methods are provided for adding punctuations. For example, one or more first feature units are identified in a voice file taken as a whole; the voice file is divided into multiple segments by detecting silences in the voice file; one or more second feature units are identified in the voice file; a first aggregate weight of first punctuation states of the voice file and a second aggregate weight of second punctuation states of the voice file are determined, using a language model established based on word separation and third semantic features; a weighted calculation is performed to generate a third aggregate weight based on a linear combination associated with the first aggregate weight and the second aggregate weight; and one or more final punctuations are added to the voice file based on at least information associated with the third aggregate weight.
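As a rough illustration of dividing a voice file into segments by detecting silences, the sketch below treats the audio as a list of normalized amplitude samples and calls a run of low-amplitude samples a silence. The amplitude threshold, the minimum run length, and the function names are assumptions made for this example; the abstract does not specify how silences are detected.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open index range [start, end)


def detect_silences(samples: Sequence[float], threshold: float = 0.02, min_len: int = 3) -> List[Span]:
    """Return ranges where |sample| < threshold for at least min_len consecutive samples."""
    silences: List[Span] = []
    start = None
    for i, x in enumerate(samples):
        if abs(x) < threshold:
            if start is None:
                start = i  # a quiet run begins here
        else:
            if start is not None and i - start >= min_len:
                silences.append((start, i))
            start = None
    # Handle a quiet run that extends to the end of the file.
    if start is not None and len(samples) - start >= min_len:
        silences.append((start, len(samples)))
    return silences


def split_on_silences(samples: Sequence[float], silences: List[Span]) -> List[Span]:
    """Return the non-silent ranges that lie between the detected silences."""
    segments: List[Span] = []
    pos = 0
    for s, e in silences:
        if s > pos:
            segments.append((pos, s))
        pos = e
    if pos < len(samples):
        segments.append((pos, len(samples)))
    return segments


audio = [0.5, 0.4, 0.0, 0.0, 0.0, 0.3, 0.6, 0.0, 0.2]
silences = detect_silences(audio)
segments = split_on_silences(audio, silences)
```

In this toy input the single quiet sample near the end is shorter than `min_len`, so it does not split the second segment.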
29 Citations
23 Claims
1. A method for modifying a voice file comprising a plurality of words, the method comprising:
applying a language model to the voice file as a whole, the language model comprising a plurality of feature units, a preliminary punctuation state, and a preliminary weight of the preliminary punctuation state, each of the feature units including a word or phrase, a part of speech or sentence element of the word or phrase, the application of the language model to the voice file as a whole identifying in the voice file as a whole one or more first feature units of the plurality of feature units;
generating a first aggregate weight R1 based on a combination of the preliminary weights of the preliminary punctuation states corresponding to the identified one or more first feature units;
detecting silences in the voice file;
dividing the voice file into multiple segments based on at least the detected silences;
identifying in the segments one or more second feature units of the plurality of feature units;
applying the language model to the segments, the application of the language model to the segments for generating a second aggregate weight R2 including a combination of the preliminary weights of the preliminary punctuation states corresponding to the identified one or more second feature units;
generating a third aggregate weight R3 determined according to R3 = a × R1 + (1 − a) × R2, where 0 < a < 1; and
modifying the voice file so as to include one or more final punctuations based on at least the third aggregate weight R3.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system for modifying a voice file comprising a plurality of words, the system comprising:
a silence-detection module configured to detect silences in the voice file and to divide the voice file into multiple segments based on at least the detected silences,
an identification module configured to:
apply a language model to the voice file as a whole, the language model including a plurality of feature units, a preliminary punctuation state, and a preliminary weight of the preliminary punctuation state, each of the feature units including a word or phrase, a part of speech or sentence element of the word or phrase, the application of the language model to the voice file as a whole identifying in the voice file as a whole one or more first feature units of the plurality of feature units; and
identifying in the segments one or more second feature units of the plurality of feature units; and
a punctuation-addition module configured to:
generate a first aggregate weight R1 based on a combination of the preliminary weights of the preliminary punctuation states corresponding to the identified one or more first feature units;
apply the language model to the segments, the application of the language model to the segments to generate a second aggregate weight R2 including a combination of the preliminary weights of the preliminary punctuation states corresponding to the identified one or more second feature units;
generate a third aggregate weight R3 determined according to R3 = a × R1 + (1 − a) × R2, where 0 < a < 1; and
modifying the voice file so as to include one or more final punctuations based on at least the third aggregate weight R3.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
23. A non-transitory computer readable storage medium comprising programming instructions for modifying a voice file comprising a plurality of words, the programming instructions configured to cause one or more data processors to execute operations comprising:
applying a language model to the voice file as a whole, the language model comprising a plurality of feature units, a preliminary punctuation state, and a preliminary weight of the preliminary punctuation state, each of the feature units including a word or phrase, a part of speech or sentence element of the word or phrase, the application of the language model to the voice file as a whole identifying in the voice file as a whole one or more first feature units of the plurality of feature units;
generating a first aggregate weight R1 based on a combination of the preliminary weights of the preliminary punctuation states corresponding to the identified one or more first feature units;
detecting silences in the voice file;
dividing the voice file into multiple segments based on at least the detected silences;
identifying in the segments one or more second feature units of the plurality of feature units;
applying the language model to the segments, the application of the language model to the segments for generating a second aggregate weight R2 including a combination of the preliminary weights of the preliminary punctuation states corresponding to the identified one or more second feature units;
generating a third aggregate weight R3 determined according to R3 = a × R1 + (1 − a) × R2, where 0 < a < 1; and
modifying the voice file so as to include one or more final punctuations based on at least the third aggregate weight R3.
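The claims recite a language model made up of feature units, each tied to a preliminary punctuation state and a preliminary weight, and an aggregate weight formed from "a combination" of those weights. The sketch below is one hedged reading of that structure: the toy model contents, the choice of a (word, part of speech) pair as the feature unit, and the use of a simple sum as the combination are all assumptions for illustration, not details taken from the claims.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FeatureUnit = Tuple[str, str]  # (word or phrase, part of speech); assumed representation

# Hypothetical toy language model: feature unit -> {punctuation state: preliminary weight}.
LANGUAGE_MODEL: Dict[FeatureUnit, Dict[str, float]] = {
    ("hello", "interjection"): {"comma": 0.5, "none": 0.5},
    ("world", "noun"): {"period": 0.75, "none": 0.25},
}


def identify_feature_units(tagged_words: List[FeatureUnit],
                           model: Dict[FeatureUnit, Dict[str, float]]) -> List[FeatureUnit]:
    """Keep only the tagged words that appear as feature units in the model."""
    return [unit for unit in tagged_words if unit in model]


def aggregate_weight(units: List[FeatureUnit],
                     model: Dict[FeatureUnit, Dict[str, float]]) -> Dict[str, float]:
    """Sum the preliminary weights per punctuation state (summation is an assumption)."""
    totals: Dict[str, float] = defaultdict(float)
    for unit in units:
        for state, weight in model[unit].items():
            totals[state] += weight
    return dict(totals)


tagged = [("hello", "interjection"), ("there", "adverb"), ("world", "noun")]
units = identify_feature_units(tagged, LANGUAGE_MODEL)
weights = aggregate_weight(units, LANGUAGE_MODEL)
```

Running the same two functions once over the whole file and once over each silence-delimited segment would yield the two aggregate weights that the claims call R1 and R2.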
Specification