Chinese prosodic words forming method and apparatus
First Claim
1. A method of forming Chinese prosodic words, implemented using a computer, said method comprising:
- inputting Chinese text;
performing, via a computer, a process of word segmentation and part of speech annotation for the input Chinese text submitted to the computer to generate an initial prosodic word sequence;
inserting grids representing prosodic word boundaries for all the words in the initial prosodic word sequence to generate a grid prosodic word sequence including inserting at least one eliminable indicator in the grid prosodic word sequence;
annotating grids ready to be deleted in the grid prosodic word sequence based on a prosodic word forming means;
comprehensively judging grids which actually need to be deleted in the grids ready to be deleted based on a plurality of prosodic word forming means, the plurality of prosodic word forming means including a prosodic word forming based on a binary prosodic tree, a prosodic word forming based on statistical probability, and a prosodic word forming based on rules, andwherein said comprehensively judging includes providing a trust degree for the grids ready to be deleted and judging whether the grids ready to be deleted actually need to be deleted based on said trust degree by checking whether a current grid has been marked with the at least one eliminable indicator; and
the grids which actually need to be deleted in the grid prosodic word sequence are deleted when said comprehensively judging indicates deletion, and forming the words between every two grids in the remaining grids to generate prosodic words.
1 Assignment
0 Petitions
Accused Products
Abstract
The present invention provides a method and apparatus of forming Chinese prosodic words, which method comprises the steps of inputting Chinese text; performing process of word segmentation and part of speech annotation for the input Chinese text to generate an initial prosodic word sequence; inserting grids representing prosodic word boundaries for all the words in the initial prosodic word sequence to generate a grid prosodic word sequence; annotating the grids ready to be deleted in the grid prosodic word sequence based on the prosodic word forming means; judging the grids which actually need to be deleted in the grids ready to be deleted based on the prosodic word forming means; deleting the grids which actually need to be deleted in the grid prosodic word sequence, and word forming the words between every two grids in the remaining grids to generate prosodic words. The present invention avoids the defect whereby the type of insertion error of the prosodic word would render the pronunciation hard to understand or unnatural as far as possible, and reduces the number of the type of insertion error of prosodic word boundaries.
10 Citations
10 Claims
-
1. A method of forming Chinese prosodic words, implemented using a computer, said method comprising:
-
inputting Chinese text; performing, via a computer, a process of word segmentation and part of speech annotation for the input Chinese text submitted to the computer to generate an initial prosodic word sequence; inserting grids representing prosodic word boundaries for all the words in the initial prosodic word sequence to generate a grid prosodic word sequence including inserting at least one eliminable indicator in the grid prosodic word sequence; annotating grids ready to be deleted in the grid prosodic word sequence based on a prosodic word forming means; comprehensively judging grids which actually need to be deleted in the grids ready to be deleted based on a plurality of prosodic word forming means, the plurality of prosodic word forming means including a prosodic word forming based on a binary prosodic tree, a prosodic word forming based on statistical probability, and a prosodic word forming based on rules, and wherein said comprehensively judging includes providing a trust degree for the grids ready to be deleted and judging whether the grids ready to be deleted actually need to be deleted based on said trust degree by checking whether a current grid has been marked with the at least one eliminable indicator; and the grids which actually need to be deleted in the grid prosodic word sequence are deleted when said comprehensively judging indicates deletion, and forming the words between every two grids in the remaining grids to generate prosodic words. - View Dependent Claims (2, 3)
-
-
4. An apparatus to form Chinese prosodic words, comprising:
-
an input part to input Chinese text; a word segmentation and part of speech annotating part to perform a process of word segmentation and part of speech annotation for the input Chinese text to generate an initial prosodic word sequence; a prosodic word grid insert part to insert grids representing prosodic word boundaries for all the words in the initial prosodic word sequence to generate a grid prosodic word sequence including inserting at least one eliminable indicator in the grid prosodic word sequence; a prosodic word grid delete part to annotate grids ready to be deleted in the grid prosodic word sequence based on a prosodic word forming means, the plurality of prosodic word forming means including a prosodic word forming based on a binary prosodic tree, a prosodic word forming based on statistical probability, and a prosodic word forming based on rules; a grid deletion trust degree evaluation part to comprehensively to judge grids which actually need to be deleted in the grids ready to be deleted based on a plurality of prosodic word forming means and to provide a trust degree for the grids ready to be deleted; a grid deletion part to judge whether the grids ready to be deleted actually need to be deleted based on said trust degree and to delete the grids which actually need to be deleted in the grid prosodic word sequence in accordance with a result from the grid deletion part by checking whether a current grid has been marked with the at least one eliminable indicator; and a prosodic word generating part to form the words between every two grids in the remaining grids to generate prosodic words. - View Dependent Claims (5, 6, 7)
-
-
8. A program embedded in an apparatus and causing the apparatus to execute an operation including forming Chinese prosodic words, the operation comprising:
-
inputting Chinese text; performing a process of word segmentation and part of speech annotation for the input Chinese text to generate an initial prosodic word sequence; inserting grids representing prosodic word boundaries for all the words in the initial prosodic word sequence to generate a grid prosodic word sequence including inserting at least one eliminable indicator in the grid prosodic word sequence; annotating grids ready to be deleted in the grid prosodic word sequence based on a prosodic word forming means; comprehensively judging grids which actually need to be deleted in the grids ready to be deleted based on a plurality of prosodic word forming means, said comprehensively judging includes providing a trust degree for the grids ready to be deleted and judging whether the grids ready to be deleted actually need to be deleted based on said trust degree by checking whether a current grid has been marked with the at least one eliminable indicator; and deleting the grids which actually need to be deleted in the grid prosodic word sequence when said comprehensively judging indicates deletion, and forming the words between every two grids in the remaining grids to generate prosodic words, and wherein the plurality of prosodic word forming means includes a prosodic word forming based on a binary prosodic tree, a prosodic word forming based on statistical probability, and a prosodic word forming based on rules.
-
-
9. A non-transitory computer readable storage medium storing Chinese prosodic words forming program to cause a computer to execute an operation, comprising:
-
inputting Chinese text; performing a process of word segmentation and part of speech annotation for the input Chinese text to generate an initial prosodic word sequence; inserting grids representing prosodic word boundaries for all the words in the initial prosodic word sequence to generate a grid prosodic word sequence including inserting at least one eliminable indicator in the grid prosodic word sequence; annotating grids ready to be deleted in the grid prosodic word sequence based on a prosodic word forming means; comprehensively judging grids which actually need to be deleted in the grids ready to be deleted based on a plurality of prosodic word forming means, the plurality of prosodic word forming means including a prosodic word forming based on a binary prosodic tree, a prosodic word forming based on statistical probability, and a prosodic word forming based on rules, and said comprehensively judging includes providing a trust degree the grids to be deleted; judging whether the grids ready to be deleted actually need to be deleted based on said trust degree by checking whether a current grid has been marked with the at least one eliminable indicator; deleting the grids which actually need to be deleted in the grid prosodic word sequence when said comprehensively judging indicates deletion, and forming the words between every two grids in the remaining grids to generate prosodic words. - View Dependent Claims (10)
-
Specification