Method of identifying a language and of controlling a speech synthesis unit and a communication device
First Claim
1. Method of identifying a language in which a text is composed as a string of characters, in which a frequency distribution of letters in the text is ascertained, the ascertained frequency distribution is compared with corresponding frequency distributions of available languages, in order to ascertain similarity factors which indicate the similarity of the language of the text with the available languages, and the language for which the ascertained similarity factor is the greatest is established as the language of the text;
wherein the length of the text is established and, depending on the length of the text, one, two or more frequency distributions of letters and groups of letters in the text are ascertained; and
the length of the text is established as the number of letters in the text and the number of letters in the text is compared with the number of letters in an alphabet, in order to determine which frequency distributions are ascertained.
Abstract
The invention relates to a method of identifying a language in which a text is composed in the form of a string of characters, and also to a method of controlling a speech reproduction unit and to a communication device. To be able to carry out language identification with little expenditure, it is provided according to the invention that a frequency distribution (h1(x), h2(x,y), h3(x,y,z)) of letters in the text is ascertained, the ascertained frequency distribution (h1(x), h2(x,y), h3(x,y,z)) is compared with corresponding frequency distributions (l1(x), l2(x,y), l3(x,y,z)) of available languages, in order to ascertain similarity factors (s1, s2, s3) which indicate the similarity of the language of the text with the available languages, and the language for which the ascertained similarity factor (s1, s2, s3) is the greatest is established as the language of the text.
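The comparison described in the abstract can be illustrated with a minimal Python sketch for the single-letter distribution h1(x). The source does not state how the similarity factor is computed, so cosine similarity is used here purely as an assumption. The function names and the toy reference profiles (built from two short sample sentences rather than real language statistics) are likewise hypothetical.

```python
from collections import Counter
import math


def letter_distribution(text):
    """Relative frequency h1(x) of each letter x in the text."""
    letters = [c for c in text.lower() if c.isalpha()]
    total = len(letters)
    if total == 0:
        return {}
    return {x: n / total for x, n in Counter(letters).items()}


def similarity(h, l):
    """Assumed similarity factor: cosine similarity of two distributions."""
    dot = sum(p * l.get(x, 0.0) for x, p in h.items())
    norm_h = math.sqrt(sum(p * p for p in h.values()))
    norm_l = math.sqrt(sum(p * p for p in l.values()))
    if norm_h == 0.0 or norm_l == 0.0:
        return 0.0
    return dot / (norm_h * norm_l)


def identify_language(text, profiles):
    """Return the language with the greatest similarity factor, plus all factors."""
    h = letter_distribution(text)
    factors = {lang: similarity(h, l) for lang, l in profiles.items()}
    return max(factors, key=factors.get), factors


# Toy reference profiles (illustrative only, not real language statistics).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then the other dog",
    "german": "der schnelle braune fuchs springt ueber den faulen hund und dann den anderen hund",
}
PROFILES = {lang: letter_distribution(s) for lang, s in SAMPLES.items()}
```

In practice the reference distributions l1(x) would be derived from large corpora of each available language; the two sentences above only keep the sketch self-contained.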
36 Citations
58 Claims
1. Method of identifying a language in which a text is composed as a string of characters, in which
a frequency distribution of letters in the text is ascertained, the ascertained frequency distribution is compared with corresponding frequency distributions of available languages, in order to ascertain similarity factors which indicate the similarity of the language of the text with the available languages, and the language for which the ascertained similarity factor is the greatest is established as the language of the text;
wherein the length of the text is established and, depending on the length of the text, one, two or more frequency distributions of letters and groups of letters in the text are ascertained; and
the length of the text is established as the number of letters in the text and the number of letters in the text is compared with the number of letters in an alphabet, in order to determine which frequency distributions are ascertained. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
a language identification according to claim 1 is carried out for a text to be output in spoken form by means of a speech synthesis module of the speech reproduction unit, the language thereby established is transmitted to the speech reproduction unit, and in the speech reproduction unit, the pronunciation rules of the language established are selected and used for the synthetic speech reproduction of the text by the speech synthesis module.
13. Communication device with
a receiving module for receiving, processing and managing information, a speech synthesis module, which for the spoken output of texts is in connection with the receiving module, and a language identification module, to which a text to be output by the speech synthesis module can be fed for identifying the language in which the text to be output is composed, and which is connected to the speech synthesis module for transmitting a language established for this text;
wherein the language identification module comprises a statistics circuit, in order to ascertain a frequency distribution of letters in the text, and the statistics circuit has first, second and third computing circuits, in order to ascertain frequency distributions of individual letters, of groups of letters with two letters and of groups of letters with three letters. - View Dependent Claims (14, 15, 16, 17, 18)
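The three computing circuits of claim 13 correspond to three counting passes over the text. The sketch below is a software analogue in Python and makes no claim about the hardware. It assumes that non-letters are collapsed to single spaces and that spaces take part in the letter groups; the source does not specify either point, and the function names are hypothetical.

```python
from collections import Counter


def normalize(text):
    """Lowercase the text and collapse every run of non-letters to one space (assumption)."""
    kept = "".join(c if c.isalpha() else " " for c in text.lower())
    return " ".join(kept.split())


def ngram_distribution(text, n):
    """Relative frequencies of all contiguous groups of n characters."""
    s = normalize(text)
    grams = [s[i:i + n] for i in range(len(s) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {g: k / total for g, k in Counter(grams).items()}


def statistics(text):
    """Return (h1, h2, h3): distributions of single letters, pairs and triples."""
    return (
        ngram_distribution(text, 1),
        ngram_distribution(text, 2),
        ngram_distribution(text, 3),
    )
```

For the input "abab" this yields two single letters with frequency 1/2 each, the pairs "ab" (2/3) and "ba" (1/3), and the triples "aba" and "bab" (1/2 each).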
19. Method of identifying a language in which a text is composed as a string of characters, in which
a frequency distribution of letters in the text is ascertained, the ascertained frequency distribution is compared with corresponding frequency distributions of available languages, in order to ascertain similarity factors which indicate the similarity of the language of the text with the available languages, and the language for which the ascertained similarity factor is the greatest is established as the language of the text;
wherein the letters present in the text are investigated for special letters, in order to select, according to the presence or absence of special letters characteristic of certain languages, the languages which are to be taken into consideration in the comparison of the ascertained frequency distribution with corresponding frequency distributions of available languages. - View Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
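The pre-selection step of claim 19 can be sketched as a filter that runs before any frequency comparison. The table of special letters below is a small illustrative assumption, not a list taken from the patent, and the sketch narrows candidates only on the presence of a special letter. The claim also mentions absence, which would need a further rule that the source does not spell out.

```python
# Illustrative table (assumption): special letter -> languages that use it.
SPECIAL_LETTERS = {
    "ß": {"german"},
    "ñ": {"spanish"},
    "ø": {"danish", "norwegian"},
}


def candidate_languages(text, available):
    """Narrow the available languages using special letters found in the text."""
    candidates = set(available)
    for ch in set(text.lower()):
        users = SPECIAL_LETTERS.get(ch)
        if users is None:
            continue  # ordinary letter, carries no restriction
        narrowed = candidates & users
        if narrowed:  # never narrow down to an empty candidate set
            candidates = narrowed
    return candidates
```

Only the languages returned by `candidate_languages` would then be taken into the comparison of frequency distributions, which reduces the number of reference distributions that have to be evaluated.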
the frequency distributions of groups of letters with two letters and of individual letters are ascertained if the number of letters in the text is greater than the number of letters in the alphabet. -
27. Method according to claim 24, wherein the length of the text is established as the number of letters in the text and in that the number of letters in the text is compared with the number of letters in the alphabet, in order to determine which frequency distributions are ascertained; and
the frequency distribution of individual letters is ascertained if the number of letters in the text is less than the number of letters in the alphabet.
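The length rule stated in the preceding claims (individual letters only for a text shorter than the alphabet, individual letters plus two-letter groups for a text longer than the alphabet) can be written as a small selection function. This is a sketch under assumptions: the claims shown here do not say when three-letter groups are added, so that threshold is a hypothetical caller-supplied parameter. The case of a text exactly as long as the alphabet is not covered by the claims and is treated here like the shorter case.

```python
def orders_to_ascertain(text, alphabet_size=26, trigram_threshold=None):
    """Return the group lengths (1, 2, 3) whose distributions should be computed.

    alphabet_size and trigram_threshold are illustrative parameters; the
    trigram threshold is not given in the claims and must be chosen by the caller.
    """
    n_letters = sum(1 for c in text if c.isalpha())
    orders = [1]  # individual letters are always ascertained
    if n_letters > alphabet_size:
        orders.append(2)  # text is longer than the alphabet: add letter pairs
        if trigram_threshold is not None and n_letters > trigram_threshold:
            orders.append(3)  # assumed rule for three-letter groups
    return orders
```

The idea behind the rule is that a text with fewer letters than the alphabet cannot give a meaningful distribution over letter pairs, so the more expensive statistics are only computed when the text is long enough to support them.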
28. Method according to claim 19, wherein a complete alphabet is used, also including special letters of various languages based on Latin letters.
29. Method according to claim 19, wherein after establishing the language, the letters present in the text are investigated for special letters which are characteristic of the language established and of languages not established, in order to confirm the language established.
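The confirmation step of claim 29 can be sketched as a check that runs after a language has been established. The rule used here, rejecting the result if the text contains a special letter that the established language does not use, is one plausible reading and is an assumption. The table of special letters is illustrative only.

```python
# Illustrative table (assumption): special letter -> languages that use it.
SPECIAL_LETTERS = {
    "ß": {"german"},
    "ñ": {"spanish"},
    "ø": {"danish", "norwegian"},
}


def confirm_language(text, established, special_letters=SPECIAL_LETTERS):
    """Return True unless the text contains a special letter foreign to the established language."""
    present = {c for c in text.lower() if c in special_letters}
    for ch in present:
        if established not in special_letters[ch]:
            return False  # letter is characteristic of languages not established
    return True
```

A failed confirmation could, for example, trigger a second comparison restricted to the languages that do use the offending letter; the claim itself leaves that follow-up open.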
30. Method of controlling a speech reproduction unit, in which
a language identification according to claim 19 is carried out for a text to be output in spoken form by means of a speech synthesis module of the speech reproduction unit, the language thereby established is transmitted to the speech reproduction unit, and in the speech reproduction unit, the pronunciation rules of the language established are selected and used for the synthetic speech reproduction of the text by the speech synthesis module.
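The control flow of claim 30 (identify the language, pass it to the speech reproduction unit, select the matching pronunciation rules, synthesize) can be outlined as follows. All names are hypothetical: `identify`, `rule_sets` and `synthesize` stand in for the language identification module, the stored pronunciation rules and the speech synthesis module, none of which are specified in the claims.

```python
def speak(text, identify, rule_sets, synthesize):
    """Identify the language of the text, select its pronunciation rules, and synthesize.

    identify:   callable text -> language name (hypothetical stand-in)
    rule_sets:  mapping language name -> pronunciation rules (hypothetical)
    synthesize: callable (text, rules) -> synthesized output (hypothetical)
    """
    language = identify(text)
    rules = rule_sets[language]  # raises KeyError if no rules exist for the language
    return synthesize(text, rules)


# Usage with trivial stubs in place of real modules.
def stub_identify(text):
    return "german" if "ß" in text else "english"


def stub_synthesize(text, rules):
    return f"[{rules}] {text}"


RULES = {"german": "de-rules", "english": "en-rules"}
```

With these stubs, `speak("Straße", stub_identify, RULES, stub_synthesize)` selects the German rule set before the text reaches the synthesizer.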
31. Method of identifying a language in which a text is composed as a string of characters, in which
a frequency distribution of letters in the text is ascertained, the ascertained frequency distribution is compared with corresponding frequency distributions of available languages, in order to ascertain similarity factors which indicate the similarity of the language of the text with the available languages, and the language for which the ascertained similarity factor is the greatest is established as the language of the text;
wherein after establishing the language, the letters present in the text are investigated for special letters which are characteristic of the language established and of languages not established, in order to confirm the language established. - View Dependent Claims (32, 33, 34, 35, 36, 37, 38, 39, 40, 41)
a language identification according to claim 35 is carried out for a text to be output in spoken form by means of a speech synthesis module of the speech reproduction unit, the language thereby established is transmitted to the speech reproduction unit, and in the speech reproduction unit, the pronunciation rules of the language established are selected and used for the synthetic speech reproduction of the text by the speech synthesis module.
42. Method of identifying a language in which a text is composed as a string of characters, in which
a frequency distribution of letters in the text is ascertained, the ascertained frequency distribution is compared with corresponding frequency distributions of available languages, in order to ascertain similarity factors which indicate the similarity of the language of the text with the available languages, and the language for which the ascertained similarity factor is the greatest is established as the language of the text;
wherein the length of the text is established and, depending on the length of the text, one, two or more frequency distributions of letters and groups of letters in the text are ascertained. - View Dependent Claims (43, 44, 45, 46, 47, 48)
a language identification according to claim 42 is carried out for a text to be output in spoken form by means of a speech synthesis module of the speech reproduction unit, the language thereby established is transmitted to the speech reproduction unit, and in the speech reproduction unit, the pronunciation rules of the language established are selected and used for the synthetic speech reproduction of the text by the speech synthesis module.
49. Method of identifying a language in which a text is composed as a string of characters, in which
a frequency distribution of letters in the text is ascertained, the ascertained frequency distribution is compared with corresponding frequency distributions of available languages, in order to ascertain similarity factors which indicate the similarity of the language of the text with the available languages, and the language for which the ascertained similarity factor is the greatest is established as the language of the text, wherein the length of the text is established and, depending on the length of the text, one, two or more frequency distributions of letters and groups of letters in the text are ascertained.
Specification