Method of identifying a language and of controlling a speech synthesis unit and a communication device

US 6,711,542 B2
Filed: 12/28/2000
Issued: 03/23/2004
Est. Priority Date: 12/30/1999
Status: Active Grant

Abstract

The invention relates to a method of identifying a language in which a text is composed in the form of a string of characters, and also to a method of controlling a speech reproduction unit and to a communication device. To be able to carry out language identification with little expenditure, it is provided according to the invention that a frequency distribution (h₁(x), h₂(x,y), h₃(x,y,z)) of letters in the text is ascertained, the ascertained frequency distribution (h₁(x), h₂(x,y), h₃(x,y,z)) is compared with corresponding frequency distributions (l₁(x), l₂(x,y), l₃(x,y,z)) of available languages, in order to ascertain similarity factors (s₁, s₂, s₃) which indicate the similarity of the language of the text with the available languages, and the language for which the ascertained similarity factor (s₁, s₂, s₃) is the greatest is established as the language of the text.
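
The comparison described in the abstract can be illustrated with a short Python sketch. It computes a relative frequency distribution of single letters or letter groups (h₁, h₂ or h₃, depending on `n`), compares it with stored language distributions, and picks the language with the greatest similarity factor. The abstract does not state how the similarity factor is calculated; histogram intersection is used here purely as an assumption, and all function names are hypothetical.

```python
from collections import Counter


def ngram_distribution(text: str, n: int) -> dict[str, float]:
    """Relative frequency of every n-letter group in text (h_n in the abstract)."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {gram: count / total for gram, count in Counter(grams).items()}


def similarity(h: dict[str, float], l: dict[str, float]) -> float:
    """ASSUMPTION: histogram intersection, a value in [0, 1].

    The source does not define the similarity factor; this measure is
    only a stand-in so the comparison step can be shown end to end.
    """
    return sum(min(p, l.get(gram, 0.0)) for gram, p in h.items())


def identify(text: str, profiles: dict[str, dict[str, float]], n: int = 1):
    """Return (best language, all similarity factors) for the given text."""
    h = ngram_distribution(text.lower(), n)
    scores = {lang: similarity(h, l) for lang, l in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Here `profiles` maps a language code to a distribution built with the same `n`, for example from a reference corpus. Any other measure that grows as two distributions become more alike could be substituted for `similarity`.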

58 Claims

1. Method of identifying a language in which a text is composed as a string of characters, in which a frequency distribution of letters in the text is ascertained, the ascertained frequency distribution is compared with corresponding frequency distributions of available languages, in order to ascertain similarity factors which indicate the similarity of the language of the text with the available languages, and the language for which the ascertained similarity factor is the greatest is established as the language of the text;
- wherein the length of the text is established and, depending on the length of the text, one, two or more frequency distributions of letters and groups of letters in the text are ascertained; and
  
  the length of the text is established as the number of letters in the text and in that the number of letters in the text is compared with the number of letters in an alphabet, in order to determine which frequency distributions are ascertained.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. Method according to claim 1, wherein the language is only established if the greatest similarity factor ascertained is greater than a threshold value.
  - 3. Method according to claim 1, wherein the ascertained frequency distribution is stored as the frequency distribution of a new language or is added to a corresponding frequency distribution of a language if, in response to an inquiry, a language to which the ascertained frequency distribution is to be assigned is indicated.
  - 4. Method according to claim 1, wherein the ascertained frequency distribution is added to the corresponding frequency distribution of the language established.
  - 5. Method according to claim 1, wherein all non-letter characters, apart from spaces, are removed from the string of characters of the text, in order to ascertain from the string of characters thus obtained frequency distributions of letters and groups of letters in the text.
  - 6. Method according to claim 1, wherein the frequency distributions of groups of letters with three letters, of groups of letters with two letters and of individual letters are ascertained if the number of letters in the text is greater than the square of the number of letters in the alphabet.
  - 7. Method according to claim 1, wherein the frequency distributions of groups of letters with two letters and of individual letters are ascertained if the number of letters in the text is greater than the number of letters in the alphabet.
  - 8. Method according to claim 1, wherein the frequency distribution of individual letters is ascertained if the number of letters in the text is less than the number of letters in the alphabet.
  - 9. Method according to claim 1, wherein a complete alphabet is used, also including special letters of various languages based on Latin letters.
  - 10. Method according to claim 1, wherein the letters present in the text are investigated for special letters, in order to select according to the presence or absence of special letters, characteristic of certain languages the languages which are to be taken into consideration in the comparison of the ascertained frequency distribution with corresponding frequency distributions of available languages.
  - 11. Method according to claim 1, wherein after establishing the language, the letters present in the text are investigated for special letters which are characteristic of the languages established and of languages not established, in order to confirm the language established.
  - 12. A Method of controlling a speech reproduction unit, in which
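
The length-dependent steps of claims 1, 2 and 5 to 8 can be sketched in Python as follows. The function names are hypothetical, and the case where the number of letters exactly equals the alphabet size is not covered by the claims; the sketch treats it like a short text, which is an assumption.

```python
from typing import Optional


def strip_non_letters(text: str) -> str:
    """Claim 5: remove every non-letter character except spaces."""
    return "".join(ch for ch in text if ch.isalpha() or ch == " ")


def count_letters(text: str) -> int:
    """Claim 1: the length of the text is its number of letters."""
    return sum(ch.isalpha() for ch in text)


def orders_for_length(num_letters: int, alphabet_size: int) -> tuple[int, ...]:
    """Choose which letter-group sizes to count (claims 6 to 8)."""
    if num_letters > alphabet_size ** 2:
        return (1, 2, 3)   # claim 6: single letters, pairs and triples
    if num_letters > alphabet_size:
        return (1, 2)      # claim 7: single letters and pairs
    # Claim 8 covers num_letters < alphabet_size; equality is an ASSUMPTION here.
    return (1,)


def establish(scores: dict[str, float], threshold: float) -> Optional[str]:
    """Claim 2: only establish a language if the best factor exceeds a threshold."""
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None
```

With a 26-letter alphabet, triples would only be counted for texts of more than 676 letters, which keeps the sparse three-letter statistics out of the decision for short messages.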

13. Communication device with a receiving module for receiving, processing and managing information, a speech synthesis module, which for the spoken output of texts is in connection with the receiving module, and a language identification module, to which a text to be output by the speech synthesis module can be fed for identifying the language in which the text to be output is composed, and which is connected to the speech synthesis module for transmitting a language established for this text;
- wherein the language identification module comprises a statistics circuit, in order to ascertain a frequency distribution of letters in the text, and the statistics circuit has first, second and third computing circuits, in order to ascertain frequency distributions of individual letters, of groups of letters with two letters and of groups of letters with three letters.
- Dependent Claims (14, 15, 16, 17, 18)
  - 14. Communication device according to claim 13, wherein pronunciation rules for various languages are stored in the speech synthesis module.
  - 15. Communication device according to claim 14, wherein a pronunciation-rules selection circuit is provided in the speech synthesis module, which circuit is connected to the language identification module and, depending on the language transmitted by the language identification module, selects the corresponding pronunciation rule, so that it can be used by a speech synthesis unit of the speech synthesis module.
  - 16. Communication device according to claim 13, wherein the language identification module comprises a filter circuit, in order to remove all non-letter characters, apart from spaces, from a string of characters of a text.
  - 17. Communication device according to claim 13, wherein the language identification module has a comparator circuit, in order to compare for the ascertainment of similarity factors for a text ascertained frequency distributions of letters with corresponding stored frequency distributions of available languages.
  - 18. Communication device according to claim 17, wherein the language identification module comprises an evaluation circuit, to which the similarity factors can be fed by the comparator circuit, in order to establish the language for which the ascertained similarity factor is greatest as the language of the text.
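
The wiring between the modules of claims 13 to 15 can be pictured in software, although the claims themselves describe circuits. In the Python sketch below, the class names, the placeholder rule strings and the injected `identify_language` callable are all hypothetical; no real speech output is produced.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SpeechSynthesisModule:
    """Stores pronunciation rules per language (claim 14) and selects one (claim 15)."""
    pronunciation_rules: dict[str, str]   # language -> rule set (placeholder strings)
    selected: Optional[str] = None

    def select_rules(self, language: str) -> None:
        if language not in self.pronunciation_rules:
            raise KeyError(f"no pronunciation rules stored for {language!r}")
        self.selected = language

    def speak(self, text: str) -> str:
        # Placeholder for synthesis: returns a tagged string instead of audio.
        if self.selected is None:
            raise RuntimeError("no pronunciation rules selected")
        return f"[{self.pronunciation_rules[self.selected]}] {text}"


@dataclass
class CommunicationDevice:
    """Feeds received text to language identification, then to synthesis (claim 13)."""
    identify_language: Callable[[str], str]   # stands in for the identification module
    synthesis: SpeechSynthesisModule

    def receive(self, text: str) -> str:
        language = self.identify_language(text)
        self.synthesis.select_rules(language)
        return self.synthesis.speak(text)
```

Passing the identifier in as a callable keeps the sketch independent of any particular statistics or comparison method.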

19. Method of identifying a language in which a text is composed as a string of characters, in which a frequency distribution of letters in the text is ascertained, the ascertained frequency distribution is compared with corresponding frequency distributions of available languages, in order to ascertain similarity factors which indicate the similarity of the language of the text with the available languages, and the language for which the ascertained similarity factor is the greatest is established as the language of the text;
- wherein the letters present in the text are investigated for special letters, in order to select according to the presence or absence of special letters, characteristic of certain languages the languages which are to be taken into consideration in the comparison of the ascertained frequency distribution with corresponding frequency distributions of available languages.
- Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
  - 20. Method according to claim 19, wherein the language is only established if the greatest similarity factor ascertained is greater than a threshold value.
  - 21. Method according to claim 19, wherein the ascertained frequency distribution is stored as the frequency distribution of a new language or is added to a corresponding frequency distribution of a language if, in response to an inquiry, a language to which the ascertained frequency distribution is to be assigned is indicated.
  - 22. Method according to claim 19, wherein the ascertained frequency distribution is added to the corresponding frequency distribution of the language established.
  - 23. Method according to claim 19, wherein all non-letter characters, apart from spaces, are removed from the string of characters of the text, in order to ascertain from the string of characters thus obtained frequency distributions of letters and groups of letters in the text.
  - 24. Method according to claim 19, wherein the length of the text is established and, depending on the length of the text, one, two or more frequency distributions of letters and groups of letters in the text are ascertained.
  - 25. Method according to claim 24, wherein the length of the text is established as the number of letters in the text and in that the number of letters in the text is compared with the number of letters in the alphabet, in order to determine which frequency distributions are ascertained;
    - and the frequency distributions of groups of letters with three letters, of groups of letters with two letters and of individual letters are ascertained if the number of letters in the text is greater than the square of the number of letters in the alphabet.
  - 26. Method according to claim 24, wherein the length of the text is established as the number of letters in the text and in that the number of letters in the text is compared with the number of letters in an alphabet, in order to determine which frequency distributions are ascertained, and
  - 27. Method according to claim 24, wherein the length of the text is established as the number of letters in the text and in that the number of letters in the text is compared with the number of letters in the alphabet, in order to determine which frequency distributions are ascertained;
    - and the frequency distribution of individual letters is ascertained if the number of letters in the text is less than the number of letters in the alphabet.
  - 28. Method according to claim 19, wherein a complete alphabet is used, also including special letters of various languages based on Latin letters.
  - 29. Method according to claim 19, wherein after establishing the language, the letters present in the text are investigated for special letters which are characteristic of the language established and of languages not established, in order to confirm the language established.
  - 30. Method of controlling a speech reproduction unit, in which a language identification according to claim 19 is carried out for a text to be output in spoken form by means of a speech synthesis module of the speech reproduction unit, the language thereby established is transmitted to the speech reproduction unit, and in the speech reproduction unit, the pronunciation rules of the language established are selected and used for the synthetic speech reproduction of the text by the speech synthesis module.
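
The special-letter checks of claims 19 and 29 can be sketched in Python as below. The letter table is a small, incomplete illustration and not taken from the patent, the function names are hypothetical, and the rule that a text without special letters keeps every language as a candidate is an assumption.

```python
# ASSUMPTION: illustrative, incomplete table of special letters per language.
SPECIAL_LETTERS: dict[str, set[str]] = {
    "de": set("äöüß"),
    "es": set("áéíóúñü"),
    "fr": set("àâçéèêëîïôùûü"),
    "en": set(),
}
ALL_SPECIAL: set[str] = set().union(*SPECIAL_LETTERS.values())


def special_letters_in(text: str) -> set[str]:
    """Collect the special letters that actually occur in the text."""
    return {ch for ch in text.lower() if ch in ALL_SPECIAL}


def preselect(text: str) -> list[str]:
    """Claim 19: keep only languages whose alphabet covers the letters found.

    If the text contains no special letters, no language is excluded.
    """
    found = special_letters_in(text)
    return [lang for lang, letters in SPECIAL_LETTERS.items() if found <= letters]


def confirm(language: str, text: str) -> bool:
    """Claim 29: the established language is confirmed if the text holds
    no special letter that is foreign to that language."""
    return special_letters_in(text) <= SPECIAL_LETTERS[language]
```

The pre-selection shrinks the set of stored distributions that need to be compared, while the confirmation acts as a plausibility check after the statistical decision.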

31. Method of identifying a language in which a text is composed as a string of characters, in which a frequency distribution of letters in the text is ascertained, the ascertained frequency distribution is compared with corresponding frequency distributions of available languages, in order to ascertain similarity factors which indicate the similarity of the language of the text with the available languages, and the language for which the ascertained similarity factor is the greatest is established as the language of the text;
- wherein after establishing the language, the letters present in the text are investigated for special letters which are characteristic of the language established and of languages not established, in order to confirm the language established.
- Dependent Claims (32, 33, 34, 35, 36, 37, 38, 39, 40, 41)
  - 32. Method according to claim 31, wherein the language is only established if the greatest similarity factor ascertained is greater than a threshold value.
  - 33. Method according to claim 31, wherein the ascertained frequency distribution is stored as the frequency distribution of a new language or is added to a corresponding frequency distribution of a language if, in response to an inquiry, a language to which the ascertained frequency distribution is to be assigned is indicated.
  - 34. Method according to claim 31, wherein the ascertained frequency distribution is added to the corresponding frequency distribution of the language established.
  - 35. Method according to claim 31, characterized in that all non-letter characters, apart from spaces, are removed from the string of characters of the text, in order to ascertain from the string of characters thus obtained frequency distributions of letters and groups of letters in the text.
  - 36. Method according to claim 31, wherein the length of the text is established and, depending on the length of the text, one, two or more frequency distributions of letters and groups of letters in the text are ascertained.
  - 37. Method according to claim 36, wherein the length of the text is established as the number of letters in the text and in that the number of letters in the text is compared with the number of letters in an alphabet, in order to determine which frequency distributions are ascertained;
    - and the frequency distributions of groups of letters with three letters, of groups of letters with two letters and of individual letters are ascertained if the number of letters in the text is greater than the square of the number of letters in the alphabet.
  - 38. Method according to claim 36, wherein the length of the text is established as the number of letters in the text and in that the number of letters in the text is compared with the number of letters in the alphabet, in order to determine which frequency distributions are ascertained;
    - and the frequency distributions of groups of letters with two letters and of individual letters are ascertained if the number of letters in the text is greater than the number of letters in the alphabet.
  - 39. Method according to claim 36, wherein the length of the text is established as the number of letters in the text and in that the number of letters in the text is compared with the number of letters in an alphabet, in order to determine which frequency distributions are ascertained;
    - and the frequency distribution of individual letters is ascertained if the number of letters in the text is less than the number of letters in the alphabet.
  - 40. Method according to claim 31, wherein a complete alphabet is used, also including special letters of various languages based on Latin letters.
  - 41. Method of controlling a speech reproduction unit, in which

42. Method of identifying a language in which a text is composed as a string of characters, in which a frequency distribution of letters in the text is ascertained, the ascertained frequency distribution is compared with corresponding frequency distributions of available languages, in order to ascertain similarity factors which indicate the similarity of the language of the text with the available languages, and the language for which the ascertained similarity factor is the greatest is established as the language of the text;
- wherein the length of the text is established and, depending on the length of the text, one, two or more frequency distributions of letters and groups of letters in the text are ascertained.
- Dependent Claims (43, 44, 45, 46, 47, 48)
  - 43. Method according to claim 42, wherein the language is only established if the greatest similarity factor ascertained is greater than a threshold value.
  - 44. Method according to claim 42, wherein the ascertained frequency distribution is stored as the frequency distribution of a new language or is added to a corresponding frequency distribution of a language if, in response to an inquiry, a language to which the ascertained frequency distribution is to be assigned is indicated.
  - 45. Method according to claim 42, wherein the ascertained frequency distribution is added to the corresponding frequency distribution of the language established.
  - 46. Method according to claim 42, wherein all non-letter characters, apart from spaces, are removed from the string of characters of the text, in order to ascertain from the string of characters thus obtained frequency distributions of letters and groups of letters in the text.
  - 47. Method according to claim 42, wherein a complete alphabet is used, also including special letters of various languages based on Latin letters.
  - 48. Method of controlling a speech reproduction unit, in which

49. Method of identifying a language in which a text is composed as a string of characters, in which a frequency distribution of letters in the text is ascertained, the ascertained frequency distribution is compared with corresponding frequency distributions of available languages, in order to ascertain similarity factors which indicate the similarity of the language of the text with the available languages, and the language for which the ascertained similarity factor is the greatest is established as the language of the text, wherein the length of the text is established and, depending on the length of the text, one, two or more frequency distributions of letters and groups of letters in the text are ascertained.
- Dependent Claims (50, 51, 52, 53, 54, 55, 56, 57, 58)
  - 50. Method according to claim 49, wherein the language is only established if the greatest similarity factor ascertained is greater than a threshold value.
  - 51. Method according to claim 49, wherein the ascertained frequency distribution is stored as the frequency distribution of a new language or is added to a corresponding frequency distribution of a language if, in response to an inquiry, a language to which the ascertained frequency distribution is to be assigned is indicated.
  - 52. Method according to claim 49, wherein the ascertained frequency distribution is added to the corresponding frequency distribution of the language established.
  - 53. Method according to claim 49, wherein all non-letter characters, apart from spaces, are removed from the string of characters of the text, in order to ascertain from the string of characters thus obtained frequency distributions of letters and groups of letters in the text.
  - 54. Method according to claim 49, wherein the frequency distributions of groups of letters with three letters, of groups of letters with two letters and of individual letters are ascertained if the number of letters in the text is greater than the square of the number of letters in the alphabet.
  - 55. Method according to claim 54, wherein the frequency distributions of groups of letters with two letters and of individual letters are ascertained if the number of letters in the text is greater than the number of letters in the alphabet.
  - 56. Method according to claim 54, characterized in that the frequency distribution of individual letters is ascertained if the number of letters in the text is less than the number of letters in the alphabet.
  - 57. Method according to claim 49, wherein a complete alphabet is used, also including special letters of various languages based on Latin letters.
  - 58. Method of controlling a speech reproduction unit, in which a language identification according to claim 49 is carried out for a text to be output in spoken form by means of a speech synthesis module of the speech reproduction unit, the language thereby established is transmitted to the speech reproduction unit, and in the speech reproduction unit, the pronunciation rules of the language established are selected and used for the synthetic speech reproduction of the text by the speech synthesis module.
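
Several dependent claims (for example 51 and 52, and their counterparts under the other independent claims) describe storing an ascertained distribution as a new language or adding it to an existing one. A minimal Python sketch of that bookkeeping follows. It assumes that "adding" means accumulating raw counts, which the claims do not specify; the class and method names are hypothetical.

```python
from collections import Counter


class LanguageProfiles:
    """Keeps raw letter-group counts per language so new texts can be added."""

    def __init__(self) -> None:
        self._counts: dict[str, Counter] = {}

    def add(self, language: str, text_counts: Counter) -> None:
        """Store counts for a new language or add them to an existing one.

        ASSUMPTION: distributions are merged by summing counts.
        """
        self._counts.setdefault(language, Counter()).update(text_counts)

    def distribution(self, language: str) -> dict[str, float]:
        """Relative frequencies derived from the accumulated counts."""
        counts = self._counts[language]
        total = sum(counts.values())
        return {gram: n / total for gram, n in counts.items()}

    def languages(self) -> list[str]:
        return list(self._counts)
```

Keeping counts instead of relative frequencies means a long text contributes more to the stored profile than a short one. Whether that weighting is intended is not stated in the claims.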

Current Assignee: Ironworks Patents LLC
Original Assignee: Nokia Mobile Phones UK Limited (Nokia Corporation)
Inventors: Theimer, Wolfgang
Primary Examiner(s): Smits, Talivaldis Ivars
Assistant Examiner(s): Nolan, Daniel A.

Application Number: US09/751,161
Publication Number: US 20010027394A1
Time in Patent Office: 1,181 Days
Field of Search: 704/257, 704/240, 704/1, 704/9, 704/10, 704/8, 704/2, 704/5, 345/382, 382/170, 382/187, 382/190
US Class Current: 704/257
CPC Class Codes:
- G06F 40/216: using statistical methods
- G06F 40/263: Language identification
- G10L 13/00: Speech synthesis; Text to s...
- G10L 15/26: Speech to text systems G10L...
- G10L 2015/223: Execution procedure of a sp...
