Method for the automatic classification of a text with the aid of a computer system
First Claim
1. A method for the automatic classification of a text that is contained in an incoming electronic information with the aid of a computer system, the method comprising:
determining at least one qualitative characteristic of at least one word of the text to be classified;
determining a frequency of occurrence of the at least one qualitative characteristic in the text to be classified;
converting the text to be classified to a sequence of alphanumerical characters;
dismantling the sequence of alphanumerical characters in at least one specified way into a character shingle;
determining a frequency of occurrence of the character shingle in the text to be classified;
determining a vector from the at least one qualitative characteristic and an associated frequency of occurrence, and the character shingle and the associated frequency of occurrence;
comparing the determined vector to previously determined vectors of known example texts that are determined in the same way, wherein each of the example texts is assigned to a class; and
assigning the text to be classified in dependence of the comparison to one of the classes to which the example texts are assigned.
Abstract
In one embodiment of the present invention, a method is disclosed for the automatic classification of a text contained in incoming electronic information. At least one qualitative characteristic of at least one word of the text to be classified is determined, along with the frequency of occurrence of that qualitative characteristic in the text. The text to be classified is converted into a sequence of alphanumerical characters, the sequence is dismantled in at least one specified way to form so-called character shingles, and the frequency of occurrence of each character shingle in the text is determined. A vector is formed from the qualitative characteristic and its associated frequency as well as from the character shingle and its associated frequency. This vector is then compared to vectors formed ahead of time, in the same way, from known example texts, each of which is assigned to a class. Depending on this comparison, the text to be classified is assigned to one of the classes to which the example texts are assigned.
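The steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation; several details are assumptions made only for illustration: the particular qualitative characteristics (all-caps words, words containing digits, long words), the shingle length of 3, lowercasing and stripping of non-alphanumeric characters, and the use of cosine similarity with a nearest-example rule for the comparison. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def word_characteristics(text):
    """Count qualitative characteristics of words (feature choice is an assumption)."""
    counts = Counter()
    for word in text.split():
        if word.isupper() and len(word) > 1:
            counts["q:all_caps"] += 1
        if any(ch.isdigit() for ch in word):
            counts["q:has_digit"] += 1
        if len(word) > 10:
            counts["q:long_word"] += 1
    return counts


def character_shingles(text, size=3):
    """Reduce the text to alphanumeric characters and count overlapping shingles."""
    chars = re.sub(r"[^a-z0-9]", "", text.lower())
    return Counter("s:" + chars[i:i + size] for i in range(len(chars) - size + 1))


def feature_vector(text, size=3):
    """Combine both kinds of frequencies into one sparse vector."""
    vector = word_characteristics(text)
    vector.update(character_shingles(text, size))
    return vector


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors (assumed comparison measure)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, examples, size=3):
    """Assign the class of the most similar example; examples are (text, class) pairs."""
    vector = feature_vector(text, size)
    scored = (
        (label, cosine_similarity(vector, feature_vector(example, size)))
        for example, label in examples
    )
    return max(scored, key=lambda pair: pair[1])[0]


examples = [
    ("WIN FREE MONEY NOW 1000", "spam"),
    ("meeting agenda for monday attached", "ham"),
]
print(classify("FREE MONEY 5000 NOW", examples))
```

In this sketch, the example vectors are recomputed on every call for brevity; a practical system would compute them once ahead of time, as the abstract describes.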
20 Claims
Specification