Method for normalizing case
Abstract
A method is disclosed for automatically distinguishing significant from insignificant variants of upper and lower case in a number of input word types by means of a computer. According to the method, an input word type is assigned to one of a number of disjoint local groups based on the case, and position, of the letters that make up the input word type. Furthermore, the input word type is assigned to one of a number of disjoint global groups based on which local groups the case variants of the input word type are assigned to. Finally, the cases of the input word type are normalized in accordance with predetermined rules associated with the global group the input word type is assigned to.
15 Claims
1. A method for automatically distinguishing significant from insignificant distinctions of upper and lower case in a number of input word types from a natural language text by means of a computer, comprising the steps of:
assigning an input word type to one of a number of disjoint local groups based on the case, and position, of the letters that make up the input word type;
assigning said input word type to one of a number of disjoint global groups based on which local groups case variants of the input word type are assigned to; and
normalizing cases for said input word type in accordance with predetermined rules associated with the global group said input word type is assigned to.
2. The method according to claim 1, wherein the step of normalizing cases comprises the step of:
normalizing cases of said input word type according to the cases of the case variant of said input word type that is assigned to a local group that is predetermined for the global group said input word type is assigned to.
3. The method according to claim 1, wherein the step of assigning an input word type to one of a number of disjoint local groups comprises the step of:
assigning an input word type to one of a number of disjoint local groups based on the case of the initial letter of said input word type and the case of the non-initial letters of said input word type.
4. The method according to claim 1, wherein the step of assigning an input word type to one of a number of disjoint local groups comprises the step of:
assigning an input word type to one of a number of disjoint local groups based on the case of the initial letter of said input word type and on whether or not any non-initial letter of said input word type is of a different case than the initial letter of said input word type.
5. The method according to claim 1, wherein the step of assigning an input word type to one of a number of disjoint local groups comprises the steps of:
assigning an input word type that has an upper case initial letter and no lower case non-initial letters to a first local group;
assigning an input word type that has an upper case initial letter and at least one lower case non-initial letter to a second local group;
assigning an input word type that has a lower case initial letter and no upper case non-initial letters to a third local group; and
assigning an input word type that has a lower case initial letter and at least one upper case letter to a fourth local group.
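The four local groups of claim 5 can be illustrated with a short Python sketch. This is not taken from the patent: the function name `local_group`, the use of the first character as the "initial letter", and the handling of caseless characters (digits, hyphens) are assumptions made for illustration.

```python
def local_group(word: str) -> int:
    """Return the local group (1-4) of a word type, following claim 5.

    Assumptions: the first character is the "initial letter"; characters
    without case count as neither upper nor lower case; an initial that is
    not upper case is handled like a lower case initial.
    """
    if not word:
        raise ValueError("empty word type")
    initial, rest = word[0], word[1:]
    if initial.isupper():
        # Group 1: no lower case non-initial letters; group 2: at least one.
        return 2 if any(c.islower() for c in rest) else 1
    # Group 3: no upper case non-initial letters; group 4: at least one.
    return 4 if any(c.isupper() for c in rest) else 3


examples = ["USA", "Smith", "smith", "iPhone"]
groups = {w: local_group(w) for w in examples}
```

Under these assumptions, a single upper case letter such as "A" falls into the first local group, since it has no lower case non-initial letters.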
6. The method according to claim 5, wherein the step of assigning said input word type to disjoint global groups comprises the steps of:
assigning said input word type to a first global group, if one case variant of said input word type is assigned to said first local group, one case variant of said input word type is assigned to said second local group, and no case variant of said input word type is assigned to said third local group;
assigning said input word type to a second global group, if one case variant of said input word type is assigned to said first local group, one case variant of said input word type is assigned to said third local group, and no case variant of said input word type is assigned to said second local group;
assigning said input word type to a third global group, if one case variant of said input word type is assigned to said second local group, one case variant of said input word type is assigned to said third local group, and no case variant of said input word type is assigned to said first local group; and
assigning said input word type to a fourth global group, if one case variant of said input word type is assigned to said first local group, one case variant of said input word type is assigned to said second local group, and one case variant of said input word type is assigned to said third local group.
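The global grouping of claim 6 depends only on which of the first three local groups are occupied by the case variants of a word. The sketch below is one possible reading, not the patented implementation: it assumes that case variants are word types that compare equal after lower-casing, and it returns `None` for combinations the claim does not cover. The names `GLOBAL_GROUP` and `global_groups` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


def local_group(word: str) -> int:
    """Local groups 1-4 as recited in claim 5 (first character = initial)."""
    initial, rest = word[0], word[1:]
    if initial.isupper():
        return 2 if any(c.islower() for c in rest) else 1
    return 4 if any(c.isupper() for c in rest) else 3


# Occupied local groups (restricted to 1-3) -> global group of claim 6.
GLOBAL_GROUP = {
    frozenset({1, 2}): 1,     # first and second present, third absent
    frozenset({1, 3}): 2,     # first and third present, second absent
    frozenset({2, 3}): 3,     # second and third present, first absent
    frozenset({1, 2, 3}): 4,  # all three present
}


def global_groups(word_types: Iterable[str]) -> Dict[str, Optional[int]]:
    """Map each lower-cased key to its global group, or None if uncovered."""
    occupied: Dict[str, Set[int]] = defaultdict(set)
    for w in word_types:
        # Assumption: case variants share the same lower-cased form.
        occupied[w.lower()].add(local_group(w))
    return {
        key: GLOBAL_GROUP.get(frozenset(locs) & {1, 2, 3})
        for key, locs in occupied.items()
    }


result = global_groups(
    ["NEW", "New", "OK", "ok", "It", "it", "SMITH", "Smith", "smith", "solo"]
)
```

A word such as "solo" that occurs in only one case variant matches none of the four conditions, so the sketch leaves it without a global group.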
7. The method according to claim 6, wherein the step of normalizing cases comprises the steps of:
normalizing cases of said input word type according to the cases of the case variant of said input word type that is assigned to said second local group, if said input word type is assigned to said first global group;
normalizing cases of said input word type according to the cases of the case variant of said input word type that is assigned to said third local group, if said input word type is assigned to said second global group;
normalizing cases of said input word type according to the cases of the case variant of said input word type that is assigned to said third local group, if said input word type is assigned to said third global group; and
normalizing cases of said input word type according to the cases of the case variant of said input word type that is assigned to said second local group, if said input word type is assigned to said fourth global group.
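The rules of claim 7 amount to a lookup from global group to the local group whose variant supplies the normalized form. The following Python sketch shows one way to express that, under stated assumptions: case variants are matched by lower-casing, at most one variant per local group is kept (the first seen), and word types without a global group are left unchanged. `TARGET_LOCAL` and `normalize` are illustrative names, not terms from the patent.

```python
from collections import defaultdict
from typing import Dict, List


def local_group(word: str) -> int:
    """Local groups 1-4 as recited in claim 5 (first character = initial)."""
    initial, rest = word[0], word[1:]
    if initial.isupper():
        return 2 if any(c.islower() for c in rest) else 1
    return 4 if any(c.isupper() for c in rest) else 3


# Claim 6: occupied local groups (restricted to 1-3) -> global group.
GLOBAL_GROUP = {
    frozenset({1, 2}): 1,
    frozenset({1, 3}): 2,
    frozenset({2, 3}): 3,
    frozenset({1, 2, 3}): 4,
}

# Claim 7: global group -> local group whose variant is the normal form.
TARGET_LOCAL = {1: 2, 2: 3, 3: 3, 4: 2}


def normalize(word_types: List[str]) -> Dict[str, str]:
    """Return a mapping from each input word type to its normalized form."""
    variants: Dict[str, Dict[int, str]] = defaultdict(dict)
    for w in word_types:
        # Assumption: keep the first variant seen for each local group.
        variants[w.lower()].setdefault(local_group(w), w)
    out = {}
    for w in word_types:
        by_local = variants[w.lower()]
        g = GLOBAL_GROUP.get(frozenset(by_local) & {1, 2, 3})
        # Word types outside the four global groups stay as they are.
        out[w] = by_local[TARGET_LOCAL[g]] if g is not None else w
    return out


normalized = normalize(
    ["NEW", "New", "OK", "ok", "It", "it", "SMITH", "Smith", "smith", "solo"]
)
```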
8. The method according to claim 1, wherein the input word types each are associated with a frequency indicator indicating the number of occurrences of the input word type in said natural language text, and wherein the step of normalizing comprises the step of:
normalizing cases for said input word type in accordance with predetermined rules associated with the global group said input word type is assigned to and the frequency indicators the case variants of said input word type are associated with.
9. The method according to claim 6, wherein the input word types each are associated with a frequency indicator indicating the number of occurrences of the input word type in said natural language text, and wherein the step of normalizing comprises the steps of:
normalizing cases of said input word type according to the cases of the case variant of said input word type that is associated with the largest frequency indicator, if said input word type is assigned to said first global group, said second global group, or said third global group;
normalizing cases of said input word type according to the cases of the case variant of said input word type that is assigned to the second local group, if said input word type is assigned to said fourth global group and the case variant of said input word type that is assigned to said second local group is associated with a frequency indicator that is larger than the frequency indicator that the case variant of said input word type that is assigned to the first local group is associated with; and
normalizing cases of said input word type according to the cases of the case variant of said input word type that is assigned to the first local group, if said input word type is assigned to said fourth global group and the case variant of said input word type that is assigned to said second local group is associated with a frequency indicator that is less than the frequency indicator that the case variant of said input word type that is assigned to the first local group is associated with.
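The frequency-based rules of claim 9 can be sketched as follows. The input format (a dictionary from word type to occurrence count) and the function name `normalize_by_frequency` are assumptions for illustration. The claim does not say what happens when the two frequencies compared in the fourth global group are equal, so this sketch leaves such word types unchanged; ties among the first three global groups resolve to the variant seen first.

```python
from collections import defaultdict
from typing import Dict


def local_group(word: str) -> int:
    """Local groups 1-4 as recited in claim 5 (first character = initial)."""
    initial, rest = word[0], word[1:]
    if initial.isupper():
        return 2 if any(c.islower() for c in rest) else 1
    return 4 if any(c.isupper() for c in rest) else 3


# Claim 6: occupied local groups (restricted to 1-3) -> global group.
GLOBAL_GROUP = {
    frozenset({1, 2}): 1,
    frozenset({1, 3}): 2,
    frozenset({2, 3}): 3,
    frozenset({1, 2, 3}): 4,
}


def normalize_by_frequency(freq: Dict[str, int]) -> Dict[str, str]:
    """Map each word type to a normal form using the rules of claim 9."""
    variants: Dict[str, Dict[int, str]] = defaultdict(dict)
    for w in freq:
        variants[w.lower()].setdefault(local_group(w), w)
    out = {}
    for w in freq:
        by_local = variants[w.lower()]
        g = GLOBAL_GROUP.get(frozenset(by_local) & {1, 2, 3})
        if g in (1, 2, 3):
            # Most frequent case variant wins.
            out[w] = max(by_local.values(), key=freq.__getitem__)
        elif g == 4 and freq[by_local[2]] != freq[by_local[1]]:
            # Compare only the second and first local group variants.
            second_wins = freq[by_local[2]] > freq[by_local[1]]
            out[w] = by_local[2] if second_wins else by_local[1]
        else:
            out[w] = w  # assumption: no rule applies, keep as is
    return out


counts = {
    "NATO": 50, "Nato": 3,
    "Bill": 5, "bill": 20,
    "SMITH": 2, "Smith": 30, "smith": 4,
    "IBM": 40, "Ibm": 1, "ibm": 2,
}
by_freq = normalize_by_frequency(counts)
```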
10. The method according to claim 1, wherein said input word types each are associated with a sentence position indicator indicating whether the input word type occurred in an internal position of a sentence and/or in an initial position of a sentence in said natural language text, and wherein the step of normalizing comprises the step of:
normalizing cases for said input word type in accordance with predetermined rules associated with the global group said input word type is assigned to and the sentence position indicator the case variants of said input word type are associated with.
11. The method according to claim 6, wherein said input word types each are associated with a sentence position indicator indicating whether the input word type occurred in an internal position of a sentence and/or in an initial position of a sentence in said natural language text, and wherein the step of normalizing comprises the steps of:
normalizing cases of said input word type according to the cases of the case variant of said input word type that is assigned to said second local group, if said input word type is assigned to said first global group;
normalizing cases of said input word type according to the cases of the case variant of said input word type that is assigned to said third local group, if said input word type is assigned to said second global group;
normalizing cases of said input word type according to the cases of the case variant of said input word type that is assigned to said third local group, if said input word type is assigned to said third global group and the case variant of said input word type that is assigned to said second local group is not associated with a sentence position indicator indicating that the input word type occurred in an internal position of a sentence in said natural language text; and
normalizing cases of said input word type according to the cases of the case variant of said input word type that is assigned to said second local group, if said input word type is assigned to said fourth global group and the case variant of said input word type that is assigned to said second local group is not associated with a sentence position indicator indicating that the input word type occurred in an internal position of a sentence in said natural language text.
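The sentence-position rules of claim 11 can be sketched in the same style. The representation of the sentence position indicator as a set containing "initial" and/or "internal" is an assumption, as is the name `normalize_by_position`. The conditions follow the claim text as recited; where the condition for the third or fourth global group is not met, the claim gives no rule, and this sketch leaves the word type unchanged.

```python
from collections import defaultdict
from typing import Dict, Set


def local_group(word: str) -> int:
    """Local groups 1-4 as recited in claim 5 (first character = initial)."""
    initial, rest = word[0], word[1:]
    if initial.isupper():
        return 2 if any(c.islower() for c in rest) else 1
    return 4 if any(c.isupper() for c in rest) else 3


# Claim 6: occupied local groups (restricted to 1-3) -> global group.
GLOBAL_GROUP = {
    frozenset({1, 2}): 1,
    frozenset({1, 3}): 2,
    frozenset({2, 3}): 3,
    frozenset({1, 2, 3}): 4,
}


def normalize_by_position(positions: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each word type to a normal form using the rules of claim 11."""
    variants: Dict[str, Dict[int, str]] = defaultdict(dict)
    for w in positions:
        variants[w.lower()].setdefault(local_group(w), w)
    out = {}
    for w in positions:
        by_local = variants[w.lower()]
        g = GLOBAL_GROUP.get(frozenset(by_local) & {1, 2, 3})
        # Did the second local group variant occur inside a sentence?
        l2_internal = 2 in by_local and "internal" in positions[by_local[2]]
        if g == 1:
            out[w] = by_local[2]
        elif g == 2:
            out[w] = by_local[3]
        elif g == 3 and not l2_internal:
            out[w] = by_local[3]
        elif g == 4 and not l2_internal:
            out[w] = by_local[2]
        else:
            out[w] = w  # assumption: no rule applies, keep as is
    return out


seen = {
    "The": {"initial"}, "the": {"internal"},
    "Bill": {"initial", "internal"}, "bill": {"internal"},
    "NEW": {"internal"}, "New": {"initial"},
}
by_pos = normalize_by_position(seen)
```

In this example "The" only ever starts a sentence, so its capital is treated as insignificant and it is normalized to "the", whereas "Bill" also occurs sentence-internally and is therefore kept distinct from "bill".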
12. The method according to claim 1, further comprising the step of:
storing said input word types with normalized cases in an electronic storage means.
13. A computer processor arranged to perform the steps recited in claim 1.
14. A computer readable medium having computer-executable instructions for a computer to perform the steps recited in claim 1.
15. A computer program comprising computer-executable instructions for a computer to perform the steps recited in claim 1.
Specification