Automatically finding acronyms and synonyms in a corpus
1 Assignment
0 Petitions
Abstract
Acronym and synonym pairs can be identified and retrieved automatically in a corpus and/or across an enterprise, based on customer settings applied globally or to a single instance. Possible acronym and synonym term pairs can be identified using a rule, such as a heuristic, user-defined rule. Rules selected by the user can be used to rank acronym and synonym pairs using factors such as occurrence frequency and maximum term length. A rule interpreter engine executes the user-defined rule set to identify and retrieve the user-selected acronym and synonym pairs by means of a shallow parse step. Finally, the user-selected acronym and synonym pairs are ranked according to the user's preferences and can be displayed or held for subsequent use in searching.
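The abstract leaves the identification rule open ("a heuristic, user-defined rule"). As one hypothetical example of such a rule, and not the rule used in this patent, the sketch below treats a parenthesized token such as "(NLP)" as an acronym and accepts the preceding words as its expansion when their initials spell it. The names `PAREN_ACRONYM`, `candidate_pairs` and `count_pairs` are illustrative, and the naive regex sentence split merely stands in for the shallow parse step.

```python
import re
from collections import Counter

# Hypothetical heuristic rule: an acronym is a parenthesized token that starts
# with an uppercase letter, e.g. "(NLP)", placed right after its expansion.
PAREN_ACRONYM = re.compile(r"\(([A-Z][A-Za-z0-9]{1,9})\)")
WORD = re.compile(r"[A-Za-z][\w-]*")


def candidate_pairs(sentence: str) -> list[tuple[str, str]]:
    """Return (acronym, expansion) candidates found in a single sentence."""
    pairs = []
    for match in PAREN_ACRONYM.finditer(sentence):
        acronym = match.group(1)
        letters = [ch.lower() for ch in acronym if ch.isalpha()]
        words = WORD.findall(sentence[: match.start()])
        n = len(letters)
        if n == 0 or len(words) < n:
            continue
        # Take one preceding word per acronym letter and compare initials.
        window = words[-n:]
        if [w[0].lower() for w in window] == letters:
            pairs.append((acronym, " ".join(window)))
    return pairs


def count_pairs(corpus: str) -> Counter:
    """Occurrence frequency of each candidate pair across the corpus."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption;
    # a real shallow parser would segment sentences more carefully).
    sentences = re.split(r"(?<=[.!?])\s+", corpus)
    counts: Counter = Counter()
    for sentence in sentences:
        counts.update(candidate_pairs(sentence))
    return counts
```

For a corpus in which "Natural Language Processing (NLP)" appears in two sentences, `count_pairs` reports a frequency of 2 for that pair; these counts are the occurrence frequencies that the ranking step consumes.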
202 Citations
13 Claims
1. A method in a computer system for identifying acronym and synonym pairs for a selected target corpus, the method comprising:
analyzing each sentence in a target corpus to identify possible acronym and synonym pairs;
determining, using a processor associated with a computer system, an occurrence frequency of each identified possible acronym and synonym pair from among a plurality of possible acronym and synonym pairs;
determining a maximum possible length for each identified possible acronym and synonym pair;
receiving a user-selected relative weighting factor from a user for weighting an occurrence frequency relative to a maximum possible length;
scoring each identified possible acronym and synonym pair based on the user-selected weighting factor, occurrence frequency and maximum possible length, and wherein the scoring of each identified possible acronym and synonym pair further includes only scoring pairs with a longer maximum length higher than terms with a shorter maximum length when those pairs have substantially the same occurrence frequency;
determining that at least one of the identified acronym and synonym pairs includes a pair in which a longer maximum length is scored higher than terms with a shorter maximum length when those pairs have substantially the same occurrence frequency;
only ranking the at least one identified acronym and synonym pair with the longer maximum length, such that only one of those pairs that had substantially the same occurrence frequency is ranked, wherein each of the acronym and synonym pairs is ranked relative to the plurality of ranked acronym and synonym pairs; and
displaying the ranked acronym and synonym pairs from among the plurality of ranked acronym and synonym pairs.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A computer program product embedded in a non-transitory computer readable storage medium for identifying acronym and synonym pairs for a selected target corpus, comprising:
program code for analyzing each sentence in a target corpus to identify possible acronym and synonym pairs;
program code for determining an occurrence frequency of each identified possible acronym and synonym pair from among a plurality of possible acronym and synonym pairs;
program code for determining a maximum possible length for each identified possible acronym and synonym pair;
program code for receiving a user-selected relative weighting factor from a user for weighting an occurrence frequency relative to a maximum possible length;
program code for scoring each identified possible acronym and synonym pair based on the user-selected weighting factor, occurrence frequency and maximum possible length, and wherein the program code for scoring of each identified possible acronym and synonym pair further includes program code for only scoring pairs with a longer maximum length higher than terms with a shorter maximum length when those pairs have substantially the same occurrence frequency;
program code for determining that at least one of the identified acronym and synonym pairs includes a pair in which a longer maximum length is scored higher than terms with a shorter maximum length when those pairs have substantially the same occurrence frequency;
program code for only ranking the at least one identified acronym and synonym pair with the longer maximum length, such that only one of those pairs that had substantially the same occurrence frequency is ranked, wherein each of the acronym and synonym pairs is ranked relative to the plurality of ranked acronym and synonym pairs; and
program code for displaying the ranked acronym and synonym pairs from among the plurality of ranked acronym and synonym pairs.
Dependent claims: 11
12. A system for identifying acronym and synonym pairs for a selected target corpus, the system comprising a processor operable to execute instructions and a data storage medium for storing the instructions that, when executed by the processor, cause the processor to:
analyze each sentence in a target corpus to identify possible acronym and synonym pairs;
determine an occurrence frequency of each identified possible acronym and synonym pair from among a plurality of possible acronym and synonym pairs;
determine a maximum possible length for each identified possible acronym and synonym pair;
receive a user-selected relative weighting factor from a user for weighting an occurrence frequency relative to a maximum possible length;
score each identified possible acronym and synonym pair based on the user-selected weighting factor, occurrence frequency and maximum possible length, and wherein the scoring of each identified possible acronym and synonym pair further includes only scoring pairs with a longer maximum length higher than terms with a shorter maximum length when those pairs have substantially the same occurrence frequency;
determine that at least one of the identified acronym and synonym pairs includes a pair in which a longer maximum length is scored higher than terms with a shorter maximum length when those pairs have substantially the same occurrence frequency;
only rank the at least one identified acronym and synonym pair with the longer maximum length, such that only one of those pairs that had substantially the same occurrence frequency is ranked, wherein each of the acronym and synonym pairs is ranked relative to the plurality of ranked acronym and synonym pairs; and
display the ranked acronym and synonym pairs from among the plurality of ranked acronym and synonym pairs.
Dependent claims: 13
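The scoring and ranking steps recited in the claims above can be illustrated with a minimal sketch. Several details are assumptions not fixed by the claims: "maximum possible length" is read as the character length of the expansion; the user-selected weighting factor `weight` in [0, 1] linearly blends normalized frequency with normalized length; "substantially the same occurrence frequency" means within a hypothetical tolerance `tol`; and the one-survivor rule is applied per acronym. `Candidate` and `rank` are illustrative names, not the patent's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    acronym: str
    expansion: str
    frequency: int  # occurrence frequency in the target corpus

    @property
    def max_length(self) -> int:
        # Assumption: "maximum possible length" is the character length
        # of the expansion.
        return len(self.expansion)


def rank(candidates: list[Candidate], weight: float, tol: int = 0) -> list[tuple[Candidate, float]]:
    """Score and rank candidates, highest score first.

    weight is the user-selected relative weighting factor in [0, 1]:
    1.0 ranks purely by frequency, 0.0 purely by length.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if not candidates:
        return []
    max_freq = max(c.frequency for c in candidates) or 1
    max_len = max(c.max_length for c in candidates) or 1

    # Tie-break (grouping by acronym is an assumption): drop a candidate when
    # another candidate for the same acronym has a frequency within `tol`
    # and a longer expansion, so only the longest one is ranked.
    # Candidates of exactly equal length are both kept.
    kept = [
        c for c in candidates
        if not any(
            o.acronym == c.acronym
            and abs(o.frequency - c.frequency) <= tol
            and o.max_length > c.max_length
            for o in candidates
        )
    ]

    # Linear blend of normalized frequency and normalized length (assumed).
    scored = [
        (c, weight * c.frequency / max_freq + (1.0 - weight) * c.max_length / max_len)
        for c in kept
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


if __name__ == "__main__":
    cands = [
        Candidate("NLP", "Natural Language Processing", 5),
        Candidate("NLP", "Language Processing", 5),
        Candidate("API", "Application Programming Interface", 3),
    ]
    # Display step: print the ranked pairs with their scores.
    for cand, score in rank(cands, weight=0.5):
        print(f"{cand.acronym:5} {cand.expansion:35} {score:.3f}")
```

With `weight=0.5`, the shorter "Language Processing" candidate is suppressed because it ties in frequency with the longer NLP expansion, and NLP (about 0.909) ranks above API (0.800). Lowering `weight` to 0.0 ranks purely by length and puts API first, which shows how the user-selected factor shifts the ordering.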
Specification