System and method for the automatic mining of acronym-expansion pairs patterns and formation rules
Abstract
A computer program product is provided as an automatic mining system to identify a set of related information on the World Wide Web using the duality concept. The mining system iteratively refines mutually dependent approximations to their identifications. Specifically, the mining system iteratively refines (i) pairs of phrases related in a specific way; (ii) the patterns of their occurrences in web pages; and (iii) the formation rules. In one embodiment, the automatic mining system identifies (acronym, expansion) pairs in terms of the patterns of their occurrences in the web pages and their formation rules. The automatic mining system includes a formation rule identifier that derives the formation rules, an acronym-expansion pair identifier that derives the (acronym, expansion) pairs, and a pattern identifier that derives the patterns. The database stores the (acronym, expansion) pairs, patterns, and formation rules. Initially, the database holds small seed sets of (acronym, expansion) pairs, patterns, and formation rules, which the automatic mining system continuously and iteratively broadens.
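The data flow between the three identifiers can be sketched as a loop over documents. This is only an illustrative sketch, not the patented implementation: the names `Database`, `mine`, `derive_rules`, `derive_pairs`, and `derive_patterns` are hypothetical, and the three identifiers are passed in as plain functions because the abstract does not say how each one works internally.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Database:
    """Hypothetical store for the three mutually dependent sets."""
    pairs: set = field(default_factory=set)     # (acronym, expansion) pairs
    patterns: set = field(default_factory=set)  # occurrence patterns
    rules: set = field(default_factory=set)     # formation rules


def mine(
    documents: Iterable[str],
    db: Database,
    derive_rules: Callable,     # assumed signature: (pairs) -> set of rules
    derive_pairs: Callable,     # assumed signature: (doc, patterns) -> set of pairs
    derive_patterns: Callable,  # assumed signature: (doc, rules, pairs, patterns) -> set
) -> Database:
    """Run one refinement step per document, broadening the seed sets."""
    for doc in documents:
        # Formation rules come from the pairs known before this document.
        new_rules = derive_rules(db.pairs)
        # New pairs come from the document and the previously known patterns.
        new_pairs = derive_pairs(doc, db.patterns)
        # New patterns use the document, the new rules and pairs, and the old patterns.
        new_patterns = derive_patterns(doc, new_rules, new_pairs, db.patterns)
        # Only now are the stored sets updated, so each step reads the previous state.
        db.rules |= new_rules
        db.pairs |= new_pairs
        db.patterns |= new_patterns
    return db
```

Each step reads the state left by the previous one and only then merges its results, which is one way to read "iteratively broadened" in the abstract.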
18 Claims
1. A system for automatically and iteratively mining acronyms and expansion in a document di through patterns of occurrences and formation rules, comprising:
a database for storing previously identified acronym-expansion pairs Ri−1, patterns Pi−1, and formation rules Ei−1;
a formation rule identifier that uses the acronym-expansion pairs Ri−1 for deriving a formation rule Ei;
an acronym-expansion pair identifier that uses the document di and the patterns Pi−1 for deriving an acronym-expansion pair Ri;
a pattern identifier that uses the document di, the derived formation rule Ei, the derived (acronym, expansion) pairs Ri, and the patterns Pi−1, for deriving a pattern Pi;
wherein the pattern Pi−1 defines a format in which the acronym and the expansion occur in the document di; and
wherein the pattern Pi−1 is a tuple which is expressed in the following format:
(acronym_prefix, acronym_suffix, expansion_prefix, formation_rule, expansion_suffix), where the acronym_prefix and the acronym_suffix are surrounding characters of the acronym, and the expansion_prefix and the expansion_suffix are surrounding characters of the expansion.
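The five-field pattern tuple in claim 1 could be represented and matched against text roughly as follows. This is a hedged sketch rather than the claimed system: `Pattern` and `find_pair` are hypothetical names, the formation rule is kept as an opaque label, and the code assumes that the acronym precedes the expansion and consists of capital letters, neither of which the claim states.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Pattern(NamedTuple):
    """Field order follows the tuple format given in the claim."""
    acronym_prefix: str
    acronym_suffix: str
    expansion_prefix: str
    formation_rule: str  # opaque label in this sketch
    expansion_suffix: str


def find_pair(pattern: Pattern, text: str) -> Optional[Tuple[str, str]]:
    """Return the first (acronym, expansion) the pattern matches, or None.

    Assumptions (not from the claim): the acronym comes first in the text
    and is a run of capital letters; the formation rule is not checked.
    """
    regex = (
        re.escape(pattern.acronym_prefix)
        + r"([A-Z]+)"
        + re.escape(pattern.acronym_suffix)
        + re.escape(pattern.expansion_prefix)
        + r"(.+?)"
        + re.escape(pattern.expansion_suffix)
    )
    match = re.search(regex, text)
    return (match.group(1), match.group(2)) if match else None


p = Pattern(
    acronym_prefix="",
    acronym_suffix=" ",
    expansion_prefix="(",
    formation_rule="initials",
    expansion_suffix=")",
)
print(find_pair(p, "Pages on the WWW (World Wide Web) are linked."))
# prints ('WWW', 'World Wide Web')
```

The surrounding characters are escaped before being placed in the regular expression, so prefixes and suffixes such as parentheses are matched literally.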

2. A system for automatically and iteratively mining acronyms and expansion in a document di through patterns of occurrences and formation rules, comprising:
a database for storing previously identified acronym-expansion pairs Ri−1, patterns Pi−1, and formation rules Ei−1;
a formation rule identifier that uses the acronym-expansion pairs Ri−1 for deriving a formation rule Ei;
an acronym-expansion pair identifier that uses the document di and the patterns Pi−1, for deriving an acronym-expansion pair Ri;
a pattern identifier that uses the document di, the derived formation rule Ei, the derived acronym-expansion pairs Ri, and the patterns Pi−1, for deriving a pattern Pi;
wherein the pattern Pi−1 defines a format in which the acronym and the expansion occur in the document di; and
wherein the pattern Pi−1 includes a set of individual patterns pn and is expressed as follows;

3. A system for automatically and iteratively mining acronyms and expansion in a document di through patterns of occurrences and formation rules, comprising:
a database for storing previously identified acronym-expansion pairs Ri−1, patterns Pi−1, and formation rules Ei;
a formation rule identifier that uses the (acronym-expansion) pairs Ri−1 for deriving a formation rule Ei;
an acronym-expansion pair identifier that uses the document di and the patterns Pi−1 for deriving an (acronym-expansion) pair Ri;
a pattern identifier that uses the document di, the derived formation rule Ei, the derived (acronym-expansion) pairs Ri, and the patterns Pi−1, for deriving a pattern Pi;
wherein the pattern Pi−1 defines a format in which the acronym and the expansion occur in the document di; and
wherein the acronym-expansion formation rule Ei−1 defines a format in which the acronym is associated with the expansion.

4. A system for automatically and iteratively mining acronyms and expansion in a document di through patterns of occurrences and formation rules, comprising:
a database for storing previously identified (acronym-expansion) pairs Ri−1, patterns Pi−1, and formation rules Ei−1;
a formation rule identifier that uses the (acronym-expansion) pairs Ri−1 for deriving a formation rule Ei;
an acronym-expansion pair identifier that uses the document di and the patterns Pi−1 for deriving an (acronym-expansion) pair Ri;
a pattern identifier that uses the document di, the derived formation rule Ei, the derived (acronym-expansion) pairs Ri, and the patterns Pi−1, for deriving a pattern Pi;
wherein the pattern Pi−1 defines a format in which the acronym and the expansion occur in the document di; and
wherein the formation rule Ei−1 includes a set of individual formation rules en and is expressed as follows;

8. A computer program product for automatically and iteratively mining acronyms and expansion in a document di through patterns of occurrences and formation rules, comprising:
a database for storing previously identified (acronym-expansion) pairs Ri−1, patterns Pi−1, and formation rules Ei−1;
a formation rule identifier that uses the (acronym-expansion) pairs Ri−1 for deriving a formation rule Ei;
an acronym-expansion pair identifier that uses the document di and the patterns Pi−1 for deriving an (acronym-expansion) pair Ri;
a pattern identifier that uses the document di, the derived formation rule Ei, the derived (acronym-expansion) pairs Ri, and the patterns Pi−1, for deriving a pattern Pi;
wherein the pattern Pi−1 defines a format in which the acronym and the expansion occur in document di;
wherein the pattern Pi−1 is a tuple which is expressed in the following format:
(acronym_prefix, acronym_suffix, expansion_prefix, formation_rule, expansion_suffix), where the acronym_prefix and the acronym_suffix are surrounding characters of the acronym, and the expansion_prefix and the expansion_suffix are surrounding characters of the expansion; and
wherein the pattern Pi−1 includes a set of individual patterns pn and is expressed as follows;

10. A computer program product for automatically and iteratively mining acronyms and expansion in a document di through patterns of occurrences and formation rules, comprising:
a database for storing previously identified (acronym-expansion) pairs Ri−1, patterns Pi−1, and formation rules Ei−1;
a formation rule identifier that uses the (acronym-expansion) pairs Ri−1 for deriving a formation rule Ei;
an acronym-expansion pair identifier that uses the document di and the patterns Pi−1 for deriving an (acronym-expansion) pair Ri;
a pattern identifier that uses the document di, the derived formation rule Ei, the derived (acronym-expansion) pairs Ri, and the patterns Pi−1 for deriving a pattern Pi;
wherein the acronym-expansion formation rule Ei−1 defines the format in which the acronym is associated with the expansion; and
wherein the formation rule Ei−1 includes a set of individual formation rules en and is expressed as follows;

12. A method for automatically and iteratively mining acronyms and expansion in a document di through patterns of occurrences and formation rules, comprising:
storing previously identified (acronym-expansion) pairs Ri−1, patterns Pi−1, and formation rules Ei−1;
using the (acronym-expansion) pairs Ri−1 for deriving a formation rule Ei;
using the document di and the patterns Pi−1 for deriving an (acronym-expansion) pair Ri;
using the document di, the derived formation rule Ei, the derived (acronym-expansion) pairs Ri, and the patterns Pi−1, for deriving a pattern Pi;
further including defining the pattern Pi−1 by a format in which the acronym and the expansion occur in the document di; and
wherein defining the pattern Pi−1 includes expressing the pattern Pi−1 by a tuple in the following format:
(acronym_prefix, acronym_suffix, expansion_prefix, formation_rule, expansion_suffix), where the acronym_prefix and the acronym_suffix are surrounding characters of the acronym, and the expansion_prefix and the expansion_suffix are surrounding characters of the expansion.

16. The method according to claim 15, wherein each individual formation rule en includes a sequence of replacement rules that are interspersed with an intermediate.

17. The method according to claim 16, wherein the intermediate includes a string of characters between words in the expansion that are not a part of the acronym.

18. The method according to claim 17, wherein a replacement rule is a tuple expressed as:
(substring_beginPosition, substring_endPosition, replacee, replacer), where the substring_beginPosition is the position of a leading character of the expansion to be placed in the acronym, the substring_endPosition is the position of an ending character of the expansion to be placed in the acronym, the replacee is a substring to be replaced by another substring referred to as the replacer, if any, in the acronym.
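One possible reading of the replacement rules in claims 16 to 18 is sketched below. The claims do not fix an indexing convention, so this sketch assumes 0-based, inclusive positions counted within each word of the expansion, a single space as the intermediate, and one replacement rule per word. `ReplacementRule`, `apply_rule`, and `form_acronym` are hypothetical names chosen for illustration.

```python
from typing import List, NamedTuple


class ReplacementRule(NamedTuple):
    """Field names follow the tuple in claim 18."""
    substring_beginPosition: int
    substring_endPosition: int
    replacee: str
    replacer: str


def apply_rule(word: str, rule: ReplacementRule) -> str:
    """Take the selected substring and apply the optional replacement.

    Assumption: positions are 0-based and inclusive, relative to the word.
    """
    piece = word[rule.substring_beginPosition : rule.substring_endPosition + 1]
    if rule.replacee:  # an empty replacee means "no replacement"
        piece = piece.replace(rule.replacee, rule.replacer)
    return piece


def form_acronym(
    expansion: str, rules: List[ReplacementRule], intermediate: str = " "
) -> str:
    """Apply one rule per word; the intermediate separates the words."""
    words = expansion.split(intermediate)
    if len(words) != len(rules):
        raise ValueError("expected one replacement rule per word")
    return "".join(apply_rule(w, r) for w, r in zip(words, rules))


rules = [
    ReplacementRule(1, 1, "x", "X"),  # "Extensible" -> "x" -> "X"
    ReplacementRule(0, 0, "", ""),    # "Markup" -> "M"
    ReplacementRule(0, 0, "", ""),    # "Language" -> "L"
]
print(form_acronym("Extensible Markup Language", rules))  # prints XML
```

The first rule shows why a replacee and replacer are useful: the letter taken from the expansion is lowercase, while the acronym uses its uppercase form.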
Specification