Acronym Extraction System and Method of Identifying Acronyms and Extracting Corresponding Expansions from Text
Abstract
An acronym expansion system of the present invention receives electronic documents and extracts acronyms and their corresponding expansions. A part-of-speech tagger decomposes text into string tokens or words and tags them with their part-of-speech, while an acronym identifier determines whether a word is a potential acronym based on various conditions. An expansion identifier retrieves lists of words preceding and following a potential acronym to search for the expansion. The resulting word lists are examined sequentially to identify and retrieve an expansion for the potential acronym. An expansion extractor receives the potential acronym and a processed word list to retrieve the expansion of the potential acronym from that list. The extractor may utilize information from prior search iterations, and verifies an extracted expansion against a set of rules to remove spurious expansions.
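The first two stages described in the abstract, flagging potential acronyms and collecting the words around each one, can be illustrated with a minimal Python sketch. It is not the patented implementation. A regular-expression tokenizer stands in for the part-of-speech tagger, and the function names, the length and capitalization conditions, and the window width of twice the acronym length are all assumptions chosen for illustration.

```python
import re

def tokenize(text):
    """Split text into word tokens (a simple stand-in for the tagger)."""
    return re.findall(r"[A-Za-z0-9][A-Za-z0-9\-]*", text)

def is_potential_acronym(word, min_len=2, max_len=8):
    """Assumed conditions: short, starts with a letter, mostly upper case."""
    if not (min_len <= len(word) <= max_len):
        return False
    if not word[0].isalpha():
        return False
    uppers = sum(ch.isupper() for ch in word)
    return uppers >= 2 and uppers >= len(word) / 2

def context_windows(tokens, index, size=None):
    """Return the (preceding, following) word lists around tokens[index]."""
    if size is None:
        size = 2 * len(tokens[index])  # assumed heuristic window width
    preceding = tokens[max(0, index - size):index]
    following = tokens[index + 1:index + 1 + size]
    return preceding, following

def find_candidates(text):
    """List (acronym, preceding words, following words) for each candidate."""
    tokens = tokenize(text)
    return [(tok, *context_windows(tokens, i))
            for i, tok in enumerate(tokens) if is_potential_acronym(tok)]
```

Calling `find_candidates` on a sentence such as "a natural language processing (NLP) pipeline" would yield one entry for `NLP` together with the word lists on either side, which a later stage could search for the expansion.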
134 Claims
1-95. (canceled)
96. A system for identifying abbreviated terms within text each representing a corresponding phrase of at least one term and extracting expansions of said abbreviated terms from said text in the form of said corresponding phrases comprising:
a computer system to receive said text and identify abbreviated terms and corresponding expansions therein, said computer system including:
an identification module to examine said text to identify at least one abbreviated term residing therein;
an expansion retrieval module to retrieve at least one portion of said text for an identified abbreviated term, wherein each retrieved text portion is located within said text proximate said identified abbreviated term; and
an expansion extraction module to produce a plurality of sets of terms from a corresponding retrieved text portion with each set including a member term that includes an initial portion containing an initial portion of said identified abbreviated term, to compare said identified abbreviated term with said produced sets to extract an expansion for said abbreviated term from one of said sets, and to verify said extracted expansion to produce a valid expansion for said identified abbreviated term.
Dependent claims: 97-108.
109. A method of identifying abbreviated terms within text each representing a corresponding phrase of at least one term and extracting expansions of said abbreviated terms from said text in the form of said corresponding phrases comprising:
(a) examining said text to identify at least one abbreviated term residing therein;
(b) retrieving at least one portion of said text for an identified abbreviated term, wherein each retrieved text portion is located within said text proximate said identified abbreviated term; and
(c) producing a valid expansion for said identified abbreviated term by producing a plurality of sets of terms from a corresponding retrieved text portion with each set including a member term that includes an initial portion containing an initial portion of said identified abbreviated term, comparing said identified abbreviated term with said produced sets to extract an expansion for said abbreviated term from one of said sets, and verifying said extracted expansion.
Dependent claims: 110-121.
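As an informal illustration of step (c), the following Python sketch produces candidate word sets from a retrieved text portion, compares the abbreviated term against each set, and verifies the result. It is one possible reading under stated assumptions, not the claimed implementation. The stop-word list, the letter-to-word-initial matching strategy, the verification rules, and every function name are hypothetical.

```python
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the"}  # assumed list

def candidate_sets(acronym, words):
    """Produce word runs whose first word starts with the acronym's first letter."""
    first = acronym[0].lower()
    return [words[i:] for i, w in enumerate(words) if w.lower().startswith(first)]

def match_expansion(acronym, candidate):
    """Match acronym letters against word initials, skipping inner stop words."""
    letters = acronym.lower()
    pos = 0
    used = []
    for word in candidate:
        if pos == len(letters):
            break  # every letter is accounted for
        if word.lower().startswith(letters[pos]):
            pos += 1
            used.append(word)
        elif word.lower() in STOP_WORDS and used:
            used.append(word)  # keep connecting words such as "of"
        else:
            return None
    return used if pos == len(letters) else None

def verify(acronym, expansion):
    """Assumed rules for rejecting spurious expansions."""
    lowered = [w.lower() for w in expansion]
    return (acronym.lower() not in lowered
            and lowered[0] not in STOP_WORDS
            and lowered[-1] not in STOP_WORDS)

def extract_expansion(acronym, words):
    """Return a verified expansion string, or None when no set matches."""
    # Later start positions lie closest to the acronym, so try them first.
    for candidate in reversed(candidate_sets(acronym, words)):
        expansion = match_expansion(acronym, candidate)
        if expansion and verify(acronym, expansion):
            return " ".join(expansion)
    return None
```

For example, `extract_expansion("USA", ["the", "United", "States", "of", "America"])` walks the single candidate set beginning at "United" and returns the phrase with the connecting word "of" retained.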
122. A program product apparatus including a computer readable medium with computer program logic recorded thereon for identifying abbreviated terms within text each representing a corresponding phrase of at least one term and extracting expansions of said abbreviated terms from said text in the form of said corresponding phrases, said program product apparatus comprising:
an identification module to examine said text to identify at least one abbreviated term residing therein;
an expansion retrieval module to retrieve at least one portion of said text for an identified abbreviated term, wherein each retrieved text portion is located within said text proximate said identified abbreviated term; and
an expansion extraction module to produce a plurality of sets of terms from a corresponding retrieved text portion with each set including a member term that includes an initial portion containing an initial portion of said identified abbreviated term, to compare said identified abbreviated term with said produced sets to extract an expansion for said abbreviated term from one of said sets, and to verify said extracted expansion to produce a valid expansion for said identified abbreviated term.
Dependent claims: 123-134.
Specification