Acronym extraction system and method of identifying acronyms and extracting corresponding expansions from text
13 Assignments
0 Petitions
Abstract
An acronym expansion system of the present invention receives electronic documents and extracts acronyms and their corresponding expansions. A part-of-speech tagger decomposes text into string tokens or words and tags them with their part-of-speech, while an acronym identifier determines whether a word is a potential acronym based on various conditions. An expansion identifier retrieves lists of words preceding and following a potential acronym to search for the expansion. The resulting word lists are examined sequentially to identify and retrieve an expansion for the potential acronym. An expansion extractor receives the potential acronym and a processed word list to retrieve the expansion of the potential acronym from that list. The extractor may utilize information from prior search iterations, and verifies an extracted expansion against a set of rules to remove spurious expansions.
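The abstract says only that the acronym identifier applies "various conditions" and does not list them. The Python sketch below shows one possible filter. Everything in it is an assumption: the length bounds, the two-capital-letter rule, the stoplist, and the regex tokenizer, which stands in for the part-of-speech tagger. The names is_potential_acronym and find_potential_acronyms are hypothetical.

```python
import re
from typing import List

# Hypothetical thresholds and stoplist: the source only says the acronym
# identifier applies "various conditions" without listing them.
MIN_LEN, MAX_LEN = 2, 10
STOPLIST = {"I", "A", "OK", "AND", "THE"}


def is_potential_acronym(word: str) -> bool:
    """Return True if `word` passes a few assumed acronym heuristics."""
    if not MIN_LEN <= len(word) <= MAX_LEN:
        return False
    if word.upper() in STOPLIST:
        return False
    # Start with a letter; allow only letters, digits, '&' and '-' after it.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9&-]*", word):
        return False
    # Require at least two capital letters (e.g. "DBMS", "DoD", "mRNA").
    return sum(c.isupper() for c in word) >= 2


def find_potential_acronyms(text: str) -> List[str]:
    """Crude regex tokenizer standing in for the part-of-speech tagger."""
    tokens = re.findall(r"[A-Za-z0-9&-]+", text)
    return [t for t in tokens if is_potential_acronym(t)]
```

For example, find_potential_acronyms("A database management system (DBMS) stores data.") returns ["DBMS"]. The single-letter "A" fails the length bound, and the lowercase words fail the capital-letter rule.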
204 Citations
86 Claims
1. A system for identifying abbreviated terms within text each representing a corresponding phrase of at least one term and extracting expansions of said abbreviated terms from said text in the form of said corresponding phrases comprising:
a processing system to receive said text and identify abbreviated terms and corresponding expansions therein, said processing system including:
an identification module to examine said text to identify at least one abbreviated term residing therein;
an expansion retrieval module to retrieve at least one portion of said text for an identified abbreviated term, wherein each retrieved text portion is located within said text proximate said identified abbreviated term; and
an expansion extraction module to compare said identified abbreviated term with at least one corresponding retrieved text portion to extract an expansion therefrom for said abbreviated term and to verify said extracted expansion to produce a valid expansion for said identified abbreviated term, wherein said expansion extraction module includes:
an expansion initialization module to examine a retrieved text portion for said identified abbreviated term and selectively produce at least one subset of said retrieved text portion for identifying and extracting said expansion, wherein said at least one subset is produced based on a comparison of an initial portion of said identified abbreviated term with initial portions of terms within said retrieved text portion; and
a search module to iteratively scan a retrieved text portion subset and compare successive portions of said identified abbreviated term to at least one term within a corresponding search window for said abbreviated term portion to identify corresponding expansion terms within that subset for said abbreviated term portions, wherein said search window includes a predetermined number of terms from said retrieved text portion subset and is movable within that subset, and wherein a current abbreviated term portion and corresponding search window are respectively combined for a subsequent scan iteration with at least one prior abbreviated term portion and corresponding search window that identify a corresponding expansion term in response to failure of said current abbreviated term portion and corresponding search window to identify a corresponding expansion term.
Dependent claims: 2-20.
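The expansion initialization module of claim 1 narrows a retrieved text portion by comparing the start of the abbreviated term with the start of each term. The Python sketch below illustrates this under two assumptions: the "initial portion" is just the first letter, and each subset runs from a matching word to the end of the retrieved portion. The function name initial_subsets is hypothetical.

```python
from typing import List


def initial_subsets(acronym: str, words: List[str]) -> List[List[str]]:
    """Return candidate subsets of `words`, each beginning at a word whose
    first letter matches the acronym's first letter (case-insensitive)."""
    letters = [c.lower() for c in acronym if c.isalpha()]
    if not letters:
        return []
    first = letters[0]
    # Each subset runs from a matching word to the end of the retrieved portion.
    return [words[i:] for i, w in enumerate(words) if w and w[0].lower() == first]
```

Given the words ["we", "describe", "a", "database", "management", "system"] and the acronym "DBMS", the function yields two subsets. One starts at "describe" and the other at "database", so more than one candidate can reach the search step.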
21. A method of identifying abbreviated terms within text each representing a corresponding phrase of at least one term and extracting expansions of said abbreviated terms from said text in the form of said corresponding phrases comprising:
(a) examining said text to identify at least one abbreviated term residing therein;
(b) retrieving at least one portion of said text for an identified abbreviated term, wherein each retrieved text portion is located within said text proximate said identified abbreviated term; and
(c) comparing said identified abbreviated term with at least one corresponding retrieved text portion to extract an expansion therefrom for said abbreviated term and verifying said extracted expansion to produce a valid expansion for said identified abbreviated term, wherein step (c) further includes:
(c.1) examining a retrieved text portion for said identified abbreviated term and selectively producing at least one subset of said retrieved text portion for identifying and extracting said expansion, wherein said at least one subset is produced based on a comparison of an initial portion of said identified abbreviated term with initial portions of terms within said retrieved text portion; and
(c.2) iteratively scanning a retrieved text portion subset and comparing successive portions of said identified abbreviated term to at least one term within a corresponding search window for said abbreviated term portion to identify corresponding expansion terms within that subset for said abbreviated term portions, wherein said search window includes a predetermined number of terms from said retrieved text portion subset and is movable within that subset, and wherein a current abbreviated term portion and corresponding search window are respectively combined for a subsequent scan iteration with at least one prior abbreviated term portion and corresponding search window that identify a corresponding expansion term in response to failure of said current abbreviated term portion and corresponding search window to identify a corresponding expansion term.
Dependent claims: 22-40.
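Step (c) of claim 21 also verifies an extracted expansion. The abstract describes this as checking it "against a set of rules to remove spurious expansions", but the rules themselves are not listed. The four checks in this Python sketch are therefore illustrative assumptions:

- the first letter must match;
- the expansion may be at most twice the acronym's letter count;
- it must not end in a function word;
- it must not be identical to the acronym.

The name is_valid_expansion and the stopword set are hypothetical.

```python
from typing import List

# Hypothetical rule parameters; the source does not list its rule set.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def is_valid_expansion(acronym: str, expansion: List[str]) -> bool:
    """Apply a few assumed sanity rules to a candidate expansion."""
    letters = [c.lower() for c in acronym if c.isalpha()]
    if not letters or not expansion or not expansion[0]:
        return False
    # Rule 1: the first word must begin with the acronym's first letter.
    if expansion[0][0].lower() != letters[0]:
        return False
    # Rule 2: the expansion may not be more than twice as long as the acronym.
    if len(expansion) > 2 * len(letters):
        return False
    # Rule 3: a trailing function word usually signals a truncated match.
    if expansion[-1].lower() in STOPWORDS:
        return False
    # Rule 4: the candidate must not simply repeat the acronym itself.
    return " ".join(expansion).lower() != acronym.lower()
```

Under these assumed rules, ["Department", "of", "Defense"] is accepted for "DoD". By contrast, ["database", "of"] is rejected for "DBMS" because it ends in a function word.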
41. A program product apparatus including a computer readable medium with computer program logic recorded thereon for identifying abbreviated terms within text each representing a corresponding phrase of at least one term and extracting expansions of said abbreviated terms from said text in the form of said corresponding phrases, said program product apparatus comprising:
an identification module to examine said text to identify at least one abbreviated term residing therein;
an expansion retrieval module to retrieve at least one portion of said text for an identified abbreviated term, wherein each retrieved text portion is located within said text proximate said identified abbreviated term; and
an expansion extraction module to compare said identified abbreviated term with at least one corresponding retrieved text portion to extract an expansion therefrom for said abbreviated term and to verify said extracted expansion to produce a valid expansion for said identified abbreviated term, wherein said expansion extraction module includes:
an expansion initialization module to examine a retrieved text portion for said identified abbreviated term and selectively produce at least one subset of said retrieved text portion for identifying and extracting said expansion, wherein said at least one subset is produced based on a comparison of an initial portion of said identified abbreviated term with initial portions of terms within said retrieved text portion; and
a search module to iteratively scan a retrieved text portion subset and compare successive portions of said identified abbreviated term to at least one term within a corresponding search window for said abbreviated term portion to identify corresponding expansion terms within that subset for said abbreviated term portions, wherein said search window includes a predetermined number of terms from said retrieved text portion subset and is movable within that subset, and wherein a current abbreviated term portion and corresponding search window are respectively combined for a subsequent scan iteration with at least one prior abbreviated term portion and corresponding search window that identify a corresponding expansion term in response to failure of said current abbreviated term portion and corresponding search window to identify a corresponding expansion term.
Dependent claims: 42-53, 61, 62.
54. A system for identifying abbreviated terms within text each representing a corresponding phrase of at least one term and extracting expansions of said abbreviated terms from said text in the form of said corresponding phrases comprising:
a processing system to receive said text and identify abbreviated terms and corresponding expansions therein, said processing system including:
an identification module to examine said text to identify at least one abbreviated term residing therein;
an expansion retrieval module to retrieve a plurality of portions of said text for an identified abbreviated term, wherein said plurality of text portions for said identified abbreviated term includes a first portion located within said text preceding said identified abbreviated term and a second portion located within said text following said identified abbreviated term;
an expansion extraction module to compare said identified abbreviated term with at least one corresponding retrieved text portion to extract an expansion therefrom for said abbreviated term, wherein said expansion extraction module includes:
an expansion initialization module to examine a retrieved text portion for said identified abbreviated term and selectively produce at least one subset of said retrieved text portion for identifying and extracting said expansion, wherein said at least one subset is produced based on a comparison of an initial portion of said identified abbreviated term with initial portions of terms within said retrieved text portion; and
a search module to iteratively scan a retrieved text portion subset and compare successive portions of said identified abbreviated term to at least one term within a corresponding search window for said abbreviated term portion to identify corresponding expansion terms within that subset for said abbreviated term portions, wherein said search window includes a predetermined number of terms from said retrieved text portion subset and is movable within that subset, and wherein a current abbreviated term portion and corresponding search window are respectively combined for a subsequent scan iteration with at least one prior abbreviated term portion and corresponding search window that identify a corresponding expansion term in response to failure of said current abbreviated term portion and corresponding search window to identify a corresponding expansion term.
Dependent claims: 55-58.
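Claim 54 requires the retrieval module to return one text portion preceding the abbreviated term and one following it. Below is a minimal Python sketch over an already tokenized word list. The window size (twice the number of letters in the acronym) and the name retrieve_portions are assumptions, not values taken from the claims.

```python
from typing import List, Tuple


def retrieve_portions(words: List[str], index: int,
                      acronym: str) -> Tuple[List[str], List[str]]:
    """Return (preceding, following) word lists around words[index]."""
    # Hypothetical window: twice the number of letters in the acronym.
    n = 2 * sum(c.isalpha() for c in acronym)
    preceding = words[max(0, index - n):index]
    following = words[index + 1:index + 1 + n]
    return preceding, following
```

Both portions are clipped at the document boundaries. An acronym near the start or end of the text therefore simply yields a shorter list.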
59. A method of identifying abbreviated terms within text each representing a corresponding phrase of at least one term and extracting expansions of said abbreviated terms from said text in the form of said corresponding phrases comprising:
(a) examining said text to identify at least one abbreviated term residing therein;
(b) retrieving a plurality of portions of said text for an identified abbreviated term, wherein said plurality of text portions for said identified abbreviated term includes a first portion located within said text preceding said identified abbreviated term and a second portion located within said text following said identified abbreviated term; and
(c) comparing said identified abbreviated term with at least one corresponding retrieved text portion to extract an expansion therefrom for said abbreviated term, wherein step (c) further includes:
(c.1) examining a retrieved text portion for said identified abbreviated term and selectively producing at least one subset of said retrieved text portion for identifying and extracting said expansion, wherein said at least one subset is produced based on a comparison of an initial portion of said identified abbreviated term with initial portions of terms within said retrieved text portion; and
(c.2) iteratively scanning a retrieved text portion subset and comparing successive portions of said identified abbreviated term to at least one term within a corresponding search window for said abbreviated term portion to identify corresponding expansion terms within that subset for said abbreviated term portions, wherein said search window includes a predetermined number of terms from said retrieved text portion subset and is movable within that subset, and wherein a current abbreviated term portion and corresponding search window are respectively combined for a subsequent scan iteration with at least one prior abbreviated term portion and corresponding search window that identify a corresponding expansion term in response to failure of said current abbreviated term portion and corresponding search window to identify a corresponding expansion term.
Dependent claims: 60, 63.
64. A program product apparatus including a computer readable medium with computer program logic recorded thereon for identifying abbreviated terms within text each representing a corresponding phrase of at least one term and extracting expansions of said abbreviated terms from said text in the form of said corresponding phrases, said program product apparatus comprising:
an identification module to examine said text to identify at least one abbreviated term residing therein;
an expansion retrieval module to retrieve a plurality of portions of said text for an identified abbreviated term, wherein said plurality of text portions for said identified abbreviated term includes a first portion located within said text preceding said identified abbreviated term and a second portion located within said text following said identified abbreviated term;
an expansion extraction module to compare said identified abbreviated term with at least one corresponding retrieved text portion to extract an expansion therefrom for said abbreviated term, wherein said expansion extraction module includes:
an expansion initialization module to examine a retrieved text portion for said identified abbreviated term and selectively produce at least one subset of said retrieved text portion for identifying and extracting said expansion, wherein said at least one subset is produced based on a comparison of an initial portion of said identified abbreviated term with initial portions of terms within said retrieved text portion; and
a search module to iteratively scan a retrieved text portion subset and compare successive portions of said identified abbreviated term to at least one term within a corresponding search window for said abbreviated term portion to identify corresponding expansion terms within that subset for said abbreviated term portions, wherein said search window includes a predetermined number of terms from said retrieved text portion subset and is movable within that subset, and wherein a current abbreviated term portion and corresponding search window are respectively combined for a subsequent scan iteration with at least one prior abbreviated term portion and corresponding search window that identify a corresponding expansion term in response to failure of said current abbreviated term portion and corresponding search window to identify a corresponding expansion term.
Dependent claims: 65-68.
69. A system for identifying abbreviated terms within text each representing a corresponding phrase of at least one term and extracting expansions of said abbreviated terms from said text in the form of said corresponding phrases comprising:
a processing system to receive said text and identify abbreviated terms and corresponding expansions therein, said processing system including:
an identification module to examine said text to identify at least one abbreviated term residing therein;
an expansion retrieval module to retrieve at least one portion of said text for an identified abbreviated term, wherein each retrieved text portion is located within said text proximate said identified abbreviated term; and
an expansion extraction module to compare portions of said identified abbreviated term with at least one corresponding retrieved text portion to extract an expansion therefrom for said abbreviated term, wherein said expansion extraction module includes:
an expansion initialization module to examine a retrieved text portion for said identified abbreviated term and selectively produce at least one subset of said retrieved text portion for identifying and extracting said expansion, wherein said at least one subset is produced based on a comparison of an initial portion of said identified abbreviated term with initial portions of terms within said retrieved text portion; and
a search module to iteratively scan a retrieved text portion subset and compare successive portions of said identified abbreviated term to at least one term within a corresponding search window for said abbreviated term portion to identify corresponding expansion terms within that subset for said abbreviated term portions, wherein said search window includes a predetermined number of terms from said retrieved text portion subset and is movable within that subset, and wherein said search module includes:
a backtrack module to identify a corresponding expansion term for a current abbreviated term portion utilizing information from at least one previous abbreviated term portion comparison, wherein said current abbreviated term portion and corresponding search window are respectively combined for a subsequent scan iteration with at least one prior abbreviated term portion and corresponding search window that identify a corresponding expansion term in response to failure of said current abbreviated term portion and corresponding search window to identify a corresponding expansion term.
Dependent claims: 70-74.
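The search and backtrack modules of claim 69 can be read as follows:

1. Each acronym letter is matched against a fixed-size window of words that moves past the previous match.
2. When a letter finds no match, it is merged with the preceding matched portion.
3. The two windows are then searched together for a single word that contains both letters in order.

The Python sketch below implements that reading. The window size of 3, the in-order letter test, and the names letters_in_order and search_expansion are assumptions made for illustration. This is not presented as the patented implementation.

```python
from typing import List, Optional, Tuple


def letters_in_order(portion: str, word: str) -> bool:
    """True if `word` starts with portion[0] and contains the remaining
    letters of `portion` in the same order (e.g. "db" fits "database")."""
    word = word.lower()
    if not word or word[0] != portion[0]:
        return False
    pos = 1
    for ch in portion[1:]:
        pos = word.find(ch, pos)
        if pos == -1:
            return False
        pos += 1
    return True


def search_expansion(acronym: str, subset: List[str],
                     window: int = 3) -> Optional[List[str]]:
    """Match acronym letters to words of `subset` with a movable window;
    on failure, merge with prior portions and windows (backtracking)."""
    letters = [c.lower() for c in acronym if c.isalpha()]
    # Each entry: (abbreviated portion, matched word index, window start)
    matches: List[Tuple[str, int, int]] = []
    start = 0  # the window for the next portion begins after the last match
    for letter in letters:
        portion = letter
        win_start = start
        win_end = min(start + window, len(subset))
        while True:
            hit = next((i for i in range(win_start, win_end)
                        if letters_in_order(portion, subset[i])), None)
            if hit is not None:
                matches.append((portion, hit, win_start))
                start = hit + 1
                break
            if not matches:
                return None  # nothing left to combine with: no expansion
            # Backtrack: combine the current portion and window with the
            # most recent prior portion and window, then rescan.
            prev_portion, _, prev_start = matches.pop()
            portion = prev_portion + portion
            win_start = prev_start
    if not matches:
        return None
    return subset[matches[0][1]:matches[-1][1] + 1]
```

For "DBMS" over ["database", "management", "system"], the letter "b" fails in its own window. The combined portion "db" is then retried against the combined window and matched to "database", and the function returns all three words. With "DoN" over ["department", "of", "the", "navy"], the moving window skips "the" to reach "navy".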
75. A method of identifying abbreviated terms within text each representing a corresponding phrase of at least one term and extracting expansions of said abbreviated terms from said text in the form of said corresponding phrases comprising:
(a) examining said text to identify at least one abbreviated term residing therein;
(b) retrieving at least one portion of said text for an identified abbreviated term, wherein each retrieved text portion is located within said text proximate said identified abbreviated term; and
(c) comparing portions of said identified abbreviated term with at least one corresponding retrieved text portion to extract an expansion therefrom, wherein step (c) further includes:
(c.1) examining a retrieved text portion for said identified abbreviated term and selectively producing at least one subset of said retrieved text portion for identifying and extracting said expansion, wherein said at least one subset is produced based on a comparison of an initial portion of said identified abbreviated term with initial portions of terms within said retrieved text portion; and
(c.2) iteratively scanning a retrieved text portion subset and comparing successive portions of said identified abbreviated term to at least one term within a corresponding search window for said abbreviated term portion to identify corresponding expansion terms within that subset for said abbreviated term portions, wherein said search window includes a predetermined number of terms from said retrieved text portion subset and is movable within that subset, and in response to a failure of said current abbreviated term portion and corresponding search window to identify a corresponding expansion term, identifying a corresponding expansion term for said current abbreviated term portion utilizing information from at least one previous abbreviated term portion comparison, wherein said current abbreviated term portion and corresponding search window are respectively combined for a subsequent scan iteration with at least one prior abbreviated term portion and corresponding search window that identify a corresponding expansion term.
Dependent claims: 76-80.
81. A program product apparatus including a computer readable medium with computer program logic recorded thereon for identifying abbreviated terms within text each representing a corresponding phrase of at least one term and extracting expansions of said abbreviated terms from said text in the form of said corresponding phrases, said program product apparatus comprising:
an identification module to examine said text to identify at least one abbreviated term residing therein;
an expansion retrieval module to retrieve at least one portion of said text for an identified abbreviated term, wherein each retrieved text portion is located within said text proximate said identified abbreviated term; and
an expansion extraction module to compare portions of said identified abbreviated term with at least one corresponding retrieved text portion to extract an expansion therefrom for said abbreviated term, wherein said expansion extraction module includes:
an expansion initialization module to examine a retrieved text portion for said identified abbreviated term and selectively produce at least one subset of said retrieved text portion for identifying and extracting said expansion, wherein said at least one subset is produced based on a comparison of an initial portion of said identified abbreviated term with initial portions of terms within said retrieved text portion; and
a search module to iteratively scan a retrieved text portion subset and compare successive portions of said identified abbreviated term to at least one term within a corresponding search window for said abbreviated term portion to identify corresponding expansion terms within that subset for said abbreviated term portions, wherein said search window includes a predetermined number of terms from said retrieved text portion subset and is movable within that subset, and wherein said search module includes:
a backtrack module to identify a corresponding expansion term for a current abbreviated term portion utilizing information from at least one previous abbreviated term portion comparison, wherein said current abbreviated term portion and corresponding search window are respectively combined for a subsequent scan iteration with at least one prior abbreviated term portion and corresponding search window that identify a corresponding expansion term in response to failure of said current abbreviated term portion and corresponding search window to identify a corresponding expansion term.
Dependent claims: 82-86.
Specification