Context-based disambiguation of acronyms and abbreviations
Abstract
Context-based disambiguation of acronyms and/or abbreviations may determine a target abbreviation and one or more keywords appearing in context with the target abbreviation in a received passage, the target abbreviation representing a shortened form of one or more words. A contextual search query including the target abbreviation and said one or more keywords may be generated. A pseudo document index may be searched for one or more expansions of the target abbreviation by invoking the contextual search query, the pseudo document index containing an index of one or more pseudo documents, associated one or more abbreviations and associated context keywords. One or more pseudo documents associated with the target abbreviation may be returned based on the searching of the pseudo document index.
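The flow summarized in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the sample pseudo documents, the names `build_query` and `search`, and the keyword-overlap ranking are hypothetical and are not taken from the patent, which does not prescribe a particular scoring method.

```python
# Hypothetical pseudo documents: the title is the expansion, and each
# document carries its abbreviation and context keywords.
PSEUDO_DOCS = [
    {"title": "automated teller machine", "abbreviation": "ATM",
     "keywords": ["bank", "cash", "withdraw", "card"]},
    {"title": "asynchronous transfer mode", "abbreviation": "ATM",
     "keywords": ["network", "cell", "switch", "bandwidth"]},
]


def build_query(target, keywords):
    """Combine the target abbreviation and its context keywords into a query."""
    return {"abbreviation": target, "keywords": [k.lower() for k in keywords]}


def search(index, query):
    """Return candidate expansions, best keyword overlap first.

    Overlap counting is an assumed stand-in for whatever ranking a real
    search engine would apply.
    """
    results = []
    for doc in index:
        if doc["abbreviation"] != query["abbreviation"]:
            continue
        overlap = len(set(doc["keywords"]) & set(query["keywords"]))
        results.append((overlap, doc["title"]))
    # Sort by descending overlap, then alphabetically for stable ties.
    results.sort(key=lambda r: (-r[0], r[1]))
    return [title for _, title in results]


query = build_query("ATM", ["withdraw", "cash", "bank"])
expansions = search(PSEUDO_DOCS, query)
print(expansions[0])
```

With banking keywords in the query, the banking expansion ranks first; swapping in networking keywords would reverse the order.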
13 Claims
1. A system for context-based disambiguation of abbreviations, comprising:
a processor;
an analyze passage module operable to execute on the processor and further operable to determine a target abbreviation and one or more keywords appearing in context with the target abbreviation in a received passage, the target abbreviation representing a shortened form of one or more word;
a contextual search query generation component operable to generate a contextual search query comprising the target abbreviation and said one or more keywords;
a search pseudo document index module operable to search a pseudo document index for one or more expansions of the target abbreviation by invoking the contextual search query, the pseudo document index containing index of one or more pseudo documents by titles, associated one or more abbreviations and associated context keywords, wherein the titles are the expansions of the abbreviations contained in the pseudo documents respectively, the search pseudo document index module further operable to return one or more pseudo documents associated with the target abbreviation based on the searching of the pseudo document index, wherein one or more expansions associated with the target abbreviation are provided based on the returned one or more target pseudo documents, wherein a pseudo document of said one or more pseudo documents is generated for an expansion in an abbreviation expansion dictionary by extracting data from sources that contain language occurring with the expansion; and
a machine learning classification model generation module operable to determine the target abbreviation and one or more keywords appearing in context with the target abbreviation in a received passage, the machine learning classification model generating one or more features that capture lexical and syntactic properties of the passage, and recognizing said target abbreviation and said one or more keywords appearing in context with the target abbreviation in the received passage based on the captured lexical and syntactic properties.
8. A computer readable storage medium, excluding signal per se, storing a program of instructions executable by a machine to perform a method of context-based disambiguation of abbreviations, comprising:
determining a target abbreviation and one or more keywords appearing in context with the target abbreviation in a received passage, the target abbreviation representing a shortened form of one or more word;
generating a contextual search query comprising the target abbreviation and said one or more keywords;
searching a pseudo document index for one or more expansions of the target abbreviation by invoking the contextual search query, the pseudo document index containing index of one or more pseudo documents by titles, associated one or more abbreviations, and associated context keywords, wherein the titles are the expansions of the abbreviations contained in the pseudo documents respectively;
returning one or more target pseudo documents associated with the target abbreviation based on the searching of the pseudo document index; and
providing one or more expansions associated with the target abbreviation based on the returned one or more target pseudo documents,
wherein the determining the target abbreviation and one or more keywords appearing in context with the target abbreviation in a received passage comprises generating one or more features that capture lexical and syntactic properties of the passage, and recognizing said target abbreviation and said one or more keywords appearing in context with the target abbreviation in the received passage based on the captured lexical and syntactic properties.
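As a rough illustration of the feature-generation step recited in this claim, the sketch below computes a few per-token features of the kind a classifier might consume. The feature set, the tokenizer, and the names `tokenize` and `token_features` are hypothetical. Neighboring tokens are used only as a crude stand-in for syntactic properties; a fuller system would likely use part-of-speech tags or a parser, and the classifier itself is omitted.

```python
import re


def tokenize(passage):
    """Split a passage into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", passage)


def token_features(tokens, i):
    """Build an illustrative feature dictionary for the token at index i."""
    tok = tokens[i]
    prev_tok = tokens[i - 1] if i > 0 else ""
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
    return {
        # Lexical cues: abbreviations are often short and all upper case.
        "is_all_caps": tok.isalpha() and tok.isupper(),
        "length": len(tok),
        "has_digit": any(c.isdigit() for c in tok),
        # Positional cues (assumed proxy for syntax): neighboring tokens.
        "in_parentheses": prev_tok == "(" and next_tok == ")",
        "prev_lower": prev_tok.lower(),
        "next_lower": next_tok.lower(),
    }


tokens = tokenize("She withdrew cash from the ATM near the bank.")
features = token_features(tokens, tokens.index("ATM"))
print(features)
```

Feature dictionaries like these could be fed to any standard classifier to label tokens as target abbreviations or context keywords; which model to use is left open here.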
13. A computer readable storage medium, excluding signal per se, storing a program of instructions executable by a machine to perform a method for context-based disambiguation of abbreviations, comprising:
generating an abbreviation expansion dictionary by identifying a set of abbreviations with associated potential expansions;
generating a pseudo document for each expansion identified in the abbreviation expansion dictionary, the pseudo document comprising an abbreviation, associated expansion and one or more words that occur with said abbreviation, the generated pseudo document having a title corresponding to the associated expansion of the abbreviation that the pseudo document contains, the pseudo document generated at least by extracting data from sources that contain language commonly occurring with the expansion;
generating a pseudo document index indexing said abbreviation and said associated expansion; and
generating a machine learning classification model by generating one or more features that capture lexical and syntactic properties of a received passage, and building the machine learning classification model for recognizing one or more target abbreviations and one or more target keywords appearing in context with the target abbreviation in the received passage.
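The dictionary, pseudo document, and index steps of this claim can be sketched as follows. The sample dictionary, source sentences, stopword list, and function names are all hypothetical, and a plain in-memory mapping stands in for a real search index. The model-building step is not shown.

```python
from collections import defaultdict

# Hypothetical abbreviation expansion dictionary.
EXPANSION_DICT = {
    "ATM": ["automated teller machine", "asynchronous transfer mode"],
}

# Hypothetical source sentences in which the expansions occur.
SOURCES = [
    "Insert your card into the automated teller machine to withdraw cash.",
    "Asynchronous transfer mode carries traffic in fixed-size network cells.",
]

STOPWORDS = {"the", "to", "in", "into", "your", "a", "of"}


def words(text):
    """Lower-case a sentence and strip trailing punctuation from each word."""
    return [w.strip(".,").lower() for w in text.split()]


def build_pseudo_documents(dictionary, sources):
    """Create one pseudo document per expansion, titled with that expansion."""
    docs = []
    for abbr, expansions in dictionary.items():
        for exp in expansions:
            exp_words = set(exp.split())
            context = set()
            for sentence in sources:
                if exp in sentence.lower():
                    # Keep words that co-occur with the expansion.
                    context.update(
                        w for w in words(sentence)
                        if w not in STOPWORDS and w not in exp_words
                    )
            docs.append({"title": exp, "abbreviation": abbr,
                         "keywords": sorted(context)})
    return docs


def build_index(docs):
    """Index pseudo documents by abbreviation for later lookup."""
    index = defaultdict(list)
    for doc in docs:
        index[doc["abbreviation"]].append(doc)
    return index


index = build_index(build_pseudo_documents(EXPANSION_DICT, SOURCES))
for doc in index["ATM"]:
    print(doc["title"], doc["keywords"])
```

Keying the index by abbreviation keeps lookup simple; a production system would more plausibly hand the pseudo documents to a full-text search engine.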
Specification