System and method for extraction of factoids from textual repositories
First Claim
1. A method of extracting factoids associated with a given factoid category of a plurality of categories from text repositories, said method comprising the steps of:
training a classifier to recognise factoids relevant to said given factoid category;
collecting, by a processor within a computer, documents or document summaries relevant to said given factoid category from the text repositories and storing the documents or document summaries in an entity store;
extracting sentences having a predetermined association to said given factoid category from said documents or said document summaries; and
classifying, in a noisy environment, said sentences using said classifier to extract snippets containing phrases relevant to said given factoid category, said extracted snippets being said factoid associated with said given factoid category, and storing the snippets in a snippet store.
Abstract
A method (400) is disclosed for extracting factoids from text repositories, the factoids being associated with a given factoid category. The method (400) starts by training a classifier (230) to recognize factoids relevant to the given factoid category. Documents or document summaries relevant to the given factoid category are next collected (410) from the text repositories. Sentences having a predetermined association to the given factoid category are then extracted (420) from the documents or document summaries. Those sentences are classified (440), in a noisy environment, using the classifier (230) to extract snippets containing phrases relevant to the given factoid category. The extracted snippets are the factoids associated with the given factoid category.
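The four steps of the abstract (train, collect, extract, classify) can be illustrated with a minimal Python sketch. It is not the disclosed embodiment: the Naive Bayes classifier, the keyword test standing in for the "predetermined association", the category name "acquisition", and every function and variable name below are assumptions made for illustration only. The sketch also does nothing specific to handle a noisy environment beyond lower-casing tokens.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (assumed classifier)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training sentences
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def collect_documents(repository, keywords):
    """Collection step: keep documents that mention any keyword."""
    return [d for d in repository if any(k in d.lower() for k in keywords)]


def extract_sentences(documents, keywords):
    """Extraction step: keep sentences that mention any keyword."""
    sentences = []
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            if any(k in sentence.lower() for k in keywords):
                sentences.append(sentence.strip())
    return sentences


def extract_factoids(repository, classifier, keywords, category):
    """Classification step: keep sentences the classifier assigns to the category."""
    entity_store = collect_documents(repository, keywords)
    candidates = extract_sentences(entity_store, keywords)
    snippet_store = [s for s in candidates if classifier.predict(s) == category]
    return snippet_store


# Hypothetical training data and repository, invented for this sketch.
classifier = NaiveBayes()
classifier.train([
    ("Acme acquired Widget Corp for ten million dollars", "acquisition"),
    ("The company announced the acquisition of a rival firm", "acquisition"),
    ("Shares of the company rose after strong earnings", "other"),
    ("Acme opened a new office in the city", "other"),
])
repository = [
    "Acme acquired Beta Labs last year. Acme also opened a new office in Boston.",
    "Weather was mild across the region.",
]
snippets = extract_factoids(repository, classifier, ["acme"], "acquisition")
print(snippets)
```

In this sketch the entity store and snippet store are plain in-memory lists; a persistent store could be substituted without changing the flow of the four steps.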
20 Claims
19. An apparatus for extracting factoids associated with a given factoid category of a plurality of factoid categories from text repositories, said apparatus comprising:
means for training a classifier to recognise factoids relevant to said given factoid category;
means for collecting documents or document summaries relevant to said given factoid category from the text repositories;
means for extracting sentences having a predetermined association to said given factoid category from said documents or said document summaries;
means for classifying, in a noisy environment, said sentences to extract snippets containing phrases relevant to said given factoid category, said extracted snippets being said factoid associated with said given factoid category, and
a memory to store the snippets in a snippet store within the memory.
20. A computer program product comprising a machine-readable storage medium having machine-readable program code recorded thereon for controlling the operation of a data processing apparatus on which the program code executes to perform a method of extracting factoids associated with a given factoid category of a plurality of factoid categories from text repositories, said method comprising the steps of:
training a classifier to recognise factoids relevant to said given factoid category;
collecting documents or document summaries relevant to said given factoid category from the text repositories;
extracting sentences having a predetermined association to said given factoid category from said documents or said document summaries;
classifying, in a noisy environment, said sentences using said classifier to extract snippets containing phrases relevant to said given factoid category, said extracted snippets being said factoid associated with said given factoid category; and
storing the snippets on a snippet store.
Specification