System and method for extraction of factoids from textual repositories
Abstract
A method (400) is disclosed for extracting factoids from text repositories, the factoids being associated with a given factoid category. The method (400) starts by training a classifier (230) to recognise factoids relevant to the given factoid category. Documents or document summaries relevant to the given factoid category are next collected (410) from the text repositories. Sentences having a predetermined association to the given factoid category are extracted (420) from the documents or document summaries. Those sentences are then classified (440), in a noisy environment, using the classifier (230) to extract snippets containing phrases relevant to the given factoid category. The extracted snippets are the factoids associated with the given factoid category.
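The four steps named in the abstract (train, collect, extract sentences, classify) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the disclosure: the naive Bayes classifier, the keyword-based collection and sentence filters, the "acquisition" factoid category, the function names, and the toy data are all hypothetical. The sketch also does not model the noisy environment beyond showing one non-factoid sentence being rejected.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing.

    An assumed stand-in for the classifier; the source does not name one.
    """

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        def score(c):
            total = sum(self.word_counts[c].values())
            logp = math.log(self.class_counts[c] / sum(self.class_counts.values()))
            for w in tokenize(text):
                logp += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            return logp
        return max(self.classes, key=score)


def collect_documents(repository, category_terms):
    """Keep documents that mention at least one category term (assumed relevance test)."""
    return [d for d in repository if any(t in tokenize(d) for t in category_terms)]


def extract_sentences(documents, cue_terms):
    """Keep sentences containing a cue term (an assumed 'predetermined association')."""
    sentences = []
    for doc in documents:
        for s in re.split(r"(?<=[.!?])\s+", doc):
            if any(t in tokenize(s) for t in cue_terms):
                sentences.append(s.strip())
    return sentences


def extract_factoids(classifier, sentences):
    """Return the sentences that the classifier labels as factoids."""
    return [s for s in sentences if classifier.predict(s) == "factoid"]


# Hypothetical training data for an "acquisition" factoid category.
train_texts = [
    "Acme acquired Widget Corp for 2 billion dollars",
    "Globex bought Initech in a cash deal",
    "Analysts wonder whether the company might be acquired someday",
    "The weather was pleasant during the shareholder meeting",
]
train_labels = ["factoid", "factoid", "other", "other"]
classifier = NaiveBayes().fit(train_texts, train_labels)

# Hypothetical text repository.
repository = [
    "Umbrella Inc is a pharma company. Umbrella acquired Tyrell for 5 billion "
    "dollars. Rumours say the company might be acquired someday.",
    "The local bakery sells bread.",
]
terms = {"acquired", "bought"}

documents = collect_documents(repository, terms)
sentences = extract_sentences(documents, terms)
factoids = extract_factoids(classifier, sentences)
print(factoids)
```

In this toy run, both sentences that mention "acquired" pass the sentence filter, but only the one stating a concrete acquisition is kept by the classifier. The speculative sentence is rejected.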
20 Claims
1. A method of extracting factoids associated with a given factoid category from text repositories, said method comprising the steps of:
training a classifier to recognise factoids relevant to said given factoid category;
collecting documents or document summaries relevant to said given factoid category from the text repositories;
extracting sentences having a predetermined association to said given factoid category from said documents or said document summaries; and
classifying, in a noisy environment, said sentences using said classifier to extract snippets containing phrases relevant to said given factoid category, said extracted snippets being said factoid associated with said given factoid category.
19. An apparatus for extracting factoids associated with a given factoid category from text repositories, said apparatus comprising:
means for training a classifier to recognise factoids relevant to said given factoid category;
means for collecting documents or document summaries relevant to said given factoid category from the text repositories;
means for extracting sentences having a predetermined association to said given factoid category from said documents or said document summaries; and
means for classifying, in a noisy environment, said sentences using said classifier to extract snippets containing phrases relevant to said given factoid category, said extracted snippets being said factoid associated with said given factoid category.
20. A computer program product comprising machine-readable program code recorded on a machine-readable recording medium, for controlling the operation of a data processing apparatus on which the program code executes to perform a method of extracting factoids associated with a given factoid category from text repositories, said method comprising the steps of:
training a classifier to recognise factoids relevant to said given factoid category;
collecting documents or document summaries relevant to said given factoid category from the text repositories;
extracting sentences having a predetermined association to said given factoid category from said documents or said document summaries; and
classifying, in a noisy environment, said sentences using said classifier to extract snippets containing phrases relevant to said given factoid category, said extracted snippets being said factoid associated with said given factoid category.
Specification