Techniques for facilitating information acquisition and storage
Abstract
A method and system are described for extracting information from a plurality of articles and for storing the extracted information in a knowledge-based information store. The method and system identify a plurality of articles from which information is to be extracted, identify a plurality of information extractors, and assign the articles to the information extractors for information extraction. The method and system receive information extracted by an information extractor from an article assigned to that information extractor, and enable a content reviewer to review the extracted information and identify errors associated with it. An error count is determined from the errors identified by the content reviewer. If the error count is above a threshold level, the article may be reassigned for information extraction. If the error count is equal to or below the threshold level, the content reviewer may change the extracted information to correct the errors.
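The review decision summarized in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Extraction` record, its field names, and the `route_after_review` function are hypothetical and do not appear in the patent; the sketch only mirrors the rule that an error count above the threshold leads to reassignment, while a count at or below it leaves correction to the content reviewer.

```python
from dataclasses import dataclass, field


@dataclass
class Extraction:
    """Hypothetical record of one extractor's output for one article."""
    article_id: str
    extractor_id: str
    findings: dict
    # Errors flagged by the content reviewer during review.
    errors: list = field(default_factory=list)


def route_after_review(extraction, threshold):
    """Return (action, error_count) for a reviewed extraction.

    'reassign' when the error count is above the threshold,
    'correct' when it is equal to or below the threshold.
    """
    error_count = len(extraction.errors)
    if error_count > threshold:
        return "reassign", error_count
    return "correct", error_count


reviewed = Extraction(
    article_id="a1",
    extractor_id="x1",
    findings={"gene": "TP53"},
    errors=["wrong unit", "missing citation", "typo"],
)
print(route_after_review(reviewed, threshold=2))
print(route_after_review(reviewed, threshold=3))
```

The comparison is strict, so an error count exactly equal to the threshold falls on the "correct" side, matching the "equal to or below" wording.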
68 Claims
-
1. A computer-implemented method of storing information in an information store, the computer-implemented method comprising:
-
identifying a plurality of articles from which information is to be extracted;
identifying a plurality of information extractors for extracting information from the plurality of articles;
providing a database for storing information related to the plurality of articles and the plurality of information extractors;
assigning the plurality of articles to the plurality of information extractors for information extraction, wherein a first article from the plurality of articles is assigned to a first information extractor from the plurality of information extractors;
receiving information extracted by the first information extractor from the first article;
storing the information extracted by the first information extractor from the first article in the information store;
enabling a content reviewer to review the extracted information received from the first information extractor for the first article;
receiving information from the content reviewer identifying errors associated with the extracted information received from the first information extractor for the first article;
determining, from the information received from the content reviewer, an error count indicating number of errors in the extracted information received from the first information extractor for the first article;
storing the error count in the database;
determining if the error count is above a threshold error count level;
if the error count is above the threshold error level, reassigning the first article to the first information extractor for information extraction; and
if the error count is equal to or below the threshold error level, enabling the content reviewer to change the extracted information received from the first information extractor for the first article to correct the errors.
-
2. The method of claim 1 wherein identifying the plurality of articles comprises:
-
receiving criteria for selecting articles from which information is to be extracted; and
identifying the plurality of articles which are relevant to the criteria for selecting articles.
-
-
3. The method of claim 2 wherein assigning the plurality of articles to the plurality of information extractors comprises:
-
computing a priority score for each article in the plurality of articles based on a degree of relevancy of the article to the criteria for selecting the articles, such that an article with a high degree of relevancy is computed a higher priority score than an article with a low degree of relevancy to the criteria for selecting the articles; and
assigning articles from the plurality of articles to the plurality of information extractors based on the priority scores associated with the articles, such that articles with higher priority scores are assigned before articles with lower priority scores.
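The priority-based assignment recited in claim 3 might look roughly like the following. The claim does not say how relevancy is measured or how articles are distributed among extractors, so both choices here are assumptions made for illustration: relevancy is taken as the number of criteria terms found in the article text, and articles are handed out round-robin in descending score order. The names `priority_score` and `assign_by_priority` are hypothetical.

```python
from itertools import cycle


def priority_score(article_text, criteria_terms):
    """Hypothetical relevancy measure: count of criteria terms present in the text."""
    text = article_text.lower()
    return sum(1 for term in criteria_terms if term.lower() in text)


def assign_by_priority(articles, extractors, criteria_terms):
    """Assign higher-scoring articles first.

    articles: dict mapping article id -> article text.
    Returns a list of (article_id, extractor) pairs in assignment order.
    """
    ranked = sorted(
        articles,
        key=lambda article_id: priority_score(articles[article_id], criteria_terms),
        reverse=True,
    )
    # Round-robin distribution is an assumption, not part of the claim.
    return list(zip(ranked, cycle(extractors)))


articles = {
    "a": "kinase inhibitor binds receptor",
    "b": "weather report",
    "c": "receptor study",
}
print(assign_by_priority(articles, ["x", "y"], ["kinase", "receptor"]))
```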
-
-
4. The method of claim 2 wherein receiving the criteria for selecting articles comprises receiving names of information sources and terms specific to a domain.
-
5. The method of claim 1 wherein identifying the plurality of articles from which information is to be extracted comprises:
storing access information for the plurality of articles in the database, wherein the access information enables the plurality of information extractors to access the plurality of articles.
-
6. The method of claim 5 wherein storing the access information for the plurality of articles comprises:
-
determining a first set of articles from the plurality of articles which are available in electronic format;
storing electronic copies of the articles in the first set in the database;
determining a second set of articles from the plurality of articles which are not available in electronic format;
scanning paper copies of the articles in the second set to generate electronic versions of the articles in the second set; and
storing the electronic versions of the articles in the second set in the database.
-
-
7. The method of claim 5 wherein storing the access information for the plurality of articles comprises:
-
determining uniform resource locator (URL) information for at least one article from the plurality of articles; and
storing the URL information for the at least one article in the database.
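Claims 5 to 7 describe storing access information so that extractors can reach each article. A minimal sketch follows, assuming a plain dictionary stands in for the database and that scanning is supplied as a callable; the `store_access_info` function, the `scan` parameter, and the record keys are all hypothetical.

```python
def store_access_info(articles, scan):
    """Build access records for a list of articles.

    articles: list of dicts with 'id' and optionally 'electronic_copy',
              'paper_copy', and 'url'.
    scan: hypothetical callable that turns a paper copy into an electronic version.
    Returns a dict standing in for the database: article id -> access record.
    """
    database = {}
    for article in articles:
        copy = article.get("electronic_copy")
        if copy is None:
            # Not available electronically: scan the paper copy instead.
            copy = scan(article["paper_copy"])
        entry = {"copy": copy}
        if "url" in article:
            entry["url"] = article["url"]
        database[article["id"]] = entry
    return database


records = store_access_info(
    [
        {"id": "a1", "electronic_copy": "pdf-bytes", "url": "https://example.org/a1"},
        {"id": "a2", "paper_copy": "journal p.12"},
    ],
    scan=lambda paper: "scanned:" + paper,
)
print(records)
```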
-
-
8. The method of claim 1 wherein identifying the plurality of information extractors comprises:
-
receiving information related to a plurality of candidates;
determining a first set of candidates from the plurality of candidates who have completed online certification;
determining a second set of candidates from the first set of candidates who have passed testing procedures; and
designating the second set of candidates as the plurality of information extractors.
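The two-stage candidate filter of claim 8 can be illustrated as below. The `Candidate` record and its boolean fields are assumptions for the sake of the example; the claim only requires that the second set be drawn from the first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate record."""
    name: str
    certified: bool      # completed online certification
    passed_tests: bool   # passed testing procedures


def select_extractors(candidates):
    """Return the candidates designated as information extractors."""
    # First set: candidates who completed online certification.
    first_set = [c for c in candidates if c.certified]
    # Second set: members of the first set who also passed testing.
    return [c for c in first_set if c.passed_tests]


pool = [
    Candidate("Ada", certified=True, passed_tests=True),
    Candidate("Ben", certified=True, passed_tests=False),
    Candidate("Cy", certified=False, passed_tests=True),
]
print([c.name for c in select_extractors(pool)])
```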
-
-
9. The method of claim 1 wherein receiving the information extracted by the first information extractor from the first article comprises:
-
providing a user interface; and
receiving the information extracted from the first article via the user interface.
-
-
10. The method of claim 1 further comprising:
-
determining if the errors associated with the information extracted from the first article by the first information extractor have been corrected; and
if the errors have been corrected:
calculating a quality score for the first article based upon the error count; and
storing the quality score in the database.
-
-
11. The method of claim 10 further comprising:
-
if the errors have been corrected:
determining a compensation amount to be paid to the first information extractor for extracting information from the first article; and
storing the compensation amount in the database.
-
-
12. The method of claim 11 wherein determining the compensation amount to be paid to the first information extractor comprises:
calculating the compensation amount based upon the error count and the quality score for the first article.
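Claims 10 to 12 tie a quality score and a compensation amount to the error count, but they give no formulas. The sketch below therefore uses invented ones purely for illustration: the quality score is taken as 1 / (1 + error count), and the compensation is a base rate scaled by that score minus a per-error penalty. The function names, `base_rate`, and `penalty_per_error` are hypothetical.

```python
def quality_score(error_count):
    """Hypothetical quality score: 1.0 for no errors, decreasing as errors grow."""
    return 1.0 / (1 + error_count)


def settle_article(error_count, errors_corrected, base_rate, penalty_per_error):
    """Return (quality, compensation), or None while errors remain uncorrected.

    Both values are computed only once the errors have been corrected,
    mirroring the condition in claims 10 and 11.
    """
    if not errors_corrected:
        return None
    quality = quality_score(error_count)
    # Illustrative formula using both the error count and the quality score.
    compensation = max(0.0, base_rate * quality - penalty_per_error * error_count)
    return quality, compensation


print(settle_article(1, True, base_rate=100.0, penalty_per_error=5.0))
print(settle_article(1, False, base_rate=100.0, penalty_per_error=5.0))
```

In a fuller system both returned values would be written to the database alongside the stored error count.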
-
13. The method of claim 1 wherein the information store is configured to store the extracted information according to an information model, the method further comprising:
before storing the extracted information for the first article in the information store, enabling model reviewers to make changes to the information model based on the information extracted by the first information extractor from the first article.
-
14. The method of claim 13 wherein the information store is a knowledge base and the information model is an ontology for the knowledge base.
-
15. The method of claim 1 wherein the information store is a knowledge base configured to store the extracted information according to an ontology, the method further comprising:
-
before storing the extracted information into the knowledge base:
receiving concept information identifying a concept associated with the extracted information received from the first information extractor for the first article;
enabling a first reviewer to review the concept information; and
receiving information from the first reviewer identifying changes to be made to the ontology.
-
-
16. The method of claim 15 further comprising:
-
enabling a second reviewer to review the information received from the first reviewer; and
making changes to the ontology based on the information received from the first reviewer after the second reviewer approves of the information received from the first reviewer.
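The two-reviewer gate of claims 15 and 16 can be sketched as follows, assuming the ontology is a simple mapping from concept to parent concept and that the first reviewer's proposal is a list of (concept, parent) pairs. These representations and the `apply_approved_changes` name are assumptions; the patent does not specify a data structure.

```python
def apply_approved_changes(ontology, proposed_changes, approved_by_second_reviewer):
    """Return an updated copy of the ontology.

    ontology: dict mapping concept -> parent concept (hypothetical representation).
    proposed_changes: list of (concept, parent) pairs from the first reviewer.
    Changes are applied only after the second reviewer approves them.
    """
    updated = dict(ontology)
    if not approved_by_second_reviewer:
        return updated
    for concept, parent in proposed_changes:
        updated[concept] = parent
    return updated


ontology = {"kinase": "enzyme"}
proposal = [("tyrosine kinase", "kinase")]
print(apply_approved_changes(ontology, proposal, approved_by_second_reviewer=False))
print(apply_approved_changes(ontology, proposal, approved_by_second_reviewer=True))
```

Returning a copy rather than mutating the input keeps the pre-review ontology available if the proposal is rejected.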
-
-
28. The system of claim 1 wherein the plurality of code modules stored by the memory further comprises:
-
a code module for determining if the errors associated with the information extracted from the first article by the first information extractor have been corrected; and
if the errors have been corrected:
a code module for calculating a quality score for the first article based upon the error count; and
a code module for storing the quality score in the database.
-
-
29. The system of claim 28 wherein the plurality of code modules stored by the memory further comprises:
-
if the errors have been corrected:
a code module for determining a compensation amount to be paid to the first information extractor for extracting information from the first article; and
a code module for storing the compensation amount in the database.
-
-
30. The system of claim 29 wherein the code module for determining the compensation amount to be paid to the first information extractor comprises:
a code module for calculating the compensation amount based upon the error count and the quality score for the first article.
-
17. A computer-implemented method of storing information in an information store, the information store configured to store the extracted information according to an information model, the computer-implemented method comprising:
-
identifying a plurality of articles from which the information is to be extracted;
identifying information extractors for extracting the information from the plurality of articles;
storing information related to the plurality of articles and the information extractors in a database;
assigning the plurality of articles to the information extractors; and
for each article from the plurality of articles:
receiving information extracted from the article by the information extractor to whom the article is assigned;
storing the extracted information in the database;
storing the information extracted from the article in the information store, wherein the information store is a knowledge base configured to store the extracted information according to an ontology.
enabling content reviewers to identify and correct errors associated with the extracted information;
enabling model reviewers to identify and make changes to the information model of the information store based on the information extracted from the article.
-
-
55. The method of claim 17 wherein identifying the plurality of articles comprises:
-
receiving criteria for selecting articles from which information is to be extracted; and
identifying the plurality of articles which are relevant to the criteria for selecting articles.
-
-
56. The method of claim 55 wherein assigning the plurality of articles to the plurality of information extractors comprises:
-
computing a priority score for each article in the plurality of articles based on a degree of relevancy of the article to the criteria for selecting the articles, such that an article with a high degree of relevancy is computed a higher priority score than an article with a low degree of relevancy to the criteria for selecting the articles; and
assigning articles from the plurality of articles to the plurality of information extractors based on the priority scores associated with the articles, such that articles with higher priority scores are assigned before articles with lower priority scores.
-
-
57. The method of claim 55 wherein receiving the criteria for selecting articles comprises receiving names of information sources and terms specific to a domain.
-
58. The method of claim 17 wherein identifying the plurality of articles from which information is to be extracted comprises:
storing access information for the plurality of articles in the database, wherein the access information enables the plurality of information extractors to access the plurality of articles.
-
59. The method of claim 58 wherein storing the access information for the plurality of articles comprises:
-
determining a first set of articles from the plurality of articles which are available in electronic format;
storing electronic copies of the articles in the first set in the database;
determining a second set of articles from the plurality of articles which are not available in electronic format;
scanning paper copies of the articles in the second set to generate electronic versions of the articles in the second set; and
storing the electronic versions of the articles in the second set in the database.
-
-
60. The method of claim 58 wherein storing the access information for the plurality of articles comprises:
-
determining uniform resource locator (URL) information for at least one article from the plurality of articles; and
storing the URL information for the at least one article in the database.
-
-
61. The method of claim 17 wherein identifying the plurality of information extractors comprises:
-
receiving information related to a plurality of candidates;
determining a first set of candidates from the plurality of candidates who have completed online certification;
determining a second set of candidates from the first set of candidates who have passed testing procedures; and
designating the second set of candidates as the plurality of information extractors.
-
-
62. The method of claim 17 wherein receiving the information extracted by the first information extractor from the first article comprises:
-
providing a user interface; and
receiving the information extracted from the first article via the user interface.
-
-
63. The method of claim 17 further comprising:
-
determining if the errors associated with the information extracted from the first article by the first information extractor have been corrected; and
if the errors have been corrected:
calculating a quality score for the first article based upon the error count; and
storing the quality score in the database.
-
-
64. The method of claim 63 further comprising:
-
if the errors have been corrected:
determining a compensation amount to be paid to the first information extractor for extracting information from the first article; and
storing the compensation amount in the database.
-
-
65. The method of claim 64 wherein determining the compensation amount to be paid to the first information extractor comprises:
calculating the compensation amount based upon the error count and the quality score for the first article.
-
66. The method of claim 17 wherein the information store is configured to store the extracted information according to an information model, the method further comprising:
before storing the extracted information for the first article in the information store, enabling model reviewers to make changes to the information model based on the information extracted by the first information extractor from the first article.
-
67. The method of claim 17 wherein the information store is a knowledge base configured to store the extracted information according to an ontology, the method further comprising:
-
before storing the extracted information into the knowledge base:
receiving concept information identifying a concept associated with the extracted information received from the first information extractor for the first article;
enabling a first reviewer to review the concept information; and
receiving information from the first reviewer identifying changes to be made to the ontology.
-
-
68. The method of claim 67 further comprising:
-
enabling a second reviewer to review the information received from the first reviewer; and
making changes to the ontology based on the information received from the first reviewer after the second reviewer approves of the information received from the first reviewer.
-
-
19. A computer system for storing information comprising:
-
a processor;
a memory coupled to the processor, the memory configured to store a plurality of code modules for execution by the processor, the plurality of code modules comprising:
a code module for identifying a plurality of articles from which information is to be extracted;
a code module for identifying a plurality of information extractors for extracting information from the plurality of articles;
a code module for storing information related to the plurality of articles and the plurality of information extractors in a database;
a code module for assigning the plurality of articles to the plurality of information extractors for information extraction, wherein a first article from the plurality of articles is assigned to a first information extractor from the plurality of information extractors;
a code module for receiving information extracted by the first information extractor from the first article;
a code module for storing the information extracted by the first information extractor from the first article in an information store;
a code module for enabling a content reviewer to review the extracted information received from the first information extractor for the first article;
a code module for receiving information from the content reviewer identifying errors associated with the extracted information received from the first information extractor for the first article;
a code module for determining, from the information received from the content reviewer, an error count indicating number of errors in the extracted information received from the first information extractor for the first article;
a code module for storing the error count in the database;
a code module for determining if the error count is above a threshold error count level;
if the error count is above the threshold error level, a code module for reassigning the first article to the first information extractor for information extraction; and
if the error count is equal to or below the threshold error level, a code module for enabling the content reviewer to change the extracted information received from the first information extractor for the first article to correct the errors.
-
20. The system of claim 19 wherein the code module for identifying the plurality of articles comprises:
-
a code module for receiving criteria for selecting articles from which information is to be extracted; and
a code module for identifying the plurality of articles which are relevant to the criteria for selecting articles.
-
-
21. The system of claim 20 wherein the code module for assigning the plurality of articles to the plurality of information extractors comprises:
-
a code module for computing a priority score for each article in the plurality of articles based on a degree of relevancy of the article to the criteria for selecting the articles, such that an article with a high degree of relevancy is computed a higher priority score than an article with a low degree of relevancy to the criteria for selecting the articles; and
a code module for assigning articles from the plurality of articles to the plurality of information extractors based on the priority scores associated with the articles, such that articles with higher priority scores are assigned before articles with lower priority scores.
-
-
22. The system of claim 20 wherein the code module for receiving the criteria for selecting articles comprises a code module for receiving names of information sources and terms specific to a domain.
-
23. The system of claim 19 wherein the code module for identifying the plurality of articles from which information is to be extracted comprises:
a code module for storing access information for the plurality of articles in the database, wherein the access information enables the plurality of information extractors to access the plurality of articles.
-
24. The system of claim 23 wherein the code module for storing the access information for the plurality of articles comprises:
-
a code module for determining a first set of articles from the plurality of articles which are available in electronic format;
a code module for storing electronic copies of the articles in the first set in the database;
a code module for determining a second set of articles from the plurality of articles which are not available in electronic format;
a code module for scanning paper copies of the articles in the second set to generate electronic versions of the articles in the second set; and
a code module for storing the electronic versions of the articles in the second set in the database.
-
-
25. The system of claim 23 wherein the code module for storing the access information for the plurality of articles comprises:
-
a code module for determining uniform resource locator (URL) information for at least one article from the plurality of articles; and
a code module for storing the URL information for the at least one article in the database.
-
-
26. The system of claim 19 wherein the code module for identifying the plurality of information extractors comprises:
-
a code module for receiving information related to a plurality of candidates;
a code module for determining a first set of candidates from the plurality of candidates who have completed online certification;
a code module for determining a second set of candidates from the first set of candidates who have passed testing procedures; and
a code module for designating the second set of candidates as the plurality of information extractors.
-
-
27. The system of claim 19 wherein the code module for receiving the information extracted by the first information extractor from the first article comprises:
-
a code module for providing a user interface; and
a code module for receiving the information extracted from the first article via the user interface.
-
-
31. The system of claim 19 wherein the information store is configured to store the extracted information according to an information model, and wherein the plurality of code modules stored by the memory further comprises:
a code module for enabling model reviewers to make changes to the information model based on the information extracted by the first information extractor from the first article before storing the extracted information for the first article in the information store.
-
32. The system of claim 31 wherein the information store is a knowledge base and the information model is an ontology for the knowledge base.
-
33. The system of claim 19 wherein the information store is a knowledge base configured to store the extracted information according to an ontology, and wherein the plurality of code modules stored by the memory further comprises:
-
a code module for receiving concept information identifying a concept associated with the extracted information received from the first information extractor for the first article before storing the extracted information into the knowledge base;
a code module for enabling a first reviewer to review the concept information; and
a code module for receiving information from the first reviewer identifying changes to be made to the ontology.
-
-
34. The system of claim 33 wherein the plurality of code modules stored by the memory further comprises:
-
a code module for enabling a second reviewer to review the information received from the first reviewer; and
a code module for making changes to the ontology based on the information received from the first reviewer after the second reviewer approves of the information received from the first reviewer.
-
-
35. A networked system for storing information comprising:
-
a communication network;
a computer system coupled to the communication network;
an information store coupled to the computer system, the information store configured to store the information according to an information model; and
a database coupled to the communication network;
wherein the computer system is configured to:
identify a plurality of articles from which the information is to be extracted;
identify information extractors for extracting the information from the plurality of articles;
store information related to the plurality of articles and the information extractors in a database;
assign the plurality of articles to the information extractors; and
for each article from the plurality of articles:
receive information extracted from the article by the information extractor to whom the article is assigned;
store the extracted information in the database;
store the information extracted from the article in the information store, wherein the information store is a knowledge base configured to store the extracted information according to an ontology.
enable content reviewers to identify and correct errors associated with the extracted information;
enable model reviewers to identify and make changes to the information model of the information store based on the information extracted from the article.
-
-
37. A computer program product stored on a computer-readable medium for storing information in an information store, the computer program product comprising:
-
code for identifying a plurality of articles from which information is to be extracted;
code for identifying a plurality of information extractors for extracting information from the plurality of articles;
code for providing a database for storing information related to the plurality of articles and the plurality of information extractors;
code for assigning the plurality of articles to the plurality of information extractors for information extraction, wherein a first article from the plurality of articles is assigned to a first information extractor from the plurality of information extractors;
code for receiving information extracted by the first information extractor from the first article; and
code for storing the information extracted by the first information extractor from the first article in the information store;
code for enabling a content reviewer to review the extracted information received from the first information extractor for the first article;
code for receiving information from the content reviewer identifying errors associated with the extracted information received from the first information extractor for the first article;
code for determining, from the information received from the content reviewer, an error count indicating number of errors in the extracted information received from the first information extractor for the first article;
code for storing the error count in the database;
code for determining if the error count is above a threshold error count level;
if the error count is above the threshold error level, code for reassigning the first article to the first information extractor for information extraction; and
if the error count is equal to or below the threshold error level, code for enabling the content reviewer to change the extracted information received from the first information extractor for the first article to correct the errors.
-
38. The computer program product of claim 37 wherein the code for identifying the plurality of articles comprises:
-
code for receiving criteria for selecting articles from which information is to be extracted; and
code for identifying the plurality of articles which are relevant to the criteria for selecting articles.
-
-
39. The computer program product of claim 38 wherein the code for assigning the plurality of articles to the plurality of information extractors comprises:
-
code for computing a priority score for each article in the plurality of articles based on a degree of relevancy of the article to the criteria for selecting the articles, such that an article with a high degree of relevancy is computed a higher priority score than an article with a low degree of relevancy to the criteria for selecting the articles; and
code for assigning articles from the plurality of articles to the plurality of information extractors based on the priority scores associated with the articles, such that articles with higher priority scores are assigned before articles with lower priority scores.
-
-
40. The computer program product of claim 38 wherein code for receiving the criteria for selecting articles comprises code for receiving names of information sources and terms specific to a domain.
-
41. The computer program product of claim 37 wherein the code for identifying the plurality of articles from which information is to be extracted comprises:
code for storing access information for the plurality of articles in the database, wherein the access information enables the plurality of information extractors to access the plurality of articles.
-
42. The computer program product of claim 41 wherein the code for storing the access information for the plurality of articles comprises:
-
code for determining a first set of articles from the plurality of articles which are available in electronic format;
code for storing electronic copies of the articles in the first set in the database;
code for determining a second set of articles from the plurality of articles which are not available in electronic format;
code for scanning paper copies of the articles in the second set to generate electronic versions of the articles in the second set; and
code for storing the electronic versions of the articles in the second set in the database.
-
-
43. The computer program product of claim 41 wherein the code for storing the access information for the plurality of articles comprises:
-
code for determining uniform resource locator (URL) information for at least one article from the plurality of articles; and
code for storing the URL information for the at least one article in the database.
-
-
44. The computer program product of claim 37 wherein the code for identifying the plurality of information extractors comprises:
-
code for receiving information related to a plurality of candidates;
code for determining a first set of candidates from the plurality of candidates who have completed online certification;
code for determining a second set of candidates from the first set of candidates who have passed testing procedures; and
code for designating the second set of candidates as the plurality of information extractors.
-
-
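The two-stage candidate selection of claim 44 can be sketched as a pair of filters. The `Candidate` record and its field names are hypothetical; the claim states only that candidates must first complete online certification and then pass testing procedures.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record; field names are assumptions for this sketch.
    name: str
    completed_certification: bool
    passed_testing: bool


def select_extractors(candidates):
    """Return the candidates designated as information extractors."""
    # First set: candidates who have completed online certification.
    first_set = [c for c in candidates if c.completed_certification]
    # Second set: members of the first set who have passed testing procedures.
    second_set = [c for c in first_set if c.passed_testing]
    return second_set
```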
45. The computer program product of claim 37 wherein the code for receiving the information extracted by the first information extractor from the first article comprises:
-
code for providing a user interface; and
code for receiving the information extracted from the first article via the user interface.
-
-
46. The computer program product of claim 37 further comprising:
-
code for determining if the errors associated with the information extracted from the first article by the first information extractor have been corrected; and
if the errors have been corrected:
code for calculating a quality score for the first article based upon the error count; and
code for storing the quality score in the database.
-
-
47. The computer program product of claim 46 further comprising:
-
if the errors have been corrected:
code for determining a compensation amount to be paid to the first information extractor for extracting information from the first article; and
code for storing the compensation amount in the database.
-
-
48. The computer program product of claim 47 wherein the code for determining the compensation amount to be paid to the first information extractor comprises:
code for calculating the compensation amount based upon the error count and the quality score for the first article.
-
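Claims 46 to 48 can be read together as a small gated calculation: nothing is computed until the errors have been corrected, after which a quality score is derived from the error count and a compensation amount is derived from both. The claims give no formulas, so the linear quality score, the per-error penalty, and all parameter names and default values below are assumptions chosen only to make the sketch concrete.

```python
def quality_score(error_count, points_per_error=10):
    """Hypothetical quality score on a 0-100 scale, decreasing with errors."""
    return max(0, 100 - points_per_error * error_count)


def compensation_cents(error_count, errors_corrected,
                       base_fee_cents=2000, penalty_cents_per_error=50):
    """Return the compensation in cents, or None if errors are uncorrected.

    The amount depends on both the error count and the quality score,
    as claim 48 requires; the specific formula is an assumption.
    """
    if not errors_corrected:
        return None  # claims 46 and 47 act only once errors are corrected
    score = quality_score(error_count)
    amount = base_fee_cents * score // 100 - penalty_cents_per_error * error_count
    return max(0, amount)
```

Integer cents are used to avoid floating-point rounding in monetary values.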
49. The computer program product of claim 37 wherein the information store is configured to store the extracted information according to an information model, the computer program product further comprising:
code for enabling model reviewers to make changes to the information model based on the information extracted by the first information extractor from the first article before storing the extracted information for the first article in the information store.
-
50. The computer program product of claim 49 wherein the information store is a knowledge base and the information model is an ontology for the knowledge base.
-
51. The computer program product of claim 37 wherein the information store is a knowledge base configured to store the extracted information according to an ontology, the computer program product further comprising:
-
before storing the extracted information into the knowledge base:
code for receiving concept information identifying a concept associated with the extracted information received from the first information extractor for the first article;
code for enabling a first reviewer to review the concept information; and
code for receiving information from the first reviewer identifying changes to be made to the ontology.
-
-
52. The computer program product of claim 51 further comprising:
-
code for enabling a second reviewer to review the information received from the first reviewer; and
code for making changes to the ontology based on the information received from the first reviewer after the second reviewer approves of the information received from the first reviewer.
-
-
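The two-reviewer approval flow of claims 51 and 52 can be sketched as follows. The ontology is modeled here as a dictionary mapping each concept to a set of parent concepts, and a proposed change as a (concept, parent) pair; both representations, and the function name `apply_ontology_changes`, are assumptions made for illustration and are not taken from the claims.

```python
def apply_ontology_changes(ontology, proposed_changes, second_reviewer_approves):
    """Apply the first reviewer's proposed changes only after approval.

    ontology: dict mapping concept -> set of parent concepts
    proposed_changes: list of (concept, parent) pairs from the first reviewer
    second_reviewer_approves: True once the second reviewer has approved
    Returns a new ontology; the input is left unmodified.
    """
    # Copy so that a rejected proposal leaves the original untouched.
    updated = {concept: set(parents) for concept, parents in ontology.items()}
    if not second_reviewer_approves:
        return updated
    for concept, parent in proposed_changes:
        updated.setdefault(concept, set()).add(parent)
    return updated
```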
53. A computer program product stored on a computer-readable medium for storing information in an information store, the information store configured to store the extracted information according to an information model, the computer program product comprising:
-
code for identifying a plurality of articles from which the information is to be extracted;
code for identifying information extractors for extracting the information from the plurality of articles;
code for storing information related to the plurality of articles and the information extractors in a database;
code for assigning the plurality of articles to the information extractors; and
for each article from the plurality of articles:
code for receiving information extracted from the article by the information extractor to whom the article is assigned;
code for storing the extracted information in the database;
code for storing the information extracted from the article in the information store, wherein the information store is a knowledge base configured to store the extracted information according to an ontology.
code for enabling content reviewers to identify and correct errors associated with the extracted information;
code for enabling model reviewers to identify and make changes to the information model of the information store based on the information extracted from the article.
-
Specification