IDENTIFYING A STALE DATA SOURCE TO IMPROVE NLP ACCURACY
First Claim
1. A system, comprising:
a computer processor; and
a memory containing a program that, when executed on the computer processor, performs an operation comprising:
receiving a query for processing by a natural language processing (NLP) system comprising a corpus containing data ingested from a plurality of data sources;
identifying a data source expected to contain an answer to the query by correlating one or more elements of the query to data previously ingested into the corpus from the data source;
upon determining that the ingested data in the corpus does not contain the answer to the query, determining whether new material has been added to the identified data source since the last time the identified data source was ingested into the corpus;
if so, re-ingesting the identified data source whereby the new material is inserted into the corpus; and
processing the query to determine if the new material in the corpus contains the answer to the query.
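The operation recited in the claim above can be sketched in Python. This is only an illustration under stated assumptions, not the patented implementation: the `DataSource` and `Corpus` classes, the keyword-overlap heuristic in `identify_source`, the topic-keyed documents, and the integer timestamps are all hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataSource:
    """Hypothetical external source: `documents` maps a topic word to text."""
    name: str
    documents: dict
    last_modified: int  # hypothetical logical time of the newest material


@dataclass
class Corpus:
    """Hypothetical corpus that records when each source was last ingested."""
    data: dict = field(default_factory=dict)
    ingested_at: dict = field(default_factory=dict)

    def ingest(self, source: DataSource, now: int) -> None:
        # Copy the source's current contents into the corpus.
        self.data[source.name] = dict(source.documents)
        self.ingested_at[source.name] = now


def identify_source(terms, corpus, sources) -> Optional[DataSource]:
    """Correlate query terms with data previously ingested from each source."""
    def overlap(src):
        ingested = " ".join(corpus.data.get(src.name, {}).values()).lower()
        return sum(term in ingested for term in terms)

    if not sources:
        return None
    best = max(sources, key=overlap)
    return best if overlap(best) > 0 else None


def find_answer(terms, corpus, source_name) -> Optional[str]:
    """Toy answer lookup: return the text of a document whose topic is in the query."""
    for topic, text in corpus.data.get(source_name, {}).items():
        if topic in terms:
            return text
    return None


def answer_query(query: str, corpus: Corpus, sources, now: int) -> Optional[str]:
    terms = query.lower().split()
    source = identify_source(terms, corpus, sources)
    if source is None:
        return None
    answer = find_answer(terms, corpus, source.name)
    if answer is not None:
        return answer
    # The corpus lacks the answer: re-ingest only if the source has new
    # material since it was last ingested, then process the query again.
    if source.last_modified > corpus.ingested_at.get(source.name, -1):
        corpus.ingest(source, now)
        return find_answer(terms, corpus, source.name)
    return None
```

In this sketch the re-ingestion is triggered lazily, by a query the corpus cannot answer, rather than on a fixed schedule, which mirrors the order of the claimed steps.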
Abstract
In some NLP systems, a query is compared to different data sources stored in a corpus to provide an answer to the query. However, the best data sources for answering the query may not currently be contained within the corpus, or the data sources in the corpus may contain stale data that provides an inaccurate answer. When receiving a query, the NLP system may evaluate the query to identify a data source that is likely to contain an answer to the query. If the data source is not currently contained within the corpus, the NLP system may ingest the data source. If the data source is already within the corpus, however, the NLP system may determine a time-sensitivity value associated with at least some portion of the query. This value may then be used to determine whether the data source should be re-ingested, e.g., because the information contained in the corpus is stale.
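The abstract does not say how the time-sensitivity value is computed or applied, so the following Python sketch is purely illustrative. The keyword weights in `TIME_SENSITIVE_TERMS`, the one-day base age, and the rule that the allowed age grows as sensitivity falls are all assumptions made for the example.

```python
# Hypothetical keyword weights; the source does not define how the
# time-sensitivity value is derived from the query.
TIME_SENSITIVE_TERMS = {
    "current": 1.0,
    "latest": 1.0,
    "today": 1.0,
    "price": 0.8,
    "score": 0.6,
}


def time_sensitivity(query: str) -> float:
    """Return the highest weight of any time-sensitive term in the query."""
    terms = query.lower().split()
    return max((TIME_SENSITIVE_TERMS.get(t, 0.0) for t in terms), default=0.0)


def should_reingest(query: str, age_seconds: float,
                    base_max_age: float = 86400.0) -> bool:
    """Decide whether ingested data of the given age is too stale for the query.

    Assumed rule: a fully time-sensitive query tolerates data up to
    `base_max_age` old; less sensitive queries tolerate proportionally
    older data; a query with no time-sensitive terms never forces a refresh.
    """
    sensitivity = time_sensitivity(query)
    if sensitivity == 0.0:
        return False
    return age_seconds > base_max_age / sensitivity
```

For example, `should_reingest("latest stock price", age_seconds=2 * 86400)` flags two-day-old data as stale, while a query such as "who wrote hamlet" never triggers re-ingestion under these assumed weights.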
13 Claims
1. A system, comprising:
a computer processor; and
a memory containing a program that, when executed on the computer processor, performs an operation comprising:
receiving a query for processing by a natural language processing (NLP) system comprising a corpus containing data ingested from a plurality of data sources;
identifying a data source expected to contain an answer to the query by correlating one or more elements of the query to data previously ingested into the corpus from the data source;
upon determining that the ingested data in the corpus does not contain the answer to the query, determining whether new material has been added to the identified data source since the last time the identified data source was ingested into the corpus;
if so, re-ingesting the identified data source whereby the new material is inserted into the corpus; and
processing the query to determine if the new material in the corpus contains the answer to the query.
Dependent claims: 2, 3, 4, 5, 6
7. A computer program product for maintaining a corpus in a natural language processing (NLP) system, the computer program product comprising:
a computer-readable storage medium having computer-readable program code embodied therewith, the computer-readable program code comprising computer-readable program code configured to:
receive a query for processing by the NLP system comprising a corpus containing data ingested from a plurality of data sources;
identify a data source expected to contain an answer to the query by correlating one or more elements of the query to data previously ingested into the corpus from the data source;
upon determining that the ingested data in the corpus does not contain the answer to the query, determine whether new material has been added to the identified data source since the last time the identified data source was ingested into the corpus;
if so, re-ingest the identified data source whereby the new material is inserted into the corpus; and
process the query to determine if the new material in the corpus contains the answer to the query.
Dependent claims: 8, 9, 10, 11, 12, 13
Specification