Detecting and executing data re-ingestion to improve accuracy in a NLP system
Abstract
In some NLP systems, queries are compared to different data sources stored in a corpus to provide an answer to the query. However, the best data sources for answering the query may not currently be contained within the corpus, or the data sources in the corpus may contain stale data that provides an inaccurate answer. When receiving a query, the NLP system may evaluate the query to identify a data source that is likely to contain an answer to the query. If the data source is not currently contained within the corpus, the NLP system may ingest the data source. If the data source is already within the corpus, however, the NLP system may determine a time-sensitivity value associated with at least some portion of the query. This value may then be used to determine whether the data source should be re-ingested, e.g., because the information contained in the corpus is stale.
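The decision flow described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Corpus` class, the `prepare_corpus` function, the `fetch` callback, and the numeric scale of the time-sensitivity value are all hypothetical, and "satisfies a staleness threshold" is assumed here to mean "greater than or equal to". How the time-sensitivity value itself is derived from the query is left outside the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Corpus:
    """Hypothetical corpus: maps a data-source id to its ingested content."""
    objects: Dict[str, str] = field(default_factory=dict)

    def ingest(self, source_id: str, fetch: Callable[[str], str]) -> None:
        # Fetch the current content of the source and store (or overwrite) it.
        self.objects[source_id] = fetch(source_id)


def prepare_corpus(
    corpus: Corpus,
    source_id: str,
    time_sensitivity: float,
    staleness_threshold: float,
    fetch: Callable[[str], str],
) -> str:
    """Ingest or re-ingest a data source before it is searched for an answer."""
    if source_id not in corpus.objects:
        # The related data source is not in the corpus yet: ingest it.
        corpus.ingest(source_id, fetch)
        return "ingested"
    # Assumption: the threshold is "satisfied" when the value meets or exceeds it.
    if time_sensitivity >= staleness_threshold:
        corpus.ingest(source_id, fetch)
        return "re-ingested"
    # The query is not time-sensitive enough to justify refreshing the source.
    return "unchanged"
```

In this sketch a query with a low time-sensitivity value is answered from the stored copy, while a highly time-sensitive query triggers a refresh of the source first.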
13 Claims
1. A system, comprising:
a computer processor; and
a memory containing a program that, when executed on the computer processor, performs an operation comprising:
receiving first and second queries for processing by a natural language processing (NLP) system;
identifying first and second data sources related to the first and second queries, respectively, by associating one or more elements of the first and second queries to the first and second data sources; and
upon determining that the first related data source is not in a corpus of the NLP system, ingesting the first related data source into the corpus, wherein the corpus includes a data store comprising information from one or more data sources formatted and stored into one or more objects; and
upon determining that the second related data source is in the corpus of the NLP system:
before searching the second related data source for at least one answer to the second query, determining a time-sensitivity value associated with the second query indicating a degree to which an accurate answer to the second query is dependent on a staleness of the second related data source, and
upon determining that the time-sensitivity value satisfies a staleness threshold, re-ingesting the second related data source into the corpus.
Dependent claims: 2, 3, 4, 5, 6
7. A computer program product for maintaining a corpus in a natural language processing (NLP) system, the computer program product comprising:
a computer-readable storage medium having computer-readable program code embodied therewith, the computer-readable program code comprising computer-readable program code configured to:
receive first and second queries for processing by a NLP system;
identify first and second data sources related to the first and second queries by associating one or more elements of the first and second queries to the first and second data sources; and
upon determining that the first related data source is not in the corpus of the NLP system, ingest the first related data source into the corpus, wherein the corpus includes a data store comprising information from one or more data sources formatted and stored into one or more objects; and
upon determining that the second related data source is in the corpus of the NLP system:
before searching the second related data source for at least one answer to the second query, determine a time-sensitivity value associated with the second query indicating a degree to which an accurate answer to the second query is dependent on a staleness of the second related data source, and
upon determining that the time-sensitivity value satisfies a staleness threshold, re-ingest the second related data source into the corpus.
Dependent claims: 8, 9, 10, 11, 12, 13
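The claims recite identifying a data source "by associating one or more elements" of a query to that source, without fixing a particular technique. The following Python sketch shows one possible reading, using simple term overlap. The `SOURCE_TERMS` index, the source identifiers, and the `identify_source` function are hypothetical illustrations and are not taken from the claims.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical index: data-source id -> terms associated with that source.
SOURCE_TERMS: Dict[str, Set[str]] = {
    "stock_feed": {"stock", "price", "share"},
    "weather_service": {"weather", "temperature", "forecast"},
}


def identify_source(
    query: str, index: Dict[str, Set[str]] = SOURCE_TERMS
) -> Optional[str]:
    """Return the source whose terms overlap most with the query's elements."""
    # Treat each lowercase word of the query as one "element".
    elements = set(re.findall(r"[a-z]+", query.lower()))
    best: Optional[str] = None
    best_overlap = 0
    for source_id, terms in index.items():
        overlap = len(elements & terms)
        if overlap > best_overlap:
            best, best_overlap = source_id, overlap
    # None means no element of the query could be associated with any source.
    return best
```

A production system would likely rely on richer query analysis than word overlap; the sketch only makes the association step concrete.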
Specification