Determining veracity of data in a repository using a semantic network
First Claim
1. A computer implemented method for determining veracity of data in documents stored in a repository, the computer implemented method comprising:
creating one or more semantic networks from the documents in the repository, wherein said one or more semantic networks define intra-document and inter-document relationships among terms contained within the documents;
responsive to receiving a search query, identifying one or more semantic networks containing nodes matching one or more terms in the search query;
determining an edge density for each node matching a term in the search query;
calculating a relevancy score for each of the one or more semantic networks based on the edge densities of the nodes matching a term in the search query;
determining a relevancy, to the search query, of a first document associated with the one or more semantic networks based on the relevancy score;
determining if data from the first document in one of the semantic networks conflicts with data from a second document in one of the semantic networks;
responsive to a determination that data from the first document conflicts with data from the second document, determining whether the conflicting data from the first document is obsolete in comparison to data from the second document;
wherein determining if the data from the first document is obsolete further comprises:
comparing search frequency information for the data from the first document against search frequency information for the data from the second document; and
responsive to a determination that the search frequency information for the data from the second document is higher than the search frequency information for the data from the first document, determining that the data from the first document is obsolete in comparison with the data from the second document;
responsive to a determination that the conflicting data from the first document is obsolete in comparison to data from the second document, annotating a portion of the first document corresponding to the obsolete data with the data from the second document to form an annotated first document; and
providing a search result list to the user comprising the second document and the annotated first document.
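The claim does not say how the semantic networks are built, how edge density is measured, or how the edge densities are combined into a relevancy score. The minimal Python sketch below covers the retrieval steps and rests on several assumptions, none of which come from the patent. Terms are lowercase words minus a small stopword list. Two terms are related within a document when they share a sentence. A term used by several documents is a single shared node, which supplies the inter-document links. Each connected component is one semantic network. A node's edge density is its degree divided by the number of other nodes in its network. A network's relevancy score is the sum of those densities over the query-matching nodes. All function names (`terms`, `build_graph`, `semantic_networks`, `edge_density`, `rank_documents`) are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical stopword list; the claim does not say how terms are extracted.
STOPWORDS = {"the", "a", "an", "of", "is", "in", "and", "to"}


def terms(text):
    """Assumed tokenizer: lowercase alphanumeric words minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def build_graph(docs):
    """Build a term graph and record which documents contributed each edge.

    Intra-document relationship (assumed): two terms share a sentence.
    Inter-document relationship (assumed): a term used by several documents
    is a single shared node, so it joins those documents' subgraphs.
    """
    edges = defaultdict(set)  # (term_a, term_b) -> {doc_id, ...}
    for doc_id, text in docs.items():
        for sentence in re.split(r"[.!?]", text):
            for a, b in combinations(sorted(terms(sentence)), 2):
                edges[(a, b)].add(doc_id)
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj, edges


def semantic_networks(adj):
    """Treat each connected component of the term graph as one semantic network."""
    seen, networks = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adj[node] - component)
        seen |= component
        networks.append(component)
    return networks


def edge_density(node, adj, network):
    """Assumed definition: links held / links possible inside the node's network."""
    possible = len(network) - 1
    return len(adj[node]) / possible if possible else 0.0


def rank_documents(docs, query):
    """Score networks by the summed edge density of query-matching nodes, then
    give each document the best score of any network it contributed edges to."""
    adj, edges = build_graph(docs)
    query_terms = terms(query)
    scores = {}
    for network in semantic_networks(adj):
        matches = query_terms & network
        if not matches:
            continue  # only networks containing a query term are considered
        score = sum(edge_density(t, adj, network) for t in matches)
        for (a, _b), doc_ids in edges.items():
            if a in network:
                for doc_id in doc_ids:
                    scores[doc_id] = max(scores.get(doc_id, 0.0), score)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    docs = {
        "d1": "The bridge opened in 1990. The bridge spans the river.",
        "d2": "The bridge closed in 2021.",
        "d3": "Tomatoes grow in summer.",
    }
    # d1 and d2 share one network (linked by "bridge"); d3 has no query term.
    print(rank_documents(docs, "bridge river"))
```

In this example "bridge" links d1 and d2 into one seven-term network, so both documents receive that network's score of 1.0 + 2/6. d3 forms a separate network that contains no query term, so it is not ranked. Summing the densities favours networks in which the query terms are hubs. A mean or a length-normalised score would fit the claim wording equally well.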
Abstract
A mechanism for determining the veracity of data in a repository. Responsive to receiving a search query from a user, a semantic network is created from the documents in the repository. A determination is made as to whether data from a first document in the semantic network conflicts with data from a second document in the semantic network. If a conflict exists, a determination is made as to whether the data from the first document is obsolete in comparison to data from the second document. If the data from the first document is obsolete in comparison to data from the second document, a portion of the first document corresponding to the obsolete data is automatically annotated with the data from the second document to form an annotated first document. A search result list is then provided to the user comprising the second document and the annotated first document.
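The conflict, obsolescence, and annotation steps can be sketched in the same way. Only the obsolescence test comes from the claims: the second document's data must have the higher search frequency. Everything else here is an assumption made for illustration. This includes the `Fact` record, the rule that a conflict means the same subject and attribute with different values, the per-fact search counts, and the bracketed annotation format. The function names `conflicts`, `is_obsolete`, `annotate`, and `search_results` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One datum extracted from a document (a hypothetical representation)."""
    doc_id: str
    subject: str
    attribute: str
    value: str


def conflicts(first, second):
    """Assumed rule: same subject and attribute, different values, different documents."""
    return (first.doc_id != second.doc_id
            and (first.subject, first.attribute) == (second.subject, second.attribute)
            and first.value != second.value)


def is_obsolete(first, second, search_freq):
    """As claimed: `first` is obsolete if `second` has the higher search frequency."""
    return search_freq.get(second, 0) > search_freq.get(first, 0)


def annotate(text, first, second):
    """Tag the first occurrence of the obsolete value (the note format is assumed)."""
    note = f"{first.value} [updated per {second.doc_id}: {second.value}]"
    return text.replace(first.value, note, 1)


def search_results(docs, first, second, search_freq):
    """Return (doc_id, text) pairs: the second document and the annotated first."""
    if conflicts(first, second) and is_obsolete(first, second, search_freq):
        annotated = annotate(docs[first.doc_id], first, second)
        return [(second.doc_id, docs[second.doc_id]), (first.doc_id, annotated)]
    # No conflict, or the first document's data is not obsolete: return both unchanged.
    return [(first.doc_id, docs[first.doc_id]), (second.doc_id, docs[second.doc_id])]


if __name__ == "__main__":
    docs = {
        "manual_v1": "Support line: 555-0100.",
        "manual_v2": "Support line: 555-0199.",
    }
    old = Fact("manual_v1", "support", "phone", "555-0100")
    new = Fact("manual_v2", "support", "phone", "555-0199")
    freq = {old: 3, new: 40}  # hypothetical search-log counts
    for doc_id, text in search_results(docs, old, new, freq):
        print(doc_id, "->", text)
```

With these hypothetical counts, the result list contains `manual_v2` unchanged, followed by `manual_v1` with the text "Support line: 555-0100 [updated per manual_v2: 555-0199]." The claim requires a strictly higher frequency, so a tie leaves the first document unannotated.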
15 Claims
9. A non-transitory computer readable storage medium having stored thereon computer-executable instructions for determining veracity of data in documents stored in a repository, said computer-executable instructions performing a method comprising:
creating one or more semantic networks from the documents in the repository, wherein said one or more semantic networks define intra-document and inter-document relationships among terms contained within the documents;
responsive to receiving a search query, identifying one or more semantic networks containing nodes matching one or more terms in the search query;
determining an edge density for each node matching a term in the search query;
calculating a relevancy score for each of the one or more semantic networks based on the edge densities of the nodes matching a term in the search query;
determining a relevancy, to the search query, of a first document associated with the one or more semantic networks based on the relevancy score;
determining if data from the first document in one of the semantic networks conflicts with data from a second document in one of the semantic networks;
determining, in response to a determination that data from the first document conflicts with data from the second document, whether the conflicting data from the first document is obsolete in comparison to data from the second document;
wherein said determining if the data from the first document is obsolete further comprises:
comparing search frequency information for the data from the first document against search frequency information for the data from the second document; and
determining, in response to a determination that the search frequency information for the data from the second document is higher than the search frequency information for the data from the first document, that the data from the first document is obsolete in comparison with the data from the second document;
annotating, in response to a determination that the conflicting data from the first document is obsolete in comparison to data from the second document, a portion of the first document corresponding to the obsolete data with the data from the second document to form an annotated first document; and
providing a search result list to the user comprising the second document and the annotated first document.
Specification