DATA PROVENANCE SYSTEM
First Claim
1. A method comprising:
receiving data from a computing system describing particular content of a digital work;
processing the data to identify a particular concept represented in the particular content;
initiating a search of a corpus to identify a set of other digital works in the corpus comprising content related to the particular concept;
determining similarity scores representing a degree of similarity between the particular content of the digital work and the respective content of each of the set of digital works related to the particular concept;
determining that a particular one of the other digital works is a source of the particular content of the digital work based on the similarity scores; and
sending result data to the computing system to indicate that the particular other digital work is a source of the particular concept.
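The claim does not say how a concept is identified or how the corpus is searched. The following is a minimal sketch of those two steps under stated assumptions: the "concept" is simply the most frequent non-stopword term, and the corpus is searched through an inverted index. The function names, the stopword list, and the sample corpus are hypothetical illustrations, not the patented method.

```python
import re
from collections import defaultdict

# Hypothetical stopword list for the toy concept extractor below.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def identify_concept(text: str) -> str:
    """Toy stand-in for concept identification: return the most frequent
    non-stopword term, breaking ties alphabetically."""
    counts: dict[str, int] = defaultdict(int)
    for token in tokenize(text):
        if token not in STOPWORDS:
            counts[token] += 1
    if not counts:
        raise ValueError("no candidate concept terms found")
    return min(counts, key=lambda term: (-counts[term], term))


def build_index(corpus: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index mapping each term to the IDs of works containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for work_id, text in corpus.items():
        for token in set(tokenize(text)):
            index[token].add(work_id)
    return dict(index)


def search_corpus(index: dict[str, set[str]], concept: str) -> set[str]:
    """Return the IDs of works whose content mentions the concept term."""
    return set(index.get(concept, set()))


# Usage with a tiny, made-up corpus.
corpus = {
    "w1": "Photosynthesis converts light into chemical energy.",
    "w2": "A history of the steam engine.",
    "w3": "Leaves perform photosynthesis using chlorophyll.",
}
content = "Photosynthesis in plants: photosynthesis needs light."
concept = identify_concept(content)                       # "photosynthesis"
candidates = search_corpus(build_index(corpus), concept)  # {"w1", "w3"}
```

A production system would more likely use an entity linker or topic model for the first step; the inverted index is shown only because it is the simplest way to retrieve works "comprising content related to" a term.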
1 Assignment
0 Petitions
Abstract
Data is received from a computing system describing particular content of a digital work. The data is processed to identify a particular concept represented in the particular content. A search of a corpus is initiated to identify a set of other digital works in the corpus including content related to the particular concept. Similarity scores are determined representing a degree of similarity between the particular content of the digital work and the respective content of each of the set of digital works related to the particular concept. A data provenance system determines that a particular one of the other digital works is a source of the particular content of the digital work based on the similarity scores. Result data is generated and sent to the computing system to indicate that the particular other digital work is a source of the particular concept.
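The abstract does not name a similarity metric or a rule for picking the source. The sketch below assumes word-shingle Jaccard similarity, a common measure of textual overlap, and treats the single best-scoring work as the source only if it clears a hypothetical threshold of 0.5. All names and the threshold are illustrative assumptions.

```python
import re
from typing import Optional


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (contiguous word k-grams) in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts collapse to a single shingle so they can still be compared.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_scores(content: str, candidates: dict[str, str], k: int = 3) -> dict[str, float]:
    """Score the content against the text of each candidate work."""
    target = shingles(content, k)
    return {work_id: jaccard(target, shingles(text, k)) for work_id, text in candidates.items()}


def determine_source(scores: dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Return the highest-scoring work if it clears the hypothetical threshold, else None."""
    if not scores:
        return None
    best = max(scores, key=scores.__getitem__)
    return best if scores[best] >= threshold else None


candidates = {
    "a": "The quick brown fox jumps over the lazy dog today",
    "b": "A slow green turtle walks under the busy bridge",
}
scores = similarity_scores("The quick brown fox jumps over the lazy dog", candidates)
# scores == {"a": 0.875, "b": 0.0}: work "a" shares 7 of 8 distinct shingles.
source = determine_source(scores)  # "a"
```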
13 Citations
20 Claims
1. A method comprising:
receiving data from a computing system describing particular content of a digital work;
processing the data to identify a particular concept represented in the particular content;
initiating a search of a corpus to identify a set of other digital works in the corpus comprising content related to the particular concept;
determining similarity scores representing a degree of similarity between the particular content of the digital work and the respective content of each of the set of digital works related to the particular concept;
determining that a particular one of the other digital works is a source of the particular content of the digital work based on the similarity scores; and
sending result data to the computing system to indicate that the particular other digital work is a source of the particular concept.
Dependent claims: 2-16.
17. A computer program product comprising a computer readable storage medium comprising computer readable program code embodied therewith, the computer readable program code comprising:
computer readable program code configured to generate a first representation of content of a first digital work comprising media of a first type;
computer readable program code configured to determine similarity scores for the first digital work to indicate a degree of similarity between the first digital work and a plurality of other digital works based on comparing the first representation with a plurality of representations of the plurality of other digital works, wherein the plurality of other digital works comprises a second digital work, and the plurality of other digital works comprise media of a plurality of different types;
computer readable program code configured to determine, from the similarity scores, that the first digital work incorporates content originally sourced from the second digital work; and
computer readable program code configured to send result data to a system associated with the first digital work, wherein the result data indicates an attribution to the second digital work to be associated with the first digital work based on determining that the first digital work incorporates content originally sourced from the second digital work.
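Claim 17 compares representations of works that may be of different media types, but it does not say how those representations are produced. A minimal sketch, assuming every work has already been encoded into vectors in one shared space by hypothetical per-media-type encoders, using cosine similarity as the score and a hypothetical 0.9 threshold for deciding that content was sourced from another work:

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Work:
    work_id: str
    media_type: str            # e.g. "text", "image", "audio"
    vector: tuple[float, ...]  # assumed output of a hypothetical encoder into one shared space


def cosine(u: tuple[float, ...], v: tuple[float, ...]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("representations must share one dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def attribute(first: Work, others: list[Work], threshold: float = 0.9) -> Optional[dict]:
    """Compare the first work against works of any media type and, if the best
    match clears the hypothetical threshold, return attribution result data."""
    scores = {w.work_id: cosine(first.vector, w.vector) for w in others}
    best = max(others, key=lambda w: scores[w.work_id], default=None)
    if best is None or scores[best.work_id] < threshold:
        return None
    return {
        "work": first.work_id,
        "attribution": best.work_id,
        "source_media_type": best.media_type,
        "score": scores[best.work_id],
    }


# Made-up vectors: the text article sits close to the photo in the shared space.
article = Work("article-1", "text", (0.9, 0.1, 0.0))
photo = Work("photo-7", "image", (0.88, 0.12, 0.01))
song = Work("song-3", "audio", (0.0, 0.2, 0.95))
result = attribute(article, [photo, song])  # attributes article-1 to photo-7
```

The shared embedding space is the design choice that makes cross-media comparison a single vector operation; how such encoders are trained is outside what the claim recites.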
18. A system comprising:
a processor;
a memory element;
a data provenance service, executable by the processor to:
receive data describing at least a particular portion of a first digital work;
process the data to identify a particular concept represented in the particular content;
identify a set of other digital works in a corpus comprising content related to the particular concept, wherein the first digital work comprises media of a first type, and at least a portion of the digital works in the set of other works comprise media of a different, second type;
determine similarity scores representing a degree of similarity between the particular content of the first digital work and the respective content of each of the set of digital works related to the particular concept;
determine from the similarity scores that a second digital work, in the set of other digital works, is a source of the particular content of the first digital work; and
send result data to a computing system associated with the first digital work to indicate that the second digital work is a source of the particular content.
Dependent claims: 19, 20.
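Claim 18 leaves the format of the result data open. One possible shape is sketched below, assuming the data provenance service reports its finding as a JSON payload; the `ProvenanceResult` record, its field names, and the `cross_media` flag are hypothetical, and the transport to the computing system is out of scope.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProvenanceResult:
    """Hypothetical record of one provenance finding; field names are illustrative."""
    first_work_id: str
    first_media_type: str
    source_work_id: str
    source_media_type: str
    concept: str
    similarity: float


def to_result_data(result: ProvenanceResult) -> str:
    """Serialize the finding as JSON for the computing system associated with
    the first work (how it is delivered, e.g. HTTP or a queue, is not modeled)."""
    payload = asdict(result)
    # Flag whether the identified source is of a different media type than the first work.
    payload["cross_media"] = result.first_media_type != result.source_media_type
    return json.dumps(payload, sort_keys=True)


finding = ProvenanceResult("blog-42", "text", "video-9", "video", "solar eclipse", 0.87)
result_data = to_result_data(finding)
```

Sorting keys keeps the payload byte-stable across runs, which makes results easier to diff or cache.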
Specification