Data provenance system
Abstract
An electronic artifact is accessed which includes content of a particular type of media. Text is determined corresponding to the content and natural language processing is performed on the text to identify at least a subset of words in a statement within the text and determine meanings of each word in the subset of words. A context image is generated for the electronic artifact based on the natural language processing, where the context image includes a graph including nodes corresponding to the subset of words and the context image defines relationships between the subset of words.
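The graph structure described above can be illustrated with a minimal sketch. It is not the patented implementation: the stopword list, the `ContextImage` class, the `has_attribute` relation name, and the rule that the first retained word is the key term are all hypothetical stand-ins for the natural language processing step, which in practice would determine word meanings rather than filter by a fixed list.

```python
from dataclasses import dataclass, field

# Hypothetical stopword list (an assumption); a real system would rely on a
# full NLP pipeline to decide which words carry meaning.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "of", "in", "on", "and", "to", "with"}


@dataclass
class ContextImage:
    """Graph over a subset of a statement's words; word order is not stored."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, relation, target) triples


def build_context_image(statement: str) -> ContextImage:
    # Normalize tokens and keep only the words that are not stopwords.
    words = [w.strip(".,;:!?").lower() for w in statement.split()]
    kept = [w for w in words if w and w not in STOPWORDS]

    image = ContextImage()
    if not kept:
        return image

    # Assumption: treat the first retained word as the key term (topic)
    # and every other retained word as an attribute of that topic.
    key_term, attributes = kept[0], kept[1:]
    image.nodes.add(key_term)
    for attr in attributes:
        image.nodes.add(attr)
        image.edges.add((key_term, "has_attribute", attr))
    return image


image = build_context_image("The server is running in the datacenter.")
print(sorted(image.nodes))
print(sorted(image.edges))
```

Because nodes and edges are held in sets, the result keeps only a subset of the statement's words and their relationships, with no trace of the original sentence syntax.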
18 Claims
1. A method comprising:
- accessing, from an index, an electronic artifact comprising content of a particular type of media;
- automatically determining, using a data processor, text corresponding to the content;
- performing natural language processing on the text, using the data processor, to identify at least a subset of words in a statement within the text and determine meanings of each word in the subset of words; and
- generating a context image for the electronic artifact based on the natural language processing, wherein the context image comprises a graph comprising nodes corresponding to the subset of words, the context image comprises a syntax-free representation of the statement, the context image comprises the subset of words but less than all words in the statement, and the context image defines relationships between the subset of words.

Dependent claims: 2-14.
15. A non-transitory computer readable medium having program instructions stored therein, wherein the program instructions are executable by a computer system to perform operations comprising:
- identifying digital media of a particular type;
- determining text statements from content of the digital media;
- performing natural language processing on the text statements to:
  - identify a first word in a particular one of the text statements as a key term in the particular text statement, wherein the key term represents a topic of the particular text statement; and
  - identify a set of second words in the particular text statement representing attributes of the topic;
- generating a context image for the statement, wherein the context image comprises a graph comprising nodes corresponding to the first word and the set of second words, the context image comprises a syntax-free representation of the statement, the context image comprises the first word and set of second words but less than all words in the statement, and defining relationships between the nodes to indicate that the set of second words represent attributes of the topic represented by the first word; and
- determining a similarity score for the particular text statement based on a comparison of the context image with a plurality of other context images generated from other digital media.
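The claim does not say how the similarity score is computed. As one possible illustration only, the sketch below represents each context image as a set of hypothetical (topic, relation, attribute) triples and scores a statement by its best Jaccard overlap with the other images. The metric, the use of the maximum, and the function names are all assumptions.

```python
from typing import FrozenSet, Iterable, Tuple

# A context image reduced to its edges: (topic, relation, attribute) triples.
Edge = Tuple[str, str, str]
Image = FrozenSet[Edge]


def jaccard(a: Image, b: Image) -> float:
    """Share of edges common to both images; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_score(image: Image, others: Iterable[Image]) -> float:
    """Assumed scoring rule: the highest overlap with any other context image."""
    return max((jaccard(image, other) for other in others), default=0.0)


query = frozenset({
    ("server", "has_attribute", "running"),
    ("server", "has_attribute", "datacenter"),
})
corpus = [
    frozenset({("server", "has_attribute", "running")}),
    frozenset({("car", "has_attribute", "red")}),
]
print(similarity_score(query, corpus))  # 0.5
```

Comparing edge sets rather than raw text means two statements with different wording or word order can still score as similar when they attach the same attributes to the same topic.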
16. A system comprising:
- a data processing apparatus;
- a memory element storing data comprising an electronic artifact;
- a text extractor, executable by the data processing apparatus to determine a text statement from content of the electronic artifact;
- a natural language processor, executable by the data processing apparatus to assess the text statement to:
  - determine meanings of a set of words included in the text statement;
  - identify a first word in the set of words as a key term in the text statement, wherein the key term represents a topic of the text statement; and
  - identify a set of second words in the text statement representing attributes of the topic; and
- a context image generator, executable by the data processing apparatus to generate a context image for the text statement, wherein the context image comprises a graph comprising nodes corresponding to the first word and the set of second words, the context image comprises a syntax-free representation of the text statement, the context image comprises the first word and set of second words but less than all words in the text statement, and defining relationships between the nodes to indicate that the set of second words represent attributes of the topic represented by the first word.

Dependent claims: 17, 18.
Specification