Discovering relationships in tabular data
First Claim
1. A computer usable program product comprising a non-transitory computer usable storage device including computer usable code for determining relationships in tabular data, the computer usable code comprising:
computer usable code for receiving a set of documents, a document in the set including the tabular data;
computer usable code for applying, to the tabular data, a library of hypotheses specific to a subject-matter domain of the tabular data, each hypothesis in the library representing a hypothetical relationship between hypothetical cells of a hypothetical table, a first hypothesis in the library of hypotheses applying to hypothetical cells in a column of the hypothetical table, a second hypothesis applying to hypothetical cells in a row of the hypothetical table, a third hypothesis repeating in different columns of hypothetical cells of the hypothetical table, a fourth hypothesis repeating in different rows of hypothetical cells of the hypothetical table, wherein the hypotheses in the library are configured such that an applicability of a particular hypothesis to actual cells of the tabular data boosts an applicability of another particular hypothesis to the tabular data;
computer usable code for identifying a markup in the document, the markup relating to a cell in the tabular data;
computer usable code for identifying, using the markup, a selected cell-range in the tabular data;
computer usable code for selecting the cell to determine a dependency of the cell on the cell-range;
computer usable code for selecting, based on the markup, a hypothesis from the library of hypotheses to use in conjunction with the cell and the cell-range;
computer usable code for applying the hypothesis to the cell-range;
computer usable code for evaluating, based on a confidence value, that the hypothesis does not fit the cell-range;
computer usable code for changing, responsive to the evaluating, the cell-range to form an adjusted cell-range;
computer usable code for applying the hypothesis to the adjusted cell-range; and
computer usable code for narrating according to the hypothesis, responsive to the hypothesis fitting the adjusted cell-range, using Natural Language Processing, a functional dependency between the cell and the adjusted cell-range.
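The apply, evaluate, adjust, re-apply, and narrate steps recited above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `Hypothesis` class, the relative-error `confidence` measure, the 0.99 threshold, the strategy of shrinking the range from the top one row at a time, and the template string that stands in for Natural Language Processing narration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Hypothesis:
    """Hypothetical container: a function over a cell-range plus a narration template."""
    name: str
    function: Callable[[List[float]], float]
    template: str


def confidence(h: Hypothesis, cell_value: float, cell_range: List[float]) -> float:
    """Assumed measure: 1.0 for an exact match, falling with relative error."""
    if not cell_range:
        return 0.0
    predicted = h.function(cell_range)
    scale = max(abs(cell_value), abs(predicted), 1e-9)
    return max(0.0, 1.0 - abs(predicted - cell_value) / scale)


def fit_with_adjustment(h: Hypothesis, cell_value: float, column: List[float],
                        start: int, end: int,
                        threshold: float = 0.99) -> Optional[Tuple[int, int]]:
    """Apply h to column[start:end]; if it does not fit, drop the top row and retry."""
    while start < end:
        if confidence(h, cell_value, column[start:end]) >= threshold:
            return (start, end)
        start += 1  # assumed adjustment strategy: shrink the range from the top
    return None


def narrate(h: Hypothesis, cell_label: str, range_labels: List[str]) -> str:
    """Template-based stand-in for the narration step."""
    return h.template.format(cell=cell_label, first=range_labels[0], last=range_labels[-1])


# Illustrative data: the initial range wrongly includes a year header as a value.
sum_hypothesis = Hypothesis("column_total", sum, "{cell} is the sum of {first} through {last}.")
labels = ["Year", "Q1", "Q2", "Q3"]
column = [2012.0, 10.0, 20.0, 30.0]
total_cell = 60.0

fitted = fit_with_adjustment(sum_hypothesis, total_cell, column, 0, len(column))
sentence = narrate(sum_hypothesis, "Total", labels[fitted[0]:fitted[1]]) if fitted else None
print(fitted, sentence)
```

In this sketch the hypothesis fails on the initial range because the year value inflates the sum, and it fits once the range is adjusted to exclude that row. Other adjustment strategies, such as shrinking from the bottom or growing the range, would serve the same illustrative purpose.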
Abstract
A system and a computer program product for discovering relationships in tabular data are provided in the illustrative embodiments. A set of documents is received, a document in the set including the tabular data. A cell in the tabular data is selected whose dependencies are to be determined. A hypothesis to use in conjunction with the cell is selected. Whether the hypothesis applies to a selected portion of the document is tested by determining whether a conclusion in the hypothesis can be computed using a function specified in the hypothesis on the selected portion. The selected portion can be a selected cell-range in the tabular data or content in a non-tabular portion of the document. The hypothesis is then used to describe the cell relative to the selected portion.
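The test described in the abstract, checking whether a hypothesis's conclusion can be computed by its function on a selected portion, might look like the following Python sketch. The helper names `numbers_in_text`, `hypothesis_applies`, and `percent_change`, the tolerance, and the sample values are all hypothetical; the abstract does not say how values are extracted from non-tabular content, so a simple regular expression is assumed here.

```python
import math
import re
from typing import Callable, List, Sequence


def numbers_in_text(text: str) -> List[float]:
    """Assumed extraction step: pull plain decimal numbers out of non-tabular text."""
    return [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", text)]


def hypothesis_applies(function: Callable[[Sequence[float]], float],
                       conclusion: float,
                       portion: Sequence[float],
                       rel_tol: float = 1e-6) -> bool:
    """True if the function, applied to the selected portion, reproduces the conclusion."""
    try:
        computed = function(portion)
    except (ArithmeticError, IndexError, ValueError):
        return False  # the function cannot be evaluated on this portion
    return math.isclose(computed, conclusion, rel_tol=rel_tol)


def percent_change(values: Sequence[float]) -> float:
    """Example hypothesis function: percentage change from the first value to the second."""
    old, new = values[0], values[1]
    return (new - old) / old * 100.0


# Selected portion is a cell-range in the table.
applies_to_range = hypothesis_applies(sum, 60.0, [10.0, 20.0, 30.0])

# Selected portion is content in a non-tabular part of the document.
sentence = "Revenue grew from 100 to 120 units."
applies_to_text = hypothesis_applies(percent_change, 20.0, numbers_in_text(sentence))

print(applies_to_range, applies_to_text)
```

The same `hypothesis_applies` check is used for both kinds of portion; only the way the values are obtained differs.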
74 Citations
9 Claims
1. A computer usable program product, set out in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 8, 9
7. A data processing system for determining relationships in tabular data, the data processing system comprising:
a storage device, wherein the storage device stores computer usable program code; and
a processor, wherein the processor executes the computer usable program code, and wherein the computer usable program code comprises:
computer usable code for receiving a set of documents, a document in the set including the tabular data;
computer usable code for applying, to the tabular data, a library of hypotheses specific to a subject-matter domain of the tabular data, each hypothesis in the library representing a hypothetical relationship between hypothetical cells of a hypothetical table, a first hypothesis in the library of hypotheses applying to hypothetical cells in a column of the hypothetical table, a second hypothesis applying to hypothetical cells in a row of the hypothetical table, a third hypothesis repeating in different columns of hypothetical cells of the hypothetical table, a fourth hypothesis repeating in different rows of hypothetical cells of the hypothetical table, wherein the hypotheses in the library are configured such that an applicability of a particular hypothesis to actual cells of the tabular data boosts an applicability of another particular hypothesis to the tabular data;
computer usable code for identifying a markup in the document, the markup relating to a cell in the tabular data;
computer usable code for identifying, using the markup, a selected cell-range in the tabular data;
computer usable code for selecting the cell to determine a dependency of the cell on the cell-range;
computer usable code for selecting, based on the markup, a hypothesis from the library of hypotheses to use in conjunction with the cell and the cell-range;
computer usable code for applying the hypothesis to the cell-range;
computer usable code for evaluating, based on a confidence value, that the hypothesis does not fit the cell-range;
computer usable code for changing, responsive to the evaluating, the cell-range to form an adjusted cell-range;
computer usable code for applying the hypothesis to the adjusted cell-range; and
computer usable code for narrating according to the hypothesis, responsive to the hypothesis fitting the adjusted cell-range, using Natural Language Processing, a functional dependency between the cell and the adjusted cell-range.
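The claims recite that the applicability of one hypothesis "boosts" the applicability of another but do not state a mechanism. One possible reading, sketched below in Python, adds a fixed increment to a hypothesis's score when a related hypothesis has already been found to apply. The `boost_scores` function, the additive rule, the 0.9 threshold, the hypothesis names, and the score values are all assumptions chosen for illustration.

```python
from typing import Dict, Tuple


def boost_scores(base: Dict[str, float],
                 boosts: Dict[Tuple[str, str], float],
                 applies_at: float = 0.9) -> Dict[str, float]:
    """Return scores where each applicable source hypothesis raises its targets.

    base:   hypothesis name -> applicability score in [0, 1]
    boosts: (source, target) -> increment added to target when source applies
    """
    boosted = dict(base)
    for (source, target), increment in boosts.items():
        # Only a source that applies on its own unboosted score grants a boost.
        if base.get(source, 0.0) >= applies_at and target in boosted:
            boosted[target] = min(1.0, boosted[target] + increment)
    return boosted


# Illustrative values: a column-total hypothesis that clearly applies makes a
# row-total hypothesis on the same table more plausible.
base_scores = {"column_total": 1.0, "row_total": 0.5, "running_average": 0.25}
boost_table = {("column_total", "row_total"): 0.25,
               ("running_average", "column_total"): 0.5}

boosted_scores = boost_scores(base_scores, boost_table)
print(boosted_scores)
```

Because boosts are computed from the unboosted scores, the result does not depend on the order in which the pairs are visited; a chained or iterative scheme would be a different design choice.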
Specification