Automatic identification of document versions
Abstract
A computer-implemented method for document management includes extracting from an input document a set of terms, each term including a fixed number of words. Respective numbers of the terms that occur in each of a group of stored documents are counted, and a respective association rate is computed between the input document and each of at least some of the stored documents responsively to the respective numbers of the terms that were counted in the stored documents. One or more of the stored documents are identified as versions of the input document responsively to the association rate, and an identification of the stored documents that are versions of the input document is outputted.
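The term-counting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give a formula for the association rate, so the definition used here (the fraction of the input document's terms that also occur in a stored document) is an assumption, as are the function names, the term length `n=3`, and the `threshold` value.

```python
import re


def extract_terms(text, n=3):
    """Return the set of terms in text, each term being n consecutive words."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def association_rate(input_terms, stored_terms):
    """Assumed definition: share of input terms that occur in the stored document."""
    if not input_terms:
        return 0.0
    # Count the input terms that occur in the stored document.
    count = len(input_terms & stored_terms)
    return count / len(input_terms)


def find_versions(input_text, stored_docs, n=3, threshold=0.5):
    """Return names of stored documents whose rate meets the (hypothetical) threshold."""
    input_terms = extract_terms(input_text, n)
    rates = {
        name: association_rate(input_terms, extract_terms(body, n))
        for name, body in stored_docs.items()
    }
    # Report the best-matching documents first.
    return sorted(
        (name for name, r in rates.items() if r >= threshold),
        key=lambda name: -rates[name],
    )


stored = {
    "v2": "the quick brown fox jumps over the sleepy dog",
    "other": "completely unrelated text about spreadsheets and formulas",
}
versions = find_versions("the quick brown fox jumps over the lazy dog", stored)
```

In this example only the stored document `v2` shares enough three-word terms with the input to be reported as a version.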
6 Claims
1. A computer-implemented method for document management, the method comprising:
receiving an input document containing an input spreadsheet;
computing a respective measure of similarity between the input spreadsheet and each of a plurality of stored spreadsheets contained in a group of stored documents;
identifying one or more of the stored spreadsheets as versions of the input spreadsheet responsively to the measure of the similarity; and
outputting an identification of the stored documents that are versions of the input document responsively to having identified the one or more of the stored spreadsheets as versions of the input spreadsheet,
wherein computing the respective measure of the similarity comprises extracting respective formulas from the cells of the input and stored spreadsheets and computing respective data values of the cells of the input and stored spreadsheets, and comparing both the formulas and the data values in order to compute the respective measure of the similarity,
wherein comparing both the formulas and the data values comprises computing a first association rate with respect to the formulas and computing a second association rate with respect to the data values, and finding the measure of the similarity as a weighted sum of the first and second association rates.
3. Apparatus for document management, comprising:
an interface, which is coupled to access documents in one or more data repositories; and
a processor, which is configured to receive an input document containing an input spreadsheet, to compute a respective measure of similarity between the input spreadsheet and each of a plurality of stored spreadsheets contained in a group of stored documents, to identify one or more of the stored spreadsheets as versions of the input spreadsheet responsively to the measure of the similarity, and to output an identification of the stored documents that are versions of the input document responsively to having identified the one or more of the stored spreadsheets as versions of the input spreadsheet,
wherein computing the respective measure of the similarity comprises extracting respective formulas from the cells of the input and stored spreadsheets and computing respective data values of the cells of the input and stored spreadsheets, and comparing both the formulas and the data values in order to compute the respective measure of the similarity,
wherein comparing both the formulas and the data values comprises computing a first association rate with respect to the formulas and computing a second association rate with respect to the data values, and finding the measure of the similarity as a weighted sum of the first and second association rates.
6. A computer software product, comprising a non-transitory computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to receive an input document containing an input spreadsheet, to compute a respective measure of similarity between the input spreadsheet and each of a plurality of stored spreadsheets contained in a group of stored documents, to identify one or more of the stored spreadsheets as versions of the input spreadsheet responsively to the measure of the similarity, and to output an identification of the stored documents that are versions of the input document responsively to having identified the one or more of the stored spreadsheets as versions of the input spreadsheet,
wherein computing the respective measure of the similarity comprises extracting respective formulas from the cells of the input and stored spreadsheets and computing respective data values of the cells of the input and stored spreadsheets, and comparing both the formulas and the data values in order to compute the respective measure of the similarity, wherein comparing both the formulas and the data values comprises computing a first association rate with respect to the formulas and computing a second association rate with respect to the data values, and finding the measure of the similarity as a weighted sum of the first and second association rates.
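The weighted-sum similarity recited in the independent claims can be sketched as follows. The claims do not specify how each association rate is calculated or how the weights are chosen, so this sketch makes several assumptions: a spreadsheet is modeled as a dictionary mapping a cell address to a `(formula, value)` pair, each association rate is taken as the fraction of the input spreadsheet's distinct items found in the stored spreadsheet, and the two weights sum to 1. All names, including `formula_weight`, are hypothetical.

```python
def association_rate(input_items, stored_items):
    """Assumed definition: share of distinct input items found in the stored sheet."""
    input_set, stored_set = set(input_items), set(stored_items)
    if not input_set:
        return 0.0
    return len(input_set & stored_set) / len(input_set)


def spreadsheet_similarity(input_sheet, stored_sheet, formula_weight=0.5):
    """Weighted sum of the formula rate and the data-value rate.

    Each sheet maps a cell address to (formula or None, computed value).
    """
    in_formulas = [f for f, _ in input_sheet.values() if f is not None]
    st_formulas = [f for f, _ in stored_sheet.values() if f is not None]
    in_values = [v for _, v in input_sheet.values()]
    st_values = [v for _, v in stored_sheet.values()]

    first_rate = association_rate(in_formulas, st_formulas)   # formulas
    second_rate = association_rate(in_values, st_values)      # data values
    return formula_weight * first_rate + (1.0 - formula_weight) * second_rate


input_sheet = {
    "A1": (None, 10),
    "A2": (None, 20),
    "A3": ("=SUM(A1:A2)", 30),
}
stored_sheet = {
    "A1": (None, 10),
    "A2": (None, 25),
    "A3": ("=SUM(A1:A2)", 35),
}
similarity = spreadsheet_similarity(input_sheet, stored_sheet)
```

Here the two sheets share their only formula but just one of three data values, so the formula rate is 1 and the value rate is 1/3. Giving the formulas a larger weight would let a spreadsheet whose data has been updated still score as a version of the input.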
Specification