Automatic identification of document versions

US 8,315,997 B1
Filed: 08/28/2008
Issued: 11/20/2012
Est. Priority Date: 08/28/2007
Status: Active Grant

Abstract

A computer-implemented method for document management includes extracting from an input document a set of terms, each term including a fixed number of words. Respective numbers of the terms that occur in each of a group of stored documents are counted, and a respective association rate is computed between the input document and each of at least some of the stored documents responsively to the respective numbers of the terms that were counted in the stored documents. One or more of the stored documents are identified as versions of the input document responsively to the association rate, and an identification of the stored documents that are versions of the input document is outputted.
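
As a rough illustration of the counting step described in the abstract, the sketch below extracts fixed-length word terms from an input document and counts how many of them occur in each stored document. The abstract does not define the association rate, so the ratio used here (the share of input terms found in the stored document), the term length of three words, the 0.5 threshold, and all function names are assumptions made for illustration only.

```python
import re

def extract_terms(text, words_per_term=3):
    """Return the set of terms, each a run of `words_per_term` consecutive words."""
    words = re.findall(r"\w+", text.lower())
    return {
        " ".join(words[i:i + words_per_term])
        for i in range(len(words) - words_per_term + 1)
    }

def association_rate(input_terms, stored_text, words_per_term=3):
    """Assumed definition: share of input terms that also occur in the stored document."""
    if not input_terms:
        return 0.0
    stored_terms = extract_terms(stored_text, words_per_term)
    matched = len(input_terms & stored_terms)  # number of input terms counted in the stored document
    return matched / len(input_terms)

def find_versions(input_text, stored_docs, threshold=0.5):
    """Return the names of stored documents whose rate meets the (assumed) threshold."""
    input_terms = extract_terms(input_text)
    return [
        name for name, text in stored_docs.items()
        if association_rate(input_terms, text) >= threshold
    ]

# Hypothetical sample data.
stored = {
    "draft_v1": "the quarterly report shows revenue growth in all regions",
    "memo": "please remember to submit your timesheets by friday",
}
input_text = "the quarterly report shows strong revenue growth in all regions"
versions = find_versions(input_text, stored)
```

With this sample data, five of the eight input terms occur in `draft_v1` and none occur in `memo`, so only `draft_v1` is reported as a version.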

6 Claims

1. A computer-implemented method for document management, the method comprising:
- receiving an input document containing an input spreadsheet;
  
  computing a respective measure of similarity between the input spreadsheet and each of a plurality of stored spreadsheets contained in a group of stored documents;
  
  identifying one or more of the stored spreadsheets as versions of the input spreadsheet responsively to the measure of the similarity; and
  
  outputting an identification of the stored documents that are versions of the input document responsively to having identified the one or more of the stored spreadsheets as versions of the input spreadsheet,
  
  wherein computing the respective measure of the similarity comprises extracting respective formulas from the cells of the input and stored spreadsheets and computing respective data values of the cells of the input and stored spreadsheets, and comparing both the formulas and the data values in order to compute the respective measure of the similarity,
  
  wherein comparing both the formulas and the data values comprises computing a first association rate with respect to the formulas and computing a second association rate with respect to the data values, and finding the measure of the similarity as a weighted sum of the first and second association rates.
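
The final limitation of claim 1 can be pictured with a small sketch: one association rate for the formulas, one for the data values, and a weighted sum of the two. The claim does not say how each rate is computed or which weights are used, so the overlap ratio, the 0.6 and 0.4 weights, and the dict-of-cells representation below are all hypothetical.

```python
def overlap_rate(input_items, stored_items):
    """Assumed rate: fraction of distinct input items also present in the stored sheet."""
    input_set = set(input_items)
    if not input_set:
        return 0.0
    return len(input_set & set(stored_items)) / len(input_set)

def spreadsheet_similarity(input_sheet, stored_sheet,
                           formula_weight=0.6, value_weight=0.4):
    """Weighted sum of a formula association rate and a data-value association rate.

    Each sheet is a dict mapping a cell address to a (formula, value) pair;
    formula is None for cells that hold a constant.
    """
    def formulas(sheet):
        return [f for f, _ in sheet.values() if f is not None]

    def values(sheet):
        return [v for _, v in sheet.values()]

    formula_rate = overlap_rate(formulas(input_sheet), formulas(stored_sheet))
    value_rate = overlap_rate(values(input_sheet), values(stored_sheet))
    return formula_weight * formula_rate + value_weight * value_rate

# Hypothetical sheets: same formulas, mostly different data values.
input_sheet = {
    "A1": (None, 10),
    "A2": (None, 20),
    "A3": ("=SUM(A1:A2)", 30),
    "A4": ("=A3*2", 60),
}
stored_sheet = {
    "A1": (None, 10),
    "A2": (None, 25),
    "A3": ("=SUM(A1:A2)", 35),
    "A4": ("=A3*2", 70),
}
similarity = spreadsheet_similarity(input_sheet, stored_sheet)
```

Here the formula rate is 1.0 and the value rate is 0.25, giving a similarity of about 0.7. Weighting formulas more heavily is one way to keep a sheet recognizable as a version after its figures are updated, though the claim itself leaves the weights open.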

2. The method according to claim 1, wherein the input and stored spreadsheets comprise rows and columns, and wherein computing the respective measure of the similarity comprises extracting from the input spreadsheet a first set of row terms from the rows of the input spreadsheet, and extracting from the input spreadsheet a second set of column terms from the columns of the input spreadsheet, and counting respective numbers of the row terms and the column terms that occur in the rows and columns of the stored spreadsheets.
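
The row-term and column-term counting of claim 2 might look like the following sketch. The claim does not say how a term is formed from a row or column; treating a term as a run of two adjacent cell contents, representing a sheet as a rectangular list of lists, and the function names are assumptions for illustration.

```python
def line_terms(lines, size=2):
    """Return the set of terms: tuples of `size` consecutive cells within each line."""
    return {
        tuple(line[i:i + size])
        for line in lines
        for i in range(len(line) - size + 1)
    }

def row_and_column_terms(grid, size=2):
    """Extract row terms from the rows and column terms from the columns of a grid."""
    columns = [list(col) for col in zip(*grid)]  # transpose; assumes a rectangular grid
    return line_terms(grid, size), line_terms(columns, size)

def count_matching_terms(input_grid, stored_grid, size=2):
    """Count input row terms found in stored rows and input column terms found in stored columns."""
    in_rows, in_cols = row_and_column_terms(input_grid, size)
    st_rows, st_cols = row_and_column_terms(stored_grid, size)
    return len(in_rows & st_rows), len(in_cols & st_cols)

# Hypothetical grids that differ only in the Q2 figures.
input_grid = [
    ["Region", "Q1", "Q2"],
    ["North", 100, 120],
    ["South", 80, 95],
]
stored_grid = [
    ["Region", "Q1", "Q2"],
    ["North", 100, 125],
    ["South", 80, 90],
]
row_hits, col_hits = count_matching_terms(input_grid, stored_grid)
```

For these grids, four of the six input row terms and four of the six input column terms occur in the stored sheet. The counts could then feed an association rate such as the ratio of matched terms to extracted terms, which is again an assumption rather than something the claim specifies.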

3. Apparatus for document management, comprising:
- an interface, which is coupled to access documents in one or more data repositories; and
  
  a processor, which is configured to receive an input document containing an input spreadsheet, to compute a respective measure of similarity between the input spreadsheet and each of a plurality of stored spreadsheets contained in a group of stored documents, to identify one or more of the stored spreadsheets as versions of the input spreadsheet responsively to the measure of the similarity, and to output an identification of the stored documents that are versions of the input document responsively to having identified the one or more of the stored spreadsheets as versions of the input spreadsheet,
  
  wherein computing the respective measure of the similarity comprises extracting respective formulas from the cells of the input and stored spreadsheets and computing respective data values of the cells of the input and stored spreadsheets, and comparing both the formulas and the data values in order to compute the respective measure of the similarity,
  
  wherein comparing both the formulas and the data values comprises computing a first association rate with respect to the formulas and computing a second association rate with respect to the data values, and finding the measure of the similarity as a weighted sum of the first and second association rates.

4. The apparatus according to claim 3, wherein the processor is configured to extract respective formulas from the cells of the input and stored spreadsheets and computing respective data values of the cells of the input and stored spreadsheets, and comparing both the formulas and the data values in order to compute the respective measure of the similarity.

5. The apparatus according to claim 3, wherein the input and stored spreadsheets comprise rows and columns, and wherein the processor is configured to extract from the input spreadsheet a first set of row terms from the rows of the input spreadsheet, and to extract from the input spreadsheet a second set of column terms from the columns of the input spreadsheet, and to count respective numbers of the row terms and the column terms that occur in the rows and columns of the stored spreadsheets.

6. A computer software product, comprising a non-transitory computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to receive an input document containing an input spreadsheet, to compute a respective measure of similarity between the input spreadsheet and each of a plurality of stored spreadsheets contained in a group of stored documents, to identify one or more of the stored spreadsheets as versions of the input spreadsheet responsively to the measure of the similarity, and to output an identification of the stored documents that are versions of the input document responsively to having identified the one or more of the stored spreadsheets as versions of the input spreadsheet, wherein computing the respective measure of the similarity comprises extracting respective formulas from the cells of the input and stored spreadsheets and computing respective data values of the cells of the input and stored spreadsheets, and comparing both the formulas and the data values in order to compute the respective measure of the similarity, wherein comparing both the formulas and the data values comprises computing a first association rate with respect to the formulas and computing a second association rate with respect to the data values, and finding the measure of the similarity as a weighted sum of the first and second association rates.

Current Assignee: Nogacom Ltd.
Original Assignee: Nogacom Ltd.
Inventors: Peled, Ariel; Reznikov, Elad; Regev, Yizhar; Brumer, Shai
Primary Examiner(s): Vital, Pierre
Assistant Examiner(s): Obisesan, Augustine K
Application Number: US12/200,089
Time in Patent Office: 1,545 Days
Field of Search: None
US Class Current: 707/695
CPC Class Codes: G06F 40/295 Named entity recognition
