
Software and method for recognizing similarity of documents written in different languages based on a quantitative measure of similarity

  • US 6,519,557 B1
  • Filed: 06/06/2000
  • Issued: 02/11/2003
  • Est. Priority Date: 06/06/2000
  • Status: Expired due to Term
First Claim

1. A method of identifying different versions of the same structured document comprising steps of:

    reading a first file including text;

    reading a second file including text;

    generating from the first file a first hierarchical structured document using formatting codes and leaf content in the first file, wherein the first hierarchical document includes the text of the first file;

    generating from the second file a second hierarchical structured document using formatting codes and leaf content in the second file, wherein the second hierarchical document includes the text of the second file;

    reading a first portion of text which occupies a first position in the first hierarchical structured document;

    reading a second portion of text which occupies a second position which is congruent to the first position in the second hierarchical structured document; and

    obtaining a quantitative measure of similarity of the first and the second portions of text.
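
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the formatting codes are HTML-like paired tags, it takes the file contents as strings rather than reading files, and it uses a simple character-length ratio as a stand-in for the quantitative measure of similarity, which the claim does not specify. All names (LeafCollector, leaf_positions, length_ratio, compare_documents) are hypothetical.

```python
from html.parser import HTMLParser


class LeafCollector(HTMLParser):
    """Builds a map from hierarchical position to leaf text.

    A position is a tuple of (tag, sibling_index) pairs from the root down,
    so two documents with the same tag structure yield the same positions.
    Assumes well-formed, paired tags (an assumption of this sketch).
    """

    def __init__(self):
        super().__init__()
        self.path = []        # current position in the hierarchy
        self.counters = [{}]  # per-depth count of child tags seen so far
        self.leaves = {}      # position tuple -> leaf text

    def handle_starttag(self, tag, attrs):
        seen = self.counters[-1]
        index = seen.get(tag, 0)
        seen[tag] = index + 1
        self.path.append((tag, index))
        self.counters.append({})

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open tag.
        if self.path and self.path[-1][0] == tag:
            self.path.pop()
            self.counters.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            key = tuple(self.path)
            self.leaves[key] = (self.leaves.get(key, "") + " " + text).strip()


def leaf_positions(markup):
    """Generate the hierarchical view of one document as {position: text}."""
    parser = LeafCollector()
    parser.feed(markup)
    parser.close()
    return parser.leaves


def length_ratio(a, b):
    """Stand-in similarity measure: shorter length divided by longer length."""
    if not a and not b:
        return 1.0
    return min(len(a), len(b)) / max(len(a), len(b))


def compare_documents(markup_a, markup_b):
    """Score the text portions that occupy congruent positions in both documents."""
    leaves_a = leaf_positions(markup_a)
    leaves_b = leaf_positions(markup_b)
    shared = leaves_a.keys() & leaves_b.keys()
    return {pos: length_ratio(leaves_a[pos], leaves_b[pos]) for pos in shared}


if __name__ == "__main__":
    english = "<html><body><h1>Hello</h1><p>Good morning</p></body></html>"
    french = "<html><body><h1>Salut</h1><p>Bonjour à tous</p></body></html>"
    for position, score in sorted(compare_documents(english, french).items()):
        print(position, round(score, 3))
```

A length ratio is used here only because it works across languages without a dictionary; any other measure that maps two text portions to a number could be substituted in length_ratio without changing the rest of the sketch.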

  • 2 Assignments