Systems and methods for determining relevant information based on document structure

  • US 7,739,279 B2
  • Filed: 12/12/2005
  • Issued: 06/15/2010
  • Est. Priority Date: 12/12/2005
  • Status: Expired due to Fees
First Claim

1. A method of determining information relevant to a location within a first document, the method comprising:

  • receiving a selection of the first document, the first document being received through an input and output interface of a computer;

    identifying at least two structural elements in the first document having a dominance relationship, the identifying being performed by one or more processors of the computer;

    receiving a selection of a first location in the first document from a user through the input and output interface;

    determining surrounding structural elements surrounding the first location, the determining comprising selecting from the at least two structural elements;

    characterizing the surrounding structural elements by the one or more processors;

    characterizing one or more non-surrounding structural elements from among the at least two structural elements not determined to be the surrounding structural elements by the one or more processors;

    characterizing surrounding phrase for frequency of occurrence of a plurality of first terms by the one or more processors;

    characterizing non-surrounding phrases in the first document for the occurrence of the plurality of the first terms by the one or more processors, the non-surrounding phrases being phrases in the first document other than the surrounding phrase;

    associating one or more second documents with the surrounding structural elements based on the characterization of the surrounding structural elements and the one or more non-surrounding structural elements by the one or more processors, wherein the one or more second documents are determined as being similar to the surrounding structural elements and being dissimilar to the one or more non-surrounding structural elements;

    creating representative vectors based on the frequency of occurrence of the first terms in the surrounding structural elements, performing latent semantic analysis (LSA) on the surrounding structural elements, the surrounding structural elements are determined based on explicit or implicit information, the implicit information is determined based on theory of analysis, the theory of analysis is at least one of: Linguistic Discourse Model (LDM), Universal Linguistic Discourse Model (ULDM), Discourse Structures Theory (DST), Rhetorical Structures Theory (RST), and Structure Discourse Representation Theory (SDRT), the characterizing of the surrounding structural elements is based on similarity of the representative vectors, the representative vectors are used to select additional documents that are similar in meaning to the surrounding structure elements but are dissimilar to the non-surrounding structure elements, wherein the additional documents are in association with the first location; and

    removing a second group of the one or more second documents from among first groups of the one or more second documents to obtain a third group of the one or more documents, wherein the removing is based on the characterizing the surrounding structure elements.
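
The steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes structural elements are given explicitly as nested character spans, it represents text with raw term-frequency vectors compared by cosine similarity, and it omits both the latent semantic analysis step and any discourse-theory parsing. The names `Element`, `surrounding_elements`, `rank_documents`, and the `threshold` parameter are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical structural element covering the span [start, end).

    `text` holds only the text owned directly by this element,
    not the text of its children.
    """
    name: str
    start: int
    end: int
    text: str
    children: list = field(default_factory=list)


def surrounding_elements(root, location):
    """Return the elements whose spans contain `location`, outermost first."""
    chain = []
    node = root
    while node is not None and node.start <= location < node.end:
        chain.append(node)
        # Descend into the child (if any) that also contains the location.
        node = next(
            (c for c in node.children if c.start <= location < c.end), None
        )
    return chain


def all_elements(root):
    """Yield every element in the tree, parents before children."""
    yield root
    for child in root.children:
        yield from all_elements(child)


def term_vector(text):
    """Term-frequency vector over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_documents(root, location, candidates, threshold=0.0):
    """Score candidates as similar to surrounding, dissimilar to the rest.

    `candidates` maps a document name to its text. Documents whose score
    does not exceed `threshold` are removed; the rest are returned as
    (name, score) pairs, best first.
    """
    inside = surrounding_elements(root, location)
    inside_ids = {id(e) for e in inside}
    near = Counter()
    far = Counter()
    for element in all_elements(root):
        target = near if id(element) in inside_ids else far
        target.update(term_vector(element.text))
    scored = []
    for name, text in candidates.items():
        vec = term_vector(text)
        score = cosine(vec, near) - cosine(vec, far)
        if score > threshold:
            scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this sketch the difference of the two cosine scores stands in for "similar to the surrounding structural elements and dissimilar to the non-surrounding structural elements", and the threshold filter stands in for removing one group of candidate documents to obtain the final group. A fuller version would project the vectors into a reduced LSA space before comparing them.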
