System and method for database tomography

  • US 5,440,481 A
  • Filed: 10/28/1992
  • Issued: 08/08/1995
  • Est. Priority Date: 10/28/1992
  • Status: Expired due to Fees
First Claim

1. A system for full-text database searching, for identification of often repeated phrases which by virtue of their repeated occurrence, frequency of occurrence above a user-set threshold, or user input constitute phrases having a high user-interest designated as pervasive theme areas (PTAs), said phrases consisting of one to n words (n*words), where n is an integer, in one or more documents defined as the database, relationships defined as connectivity among said PTAs, and phrases in close physical proximity to and which are supportive of said PTAs, comprising:

  • means for introducing document information content into a full-text database in digital form;

    means for digitally storing said database;

    means for processing said digitally stored database;

    means operatively associated with said processing means and said storing means for identifying pervasive theme areas (PTAs) defined as often-repeating word phrases consisting of one or more adjacent words such that said phrases are one word phrases, adjacent 2 word phrases, adjacent 3 word phrases . . . and adjacent n* word phrases, and for entering said phrases in said storing means;

    means for identifying phrases in said database related to said PTAs, said phrases being defined as m words, where m=1,2,3, . . . n and where each word phrase for m=2,3, . . . n is composed of adjacent words, said word phrase for m=1 being a single word phrase, for m=2 a double word phrase, for m=3 a triple word phrase . . . and for n=m an nth word phrase, by applying a user specified range of interest R expressed as a number of single words appearing both before and after said PTAs, and for storing said identified phrases in said storing means;

    means for counting for each PTA the extracted phrases within said range of said PTA stored in said storage means, sorting all phrases found for each PTA by frequency of occurrence, listing each PTA and its related sorted list of extracted phrases, and storing said counts and said lists of PTA's and their related sorted list of extracted phrases in said storing means;

    means for quantifying the strength of relationship between extracted phrases and each pervasive theme area (PTA) applying user-predefined numerical indices and figures of merit, and providing the results of said quantifying means to said storing means;

    means for obtaining the results of said quantification from said quantifying means and said storing means and presenting said results to said user for user-selection of phrases having a relationship to each PTA predicated on the relationship strengths obtained by said quantifying means;

    means for identifying PTAs which are closely related, said means employing user-input figure of merit threshold values above a user-predetermined number for selecting phrases of high-user interest, said means storing identified closely related PTAs in said storing means;

    means for identifying phrases in common among PTA and storing those identified in said storing means;

    means for identifying and grouping related PTA based upon the number of phrases in common among the PTA, said number being above a user-input predetermined threshold, each group having at least one PTA having extracted phrases in common with one or more other PTA in said group, said groupings of PTA's stored in said storing means; and

    means for displaying relationships among related PTA and between PTA and related phrases, said display means connected to said processing means.
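
The processing elements recited above (identifying often-repeated phrases, extracting phrases within a range R of each PTA, counting and sorting them, and grouping PTAs by phrases in common) can be illustrated with a minimal Python sketch. This is not the patented implementation. All function names, the regex tokenizer, the strict "greater than" threshold comparisons, and the use of connected components for grouping are assumptions made for illustration only. The claimed figures of merit are user-predefined and not specified in the claim, so they are omitted.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(words, n):
    """Return every phrase of n adjacent words, joined by single spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def find_ptas(words, max_n, threshold):
    """Candidate PTAs: phrases of 1..max_n adjacent words whose count exceeds threshold."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(words, n))
    return {phrase: c for phrase, c in counts.items() if c > threshold}


def related_phrases(words, pta, radius, max_m):
    """Count phrases of 1..max_m adjacent words found within `radius` words
    before or after each occurrence of `pta`, sorted by descending frequency."""
    target = pta.split()
    k = len(target)
    counts = Counter()
    for i in range(len(words) - k + 1):
        if words[i:i + k] != target:
            continue
        before = words[max(0, i - radius):i]
        after = words[i + k:i + k + radius]
        for window in (before, after):
            for m in range(1, max_m + 1):
                counts.update(ngrams(window, m))
    return counts.most_common()


def group_ptas(related, min_shared):
    """Group PTAs that share more than `min_shared` related phrases.
    `related` maps each PTA to a set of phrases. Groups are connected
    components of the "shares enough phrases" relation (an assumption)."""
    ptas = list(related)
    parent = {p: p for p in ptas}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for idx, a in enumerate(ptas):
        for b in ptas[idx + 1:]:
            if len(related[a] & related[b]) > min_shared:
                parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in ptas:
        groups[find(p)].add(p)
    # Keep only groups in which a PTA is linked to at least one other PTA.
    return [g for g in groups.values() if len(g) > 1]
```

In this sketch, a caller would tokenize the database text once, pass the word list to find_ptas with a chosen maximum phrase length and threshold, call related_phrases for each resulting PTA with a range R, and hand the resulting phrase sets to group_ptas.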
