Document data processing method and apparatus for document retrieval

  • US 5,469,354 A
  • Filed: 02/28/1992
  • Issued: 11/21/1995
  • Est. Priority Date: 06/14/1989
  • Status: Expired due to Term
First Claim

1. A document data processing method for retrieving a document containing at least a search term designated by an operator from a document database registering therein document information in terms of character code data while referring to textual content of said document, comprising steps of:

  • upon registration of text documents in said document database, creating condensed texts by decomposing each of textual character strings of the documents to be registered into fragmental character strings on the basis of at least one of character species including katakana character, hiragana character, kanji character, alphabetic character, numeric character, and symbol character and checking mutual inclusion relations possibly existing among said fragmental character strings resulting from said decomposition, to thereby create the condensed texts each constituted by a set of the fragmental character strings in which any character string found to be included by other character string is eliminated;

    creating a component character table in which characters occurring in each of said condensed texts are registered without duplication; and

    registering in said document database said condensed texts together with said component character table in addition to the texts of the document to be registered; and

    upon retrieval of the document containing the designated search term, executing first a component character table search for thereby extracting those documents which contain all species of characters constituting the search term designated by the operator by consulting said component character table;

    executing subsequently a condensed text search by consulting the condensed texts of the documents extracted through said component character table search for extracting only the documents corresponding to the condensed texts which contain the fragmental character strings constituting the search term designated by the operator to thereby select the documents containing the designated search term; and

    executing finally a text body search for extracting a document which satisfies query condition imposed on the search term by consulting the texts of the documents extracted through said component character table search and said condensed text search.
