Document retrieval device
First Claim
1. A document retrieval device that classifies a group of documents, each document having structural elements arranged in a logical hierarchical format, each structural element including at least one of a heading and a content, the documents stored in a document storage device, the retrieval device comprising:
logical structure analysis means for analyzing the logical hierarchical format of the documents and for obtaining structural elements and a hierarchical relationship between the structural elements within each document;
classification unit designation means for designating a classification unit showing which level the structural elements to be classified are at within the hierarchical relationship;
fundamental vector generation means for extracting a key word from the content of each structural element of the classification unit that is designated by the classification unit designation means, and for generating a fundamental vector based on the extracted key word;
heading vector generation means for extracting a key word from the heading of each structural element that is superordinate to the structural element used to generate the fundamental vector within said hierarchical relationship, and for generating a heading vector based on the extracted key word;
vector synthesis means for generating, for each structural element of the classification unit, a composite vector based on the corresponding fundamental and heading vectors; and
classification means for calculating a degree of similarity among the composite vectors and for classifying the structural elements of the documents based on the degree of similarity.
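The elements of claim 1 can be illustrated with a minimal sketch. The claim does not fix a key word extraction method, a synthesis formula, a similarity measure, or a grouping procedure, so everything concrete here is an assumption made for illustration: the `Element` class, the stopword-filtered term counts, the weighted sum controlled by `heading_weight`, cosine similarity, and the greedy threshold grouping in `classify`.

```python
import math
from dataclasses import dataclass, field

STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is"}

@dataclass
class Element:
    """Hypothetical structural element: a heading, a content, and child elements."""
    heading: str = ""
    content: str = ""
    children: list = field(default_factory=list)

def keyword_vector(text):
    """Assumed key word extraction: count lowercase tokens that are not stopwords."""
    vec = {}
    for raw in text.split():
        word = raw.strip(".,;:()").lower()
        if word and word not in STOPWORDS:
            vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def subtree_text(node):
    """Content of an element together with the content of its descendants."""
    return " ".join([node.content] + [subtree_text(c) for c in node.children])

def composite_vectors(root, unit_level, heading_weight=0.5):
    """Return (element, composite vector) pairs for elements at unit_level (root is level 0)."""
    out = []

    def walk(node, level, ancestor_headings):
        if level == unit_level:
            fundamental = keyword_vector(subtree_text(node))
            heading = keyword_vector(" ".join(ancestor_headings))
            # Assumed synthesis: fundamental vector plus down-weighted heading vector.
            composite = dict(fundamental)
            for word, weight in heading.items():
                composite[word] = composite.get(word, 0.0) + heading_weight * weight
            out.append((node, composite))
            return
        for child in node.children:
            walk(child, level + 1, ancestor_headings + [node.heading])

    walk(root, 0, [])
    return out

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def classify(vectors, threshold=0.3):
    """Greedy grouping: an element joins the first group whose first member is similar enough."""
    groups = []
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vectors[group[0]], vec) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups
```

Because the heading vector is built only from superordinate headings, two elements with identical content under different parent headings receive similar but not identical composite vectors.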
Abstract
Parts of documents are retrieved using the entire context of selected documents. A classification unit designation section designates a classification unit. A logical structure analysis section analyzes the logical structure of documents read in from a document storing section where the documents are stored. A fundamental vector generation section partitions the logical structure of the documents according to the classification unit, extracts key words, and generates fundamental vectors. A heading vector generation section extracts key words from the headings of the structural elements arranged at a higher level of the structure than the structural element of the classification unit that was the target of fundamental vector generation, and generates heading vectors. A vector synthesis section synthesizes the fundamental vectors and heading vectors to generate composite vectors. A composite vector maintenance section attaches each composite vector to the structural element of the classification unit from which it was generated and maintains the resulting associations. A classification section classifies the structural elements of the classification unit based on the degree of similarity of the generated composite vectors. A display section displays the results of classification.
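The logical structure analysis and the partitioning by classification unit described in the abstract might look roughly like the sketch below. The input format is an assumption made only for illustration: a line beginning with n "#" characters is treated as the heading of a level-n structural element, and any other non-empty line is content of the most recent element. The names `Node`, `analyze_structure`, and `elements_at` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical structural element with its level in the hierarchy."""
    level: int
    heading: str
    content: list = field(default_factory=list)
    children: list = field(default_factory=list)

def analyze_structure(text):
    """Build the element tree; the root (level 0) stands for the whole document."""
    root = Node(level=0, heading="")
    stack = [root]  # path from the root to the element currently being filled
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            level = len(stripped) - len(stripped.lstrip("#"))
            node = Node(level=level, heading=stripped.lstrip("#").strip())
            # Close every open element at the same or a deeper level.
            while stack[-1].level >= level:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        else:
            stack[-1].content.append(stripped)
    return root

def elements_at(root, unit_level):
    """Return (superordinate headings, element) for each element at the designated level."""
    found = []

    def walk(node, path):
        if node.level == unit_level:
            found.append((path, node))
            return
        next_path = path + [node.heading] if node.heading else path
        for child in node.children:
            walk(child, next_path)

    walk(root, [])
    return found
```

Each returned pair carries what the later stages need: the element itself for fundamental vector generation, and the list of superordinate headings for heading vector generation.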
16 Claims
4. A document retrieval device that retrieves structural elements of documents, each document having structural elements arranged in a logical hierarchical format, each structural element including at least one of a heading and a content, the documents stored in a document storage device, the retrieval device comprising:
logical structure analysis means for analyzing the logical hierarchical format of said documents and for obtaining structural elements and a hierarchical relationship between the structural elements within each document;
retrieval unit designation means for designating a retrieval unit, the retrieval unit showing which level the structural elements to be retrieved are at within the hierarchical relationship;
fundamental vector generation means for extracting a key word from the content of each structural element of the retrieval unit designated by the retrieval unit designation means, and for generating a fundamental vector based on the extracted key word;
heading vector generation means for extracting a key word from the heading of each structural element that is superordinate to the structural element used to generate the fundamental vector within said hierarchical relationship, and for generating a heading vector based on the extracted key word;
vector synthesis means for generating, for each structural element of the retrieval unit, a composite vector based on the corresponding fundamental and heading vectors;
query input means for inputting a query;
query vector generation means for generating a query vector from the query; and
retrieval means for calculating a degree of similarity between the query vector and each composite vector and for retrieving the structural elements of the retrieval unit having a predetermined degree of similarity relative to composite vectors.
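The query side of claim 4 can be sketched as follows, assuming composite vectors have already been generated and are stored as dictionaries keyed by a label for each structural element. The claim does not name a similarity measure or say how the "predetermined degree of similarity" is set, so cosine similarity and the `min_similarity` threshold are assumptions, as are the function names.

```python
import math

def query_vector(query):
    """Assumed query vector generation: lowercase term counts."""
    vec = {}
    for word in query.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve(query, composites, min_similarity=0.3):
    """Return (label, similarity) pairs at or above the threshold, best match first."""
    q = query_vector(query)
    scored = [(label, cosine(q, vec)) for label, vec in composites.items()]
    hits = [(label, score) for label, score in scored if score >= min_similarity]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical composite vectors: identical content ("install driver") under
# different parent headings, with the heading terms weighted at 0.5.
composites = {
    "Printing/Setup": {"install": 1.0, "driver": 1.0, "printing": 0.5},
    "Scanning/Setup": {"install": 1.0, "driver": 1.0, "scanning": 0.5},
    "Printing/Errors": {"paper": 1.0, "jam": 1.0, "printing": 0.5},
}
results = retrieve("printing driver", composites)
```

In this example the two "Setup" elements have the same content, yet the one under "Printing" ranks higher for the query because its composite vector carries the superordinate heading term.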
7. A document retrieval device that classifies a group of documents, each document having structural elements arranged in a logical hierarchical format, each structural element having a content, the documents stored in a document storage device, the retrieval device comprising:
logical structure analysis means for analyzing the logical hierarchical format of the documents and for obtaining structural elements and a hierarchical relationship between the structural elements within each document;
classification unit designation means for designating a classification unit showing which level the structural elements to be classified are at within the hierarchical relationship;
fundamental vector generation means for extracting a key word from the content of each structural element of the classification unit that is designated by the classification unit designation means, and for generating a fundamental vector based on the extracted key word;
content vector generation means for extracting a key word from the content of each structural element other than the structural element used to generate the fundamental vector, and for generating a content vector for each structural element based on the extracted key word;
vector synthesis means for generating, for each structural element of the classification unit, a composite vector based on the corresponding fundamental and content vectors; and
classification means for calculating a degree of similarity among the composite vectors and for classifying the structural elements of the documents based on the degree of similarity.
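Claim 7 replaces the heading vector with content vectors drawn from the other structural elements. One way to picture the synthesis step is sketched below, under stated assumptions: the elements of one document at the classification unit are given as a flat list of strings, each element's composite vector is its own fundamental vector plus the averaged content vectors of the other elements, and `context_weight` scales that contribution. The claim itself does not specify how the vectors are combined, and the function names are hypothetical.

```python
def keyword_vector(text):
    """Assumed key word extraction: lowercase term counts."""
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def add_scaled(target, source, scale):
    """Add scale * source into target in place."""
    for word, weight in source.items():
        target[word] = target.get(word, 0.0) + scale * weight

def composites_with_context(sections, context_weight=0.25):
    """Combine each section's fundamental vector with the other sections' content vectors."""
    vectors = [keyword_vector(s) for s in sections]
    composites = []
    for i, fundamental in enumerate(vectors):
        composite = dict(fundamental)
        others = [v for j, v in enumerate(vectors) if j != i]
        for content_vector in others:
            # Average over the other sections so long documents do not swamp the element.
            add_scaled(composite, content_vector, context_weight / len(others))
        composites.append(composite)
    return composites
```

The resulting composite vectors can then be compared with any similarity measure, so that an element is classified in light of the rest of the document it belongs to rather than its own content alone.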
13. A document retrieval device that classifies a group of documents, each document having structural elements arranged in a logical hierarchical format, each structural element having a content, the documents stored in a document storage device, the retrieval device comprising:
logical structure analysis means for analyzing the logical hierarchical format of the documents and for obtaining structural elements and a hierarchical relationship between the structural elements within each document;
retrieval unit designation means for designating a retrieval unit, the retrieval unit showing which level the structural elements to be retrieved are at within the hierarchical relationship;
fundamental vector generation means for extracting a key word from the content of each structural element of the retrieval unit designated by the retrieval unit designation means, and for generating a fundamental vector based on the extracted key word;
content vector generation means for extracting a key word from the content of each structural element other than the structural element used to generate the fundamental vector, and for generating a content vector for each structural element based on the extracted key word;
vector synthesis means for generating, for each structural element of the retrieval unit, a composite vector based on the corresponding fundamental and content vectors;
query input means for inputting a query;
query vector generation means for generating a query vector from the query; and
retrieval means for calculating a degree of similarity between the query vector and each composite vector and for retrieving the structural elements of the retrieval unit having a predetermined degree of similarity relative to composite vectors.
Specification