Method for automatically indexing documents
Abstract
A method for retrieving, from a set of base documents, those documents that contain a given search term and in which the search term has a given corresponding meaning, so that an index can be built on the retrieved documents. The method includes searching the set of base documents for those that contain the search term, and evaluating whether the search term in each found base document has the corresponding meaning. The evaluation includes generating a text document that represents the elements surrounding the search term and their absolute or relative positions with respect to the search term; inputting the text document into a trainable classifying apparatus; and classifying the inputted text document to judge whether the search term has the inputted meaning.
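The "text document" that encodes surrounding elements and their relative positions can be illustrated with a minimal sketch. The encoding below (a signed token offset joined to the token, such as `-1_invoice`), the `window` parameter, and the function name `context_document` are assumptions made for illustration only; the abstract does not prescribe a particular format.

```python
import re

def context_document(text, term, window=3):
    """Return one position-coded context string per occurrence of `term`.

    Hypothetical encoding: each neighbouring token becomes
    "<signed offset>_<token>", where the offset is counted in tokens
    relative to the search term.
    """
    tokens = re.findall(r"\w+", text.lower())
    docs = []
    for i, tok in enumerate(tokens):
        if tok != term.lower():
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        # Skip the search term itself (offset 0); keep only its neighbours.
        feats = [f"{j - i:+d}_{tokens[j]}" for j in range(lo, hi) if j != i]
        docs.append(" ".join(feats))
    return docs

print(context_document("Invoice date: 12 March 2004, due in 30 days", "date", window=2))
# prints ['-1_invoice +1_12 +2_march']
```

Because the offset is part of each token, a classifier that treats the string as a bag of words still sees where each element sat relative to the search term.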
18 Claims
1. A method for retrieving based on a search term together with its corresponding meaning from a set of base documents those documents which contain said search term and in which said search term has said corresponding meaning, said method comprising:
searching, utilizing a computer, for those base documents among said set of base documents which contain said search term;
evaluating, utilizing the computer, the found base documents as to whether said search term contained in said found base documents has said corresponding meaning, said evaluation comprising:
generating, utilizing the computer, a text document to represent elements surrounding the search term and the elements' corresponding relative position with respect to said search term, said elements' relative position with respect to said search term comprising where the elements are located in the surrounding area of the search term, as compared with where the search term is located;
inputting, utilizing the computer, said text document into a trainable classifying apparatus which has been trained to recognize whether said search term in each said found base document has said corresponding meaning, whereas said training has been performed based on a training sample of said found base documents which have been generated for documents in which the search term surrounded by the surrounding elements has said corresponding meaning inputted by said user; and
classifying, utilizing the computer, each said found base document to judge whether said search term in each said found base document has said corresponding meaning;
generating a database from the elements and their corresponding meaning.
(Dependent claims: 2, 3, 4, 5, 6, 7)

8. A method of training a classifying apparatus to retrieve based on a search term together with its corresponding meaning from a set of base documents those documents which contain said search term and in which said search term has said corresponding meaning, said method of training comprising:
looking, utilizing a computer, for base documents in which said search term has said corresponding meaning;
selecting, utilizing the computer, said search term by the user;
repeating said looking and selecting until a sufficient set of base documents has been selected to generate a training sample;
generating, utilizing the computer and the training sample, text documents to represent elements surrounding the search term and the elements' corresponding relative positions with respect to said search term, said elements' relative positions with respect to said search term comprising where the elements are located in the surrounding area of the search term, as compared with where the search term is located; and
using, utilizing the computer, said generated text documents as said training set for training said classifying apparatus by running said classifying apparatus in the training mode;
generating a database from the elements and their corresponding meaning.

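The training step of claim 8 can be sketched as follows. The claims do not name a classification algorithm, so a small multinomial Naive Bayes model is swapped in here purely as a stand-in for the "trainable classifying apparatus". The class name `ContextClassifier`, the `-1_invoice` style context tokens, and the four training strings are all hypothetical.

```python
import math
from collections import Counter

class ContextClassifier:
    """Tiny multinomial Naive Bayes over position-coded context tokens.

    Assumption: this stands in for the trainable classifying apparatus;
    the source does not specify the underlying model.
    """

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}  # token counts per label
        self.docs = Counter()                              # document counts per label

    def train(self, samples):
        # samples: iterable of (context_document, has_meaning) pairs
        for doc, label in samples:
            self.docs[label] += 1
            self.counts[label].update(doc.split())

    def has_meaning(self, doc):
        vocab = set(self.counts[True]) | set(self.counts[False])
        scores = {}
        for label in (True, False):
            total = sum(self.counts[label].values())
            # Log prior and log likelihoods, both with add-one smoothing.
            score = math.log((self.docs[label] + 1) / (sum(self.docs.values()) + 2))
            for tok in doc.split():
                score += math.log((self.counts[label][tok] + 1) / (total + len(vocab) + 1))
            scores[label] = score
        return scores[True] > scores[False]

# Hypothetical training sample: contexts of the term "date", labelled by
# whether the user confirmed the meaning "calendar date".
training_sample = [
    ("-1_invoice +1_12 +2_march", True),
    ("-1_delivery +1_3 +2_april", True),
    ("-1_a +1_with +2_friends", False),
    ("-1_blind +1_last +2_night", False),
]
clf = ContextClassifier()
clf.train(training_sample)
print(clf.has_meaning("-1_invoice +1_7 +2_april"))  # prints True
print(clf.has_meaning("-1_blind +1_with +2_me"))    # prints False
```

In this sketch, the "repeating said looking and selecting" step corresponds to growing `training_sample` until it covers enough contexts; what counts as sufficient is left open by the claim.
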
9. A method for automatically indexing a set of base documents based on a set of training examples, said automatic indexing comprising:
evaluating, using a computer, said base documents by checking for all elements respectively contained therein which meet predefined criteria, and whether the elements have a corresponding meaning, said evaluation comprising:
for those elements to be checked, generating a text document based on said element to be checked and its surrounding elements coding for the surrounding elements' corresponding relative positions with respect to said element to be checked, said relative positions with respect to said elements to be checked comprising where the surrounding elements are located as compared with where the elements to be checked are located;
inputting, using the computer, said text documents into a trainable classifying apparatus which has been trained to recognize whether an inputted text document belongs to a corresponding classification category or not, whereas said training has been performed based on a training sample of text documents which have been generated for documents in which the element to be checked surrounded by the surrounding elements has said corresponding meaning;
judging, using the computer, by said trainable classifying apparatus whether said element has said corresponding meaning; and
for those base documents where elements have been found to have said corresponding meaning, building, utilizing the computer, an index indexing said large volume of base documents, the index comprising said elements with a corresponding reference to the document in which the elements are contained.

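The indexing loop of claim 9 might look like the sketch below. The function `build_index`, its parameters, the sample documents, and the two lambda helpers are hypothetical; in particular, `looks_like_invoice_no` is a hand-written rule standing in for a trained classifier so that the example stays self-contained.

```python
import re
from collections import defaultdict

def build_index(documents, criterion, judge, window=2):
    """Map each accepted element to the documents that contain it.

    documents: {doc_id: text}
    criterion: token -> bool, the "predefined criteria" an element must meet
    judge:     context string -> bool, stand-in for the trained classifier
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        tokens = re.findall(r"\w+", text.lower())
        for i, tok in enumerate(tokens):
            if not criterion(tok):
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            # Position-coded context of the element being checked.
            context = " ".join(f"{j - i:+d}_{tokens[j]}" for j in range(lo, hi) if j != i)
            if judge(context):
                index[tok].add(doc_id)
    return {element: sorted(ids) for element, ids in index.items()}

docs = {
    "a.txt": "Invoice number 4711 issued in March",
    "b.txt": "Call extension 4711 for support",
}
is_number = lambda tok: tok.isdigit()
# Assumed rule in place of a classifier: accept a number only when
# "invoice" appears two tokens before it.
looks_like_invoice_no = lambda ctx: "-2_invoice" in ctx.split()

print(build_index(docs, is_number, looks_like_invoice_no))
# prints {'4711': ['a.txt']}
```

The same token, 4711, appears in both documents, but only the occurrence whose context matches the learned meaning is indexed.
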
10. A computer program comprising computer program code for enabling a computer to retrieve, based on a search term together with its corresponding meaning, from a set of base documents, those documents which contain said search term and in which said search term has said corresponding meaning, said computer program code comprising:
a computer configured for:
searching for those base documents among said set of base documents which contain said search term;
evaluating the found base documents as to whether said search term contained in said found base documents has said corresponding meaning, said evaluation comprising:
generating a text document to represent elements surrounding the search term and the elements' corresponding relative position with respect to said search term, said elements' relative position with respect to said search term comprising where the elements are located in the surrounding area of the search term, as compared with where the search term is located;
inputting said text document into a trainable classifying apparatus which has been trained to recognize whether said search term in each said found base document has said corresponding meaning, whereas said training has been performed based on a training sample of said found base documents which have been generated for documents in which the search term surrounded by the surrounding elements has said corresponding meaning inputted by said user; and
classifying said found base document to judge whether said search term in said found base document has said corresponding meaning,
generating a database from the elements and their corresponding meaning.
(Dependent claims: 11, 12, 13, 14, 15, 16)

17. A system of training a classifying apparatus to retrieve based on a search term together with its corresponding meaning from a set of base documents those documents which contain said search term and in which said search term has said corresponding meaning, said method of training comprising:
a computer configured for:
looking for base documents in which said search term has said corresponding meaning;
selecting said search term by the user;
repeating said looking and selecting until a sufficient set of base documents has been selected to generate a training sample;
generating, utilizing the training sample, text documents to represent elements surrounding the search term and the elements' corresponding relative positions with respect to said search term, said elements' relative positions with respect to said search term comprising where the elements are located in the surrounding area of the search term, as compared with where the search term is located; and
using, utilizing the computer, said generated text documents as said training set for training said classifying apparatus by running said classifying apparatus in the training mode;
generating a database from the elements and their corresponding meaning.

18. A system for automatically indexing a set of base documents based on a set of training examples, said automatic indexing comprising:
a computer configured for:
evaluating said base documents by checking for some or all elements respectively contained therein which meet predefined criteria, and whether the elements have a corresponding meaning, said evaluation comprising:
for those elements to be checked, generating a text document based on said element to be checked and its surrounding elements coding for the surrounding elements' corresponding relative positions with respect to said element to be checked, said relative positions with respect to said elements to be checked comprising where the surrounding elements are located as compared with where the elements to be checked are located;
inputting said text documents into a trainable classifying apparatus which has been trained to recognize whether an inputted text document belongs to a predefined classification category or not, whereas said training has been performed based on a training sample of text documents which have been generated for documents in which the element to be checked surrounded by the surrounding elements has said corresponding meaning;
judging by said trainable classifying apparatus whether said element has said corresponding meaning; and
for those base documents where elements have been found to have said corresponding meaning, building, utilizing the computer, an index indexing said large volume of base documents, the index comprising said elements with a corresponding reference to the document in which the elements are contained.

Specification