Method and apparatus for multi-language indexing
Abstract
A method for forming an index comprising indexing features for a plurality of documents includes the steps of identifying each of at least some of the terms present in the documents, generating from each identified term at least one equivalent term which is different from but linguistically related to the identified term, forming for each of the identified terms a first indexing feature comprising the identified term and an identifier of the or each document in which the identified term occurs, forming for each of the equivalent terms a second indexing feature comprising the equivalent term and an identifier of the or each document in which the identified term to which the equivalent term is equivalent occurs, and forming an index comprising the first and second indexing features.
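The steps in the abstract can be illustrated with a minimal sketch, assuming the equivalent terms are translations into a target language, as the independent claims require. The names `TRANSLATIONS`, `identify_terms`, and `build_index` and the toy English-to-French lexicon are hypothetical and are not taken from the patent; a real system would use a full bilingual dictionary or translation component. For simplicity the sketch stores both kinds of indexing feature in one mapping from term to document identifiers and ignores words spelled identically in both languages.

```python
from collections import defaultdict

# Hypothetical toy English -> French lexicon (an assumption for illustration).
TRANSLATIONS = {
    "cat": ["chat"],
    "dog": ["chien"],
    "house": ["maison"],
}

PUNCTUATION = ".,;:!?\"'()"


def identify_terms(text):
    """Identify terms: split on whitespace, strip punctuation, lowercase."""
    words = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [word for word in words if word]


def build_index(documents, translations=TRANSLATIONS):
    """Map each source term and each of its translations to document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in identify_terms(text):
            # First indexing feature: the identified source-language term.
            index[term].add(doc_id)
            # Second indexing feature: each target-language equivalent,
            # pointing at the document where the source term occurred.
            for equivalent in translations.get(term, []):
                index[equivalent].add(doc_id)
    return dict(index)


documents = {1: "The cat sat.", 2: "A dog and a cat.", 3: "The house."}
index = build_index(documents)
```

Here `index["chat"]` holds the same document identifiers as `index["cat"]`, although no document contains the French word.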
27 Claims
1. A method for forming an index comprising indexing features for a plurality of documents, comprising data-processing apparatus implemented steps of:
identifying each of at least some of the terms present in the documents;
generating from each identified term at least one equivalent term which is different from but linguistically related to the identified term;
forming for each of the identified terms a first indexing feature comprising the identified term and an identifier of the or each document in which the identified term occurs;
forming for each of the equivalent terms a second indexing feature comprising the equivalent term and an identifier of the or each document in which the identified term to which the equivalent term is equivalent occurs; and
forming an index comprising the first and second indexing features, wherein the documents are natural language documents in a source language, the at least one equivalent term is a natural language translation of the corresponding identified term in the source language to the equivalent term in a target language different from the source language, and the forming steps include forming the first and second indexing features for the identified term in the source language and the equivalent term in the target language, respectively.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A data-processing apparatus for forming an index comprising indexing features for a plurality of documents, comprising:
means for identifying each of at least some of the terms present in the documents;
means for generating from each identified term at least one equivalent term which is different from but linguistically related to the identified term;
means for forming for each of the identified terms a first indexing feature comprising the identified term and an identifier of the or each document in which the identified term occurs;
means for forming for each of the equivalent terms a second indexing feature comprising the equivalent term and an identifier of the or each document in which the identified term to which the equivalent term is equivalent occurs; and
means for forming an index comprising the first and second indexing features, wherein the documents are natural language documents in a source language, the at least one equivalent term is a natural language translation of the corresponding identified term in the source language to the equivalent term in a target language different from the source language, and the forming steps include forming the first and second indexing features for the identified term in the source language and the equivalent term in the target language, respectively.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
24. A storage medium containing a program for controlling a data processor to perform a method for forming an index comprising indexing features for a plurality of documents, the method comprising the steps of:
identifying each of at least some of the terms present in the documents;
generating from each identified term at least one equivalent term which is different from but linguistically related to the identified term;
forming for each of the identified terms a first indexing feature comprising the identified term and an identifier of the or each document in which the identified term occurs;
forming for each of the equivalent terms a second indexing feature comprising the equivalent term and an identifier of the or each document in which the identified term to which the equivalent term is equivalent occurs; and
forming an index comprising the first and second indexing features, wherein the documents are natural language documents in a source language, the at least one equivalent term is a natural language translation of the corresponding identified term in the source language to the equivalent term in a target language different from the source language, and the forming steps include forming the first and second indexing features for the identified term in the source language and the equivalent term in the target language, respectively.
25. An index comprising indexing features for a plurality of documents, the index formed by a method comprising data-processing apparatus implemented steps of:
identifying each of at least some of the terms present in the documents;
generating from each identified term at least one equivalent term which is different from but linguistically related to the identified term;
forming for each of the identified terms a first indexing feature comprising the identified term and an identifier of the or each document in which the identified term occurs;
forming for each of the equivalent terms a second indexing feature comprising the equivalent term and an identifier of the or each document in which the identified term to which the equivalent term is equivalent occurs; and
forming an index comprising the first and second indexing features, wherein the documents are natural language documents in a source language, the at least one equivalent term is a natural language translation of the corresponding identified term in the source language to the equivalent term in a target language different from the source language, and the forming steps include forming the first and second indexing features for the identified term in the source language and the equivalent term in the target language, respectively.
26. A storage medium containing an index comprising indexing features for a plurality of documents, the index formed by a method comprising the steps of:
identifying each of at least some of the terms present in the documents;
generating from each identified term at least one equivalent term which is different from but linguistically related to the identified term;
forming for each of the identified terms a first indexing feature comprising the identified term and an identifier of the or each document in which the identified term occurs;
forming for each of the equivalent terms a second indexing feature comprising the equivalent term and an identifier of the or each document in which the identified term to which the equivalent term is equivalent occurs; and
forming an index comprising the first and second indexing features, wherein the documents are natural language documents in a source language, the at least one equivalent term is a natural language translation of the corresponding identified term in the source language to the equivalent term in a target language different from the source language, and the forming steps include forming the first and second indexing features for the identified term in the source language and the equivalent term in the target language, respectively.
27. A method for accessing documents comprising data processing apparatus implemented steps of:
forming an index by:
identifying each of at least some of the terms present in the documents;
generating from each identified term at least one equivalent term which is different from but linguistically related to the identified term;
forming for each of the identified terms a first indexing feature comprising the identified term and an identifier of the or each document in which the identified term occurs;
forming for each of the equivalent terms a second indexing feature comprising the equivalent term and an identifier of the or each document in which the identified term to which the equivalent term is equivalent occurs;
forming an index comprising the first and second indexing features; and
accessing the documents by using the index, wherein the documents are natural language documents in a source language, the at least one equivalent term is a natural language translation of the corresponding identified term in the source language to the equivalent term in a target language different from the source language, and the forming steps include forming the first and second indexing features for the identified term in the source language and the equivalent term in the target language, respectively.
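The access step of claim 27 can be sketched as a lookup against such an index, so that a query in the target language retrieves source-language documents. This is an illustrative sketch only: `LEXICON`, `form_index`, and `access_documents` are hypothetical names, the lexicon is a toy example, and requiring every query term to match (a Boolean AND) is an assumption, since the claim does not specify how the index is used.

```python
from collections import defaultdict

# Hypothetical toy English -> French lexicon (an assumption for illustration).
LEXICON = {"cat": ["chat"], "dog": ["chien"]}


def form_index(documents, lexicon):
    """Index each source term and its target-language equivalents."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
            for equivalent in lexicon.get(term, ()):
                index[equivalent].add(doc_id)
    return dict(index)


def access_documents(index, documents, query):
    """Return the documents that match every term in the query."""
    terms = query.lower().split()
    if not terms:
        return {}
    # Intersect the posting sets; an unknown term yields an empty set.
    matching = set.intersection(*(index.get(term, set()) for term in terms))
    return {doc_id: documents[doc_id] for doc_id in sorted(matching)}


documents = {"d1": "cat food", "d2": "dog and cat", "d3": "bird seed"}
index = form_index(documents, LEXICON)
french_hits = access_documents(index, documents, "chien chat")
```

Because the translations were added when the index was formed, the query itself needs no translation at search time; `french_hits` contains only `d2`, the one document mentioning both a dog and a cat.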
Specification