Apparatus and computerised method for determining constituent words of a compound word
Abstract
An apparatus, a computer program, and a computerized method for determining the constituent words of a compound word are provided. A compound word is made up of constituent words. When the constituent words comply with split decision criteria, they can be used in separate form. The separate form of the constituent words is then used in a search to retrieve related documents from a document collection.
22 Claims
1. An apparatus for determining constituent words of a compound word, the apparatus comprising:
a document collection;
means for determining, from the document collection, a number of documents containing the compound word;
means for determining, from the document collection, a number of documents containing constituent words constituting the compound word;
means for determining a ratio between the number of documents containing the compound word and the number of documents containing the constituent words constituting the compound word; and
means for splitting the compound word into the constituent words when the ratio is smaller than a threshold value.
10. An information retrieval system comprising:
means for entering at least one search term comprising a compound word;
means for determining, from a document collection, a number of documents containing the compound word;
means for determining, from the document collection, a number of documents containing constituent words constituting the compound word;
means for determining a ratio between the number of documents containing the compound word and the number of documents containing the constituent words constituting the compound word;
means for splitting the compound word into the constituent words when the ratio is smaller than a threshold value; and
means for carrying out a search with the delivered constituent words as search terms.
11. A method for determining constituent words of a compound word, the method comprising the steps of:
determining, from a document collection, a number of documents containing a compound word;
determining, from the document collection, a number of documents containing constituent words constituting the compound word;
determining a ratio between the number of documents containing the compound word and the number of documents containing the constituent words constituting the compound word; and
splitting the compound word into the constituent words when the ratio is smaller than a threshold value.
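For illustration only, the steps of claim 11 can be sketched in Python. This is a minimal sketch, not the patented implementation: the helper names `doc_count` and `split_compound`, the whitespace tokenization, the default threshold of 1.0, and the reading of "documents containing constituent words" as documents that contain every constituent as a separate token are all assumptions made for the example.

```python
def doc_count(documents, words):
    """Count documents whose token set contains every word in `words`.

    Assumption: documents are plain strings tokenized on whitespace.
    """
    wanted = {w.lower() for w in words}
    return sum(1 for doc in documents if wanted <= set(doc.lower().split()))


def split_compound(documents, compound, constituents, threshold=1.0):
    """Return the constituents if the document-count ratio is below the
    threshold, otherwise return the compound word unchanged."""
    n_compound = doc_count(documents, [compound])
    n_parts = doc_count(documents, constituents)
    if n_parts == 0:
        # No document contains the separate form, so the ratio is
        # undefined; keeping the compound is an assumption of this sketch.
        return [compound]
    ratio = n_compound / n_parts
    return list(constituents) if ratio < threshold else [compound]


# Hypothetical toy collection.
docs = [
    "the data base is large",
    "a data base stores records",
    "our database is fast",
    "the base of the tower",
]

# One document contains "database"; two contain both "data" and "base".
print(split_compound(docs, "database", ["data", "base"]))
print(split_compound(docs, "database", ["data", "base"], threshold=0.25))
```

In this toy collection the ratio is 1/2, so the word is split under the default threshold and left intact under the stricter threshold of 0.25.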
20. A method for information retrieval, the method comprising the steps of:
entering at least one search term comprising a compound word;
determining, from a document collection, a number of documents containing the compound word;
determining, from the document collection, a number of documents containing constituent words constituting the compound word;
determining a ratio between the number of documents containing the compound word and the number of documents containing the constituent words constituting the compound word;
splitting the compound word into the constituent words when the ratio is smaller than a threshold value;
delivering the constituent words; and
carrying out a search with the delivered constituent words as search terms.
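The retrieval steps of claim 20 might look as follows in Python. This is a hedged sketch under stated assumptions: the function names, the `known_splits` lookup table that supplies candidate constituents, the whitespace tokenization, and the conjunctive (all terms must match) search are hypothetical choices for the example and are not taken from the claims.

```python
def tokens(text):
    """Lower-cased whitespace tokens of a document (an assumption)."""
    return set(text.lower().split())


def count_docs(documents, words):
    """Number of documents that contain every word in `words`."""
    wanted = {w.lower() for w in words}
    return sum(1 for doc in documents if wanted <= tokens(doc))


def search(documents, query_terms, known_splits, threshold=1.0):
    """Split compound query terms when the ratio test passes, then search.

    `known_splits` is a hypothetical dict mapping a compound word to its
    candidate constituent words. Returns (search_terms, matching_indices).
    """
    terms = []
    for term in query_terms:
        parts = known_splits.get(term)
        if parts:
            n_compound = count_docs(documents, [term])
            n_parts = count_docs(documents, parts)
            if n_parts and n_compound / n_parts < threshold:
                terms.extend(parts)  # deliver the constituent words
                continue
        terms.append(term)  # keep the term as entered
    wanted = {t.lower() for t in terms}
    hits = [i for i, doc in enumerate(documents) if wanted <= tokens(doc)]
    return terms, hits


# Hypothetical toy collection and split table.
docs = [
    "the data base is large",
    "a data base stores records",
    "our database is fast",
]
splits = {"database": ["data", "base"]}

print(search(docs, ["database"], splits))
print(search(docs, ["database"], splits, threshold=0.25))
```

With the default threshold the query is rewritten to the separate form and matches the first two documents; with a threshold of 0.25 the compound is kept and only the third document matches.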
21. A computer program product embodied on at least one computer-readable medium, for determining constituent words of a compound word, the product comprising computer-executable instructions for:
determining, from a document collection, a number of documents containing a compound word;
determining, from the document collection, a number of documents containing constituent words constituting the compound word;
determining a ratio between the number of documents containing the compound word and the number of documents containing the constituent words constituting the compound word; and
splitting the compound word into the constituent words when the ratio is smaller than a threshold value.
22. A computer program product embodied on at least one computer-readable medium, for retrieving information, the product comprising computer-executable instructions for:
entering at least one search term comprising a compound word;
determining, from a document collection, a number of documents containing the compound word;
determining, from the document collection, a number of documents containing constituent words constituting the compound word;
determining a ratio between the number of documents containing the compound word and the number of documents containing the constituent words constituting the compound word;
splitting the compound word into the constituent words when the ratio is smaller than a threshold value;
delivering the constituent words; and
carrying out a search with the delivered constituent words as search terms.
Specification