Method and device for document retrieval
Abstract
A method for document retrieval includes the steps of dividing a query character string into partial character strings, selecting one or more documents from a plurality of registered documents such that the one or more documents each include all the partial character strings, computing respective scores of the partial character strings for each of the one or more documents, and computing a score of the query character string from the respective scores of the partial character strings for each of the one or more documents.
36 Claims
1. A method for document retrieval, comprising:
dividing a query character string into partial character strings;
selecting at least one document from a plurality of registered documents such that each of the at least one document includes all the partial character strings;
computing respective scores of the partial character strings for each of the at least one document; and
computing a score of the query character string from the respective scores of the partial character strings for each of the at least one document.
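Read as an algorithm, claim 1 is an AND-filtered retrieval over substrings, followed by per-substring scoring and aggregation. The Python sketch below is one possible instance, not the patented implementation: overlapping character bigrams as the partial strings, the `tf * log(1 + N / df)` weighting, and summation as the combination rule are all assumptions, as are the names `split_query`, `count_occurrences` and `retrieve`.

```python
import math


def split_query(query: str, n: int = 2) -> list:
    """Divide the query into overlapping character n-grams (assumed division rule)."""
    if len(query) <= n:
        return [query]
    return [query[i:i + n] for i in range(len(query) - n + 1)]


def count_occurrences(text: str, sub: str) -> int:
    """Count occurrences of sub in text, allowing overlaps."""
    count, pos = 0, text.find(sub)
    while pos != -1:
        count += 1
        pos = text.find(sub, pos + 1)
    return count


def retrieve(query: str, docs: dict) -> dict:
    """Return {doc_id: query score} for documents containing every partial string."""
    parts = split_query(query)
    total = len(docs)
    # Selection step: keep only documents that include all partial strings.
    selected = {d: t for d, t in docs.items() if all(p in t for p in parts)}
    scores = {}
    for doc_id, text in selected.items():
        part_scores = []
        for p in parts:
            df = sum(1 for t in docs.values() if p in t)  # documents containing p
            tf = count_occurrences(text, p)               # occurrences of p in this document
            part_scores.append(tf * math.log(1 + total / df))  # hypothetical weighting
        # Combination step: summing is an assumed way to derive the query score.
        scores[doc_id] = sum(part_scores)
    return scores
```

For example, `retrieve("retrieval", {"a": "information retrieval", "b": "information theory", "c": "retrieval of text"})` returns scores for "a" and "c" only, since "information theory" lacks bigrams such as "re" and "ev". Note that the AND condition on substrings is necessary but not sufficient for the full query to appear: a document can contain every bigram in scattered positions.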
obtaining a first count indicating how many of the plurality of registered documents include a given one of the partial character strings;
obtaining a second count indicating how many times the given one of the partial character strings appears in a given one of the at least one document; and
obtaining a score of the given one of the partial character strings for the given one of the at least one document from the first count and the second count, such that the score of the given one of the partial character strings increases as the first count decreases and as the second count increases.
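The steps above fix only the direction of the score: it must rise as the first count (how many registered documents contain the partial string) falls and as the second count (how often the string occurs in the given document) rises. A TF-IDF-style product is one familiar function with that property. The sketch below uses `second_count * log(1 + N / first_count)`, which is an assumption rather than the claimed formula; the helper names are hypothetical.

```python
import math


def count_occurrences(text: str, sub: str) -> int:
    """Count occurrences of sub in text, allowing overlaps."""
    count, pos = 0, text.find(sub)
    while pos != -1:
        count += 1
        pos = text.find(sub, pos + 1)
    return count


def partial_string_score(part: str, doc_text: str, all_texts: list) -> float:
    """Score of one partial string for one document (assumed TF-IDF-like form)."""
    first_count = sum(1 for t in all_texts if part in t)  # document frequency
    second_count = count_occurrences(doc_text, part)      # frequency in this document
    if first_count == 0 or second_count == 0:
        return 0.0
    # Decreasing in first_count, increasing in second_count.
    return second_count * math.log(1 + len(all_texts) / first_count)
```

Any other function with the same monotonicity (for example a BM25-style saturation of the second count) would equally satisfy the wording of these steps.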
5. The method as claimed in claim 1, wherein said step of computing respective scores of the partial character strings comprises:
obtaining a first count indicating how many of the plurality of registered documents include a given one of the partial character strings;
obtaining second counts each indicating how many times a corresponding one of the partial character strings appears in a given one of the at least one document;
obtaining a smallest of the second counts; and
obtaining a score of the given one of the partial character strings for the given one of the at least one document from the first count and the smallest of the second counts such that the score of the given one of the partial character strings increases as the first count decreases and as the smallest of the second counts increases.
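Taking the smallest second count has a simple justification: every occurrence of the full query contains each partial string at a distinct position, so the query cannot occur in a document more often than its least frequent partial string. The minimum is therefore an upper bound on the query's occurrence count that can be obtained from per-substring counts alone. A sketch, assuming a TF-IDF-like weighting and hypothetical helper names:

```python
import math


def count_occurrences(text: str, sub: str) -> int:
    """Count occurrences of sub in text, allowing overlaps."""
    count, pos = 0, text.find(sub)
    while pos != -1:
        count += 1
        pos = text.find(sub, pos + 1)
    return count


def min_count_scores(parts: list, doc_text: str, all_texts: list) -> list:
    """Per-partial-string scores that use the smallest second count as the frequency term."""
    # Second counts: occurrences of each partial string in this document.
    second_counts = [count_occurrences(doc_text, p) for p in parts]
    smallest = min(second_counts)  # upper bound on occurrences of the full query
    scores = []
    for p in parts:
        first_count = sum(1 for t in all_texts if p in t)  # per-string document frequency
        idf = math.log(1 + len(all_texts) / first_count) if first_count else 0.0
        scores.append(smallest * idf)
    return scores
```

With `parts = ["ab", "bc"]` and the document `"abcab"`, the second counts are 2 and 1, so both partial strings are weighted by a frequency of 1.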
6. The method as claimed in claim 1, wherein said step of computing respective scores of the partial character strings comprises:
obtaining a first count indicating how many of the plurality of registered documents include a given one of the partial character strings;
obtaining a second count indicating how many times the query character string appears in a given one of the at least one document; and
obtaining a score of the given one of the partial character strings for the given one of the at least one document from the first count and the second count such that the score of the given one of the partial character strings increases as the first count decreases and as the second count increases.
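In claim 6 the frequency term is the occurrence count of the whole query rather than of each partial string, so every partial string shares one second count and the scores differ only through the first counts. A sketch under an assumed TF-IDF-like weighting (names are hypothetical):

```python
import math


def count_occurrences(text: str, sub: str) -> int:
    """Count occurrences of sub in text, allowing overlaps."""
    count, pos = 0, text.find(sub)
    while pos != -1:
        count += 1
        pos = text.find(sub, pos + 1)
    return count


def query_count_scores(query: str, parts: list, doc_text: str, all_texts: list) -> list:
    """Per-partial-string scores whose frequency term is the whole query's count."""
    second_count = count_occurrences(doc_text, query)  # same for every partial string
    scores = []
    for p in parts:
        first_count = sum(1 for t in all_texts if p in t)  # per-string document frequency
        idf = math.log(1 + len(all_texts) / first_count) if first_count else 0.0
        scores.append(second_count * idf)
    return scores
```

One consequence of this reading: a document that holds every partial string but never the full query (for example `"abxbc"` for the query `"abc"`) gets a second count of 0 and therefore a zero score.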
7. The method as claimed in claim 6, wherein said step of obtaining a second count further comprises placing an upper limit on the second count.
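Claim 7 clips the second count at an upper limit. A plausible motivation is to stop a document that merely repeats the query many times from outscoring others without bound. The claim gives no value, so the sketch below uses a fixed, purely illustrative limit of 5 and an assumed TF-IDF-like weighting; `UPPER_LIMIT` and `capped_score` are hypothetical names.

```python
import math

UPPER_LIMIT = 5  # hypothetical cap; the claim does not specify a value


def capped_score(first_count: int, second_count: int, total_docs: int,
                 upper_limit: int = UPPER_LIMIT) -> float:
    """Score with the second count clipped to an upper limit before weighting."""
    if first_count <= 0:
        return 0.0
    clipped = min(second_count, upper_limit)
    return clipped * math.log(1 + total_docs / first_count)
```

Above the limit, extra occurrences no longer change the score: `capped_score(2, 50, 100)` equals `capped_score(2, 5, 100)`.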
8. The method as claimed in claim 1, wherein said step of selecting the at least one document selects the at least one document, each of which includes the query character string, and said step of computing respective scores of the partial character strings comprises:
obtaining a first count indicating how many of the plurality of registered documents include the query character string;
obtaining a second count indicating how many times a given one of the partial character strings appears in a given one of the at least one document; and
obtaining a score of the given one of the partial character strings for the given one of the at least one document from the first count and the second count such that the score of the given one of the partial character strings increases as the first count decreases and as the second count increases.
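Claim 8 moves both the selection and the first count to the level of the whole query: only documents containing the complete query string are selected, and the first count is the number of such documents, while the second count stays per partial string. In the sketch below the caller supplies the partial strings; the TF-IDF-like weighting and all names are assumptions.

```python
import math


def count_occurrences(text: str, sub: str) -> int:
    """Count occurrences of sub in text, allowing overlaps."""
    count, pos = 0, text.find(sub)
    while pos != -1:
        count += 1
        pos = text.find(sub, pos + 1)
    return count


def phrase_selected_scores(query: str, parts: list, docs: dict) -> dict:
    """Return {doc_id: [score per partial string]} for documents containing the query."""
    selected = {d: t for d, t in docs.items() if query in t}
    first_count = len(selected)  # registered documents that include the whole query
    if first_count == 0:
        return {}
    idf = math.log(1 + len(docs) / first_count)  # shared by all partial strings
    return {d: [count_occurrences(t, p) * idf for p in parts] for d, t in selected.items()}
```

Because the selection already guarantees that the query is present, the false positives of a pure substring AND filter do not arise here.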
9. The method as claimed in claim 1, wherein said step of selecting the at least one document selects the at least one document, each of which includes the query character string, and said step of computing respective scores of the partial character strings comprises:
obtaining a first count indicating how many of the plurality of registered documents include the query character string;
computing a limit from the first count;
obtaining a second count indicating how many times the query character string appears in a given one of the at least one document while limiting an upper end of the second count to said limit; and
obtaining a score of a given one of the partial character strings for the given one of the at least one document from the first count and the second count such that the score of the given one of the partial character strings increases as the first count decreases and as the second count increases.
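Claim 9 derives the cap on the query's occurrence count from the first count instead of fixing it in advance. The claim does not say how; the sketch below uses a hypothetical rule, `max(1, ceil(log2(N / first_count)))`, under which rarer query strings may contribute more repetitions. Because both counts are computed for the whole query, every partial string receives the same score in this reading. The weighting and all names are assumptions.

```python
import math


def count_occurrences(text: str, sub: str) -> int:
    """Count occurrences of sub in text, allowing overlaps."""
    count, pos = 0, text.find(sub)
    while pos != -1:
        count += 1
        pos = text.find(sub, pos + 1)
    return count


def limit_from_first_count(first_count: int, total_docs: int) -> int:
    """Hypothetical limit rule: grows as the query string becomes rarer."""
    return max(1, math.ceil(math.log2(total_docs / first_count)))


def limited_scores(query: str, parts: list, doc_text: str, all_texts: list) -> list:
    """Per-partial-string scores with the query count clipped to a df-derived limit."""
    first_count = sum(1 for t in all_texts if query in t)
    if first_count == 0:
        return [0.0 for _ in parts]
    limit = limit_from_first_count(first_count, len(all_texts))
    second_count = min(count_occurrences(doc_text, query), limit)
    idf = math.log(1 + len(all_texts) / first_count)
    return [second_count * idf for _ in parts]
```

For instance, a query found in 1 of 8 documents gets a limit of 3, while one found in 2 of 4 documents gets a limit of 1.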
10. A method for document retrieval, comprising the steps of:
providing respective indexes for documents, each of the respective indexes listing partial character strings found in a corresponding document and respective positions thereof in the corresponding document;
selecting the partial character strings which start with a character string identical to a query character string;
selecting at least one document from the documents such that each of the at least one document includes at least one of the selected partial character strings;
computing respective scores of the selected partial character strings for each of the at least one document; and
computing a score of the query character string from the respective scores of the selected partial character strings for each of the at least one document.
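Claim 10 covers a query that can be shorter than the indexed units: each document's index lists partial strings with their positions, and every indexed string that begins with the query is treated as a match. The sketch below assumes the index holds character bigrams, reads occurrence counts from the lengths of the position lists, applies OR selection over the matched strings, and sums TF-IDF-like scores; all of these choices and the names are illustrative assumptions.

```python
import math
from collections import defaultdict


def build_index(text: str, n: int = 2) -> dict:
    """Hypothetical per-document index: each character n-gram -> list of start positions."""
    index = defaultdict(list)
    for i in range(len(text) - n + 1):
        index[text[i:i + n]].append(i)
    return dict(index)


def prefix_retrieve(query: str, docs: dict, n: int = 2) -> dict:
    """Return {doc_id: query score} using index entries that start with the query."""
    indexes = {d: build_index(t, n) for d, t in docs.items()}
    # Select every indexed partial string whose leading characters equal the query.
    selected = sorted({k for idx in indexes.values() for k in idx if k.startswith(query)})
    total = len(docs)
    scores = {}
    for doc_id, idx in indexes.items():
        hits = [k for k in selected if k in idx]
        if not hits:
            continue  # the document must hold at least one selected string
        score = 0.0
        for k in hits:
            first_count = sum(1 for other in indexes.values() if k in other)
            second_count = len(idx[k])  # occurrences read from the position list
            score += second_count * math.log(1 + total / first_count)
        scores[doc_id] = score
    return scores
```

With documents "cat", "dog" and "cow", the one-character query "c" selects the bigrams "ca" and "co" and retrieves the first and third documents. In this simplified bigram index a query character in the last position of a document has no bigram starting with it and is missed; padding each document would be one way to close that gap.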
obtaining a first count indicating how many of the registered documents include a given one of the selected partial character strings;
obtaining a second count indicating how many times the given one of the selected partial character strings appears in a given one of the at least one document; and
obtaining a score of the given one of the selected partial character strings for the given one of the at least one document from the first count and the second count such that the score of the given one of the selected partial character strings increases as the first count decreases and as the second count increases.
12. The method as claimed in claim 10, wherein said step of computing respective scores of the selected partial character strings comprises:
obtaining a first count indicating how many of the registered documents include a given one of the selected partial character strings;
obtaining a second count indicating how many times the query character string appears in a given one of the at least one document; and
obtaining a score of the given one of the selected partial character strings for the given one of the at least one document from the first count and the second count such that the score of the given one of the selected partial character strings increases as the first count decreases and as the second count increases.
13. A device for document retrieval, comprising:
a dividing unit which divides a query character string into partial character strings;
a document-selection unit which selects at least one document from a plurality of registered documents such that each of the at least one document includes all the partial character strings; and
a score-computation unit which computes respective scores of the partial character strings for each of the at least one document, and further computes a score of the query character string from the respective scores of the partial character strings for each of the at least one document.
means for obtaining a first count indicating how many of the registered documents include a given one of the partial character strings;
means for obtaining a second count indicating how many times the given one of the partial character strings appears in a given one of the at least one document; and
means for obtaining a score of the given one of the partial character strings for the given one of the at least one document from the first count and the second count such that the score of the given one of the partial character strings increases as the first count decreases and as the second count increases.
17. The device as claimed in claim 13, wherein said score-computation unit comprises:
means for obtaining a first count indicating how many of the registered documents include a given one of the partial character strings;
means for obtaining second counts each indicating how many times a corresponding one of the partial character strings appears in a given one of the at least one document;
means for obtaining a smallest of the second counts; and
means for obtaining a score of the given one of the partial character strings for the given one of the at least one document from the first count and the smallest of the second counts such that the score of the given one of the partial character strings increases as the first count decreases and as the smallest of the second counts increases.
18. The device as claimed in claim 13, wherein said score-computation unit comprises:
means for obtaining a first count indicating how many of the registered documents include a given one of the partial character strings;
means for obtaining a second count indicating how many times the query character string appears in a given one of the at least one document; and
means for obtaining a score of the given one of the partial character strings for the given one of the at least one document from the first count and the second count such that the score of the given one of the partial character strings increases as the first count decreases and as the second count increases.
19. The device as claimed in claim 18, wherein said means for obtaining a second count further comprises means for placing an upper limit on the second count.
20. The device as claimed in claim 13, wherein said document-selection unit selects the at least one document each of which includes the query character string, and said score-computation unit comprises:
means for obtaining a first count indicating how many of the registered documents include the query character string;
means for obtaining a second count indicating how many times a given one of the partial character strings appears in a given one of the at least one document; and
means for obtaining a score of the given one of the partial character strings for the given one of the at least one document from the first count and the second count such that the score of the given one of the partial character strings increases as the first count decreases and as the second count increases.
21. The device as claimed in claim 13, wherein said document-selection unit selects the at least one document each of which includes the query character string, and said score-computation unit comprises:
means for obtaining a first count indicating how many of the registered documents include the query character string;
means for computing a limit from the first count;
means for obtaining a second count indicating how many times the query character string appears in a given one of the at least one document while limiting an upper end of the second count to said limit; and
means for obtaining a score of a given one of the partial character strings for the given one of the at least one document from the first count and the second count such that the score of the given one of the partial character strings increases as the first count decreases and as the second count increases.
22. A device for document retrieval, comprising:
a text-dividing unit which provides respective indexes for documents, each of the respective indexes listing partial character strings found in a corresponding document and respective positions thereof in the corresponding document, and which selects the partial character strings which start with a character string identical to a query character string;
a document-selection unit which selects at least one document from the documents such that each of the at least one document includes at least one of the selected partial character strings; and
a score-computation unit which computes respective scores of the selected partial character strings for each of the at least one document, and further computes a score of the query character string from the respective scores of the selected partial character strings for each of the at least one document.
means for obtaining a first count indicating how many of the registered documents include a given one of the selected partial character strings;
means for obtaining a second count indicating how many times the given one of the selected partial character strings appears in a given one of the at least one document; and
means for obtaining a score of the given one of the selected partial character strings for the given one of the at least one document from the first count and the second count such that the score of the given one of the selected partial character strings increases as the first count decreases and as the second count increases.
24. The device as claimed in claim 22, wherein said score-computation unit comprises:
means for obtaining a first count indicating how many of the registered documents include a given one of the selected partial character strings;
means for obtaining a second count indicating how many times the query character string appears in a given one of the at least one document; and
means for obtaining a score of the given one of the selected partial character strings for the given one of the at least one document from the first count and the second count such that the score of the given one of the selected partial character strings increases as the first count decreases and as the second count increases.
25. A computer-readable record medium having a program embodied therein for causing a computer to attend to document retrieval, said program comprising:
a dividing code unit which divides a query character string into partial character strings;
a document-selection code unit which selects at least one document from a plurality of registered documents such that each of the at least one document includes all the partial character strings; and
a score-computation code unit which computes respective scores of the partial character strings for each of the at least one document, and further computes a score of the query character string from the respective scores of the partial character strings for each of the at least one document.
code means for obtaining a first count indicating how many of the registered documents include a given one of the partial character strings;
code means for obtaining a second count indicating how many times the given one of the partial character strings appears in a given one of the at least one document; and
code means for obtaining a score of the given one of the partial character strings for the given one of the at least one document from the first count and the second count such that the score of the given one of the partial character strings increases as the first count decreases and as the second count increases.
29. The computer-readable record medium as claimed in claim 25, wherein said score-computation code unit comprises:
code means for obtaining a first count indicating how many of the registered documents include a given one of the partial character strings;
code means for obtaining second counts each indicating how many times a corresponding one of the partial character strings appears in a given one of the at least one document;
code means for obtaining a smallest of the second counts; and
code means for obtaining a score of the given one of the partial character strings for the given one of the at least one document from the first count and the smallest of the second counts such that the score of the given one of the partial character strings increases as the first count decreases and as the smallest of the second counts increases.
30. The computer-readable record medium as claimed in claim 25, wherein said score-computation code unit comprises:
code means for obtaining a first count indicating how many of the registered documents include a given one of the partial character strings;
code means for obtaining a second count indicating how many times the query character string appears in a given one of the at least one document; and
code means for obtaining a score of the given one of the partial character strings for the given one of the at least one document from the first count and the second count such that the score of the given one of the partial character strings increases as the first count decreases and as the second count increases.
31. The computer-readable record medium as claimed in claim 30, wherein said code means for obtaining a second count further comprises code means for placing an upper limit on the second count.
32. The computer-readable record medium as claimed in claim 25, wherein said document-selection code unit selects the at least one document, each of which includes the query character string, and said score-computation code unit comprises:
code means for obtaining a first count indicating how many of the registered documents include the query character string;
code means for obtaining a second count indicating how many times a given one of the partial character strings appears in a given one of the at least one document; and
code means for obtaining a score of the given one of the partial character strings for the given one of the at least one document from the first count and the second count such that the score of the given one of the partial character strings increases as the first count decreases and as the second count increases.
33. The computer-readable record medium as claimed in claim 25, wherein said document-selection code unit selects the at least one document, each of which includes the query character string, and said score-computation code unit comprises:
code means for obtaining a first count indicating how many of the registered documents include the query character string;
code means for computing a limit from the first count;
code means for obtaining a second count indicating how many times the query character string appears in a given one of the at least one document while limiting an upper end of the second count to said limit; and
code means for obtaining a score of a given one of the partial character strings for the given one of the at least one document from the first count and the second count such that the score of the given one of the partial character strings increases as the first count decreases and as the second count increases.
34. A computer-readable record medium having a program embodied therein for causing a computer to attend to document retrieval, said program comprising:
a text-dividing code unit which provides respective indexes for documents, each of the respective indexes listing partial character strings found in a corresponding document and respective positions thereof in the corresponding document, and which selects the partial character strings which start with a character string identical to a query character string;
a document-selection code unit which selects at least one document from the documents such that each of the at least one document includes at least one of the selected partial character strings; and
a score-computation code unit which computes respective scores of the selected partial character strings for each of the at least one document, and further computes a score of the query character string from the respective scores of the selected partial character strings for each of the at least one document.
code means for obtaining a first count indicating how many of the registered documents include a given one of the selected partial character strings;
code means for obtaining a second count indicating how many times the given one of the selected partial character strings appears in a given one of the at least one document; and
code means for obtaining a score of the given one of the selected partial character strings for the given one of the at least one document from the first count and the second count such that the score of the given one of the selected partial character strings increases as the first count decreases and as the second count increases.
36. The computer-readable record medium as claimed in claim 34, wherein said score-computation code unit comprises:
code means for obtaining a first count indicating how many of the registered documents include a given one of the selected partial character strings;
code means for obtaining a second count indicating how many times the query character string appears in a given one of the at least one document; and
code means for obtaining a score of the given one of the selected partial character strings for the given one of the at least one document from the first count and the second count such that the score of the given one of the selected partial character strings increases as the first count decreases and as the second count increases.
Specification