Document searching apparatus, method thereof, and record medium thereof
Abstract
A document searching apparatus for searching a document group having a link relation for a particular document is disclosed. It comprises a link importance assigning unit, which weights the link relation and assigns to each document a link importance indicating the importance of the document based on the weighted link relation, and an accessing unit, which accesses the particular document based on the link importance. Thus, important documents can be searched for automatically.
28 Claims
1. A document searching apparatus for searching a document group having a link relation for a document, comprising:
a link importance assigning unit weighting the link relation and assigning a link importance which indicates importance of the document based on the weighted link relation to the document; and an accessing unit accessing the document based on the link importance, wherein said link importance assigning unit includes: a URL similarity calculating unit calculating a URL similarity that is a text similarity of character strings of URLs that represent the location of the documents and that is based on the appearance of the written characters of the URLs, wherein said link importance assigning unit calculates the link importance based on an inverse URL similarity and the link relation of the document, so that a link weight increases as URL similarity decreases.
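Claim 1 leaves the URL text-similarity measure open. As a minimal sketch, assuming Python's `difflib.SequenceMatcher.ratio()` as a stand-in character-level similarity (not necessarily the measure the patent intends), the inverse-similarity link weight could look like this. `MIN_SIM`, `url_similarity`, and `link_weight` are hypothetical names; the floor keeps the weight finite when two URLs share no characters.

```python
from difflib import SequenceMatcher

MIN_SIM = 0.01  # hypothetical floor: keeps 1/sim finite for unrelated URLs


def url_similarity(url_a: str, url_b: str) -> float:
    """Text similarity of two URL strings in [0, 1]; 1.0 means identical.

    Assumption: SequenceMatcher.ratio() stands in for the unspecified measure.
    """
    return SequenceMatcher(None, url_a, url_b).ratio()


def link_weight(src_url: str, dst_url: str) -> float:
    """Weight of a link src -> dst: the inverse of their URL similarity."""
    return 1.0 / max(url_similarity(src_url, dst_url), MIN_SIM)


home = "http://www.example.com/index.html"
same_site = "http://www.example.com/news/index.html"
other_site = "http://research.example.org/papers/"
print(round(link_weight(home, same_site), 2))
print(round(link_weight(home, other_site), 2))
```

With these sample URLs the link to the other host receives the larger weight, which is the behavior the claim describes: a link between dissimilar URLs (often a link between different sites) counts for more than a link within one site.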
12. A document searching apparatus for searching a document group having a link relation for a document, comprising:
a link importance assigning unit weighting the link relation and assigning a link importance which indicates importance of the document based on the weighted link relation to the document; and an accessing unit accessing the document based on the link importance, wherein said link importance assigning unit includes: a URL similarity calculating unit calculating a URL similarity that is a similarity of character strings of URLs that represent the location of the documents and that is based on the appearance of the written characters of the URLs, wherein said link importance assigning unit calculates the link importance based on the URL similarity and the link relation of the document, wherein the link importance of each document is defined as a solution of the following simultaneous linear equation (1), assuming that Cq is constant (the lower limit of the importance that depends on each page) for each p ∈ DOC and that, when a page p is linked to a page q, the link weight lw(p, q) is defined by the formula (2); where DOC = {p1, p2, . . . , pN} is the set of documents for which the link importance is calculated;
Wp is the link importance of the page p;
Ref(p) is a set of pages linked from the page p;
Refed(p) is a set of pages linking to the page p;
sim(p, q) is the URL similarity of the pages p and q;
diff(p, q) = 1/sim(p, q) is the URL difference of the pages p and q.
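Equations (1) and (2) referenced in claim 12 are not reproduced in this text. The sketch below is a hypothetical reading consistent with the listed definitions, not the patent's exact formulas. It assumes a single constant C for every page; a link weight lw(p, q) = diff(p, q) / Σ_{r∈Ref(p)} diff(p, r), which normalizes the inverse URL similarity over each page's outgoing links; and a damping factor d < 1 (also an assumption), so that the system W_p = C + d · Σ_{q∈Refed(p)} lw(q, p) · W_q has a unique solution that fixed-point iteration reaches. The constants and function names are illustrative.

```python
from difflib import SequenceMatcher

# All three constants are assumptions; equations (1) and (2) are not shown.
MIN_SIM = 0.01  # floor on sim(p, q) so diff(p, q) = 1/sim(p, q) stays finite
DAMPING = 0.85  # d < 1, guarantees a unique solution and convergence
C = 1.0         # constant lower bound on every page's importance


def sim(p: str, q: str) -> float:
    """URL similarity of pages p and q (assumed character-level measure)."""
    return max(SequenceMatcher(None, p, q).ratio(), MIN_SIM)


def diff(p: str, q: str) -> float:
    return 1.0 / sim(p, q)


def link_importance(ref: dict[str, list[str]], iterations: int = 200,
                    tol: float = 1e-12) -> dict[str, float]:
    """Approximate W_p = C + d * sum over q in Refed(p) of lw(q, p) * W_q.

    ref maps each page p to Ref(p), the pages p links to.
    """
    ref = {p: list(dict.fromkeys(qs)) for p, qs in ref.items()}  # drop duplicate links
    pages = set(ref) | {q for qs in ref.values() for q in qs}
    if not pages:
        return {}
    # Assumed formula (2): lw(p, q) = diff(p, q) / sum of diff(p, r) over r in Ref(p).
    lw = {}
    for p, qs in ref.items():
        total = sum(diff(p, q) for q in qs)
        for q in qs:
            lw[(p, q)] = diff(p, q) / total
    refed = {p: [] for p in pages}  # Refed(p): pages linking to p
    for p, qs in ref.items():
        for q in qs:
            refed[q].append(p)
    w = {p: C for p in pages}
    for _ in range(iterations):
        new_w = {p: C + DAMPING * sum(lw[(q, p)] * w[q] for q in refed[p])
                 for p in pages}
        converged = max(abs(new_w[p] - w[p]) for p in pages) < tol
        w = new_w
        if converged:
            break
    return w
```

A page with no incoming links keeps exactly the lower bound C, and of two pages linked only from the same source, the one whose URL is less similar to the source's URL ends up with the higher importance.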
14. A document index creating apparatus for creating an index of a document group having a link relation, comprising:
a link importance assigning unit assigning a link importance to the document based on the link relation; a keyword extracting unit extracting a keyword from the document; an index creating unit creating an index for accessing the keyword based on the pronunciation characters or spelling of the extracted keyword; and an accessing unit accessing the document assigned the link importance corresponding to the keyword when the pronunciation characters or spelling of the keyword are selected from the index, wherein said link importance assigning unit includes: a URL similarity calculating unit calculating a URL similarity that is a text similarity of character strings of URLs that represent the location of the documents and that is based on the appearance of the written characters of the URLs, wherein said link importance assigning unit calculates the link importance based on an inverse URL similarity and the link relation of the document, so that a link weight increases as URL similarity decreases.
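Claims 14 and 15 index keywords by their pronunciation characters or spelling (for Japanese text, typically the kana reading). A minimal sketch, assuming keywords have already been extracted and using the first letter of the spelling as the index heading: each keyword lists its documents in descending link importance, so selecting an entry yields the most important document first. `build_index` and `access` are hypothetical names.

```python
from collections import defaultdict


def build_index(doc_keywords: dict[str, list[str]],
                importance: dict[str, float]) -> dict[str, dict[str, list[str]]]:
    """Group keywords under the first letter of their spelling.

    doc_keywords maps each document URL to its extracted keywords
    (hypothetical input; keyword extraction itself is not shown).
    Each keyword lists its documents in descending link importance.
    """
    docs_by_keyword: dict[str, set[str]] = defaultdict(set)
    for doc, keywords in doc_keywords.items():
        for kw in keywords:
            if kw:
                docs_by_keyword[kw].add(doc)
    index: dict[str, dict[str, list[str]]] = defaultdict(dict)
    for kw in sorted(docs_by_keyword, key=str.casefold):
        heading = kw[0].upper()  # stand-in for a pronunciation/spelling key
        # Highest importance first; ties broken by URL for a stable order.
        index[heading][kw] = sorted(docs_by_keyword[kw],
                                    key=lambda d: (-importance.get(d, 0.0), d))
    return dict(index)


def access(index: dict[str, dict[str, list[str]]], heading: str, keyword: str) -> str:
    """Return the most important document for a keyword selected from the index."""
    return index[heading][keyword][0]
```

For kana readings the heading would be the first kana of the reading rather than an uppercase Latin letter; only the `heading` line would change.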
15. A document index creating apparatus for creating an index of a document group having a link relation, comprising:
a link importance assigning unit assigning a link importance to the document depending on whether or not URLs of the documents are similar; a keyword extracting unit extracting a keyword from the document; and an index creating unit creating an index for accessing the document corresponding to the pronunciation characters or spelling of the extracted keyword based on the link importance, wherein said link importance assigning unit includes: a URL similarity calculating unit calculating a URL similarity that is a text similarity of character strings of URLs that represent the location of the documents and that is based on the appearance of the written characters of the URLs, wherein said link importance assigning unit calculates the link importance based on an inverse URL similarity and the link relation of the document, so that a link weight increases as URL similarity decreases.
16. A link list creating system for creating a link list for a document group having a link relation, comprising:
a collecting unit collecting the documents from a network; a link importance assigning unit assigning a link importance of the document as an importance calculated based on the link relation to the document; a URL character string determining unit determining a URL having a particular characteristic of a character string from the documents; an index creating unit creating a link list for listing fewer than a predetermined number of links to the documents based on the link importance and the particular characteristic of the character string of the URL; and a document type determining unit determining a document type based on a URL similarity representing a text similarity between character strings of URLs of the documents and being based on the appearance of the written characters of the URLs, the number of links to the document, and the number of links from the document, wherein said index creating unit selects the document based on the document type and creates a link list of the selected document, and wherein said link importance assigning unit calculates the link importance based on an inverse URL similarity and the link relation of the document, so that a link weight increases as URL similarity decreases.
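Claim 16 names three inputs to the document type decision (URL similarity, in-link count, out-link count) but states no rule. The following sketch assumes a hypothetical three-way classification with made-up thresholds: a page linked from several dissimilar URLs is treated as a "top" page, a page with many outgoing links as an "index" page, and everything else as "content". The similarity measure is again `difflib.SequenceMatcher.ratio()`, used only as a stand-in.

```python
from difflib import SequenceMatcher
from statistics import mean

# Hypothetical thresholds; the claims give no values.
MIN_IN_LINKS = 3    # in-links needed before a page can count as a "top" page
MIN_OUT_LINKS = 10  # out-links that mark a page as an "index" page
EXTERNAL_SIM = 0.7  # mean URL similarity below which in-links look external


def url_similarity(a: str, b: str) -> float:
    """Character-level URL similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()


def document_type(url: str, ref: dict[str, list[str]],
                  refed: dict[str, list[str]]) -> str:
    """Classify a page from URL similarity, in-link count and out-link count."""
    in_links = refed.get(url, [])
    out_links = ref.get(url, [])
    if (len(in_links) >= MIN_IN_LINKS
            and mean(url_similarity(url, q) for q in in_links) < EXTERNAL_SIM):
        return "top"
    if len(out_links) >= MIN_OUT_LINKS:
        return "index"
    return "content"


def select_documents(urls: list[str], ref: dict[str, list[str]],
                     refed: dict[str, list[str]], wanted: str = "top") -> list[str]:
    """Keep only documents of the wanted type for the link list."""
    return [u for u in urls if document_type(u, ref, refed) == wanted]
```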
17. A document searching method for searching a document group having a link relation for a document, comprising:
assigning a link importance as an importance of the document calculated by weighting the link relation to the document, comprising: calculating a URL similarity that is a text similarity of character strings of URLs that represent the location of the documents and that is based on the appearance of the written characters of the URLs; calculating the link importance based on the URL similarity and the link relation of the document, with said link importance being based on an inverse URL similarity and the link relation of the document, so that a link weight increases as URL similarity decreases; and accessing the document based on the link importance.
25. A link list creating method for creating a link list for a document group having a link relation, comprising:
collecting the documents from a network; assigning a link importance which indicates importance of the document to the document based on the link relation; determining a URL having a particular characteristic of a character string from the URLs of each document; creating a link list for listing fewer than a predetermined number of links to the document based on the link importance and the particular characteristic of the character string of the URL; determining a document type based on a URL similarity that is a text similarity between character strings of URLs (Uniform Resource Locators) of the documents and that is based on the appearance of the written characters of the URLs, the number of links to the document, and the number of links from the document; and selecting the document based on the document type, wherein the creating creates the link list for the selected document based on the link importance and the particular characteristic of the character string of the URL, and wherein said link importance is based on an inverse URL similarity and the link relation of the document, so that a link weight increases as URL similarity decreases.
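For the link-list step, the claims leave both the URL characteristic and the limit unspecified. This sketch assumes, hypothetically, that the characteristic is a directory-like or `index.*` path (a common sign of a site's top page), ranks the matching URLs by link importance, and keeps fewer than `limit` entries, following the claim wording. `has_top_page_form` and `create_link_list` are illustrative names.

```python
from urllib.parse import urlsplit


def has_top_page_form(url: str) -> bool:
    """Hypothetical URL characteristic: a directory path or an index.* file."""
    path = urlsplit(url).path
    last_segment = path.rsplit("/", 1)[-1]
    return path == "" or path.endswith("/") or last_segment.startswith("index.")


def create_link_list(importance: dict[str, float], limit: int) -> list[str]:
    """List fewer than `limit` URLs with the characteristic, most important first."""
    candidates = [u for u in importance if has_top_page_form(u)]
    # Ties in importance are broken alphabetically so the output is stable.
    ranked = sorted(candidates, key=lambda u: (-importance[u], u))
    return ranked[:max(limit - 1, 0)]
```

Filtering before ranking means a very important but non-top page (here, any `*.html` file other than `index.*`) never displaces a top page from the list.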
26. A computer readable record medium for recording a program that causes a computer to execute a process for creating a link list for a document group having a link relation, the program comprising:
collecting documents from a network; assigning a link importance which indicates importance of the document to each document based on the link relation, including: calculating a URL similarity that is a text similarity of character strings of URLs that represent the location of the documents and that is based on the appearance of the written characters of the URLs; calculating the link importance based on the URL similarity and the link relation of the document, with said link importance being based on an inverse URL similarity and the link relation of the document, so that a link weight increases as URL similarity decreases; determining a URL having a particular characteristic of a character string from the URLs of the documents; and creating a link list for listing fewer than a predetermined number of links to the documents based on the link importance and the particular characteristic of the character string of the URL.
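The collecting step has to yield the link relation (Ref(p) for each page) before any importance can be computed. Assuming the pages have already been fetched into a dict mapping URL to HTML (fetching itself is omitted), the standard-library `html.parser` and `urllib.parse` modules are enough to extract `<a href>` targets, resolve them against the page URL, and drop fragments. `LinkExtractor` and `build_link_relation` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute targets of <a href> elements in one page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute not in self.links:
                    self.links.append(absolute)


def build_link_relation(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each collected page URL to Ref(p), restricted to collected pages."""
    ref = {}
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        parser.close()
        ref[url] = [u for u in parser.links if u in pages and u != url]
    return ref
```

Restricting Ref(p) to collected pages keeps the resulting link relation closed over DOC, which is what an importance computation over a fixed document set needs.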
27. A document searching apparatus for searching a document group having a link relation for a document, comprising:
a link importance assigning unit weighting the link relation and assigning a link importance which indicates importance of the document based on the weighted link relation to the document, said link importance assigning unit comprising a similarity calculating unit calculating a URL similarity that is a similarity of URLs that represent the location of the documents and that is based on the appearance of the written characters of the URLs, wherein said link importance assigning unit calculates the link importance based on an inverse URL similarity and the link relation of the document, so that a link weight increases as URL similarity decreases; and an accessing unit accessing the document based on the link importance.
28. A document searching method for searching a document group having a link relation for a document, comprising:
assigning a link importance as an importance of the document calculated by weighting the link relation to the document, comprising: calculating a similarity that is a similarity of URLs that represent the location of the documents; calculating the link importance based on an inverse URL similarity and the link relation of the document, so that a link weight increases as URL similarity decreases; and accessing the document based on the link importance.
Specification