Searching hypertext based multilingual web information
First Claim
1. A computer-implemented method comprising searching hypertext based multilingual Web information when searching on a network for keywords to be queried, the step of searching comprising steps of:
receiving keywords input by a user via a searching interface;
transferring the keywords to an analysis evaluation module for computing similarity between the keywords and hypertext information in a hypertext database;
using the analysis evaluation module for matching the keywords based on index data stored in a Web repository and performing an analysis of hyperlinks;
conducting a comprehensive evaluation based on the computing results of hypertext information retrieving and hyperlink analysis;
ranking hyperlinks according to the correlativity of the hyperlink text with the keywords; and
returning to the user a ranked search result;
wherein the hypertext similarity with respect to the keyword to be queried is determined as follows;
where di represents an ith hypertext in a Web page d, dij represents a jth dimension of the ith hypertext, qj represents the jth dimension of a keyword Q, and S represents the similarity of hypertext di with said keyword Q.
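The equation for S is not reproduced in this text. As a minimal sketch only, assuming S is the cosine similarity between the vector of hypertext di and the vector of keyword Q (a common choice that fits the per-dimension symbols dij and qj, but an assumption rather than the claimed formula), the computation might look as follows. The function name and the sample vectors are hypothetical.

```python
import math


def hypertext_similarity(d_i, q):
    """Cosine similarity between hypertext vector d_i and keyword vector q.

    ASSUMPTION: the claimed equation is not shown in the source text;
    cosine similarity is used here purely for illustration.
    """
    if len(d_i) != len(q):
        raise ValueError("vectors must have the same number of dimensions")
    # Sum over dimensions j of d_ij * q_j.
    dot = sum(d_ij * q_j for d_ij, q_j in zip(d_i, q))
    norm_d = math.sqrt(sum(d_ij * d_ij for d_ij in d_i))
    norm_q = math.sqrt(sum(q_j * q_j for q_j in q))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0  # an all-zero vector is treated as matching nothing
    return dot / (norm_d * norm_q)


# Hypothetical term-weight vectors for one hypertext and one keyword.
print(hypertext_similarity([1.0, 2.0, 0.0], [1.0, 1.0, 0.0]))
```

Under this assumption S lies between 0 and 1 for non-negative term weights, which makes it easy to combine with a normalized link rank later.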
Abstract
The present invention provides methods, apparatus and systems for searching hypertext-based multilingual Web information when searching on a network for keywords to be queried. A method includes: a receiving step for receiving keywords input by a user; a native language hypertext searching step for searching on the network, according to the keywords to be queried, for all hypertexts whose representing language is the same as the language representing the keywords and which match the keywords to be queried; a step of extracting hyperlinks related to an arbitrary language from all the searched hypertexts; a hyperlink ranking step for ranking the extracted hyperlinks according to the correlativity of the hyperlinks with the keywords to be queried; and a step of returning ranked search results to the user. Thereby, accurate cross-language searching can be provided without extra machine translation effort, and the results can be more accurate and objective than machine translation or even human translation.
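The steps named in the abstract can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the patented implementation: hypertexts are modeled as (anchor text, anchor language, target URL) tuples, matching is a plain substring test, and correlativity is approximated by counting how many matching hypertexts point at each hyperlink. The function name, data layout, and example.com URLs are all hypothetical.

```python
from collections import Counter


def cross_language_search(keyword, keyword_lang, hypertexts):
    """Rank link targets by how many same-language hypertexts match.

    hypertexts: iterable of (anchor_text, anchor_lang, target_url) tuples.
    ASSUMPTION: correlativity is approximated by a simple match count.
    """
    votes = Counter()
    for anchor_text, anchor_lang, target_url in hypertexts:
        # Native language hypertext search: the hypertext must be in the
        # keyword's language and must contain the keyword.
        if anchor_lang == keyword_lang and keyword.lower() in anchor_text.lower():
            # Extract the hyperlink; the page it points to may be
            # written in any language.
            votes[target_url] += 1
    # Hyperlink ranking: most-pointed-to targets first.
    return [url for url, _ in votes.most_common()]


sample = [
    ("weather forecast", "en", "http://example.com/zh/weather"),
    ("local weather", "en", "http://example.com/zh/weather"),
    ("weather maps", "en", "http://example.com/ja/maps"),
    ("sports news", "en", "http://example.com/fr/sports"),
]
print(cross_language_search("weather", "en", sample))
```

No translation of the keyword takes place: the cross-language effect comes entirely from same-language hypertexts that link to pages in other languages.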
12 Claims
-
1. A computer-implemented method comprising searching hypertext based multilingual Web information when searching on a network for keywords to be queried, the step of searching comprising steps of:
-
receiving keywords input by a user via a searching interface;
transferring the keywords to an analysis evaluation module for computing similarity between the keywords and hypertext information in a hypertext database;
using the analysis evaluation module for matching the keywords based on index data stored in a Web repository and performing an analysis of hyperlinks;
conducting a comprehensive evaluation based on the computing results of hypertext information retrieving and hyperlink analysis;
ranking hyperlinks according to the correlativity of the hyperlink text with the keywords; and
returning to the user a ranked search result;
wherein the hypertext similarity with respect to the keyword to be queried is determined as follows;
where di represents an ith hypertext in a Web page d, dij represents a jth dimension of the ith hypertext, qj represents the jth dimension of a keyword Q, and S represents the similarity of hypertext di with said keyword Q.
where PR(v) refers to the rank of a Web page v, outlink(u) refers to the number of all the hyperlinks in a Web page u, n is the number of all Web pages, and ε is an adjusting parameter.
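The iterative formula that these symbols belong to is not reproduced in this text. As a hedged sketch, assuming the familiar PageRank-style update PR(v) = ε/n + (1 - ε) · Σ PR(u)/outlink(u), summed over pages u that link to v, the iteration could be written as below. The update rule, the default ε of 0.15, the fixed iteration count, and the handling of pages without outgoing links are all assumptions, and the function name is hypothetical.

```python
def page_rank(links, epsilon=0.15, iterations=50):
    """Iteratively compute PR(v) for every page.

    links: dict mapping each page u to the list of pages it links to.
    ASSUMPTION: PR(v) = epsilon/n + (1 - epsilon) * sum(PR(u)/outlink(u))
    over pages u linking to v; the claimed formula is not shown here.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    pr = {page: 1.0 / n for page in pages}  # uniform starting ranks
    for _ in range(iterations):
        nxt = {page: epsilon / n for page in pages}
        for u in pages:
            targets = links.get(u, [])
            if targets:
                # outlink(u) is the number of hyperlinks on page u.
                share = (1.0 - epsilon) * pr[u] / len(targets)
                for v in targets:
                    nxt[v] += share
            else:
                # Page with no outgoing links: spread its rank evenly
                # over all pages (an assumption of this sketch).
                share = (1.0 - epsilon) * pr[u] / n
                for v in pages:
                    nxt[v] += share
        pr = nxt
    return pr


ranks = page_rank({"a": ["b"], "b": ["a"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this sketch the ranks always sum to 1, so ε controls how much rank is distributed uniformly versus passed along hyperlinks.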
-
-
6. The computer-implemented method of claim 3, wherein after conducting comprehensive evaluation according to the following computation, ranking is performed according to the comprehensive evaluation value:
-
where PR(v) refers to the rank of a Web page v, outlink(u) refers to the number of all the hyperlinks in a Web page u, n is the number of all Web pages, and ε is an adjusting parameter;
R(d) is the comprehensive evaluation value, S represents the similarity of the hypertext di included in the Web page d with the keyword Q, and parameter δ is used to adjust the weights of PR(d) and S in the computation of R(d).
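The expression for R(d) is likewise missing from this text. A minimal sketch, assuming R(d) is a linear blend in which δ weights PR(d) and (1 - δ) weights S, is given below. Both the blend and the helper names are assumptions made for illustration, not the claimed computation.

```python
def comprehensive_score(pr_d, s, delta=0.5):
    """R(d) as a weighted mix of page rank PR(d) and similarity S.

    ASSUMPTION: R(d) = delta * PR(d) + (1 - delta) * S; the claimed
    formula is not reproduced in the source text.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return delta * pr_d + (1.0 - delta) * s


def rank_pages(candidates, delta=0.5):
    """Sort (url, PR(d), S) tuples by descending R(d)."""
    return sorted(
        candidates,
        key=lambda c: comprehensive_score(c[1], c[2], delta),
        reverse=True,
    )


# Hypothetical candidates: (url, PR(d), S).
pages = [
    ("http://example.com/popular", 0.9, 0.1),
    ("http://example.com/relevant", 0.1, 0.9),
]
print([url for url, _, _ in rank_pages(pages, delta=0.25)])
```

With δ close to 0 the ranking follows text similarity almost exclusively, and with δ close to 1 it follows link-based rank.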
-
-
7. A computer system comprising:
-
a web crawler configured for downloading information from the Web, wherein the information comprises hypertext-based multilingual Web information;
a storage device storing an index of the hypertext-based multilingual Web information;
an analysis evaluation module configured to search the index of the hypertext-based multilingual Web information for keywords and to compute similarity of hypertext corresponding to the URL with respect to the keywords;
an interface device for receiving the keywords input by a user;
a processor configured for:
searching, according to the keywords, for all hypertexts whose representing language is the same as a language representing the keywords and which match the keywords to be queried;
extracting hyperlinks related to an arbitrary language from all the searched hypertexts;
conducting a comprehensive evaluation based on the computing results of hypertext information retrieving and the hyperlink analysis;
ranking the extracted hyperlinks according to the correlativity of the hyperlinks with the keywords to be queried; and
wherein the interface device is also for returning to the user ranked search results and the similarity of the hypertext with respect to the keyword to be queried is determined as follows;
where di represents an ith hypertext in a Web page d, dij represents a jth dimension of the ith hypertext, qj represents the jth dimension of a keyword Q, and S represents the similarity of hypertext di with said keyword Q.
where PR(v) refers to the rank of a Web page v, outlink(u) refers to the number of all the hyperlinks in a Web page u, n is the number of all Web pages, and ε is an adjusting parameter.
-
-
11. The system of claim 9, wherein after conducting comprehensive evaluation according to the following computation, ranking is performed according to the comprehensive evaluation value:
-
where PR(v) refers to the rank of a Web page v, outlink(u) refers to the number of all the hyperlinks in a Web page u, n is the number of all Web pages, and ε is an adjusting parameter;
R(d) is the comprehensive evaluation value, S represents the similarity of the hypertext di contained in the Web page d with the keyword Q, and parameter δ is used to adjust the weights of PR(d) and S in the computation of R(d).
-
-
12. An article of manufacture comprising a physical computer usable storage medium having computer readable program code means embodied therein for causing a processor to search for hypertext based multilingual Web information for searching on a network for keywords to be queried, the computer readable program code means in said article of manufacture for causing a computer to perform the steps of:
-
receiving keywords input by a user via a searching interface;
searching, according to the keywords, for all hypertexts having a representing language the same as a language representing the keywords and which match the keywords;
extracting hyperlinks related to an arbitrary language from all the searched hypertexts;
transferring the keywords to an analysis evaluation module for computing similarity between the keywords and hypertext information in a hypertext database;
using the analysis evaluation module for matching the keywords based on index data stored in a Web repository and performing an analysis of hyperlinks;
conducting a comprehensive evaluation based on the computing results of hypertext information retrieving and the hyperlink analysis;
ranking the extracted hyperlinks according to the correlativity of the hyperlinks with the keywords to be queried; and
returning to the user ranked search results;
wherein:
in the hyperlink ranking step, the hyperlink which is pointed to most includes the information comprising the most matches with the keyword;
in the native language hypertext searching step, Web pages are downloaded in advance from the Internet;
data indexing is conducted for quickly searching for the hypertext matched with the keyword; and
the hyperlink ranking step performs ranking on the hyperlinks according to hypertext similarity with respect to the keyword to be queried and hyperlink ranks;
wherein the hypertext similarity with respect to the keyword is determined as follows;
where di represents an ith hypertext in a Web page d, dij represents a jth dimension of the ith hypertext, qj represents the jth dimension of a keyword Q, and S represents the similarity of hypertext di with the keyword Q;
wherein the hyperlink ranking is determined through the following iterative computation;
where PR(v) refers to the rank of a Web page v, outlink(u) refers to the number of all the hyperlinks in a Web page u, n is the number of all Web pages, and ε is an adjusting parameter;
wherein after conducting comprehensive evaluation according to the following computation, the hyperlink ranking is performed according to the comprehensive evaluation value;
wherein PR(v) refers to the rank of a Web page v, outlink(u) refers to the number of all the hyperlinks in a Web page u, n is the number of all Web pages, and ε is an adjusting parameter;
wherein R(d) is the comprehensive evaluation value, S represents the similarity of the hypertext di included in the Web page d with the keyword Q, and parameter δ is used to adjust the weights of PR(d) and S in the computation of R(d).
-
Specification