System and method for disambiguating entities in a web page search
First Claim
1. A method of disambiguating entities in a computerized web search, said method comprising:
identifying a set of potential meanings for an entity, wherein said entity comprises any of a word and phrase;
retrieving at least one retrieved web page comprising descriptions referencing said entity;
establishing a base web page comprising a selected meaning of said potential meanings for said entity;
attributing dimensions of a vector space attributed to domains in said retrieved web page;
computing a probability of similarity between said descriptions in said retrieved web page and said entity in said base web page, wherein said computing of said probability of similarity comprises corresponding a similarity measure between said dimensions of said vector space attributed to domains in said retrieved web page and a likelihood of said retrieved web page referring to said entity in said base web page; and
reporting said probability of said similarity to a user.
Abstract
A system and method of disambiguating entities in a computerized web search includes identifying a set of potential meanings for an entity; retrieving at least one web page having descriptions referencing the entity; establishing a base web page having a selected context for the entity; attributing dimensions of a vector space attributed to domains in the retrieved web page; and computing a probability of similarity between the referenced entity in the retrieved web page and the entity in the base web page. The method includes corresponding a similarity measure between the dimensions of the vector space attributed to domains in the retrieved web page and a likelihood of the retrieved web page referring to the entity in the base web page. The method further includes ranking web pages based on the computed probability of similarity.
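The steps in the abstract can be sketched in code. The following is a minimal illustration only, not the patented method: the text above does not say how the vector space is built or how a similarity measure is turned into a probability. The sketch assumes that each dimension is the host name of a link found in a page, that cosine similarity is the similarity measure, and that the cosine score (which lies between 0 and 1 for count vectors) can stand in for the probability. The function names `domain_vector`, `cosine_similarity`, and `rank_pages` are hypothetical.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def domain_vector(links):
    """Count how often each host name appears among a page's links.

    Assumption: one vector dimension per linked host name.
    """
    hosts = (urlparse(link).netloc.lower() for link in links)
    return Counter(host for host in hosts if host)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors.

    Returns 0.0 when either vector is empty.
    """
    dot = sum(count * b[host] for host, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_pages(base_links, retrieved):
    """Score each retrieved page against the base page and sort descending.

    Assumption: the cosine score is used directly as the
    "probability of similarity"; the source does not define the mapping.
    `retrieved` maps a page identifier to the list of links on that page.
    """
    base = domain_vector(base_links)
    scored = [
        (page_id, cosine_similarity(domain_vector(links), base))
        for page_id, links in retrieved.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_pages` with the links of a base page chosen for one meaning of an entity, plus a dictionary of retrieved pages, returns the retrieved pages ordered by score, which corresponds to the ranking step mentioned at the end of the abstract.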
14 Claims
1. A method of disambiguating entities in a computerized web search, said method comprising:
identifying a set of potential meanings for an entity, wherein said entity comprises any of a word and phrase;
retrieving at least one retrieved web page comprising descriptions referencing said entity;
establishing a base web page comprising a selected meaning of said potential meanings for said entity;
attributing dimensions of a vector space attributed to domains in said retrieved web page;
computing a probability of similarity between said descriptions in said retrieved web page and said entity in said base web page, wherein said computing of said probability of similarity comprises corresponding a similarity measure between said dimensions of said vector space attributed to domains in said retrieved web page and a likelihood of said retrieved web page referring to said entity in said base web page; and
reporting said probability of said similarity to a user.
Dependent claims: 2, 3, 4, 5.
6. A program storage device readable by computer, tangibly embodying a program of instructions executable by said computer to perform a method of disambiguating entities in a computerized web search, said method comprising:
identifying a set of potential meanings for an entity, wherein said entity comprises any of a word and phrase;
retrieving at least one retrieved web page comprising descriptions referencing said entity;
establishing a base web page comprising a selected meaning of said potential meanings for said entity;
attributing dimensions of a vector space attributed to domains in said retrieved web page;
computing a probability of similarity between said descriptions in said retrieved web page and said entity in said base web page, wherein said computing of said probability of similarity comprises corresponding a similarity measure between said dimensions of said vector space attributed to domains in said retrieved web page and a likelihood of said retrieved web page referring to said entity in said base web page; and
reporting said probability of said similarity to a user.
Dependent claims: 7, 8, 9.
10. A system for disambiguating entities in a computerized web search, said system comprising:
a user interface adapted to identify a set of potential meanings for an entity, wherein said entity comprises any of a word and phrase;
a search engine connected to said user interface and adapted to retrieve at least one retrieved web page comprising descriptions referencing said entity;
a processor adapted to:
establish a base web page comprising a selected meaning of said potential meanings for said entity;
attribute dimensions of a vector space attributed to domains in said retrieved web page; and
compute a probability of similarity between said descriptions in said retrieved web page and said entity in said base web page, wherein said computing of said probability of similarity comprises corresponding a similarity measure between said dimensions of said vector space attributed to domains in said retrieved web page and a likelihood of said retrieved web page referring to said entity in said base web page; and
a display adapted to report said probability of said similarity to a user.
Dependent claims: 11, 12, 13, 14.
Specification