Web object retrieval based on a language model
First Claim
1. A computing system with a processor and a memory for determining relevance of an object to a target term, the object being a product, comprising:
a component that retrieves a plurality of web pages containing a description of the object, each description including terms used to describe the object;
a component that generates a collection of records of terms relating to the object, the collection including a record generated from each of the plurality of retrieved web pages, each record including a plurality of terms, each record of the collection being generated by extracting from a retrieved web page terms used to describe the object, such that the terms extracted from that retrieved web page compose the record of terms for that retrieved web page;
a component that, for each record of the collection of records of terms relating to the object, determines a language model probability for that record generating the target term, the language model probability for that record generating the target term being a weighted summation of a number of occurrences of the target term in the record divided by a number of occurrences of all terms in the record and a number of occurrences of the target term in the collection of the records divided by a number of occurrences of all terms in the collection of the records as represented by the following equation;
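The equation the claim refers to is not reproduced in this listing. Read from the claim language alone, a weighted summation of the two ratios would take roughly the form below. The symbols are assumed names, not taken from the patent: w is the target term, R a record, C the collection of records, tf(w, X) the number of occurrences of w in X, |X| the number of occurrences of all terms in X, and λ a weight. Writing the two weights as λ and 1 - λ is also an assumption; the claim text states only that the summation is weighted.

```latex
P(w \mid R) \;=\; \lambda \,\frac{tf(w, R)}{|R|} \;+\; (1 - \lambda)\,\frac{tf(w, C)}{|C|}
```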
Abstract
A method and system are provided for determining relevance of an object to a term based on a language model. The relevance system provides records extracted from web pages that relate to the object. To determine the relevance of the object to a term, the relevance system first determines, for each record of the object, a probability of generating that term using a language model of the record of that object. The relevance system then calculates the relevance of the object to the term by combining the probabilities. The relevance system may also weight the probabilities based on the accuracy or reliability of the extracted information for each data source.
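The flow described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the function names, the smoothing weight `lam`, and the per-source weights `source_weights` are hypothetical, and combining the per-record probabilities as a weighted sum is one plausible reading of "combining the probabilities". Each record is assumed to be a list of terms already extracted from one web page.

```python
from collections import Counter


def term_probability(term, record, collection_counts, collection_size, lam=0.5):
    """Probability that `record` generates `term`.

    Weighted sum of the term's relative frequency in the record and its
    relative frequency in the whole collection. `lam` is an assumed
    smoothing weight, not a value given in the source.
    """
    p_record = Counter(record)[term] / len(record) if record else 0.0
    p_collection = collection_counts[term] / collection_size if collection_size else 0.0
    return lam * p_record + (1 - lam) * p_collection


def object_relevance(term, records, source_weights=None, lam=0.5):
    """Relevance of an object to `term`, given its records (lists of terms).

    `source_weights` stands in for the accuracy or reliability of each data
    source; uniform weights are assumed when none are supplied.
    """
    if not records:
        return 0.0
    if source_weights is None:
        source_weights = [1.0 / len(records)] * len(records)
    # Collection statistics are computed over all records of the object.
    collection_counts = Counter(t for record in records for t in record)
    collection_size = sum(len(record) for record in records)
    return sum(
        weight * term_probability(term, record, collection_counts, collection_size, lam)
        for weight, record in zip(source_weights, records)
    )


# Two hypothetical records extracted from two web pages about one product.
records = [["red", "shoe", "red"], ["blue", "shoe"]]
score = object_relevance("red", records)
```

Passing explicit `source_weights`, for example `[0.8, 0.2]`, lets a record from a more reliable source contribute more to the final score.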
67 Citations
11 Claims
Specification