METHODS FOR IDENTIFYING DOCUMENTS RELATING TO A MARKET
First Claim
1. A method of identifying web documents pertaining to a market domain, the method comprising:
classifying a first group of web documents as relating to a specified market domain;
identifying a set of brand term combinations, where each combination includes a first brand term and a second, different brand term occurring within the first group of web documents;
establishing a term correlation between the first term and the second term for each combination based on usage of the first and the second terms within the first group of web documents;
assigning a similarity score to each combination as a function of the term correlation;
searching for a second group of web documents unrelated to the specified market domain using at least some of the combinations; and
presenting the second group of web documents via a computer interface according to the similarity scores of the brand term combinations.
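The pairing, correlation and scoring steps recited above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented method: the claim does not name a correlation measure, so document-level co-occurrence with a Jaccard ratio is assumed here, and the function name `score_brand_pairs`, the brand names and the sample documents are all hypothetical.

```python
from itertools import combinations


def score_brand_pairs(documents, brand_terms):
    """Score each pair of distinct brand terms by document co-occurrence.

    Assumption: the Jaccard overlap of the two terms' document sets stands
    in for the unspecified "term correlation", and the similarity score is
    simply that value.
    """
    # Map each brand term to the set of document indices that mention it.
    docs_with = {
        term: {i for i, text in enumerate(documents) if term.lower() in text.lower()}
        for term in brand_terms
    }
    scores = {}
    for first, second in combinations(sorted(brand_terms), 2):
        both = docs_with[first] & docs_with[second]
        either = docs_with[first] | docs_with[second]
        if both:  # keep only pairs that actually co-occur in some document
            scores[(first, second)] = len(both) / len(either)
    return scores


# Hypothetical first group, already classified as relating to the domain.
first_group = [
    "Acme and Bolt both launched new running shoes",
    "Review: Acme versus Bolt trail shoes",
    "Acme opens a flagship store",
    "Cirrus jackets reviewed alongside Bolt",
]
pair_scores = score_brand_pairs(first_group, ["Acme", "Bolt", "Cirrus"])
```

In this toy data the pair (Acme, Bolt) co-occurs in two of the four documents that mention either term, so it scores higher than (Bolt, Cirrus), while (Acme, Cirrus) never co-occurs and is dropped.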
2 Assignments
0 Petitions
Abstract
Methods are presented for identifying web documents that relate to a market domain but would ordinarily be considered unrelated to it. Market domain criteria can be defined for classifying web documents as related to the domain. The documents so classified form a training sample used to establish correlations among the brand term combinations found within them. When a correlation is established among the terms in a combination, that combination can be assigned a similarity score indicating how similar the terms are considered to be. The term combinations can then be used to search for additional web documents that could pertain to the market domain but would otherwise fail to satisfy the market domain criteria. The search results can be presented via a computer interface according to the similarity scores.
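The final search and presentation stage described in the abstract might look like the following Python sketch. It rests on stated assumptions: the market domain criteria are reduced to a simple keyword test, a document is ranked by the best-scoring brand pair it contains, and the function name `rank_by_pair_score`, the scores and the sample documents are hypothetical.

```python
def rank_by_pair_score(candidates, pair_scores, domain_keywords):
    """Return (score, text) for documents that fail the domain criteria but
    contain a scored brand pair, ordered by best matching pair score."""
    ranked = []
    for text in candidates:
        lowered = text.lower()
        # Skip documents that already satisfy the (assumed) keyword criteria.
        if any(word in lowered for word in domain_keywords):
            continue
        # Collect scores of every brand pair whose two terms both appear.
        matching = [
            score
            for (first, second), score in pair_scores.items()
            if first.lower() in lowered and second.lower() in lowered
        ]
        if matching:
            ranked.append((max(matching), text))
    # Highest similarity score first; ties fall back to alphabetical order.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked


# Hypothetical similarity scores carried over from the training sample.
example_scores = {("Acme", "Bolt"): 0.5, ("Bolt", "Cirrus"): 0.25}
second_group = [
    "Bolt and Cirrus share a warehouse",
    "Acme and Bolt sponsor a city marathon",
    "Acme shoes on sale with Bolt",
    "Weather report for Tuesday",
]
results = rank_by_pair_score(second_group, example_scores, ["shoes"])
```

Here the document mentioning "shoes" is excluded because it already meets the assumed criteria, the weather report is excluded because it contains no brand pair, and the remaining two documents are ordered by pair score.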
29 Citations
17 Claims
1. Independent claim, recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
Specification