Business lines
First Claim
1. A method, implemented by an electronic device, of operating a search engine to identify documents on a network based on content relevancy and identifying business lines of a company, the method comprising:
executing a web crawler to search for and retrieve documents on the network;
storing in an electronic memory a plurality of company data structures for a plurality of companies, each company data structure storing patterns of document elements in documents retrieved including anchor word sets and other word sets with a context of each anchor word in an anchor word set;
parsing a document to identify words in the document;
identifying anchor words in the document;
if a predetermined number of anchor words are present in the document perform an evaluation process a. through h.;
a. comparing, by the electronic device, a set of documents from a plurality of resources with a first set of content relevance models that define relevance of the documents to different companies and a second set of content relevance models that define relevance of the documents to different business lines, wherein each content relevance model includes (i) data that is used to identify documents related to a business line or a company that the model represents, (ii) the patterns of document elements associated with scores, and (iii) parameters used in the analysis of documents by the model;
b. accessing the first and second patterns and, based on the patterns, calculating a content relevance score as an arithmetic function of the patterns and parameters of the content relevance models, wherein the content relevance score represents at least a number of anchor words in each document related to one or more of the business lines and one or more of the companies;
c. when a particular document in the set of documents satisfies a particular content relevance score of a particular content relevance model, in the first set of content relevance models, associated with a particular company, associating the particular company with the particular document by storing an identifier of the particular company in a data structure for the document;
d. when a particular document in the set of documents satisfies a particular content relevance score of a particular content relevance model, in the second set of content relevance models, associated with a particular business line, associating the particular business line with the particular document by storing an identifier of the particular business line in a data structure for the document;
e. determining a first threshold number and a second threshold number, wherein (i) the first threshold number is dependent on the first business line, (ii) the second threshold number is dependent on the second business line, and (iii) the first threshold number is different from the second threshold number;
f. when more than the first threshold number of documents are associated with a first company and a first business line, specifying the first business line as a business line of the first company by storing an identifier of the first business line in a data structure for the first company;
g. when more than the second threshold number of documents are associated with a second company and a second business line, specifying the second business line as a business line of the second company by storing an identifier of the second business line in a data structure for the second company; and
h. upon receiving a request for the first company, accessing, searching the data structure for the identifiers, and displaying a set of data for a business line associated with the first company based on the stored identifiers in the data structure for the first company;
if a predetermined number of anchor words are not present in the document, do not perform the evaluation process a. through h.
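The anchor word gate and the document tagging of steps a. through d. can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `RelevanceModel` class, the `tag_document` function, the use of a simple sum of per-word scores as the "arithmetic function", and the reading of "satisfies a score" as "greater than or equal to".

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class RelevanceModel:
    """Hypothetical content relevance model for one company or business line."""
    identifier: str                   # company or business line identifier
    pattern_scores: dict[str, float]  # pattern (here: a single word) -> score
    min_score: float                  # parameter: score a document must reach


def tokenize(text: str) -> list[str]:
    # Parse the document into lowercase words.
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance_score(words: list[str], model: RelevanceModel) -> float:
    # Assumed arithmetic function: sum the scores of every matching word.
    return sum(model.pattern_scores.get(w, 0.0) for w in words)


def tag_document(text, anchor_words, min_anchor_hits, company_models, line_models):
    """Return the document's tags, or None when the anchor word gate fails."""
    words = tokenize(text)
    anchor_hits = sum(1 for w in words if w in anchor_words)
    if anchor_hits < min_anchor_hits:
        return None  # too few anchor words: skip the evaluation entirely

    # Steps c. and d.: store identifiers of every model the document satisfies.
    return {
        "companies": [m.identifier for m in company_models
                      if relevance_score(words, m) >= m.min_score],
        "business_lines": [m.identifier for m in line_models
                           if relevance_score(words, m) >= m.min_score],
    }


acme = RelevanceModel("ACME", {"acme": 2.0}, min_score=2.0)
cloud = RelevanceModel("cloud-storage", {"cloud": 1.0, "storage": 1.0}, min_score=2.0)
anchors = {"acme", "cloud", "storage"}

tags = tag_document("Acme expands its cloud storage offering",
                    anchors, 2, [acme], [cloud])
print(tags)
```

A document with fewer anchor words than the gate requires is never scored, which keeps the more expensive model comparison off irrelevant pages.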
Abstract
Some embodiments provide a method for identifying business lines of a company. The method classifies several documents as relevant to several different business lines and several different companies. For a particular company and particular business line, the method identifies a number of documents classified as relevant to both the particular company and the particular business line. When the identified number of documents exceeds a particular threshold, the method associates the particular business line as a business line of the particular company. In some embodiments, the method calculates a score for each business line in the set. The score for a particular business line represents the importance of the particular business line to the particular company. The method sorts the business lines in the set based on the calculated scores.
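The counting and thresholding described in the abstract might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function name `assign_business_lines`, the input shape, and the choice to use the raw document count as the importance score are all hypothetical, since the abstract does not say how the score is calculated.

```python
from collections import Counter


def assign_business_lines(doc_tags, thresholds, default_threshold=0):
    """Map each company to its business lines, most important first.

    doc_tags:   iterable of (companies, business_lines) pairs, one per document
    thresholds: per-business-line document-count threshold (assumed input)
    """
    # Count documents classified as relevant to both a company and a line.
    counts = Counter()
    for companies, lines in doc_tags:
        for company in set(companies):
            for line in set(lines):
                counts[(company, line)] += 1

    # Keep a business line only when its count exceeds that line's threshold.
    result = {}
    for (company, line), n in counts.items():
        if n > thresholds.get(line, default_threshold):
            result.setdefault(company, []).append((line, n))

    # Assumed score: the document count itself; sort descending by score.
    for company in result:
        result[company].sort(key=lambda pair: (-pair[1], pair[0]))
    return result


docs = [
    (["ACME"], ["cloud"]),
    (["ACME"], ["cloud"]),
    (["ACME"], ["cloud"]),
    (["ACME"], ["retail"]),
    (["Globex"], ["retail"]),
    (["Globex"], ["retail"]),
]
lines_by_company = assign_business_lines(docs, {"cloud": 2, "retail": 1})
print(lines_by_company)
```

Here ACME has only one retail document, which does not exceed the retail threshold of 1, so retail is not recorded as one of its business lines.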
19 Claims
12. A non-transitory machine readable medium storing a program which when executed by at least one processing unit operates a search engine to identify documents on a network based on content relevancy and identifies business lines of a company, the program comprising sets of instructions for:
executing a web crawler to search for and retrieve documents on the network;
storing in an electronic memory a plurality of company data structures for a plurality of companies, each company data structure storing patterns of document elements in documents retrieved including anchor word sets and other word sets with a context of each anchor word in an anchor word set;
parsing a document to identify words in the document;
identifying anchor words in the document;
if a predetermined number of anchor words are present in the document perform an evaluation process a. through h.;
a. comparing a set of documents from a plurality of resources with a first set of content relevance models that define relevance of the documents to different companies and a second set of content relevance models that define relevance of the documents to different business lines, wherein each content relevance model includes (i) data that is used to identify documents related to a business line or a company that the model represents, (ii) the patterns of document elements associated with scores, and (iii) parameters used in the analysis of documents by the model;
b. accessing the first and second patterns and, based on the patterns, calculating a content relevance score as an arithmetic function of the patterns and parameters of the content relevance models, wherein the content relevance score represents at least a number of anchor words in each document related to one or more of the business lines and one or more of the companies;
c. when a particular document in the set of documents satisfies a particular content relevance model, in the first set of content relevance models, associated with a particular company, associating the particular company with the particular document by storing an identifier of the particular company in a data structure for the document;
d. when a particular document in the set of documents satisfies a particular content relevance model, in the second set of content relevance models, associated with a particular business line, associating the particular business line with the particular document by storing an identifier of the particular business line in a data structure for the document;
e. determining a first threshold number and a second threshold number, wherein (i) the first threshold number is dependent on the first business line, (ii) the second threshold number is dependent on the second business line, and (iii) the first threshold number is different from the second threshold number;
f. when more than the first threshold number of documents are associated with a first company and a first business line, specifying the first business line as a business line of the first company by storing an identifier of the first business line in a data structure for the first company;
g. when more than the second threshold number of documents are associated with a second company and a second business line, specifying the second business line as a business line of the second company by storing an identifier of the second business line in a data structure for the second company; and
h. upon receiving a request for the first company, accessing, searching the data structure for the identifiers, and displaying a set of data for a business line associated with the first company based on the stored identifiers in the data structure for the first company;
if a predetermined number of anchor words are not present in the document, do not perform the evaluation process a. through h.
Specification