Link analysis for enterprise environment
First Claim
1. A computer-implemented method of improving link scores in an enterprise search system, the method comprising:
crawling one or more documents in an enterprise system;
storing a document host corresponding to each crawled document of the one or more documents;
identifying a set of links included in said each crawled document, wherein a link of the set of links is included in said each crawled document and points to a second document;
storing source host information in association with each link included in said each crawled document, wherein the source host is the same host as the document host corresponding to said each crawled document;
determining, using a processor operatively coupled with a memory, a link score for the one or more documents in the enterprise system, further comprising:
for a particular document of the one or more documents, identifying a set of incoming links, wherein an incoming link of the set of incoming links points to the particular document;
counting a number of incoming links that are not same host links;
wherein a same host link has an associated source host that is the same as the document host associated with the particular document;
determining a ratio of a first number of incoming links to a second number of incoming links, wherein the first number of incoming links is a number of incoming links that are not same host links, and the second number of incoming links is a total number of incoming links including same host links;
saving the link score in association with the particular document; and
in response to a query, returning sorted search results sorted based at least in part on the link scores stored in association with documents in the set of query search results.
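The scoring steps of claim 1 can be illustrated with a minimal Python sketch. It assumes that the ratio itself serves as the link score, because the claim computes the ratio and then saves "the link score" without stating the exact mapping. It also assumes that a document with no incoming links scores 0.0 and that documents are identified by plain string ids. `Link`, `link_scores` and `sort_results` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Link:
    """A crawled link: the stored source host plus the target document id."""
    source_host: str  # host of the document the link was found in
    target: str       # id (e.g. URL) of the document the link points to


def link_scores(doc_hosts: Dict[str, str], links: Iterable[Link]) -> Dict[str, float]:
    """Score each document as (incoming links from other hosts) / (all incoming links).

    Assumption: the ratio is used directly as the link score, and a document
    with no incoming links gets 0.0.
    """
    total: Dict[str, int] = {doc: 0 for doc in doc_hosts}
    cross_host: Dict[str, int] = {doc: 0 for doc in doc_hosts}
    for link in links:
        if link.target not in doc_hosts:
            continue  # link points outside the crawled set; ignore it
        total[link.target] += 1
        # A "same host link" has a source host equal to the target's document host.
        if link.source_host != doc_hosts[link.target]:
            cross_host[link.target] += 1
    return {
        doc: (cross_host[doc] / total[doc]) if total[doc] else 0.0
        for doc in doc_hosts
    }


def sort_results(results: List[str], scores: Dict[str, float]) -> List[str]:
    """Order query results by stored link score, highest first (ties keep input order)."""
    return sorted(results, key=lambda doc: scores.get(doc, 0.0), reverse=True)
```

For example, with documents `a` and `b` on `wiki.corp` and `c` on `hr.corp`, one link to `a` from `wiki.corp`, one link to `a` from `hr.corp`, and two links to `c` from `wiki.corp`, the scores are 0.5 for `a`, 0.0 for `b` and 1.0 for `c`, so the results `["a", "b", "c"]` are returned as `["c", "a", "b"]`. Presumably, same-host links are discounted because a host's internal navigation links say little about how useful a page is to the rest of the enterprise.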
1 Assignment
0 Petitions
Abstract
A flexible and extensible architecture allows for secure searching across an enterprise. Such an architecture can provide a simple Internet-like search experience to users searching secure content inside (and outside) the enterprise. The architecture allows for the crawling and searching of a variety of sources across an enterprise, regardless of whether any of these sources conform to a conventional user role model. The architecture further allows for security attributes to be submitted at query time, for example, in order to provide real-time secure access to enterprise resources. The user query also can be transformed to provide for dynamic querying that provides for a more current result list than can be obtained for static queries.
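The idea of submitting security attributes at query time can be pictured with a small Python sketch. Everything in it is an assumption made for illustration. Each document carries a hypothetical set of allowed group names. The caller passes its own groups along with the query. A plain case-insensitive substring match stands in for the search engine.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Doc:
    url: str
    text: str
    # Hypothetical access list: groups allowed to see this document.
    # An empty set means no caller can see the document.
    allowed_groups: FrozenSet[str] = field(default_factory=frozenset)


def secure_search(docs: Iterable[Doc], query: str, user_groups: Iterable[str]) -> List[str]:
    """Return URLs of documents that match `query` and that the caller may see.

    The caller's security attributes (here, group names) arrive with the query,
    so access is checked at query time rather than being fixed in the index.
    """
    groups = frozenset(user_groups)
    needle = query.lower()
    return [
        d.url
        for d in docs
        if needle in d.text.lower() and d.allowed_groups & groups  # non-empty overlap
    ]
```

Because the check runs on every query, a change to a user's group membership takes effect on the next search without re-crawling.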
198 Citations
12 Claims
5. A non-transitory computer readable storage medium for improving link scores in an enterprise search system, comprising instructions, said instructions which when executed, cause one or more processors to perform:
crawling one or more documents in an enterprise system;
storing a document host corresponding to each crawled document of the one or more documents;
identifying a set of links included in said each crawled document, wherein a link of the set of links is included in said each crawled document and points to a second document;
storing source host information in association with each link included in said each crawled document, wherein the source host is the same host as the document host corresponding to said each crawled document;
determining a link score for the one or more documents in the enterprise system, further comprising:
for a particular document of the one or more documents, identifying a set of incoming links, wherein an incoming link of the set of incoming links points to the particular document;
counting the number of incoming links that are not same host links;
wherein a same host link has an associated source host that is the same as the document host associated with the particular document;
determining a ratio of a first number of incoming links to a second number of incoming links, wherein the first number of incoming links is a number of incoming links that are not same host links, and the second number of incoming links is a total number of incoming links including same host links;
saving the link score in association with the particular document; and
in response to a query, returning sorted search results sorted based at least in part on the link scores stored in association with documents in the set of query search results.
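The crawl-time bookkeeping in these claims records each document's host and stores that same host as the source host of every link found on the page. It might look like the following Python sketch, which uses only the standard library. Treating `<a href>` tags as the links and taking the host from `urllib.parse.urlparse(...).hostname` are assumptions about what counts as a link and a host. `record_crawled_document` is a hypothetical helper.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def record_crawled_document(url: str, html: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (document_host, [(source_host, target_url), ...]) for one crawled page.

    Every link is stored with the host of the page it was found on, so that
    same-host links can be recognised later when link scores are computed.
    """
    doc_host = urlparse(url).hostname or ""
    parser = _AnchorCollector()
    parser.feed(html)
    parser.close()
    # Resolve relative hrefs against the page URL before storing them.
    links = [(doc_host, urljoin(url, href)) for href in parser.hrefs]
    return doc_host, links
```

A production crawler would also need to skip non-HTTP schemes such as `mailto:` and de-duplicate targets; the sketch leaves those details out.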
9. A method of preparing a database table of crawled documents in an enterprise system for a keyword search, the method comprising:
crawling one or more documents in an enterprise system;
storing a document host corresponding to each crawled document of the one or more documents;
identifying a set of links included in said each crawled document, wherein a link of the set of links is included in said each crawled document and points to a second document;
storing source host information in association with each link included in said each crawled document, wherein the source host is the same host as the document host corresponding to said each crawled document;
determining, using a processor operatively coupled with a memory, a link score for the one or more documents in the enterprise system, further comprising:
for a particular document of the one or more documents, identifying a set of incoming links, wherein an incoming link of the set of incoming links points to the particular document;
counting a number of incoming links that are not same host links;
wherein a same host link has an associated source host that is the same as the document host associated with the particular document;
determining a ratio of a first number of incoming links to a second number of incoming links, wherein the first number of incoming links is a number of incoming links that are not same host links, and the second number of incoming links is a total number of incoming links including same host links;
pushing the link score into the database table, wherein the link score is associated with the particular document;
receiving a search query string from a user; and
querying the database table with the search query string and a requested link score such that documents including the query string and associated with the requested link score are returned.
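Claim 9 replaces the sort step with a database table that is queried with both a keyword and a requested link score. The Python sketch below uses an in-memory SQLite table. Several choices are assumptions for illustration: the table layout (`docs` with `url`, `body`, `link_score`), a case-insensitive substring match standing in for a keyword search, and the reading of "associated with the requested link score" as a minimum score. `build_table` and `keyword_search` are hypothetical names.

```python
import sqlite3
from typing import Dict, List


def build_table(conn: sqlite3.Connection, docs: Dict[str, str], scores: Dict[str, float]) -> None:
    """Create the hypothetical `docs` table and push each document's link score into it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs (url TEXT PRIMARY KEY, body TEXT, link_score REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO docs (url, body, link_score) VALUES (?, ?, ?)",
        [(url, body, scores.get(url, 0.0)) for url, body in docs.items()],
    )
    conn.commit()


def keyword_search(conn: sqlite3.Connection, query: str, min_score: float) -> List[str]:
    """Return URLs whose body contains `query` and whose link score is at least `min_score`.

    instr() is used instead of LIKE so that '%' or '_' in the query are not
    treated as wildcards. Treating the requested score as a lower bound is an
    assumption.
    """
    rows = conn.execute(
        "SELECT url FROM docs "
        "WHERE instr(lower(body), lower(?)) > 0 AND link_score >= ? "
        "ORDER BY link_score DESC, url",
        (query, min_score),
    )
    return [url for (url,) in rows]
```

Usage: after `build_table(conn, {"a": "Travel policy", "b": "travel FAQ", "c": "Payroll"}, {"a": 0.5, "b": 0.0, "c": 1.0})`, a search for `"travel"` with a requested score of 0.25 returns only `["a"]`, while a requested score of 0.0 returns `["a", "b"]`.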
11. A non-transitory computer readable storage medium for improving link scores in an enterprise search system, comprising instructions, said instructions which when executed, cause one or more processors to perform:
crawling one or more documents in an enterprise system;
storing a document host corresponding to each crawled document of the one or more documents;
identifying a set of links included in said each crawled document, wherein a link of the set of links is included in said each crawled document and points to a second document;
storing source host information in association with each link included in said each crawled document, wherein the source host is the same host as the document host corresponding to said each crawled document;
determining, using a processor operatively coupled with a memory, a link score for the one or more documents in the enterprise system, further comprising:
for a particular document of the one or more documents, identifying a set of incoming links, wherein an incoming link of the set of incoming links points to the particular document;
counting a number of incoming links that are not same host links;
wherein a same host link has an associated source host that is the same as the document host associated with the particular document;
determining a ratio of a first number of incoming links to a second number of incoming links, wherein the first number of incoming links is a number of incoming links that are not same host links, and the second number of incoming links is a total number of incoming links including same host links;
pushing the link score into the database table, wherein the link score is associated with the particular document;
receiving a search query string from a user; and
querying the database table with the search query string and a requested link score such that documents including the query string and associated with the requested link score are returned.
Specification