System and method for semantic analysis of intelligent device discovery
First Claim
1. A method of analyzing electronic documents in an intranet, wherein the intranet comprises a plurality of web sites, the method comprising:
crawling HTML content and text content in a set of the sites, wherein the crawling comprises writing each crawled content to a file-based storage system, annotating each crawled content, and generating a first 4-tuple entry for each annotated, crawled content;
deep-scanning non-HTML content and non-text content in the set of sites, wherein the deep-scanning comprises fetching all static non-HTML content and non-text content in the set of sites and generating a second 4-tuple entry;
reverse-scanning the set of sites;
performing a semantic analysis of the crawled content and the deep-scanned content by using the first 4-tuple entry and the second 4-tuple entry;
correlating the results of the semantic analysis with the results of the reverse-scanning; and
comparing user navigation patterns and content from members of the set of sites.
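As a minimal sketch of the crawling step recited in the claim, the following Python writes each crawled page to file-based storage, annotates it, and generates one 4-tuple entry per page. The claim does not say what the four fields of the entry are. The `(url, stored_path, content_type, annotations)` layout, the `CrawlEntry` and `annotate` names, and the word-count annotation are all assumptions made for illustration, and pages are passed in as a dict rather than fetched over a network.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import NamedTuple


class CrawlEntry(NamedTuple):
    """Hypothetical 4-tuple; the claim does not define the four fields."""
    url: str
    stored_path: str
    content_type: str
    annotations: tuple


def annotate(text: str) -> tuple:
    """Placeholder annotation (assumption): word count and first line."""
    lines = text.strip().splitlines()
    first_line = lines[0] if lines else ""
    return (("word_count", len(text.split())), ("first_line", first_line))


def crawl(pages: dict, store_dir: Path) -> list:
    """Write each page to file storage, annotate it, and build one entry per page."""
    entries = []
    for url, (content_type, text) in pages.items():
        # Derive the file name from a hash of the URL so it is filesystem-safe.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ".txt"
        path = store_dir / name
        path.write_text(text, encoding="utf-8")
        entries.append(CrawlEntry(url, str(path), content_type, annotate(text)))
    return entries


if __name__ == "__main__":
    # Hypothetical intranet pages, used only to exercise the sketch.
    sample = {
        "http://intranet.example/hr/index.html": ("text/html", "<h1>HR</h1>\nBenefits overview"),
        "http://intranet.example/it/readme.txt": ("text/plain", "VPN setup notes"),
    }
    with tempfile.TemporaryDirectory() as tmp:
        for entry in crawl(sample, Path(tmp)):
            print(entry.url, entry.content_type, dict(entry.annotations))
```

A deep-scanning step could plausibly emit entries of the same shape for non-HTML, non-text content, which would let the semantic analysis consume both sets uniformly. That is also an assumption, not something the claim states.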
Abstract
The present invention provides a method, system, and service of analyzing electronic documents in an intranet, where the intranet includes a plurality of web sites. In an exemplary embodiment, the method, system, and service include (1) crawling HTML content and text content in a set of the sites, (2) deep-scanning non-HTML content and non-text content in the set of sites, (3) reverse-scanning the set of sites, (4) performing a semantic analysis of the crawled content and the deep-scanned content, (5) correlating the results of the semantic analysis with the results of the reverse-scanning, and (6) comparing user navigation patterns and content from the members of the set of sites. In a further embodiment, the method, system, and service further include combining the results of the performing, the results of the correlating, and the results of the comparing.
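The data flow among the six steps listed in the abstract can be sketched as follows. This is only an illustration of which step consumes which result. The abstract does not specify how any step is implemented, so every step is a caller-supplied function, and the `Steps` and `analyze` names and the dictionary return shape are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class Steps:
    """Caller-supplied implementations; the abstract does not specify them."""
    crawl: Callable[[Sequence[str]], list]
    deep_scan: Callable[[Sequence[str]], list]
    reverse_scan: Callable[[Sequence[str]], Any]
    semantic_analysis: Callable[[list, list], Any]
    correlate: Callable[[Any, Any], Any]
    compare: Callable[[Sequence[str]], Any]
    combine: Optional[Callable[[Any, Any, Any], Any]] = None


def analyze(sites: Sequence[str], steps: Steps) -> dict:
    """Run the steps in the order the abstract lists them."""
    crawled = steps.crawl(sites)                          # (1) HTML and text content
    deep_scanned = steps.deep_scan(sites)                 # (2) non-HTML, non-text content
    reverse = steps.reverse_scan(sites)                   # (3)
    semantic = steps.semantic_analysis(crawled, deep_scanned)  # (4)
    correlated = steps.correlate(semantic, reverse)       # (5)
    compared = steps.compare(sites)                       # (6)
    result = {"semantic": semantic, "correlated": correlated, "compared": compared}
    if steps.combine is not None:
        # Further embodiment: combine the results of steps (4), (5), and (6).
        result["combined"] = steps.combine(semantic, correlated, compared)
    return result


if __name__ == "__main__":
    # Trivial stand-in functions, present only so the sketch runs end to end.
    stub = Steps(
        crawl=lambda s: [f"crawl:{x}" for x in s],
        deep_scan=lambda s: [f"deep:{x}" for x in s],
        reverse_scan=lambda s: len(s),
        semantic_analysis=lambda a, b: a + b,
        correlate=lambda sem, rev: (len(sem), rev),
        compare=lambda s: sorted(s),
        combine=lambda a, b, c: {"n": len(a), "corr": b, "sites": c},
    )
    print(analyze(["site-b", "site-a"], stub))
```

Passing the steps in as functions keeps the sketch neutral about how crawling, scanning, and analysis are actually carried out, and leaving `combine` unset corresponds to the embodiment without the combining step.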
22 Claims
1. A method of analyzing electronic documents in an intranet, wherein the intranet comprises a plurality of web sites, the method comprising:
crawling HTML content and text content in a set of the sites, wherein the crawling comprises writing each crawled content to a file-based storage system, annotating each crawled content, and generating a first 4-tuple entry for each annotated, crawled content;
deep-scanning non-HTML content and non-text content in the set of sites, wherein the deep-scanning comprises fetching all static non-HTML content and non-text content in the set of sites and generating a second 4-tuple entry;
reverse-scanning the set of sites;
performing a semantic analysis of the crawled content and the deep-scanned content by using the first 4-tuple entry and the second 4-tuple entry;
correlating the results of the semantic analysis with the results of the reverse-scanning; and
comparing user navigation patterns and content from members of the set of sites.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method of providing a service to analyze electronic documents in an intranet, wherein the intranet comprises a plurality of web sites, the method comprising:
crawling HTML content and text content in a set of the sites, wherein the crawling comprises writing each crawled content to a file-based storage system, annotating each crawled content, and generating a first 4-tuple entry for each annotated, crawled content;
deep-scanning non-HTML content and non-text content in the set of sites, wherein the deep-scanning comprises fetching all static non-HTML content and non-text content in the set of sites and generating a second 4-tuple entry;
reverse-scanning the set of sites;
performing a semantic analysis of the crawled content and the deep-scanned content by using the first 4-tuple entry and the second 4-tuple entry;
correlating the results of the semantic analysis with the results of the reverse-scanning; and
comparing user navigation patterns and content from members of the set of sites.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A method of analyzing electronic documents in an intranet, wherein the intranet comprises a plurality of web sites, the method comprising:
crawling HTML content and text content in a set of the sites, wherein the crawling comprises writing each crawled content to a file-based storage system, annotating each crawled content, and generating a first 4-tuple entry for each annotated, crawled content;
deep-scanning non-HTML content and non-text content in the set of sites, wherein the deep-scanning comprises fetching all static non-HTML content and non-text content in the set of sites and generating a second 4-tuple entry;
reverse-scanning the set of sites;
performing a semantic analysis of the crawled content and the deep-scanned content by using the first 4-tuple entry and the second 4-tuple entry;
correlating the results of the semantic analysis with the results of the reverse-scanning;
comparing user navigation patterns and content from members of the set of sites; and
combining the results of the performing, the results of the correlating, and the results of the comparing.
Dependent claims: 18, 19, 20, 21, 22
Specification