Personalization engine for classifying unstructured documents
Abstract
Unstructured electronic documents are classified for profiling and targeting users for additional relevant content. Behavioral data is gathered from user activity, and user documents and actions are categorized. Profile information is combined with collaborative and editorial data to provide users with credible information regarding products. Author-generated document classification information is analyzed and assigned a first taxonomic noun to characterize the document. User-generated tags characterizing a portion of the document are assigned a second taxonomic noun. Search terms that resulted in the user accessing the document are identified and assigned a third taxonomic noun. Attributes related to how the document was accessed are evaluated and assigned a fourth taxonomic noun. The document is processed using pattern rules to extract a fifth taxonomic noun. The taxonomic nouns are aggregated to determine term vectors representing the document, and the document is categorized using the term vectors, the taxonomic nouns, or the author-generated classification.
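As an illustration only, the aggregation step described in the abstract could be sketched in Python as follows. The source names, the per-source weights, and the function name `aggregate_term_vector` are assumptions made for this sketch; the abstract does not specify how nouns from the five sources are weighted or combined.

```python
from collections import Counter

# Hypothetical per-source weights; the abstract does not specify any weighting.
SOURCE_WEIGHTS = {
    "author": 1.0,        # first nouns: author-generated classification
    "user_tag": 1.0,      # second nouns: user-generated tags
    "search_terms": 1.0,  # third nouns: search terms that led to the document
    "access_attrs": 1.0,  # fourth nouns: attributes of how it was accessed
    "pattern_rule": 1.0,  # fifth nouns: extracted by pattern rules
}


def aggregate_term_vector(nouns_by_source, weights=SOURCE_WEIGHTS):
    """Combine taxonomic nouns from every source into one sparse term vector."""
    vector = Counter()
    for source, nouns in nouns_by_source.items():
        weight = weights.get(source, 1.0)
        for noun in nouns:
            # Lower-case so the same noun from different sources is merged.
            vector[noun.lower()] += weight
    return dict(vector)


# Made-up example nouns, for illustration only.
example = {
    "author": ["camera", "photography"],
    "user_tag": ["camera"],
    "search_terms": ["lens"],
    "access_attrs": ["review"],
    "pattern_rule": ["camera", "lens"],
}
term_vector = aggregate_term_vector(example)
```

A sparse dictionary is used here because a document typically touches only a small part of a taxonomy; a dense array over the whole taxonomy would work equally well for this purpose.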
37 Claims
1. A computer-implemented method for classifying an electronic document, the method comprising:
analyzing, with a computing device, author-generated classification information regarding a document and assigning a set of first taxonomic nouns to characterize the document based upon the author-generated classification information;
examining, with a computing device, a user-generated tag from a client computer characterizing a portion of the document and assigning a set of second taxonomic nouns to characterize the document based upon the user-generated tag characterization;
identifying, with a computing device, a method of access through which the document has been accessed from a content provider and assigning a set of third taxonomic nouns to characterize the document based upon the method of access;
evaluating, with a computing device, attributes related to the method of access and assigning a set of fourth taxonomic nouns to characterize the document based upon the attributes related to the method of access;
processing, with a computing device, the document to extract a set of fifth taxonomic nouns to characterize the document based upon a predetermined pattern rule;
aggregating, with a computing device, the taxonomic nouns to determine at least one term vector that represents the document; and
categorizing, with a computing device, the document based upon the taxonomic nouns, the author-generated classification information, and at least one of the term vectors.
Dependent claims: 2-12.
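The pattern-rule extraction and categorizing steps of claim 1 could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the regular-expression rules, the category reference vectors, and the function names `extract_nouns` and `categorize` are hypothetical, and cosine similarity is a technique swapped in for illustration because the claim does not name a similarity measure. The sketch also categorizes on the term vector alone, whereas the claim additionally recites the taxonomic nouns and the author-generated classification information.

```python
import math
import re

# Hypothetical predetermined pattern rules: each maps a regex to a taxonomic noun.
PATTERN_RULES = [
    (re.compile(r"\b(dslr|mirrorless|megapixels?)\b", re.IGNORECASE), "camera"),
    (re.compile(r"\b(laptops?|notebooks?)\b", re.IGNORECASE), "computer"),
]


def extract_nouns(text, rules=PATTERN_RULES):
    """Return the taxonomic noun of every rule whose pattern occurs in the text."""
    return [noun for pattern, noun in rules if pattern.search(text)]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(term_vector, category_vectors):
    """Pick the category whose reference vector is most similar to the document."""
    return max(
        category_vectors,
        key=lambda name: cosine(term_vector, category_vectors[name]),
    )


# Made-up inputs, for illustration only.
fifth_nouns = extract_nouns("A 24-megapixel mirrorless body")
categories = {
    "photography": {"camera": 1.0, "lens": 1.0},
    "computing": {"computer": 1.0, "laptop": 1.0},
}
category = categorize({"camera": 3.0, "lens": 1.0}, categories)
```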
13. A system for classifying an electronic document, the system comprising:
a computing device configured for analyzing author-generated classification information regarding a document and assigning a set of first taxonomic nouns to characterize the document based upon the author-generated classification information;
a computing device configured for examining a user-generated tag from a client computer characterizing a portion of the document and assigning a set of second taxonomic nouns to characterize the document based upon the user-generated tag characterization;
a computing device configured for identifying a method of access through which the document has been accessed and assigning a set of third taxonomic nouns to characterize the document based upon the method of access;
a computing device configured for evaluating attributes related to the method of access and assigning a set of fourth taxonomic nouns to characterize the document based upon the attributes related to the method of access;
a computing device configured for processing the document to extract a set of fifth taxonomic nouns to characterize the document based upon a predetermined pattern rule;
a computing device configured for aggregating the taxonomic nouns to determine term vectors that represent the document; and
a computing device configured for categorizing the document based upon the taxonomic nouns, the author-generated classification information, and at least one of the term vectors.
Dependent claims: 14-25.
26. A computer program product with instructions recorded on a non-transitory computer readable storage medium, which, when executed by a processor, cause the processor to carry out a method for classifying an electronic document, the computer program product comprising:
instructions for analyzing, with a computing device, author-generated classification information regarding a document and assigning a set of first taxonomic nouns to characterize the document based upon the author-generated classification information;
instructions for examining, with a computing device, a user-generated tag from a client computer characterizing a portion of the document and assigning a set of second taxonomic nouns to characterize the document based upon the user-generated tag characterization;
instructions for identifying, with a computing device, a method of access through which the document has been accessed from a content provider and assigning a set of third taxonomic nouns to characterize the document based upon the method of access;
instructions for evaluating, with a computing device, attributes related to the method of access and assigning a set of fourth taxonomic nouns to characterize the document based upon the attributes related to the method of access;
instructions for processing, with a computing device, the document to extract a set of fifth taxonomic nouns to characterize the document based upon a predetermined pattern rule;
instructions for aggregating, with a computing device, the taxonomic nouns to determine at least one term vector that represents the document; and
instructions for categorizing, with a computing device, the document based upon the taxonomic nouns, the author-generated classification information, and at least one of the term vectors.
Dependent claims: 27-37.
Specification