System and method for facilitating evergreen discovery of digital information
Abstract
A computer-implemented system and method for facilitating evergreen discovery of digital information is provided. A hierarchy of topics for topically-limited subject areas is defined. Seed words characteristic of each topic are selected. Training material from the digital information that corresponds to the respective subject area of each of the topics is designated. Candidate topic models are formed from the seed words. Each candidate topic model includes a pattern evaluable against the digital information. An ability of each of the candidate topic models to identify such digital information matching the candidate topic model's topic is tested by matching the pattern in the candidate topic model to the training material. The candidate topic model for each topic that includes the highest abilities with respect to the topic in performance, simplicity and bias is chosen. An evergreen index is formed by pairing the chosen candidate topic model to each topic in the hierarchy.
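The first steps of the abstract (forming candidate topic models from seed words, then testing each one by matching its pattern against training material) can be sketched in Python. This is a minimal illustration under assumptions the document does not state: a candidate pattern is taken to be a single seed word or an AND/OR combination of seed words, pages are plain strings tokenized on whitespace, and the training material is split into hypothetical on-topic ("positive") and off-topic ("negative") example pages. The names `Pattern`, `form_candidates`, and `evaluate_candidate` are illustrative only.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Pattern:
    """Hypothetical candidate topic model: an AND/OR pattern over seed words."""
    op: str                  # "AND" or "OR" (assumed operators)
    terms: tuple[str, ...]   # lowercase seed words used by the pattern

    def matches(self, page: str) -> bool:
        # Assumption: pages are plain text, tokenized on whitespace.
        words = set(page.lower().split())
        hits = [t in words for t in self.terms]
        return all(hits) if self.op == "AND" else any(hits)


def form_candidates(seed_words: list[str], max_terms: int = 2) -> list[Pattern]:
    """Form candidates: each seed word alone, then AND/OR combinations of seeds."""
    seeds = [w.lower() for w in seed_words]
    candidates = [Pattern("OR", (w,)) for w in seeds]
    for k in range(2, max_terms + 1):
        for combo in combinations(seeds, k):
            candidates.append(Pattern("AND", combo))
            candidates.append(Pattern("OR", combo))
    return candidates


def evaluate_candidate(p: Pattern, positives: list[str], negatives: list[str]) -> dict:
    """Test one candidate by matching its pattern against the training pages."""
    true_pos = sum(p.matches(page) for page in positives)
    false_pos = sum(p.matches(page) for page in negatives)
    return {"true_pos": true_pos, "false_pos": false_pos,
            "missed": len(positives) - true_pos}
```

Enumerating every combination keeps the sketch simple, but the candidate set grows combinatorially with the number of seed words, so a larger vocabulary would need some form of pruning.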
24 Claims
1. A computer-implemented system for facilitating evergreen discovery of digital information, comprising:
a hierarchy of topics for topically-limited subject areas, each of the subject areas comprising pages of electronically-stored digital information maintained in a storage device;
a computer comprising a processor and memory within which code for execution by the processor is stored, comprising:
a user interface of the computer configured to select seed words that are characteristic of each of the topics and to designate training material from the digital information that corresponds to the respective subject area of each of the topics;
a topic modeler configured to form candidate topic models from the seed words, each candidate topic model comprising a pattern evaluable against the digital information;
a topic tester configured to test an ability of each of the candidate topic models to identify such digital information matching the candidate topic model's topic by matching the pattern in the candidate topic model to the training material;
a topic rater configured to rate the respective abilities of the candidate topic models, comprising:
a performance rater configured to rank each candidate topic model's performance in matching the training material correctly for the corresponding topic;
a simplicity rater configured to prefer those candidate topic models with simpler patterns over the patterns of other candidate topic models that correctly match the same training material; and
a bias rater configured to assign a bias to those candidate topic models that comprise terms also found in the corresponding topic;
a topic model selector configured to choose the candidate topic model for each topic that comprises the highest abilities with respect to the topic in performance, simplicity and bias; and
an index builder configured to form an evergreen index by pairing the chosen candidate topic model to each topic in the hierarchy.
12. A computer-implemented method for facilitating evergreen discovery of digital information, comprising the steps of:
defining a hierarchy of topics for topically-limited subject areas, each of the subject areas comprising pages of electronically-stored digital information;
selecting seed words that are characteristic of each of the topics;
designating training material from the digital information that corresponds to the respective subject area of each of the topics;
forming candidate topic models from the seed words, each candidate topic model comprising a pattern evaluable against the digital information;
testing an ability of each of the candidate topic models to identify such digital information matching the candidate topic model's topic by matching the pattern in the candidate topic model to the training material;
rating the respective abilities of the candidate topic models, comprising:
ranking each candidate topic model's performance in matching the training material correctly for the corresponding topic;
preferring those candidate topic models with simpler patterns over the patterns of other candidate topic models that correctly match the same training material; and
assigning a bias to those candidate topic models that comprise terms also found in the corresponding topic;
choosing the candidate topic model for each topic that comprises the highest abilities with respect to the topic in performance, simplicity and bias; and
forming an evergreen index by pairing the chosen candidate topic model to each topic in the hierarchy.
24. A computer-implemented apparatus for facilitating evergreen discovery of digital information, comprising:
means for defining a hierarchy of topics for topically-limited subject areas, each of the subject areas comprising pages of electronically-stored digital information;
means for selecting seed words that are characteristic of each of the topics through a user interface of a computer;
means for designating training material from the digital information that corresponds to the respective subject area of each of the topics;
means for forming candidate topic models from the seed words, each candidate topic model comprising a pattern evaluable against the digital information;
means for testing an ability of each of the candidate topic models to identify such digital information matching the candidate topic model's topic by means for matching the pattern in the candidate topic model to the training material;
means for rating the respective abilities of the candidate topic models, comprising:
means for ranking each candidate topic model's performance in matching the training material correctly for the corresponding topic;
means for preferring those candidate topic models with simpler patterns over the patterns of other candidate topic models that correctly match the same training material; and
means for assigning a bias to those candidate topic models that comprise terms also found in the corresponding topic;
means for choosing the candidate topic model for each topic that comprises the highest abilities with respect to the topic in performance, simplicity and bias; and
means for forming an evergreen index by means for pairing the chosen candidate topic model to each topic in the hierarchy.
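The three raters recited in the claims (performance, simplicity and bias), together with the selection and index-building steps that follow them, can be sketched in Python as well. The scoring choices below are assumptions, not the patent's stated method: performance is taken as F1 over hypothetical on-topic and off-topic training pages; simplicity is the number of terms in a pattern, applied among candidates whose matches on the training pages are identical; and bias is a small fixed bonus when a pattern term also appears in the topic label. `Candidate`, `choose_model`, and `build_evergreen_index` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate topic model: an AND/OR pattern over terms."""
    op: str                  # "AND" or "OR" (assumed operators)
    terms: tuple[str, ...]   # lowercase terms drawn from the seed words

    def matches(self, page: str) -> bool:
        # Assumption: pages are plain text, tokenized on whitespace.
        words = set(page.lower().split())
        hits = [t in words for t in self.terms]
        return all(hits) if self.op == "AND" else any(hits)


def performance(c: Candidate, positives: list[str], negatives: list[str]) -> float:
    """Assumed performance metric: F1 over labeled training pages."""
    tp = sum(c.matches(p) for p in positives)
    fp = sum(c.matches(p) for p in negatives)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / len(positives)
    return 2 * precision * recall / (precision + recall)


def bias(c: Candidate, topic_label: str, bonus: float = 0.05) -> float:
    """Assumed bias: fixed bonus if a pattern term also appears in the topic label."""
    label_words = set(topic_label.lower().split())
    return bonus if label_words & set(c.terms) else 0.0


def choose_model(topic_label: str, candidates: list[Candidate],
                 positives: list[str], negatives: list[str]) -> Candidate:
    # Simplicity: among candidates that match exactly the same training pages,
    # keep only the one with the fewest terms.
    simplest: dict[tuple, Candidate] = {}
    for c in candidates:
        sig = (tuple(c.matches(p) for p in positives),
               tuple(c.matches(p) for p in negatives))
        if sig not in simplest or len(c.terms) < len(simplest[sig].terms):
            simplest[sig] = c
    # Performance plus bias ranks the survivors; fewer terms breaks any tie.
    return max(simplest.values(),
               key=lambda c: (performance(c, positives, negatives)
                              + bias(c, topic_label), -len(c.terms)))


def build_evergreen_index(topics: dict) -> dict:
    """Pair each topic label with its chosen candidate model.

    `topics` maps label -> (candidates, positive pages, negative pages).
    """
    return {label: choose_model(label, cands, pos, neg)
            for label, (cands, pos, neg) in topics.items()}
```

Adding the bias as a small bonus lets it separate near-equal candidates without overriding a clearly better-performing pattern; other weightings would fit the claim language equally well.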
Specification