Automatic expert identification, ranking and literature search based on authorship in large document collections
Abstract
Disclosed is an author-centric search that facilitates identifying a source commonly associated with a topic by, for example, providing a ranked listing of experts in a field of knowledge related to a search phrase. The search phrase can be captured and parsed into its individual words (e.g., substrings). Based on occurrences of the words in one or more documented communications, statistics can be generated to determine the relevancy of each documented communication to the search phrase. Further, additional statistics can be generated describing the co-occurrence of multiple search-phrase words in a documented communication and/or the distance, in words, between search-phrase words in a documented communication. The statistics can be utilized to generate expert scores, which can be sorted and/or displayed to the user.
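The per-document statistics described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the ASCII tokenizer, the function names `parse_search_phrase` and `phrase_statistics`, and the returned keys (`counts`, `words_present` as a co-occurrence measure, `min_distance` as the smallest word gap between two distinct search words) are all hypothetical choices.

```python
import re
from itertools import product


def parse_search_phrase(phrase: str) -> list[str]:
    """Split a search phrase into lowercase words, dropping duplicates but keeping order."""
    return list(dict.fromkeys(re.findall(r"[a-z0-9]+", phrase.lower())))


def phrase_statistics(words: list[str], document: str) -> dict:
    """Occurrence, co-occurrence and distance statistics for one document (hypothetical schema)."""
    tokens = re.findall(r"[a-z0-9]+", document.lower())
    # Token positions of each search word in the document.
    positions = {w: [i for i, t in enumerate(tokens) if t == w] for w in words}
    counts = {w: len(p) for w, p in positions.items()}
    present = [w for w in words if counts[w] > 0]
    # Smallest gap (in tokens) between occurrences of any two distinct search words.
    min_distance = None
    for i, a in enumerate(present):
        for b in present[i + 1:]:
            gap = min(abs(x - y) for x, y in product(positions[a], positions[b]))
            if min_distance is None or gap < min_distance:
                min_distance = gap
    return {"counts": counts, "words_present": len(present), "min_distance": min_distance}


words = parse_search_phrase("Machine learning")
print(phrase_statistics(words, "Machine translation and machine learning systems"))
# {'counts': {'machine': 2, 'learning': 1}, 'words_present': 2, 'min_distance': 1}
```

A document in which the search words appear often, appear together, and appear close to one another would plausibly score as more relevant; how these statistics are weighted against each other is left open here.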
15 Claims
1. A computer-implemented method for an author-centric search, comprising:
initializing a first data structure and a second data structure for each of a plurality of documented communications wherein each of the plurality of documented communications has at least one author to which the respective documented communication is attributed;
utilizing the first data structure and the second data structure to compute a relevancy score for each of the plurality of documented communications;
determining a score for an author of at least one of the plurality of documented communications based in part on the relevancy score for each of the plurality of documented communications authored by the author;
prompting a user to enter a search string;
parsing the search string into one or more words;
populating at least one memory space of the first data structure for each documented communication with data based on the occurrence of the one or more words in the documented communication;
populating at least one memory space of the second data structure for each documented communication with a weighted value for an author of a given documented communication that signifies a statistical preference for the data in the corresponding memory space of the first data structure;
executing a mathematical function based on an aggregate of the data and the weighted value of the first and second data structures for each documented communication in order to compute the relevancy score for the documented communication; and
displaying search results based at least in part upon a ranked listing of one or more author scores, wherein the weighted value for the author comprises a predefined value utilized to create the statistical preference for data in the corresponding memory space of the first data structure, the weighted value for the author being determined based on at least two of:
a time of publication for the documented communication, a number of documented communications having the author, a prestige of the documented communication, and a number of authors for the documented communication.
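The steps of claim 1 can be sketched as a small pipeline. This is a hedged illustration under stated assumptions, not the claimed implementation: the first data structure is assumed to be a list with one memory space per search word holding its occurrence count; the second is assumed to hold the same per-document weight in every corresponding memory space; the "mathematical function" is assumed to be a weighted sum; and an author's score is assumed to be the sum of the relevancy scores of that author's documents. The names `Document`, `first_structure`, `second_structure`, `relevancy_score` and `rank_authors` are hypothetical, and search words are assumed to be already parsed and lowercased.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    authors: list[str]
    text: str


def first_structure(doc: Document, words: list[str]) -> list[float]:
    """One memory space per search word, holding its occurrence count in the document."""
    tokens = doc.text.lower().split()
    return [float(tokens.count(w)) for w in words]


def second_structure(weight: float, size: int) -> list[float]:
    """Weighted value for each corresponding memory space of the first structure."""
    return [weight] * size


def relevancy_score(first: list[float], second: list[float]) -> float:
    """Aggregate the two structures; a weighted sum is assumed here."""
    return sum(d * w for d, w in zip(first, second))


def rank_authors(docs: list[Document], words: list[str],
                 weights: dict[str, float]) -> list[tuple[str, float]]:
    """Sum each author's document relevancy scores and sort in descending order."""
    author_scores: dict[str, float] = defaultdict(float)
    for doc in docs:
        first = first_structure(doc, words)
        # Documents without an explicit weight default to 1.0 (an assumption).
        second = second_structure(weights.get(doc.doc_id, 1.0), len(first))
        relevancy = relevancy_score(first, second)
        for author in doc.authors:
            author_scores[author] += relevancy
    return sorted(author_scores.items(), key=lambda kv: kv[1], reverse=True)


docs = [
    Document("d1", ["Ada", "Bo"], "graph search and graph ranking"),
    Document("d2", ["Bo"], "ranking experts by graph authorship"),
    Document("d3", ["Cy"], "unrelated text"),
]
print(rank_authors(docs, ["graph", "ranking"], {"d1": 0.5, "d2": 2.0}))
# [('Bo', 5.5), ('Ada', 1.5), ('Cy', 0.0)]
```

Summing relevancy across an author's documents rewards both depth on the topic and volume of output; averaging instead would favor authors with fewer but more focused documents. The claim only requires the author score to be "based in part on" the relevancy scores, so either choice fits this sketch.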
8. A system for an author-centric search, the system including a processor and memory, the system comprising:
means for initializing a first data structure and a second data structure for each of a plurality of documented communications wherein each of the plurality of documented communications has at least one author to which the respective documented communication is attributed;
means for utilizing the first data structure and the second data structure to compute a relevancy score for each of the plurality of documented communications;
means for determining a score for each author of at least one of the plurality of documented communications based in part on the relevancy score for each of the plurality of documented communications authored by each respective author;
means for prompting a user to enter a search string;
means for parsing the search string into one or more words;
means for filling at least one memory space of the first data structure for each documented communication with data based on the occurrence of the one or more words in the documented communication;
means for filling at least one memory space of the second data structure for each documented communication with a weighted value for an author of a given documented communication that signifies a statistical preference for the data in the corresponding memory space of the first data structure;
means for executing a mathematical function based on an aggregate of the data and the weighted value of the first and second data structures for each documented communication in order to compute the relevancy score for the documented communication; and
means for displaying search results based at least in part upon a ranked listing of the score determined for each of the authors, wherein the weighted value for an author comprises a predefined value utilized to create the statistical preference for data in the corresponding memory space of the first data structure, the weighted value for the author being determined based on at least two of:
a time of publication for the documented communication, a number of documented communications having the given author, a prestige of the documented communications, and a number of authors for the documented communication.
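The claims require the author's weighted value to be determined from at least two of four factors but do not give a formula. The sketch below is one hypothetical way to combine them: recency as exponential decay with an assumed `half_life` in years, author productivity as `1 + ln(count)`, prestige as a caller-supplied multiplier, and shared credit as `1 / number_of_authors`. The container `WeightInputs`, the function `author_weight`, and every constant are illustrative assumptions, not taken from the patent.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeightInputs:
    """Hypothetical container for the four weighting factors named in the claims."""
    publication_year: Optional[int] = None   # time of publication
    author_doc_count: Optional[int] = None   # documented communications having the author
    prestige: Optional[float] = None         # prestige of the documented communication
    num_authors: Optional[int] = None        # number of authors for the communication


def author_weight(inputs: WeightInputs, current_year: int, half_life: float = 10.0) -> float:
    """Multiply the supplied factors into one weight; at least two are required."""
    factors = []
    if inputs.publication_year is not None:
        # Recency: the factor halves for every `half_life` years of age.
        age = max(0, current_year - inputs.publication_year)
        factors.append(0.5 ** (age / half_life))
    if inputs.author_doc_count is not None:
        # Productivity: grows logarithmically with the author's document count.
        factors.append(1.0 + math.log(max(1, inputs.author_doc_count)))
    if inputs.prestige is not None:
        factors.append(inputs.prestige)
    if inputs.num_authors is not None:
        # Shared credit: split evenly among co-authors.
        factors.append(1.0 / max(1, inputs.num_authors))
    if len(factors) < 2:
        raise ValueError("at least two weighting factors are required")
    return math.prod(factors)


print(author_weight(WeightInputs(publication_year=2010, num_authors=4), current_year=2020))
# 0.125
```

Combining the factors multiplicatively lets any single factor near zero suppress the weight; an additive or weighted-average combination would be gentler, and the claims' wording does not rule out either.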
Specification