METHODOLOGIES AND ANALYTICS TOOLS FOR LOCATING EXPERTS WITH SPECIFIC SETS OF EXPERTISE
Abstract
A method and analytics tools for locating experts with specific sets of expertise are disclosed. The method includes providing a collection of documents P0; generating categories representing fields of expertise derived from the collection of documents P0; refining the taxonomy of the categories by applying user domain knowledge; extracting structured fields from the collection of documents P0; constructing a contingency table having a first axis defined by the extracted structured fields and a second axis defined by the categories; and using the contingency table to identify a set of experts having related expertise. The method may also include a network graph analysis that aids visualization of the relationships between people and expertise.
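The contingency table at the center of this method can be illustrated with a short sketch. The Python below is not the disclosed implementation; it assumes each document has already been reduced to a list of extracted person names and a single taxonomy category (a hypothetical input shape), counts documents per (person, category) cell, and ranks people for one category by their cell count. The function names `build_contingency_table` and `experts_for` and the sample documents are illustrative only.

```python
from collections import Counter, defaultdict


def build_contingency_table(docs):
    """Count documents per (person, category) cell.

    docs: iterable of (names, category) pairs, where `names` are the people
    extracted from a document's structured fields and `category` is the
    taxonomy category assigned to that document (hypothetical input shape).
    Returns a mapping person -> Counter(category -> document count).
    """
    table = defaultdict(Counter)
    for names, category in docs:
        for name in set(names):  # count each person at most once per document
            table[name][category] += 1
    return table


def experts_for(table, category, k=3):
    """Return up to k people with the most documents in `category`."""
    # Sort by descending count, breaking ties alphabetically for stable output.
    ranked = sorted(table, key=lambda person: (-table[person][category], person))
    return [person for person in ranked[:k] if table[person][category] > 0]


# Hypothetical sample: each entry is (extracted names, assigned category).
docs = [
    (["Alice", "Bob"], "speech recognition"),
    (["Alice"], "speech recognition"),
    (["Bob", "Carol"], "databases"),
    (["Carol"], "databases"),
    (["Alice"], "databases"),
]

table = build_contingency_table(docs)
categories = sorted({category for _, category in docs})
for person in sorted(table):
    # One row of the contingency table: counts in `categories` order.
    print(person, [table[person][c] for c in categories])

print(experts_for(table, "databases", k=2))  # ['Carol', 'Alice']
```

Ranking by raw counts favors prolific people; the significance step described in claim 22 corrects for that by comparing each count with an expected value.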
22 Claims
1-20. (canceled)
21. A method for use with a set of seed documents P0 extracted from a data warehouse, the method comprising:
searching the data warehouse to provide a set of additional documents P1 similar to documents of P0, wherein said similarity is determined using a query search;

generating an initial taxonomy for a combined document set P0+P1 that includes all documents from both the set of seed documents P0 and the set of additional documents P1, wherein said generating of the initial taxonomy includes an analysis using words, bags of words, phrases, structured and unstructured features;

classifying the combined document set P0+P1 using terms obtained from structured fields, wherein the structured fields are extracted from the combined document set P0+P1, wherein at least one structured field includes the names of people and is formed using a name annotator to extract names from documents of the combined document set P0+P1;

generating a contingency table that compares categories of a refined taxonomy to the structured fields in cells and assigns a degree of significance to the comparisons in each cell;

examining the contingency table to find a relationship between the categories of the refined taxonomy and the names of people, wherein said relationship is plotted according to the degree of significance of respective cells to determine who is related to which document categories, hence who has what expertise;

overlaying trending information on top of a document taxonomy and contingency table;

using the contingency table to identify recent and most related expertise taxonomies and people;

iterating the processes of extracting, searching, and generating using domain knowledge to produce the refined taxonomy from the initial taxonomy;

comparing one name classification against another name classification in the refined taxonomy; and

identifying people with similar expertise by examining highly related people names using network graph analysis or other relationships in the structured and unstructured fields.
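The final step of claim 21, identifying people with similar expertise, leaves the comparison method open ("network graph analysis or other relationships"). One possible reading, sketched below under that assumption, treats each person's row of the contingency table as a vector of per-category document counts and ranks other people by cosine similarity. Cosine similarity is a substitute chosen for illustration, not a technique named in the claim, and the `profiles` data and the function names are hypothetical.

```python
import math

# Hypothetical contingency-table rows: person -> {category: document count}.
profiles = {
    "Alice": {"speech recognition": 4, "databases": 1},
    "Bob": {"speech recognition": 3, "databases": 0},
    "Carol": {"speech recognition": 0, "databases": 5},
}


def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a person with no documents is similar to no one
    return dot / (norm_a * norm_b)


def similar_people(profiles, person, k=2):
    """Rank other people by how closely their category profile matches `person`."""
    scores = [
        (other, cosine(profiles[person], profiles[other]))
        for other in profiles
        if other != person
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


print([name for name, _ in similar_people(profiles, "Alice")])
```

Because cosine similarity ignores vector length, a person with few documents can still match a prolific colleague whose work is spread across the same categories.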
22. A computer program product comprising a computer useable medium including a computer readable program, wherein the computer readable program, when executed to perform a query search for a predetermined document category on a computer, causes the computer to:
access a corpus of documents stored within a data warehouse;

extract a set of documents from the corpus of documents including structured fields, wherein at least one structured field further includes authors of the set of documents;

generate a document taxonomy from the set of documents by analyzing words, bags of words, and phrases in the structured fields to identify and classify related documents from the set of documents according to classifications including categories of expertise and the authors of the set of documents;

construct a contingency table of cells having an actual value and having a first axis defined by the authors of the set of documents and a second axis defined by the categories of expertise;

calculate an expected value for each cell based on a size of a selected category of expertise and a total number of documents associated with a selected author;

determine a degree of significance for each cell based on a comparison of the actual value and the expected value for each cell, and code the cell according to the degree of significance;

plot a relationship between the categories of expertise and the authors of the set of documents based on the degree of significance found for each cell of the contingency table as a network graph using lines of varying thickness to signify a strength of the relationship;

perform a first network graph analysis to find a set of names from the authors of the set of documents that is most related to a set of the categories of expertise;

perform a second network graph analysis to find a co-authoring relationship from the found set of names that is most related to the set of the categories of expertise;

overlay trend information on top of the document taxonomy and the contingency table; and

identify people of similar expertise by analyzing the contingency table, document taxonomy, first and second graph analyses, and trend information.
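Claim 22 computes an expected value for each cell from the category size and the author's document count, then grades each cell by comparing the actual and expected values. The sketch below assumes the standard independence model for contingency tables, expected = (author documents × category size) / total documents, and uses the Pearson residual (actual − expected) / √expected as the degree of significance. The coding thresholds (2.0 for "strong", 1.0 for "moderate"), the mapping from residual to line width, and all function names are hypothetical choices; the claim does not specify them.

```python
import math


def significance_table(actual, category_size, author_docs, total_docs):
    """Return {(author, category): (observed, expected, residual)}.

    actual: {(author, category): count}, the observed cell values.
    category_size: {category: number of documents in the category}.
    author_docs: {author: number of documents by the author}.
    total_docs: number of documents in the set.
    """
    cells = {}
    for author, n_author in author_docs.items():
        for category, n_category in category_size.items():
            observed = actual.get((author, category), 0)
            # Expected count if authorship and category were independent.
            expected = n_author * n_category / total_docs
            residual = (observed - expected) / math.sqrt(expected) if expected else 0.0
            cells[(author, category)] = (observed, expected, residual)
    return cells


def code_cell(residual, strong=2.0, moderate=1.0):
    """Code a cell by its degree of significance (hypothetical thresholds)."""
    if residual >= strong:
        return "strong"
    if residual >= moderate:
        return "moderate"
    return "weak"


def graph_edges(cells, min_residual=1.0, max_width=5.0):
    """Edges (author, category, line_width) for positively related cells.

    Line width grows with the residual and is capped at max_width.
    """
    edges = [
        (author, category, min(residual, max_width))
        for (author, category), (_, _, residual) in cells.items()
        if residual >= min_residual
    ]
    return sorted(edges, key=lambda edge: edge[2], reverse=True)


# Hypothetical example: 20 documents split evenly between two categories.
category_size = {"speech": 10, "databases": 10}
author_docs = {"Alice": 8, "Bob": 4}
actual = {("Alice", "speech"): 8, ("Bob", "databases"): 4}
cells = significance_table(actual, category_size, author_docs, total_docs=20)

for (author, category), (observed, expected, residual) in sorted(cells.items()):
    print(author, category, observed, expected, round(residual, 2), code_cell(residual))

print(graph_edges(cells))
```

Dividing by √expected keeps a large author from dominating the graph simply because every one of their cells has a large raw count; the edge list could then be handed to any graph-drawing tool that accepts per-edge widths.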
Specification