METHOD AND SYSTEM FOR DISAMBIGUATING INFORMATIONAL OBJECTS
Abstract
The present invention provides a Distinct Author Identification System (“DAIS”) for disambiguating data to discern author entities and link or associate authorships with such author entities. The invention provides powerful disambiguation processes applied across one or more databases to yield a disambiguated authority database of authors. An entire database of publications may be processed by the DAIS to group/link authorships and to identify author entities. The author entities may then be matched or associated with actual authors to establish an authority database of authors. After initial evaluation, the DAIS may be used to reevaluate some or all of the database(s) and/or the authority database established by the DAIS may be used to add or update information. DAIS may use “hierarchical clustering” to link authorships and identify authors based on authorship similarity. DAIS evaluates the likelihood that authorships are from the same author.
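The similarity, linking, and clustering steps summarized above can be illustrated with a minimal sketch. The abstract does not specify how authorship similarity is computed, so this example assumes a Jaccard overlap of cited references and a fixed linking threshold, and it merges linked authorships with a union-find structure (single-linkage grouping, a simple stand-in for the hierarchical clustering mentioned above). The data, the function names `similarity` and `cluster_authorships`, and the threshold value are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each authorship is a (publication id, author name string)
# pair mapped to the set of references cited by that publication.
authorships = {
    ("pub1", "J. Smith"): {"r1", "r2", "r3"},
    ("pub2", "J. Smith"): {"r2", "r3", "r4"},
    ("pub3", "John Smith"): {"r3", "r4", "r5"},
    ("pub4", "J. Smith"): {"r8", "r9"},
}


def similarity(refs_a, refs_b):
    """Jaccard overlap of cited references (an assumed measure, not the patent's)."""
    union = refs_a | refs_b
    return len(refs_a & refs_b) / len(union) if union else 0.0


def cluster_authorships(authorships, threshold=0.3):
    """Link pairs whose similarity meets the threshold, then merge linked pairs."""
    parent = {key: key for key in authorships}

    def find(key):
        # Follow parent pointers to the cluster root, shortening the path as we go.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    # Linking step: join every pair that is similar enough.
    for a, b in combinations(authorships, 2):
        if similarity(authorships[a], authorships[b]) >= threshold:
            parent[find(a)] = find(b)

    # Clustering step: each root identifies one candidate author entity.
    clusters = {}
    for key in authorships:
        clusters.setdefault(find(key), []).append(key)
    return sorted(sorted(members) for members in clusters.values())


author_entities = cluster_authorships(authorships)
```

With this toy data, the authorships on pub1 and pub3 share too few references to be linked directly, but both link to pub2, so all three fall into one candidate author entity while pub4 remains on its own. A production system would presumably combine cited-reference evidence with other signals before matching a cluster to an actual author.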
57 Claims
1. A content management system in communication with one or more publications databases, each comprising a plurality of publications, and with a plurality of remote users, the content management system comprising:
a disambiguation computer;
a disambiguation database operatively connected to the disambiguation computer and adapted to receive and store for processing by the disambiguation computer at least a first set of information derived from one or more publications databases each comprising a plurality of publications with each publication having at least one cited reference and one or more authorships;
an authorship similarity routine executing on the disambiguation computer and adapted to process at least some of the first set of electronic information based on cited reference data from the plurality of publications to determine a degree of authorship similarity;
a linking routine executing on the disambiguation computer and adapted to link authorships based on the degree of authorship similarity; and
a clustering routine executing on the disambiguation computer and adapted to cluster two or more linked authorships to form a first cluster and adapted to form a first author entity associated with the first cluster, whereby the clustering routine is executed to produce an authority database of authors operatively stored on the disambiguation database and comprised of a plurality of unique author entities each associated with a unique actual author and a cluster.
5. A computer-implemented method comprising:
a. receiving a set of electronic information associated with a set of publications, each publication in the set of publications comprising at least one cited reference and having at least one authorship;
b. comparing at least a portion of the set of electronic information with authorship data contained in an authority database, the authorship data related to authorship entities represented in the authority database; and
c. associating the set of electronic information with one or more authorship entities.
22. A computer-implemented method comprising:
a. presenting data representing a set of publications to a user;
b. providing a user interface for allowing a user to input a selection related to authorship of one or more of the set of publications; and
c. updating an authority database to reflect an association of a unique author with the selection related to authorship of one or more of the set of publications.
31. A computer-based system comprising:
a computer adapted to process a set of electronic information associated with a set of publications, each publication in the set of publications comprising at least one cited reference and having at least one authorship;
software executing on the computer and adapted to:
a. receive a set of electronic information associated with a set of publications;
b. compare at least a portion of the set of electronic information with authorship data contained in an authority database, the authorship data related to authorship entities represented in the authority database; and
c. associate the set of electronic information with one or more authorship entities.
48. A computer-implemented method for maintaining an authority database of authors, the method comprising:
a. receiving from a user data representing a user-defined set of publications each having at least one authorship and related to a unique author;
b. recognizing the received data as being associated with a researcher identifier;
c. using an authority database, verifying the received data to render a threshold confirmation of correctness in association of the set of publications with the unique author;
d. doing one or the other of
1) matching the unique author with an existing unique author profile record stored by the authority database;
or
2) creating a new unique author profile record and storing the new unique author profile record by the authority database.
53. A computer-based system comprising:
a computer adapted to process a set of electronic information associated with a set of publications, each publication in the set of publications comprising at least one cited reference and having at least one authorship;
software executing on the computer and adapted to:
a. receive from a user data representing a user-defined set of publications each having at least one authorship and related to a unique author;
b. recognize the received data as being associated with a researcher identifier;
c. access an authority database and verify the received data to render a threshold confirmation of correctness in association of the set of publications with the unique author;
d. process the received data to do one or the other of
1) match the unique author with an existing unique author profile record stored by the authority database;
or
2) create a new unique author profile record and store the new unique author profile record by the authority database.
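The receive, verify, and match-or-create steps recited in claims 48 and 53 can be sketched roughly as follows. The claims do not say how the "threshold confirmation of correctness" is computed, so this example assumes it is the fraction of submitted publications on which the submitted author name already appears in the indexed authorships. The class names `AuthorProfile` and `AuthorityDatabase`, the method names, the 0.8 threshold, and the sample identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorProfile:
    """A unique author profile record keyed by a researcher identifier."""
    researcher_id: str
    name: str
    publications: set = field(default_factory=set)


class AuthorityDatabase:
    def __init__(self, indexed_authorships):
        # publication id -> set of author name strings recorded for that publication
        self.indexed_authorships = indexed_authorships
        self.profiles = {}  # researcher_id -> AuthorProfile

    def verify(self, name, publication_ids, threshold=0.8):
        """Assumed check: enough submitted publications already list this name."""
        pubs = set(publication_ids)
        if not pubs:
            return False
        confirmed = sum(
            1 for pub in pubs if name in self.indexed_authorships.get(pub, set())
        )
        return confirmed / len(pubs) >= threshold

    def submit(self, researcher_id, name, publication_ids):
        """Verify a user-defined publication set, then match or create a profile."""
        if not self.verify(name, publication_ids):
            return None  # verification failed; nothing is stored
        profile = self.profiles.get(researcher_id)
        if profile is None:
            # No existing record for this identifier: create and store a new one.
            profile = AuthorProfile(researcher_id, name)
            self.profiles[researcher_id] = profile
        profile.publications.update(publication_ids)
        return profile


db = AuthorityDatabase(
    {"pub1": {"J. Smith", "A. Lee"}, "pub2": {"J. Smith"}, "pub3": {"B. Chan"}}
)
first = db.submit("R-001", "J. Smith", ["pub1", "pub2"])  # creates a new profile
second = db.submit("R-001", "J. Smith", ["pub2"])         # matches the existing one
rejected = db.submit("R-002", "J. Smith", ["pub3"])       # fails verification
```

Keying profiles by researcher identifier keeps the match-or-create decision to a single dictionary lookup. A real system would likely match on richer evidence than an exact name string.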
Specification