Method and system for disambiguating informational objects
First Claim
1. A computer implemented method comprising:
a. selecting a set of electronic information associated with a set of publications, each publication in the set of publications comprising at least one cited reference and having at least one authorship;
b. disambiguating at least part of the set of electronic information by using a set of at least two cited references associated with a set of at least two publications from the set of publications to determine an authorship similarity, disambiguating including scoring authorship similarity;
wherein the disambiguating step includes arriving at a scored authorship similarity attribute;
c. linking authorships based on the determined authorship similarity and clustering two or more linked authorships to form a first cluster and forming a first author entity associated with the first cluster;
d. matching the first author entity with a first actual author, the first cluster of authorships being attributable to the first actual author, and repeating the clustering step to form a plurality of clusters respectively associated with a plurality of unique author entities; and
e. incorporating into an authority database of authors the plurality of unique author entities each associated with a unique actual author and a cluster.
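Steps b and c can be illustrated with a minimal sketch. It is not the patented implementation: it assumes that authorship similarity is scored as the Jaccard overlap of cited-reference sets, that two authorships are linked when the score reaches a fixed threshold, and that clusters are the connected components of the resulting links. The names `jaccard`, `cluster_authorships`, the threshold value, and the sample identifiers are all hypothetical.

```python
from itertools import combinations


def jaccard(refs_a, refs_b):
    """Score two authorships by the overlap of their cited references (0.0 to 1.0)."""
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_authorships(authorships, threshold=0.2):
    """Link authorships whose score reaches the threshold, then cluster them.

    authorships: dict mapping an authorship id to its cited-reference ids.
    Returns a sorted list of clusters, each a sorted list of authorship ids.
    The threshold is an assumed value, not one taken from the patent.
    """
    parent = {aid: aid for aid in authorships}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(authorships, 2):
        if jaccard(authorships[a], authorships[b]) >= threshold:
            parent[find(a)] = find(b)  # link the two authorships

    clusters = {}
    for aid in authorships:
        clusters.setdefault(find(aid), []).append(aid)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical authorships that share a name but not necessarily an author.
authorships = {
    "pub1:J. Smith": ["ref_a", "ref_b", "ref_c"],
    "pub2:J. Smith": ["ref_b", "ref_c", "ref_d"],
    "pub3:J. Smith": ["ref_x", "ref_y"],
}
clusters = cluster_authorships(authorships)
print(clusters)
```

In this sketch each resulting cluster stands for one candidate author entity; matching an entity to an actual author (step d) is outside what the code shows.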
Abstract
The present invention provides a Distinct Author Identification System (“DAIS”) for disambiguating data to discern author entities and link or associate authorships with such author entities. The invention provides powerful disambiguation processes applied across one or more databases to yield a disambiguated authority database of authors. An entire database of publications may be processed by the DAIS to group/link authorships and to identify author entities. The author entities may then be matched or associated with actual authors to establish an authority database of authors. After initial evaluation, the DAIS may be used to reevaluate some or all of the database(s) and/or the authority database established by the DAIS may be used to add or update information. DAIS may use “hierarchical clustering” to link authorships and identify authors based on authorship similarity. DAIS evaluates the likelihood that authorships are from the same author.
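The "hierarchical clustering" mentioned in the abstract can be sketched as a bottom-up (agglomerative) procedure. The example below is an assumption-laden illustration, not the DAIS algorithm: it uses average linkage, stops when the best merge falls below a chosen threshold, and reads pairwise scores from a hypothetical lookup table. The names `average_linkage`, `hierarchical_cluster`, and `pair_scores` are invented for the example.

```python
def average_linkage(cluster_a, cluster_b, sim):
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(sim(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def hierarchical_cluster(items, sim, threshold):
    """Repeatedly merge the two most similar clusters until none reaches the threshold."""
    clusters = [[item] for item in items]  # start with one cluster per authorship
    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = average_linkage(clusters[i], clusters[j], sim)
                if score > best_score:
                    best_score, best_pair = score, (i, j)
        if best_score < threshold:
            break  # remaining clusters are treated as distinct author entities
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical pairwise authorship similarity scores; unlisted pairs default to 0.1.
pair_scores = {
    frozenset(("a1", "a2")): 0.9,
    frozenset(("a1", "a3")): 0.7,
    frozenset(("a2", "a3")): 0.6,
}


def sim(a, b):
    return pair_scores.get(frozenset((a, b)), 0.1)


result = hierarchical_cluster(["a1", "a2", "a3", "a4"], sim, threshold=0.5)
print(result)
```

Average linkage is only one possible choice here; single or complete linkage would change which authorships end up merged.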
37 Claims
1. A computer implemented method comprising:
a. selecting a set of electronic information associated with a set of publications, each publication in the set of publications comprising at least one cited reference and having at least one authorship;
b. disambiguating at least part of the set of electronic information by using a set of at least two cited references associated with a set of at least two publications from the set of publications to determine an authorship similarity, disambiguating including scoring authorship similarity;
wherein the disambiguating step includes arriving at a scored authorship similarity attribute;
c. linking authorships based on the determined authorship similarity and clustering two or more linked authorships to form a first cluster and forming a first author entity associated with the first cluster;
d. matching the first author entity with a first actual author, the first cluster of authorships being attributable to the first actual author, and repeating the clustering step to form a plurality of clusters respectively associated with a plurality of unique author entities; and
e. incorporating into an authority database of authors the plurality of unique author entities each associated with a unique actual author and a cluster.
Dependent claims: 2-17
18. A computer-based system comprising:
a computer adapted to process a set of electronic information associated with a set of publications, each publication in the set of publications comprising at least one cited reference and having at least one authorship;
software executing on the computer and adapted to disambiguate at least part of the set of electronic information by using a set of at least two cited references associated with a set of at least two publications from the set of publications to determine an authorship similarity;
a database operatively connected to the computer and adapted to receive and store for processing by the computer the set of information;
an authorship similarity routine executing on the computer and adapted to process at least some of the set of electronic information using cited reference data to determine a degree of authorship similarity;
a linking routine executing on the computer and adapted to link authorships based on the degree of authorship similarity;
a clustering routine executing on the computer and adapted to cluster two or more linked authorships to form a first cluster and adapted to form a first author entity associated with the first cluster, and
wherein the database comprises an authority database of authors comprised of a plurality of distinct actual authors matched respectively with a plurality of unique author entities.
Dependent claims: 19-36
37. A computer implemented method for maintaining an authority database of authors used in searching at least one publications database for publications of interest, the method comprising:
a. receiving publications, each publication containing at least one cited reference and having at least one authorship; and
b. disambiguating the received publications by comparing the at least one cited references with data associated with the authority database of authors to determine an authorship similarity between publication authorships, disambiguating including scoring authorship similarity;
wherein the disambiguating step includes arriving at a scored authorship similarity attribute;
c. linking authorships based on the determined authorship similarity and clustering two or more linked authorships to form a first cluster and forming a first author entity associated with the first cluster;
d. matching the first author entity with a first actual author, the first cluster of authorships being attributable to the first actual author, and repeating the clustering step to form a plurality of clusters respectively associated with a plurality of unique author entities; and
e. incorporating into the authority database of authors the plurality of unique author entities each associated with a unique actual author and a cluster.
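The maintenance method of claim 37 compares the cited references of newly received publications with data already held in the authority database. A rough sketch of one way this could work follows. It assumes each author entity stores a name and a pooled set of cited references, that the score is the fraction of the new authorship's references already in that pool, and that a new entity is created when no same-name entity reaches the threshold. The data layout, the `overlap` and `assign_authorship` functions, the threshold, and the identifier scheme are all hypothetical.

```python
def overlap(cited_refs, profile_refs):
    """Fraction of a new authorship's cited references already seen for an entity."""
    refs = set(cited_refs)
    if not refs:
        return 0.0
    return len(refs & profile_refs) / len(refs)


def assign_authorship(authority, name, cited_refs, threshold=0.3):
    """Attach a new authorship to the best-matching entity, or create a new one.

    authority: dict mapping entity id -> {"name": str, "refs": set of reference ids}.
    Returns the id of the entity the authorship was attributed to.
    """
    best_id, best_score = None, 0.0
    for entity_id, entity in authority.items():
        if entity["name"] != name:
            continue  # only same-name entities are candidates in this sketch
        score = overlap(cited_refs, entity["refs"])
        if score > best_score:
            best_id, best_score = entity_id, score

    if best_id is None or best_score < threshold:
        # Simplified id scheme; a real system would need collision-free ids.
        best_id = f"author_{len(authority) + 1}"
        authority[best_id] = {"name": name, "refs": set()}

    authority[best_id]["refs"].update(cited_refs)  # update the entity's profile
    return best_id


# Hypothetical authority database with one known entity.
authority = {"author_1": {"name": "J. Smith", "refs": {"ref_a", "ref_b"}}}

first = assign_authorship(authority, "J. Smith", ["ref_a", "ref_z"])
second = assign_authorship(authority, "J. Smith", ["ref_q"])
print(first, second)
```

Here the first authorship shares half of its references with the existing entity and is attributed to it, while the second shares none and becomes a new entity.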
Specification