Clique based clustering for named entity recognition system
Abstract
A soft clustering method comprises (i) grouping items into non-exclusive cliques based on features associated with the items, and (ii) clustering the non-exclusive cliques using a hard clustering algorithm to generate item groups on the basis of mutual similarity of the features of the items constituting the cliques. In some named entity recognition embodiments illustrated herein as examples, named entities together with contexts are grouped into cliques based on mutual context similarity. Each clique includes a plurality of different named entities having mutual context similarity. The cliques are clustered to generate named entity groups on the basis of mutual similarity of the contexts of the named entities constituting the cliques.
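The first stage described in the abstract, grouping items into non-exclusive cliques, can be illustrated with a minimal sketch. It assumes contexts are stored as sparse word-count vectors, that "mutual context similarity" is approximated by thresholded cosine similarity, and that cliques are enumerated with the basic Bron-Kerbosch recursion. None of these choices is stated in the text above; the function names, the threshold, and the sample data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse context vectors (dicts)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_graph(contexts, threshold):
    """Link two entities when their context similarity reaches the threshold."""
    graph = {name: set() for name in contexts}
    for a, b in combinations(contexts, 2):
        if cosine(contexts[a], contexts[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            if len(r) > 1:  # a clique holds a plurality of entities
                found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return found


# Hypothetical context counts, invented for illustration only.
contexts = {
    "Paris": {"city": 2, "visit": 1},
    "Berlin": {"city": 2, "visit": 1},
    "Jordan": {"city": 1, "player": 1},
    "Pippen": {"player": 2},
    "Rodman": {"player": 2},
}
cliques = maximal_cliques(similarity_graph(contexts, threshold=0.6))
```

With this sample data, "Jordan" is similar enough to both the city names and the player names to appear in two cliques, which is the non-exclusive behavior the abstract describes.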
21 Claims
1. An annotation method comprising:
identifying named entities in a corpus together with contexts, wherein the identifying of each named entity identifies the named entity as a noun or noun phrase starting with an upper-case letter;
grouping the named entities into cliques based on mutual context similarity, each clique including a plurality of different named entities, each named entity being a noun or noun phrase starting with an upper-case letter, the named entities of each clique having mutual context similarity, the grouping of the named entities into cliques being non-exclusive in that a named entity can belong to more than one clique;
clustering the cliques to generate named entity groups, each named entity group consisting of one or more cliques, the clustering being performed on the basis of mutual similarity of the contexts of the named entities constituting the cliques;
assigning annotations to the named entity groups; and
annotating named entity instances in the corpus based on the named entity groups and corresponding assigned annotations;
wherein at least the identifying, the grouping, and the clustering are performed by a computer.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
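The identifying and annotating steps of claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: runs of capitalized tokens stand in for "a noun or noun phrase starting with an upper-case letter" (no part-of-speech check is made, so a capitalized sentence-initial word would also match), a context is taken to be a small window of neighboring words, and an entity that belongs to several groups simply keeps every candidate label. The names `CANDIDATE`, `entities_with_contexts`, `annotate`, and `labels_of`, and the labels themselves, are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumption: a run of capitalized tokens approximates a capitalized noun
# or noun phrase; no part-of-speech tagging is performed.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def entities_with_contexts(sentences, window=2):
    """Map each candidate entity to counts of lower-cased words near it."""
    contexts = defaultdict(Counter)
    for sentence in sentences:
        for match in CANDIDATE.finditer(sentence):
            left = sentence[:match.start()].split()[-window:]
            right = sentence[match.end():].split()[:window]
            for word in left + right:
                contexts[match.group()][word.strip(".,;").lower()] += 1
    return contexts


def annotate(sentence, labels_of):
    """Tag each known entity instance with the labels of its groups."""
    def tag(match):
        name = match.group()
        labels = labels_of.get(name)
        if not labels:
            return name  # unknown entity: leave the text unchanged
        return f"[{name}/{'|'.join(sorted(labels))}]"
    return CANDIDATE.sub(tag, sentence)


# Hypothetical corpus and group labels, invented for illustration only.
corpus = ["tourists visit Paris every summer."]
contexts = entities_with_contexts(corpus)
labels_of = {"Paris": {"LOCATION"}, "Jordan": {"LOCATION", "PERSON"}}
tagged = annotate(corpus[0], labels_of)
```

Choosing a single label for an ambiguous instance, for example by comparing the instance's own context with each group, is left out of this sketch.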
9. An annotation system comprising:
a named entity detector configured to identify named entities in a corpus together with contexts, wherein each named entity identified by the named entity detector is a noun or noun phrase starting with an upper-case letter;
a cliques identifier configured to receive and group the named entities into cliques based on mutual context similarity, each clique including a plurality of different named entities, each named entity being a noun or noun phrase starting with an upper-case letter, the named entities of each clique having mutual context similarity, the cliques being non-exclusive in that a named entity can belong to more than one clique; and
a cliques clusterer configured to receive and cluster the cliques to generate named entity groups, each named entity group consisting of one or more cliques, the clustering being performed on the basis of mutual similarity of the contexts of the named entities constituting the cliques;
wherein the named entity detector, the cliques identifier, and the cliques clusterer comprise a digital processing device.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 21
17. A non-transitory storage medium storing instructions executable to perform a soft clustering method comprising:
(i) grouping items into non-exclusive cliques based on mutual context similarity, each clique including a plurality of different named entities, each named entity being a noun or noun phrase starting with an upper-case letter, the named entities of each clique having mutual context similarity, the grouping of the named entities into cliques being non-exclusive in that a named entity can belong to more than one clique, and
(ii) clustering the non-exclusive cliques using a hard clustering algorithm to generate named entity groups, each named entity group consisting of one or more cliques, the clustering being performed on the basis of mutual similarity of the contexts of the named entities constituting the cliques.
Dependent claims: 18, 19, 20
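Step (ii) of claim 17 calls for a hard clustering algorithm but does not name one in this section. The sketch below swaps in single-link threshold clustering with a union-find structure as a stand-in: each clique is summarized by the summed context counts of its members, and two cliques are merged when the cosine similarity of those summaries reaches a threshold. The profile construction, the threshold, the function names, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def clique_profile(clique, contexts):
    """Sum the context counts of every entity in the clique."""
    profile = Counter()
    for name in clique:
        profile.update(contexts[name])
    return profile


def cluster_cliques(cliques, contexts, threshold):
    """Hard-cluster cliques: each clique ends up in exactly one group."""
    profiles = [clique_profile(c, contexts) for c in cliques]
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(cliques)):
        for j in range(i + 1, len(cliques)):
            if cosine(profiles[i], profiles[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), []).append(clique)
    return list(groups.values())


# Hypothetical cliques and context counts, invented for illustration only.
contexts = {
    "Paris": {"city": 2, "visit": 1},
    "Berlin": {"city": 2, "capital": 1},
    "Rome": {"city": 1, "capital": 2},
    "Pippen": {"player": 2},
    "Rodman": {"player": 1, "team": 1},
}
cliques = [
    frozenset({"Paris", "Berlin"}),
    frozenset({"Berlin", "Rome"}),
    frozenset({"Pippen", "Rodman"}),
]
groups = cluster_cliques(cliques, contexts, threshold=0.7)
```

The clustering is hard at the clique level, since every clique lands in exactly one group. The overall method stays soft at the entity level because an entity that sits in several cliques can reach several groups through them.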
Specification