Clique based clustering for named entity recognition system
First Claim
1. An annotation method comprising:
identifying named entities in a corpus together with contexts;
grouping the named entities into cliques based on mutual context similarity, each clique including a plurality of different named entities having mutual context similarity, the grouping of the named entities into cliques being non-exclusive;
clustering the cliques to generate named entity groups on the basis of mutual similarity of the contexts of the named entities constituting the cliques;
assigning annotations to the named entity groups; and
annotating named entity instances in the corpus based on the named entity groups and corresponding assigned annotations.
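The final annotating step of the claim can be illustrated with a minimal sketch. Because the grouping is non-exclusive, a named entity may belong to more than one group, so the sketch picks, for each instance, the candidate group whose context profile is closest to the context of that instance. This disambiguation rule, the cosine measure, the `annotate` function, and the sample groups and labels are all assumptions made for illustration; the claim itself does not prescribe them.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def annotate(instances, groups):
    """Label each (entity, context) instance with the annotation of the
    best-matching group that contains the entity, or None if no group does."""
    labeled = []
    for entity, context in instances:
        candidates = [g for g in groups if entity in g["members"]]
        if not candidates:
            labeled.append((entity, None))
            continue
        best = max(candidates, key=lambda g: cosine(context, g["profile"]))
        labeled.append((entity, best["label"]))
    return labeled


# Hypothetical named entity groups with assigned annotations.
groups = [
    {"label": "LOCATION",
     "members": {"Berlin", "Madrid", "Washington"},
     "profile": {"flew to": 2, "capital of": 1}},
    {"label": "ORGANIZATION",
     "members": {"Acme", "Globex", "Washington"},
     "profile": {"shares of": 2, "ceo of": 1}},
]

# Hypothetical instances: an entity plus the context it occurred in.
instances = [
    ("Washington", {"flew to": 1}),
    ("Washington", {"ceo of": 1}),
    ("Berlin", {"capital of": 1}),
    ("Oslo", {"flew to": 1}),
]

results = annotate(instances, groups)
```

In this sketch the two occurrences of "Washington" receive different annotations because their contexts match different groups, while "Oslo", which belongs to no group, is left unannotated.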
Abstract
A soft clustering method comprises (i) grouping items into non-exclusive cliques based on features associated with the items, and (ii) clustering the non-exclusive cliques using a hard clustering algorithm to generate item groups on the basis of mutual similarity of the features of the items constituting the cliques. In some named entity recognition embodiments illustrated herein as examples, named entities together with contexts are grouped into cliques based on mutual context similarity. Each clique includes a plurality of different named entities having mutual context similarity. The cliques are clustered to generate named entity groups on the basis of mutual similarity of the contexts of the named entities constituting the cliques.
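The two steps of the abstract can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the patented implementation: it assumes items are represented as sparse feature dictionaries, that "mutual similarity" means pairwise cosine similarity at or above a threshold, that cliques are the maximal cliques of the resulting similarity graph (enumerated with the Bron-Kerbosch algorithm), and that the hard clustering step is a simple single-pass leader clustering over summed clique profiles. The function names, thresholds, and sample data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def maximal_cliques(items, threshold):
    """Step (i): link items whose similarity meets the threshold, then
    enumerate maximal cliques of size >= 2 (Bron-Kerbosch, no pivoting)."""
    adj = {name: set() for name in items}
    for u, v in combinations(items, 2):
        if cosine(items[u], items[v]) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    found = []

    def expand(r, p, x):
        if not p and not x:
            if len(r) >= 2:
                found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(items), set())
    return found


def profile(clique, items):
    """Sum the feature vectors of a clique's members."""
    total = {}
    for name in clique:
        for key, value in items[name].items():
            total[key] = total.get(key, 0.0) + value
    return total


def cluster_cliques(cliques, items, threshold):
    """Step (ii): hard-cluster the cliques; each clique joins exactly one
    cluster (the first whose profile is similar enough, else a new one)."""
    clusters = []
    for clique in sorted(cliques, key=sorted):  # fixed order for repeatability
        prof = profile(clique, items)
        for cluster in clusters:
            if cosine(prof, cluster["profile"]) >= threshold:
                cluster["cliques"].append(clique)
                for key, value in prof.items():
                    cluster["profile"][key] = cluster["profile"].get(key, 0.0) + value
                break
        else:
            clusters.append({"profile": prof, "cliques": [clique]})
    # An item group is the union of the members of a cluster's cliques.
    return [set().union(*c["cliques"]) for c in clusters]


# Hypothetical items with context-feature counts.
items = {
    "Berlin": {"loc": 1},
    "Madrid": {"loc": 1},
    "Lisbon": {"loc": 1, "port": 1},
    "Washington": {"loc": 1, "org": 1},
    "Acme": {"org": 1},
    "Globex": {"org": 1},
}

cliques = maximal_cliques(items, threshold=0.7)
groups = cluster_cliques(cliques, items, threshold=0.7)
# groups == [{"Acme", "Globex", "Washington"},
#            {"Berlin", "Lisbon", "Madrid", "Washington"}]
```

Each clique lands in exactly one cluster, yet "Washington" appears in both resulting groups because it belongs to cliques that fall into different clusters. That is how a hard clustering of non-exclusive cliques yields a soft clustering of the items. The leader clustering used here is order-dependent; any other hard clustering algorithm could be substituted.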
38 Citations
21 Claims
1. An annotation method comprising:
identifying named entities in a corpus together with contexts;
grouping the named entities into cliques based on mutual context similarity, each clique including a plurality of different named entities having mutual context similarity, the grouping of the named entities into cliques being non-exclusive;
clustering the cliques to generate named entity groups on the basis of mutual similarity of the contexts of the named entities constituting the cliques;
assigning annotations to the named entity groups; and
annotating named entity instances in the corpus based on the named entity groups and corresponding assigned annotations.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. An annotation system comprising:
a named entity detector configured to identify named entities in a corpus together with contexts;
a cliques identifier configured to receive and group the named entities into cliques based on mutual context similarity, each clique including a plurality of different named entities having mutual context similarity; and
a cliques clusterer configured to receive and cluster the cliques to generate named entity groups on the basis of mutual similarity of the contexts of the named entities constituting the cliques.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A storage medium storing instructions executable to perform a soft clustering method comprising (i) grouping items into non-exclusive cliques based on features associated with the items, and (ii) clustering the non-exclusive cliques using a hard clustering algorithm to generate item groups on the basis of mutual similarity of the features of the items constituting the cliques.
Specification