NLP-based entity recognition and disambiguation

US 10,282,389 B2
Filed: 02/27/2017
Issued: 05/07/2019
Est. Priority Date: 10/17/2007
Status: Active Grant

3 Assignments

0 Petitions

Abstract

Methods and systems for entity recognition and disambiguation using natural language processing techniques are provided. Example embodiments provide an entity recognition and disambiguation system (ERDS) and process that, based upon input of a text segment, automatically determines which entities are being referred to by the text using both natural language processing techniques and analysis of information gleaned from contextual data in the surrounding text. In at least some embodiments, supplemental or related information that can be used to assist in the recognition and/or disambiguation process can be retrieved from knowledge repositories such as an ontology knowledge base. In one embodiment, the ERDS comprises a linguistic analysis engine, a knowledge analysis engine, and a disambiguation engine that cooperate to identify candidate entities from a knowledge repository and determine which of the candidates best matches the one or more detected entities in a text segment using context information.
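
The three-engine arrangement in the abstract can be illustrated with a toy pipeline. This is a minimal Python sketch, not the patented implementation. It assumes a tiny in-memory knowledge repository, replaces real linguistic analysis with a capitalization heuristic, and scores candidates by simple word overlap with the surrounding context. All names (`KNOWLEDGE_BASE`, `linguistic_engine`, `knowledge_engine`, `disambiguation_engine`, `run_erds`) and the entity data are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge repository: entity id -> aliases and descriptive terms.
KNOWLEDGE_BASE = {
    "Paris_France": {"aliases": {"paris"}, "terms": {"city", "capital", "france", "seine"}},
    "Paris_Hilton": {"aliases": {"paris", "paris hilton"}, "terms": {"heiress", "hotel", "celebrity"}},
}


@dataclass
class Mention:
    name: str      # surface form of the potential entity name
    context: set   # lowercased words from the rest of the segment


def linguistic_engine(text):
    """Stand-in for linguistic analysis (assumption): capitalized tokens become
    potential entity names; every other token in the segment is context."""
    tokens = [t.strip(".,;:!?") for t in text.split()]
    tokens = [t for t in tokens if t]
    lowered = {t.lower() for t in tokens}
    return [Mention(t, lowered - {t.lower()}) for t in tokens if t[0].isupper()]


def knowledge_engine(mention):
    """Retrieve candidate entities whose aliases match the mention name."""
    return [eid for eid, entry in KNOWLEDGE_BASE.items()
            if mention.name.lower() in entry["aliases"]]


def disambiguation_engine(mention, candidates):
    """Pick the candidate whose descriptive terms overlap the context most."""
    if not candidates:
        return None
    return max(candidates,
               key=lambda eid: len(KNOWLEDGE_BASE[eid]["terms"] & mention.context))


def run_erds(text):
    """Chain the three engines: mentions -> candidates -> best match."""
    return {m.name: disambiguation_engine(m, knowledge_engine(m))
            for m in linguistic_engine(text)}


print(run_erds("Paris is the capital of France."))
# {'Paris': 'Paris_France', 'France': None}
print(run_erds("The heiress Paris posed outside the hotel."))
# {'The': None, 'Paris': 'Paris_Hilton'}
```

The heuristic also flags the sentence-initial "The" as a potential name; it finds no candidates and resolves to `None`. A linguistic analysis engine that uses part-of-speech tags and grammatical roles, as claim 1 recites, would avoid such false mentions.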

24 Claims

1. A computer-implemented method for disambiguating one or more entities in an indicated text segment to present entity information to a user using a web browser, comprising:
- processing the indicated text segment to determine a plurality of terms and their associated parts-of-speech tags and grammatical roles;
  
  performing linguistic analysis of the processed text segment to determine one or more potential entities which are referred to in the text segment by potential entity names;
  
  generating and storing, for each potential entity, an entity profile data structure storing one or more associated properties that characterize the entity based upon surrounding context and linguistic information received from the performed linguistic analysis of the processed text segment, the entity profile properties including one or more roles attributable to the potential entity based upon actions and/or modifiers associated with the determined potential entity name that have been retrieved from a linguistic analysis of the surrounding context;
  
  disambiguating which entities are being referred to in the indicated text segment by determining one or more most likely entities that are referred to in the text segment by comparing, using both linguistic and contextual information, the entity profiles generated for each potential entity with attributes of one or more candidate entities; and
  
  presenting the entity information to the user using the web browser based on the disambiguation.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The method of claim 1 wherein the candidate entities are entity entries retrieved from a knowledge repository or an ontology knowledge base.
  - 3. The method of claim 1 wherein the disambiguating which entities are being referred to in the indicated text segment by determining the one or more most likely entities which are referred to in the text segment by comparing, using both linguistic and contextual information, the entity profiles generated for each potential entity name with attributes of one or more candidate entities further comprises:
    - searching a knowledge repository for a set of candidate entities that have similar characteristics to the properties of one or more of the generated entity profiles;
      
      ranking the candidate entities in the set of candidate entities to determine a set of most likely entities which are referred to in the text segment; and
      
      providing the determined set of most likely entities.
  - 4. The method of claim 3 wherein the ranking the candidate entities weights the candidate entities according to contextual information surrounding portions of the text segment that refer to the potential entity names.
  - 5. The method of claim 3 wherein the ranking the candidate entities weights the candidate entities according to preference information.
  - 6. The method of claim 3 wherein the ranking the candidate entities further comprises using a classification model to classify the candidate entities and to order them based upon closest matches.
  - 7. The method of claim 1 wherein the disambiguating which entities are being referred to in the indicated text segment by determining the one or more most likely entities which are referred to in the text segment by comparing, using both linguistic and contextual information, the entity profiles generated for each potential entity name with attributes of one or more candidate entities further comprises:
    - resolving the one or more most likely entities to a single identified entity by performing iterative comparisons reusing entity disambiguation information gained from a prior comparison until no new entity disambiguation information is gained.
  - 8. The method of claim 1 wherein each entity profile comprises a feature vector of terms collected from modifiers and/or actions associated with the potential entity name based upon the linguistic analysis of the processed text segment.
  - 9. The method of claim 1, further comprising using the determined one or more likely entities to inform a relationship search.
  - 10. The method of claim 1 wherein the method is embedded in code that supports a widget presented on a web page.
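
Claims 3 to 5 and 8 describe an entity profile built as a feature vector of terms from the modifiers and actions around a name, compared against candidate entities and ranked, optionally with preference weights. A minimal sketch under several assumptions: the parse (POS tags, Stanford-style dependency labels, head indices) is hand-written rather than produced by a parser, cosine similarity stands in for the comparison (the claims name no measure), and the candidate vectors and preference weights are made up. `build_profile`, `cosine`, and `rank_candidates` are hypothetical names.

```python
from collections import Counter
from math import sqrt

# Hypothetical hand-written parse standing in for parser output:
# (token, POS tag, grammatical role, index of head token; -1 for the root).
# "The veteran guard Jordan scored 40 points ."
SENTENCE = [
    ("The", "DT", "det", 3),
    ("veteran", "JJ", "amod", 3),
    ("guard", "NN", "compound", 3),
    ("Jordan", "NNP", "nsubj", 4),
    ("scored", "VBD", "root", -1),
    ("40", "CD", "nummod", 6),
    ("points", "NNS", "dobj", 4),
    (".", ".", "punct", 4),
]
MODIFIER_ROLES = {"amod", "compound", "appos"}

# Hypothetical candidate entities described in the same term space.
CANDIDATES = {
    "Michael_Jordan": Counter({"guard": 2, "scored": 2, "points": 2, "bulls": 1}),
    "Jordan_country": Counter({"kingdom": 2, "amman": 1, "border": 1}),
    "Michael_B_Jordan": Counter({"actor": 2, "starred": 1, "film": 1}),
}


def build_profile(parsed, mention):
    """Feature vector of terms from the mention's modifiers and actions."""
    profile = Counter()
    for token, _pos, role, head in parsed:
        if head == mention and role in MODIFIER_ROLES:
            profile[token.lower()] += 1              # modifier, e.g. "veteran"
    _tok, _pos, role, head = parsed[mention]
    if role in ("nsubj", "dobj") and head >= 0 and parsed[head][1].startswith("VB"):
        profile[parsed[head][0].lower()] += 1        # action verb, e.g. "scored"
        for i, (token, _p, r, h) in enumerate(parsed):
            if h == head and r == "dobj" and i != mention:
                profile[token.lower()] += 1          # object of that action
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(profile, candidates, preference=None):
    """Order candidates by similarity, scaled by optional preference weights."""
    preference = preference or {}
    scored = [(eid, cosine(profile, vec) * preference.get(eid, 1.0))
              for eid, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_candidates(build_profile(SENTENCE, 3), CANDIDATES,
                          preference={"Jordan_country": 1.5})
print(ranking[0][0])  # Michael_Jordan
```

Because the profile contains "guard", "scored", and "points", the basketball player outranks the country and the actor even though all three share the name "Jordan"; the preference weight on `Jordan_country` cannot lift a zero similarity.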

11. A non-transitory computer-readable medium containing contents that, when executed, cause a computing system to perform a method to present entity information to a user using a web browser comprising:
- processing the indicated text segment to determine a plurality of terms and their associated parts-of-speech tags and grammatical roles;
  
  performing linguistic analysis of the processed text segment to determine one or more potential entities which are referred to in the text segment by potential entity names;
  
  generating and storing, for each potential entity, an entity profile data structure storing one or more associated properties that characterize the entity based upon surrounding context and linguistic information, the entity profile properties including one or more roles attributable to the potential entity based upon actions and/or modifiers associated with the entity name that have been retrieved from a linguistic analysis of the surrounding context;
  
  automatically disambiguating which entities are being referred to in the indicated text segment by determining one or more most likely entities that are referred to in the text segment by comparing, using both linguistic and contextual information, the entity profiles generated for each potential entity with attributes of one or more candidate entities retrieved from a knowledge base; and
  
  presenting the entity information to the user using the web browser based on the disambiguation.
- Dependent claims (12)
- - 12. The non-transitory computer-readable medium of claim 11 embedded in a computing system configured to perform indexing and storing of a corpus of documents for searching using natural language processing.
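
One way to read claim 12 is that disambiguated entity identifiers can serve as index keys alongside ordinary keywords, so a search can separate documents about different entities that share a name. A minimal sketch, assuming the disambiguation output is already available for each document; the document ids, entity ids, and `build_indexes` are hypothetical.

```python
from collections import defaultdict

# Hypothetical disambiguation output per document: (surface name, entity id).
RESOLVED_DOCS = {
    "d1": [("Jordan", "Michael_Jordan"), ("Bulls", "Chicago_Bulls")],
    "d2": [("Jordan", "Jordan_country"), ("Amman", "Amman")],
    "d3": [("Jordan", "Michael_B_Jordan")],
}


def build_indexes(resolved_docs):
    """Build a keyword index on surface names and an index on entity ids."""
    keyword_index, entity_index = defaultdict(set), defaultdict(set)
    for doc_id, mentions in resolved_docs.items():
        for surface, entity_id in mentions:
            keyword_index[surface.lower()].add(doc_id)
            if entity_id is not None:   # unresolved mentions stay keyword-only
                entity_index[entity_id].add(doc_id)
    return keyword_index, entity_index


keyword_index, entity_index = build_indexes(RESOLVED_DOCS)
print(sorted(keyword_index["jordan"]))         # ['d1', 'd2', 'd3']
print(sorted(entity_index["Michael_Jordan"]))  # ['d1']
```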

13. An entity recognition and disambiguation computing system to present entity information to a user using a web browser, comprising:
- a memory;
  
  a computer processor; and
  
  a recognition and disambiguation module stored in the memory that is configured, when executed on the computer processor, to receive a text segment for processing;
  
  process the received text segment to determine a plurality of terms and their associated parts-of-speech tags and grammatical roles;
  
  perform linguistic analysis of the processed text segment to determine one or more potential entities which are referred to in the processed text segment by potential entity names;
  
  generate and store, for each potential entity, an entity profile data structure storing one or more associated properties that characterize the entity based upon surrounding context and linguistic information received from the linguistic analysis of the processed text segment, the entity profile properties including one or more roles attributable to the potential entity based upon actions and/or modifiers associated with the determined potential entity name that have been retrieved from a linguistic analysis of the surrounding context;
  
  disambiguate the potential entities to determine a single named entity to which the received text segment is deemed to refer, based upon a combination of linguistic analysis, contextual information gleaned from text surrounding the potential entity name in the received text segment, and information retrieved from one or more knowledge repositories, by comparing the entity profiles generated for each potential entity with attributes of one or more stored candidate entities; and
  
  present the entity information to the user using the web browser based on the disambiguation.
- Dependent claims (14, 15)
- - 14. The system of claim 13, wherein the module is further configured, when executed, to disambiguate the candidate named entities using information based upon a relationship search.
  - 15. The system of claim 13 wherein the module is further configured, when executed, to disambiguate the candidate named entities using a classification modeling approach.
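
Claim 14 (disambiguation using a relationship search) and claim 7 (iterative comparisons that reuse earlier disambiguation results until nothing new is gained) together suggest a fixed-point loop. The sketch below is one possible reading, not the method in the specification: it assumes the knowledge repository stores pairwise relationships between entities, and it keeps a candidate only if it is related to an entity resolved in an earlier pass. `RELATED`, `resolve_iteratively`, and the entity ids are hypothetical.

```python
# Hypothetical pairwise relationships stored in the knowledge repository.
RELATED = {
    frozenset({"Scottie_Pippen", "Chicago_Bulls"}),
    frozenset({"Chicago_Bulls", "Michael_Jordan"}),
    frozenset({"Jordan_country", "Amman"}),
}


def related(a, b):
    return frozenset({a, b}) in RELATED


def resolve_iteratively(candidates):
    """Resolve mentions to single entities, reusing earlier resolutions.

    A mention with exactly one candidate is resolved up front. On each pass,
    an unresolved mention is resolved if exactly one of its candidates is
    related to an entity resolved in an earlier pass. The loop stops when a
    pass resolves nothing new.
    """
    resolved = {m: next(iter(c)) for m, c in candidates.items() if len(c) == 1}
    passes = 0
    while True:
        passes += 1
        known = set(resolved.values())
        newly = {}
        for mention, cands in candidates.items():
            if mention in resolved:
                continue
            supported = {c for c in cands if any(related(c, k) for k in known)}
            if len(supported) == 1:
                newly[mention] = supported.pop()
        if not newly:
            return resolved, passes
        resolved.update(newly)


mentions = {
    "Pippen": {"Scottie_Pippen"},
    "Bulls": {"Chicago_Bulls", "bull_animal"},
    "Jordan": {"Michael_Jordan", "Jordan_country"},
}
resolved, passes = resolve_iteratively(mentions)
print(resolved)  # {'Pippen': 'Scottie_Pippen', 'Bulls': 'Chicago_Bulls', 'Jordan': 'Michael_Jordan'}
print(passes)    # 3
```

Resolving "Pippen" lets "Bulls" settle on the team in the first pass, which in turn settles "Jordan" in the second; the third pass adds nothing, so the loop stops.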

16. A computer-implemented method for presenting entity information to a user using a web browser comprising:
- receiving an indication of a segment of text; and
  
  invoking a recognition and disambiguation module to process the indicated text segment to automatically determine one or more named entities referred to in the text segment, wherein the recognition and disambiguation module is configured to receive the indicated text segment for processing;
  
  process the indicated text segment to determine a plurality of terms and their associated parts-of-speech tags and grammatical roles;
  
  perform linguistic analysis of the processed text segment to determine one or more potential entities which are referred to in the processed text segment by potential entity names;
  
  generate and store, for each potential entity, an entity profile data structure storing one or more associated properties that characterize the entity based upon surrounding context and linguistic information received from the linguistic analysis of the processed text segment;
  
  disambiguate the potential entities to determine one or more named entities to which the received text segment is deemed to refer, based upon a combination of linguistic analysis, contextual information gleaned from text surrounding the potential entity names in the received text segment, and information retrieved from one or more knowledge repositories, by comparing the entity profiles generated for each potential entity with attributes of one or more candidate entities; and
  
  for each determined one or more named entities, cause entity information to be annotated by presenting, to the user using the web browser, a link to the entity information associated with the named entity based on the disambiguation.
- Dependent claims (17, 18, 19, 20, 21, 22, 23, 24)
- - 17. The method of claim 16 wherein the entity information is based in part upon an ontology entry.
  - 18. The method of claim 16 wherein the link is presented on a web page and wherein the entity information associated with each named entity presents summary information for the web page.
  - 19. The method of claim 16 wherein at least one of the one or more named entities determined from the disambiguated potential entities is a name of a place, and wherein the method further comprises:
    - determining coordinates of the place named by the at least one of the one or more named entities determined from the disambiguated potential entities; and
      
      displaying the name of the place on a map according to the determined coordinates.
  - 20. The method of claim 16 wherein the entity information annotated is a document.
  - 21. The method of claim 16 wherein each determined one or more named entities is ranked and the link to entity information associated with each ranked entity is presented in ranked order.
  - 22. The method of claim 16, further comprising:
    - presenting advertising targeted to each named entity.
  - 23. The method of claim 16 wherein the link is presented in a pop-up window.
  - 24. The method of claim 16 wherein the link is presented using a programming module accessible from an application programming interface.
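
Claims 16 and 21 describe annotating text with links to entity information and presenting those links in ranked order. A minimal sketch, assuming the disambiguation step has already produced non-overlapping character spans with entity ids and scores; the `ENTITY_URL` scheme and the `annotate` and `ranked_links` helpers are hypothetical, and a widget or pop-up (claims 10 and 23) would render the resulting HTML in the browser.

```python
from html import escape

# Hypothetical URL scheme for entity-information pages.
ENTITY_URL = "https://example.org/entity/{}"


def annotate(text, spans):
    """Wrap each disambiguated span in a link to its entity page.

    spans: non-overlapping (start, end, entity_id, score) tuples.
    """
    parts, pos = [], 0
    for start, end, entity_id, _score in sorted(spans):
        parts.append(escape(text[pos:start]))
        href = escape(ENTITY_URL.format(entity_id))
        parts.append(f'<a href="{href}">{escape(text[start:end])}</a>')
        pos = end
    parts.append(escape(text[pos:]))
    return "".join(parts)


def ranked_links(spans):
    """Entity links ordered by each entity's best disambiguation score."""
    best = {}
    for _start, _end, entity_id, score in spans:
        best[entity_id] = max(score, best.get(entity_id, float("-inf")))
    ordered = sorted(best, key=best.get, reverse=True)
    return [ENTITY_URL.format(entity_id) for entity_id in ordered]


text = "Pippen passed to Jordan."
spans = [(17, 23, "Michael_Jordan", 0.9), (0, 6, "Scottie_Pippen", 0.7)]
print(annotate(text, spans))
print(ranked_links(spans))
```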

Current Assignee
Fiver LLC
Original Assignee
Fiver LLC
Inventors
Liang, Jisheng; Koperski, Krzysztof; Dhillon, Navdeep S.; Tusk, Carsten; Bhatti, Satish
Primary Examiner(s)
Godbold, Douglas
Assistant Examiner(s)
Villena, Mark

Application Number
US15/444,302
Publication Number
US 20170262412A1
Time in Patent Office
799 Days
CPC Class Codes

G06F 40/10   Text processing natural lan...

G06F 40/247   Thesauruses; Synonyms

G06F 40/268   Morphological analysis

G06F 40/284   Lexical analysis, e.g. toke...

G06F 40/295   Named entity recognition
