Entity name recognition
2 Assignments
0 Petitions
Abstract
A system, method, and computer program product for recognizing entity names from a plurality of documents. Embodiments of the method comprise selecting a selection of documents from a plurality of documents, the selection of documents sharing a common pattern in their titles. The method further comprises determining a name candidate for each document in the selection by applying the common pattern to the title of the document, and matching the name candidates with a collection of entity names (the white list). Responsive to determining a match between the name candidates and the entity names in the white list, the method determines that the name candidates are valid entity names. In one embodiment, the name candidates are added to the white list after being determined to be valid entity names.
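The abstract does not say how a common pattern is found. A minimal sketch, assuming that a pattern is a title template with one variable slot next to a fixed prefix or suffix (for example, "{name} - Wikipedia"), is to split each title at a few separators and keep the templates shared by enough titles. The names `SEPARATORS`, `title_patterns`, `identify_common_patterns`, and the `min_support` threshold are illustrative assumptions, not part of the source.

```python
from collections import Counter
from typing import Iterable, List, Set

# Hypothetical separators that often split a page title into a variable part
# (the would-be entity name) and a fixed part (site name, section label, ...).
SEPARATORS = (" - ", " | ", ": ")


def title_patterns(title: str) -> Set[str]:
    """Return candidate templates for one title; "{name}" marks the variable slot."""
    patterns = set()
    for sep in SEPARATORS:
        if sep in title:
            prefix = title.split(sep, 1)[0]   # text before the first separator
            suffix = title.rsplit(sep, 1)[1]  # text after the last separator
            patterns.add("{name}" + sep + suffix)  # fixed suffix, e.g. "{name} - Wikipedia"
            patterns.add(prefix + sep + "{name}")  # fixed prefix, e.g. "IMDb: {name}"
    return patterns


def identify_common_patterns(titles: Iterable[str], min_support: int = 2) -> List[str]:
    """Return templates shared by at least `min_support` titles, most frequent first."""
    counts = Counter()
    for title in titles:
        counts.update(title_patterns(title))  # each template counted once per title
    return [p for p, n in counts.most_common() if n >= min_support]
```

For the titles "Paris - Wikipedia", "Berlin - Wikipedia", and "Rome - Wikipedia", `identify_common_patterns` keeps only "{name} - Wikipedia", because each prefix template such as "Paris - {name}" occurs in just one title.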
20 Claims
1. A method for recognizing entity names from a plurality of web documents, comprising:
at a server having one or more processors and memory storing one or more programs executed by the one or more processors, identifying a common pattern in titles of the plurality of web documents;
selecting a selection of documents from the plurality of web documents, a title of each document in the selection sharing the common pattern;
for each document in the selection, generating an entity name candidate from the title of the document in accordance with the common pattern;
matching the entity name candidates of the selection of documents with a collection of entity names;
determining that the entity name candidates of the documents in the selection are valid entity names in accordance with a result of the matching; and
updating the collection of entity names in accordance with a result of the determining.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
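The steps of claim 1 can be sketched for a single common pattern as follows. This is a hypothetical illustration: the "{name}" template syntax, the helper names, and the rule that a selection is valid when at least `min_match_ratio` of its candidates already appear in the white list are assumptions, since the claim only says validity is determined "in accordance with a result of the matching".

```python
import re
from typing import Iterable, Optional, Set, Tuple


def extract_candidate(pattern: str, title: str) -> Optional[str]:
    """Apply a "{name}" template to a title and return the slot's text, or None."""
    prefix, suffix = pattern.split("{name}", 1)
    match = re.fullmatch(re.escape(prefix) + r"(.+)" + re.escape(suffix), title)
    return match.group(1).strip() if match else None


def recognize_entity_names(
    titles: Iterable[str],
    pattern: str,
    white_list: Set[str],
    min_match_ratio: float = 0.5,  # hypothetical acceptance threshold
) -> Tuple[Set[str], Set[str]]:
    """Return (names accepted as valid, updated white list) for one common pattern."""
    # Select the documents whose titles share the pattern; one candidate per title.
    candidates = set()
    for title in titles:
        name = extract_candidate(pattern, title)
        if name:
            candidates.add(name)
    if not candidates:
        return set(), set(white_list)
    # Match the candidates against the known entity names (the white list).
    ratio = len(candidates & white_list) / len(candidates)
    # Hypothetical decision rule: accept the whole selection when enough of its
    # candidates are already known, then update the collection with all of them.
    if ratio >= min_match_ratio:
        return candidates, white_list | candidates
    return set(), set(white_list)
```

With the white list {"Paris", "Berlin"}, the titles "Paris - Wikipedia", "Berlin - Wikipedia", and "Lyon - Wikipedia" give three candidates, two of which match, so "Lyon" is accepted and added to the collection; a title such as "Cooking tips" does not fit the pattern and yields no candidate.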
12. A method for recognizing entity names from a plurality of web documents, comprising:
at a server having one or more processors and memory storing one or more programs executed by the one or more processors,
(a) identifying common patterns in titles of the plurality of web documents;
(b) selecting multiple selections of documents from the plurality of web documents, titles of documents in each selection sharing a respective common pattern associated with the selection of documents;
(c) for each of the multiple selections of documents, generating a selection of entity name candidates by applying the respective common pattern to the titles of the documents in the selection of documents; and
(d) in response to matching a collection of entity names and a selection of entity name candidates, performing steps comprising:
(i) determining that the selection of name candidates are valid entity names,
(ii) updating the collection of entity names in accordance with a result of the determining,
(iii) removing a selection of documents associated with the selection of entity name candidates from the multiple selections of documents, and
(iv) repeating step (d).
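Step (d) of claim 12 repeats until no remaining selection matches, so names accepted from one selection can let another selection match on a later pass. A minimal sketch, assuming steps (a) to (c) have already produced a mapping from each pattern to its candidate set, and using a hypothetical match-ratio threshold as the matching test; `bootstrap_entity_names` and `min_match_ratio` are illustrative names, not from the source.

```python
from typing import Dict, List, Set, Tuple


def bootstrap_entity_names(
    selections: Dict[str, Set[str]],
    white_list: Set[str],
    min_match_ratio: float = 0.5,  # hypothetical matching test
) -> Tuple[Set[str], List[str]]:
    """Repeat step (d) until no remaining selection matches the collection.

    `selections` maps each common pattern to the entity name candidates
    generated from its documents' titles (the output of steps (a) to (c)).
    Returns the updated collection and the patterns accepted, in order.
    """
    remaining = dict(selections)  # shallow copy: the caller's dict is untouched
    known = set(white_list)
    accepted: List[str] = []
    while True:
        # Look for a remaining selection whose candidates match the collection.
        matched = None
        for pattern, candidates in remaining.items():
            if candidates and len(candidates & known) / len(candidates) >= min_match_ratio:
                matched = pattern
                break
        if matched is None:
            break  # nothing matches: stop repeating step (d)
        known |= remaining.pop(matched)  # (i) accept, (ii) update, (iii) remove selection
        accepted.append(matched)         # (iv) loop back and repeat step (d)
    return known, accepted
```

For instance, with the white list {"Paris", "Berlin"}, a selection {"Lyon", "Nice"} does not match at first; once a "{name} - Wikipedia" selection containing "Paris", "Berlin", and "Lyon" is accepted, half of its candidates are known and it is accepted on the next pass.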
13. A system for recognizing entity names from a plurality of web documents, the system comprising:
a processor for executing computer program code; and a subsystem executable by the processor, the subsystem including computer program code for performing a method comprising:
identifying a common pattern in titles of the plurality of web documents;
selecting a selection of documents from the plurality of web documents, a title of each document in the selection sharing the common pattern;
for each document in the selection, generating an entity name candidate from the title of the document in accordance with the common pattern;
matching the entity name candidates of the selection of documents with a collection of entity names;
determining that the name candidates of the documents in the selection are valid entity names in accordance with a result of the matching; and
updating the collection of entity names in accordance with a result of the determining.
(Dependent claims: 14, 15, 16)
17. A computer readable storage medium storing one or more programs for execution by one or more processors of a computer system, the one or more programs including:
instructions for identifying a common pattern in titles of a plurality of web documents;
instructions for selecting a selection of documents from the plurality of web documents, a title of each document in the selection sharing the common pattern;
instructions for generating an entity name candidate for each document in the selection from the title of the document in accordance with the common pattern;
instructions for matching the entity name candidates of the selection of documents with a collection of entity names;
instructions for determining that the name candidates of the documents in the selection are valid entity names in accordance with a result of the matching; and
instructions for updating the collection of entity names in accordance with a result of the determining.
(Dependent claims: 18, 19, 20)
Specification