Identifying entities in a digital work
First Claim
1. One or more non-transitory computer-readable media maintaining instructions executable by one or more processors to perform operations comprising:
extracting text from a digital work;
identifying a plurality of names from the text extracted from the digital work, wherein location information is associated with each name, the location information including a location within the digital work of each occurrence of each name;
sorting the plurality of names in a sorted list by ordering the names based, at least in part, on a fullness of each of the names relative to other names in the sorted list, the fullness indicating an amount of information contained in each name;
generating a first name set and a second name set from the sorted list of names;
determining that a first name is present in the first name set and the second name set;
determining a proximity of a particular occurrence of the first name to an occurrence of a second name based on the location information;
identifying whether the particular occurrence of the first name belongs to the first name set or the second name set based, at least in part, on the proximity;
generating a digital supplemental information file comprising a visual representation of locations within the digital work where at least one name of the plurality of names in the first name set occur, wherein the visual representation comprises an object with markings, each of the markings corresponding to a respective occurrence;
receiving, from an electronic device, a request for the digital supplemental information file; and
sending the digital supplemental information file to the electronic device, the digital supplemental information file altering functionality of the digital work to include at least one selectable portion that enables display of the visual representation of the locations within the digital work.
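The proximity step of the claim can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that location information is a character offset, and that an ambiguous occurrence is assigned to the name set whose unique names occur nearest to it. The function names, the example names, and the offsets are all hypothetical.

```python
from bisect import bisect_left

def nearest_distance(position, sorted_positions):
    """Distance from position to the closest entry in an ascending list."""
    if not sorted_positions:
        return float("inf")
    i = bisect_left(sorted_positions, position)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_positions[max(i - 1, 0):i + 1]
    return min(abs(position - p) for p in candidates)

def assign_occurrence(position, unique_positions_by_set):
    """Pick the name set whose unique names occur closest to position."""
    return min(
        unique_positions_by_set,
        key=lambda s: nearest_distance(position, unique_positions_by_set[s]),
    )

# Hypothetical data: offsets of names that are unique to each name set.
unique_positions_by_set = {
    "Elizabeth Bennet": [120, 450, 2300],
    "Jane Bennet": [900, 1500],
}

# Offsets at which the ambiguous name "Bennet" occurs (also hypothetical).
assignments = {
    pos: assign_occurrence(pos, unique_positions_by_set)
    for pos in [130, 1000, 2250]
}
```

Here the occurrence at offset 1000 goes to the "Jane Bennet" set because a unique name from that set appears 100 characters away, while the closest unique name from the other set is 550 characters away.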
Abstract
In some implementations, text is extracted from a digital work and proper nouns are identified in the text to generate a list of names. The list of names may be sorted so that names containing more information are positioned toward the beginning of the list. The list may be traversed to cluster names and alternate names into name sets that correspond to particular entities in the digital work. Non-unique names that appear in more than one name set may be disambiguated based on proximity to unique names in the same name sets to determine which occurrences of the non-unique names belong with which name sets. Furthermore, a representative name may be selected from among multiple names in a name set for use in representing an entity or object corresponding to the name set. In some examples, the representative name may be selected based on a fullness of the name.
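The sorting and clustering described in the abstract might look roughly like the following sketch. Several things here are assumptions rather than details from the source: fullness is approximated by token count and then character length, a name joins a set when all of its tokens appear in that set's fullest name, and the representative name is simply the first (fullest) name in each set. Titles, nicknames, and other alternate forms are not handled.

```python
def fullness(name):
    """Assumed fullness measure: more tokens first, then more characters."""
    return (len(name.split()), len(name))

def build_name_sets(names):
    """Traverse names from fullest to least full, clustering into name sets."""
    ordered = sorted(set(names), key=fullness, reverse=True)
    name_sets = []
    for name in ordered:
        tokens = set(name.lower().split())
        # A name matches a set when its tokens are contained in the set's fullest name.
        matches = [s for s in name_sets if tokens <= set(s[0].lower().split())]
        if matches:
            for s in matches:
                s.append(name)  # a non-unique name lands in every matching set
        else:
            name_sets.append([name])
    return name_sets

# Hypothetical list of proper nouns identified in a digital work.
names = ["Bennet", "Elizabeth Bennet", "Jane", "Elizabeth", "Jane Bennet", "Darcy"]
name_sets = build_name_sets(names)

# The fullest name in each set serves as its representative name.
representatives = [s[0] for s in name_sets]

# Names that appear in more than one set still need disambiguation.
non_unique = sorted(n for n in set(names) if sum(n in s for s in name_sets) > 1)
```

In this example "Bennet" ends up in two name sets, which is the situation the abstract resolves by looking at the proximity of each occurrence to unique names.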
129 Citations
26 Claims
7. A method comprising:
under control of one or more processors configured with executable instructions,
obtaining, by one or more computing devices, names from a digital work;
generating a digital supplemental information file comprising one or more name sets relating to the names, each name set corresponding to a different entity in the digital work, each name set including at least one name for the respective entity corresponding to the name set, and at least one name set including multiple different names in the name set for the respective entity corresponding to that name set, wherein the one or more name sets are generated by at least;
sorting the names in the one or more name sets into a sorted list;
obtaining information from a source external to the digital work, the information indicating that a particular name in the sorted list is an alternate name for a name in the name set; and
adding the particular name from the sorted list to a particular name set based at least in part on external information;
receiving, from an electronic device, a request for the digital supplemental information file, wherein the digital supplemental information file comprises a visual representation of locations within the digital work where at least one name of the names in the one or more name sets occur, wherein the visual representation comprises an object with markings, each of the markings corresponding to a respective occurrence; and
sending the digital supplemental information file to the electronic device, the digital supplemental information file altering functionality of the digital work to include at least one selectable portion that enables display of the visual representation of the locations within the digital work.
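As an illustration of the external-information step in this claim, the sketch below uses a plain dictionary as a stand-in for a source external to the digital work. The claim does not say what that source is, so the dictionary, the function name, and the example names are all hypothetical.

```python
def add_external_aliases(name_sets, sorted_names, external_aliases):
    """Add names to a set when an external source lists them as alternate names.

    external_aliases maps a known name to the set of its alternate names.
    The name sets are modified in place and also returned.
    """
    for name in sorted_names:
        for canonical, aliases in external_aliases.items():
            if name not in aliases:
                continue
            for name_set in name_sets:
                if canonical in name_set and name not in name_set:
                    name_set.append(name)
    return name_sets

# Hypothetical name sets and sorted list taken from a digital work.
name_sets = [["Elizabeth Bennet", "Elizabeth"], ["Fitzwilliam Darcy", "Darcy"]]
sorted_names = ["Elizabeth Bennet", "Fitzwilliam Darcy", "Elizabeth", "Darcy", "Lizzy"]

# Stand-in for the external source: "Lizzy" is an alternate name for "Elizabeth".
external_aliases = {"Elizabeth": {"Lizzy", "Eliza"}}

updated = add_external_aliases(name_sets, sorted_names, external_aliases)
```

"Lizzy" shares no tokens with "Elizabeth Bennet", so text matching alone would not link them. The external mapping supplies that link.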
21. A system comprising:
one or more processors;
one or more non-transitory computer-readable media; and
one or more modules maintained on the one or more non-transitory computer-readable media to be executed by the one or more processors to perform operations including;
obtaining a plurality of names from a digital work;
generating a digital supplemental information file comprising a first name set and a second name set relating to the plurality of names, the first name set and the second name set generated by;
selecting a first name from the plurality of names to generate a first name set; and
selecting a second name from the plurality of names to generate the second name set;
adding a third name to either the first name set or the second name set if the third name corresponds to the first name or the second name;
determining that the third name corresponds to both the first name set and the second name set;
determining a proximity in the digital work of the third name to an occurrence of a first name and an occurrence the second name; and
identifying whether the third name belongs to the first name set or the second name set based, at least in part, on the proximity;
receiving, from an electronic device, a request for the digital supplemental information file, wherein the digital supplemental information file comprises a visual representation of locations within the digital work where at least one name of the plurality of names in the first name set occur, wherein the visual representation comprises an object with markings, each of the markings corresponding to a respective occurrence; and
sending the digital supplemental information file to the electronic device, the digital supplemental information file altering functionality of the digital work to include at least one selectable portion that enables display of the visual representation of the locations within the digital work.
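The "object with markings" recited in the claims could take many forms. As one hedged illustration, the sketch below renders a fixed-width text bar in which each marking stands for an occurrence of a name, placed in proportion to its location in the work. The function name, the character-offset locations, and the text rendering are assumptions. The source does not specify how the visual representation is drawn.

```python
def render_occurrence_bar(positions, work_length, width=40):
    """Return a text bar with a '|' marking for each occurrence position.

    positions are assumed to be character offsets in [0, work_length).
    """
    bar = ["-"] * width
    for pos in positions:
        # Scale the offset to a column, clamping to the last column.
        index = min(pos * width // work_length, width - 1)
        bar[index] = "|"
    return "".join(bar)

# Hypothetical occurrences at the start, middle, and end of a 1000-character work.
bar = render_occurrence_bar([0, 500, 999], work_length=1000, width=10)
print(bar)  # prints |----|---|
```

Occurrences that fall in the same column share one marking, so a narrow bar trades precision for compactness.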
Specification