Detecting relationships in unstructured text
Abstract
Disclosed are embodiments of a system and a method for detecting relationships described in unstructured text-based electronic documents. The system and method use an input file that contains one or more text patterns that represent particular relationships. Each text pattern includes regular text expressions that describe the particular relationship and slots for the location of each entity in that relationship. Document(s) are selected by a user and scanned by a proper noun tagger that identifies and tags every occurrence of proper names within the document(s). Then, a pattern matcher scans the document(s) to match text patterns. If a text pattern is matched within a document, a relationship detector extracts all pairs of proper names found in the slots for each matched text pattern. The output from the relationship detector includes the names for each entity in the relationship, the type of relationship, the identity of the document, and the location in the document of the sentence describing the relationship.
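The three stages named in the abstract (proper noun tagger, pattern matcher, relationship detector) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `TextPattern` class, the function names, the "acquired" pattern, the capitalization-only tagger, and the choice to take the tagged name nearest to the matched expression as the slot content are all hypothetical and chosen only for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class TextPattern:
    relationship: str  # type of relationship, e.g. "acquisition"
    expression: str    # regular expression describing the relationship
    first_slot: str    # "before" or "after" the expression
    second_slot: str   # "before" or "after" the expression


# Assumed tagging rule: a run of capitalized words is a proper name.
NAME_RE = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")


def tag_proper_names(text):
    """Return (name, start, end) for every proper-name occurrence."""
    return [(m.group(), m.start(), m.end()) for m in NAME_RE.finditer(text)]


def name_in_slot(names, match, slot):
    """Return the tagged name nearest to the matched expression on the given side."""
    if slot == "before":
        candidates = [n for n in names if n[2] <= match.start()]
        return candidates[-1][0] if candidates else None
    candidates = [n for n in names if n[1] >= match.end()]
    return candidates[0][0] if candidates else None


def detect(doc_id, text, patterns):
    """Tag names, match each pattern, and emit one record per filled slot pair."""
    names = tag_proper_names(text)
    records = []
    for p in patterns:
        for m in re.finditer(p.expression, text):
            first = name_in_slot(names, m, p.first_slot)
            second = name_in_slot(names, m, p.second_slot)
            if first and second:
                # entity names, relationship type, document id, location of the match
                records.append((first, second, p.relationship, doc_id, m.start()))
    return records


patterns = [TextPattern("acquisition", r"\bacquired\b", "before", "after")]
records = detect("doc-1", "Acme Corp acquired Widget Works last year.", patterns)
print(records)
```

Each record carries the two entity names, the relationship type, the document identifier, and the character offset of the match, which mirrors the output fields the abstract lists. A fuller version would extract all qualifying name pairs rather than only the nearest ones.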
21 Claims
1. A computer-implemented method of detecting that different entities identified by different proper names are in a relationship, said method comprising:
- storing, in a storage device, an input file comprising text patterns, each text pattern comprising:
  - a specific text expression that represents a specific type of relationship between a first entity and a second entity;
  - a first slot location identifier indicating a location of a first slot for a first proper name of said first entity in said specific type of relationship relative to said specific text expression; and
  - a second slot location identifier indicating a location of a second slot for a second proper name of said second entity in said specific type of relationship relative to said specific text expression;
- locating and tagging, by a processor, occurrences of all proper names within multiple text-based documents in order to generate a list of proper names and locations of said proper names within said multiple text-based documents;
- analyzing, by said processor, said multiple text-based documents so as to locate a document that contains said specific text expression of a specific text pattern; and
- accessing, by said processor, said list to determine if any of said proper names on said list are located in said document within said first slot relative to said specific text expression, as determined based on said first slot location identifier, and within said second slot relative to said specific text expression, as determined based on said second slot location identifier so as to detect said specific type of relationship and so as to identify said first entity in said specific type of relationship by said first proper name and said second entity in said specific type of relationship by said second proper name, respectively.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system for detecting that different entities identified by different proper names are in a relationship, said system comprising:
- a storage device storing an input file comprising text patterns, each text pattern comprising:
  - a specific text expression that represents a specific type of relationship between a first entity and a second entity;
  - a first slot location identifier indicating a location of a first slot for a first proper name of said first entity in said specific type of relationship relative to said specific text expression; and
  - a second slot location identifier indicating a location of a second slot for a second proper name of said second entity in said specific type of relationship relative to said specific text expression; and
- a processor comprising:
  - a proper noun tagger locating and tagging occurrences of all proper names within multiple text-based documents in order to generate a list of proper names and locations of said proper names within said multiple text-based documents;
  - a pattern matcher accessing said input file in said storage device and analyzing said multiple text-based documents so as to locate a document that contains said specific text expression of a specific text pattern within said document; and
  - a relationship detector in communication with said pattern matcher and said proper noun tagger, said relationship detector accessing said list to determine if any of said proper names on said list are located in said document within said first slot relative to said specific text expression, as determined based on said first slot location identifier, and within said second slot relative to said specific text expression, as determined based on said second slot location identifier, so as to detect said specific type of relationship and so as to identify, by name, said first entity in said specific type of relationship by said first proper name and said second entity in said specific type of relationship by said second proper name, respectively.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A program storage medium readable by a computer and tangibly embodying a program of instructions executable by said computer to perform a method of detecting that different entities identified by different proper names are in a relationship, said method comprising:
- storing an input file comprising text patterns, each text pattern comprising:
  - a specific text expression that represents a specific type of relationship between a first entity and a second entity;
  - a first slot location identifier indicating a location of a first slot for a first proper name of said first entity in said specific type of relationship relative to said specific text expression; and
  - a second slot location identifier indicating a location of a second slot for a second proper name of said second entity in said specific type of relationship relative to said specific text expression;
- locating and tagging occurrences of all proper names within multiple text-based documents in order to generate a list of proper names and locations of said proper names within said multiple text-based documents;
- analyzing said multiple text-based documents so as to locate a document that contains said specific text expression of a specific text pattern; and
- accessing said list to determine if any of said proper names on said list are located in said document within said first slot relative to said specific text expression, as determined based on said first slot location identifier, and within said second slot relative to said specific text expression, as determined based on said second slot location identifier so as to detect said specific type of relationship and so as to identify said first entity in said specific type of relationship by said first proper name and said second entity in said specific type of relationship by said second proper name, respectively.
Dependent claims: 16, 17, 18, 19, 20
21. A computer-implemented method of detecting that different entities identified by different proper names are in a relationship, said method comprising:
- storing, in a storage device, an input file comprising text patterns, each text pattern comprising:
  - a specific text expression that represents a specific type of relationship between a first entity and a second entity;
  - a first slot location identifier indicating a location of a first slot for a first proper name of said first entity in said specific type of relationship relative to said specific text expression; and
  - a second slot location identifier indicating a location of a second slot for a second proper name of said second entity in said specific type of relationship relative to said specific text expression;
- locating and tagging, by a processor, occurrences of all proper names within multiple text-based documents in order to generate a list of proper names and locations of said proper names within said multiple text-based documents;
- analyzing, by said processor, said multiple text-based documents so as to locate a document that contains said specific text expression of a specific text pattern; and
- accessing, by said processor, said list to determine if any of said proper names on said list are located in said document within said first slot relative to said specific text expression, as determined based on said first slot location identifier, and within said second slot relative to said specific text expression, as determined based on said second slot location identifier so as to detect said specific type of relationship and so as to identify said first entity in said specific type of relationship by said first proper name and said second entity in said specific type of relationship by said second proper name, respectively,
- said specific text expression comprising a plurality of words that describe said specific type of relationship with one of said words being a keyword and said method further comprising before said analyzing of said multiple text-based documents, scanning said multiple text-based documents to determine if said keyword is located in said document and only performing said analyzing if said keyword is located in said document,
- said specific text expression comprising a plurality of words that describe said specific type of relationship, said first slot location identifier indicating whether said first slot for said first entity is located before, within, or after said plurality of words, and said second slot location identifier indicating whether said second slot for said second entity is located before, within, or after said plurality of words,
- said locating and said tagging comprising:
  - scanning said multiple text-based documents to identify any proper names occurring within said multiple text-based documents based on a set of matching rules;
  - re-scanning said document to tag locations for each of said proper names identified; and
  - recording said locations in said list,
- said set of matching rules being based on at least one of word capitalization, sentence structure, sentence boundaries, and excluded words,
- each of said text patterns further comprising a relationship order identifier indicating relative positions of said two different entities in said specific type of relationship as a function of said locations of said proper names within said first slot and said second slot, and
- said method further comprising storing a record comprising said first proper name of said first entity, said second proper name of said second entity, said specific type of relationship between said first entity and said second entity, a first position of said first entity and a second position of said second entity in said specific type of relationship, and an identifier for said document and a location in said document where said specific type of relationship is detected.
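Two of the refinements recited in claim 21, the keyword pre-scan and the rule-based tagging of proper names, might look something like the sketch below. It is illustrative only: the function names, the `EXCLUDED` word list, and the specific rules (capitalized-word runs, a run ending at any punctuation, a small exclusion list) are assumptions made for the example and are not taken from the patent.

```python
import re

# Hypothetical exclusion list: capitalized words that should not count as names.
EXCLUDED = {"The", "A", "An", "In", "On", "Last", "However"}

WORD_RE = re.compile(r"[A-Za-z][A-Za-z'-]*")


def documents_with_keyword(documents, keyword):
    """Return ids of documents containing the keyword.

    Only these documents would go on to full pattern matching.
    """
    kw = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [doc_id for doc_id, text in documents.items() if kw.search(text)]


def tag_proper_names_with_rules(text, excluded=EXCLUDED):
    """Return (name, start, end) tuples for runs of capitalized, non-excluded words."""
    tagged, run = [], []

    def flush():
        # Close the current run of name words and record its location.
        if run:
            start, end = run[0].start(), run[-1].end()
            tagged.append((text[start:end], start, end))
            run.clear()

    prev_end = 0
    for w in WORD_RE.finditer(text):
        gap = text[prev_end:w.start()]
        # Punctuation between words (sentence boundary, comma) ends the current run.
        if run and gap.strip():
            flush()
        if w.group()[0].isupper() and w.group() not in excluded:
            run.append(w)
        else:
            flush()
        prev_end = w.end()
    flush()
    return tagged


docs = {
    "a": "Last year Acme Corp acquired Widget Works. Widget Works makes gears.",
    "b": "Nothing relevant here.",
}
for doc_id in documents_with_keyword(docs, "acquired"):
    print(doc_id, tag_proper_names_with_rules(docs[doc_id]))
```

The pre-scan is a cheap filter: documents that never mention the keyword are skipped before the more expensive expression matching runs. The exclusion list and the punctuation check stand in for the "excluded words" and "sentence boundaries" rules, and a realistic rule set would be considerably larger.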
Specification