Systems and methods for extracting attributes from text content
First Claim
1. A method implemented by one or more computers for extracting one or more descriptors from text data associated with a specified term in the text data, the method comprising:
receiving, by at least one of the one or more computers, the text data;
receiving, by at least one of the one or more computers, the specified term to be located in the text data, the specified term being at least one word;
creating, by at least one of the one or more computers, a tagged information file by associating part of speech tags to words in the text data, including any descriptors present in the text data, wherein a descriptor comprises one or more words of the text data that succeed or precede the specified term;
identifying, by at least one of the one or more computers, a location of the specified term in the tagged information file using an approximate text matching technique, wherein the approximate text matching technique:
detects the specified term grouped together with the descriptors of the specified term in the text data using the tagged information file, the specified term grouped together with the descriptors of the specified term forming a variable region or variable window that is context sensitive and not of a fixed size; and
identifies, through a finite state machine, a grammatical context shift in the context sensitive region pertaining to the specified term in the text data by analyzing the part of speech tags of the tagged information file, wherein the grammatical context shift is indicated by an autonomous transition of the finite state machine from a first state associated with a first part of speech tag of the tagged information file to a second state associated with a second part of speech tag of the tagged information file for parts of speech associated with words before and after the specified term;
determining based on the determined grammatical context shift, by at least one of the one or more computers, the one or more descriptors of the specified term;
extracting, by at least one of the one or more computers, the one or more descriptors of the specified term from the text data; and
providing, by at least one of the one or more computers, a report comprising the extracted one or more descriptors of the specified term.
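The window-finding steps of the claim can be illustrated with a minimal sketch. It assumes the text has already been tagged with Penn Treebank style part of speech tags and that the index of the specified term is known. The tag set `DESCRIPTOR_TAGS`, the two state names, and the function names are hypothetical choices made for illustration; the claim does not specify them. The state machine stays in its first state while neighbouring tags look like descriptor material and moves to the second state at the first tag that does not, which this sketch treats as the grammatical context shift.

```python
# Assumption: tags treated as descriptor material (Penn Treebank style).
DESCRIPTOR_TAGS = {"JJ", "JJR", "JJS", "NN", "NNS", "NNP", "CD", "RB"}

# Hypothetical state names for a two-state machine.
IN_CONTEXT, SHIFTED = "IN_CONTEXT", "SHIFTED"


def step(state, tag):
    """Transition function: remain in context only while tags are descriptor-like."""
    if state == IN_CONTEXT and tag in DESCRIPTOR_TAGS:
        return IN_CONTEXT
    return SHIFTED


def scan(tagged, term_index, direction):
    """Walk away from the term (direction is -1 or +1) until the machine shifts.

    Returns the index of the last token still inside the window on that side.
    """
    state = IN_CONTEXT
    edge = term_index
    i = term_index + direction
    while 0 <= i < len(tagged):
        state = step(state, tagged[i][1])
        if state == SHIFTED:
            break
        edge = i
        i += direction
    return edge


def extract_descriptors(tagged, term_index):
    """Return (words before the term, words after the term) inside the window."""
    left = scan(tagged, term_index, -1)
    right = scan(tagged, term_index, +1)
    before = [word for word, _ in tagged[left:term_index]]
    after = [word for word, _ in tagged[term_index + 1:right + 1]]
    return before, after


# Hand-tagged example sentence; "display" is the specified term at index 6.
TAGGED = [
    ("The", "DT"), ("camera", "NN"), ("has", "VBZ"), ("a", "DT"),
    ("bright", "JJ"), ("5-inch", "JJ"), ("display", "NN"),
    ("with", "IN"), ("touch", "NN"), ("support", "NN"), (".", "."),
]

print(extract_descriptors(TAGGED, 6))  # (['bright', '5-inch'], [])
```

Because the window ends wherever the tags change character, its size depends on the sentence rather than on a fixed token count. Here the determiner "a" closes the window on the left and the preposition "with" closes it on the right.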
Abstract
Systems and methods for extracting attributes from text content are described. Example embodiments may include a computer-implemented method for extracting attributes from text data, wherein the text data is obtained from at least one information source. As described, the implementation may include: receiving, from a user, an address for the at least one information source and an attribute name; creating a tagged information file by associating a part of speech tag with text data obtained from the at least one information source; identifying a location of the attribute name in the tagged information file using an approximate text matching technique; and determining at least one attribute descriptor from the tagged information file, wherein the tagged information file is parsed based on a part of speech tag associated with the attribute name to determine a conclusion of the attribute descriptor.
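As an illustration of the locating step, the sketch below finds an attribute name in a tagged information file even when the text contains a misspelling. It assumes a `word/TAG` token format for the tagged file and uses the similarity ratio from Python's standard `difflib` module with a threshold of 0.8. The file format, the threshold, and the function names are assumptions made for this example; the abstract does not say which approximate matching technique is used.

```python
from difflib import SequenceMatcher


def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition keeps any '/' inside the word itself with the word part.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def locate(tagged, attribute_name, threshold=0.8):
    """Return indices of words whose similarity to attribute_name meets the threshold."""
    target = attribute_name.lower()
    hits = []
    for index, (word, _) in enumerate(tagged):
        ratio = SequenceMatcher(None, word.lower(), target).ratio()
        if ratio >= threshold:
            hits.append(index)
    return hits


# "dispaly" is a deliberate typo to show the approximate match.
tagged_file = "The/DT phone/NN has/VBZ a/DT bright/JJ dispaly/NN ./."
tagged = parse_tagged(tagged_file)
print(locate(tagged, "display"))  # [5]
```

An exact string comparison would miss the misspelled token. A lower threshold tolerates more variation at the cost of more false matches, so the value would need tuning for real data.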
19 Claims
1. A method implemented by one or more computers for extracting one or more descriptors from text data associated with a specified term in the text data (independent claim; full text appears under First Claim above). Dependent claims: 2, 3, 4, 5, 6.
7. A system for extracting one or more descriptors from text data for a specified term in the text data, wherein the text data is obtained from at least one information source, the system comprising:
a user interface configured to receive, from a user:
an address for the at least one information source, the address being a uniform resource locator (URL) address or a location of a text file within a storage device, the term being at least one word to be located in the text data; and
the specified term; and
at least one hardware processor operatively coupled to a memory and a non-transitory storage storing instructions which when executed by at least one of the processors cause the at least one hardware processor to:
generate a tagged information file by associating part of speech tags to the text data obtained from the at least one information source, including any descriptors present in the text data, wherein a descriptor comprises one or more words of the text data that succeed or precede the specified term;
identify a location of the specified term in the tagged information file using an approximate text matching technique, wherein the approximate text matching technique:
detects the specified term grouped together with the descriptors of the specified term in the text data using the tagged information file, the specified term grouped together with the descriptors of the specified term forming a variable region or variable window that is context sensitive and not of a fixed size; and
identifies, through a finite state machine, a grammatical context shift in the context sensitive region pertaining to the specified term in the text data by analyzing the part of speech tags of the tagged information file, wherein the grammatical context shift is indicated by an autonomous transition of the finite state machine from a first state associated with a first part of speech tag of the tagged information file to a second state associated with a second part of speech tag of the tagged information file for parts of speech associated with words before and after the specified term;
determine based on the grammatical context shift the one or more descriptors of the specified term;
extract the one or more descriptors of the specified term from the text data; and
return the one or more extracted descriptors of the specified term.
Dependent claims: 8, 9, 10, 11, 12, 13.
14. A non-transitory computer readable medium comprising a plurality of computer-executable instructions stored thereon that, when executed, cause a computing system to perform processing for extracting one or more descriptors of a specified term in text data from the text data, the processing comprising:
receiving, from a user:
an address for at least one information source, the address being a uniform resource locator (URL) address or a location of a text file within a storage device, the term being at least one word or other text token; and
the specified term;
creating a tagged information file by associating part of speech tags to text data obtained from the at least one information source, including to any descriptors present in the text data, wherein a descriptor comprises one or more words of the text data that succeed or precede the specified term;
identifying a location of the specified term in the tagged information file using an approximate text matching technique, wherein the approximate text matching technique:
detects the specified term grouped together with the descriptors of the specified term in the text data using the tagged information file, the specified term grouped together with the descriptors of the specified term forming a variable region or variable window that is context sensitive and not of a fixed size; and
identifies, through a finite state machine, a grammatical context shift in the context sensitive region pertaining to the specified term in the text data by analyzing the part of speech tags of the tagged information file, wherein the grammatical context shift is indicated by an autonomous transition of the finite state machine from a first state associated with a first part of speech tag of the tagged information file to a second state associated with a second part of speech tag of the tagged information file for parts of speech associated with words before and after the specified term;
determining based on the determined grammatical context shift the one or more descriptors of the specified term;
extracting the one or more descriptors of the specified term from the text data; and
providing a report comprising the extracted one or more descriptors of the specified term.
Dependent claims: 15, 16, 17, 18, 19.
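Claims 7 and 14 recite an address that is either a URL or the location of a text file within a storage device. The sketch below shows one way such an address might be resolved to raw text before tagging. The function name `read_source`, the choice to treat only `http` and `https` schemes as URLs, and the default UTF-8 encoding are assumptions made for this example and are not taken from the claims.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def read_source(address, encoding="utf-8"):
    """Return the text found at a URL or at a local file location.

    Assumption: anything without an http/https scheme is a file path.
    """
    scheme = urlparse(address).scheme.lower()
    if scheme in ("http", "https"):
        # Fetch the resource; undecodable bytes are replaced, not raised.
        with urlopen(address) as response:
            return response.read().decode(encoding, errors="replace")
    # Otherwise read the address as a path on local storage.
    with open(address, encoding=encoding) as handle:
        return handle.read()
```

For a web page this returns markup as well as prose, so a real pipeline would likely strip tags before part of speech tagging. That step is outside the scope of this sketch.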
Specification