Systems and methods for extracting attributes from text content

  • US 9,934,218 B2
  • Filed: 04/18/2012
  • Issued: 04/03/2018
  • Est. Priority Date: 12/05/2011
  • Status: Active Grant
First Claim

1. A method implemented by one or more computers for extracting one or more descriptors from text data associated with a specified term in the text data, the method comprising:

    receiving, by at least one of the one or more computers, the text data;

    receiving, by at least one of the one or more computers, the specified term to be located in the text data, the specified term being at least one word;

    creating, by at least one of the one or more computers, a tagged information file by associating part of speech tags to words in the text data, including any descriptors present in the text data, wherein a descriptor comprises one or more words of the text data that succeed or precede the specified term;

    identifying, by at least one of the one or more computers, a location of the specified term in the tagged information file using an approximate text matching technique, wherein the approximate text matching technique:

    detects the specified term grouped together with the descriptors of the specified term in the text data using the tagged information file, the specified term grouped together with the descriptors of the specified term forming a variable region or variable window that is context sensitive and not of a fixed size; and

    identifies, through a finite state machine, a grammatical context shift in the context sensitive region pertaining to the specified term in the text data by analyzing the part of speech tags of the tagged information file, wherein the grammatical context shift is indicated by an autonomous transition of the finite state machine from a first state associated with a first part of speech tag of the tagged information file to a second state associated with a second part of speech tag of the tagged information file for parts of speech associated with words before and after the specified term;

    determining based on the determined grammatical context shift, by at least one of the one or more computers, the one or more descriptors of the specified term;

    extracting, by at least one of the one or more computers, the one or more descriptors of the specified term from the text data; and

    providing, by at least one of the one or more computers, a report comprising the extracted one or more descriptors of the specified term.
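
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything in it is an assumption made for illustration: the toy lexicon `TOY_LEXICON` stands in for a real part-of-speech tagger, `difflib.SequenceMatcher` with a 0.8 similarity threshold stands in for the approximate text matching technique, the specified term is limited to a single word, and the transition table `TRANSITIONS` is one hypothetical way a finite state machine might treat a change of part-of-speech tag as the grammatical context shift that closes the variable window.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_LEXICON = {
    "the": "DET", "a": "DET", "has": "VERB", "is": "VERB", "and": "CONJ",
    "bright": "ADJ", "large": "ADJ", "sharp": "ADJ", "dim": "ADJ",
    "very": "ADV", "phone": "NOUN", "screen": "NOUN", "battery": "NOUN",
}

# Assumed state machine: descriptors are runs of adjectives/adverbs, optionally
# reached through a linking verb. Any pair missing from the table is treated
# as a grammatical context shift and ends the window.
TRANSITIONS = {
    ("START", "ADJ"): "DESC", ("START", "ADV"): "DESC",
    ("START", "VERB"): "LINK",
    ("LINK", "ADJ"): "DESC", ("LINK", "ADV"): "DESC",
    ("DESC", "ADJ"): "DESC", ("DESC", "ADV"): "DESC",
}


def tag(text):
    """Build a stand-in 'tagged information file': a list of (word, tag) pairs."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return [(w, TOY_LEXICON.get(w, "NOUN")) for w in words]


def locate(tagged, term, threshold=0.8):
    """Approximate match: index of the word most similar to term, or None."""
    best_index, best_score = None, threshold
    for i, (word, _) in enumerate(tagged):
        score = SequenceMatcher(None, word, term.lower()).ratio()
        if score >= best_score:
            best_index, best_score = i, score
    return best_index


def scan(tagged, start, step):
    """Walk away from the term (step = -1 or +1) until the context shifts."""
    state, found = "START", []
    i = start + step
    while 0 <= i < len(tagged):
        word, pos = tagged[i]
        state = TRANSITIONS.get((state, pos), "END")
        if state == "END":  # context shift: the variable window closes here
            break
        if state == "DESC":
            found.append(word)
        i += step
    return found if step > 0 else found[::-1]


def extract_descriptors(text, term):
    """Return descriptors that precede or succeed the (fuzzily located) term."""
    tagged = tag(text)
    index = locate(tagged, term)
    if index is None:
        return []
    return scan(tagged, index, -1) + scan(tagged, index, +1)


if __name__ == "__main__":
    # A simple stand-in for the claimed report.
    print(extract_descriptors("The phone has a very sharp screen.", "screen"))
```

In this sketch the window is not a fixed size: it grows for as long as the tags keep the machine in a descriptor state and stops at the first tag with no allowed transition, so a descriptor belonging to a different noun later in the sentence is not picked up.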
