Systems and methods for semantic information retrieval

  • US 9,280,520 B2
  • Filed: 08/02/2012
  • Issued: 03/08/2016
  • Est. Priority Date: 08/02/2012
  • Status: Active Grant
First Claim

1. A method comprising:

  • receiving, by a computer-based system, a body of text from a data source, wherein the body of text is an electronic text and is one of an email, a website chat room, an internet forum, or a text message;

    parsing, by the computer-based system, the body of text by determining a language and structure of the body of text;

    determining, by the computer-based system, a known format of the body of the text;

    wherein the known format is based on the data source from which the body of text was received,

    identifying, by the computer-based system and in response to the determining, structured contextual information based on the known format of the body of the text, wherein the structured contextual information includes at least one of a sender email address, one or more recipient email addresses, a subject field, a message date and time stamp, or an attachment title;

    tokenizing, by the computer-based system and in response to the parsing, the body of text by splitting the body of text into individual tokens;

    generating, by the computer-based system and based on the tokenizing, a tagged body of text, wherein the generating comprises assigning each individual token a part-of-speech tag indicating a grammatical role of the individual token, wherein the part-of-speech tag may include custom terminology from a tagging database, and wherein the grammatical role includes one of a noun, a pronoun, a verb, an adverb, an adjective, a conjunction, a preposition, an article, an auxiliary verb, an infinitive, an interjection, modal verb, an object, a participle, a phrase, or a predicate;

    splitting, by the computer-based system, the tagged body of text into grammatical chunks;

    identifying, by the computer-based system, named entities within the body of text;

    resolving, by the computer-based system and based on the tokenizing, the individual tokens having a pronoun grammatical role with corresponding noun phrases;

    wherein the resolving the individual tokens comprises weighting the individual tokens having a pronoun grammatical role based on the structured contextual information,

    deciding, by the computer-based system and in response to the resolving, a context and purpose of the body of text,

    translating, by the computer-based system and in response to the deciding, semantic concepts of the body of text into one or more semantic tags;

    identifying, by the computer-based system and in response to the translating, one or more communication topics and presuppositions of the body of text, wherein the identifying the one or more communication topics and presuppositions comprises analysis of prior communications within the body of text to facilitate the tokenizing the body of text, wherein the analysis of prior communications within the body of text comprises, in response to the identifying the structured contextual information comprises;

    analyzing structured contextual information to facilitate a homophora resolution; and

    integrating, in response to the analyzing and in response to the weighting of the individual tokens having a pronoun grammatical role based on the structured contextual information, the homophora resolution into an anaphora resolution algorithm by substituting the structured contextual information into the body of text to help interpret the body of text;

    generating, by the computer-based system and in response to the translating, a list of the one or more semantic tags; and

    conducting, by the computer-based system, in response to the translating and using the one or more semantic tags, semantic reasoning to facilitate pattern identification within a group of documents, wherein the pattern identification includes analyzing implied relationships of the text within the group of documents to identify a specific topic, wherein the pattern identification is based on at least one of progress or consensus of the text within the group of documents; and

    displaying, by the computer-based system, in response to the conducting and to a user interface, the specific identified topic of the body of text.
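
For illustration only, three of the claimed steps can be sketched in a few lines of Python: identifying structured contextual information from an email's known format, tokenizing the body, and substituting that contextual information for pronouns. This is a minimal sketch under stated assumptions, not the patented implementation. The sample message, the function names (extract_context, tokenize, substitute_context) and the pronoun sets are all hypothetical, and the simple rule of mapping first-person pronouns to the sender and "you" to a sole recipient stands in for the weighting and anaphora resolution that the claim describes without specifying.

```python
import re
from email import message_from_string
from email.utils import getaddresses, parseaddr

# Hypothetical sample message; every name and address here is invented.
RAW_EMAIL = """\
From: Alice Smith <alice@example.com>
To: Bob Jones <bob@example.com>
Subject: Budget review
Date: Thu, 02 Aug 2012 09:30:00 +0000

I will send you the draft tomorrow.
"""


def extract_context(raw):
    """Read header fields (the 'known format' of an email) and return
    them as structured contextual information, plus the body text."""
    msg = message_from_string(raw)
    sender_name, sender_addr = parseaddr(msg["From"])
    recipients = getaddresses(msg.get_all("To", []))
    context = {
        "sender": sender_name or sender_addr,
        "recipients": [name or addr for name, addr in recipients],
        "subject": msg["Subject"],
        "date": msg["Date"],
    }
    return context, msg.get_payload()


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


# Assumed pronoun sets; a real system would rely on part-of-speech tags.
FIRST_PERSON = {"i", "me"}
SECOND_PERSON = {"you"}


def substitute_context(tokens, context):
    """Replace pronouns whose referent lies outside the text with the
    sender or, when there is exactly one, the recipient."""
    resolved = []
    for tok in tokens:
        low = tok.lower()
        if low in FIRST_PERSON:
            resolved.append(context["sender"])
        elif low in SECOND_PERSON and len(context["recipients"]) == 1:
            resolved.append(context["recipients"][0])
        else:
            resolved.append(tok)
    return resolved


context, body = extract_context(RAW_EMAIL)
print(" ".join(substitute_context(tokenize(body), context)))
```

In this sketch "you" is left untouched when there are several recipients, since the headers alone do not say which one is meant; the claim's weighting step is presumably where such ambiguity would be handled.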
