Identifying content and content relationship information associated with the content for ingestion into a corpus

  • US 10,642,935 B2
  • Filed: 05/12/2014
  • Issued: 05/05/2020
  • Est. Priority Date: 05/12/2014
  • Status: Active Grant
First Claim

1. A method, in a data processing system comprising a processor and a memory configured to implement a natural language processing (NLP) system, for identifying content relationship for content copied by a content identification mechanism, the method comprising:

    executing, by the processor of a computing device, a content identification mechanism, the content identification mechanism being resident in the memory device of the computing device;

    identifying, by the content identification mechanism in the data processing system, the content from a website on another data processing system via a network using natural language processing (NLP);

    generating, by the content identification mechanism, a file structure in the data processing system, wherein the file structure comprises the content parsed into a hierarchy and a set of cross reference information for the hierarchy;

    populating, by the content identification mechanism, the file structure with path information for the content on the other data processing system that identifies a path to a current web page of the website;

    identifying, by the content identification mechanism, relationship content information associated with the current web page based on at least one of the set of cross reference information or contextual clues of the content, wherein the relationship content is a path to a current web page where the relationship content is found as well as other identified content, including headers, section titles, page titles, web site structure, extracted concepts, information type, metadata, or other data about the content itself that is not within the content, including location of the content on the website, type or classification details of the website;

    modifying, by the content identification mechanism, the file structure associated with the content with the relationship content information, wherein modifying the file structure associated with the content with the relationship content information is performed either through generating a new file structure with the path information as well as other identified content, augmenting an existing file structure with new information, or updating the existing file structure with a change in the path information or the other identified content;

    identifying, by the content identification mechanism, one or more classification identifiers associated with the web page in order to classify the content from the website;

    ingesting, by the content identification mechanism, the content from the website on the other data processing system via the network;

    transmitting, by the content identification mechanism, the content and the file structure associated with the content to a specific corpus in the NLP system based on the one or more classification identifiers so that the NLP system may respond to inquiries using the content and information in the file structure associated with the content;

    responsive to the content identification mechanism identifying changes to the content or the relationship content from the website or information associated with the current web page where the content is found on the website, updating, by the content identification mechanism, the file structure associated with the content thereby forming an updated file structure;

    transmitting, by the content identification mechanism, the updated file structure associated with the content to the specific corpus in the NLP system based on the one or more classification identifiers so that the NLP system may respond to new inquiries using the content and information in the updated file structure associated with the content;

    receiving, by a Question Answering (QA) system, a first question from a first user;

    processing the first question, by one or more software engines of the QA system, using the updated file structure, into one or more queries to apply to a corpora and/or knowledge domain;

    generating, by the QA system, one or more potential candidate answers for answering the first question;

    generating, by the QA system, a confidence score for the one or more potential candidate answers to the first question, wherein the score is determined by comparing the one or more candidate answers to the first question using one or more reasoning algorithms;

    generating a first set ranked list of candidate answers based on the confidence score for the one or more candidate answers;

    storing the generated first set ranked list of candidate answers, by the QA system, in association with the first question received by the first user;

    receiving, by the Question Answering (QA) system, a second question from a second user subsequent to the first question, the second question being the same as the first question received by the first user;

    processing the second question, by one or more software engines of the QA system, using the updated file structure, into one or more queries to apply to a corpora and/or knowledge domain;

    generating, by the QA system, one or more potential candidate answers for answering the second question;

    generating, by the QA system, a confidence score for the one or more potential candidate answers to the second question, wherein the score is determined by comparing the one or more candidate answers to the second question using one or more reasoning algorithms;

    generating a second set ranked list of candidate answers based on the confidence score for the one or more candidate answers to the second question;

    comparing, by the QA system, the generated second set ranked list of candidate answers to the second question to the stored generated first set ranked list of candidate answers to the first question; and

    identifying, by the QA system, differences between the first set ranked list of candidate answers to the second set ranked list of candidate answers.
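
The data flow recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It only shows one plausible shape for a file structure that carries path and relationship content information, an update step, and a comparison of two ranked candidate-answer lists. All names here (`FileStructure`, `update_structure`, `rank_candidates`, `diff_ranked_lists`), the example paths, and the confidence scores are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class FileStructure:
    """Hypothetical container for content parsed from one web page."""
    path: str                                               # path to the current web page
    hierarchy: dict = field(default_factory=dict)           # section title -> text
    cross_references: list = field(default_factory=list)    # links between sections
    relationship_info: dict = field(default_factory=dict)   # headers, page title, metadata
    classification_ids: list = field(default_factory=list)  # used to pick a corpus


def update_structure(fs, path=None, **relationship_info):
    """Return an updated copy when the path or relationship content changes."""
    return FileStructure(
        path=path or fs.path,
        hierarchy=dict(fs.hierarchy),
        cross_references=list(fs.cross_references),
        relationship_info={**fs.relationship_info, **relationship_info},
        classification_ids=list(fs.classification_ids),
    )


def rank_candidates(scores):
    """Sort candidate answers by confidence score, highest first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def diff_ranked_lists(first, second):
    """Identify differences between two ranked lists of (answer, score) pairs."""
    first_rank = {answer: i for i, (answer, _) in enumerate(first)}
    second_rank = {answer: i for i, (answer, _) in enumerate(second)}
    return {
        "added": [a for a in second_rank if a not in first_rank],
        "removed": [a for a in first_rank if a not in second_rank],
        "moved": [a for a in second_rank
                  if a in first_rank and first_rank[a] != second_rank[a]],
    }


# Assumed example data: a page moves and gains a page title.
fs = FileStructure(path="/products/widgets.html", classification_ids=["retail"])
updated = update_structure(fs, path="/catalog/widgets.html", page_title="Widgets")

# Assumed confidence scores for the same question asked by two users.
first = rank_candidates({"A": 0.9, "B": 0.6, "C": 0.3})
second = rank_candidates({"B": 0.8, "A": 0.7, "D": 0.4})
changes = diff_ranked_lists(first, second)
```

In this sketch the stored first ranked list is compared with the second by answer identity and rank position. A real QA system could equally compare confidence scores or supporting evidence, since the claim does not specify how the differences are determined.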
