Domain-specific unstructured text retrieval

  • US 10,318,564 B2
  • Filed: 09/28/2015
  • Issued: 06/11/2019
  • Est. Priority Date: 09/28/2015
  • Status: Active Grant
First Claim

1. An apparatus for retrieving unstructured text from the Internet related to a specified domain, the apparatus comprising:

  • one or more processors; and

    a memory having instructions stored therein, the instructions executable by the one or more processors to perform operations as:

    a first classifier having been trained using training data comprising unstructured text related to the specified domain, the training data having a plurality of features, the unstructured text being separated from structured data and semi-structured data;

    a similar web page retriever configured to retrieve, from the Internet, only web pages that include text that is unstructured and do not have at least some of the plurality of features of the training data, and where the retrieved web pages are similar to web pages classified by the first classifier; and

    a second classifier having been trained using unstructured text examples which do not have at least one of the plurality of features;

    wherein the second classifier is configured to label web pages retrieved by the similar web page retriever to select web pages which are relevant to the specified domain.
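
The three claimed elements (first classifier, similar web page retriever, second classifier) can be pictured as a small pipeline. The sketch below is only one illustrative reading, not the patented implementation: it assumes the "features" are word tokens, stands in a tiny naive Bayes model for both classifiers, uses Jaccard overlap as the similarity measure, and replaces the Internet with an in-memory dictionary. Every name in it (`TokenClassifier`, `retrieve_similar`, `dropped`, the sample pages) is hypothetical, and the separation of unstructured text from structured and semi-structured data is not modeled.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; an assumed stand-in for the claim's 'features'."""
    return re.findall(r"[a-z]+", text.lower())


class TokenClassifier:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self, examples, ignore=frozenset()):
        # examples: (text, is_relevant) pairs; tokens in `ignore` are never learned.
        self.counts = {True: Counter(), False: Counter()}
        self.docs = Counter()
        for text, label in examples:
            self.docs[label] += 1
            self.counts[label].update(t for t in tokens(text) if t not in ignore)
        self.vocab = set(self.counts[True]) | set(self.counts[False])

    def score(self, text, label):
        total = sum(self.counts[label].values())
        logp = math.log(self.docs[label] / sum(self.docs.values()))
        for t in tokens(text):
            if t in self.vocab:  # tokens never seen in training are skipped
                logp += math.log((self.counts[label][t] + 1) / (total + len(self.vocab)))
        return logp

    def is_relevant(self, text):
        return self.score(text, True) > self.score(text, False)


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve_similar(seeds, corpus, dropped, threshold):
    """Return keys of pages that lack the dropped features and resemble a seed page."""
    found = []
    for url, text in corpus.items():
        page = set(tokens(text))
        if page & dropped:
            continue  # page still carries a feature the retriever must avoid
        if any(jaccard(page, set(tokens(s))) >= threshold for s in seeds):
            found.append(url)
    return found


# Hypothetical training data for an assumed "medicine" domain.
train = [
    ("patient diagnosis shows chronic asthma symptoms", True),
    ("doctor confirms diagnosis of diabetes in patient", True),
    ("football team wins the league match", False),
    ("stock market prices fall sharply today", False),
]
dropped = {"diagnosis"}  # assumed subset of training features to leave out

first = TokenClassifier(train)                   # trained on all features
second = TokenClassifier(train, ignore=dropped)  # trained without the dropped feature

seed_pages = [
    "patient diagnosis shows asthma symptoms improving",
    "stock market prices fall",
]
seeds = [p for p in seed_pages if first.is_relevant(p)]

corpus = {  # in-memory stand-in for pages fetched from the Internet
    "page-a": "patient reports asthma symptoms after exercise",
    "page-b": "the football match ended in a draw",
    "page-c": "patient diagnosis confirmed by doctor",
    "page-d": "doctor treats patient with chronic asthma symptoms",
    "page-e": "football team shows improving symptoms before league match",
}

candidates = retrieve_similar(seeds, corpus, dropped, threshold=0.2)
selected = [url for url in candidates if second.is_relevant(corpus[url])]
print(candidates, selected)
```

In this toy run the retriever skips `page-c` because it contains the dropped token and skips `page-b` because it shares nothing with the seed page. The second classifier then rejects `page-e`, which is lexically similar to the seed but dominated by out-of-domain vocabulary.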
