
Methodologies and analytics tools for identifying white space opportunities in a given industry

  • US 9,183,286 B2
  • Filed: 06/03/2008
  • Issued: 11/10/2015
  • Est. Priority Date: 02/13/2007
  • Status: Expired due to nonpayment of maintenance fees
First Claim

1. A method for use with at least one keyword retrieved from a first set of documents, wherein the keyword corresponds to a predefined subject matter, the method comprising:

    constructing snippets from textual material in said first set of documents stored on a computer to produce constructed snippets, each of said constructed snippets including at least one non-key word appearing within a specified text distance of said at least one keyword;

    defining, by a computer processor, a plurality of categories wherein each of said constructed snippets is assigned to one of said plurality of categories, only if said assigned snippet is not already assigned to another of said plurality of said categories, each of said plurality of categories being designated for receiving similar constructed snippets;

    creating a respective mathematical model for each of said plurality of categories;

    analyzing a second set of documents to determine an assignment for each document in said second set of documents to a selected one of said plurality of categories, said assignment based on matching each of said documents in said second set of documents to said mathematical model for the selected one of said plurality of categories;

    assigning a numeric vector to each document of the first and second sets of documents, wherein the numeric vector represents occurrences of one of the constructed snippets within the respective document;

    creating a partition taxonomy that includes less than all of the plurality of categories, wherein the partition taxonomy creation is based on a clustered configuration of the first and second sets of documents;

    editing, using a computer processor, less than all of the plurality of categories in the partition taxonomy using domain expertise to produce edited categories in an edited partition taxonomy, such that each document of the first and second sets of documents is assigned to a corresponding one of the less than all of the plurality of categories;

    creating a classification taxonomy based on the edited partition taxonomy, based on a number of documents in each of the edited categories, based on percentage similarity of words between documents in one of the edited categories, and based on distances between category centroids of the edited categories, wherein a category centroid for an edited category is an average of values of the numeric vectors for the documents in the category;

    identifying at least one white space in said classification taxonomy, said at least one white space including one or more of the edited categories that contain fewer than a specified number of documents.
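The first step of the claim builds snippets from words near a keyword. The sketch below is a minimal illustration under stated assumptions, not the patent's implementation. It assumes that "text distance" means a window of word tokens on either side of the keyword. It also assumes that tokens are lowercase alphanumeric runs. The function name `construct_snippets` and the `window` parameter are hypothetical.

```python
import re


def construct_snippets(text, keyword, window=3):
    """Hypothetical snippet builder: for each keyword occurrence, collect the
    non-keyword tokens within `window` word positions on either side.
    Assumption: text distance is measured in word tokens, ignoring sentences."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kw = keyword.lower()
    snippets = []
    for i, tok in enumerate(tokens):
        if tok != kw:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [t for t in tokens[lo:hi] if t != kw]
        if context:  # the claim requires at least one non-key word per snippet
            snippets.append(" ".join(context))
    return snippets


if __name__ == "__main__":
    doc = "Solar panels convert light. New solar cells are cheap."
    print(construct_snippets(doc, "solar", window=2))
    # → ['panels convert', 'light new cells are']
```

Measuring distance in tokens keeps the sketch simple. A character-offset or sentence-bounded window would be an equally plausible reading of "specified text distance".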
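The claim represents each document as a numeric vector of snippet occurrences. It defines a category centroid as the average of its documents' vectors and uses distances between centroids. It also assigns new documents by matching them to a per-category "mathematical model". The patent text does not specify that model.

The sketch below substitutes a nearest-centroid rule (Euclidean distance) as one simple stand-in. It assumes that a vector entry is the count of a snippet string in the document. All function and variable names are hypothetical.

```python
import math


def snippet_vector(doc, snippets):
    """Hypothetical vectorizer: count non-overlapping occurrences of each
    snippet string in the lowercased document."""
    text = doc.lower()
    return [text.count(s.lower()) for s in snippets]


def centroid(vectors):
    """Category centroid as described in the claim: element-wise average
    of the numeric vectors of the documents in the category."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest_category(vector, centroids):
    """Stand-in for the claim's per-category model: pick the category whose
    centroid is closest in Euclidean distance (an assumption)."""
    return min(centroids, key=lambda name: math.dist(vector, centroids[name]))


snippets = ["solar cell", "wind turbine"]
training = {
    "solar": ["solar cell efficiency; solar cell cost", "thin solar cell"],
    "wind": ["wind turbine blade"],
}
centroids = {
    cat: centroid([snippet_vector(d, snippets) for d in docs])
    for cat, docs in training.items()
}
new_vec = snippet_vector("offshore wind turbine and wind turbine towers", snippets)

if __name__ == "__main__":
    print(centroids)  # → {'solar': [1.5, 0.0], 'wind': [0.0, 1.0]}
    print(nearest_category(new_vec, centroids))  # → wind
    # distance between category centroids, one input to the classification taxonomy
    print(round(math.dist(centroids["solar"], centroids["wind"]), 3))  # → 1.803
```

A trained classifier per category (for example, a probabilistic text model) would fit the claim's wording equally well. Nearest-centroid is used here only because it reuses the centroids the claim already requires.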
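The classification-taxonomy step also relies on the "percentage similarity of words between documents" in a category. The claim does not define that measure. The sketch below assumes Jaccard overlap of word sets, expressed as a percentage and averaged over all document pairs in a category.

The function names are hypothetical. The convention that a category with fewer than two documents scores 100% is also an assumption.

```python
from itertools import combinations


def word_similarity(doc_a, doc_b):
    """Assumed measure: Jaccard overlap of lowercased word sets, as a percentage."""
    a, b = set(doc_a.lower().split()), set(doc_b.lower().split())
    if not a and not b:
        return 100.0
    return 100.0 * len(a & b) / len(a | b)


def mean_pairwise_similarity(docs):
    """Average word similarity over every pair of documents in one category.
    Assumption: a category with fewer than two documents counts as 100% similar."""
    pairs = list(combinations(docs, 2))
    if not pairs:
        return 100.0
    return sum(word_similarity(x, y) for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    print(word_similarity("solar cell cost", "solar cell efficiency"))  # → 50.0
```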
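The final step flags as white space any edited category that holds fewer than a specified number of documents. The sketch below assumes two inputs: a mapping from document ID to assigned category, and a list of all edited categories. Categories with no documents at all are therefore reported as well. The function name and inputs are hypothetical.

```python
from collections import Counter


def find_white_space(assignments, categories, min_docs):
    """Return categories holding fewer than `min_docs` documents.
    `assignments` maps a document ID to its assigned category (assumed input)."""
    counts = Counter(assignments.values())
    # Counter returns 0 for categories that received no documents
    return sorted(c for c in categories if counts[c] < min_docs)


if __name__ == "__main__":
    assignments = {"d1": "solar", "d2": "solar", "d3": "wind", "d4": "solar"}
    print(find_white_space(assignments, ["solar", "wind", "tidal"], min_docs=2))
    # → ['tidal', 'wind']
```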

  • 6 Assignments