Methodologies and analytics tools for identifying white space opportunities in a given industry

US 9,183,286 B2
Filed: 06/03/2008
Issued: 11/10/2015
Est. Priority Date: 02/13/2007
Status: Expired due to Fees

6 Assignments

0 Petitions

Abstract

A method for analyzing predefined subject matter in a patent database being for use with a set of target patents, each target patent related to the predefined subject matter, the method comprising: creating a feature space based on frequently occurring terms found in the set of target patents; creating a partition taxonomy based on a clustered configuration of the feature space; editing the partition taxonomy using domain expertise to produce an edited partition taxonomy; creating a classification taxonomy based on structured features present in the edited partition taxonomy; creating a contingency table by comparing the edited partition taxonomy and the classification taxonomy to provide entries in the contingency table; and identifying all significant relationships in the contingency table to help determine the presence of any white space.
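
The contingency-table step in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the patented implementation: the function names, the sample category labels, the observed-versus-expected ratio used to flag a "significant" relationship, and the reading of empty cells as candidate white space.

```python
from collections import Counter


def contingency_table(labels_a, labels_b):
    """Count documents for every (category in A, category in B) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both taxonomies must label the same documents")
    return Counter(zip(labels_a, labels_b))


def notable_cells(table, min_ratio=1.5):
    """Flag cells whose observed count is at least min_ratio times the
    count expected if the two taxonomies were independent.
    The ratio test is an illustrative assumption."""
    total = sum(table.values())
    row_totals, col_totals = Counter(), Counter()
    for (a, b), n in table.items():
        row_totals[a] += n
        col_totals[b] += n
    flagged = {}
    for (a, b), observed in table.items():
        expected = row_totals[a] * col_totals[b] / total
        if observed >= min_ratio * expected:
            flagged[(a, b)] = (observed, expected)
    return flagged


def empty_cells(table):
    """List category pairs that share no document (candidate white space)."""
    rows = {a for a, _ in table}
    cols = {b for _, b in table}
    return sorted((a, b) for a in rows for b in cols if (a, b) not in table)


# Hypothetical labels: one entry per patent in each taxonomy.
partition = ["sensors", "sensors", "sensors", "displays", "displays", "batteries"]
classification = ["G01", "G01", "H04", "H04", "H04", "H01"]

table = contingency_table(partition, classification)
flagged = notable_cells(table)
gaps = empty_cells(table)
```

Here `flagged` holds the cells where the two taxonomies agree more often than chance would suggest, and `gaps` lists the pairs with no patents at all, which a domain expert could then review.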

41 Citations

21 Claims

1. A method for use with at least one keyword retrieved from a first set of documents, wherein the keyword corresponds to a predefined subject matter, the method comprising:
- constructing snippets from textual material in said first set of documents stored on a computer to produce constructed snippets, each of said constructed snippets including at least one non-key word appearing within a specified text distance of said at least one keyword;
  
  defining, by a computer processor, a plurality of categories wherein each of said constructed snippets is assigned to one of said plurality of categories, only if said assigned snippet is not already assigned to another of said plurality of said categories, each of said plurality of categories being designated for receiving similar constructed snippets;
  
  creating a respective mathematical model for each of said plurality of categories;
  
  analyzing a second set of documents to determine an assignment for each document in said second set of documents to a selected one of said plurality of categories, said assignment based on matching each of said documents in said second set of documents to said mathematical model for the selected one of said plurality of categories;
  
  assigning a numeric vector to each document of the first and second sets of documents, wherein the numeric vector represents occurrences of one of the constructed snippets within the respective document;
  
  creating a partition taxonomy that includes less than all of the plurality of categories, wherein the partition taxonomy creation is based on a clustered configuration of the first and second sets of documents;
  
  editing, using a computer processor, less than all of the plurality of categories in the partition taxonomy using domain expertise to produce edited categories in an edited partition taxonomy, such that each document of the first and second sets of documents is assigned to a corresponding one of the less than all of the plurality of categories;
  
  creating a classification taxonomy based on the edited partition taxonomy, based on a number of documents in each of the edited categories, based on percentage similarity of words between documents in one of the edited categories, and based on distances between category centroids of the edited categories, wherein a category centroid for an edited category is an average of values of the numeric vectors for the documents in the category;
  
  identifying at least one white space in said classification taxonomy, said at least one white space including one or more of the edited categories that contain fewer than a specified number of documents.
- Dependent claims (2 to 18):
  - 2. The method of claim 1 wherein said second set of documents is directed to patents.
  - 3. The method of claim 1 wherein each of said mathematical models for each of said plurality of categories comprises a numeric vector space.
  - 4. The method of claim 1 wherein said analyzing step is performed using information from said predefined subject matter.
  - 5. The method of claim 1, including:
    - analyzing the predefined subject matter in a patent database, the method being for use with a set of target patents stored on a computer storage device, each of said target patents corresponding to the predefined subject matter;
      
      creating a feature space based on terms found in said set of target patents stored on the computer storage device;
      
      creating the partition taxonomy based on a clustered configuration of said feature space;
      
      creating a contingency table by comparing said edited partition taxonomy and said classification taxonomy to provide entries in said contingency table; and
      
      identifying relationships in said contingency table which help determine the presence of an area in a corporate portfolio in which no intellectual property exists within the classification taxonomy.
  - 6. The method of claim 5 comprising assembling the set of target patents by:
    - retrieving a set of initial patents from the patent database, each initial patent containing at least one word representative of the predefined subject matter;
      
      reviewing said initial patents to derive one or more terms of interest;
      
      retrieving a set of secondary patents, each of said secondary patents containing at least one of said one or more terms of interest; and
      
      merging said set of initial patents with said set of secondary patents to produce said set of target patents.
  - 7. The method of claim 6, including at least one of a search operation and a query operation.
  - 8. The method of claim 6 wherein said feature space comprises one or more of a structured feature, an unstructured feature, and an annotation.
  - 9. The method of claim 8 wherein said structured feature comprises at least one term from one of said initial patents.
  - 10. The method of claim 8 wherein said unstructured feature comprises a textual segment from one of said initial patents.
  - 11. The method of claim 8 wherein said annotation comprises a structured feature derived from said unstructured feature.
  - 12. The method of claim 5 wherein said step of creating a partition taxonomy comprises:
    - analyzing each of said target patents to derive a count of occurrences of feature space terms within each said target patent; and
      
      partitioning said set of target patents into a plurality of patent clusters, each of said patent clusters including target patents having occurrences of feature space terms determined to be similar to one another.
  - 13. The method of claim 5 including at least one of the following:
    - deleting a taxonomy category selected for deletion;
      
      merging two or more taxonomy categories selected for consolidation; and
      
      creating a new taxonomy category.
  - 14. The method of claim 5 further comprising storing new domain knowledge in a knowledge database as a serialized object.
  - 15. The method of claim 5 further comprising performing a time dimension analysis on at least one of said entries in said contingency table.
  - 16. The method of claim 1, including:
    - storing a set of customer patents, each of said customer patents corresponding to business needs of the customer;
      
      creating a first taxonomy, using a computer processor, for said set of customer patents;
      
      creating a second taxonomy for said set of customer patents; and
      
      creating a contingency table by comparing said first taxonomy to said second taxonomy, said contingency table providing an indication of one or more relationships of interest for the customer.
  - 17. The method of claim 16 wherein said first taxonomy is based on a patent classification system and said second taxonomy is based on web page information.
  - 18. The method of claim 16 further comprising providing a classification model for classifying a given text into a technology category in said second taxonomy.
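
The snippet-construction and numeric-vector steps of claim 1 can be sketched as follows. This is a simplified illustration under stated assumptions: "text distance" is taken to mean a count of word tokens, the tokenizer is a plain regular expression, and the names `tokenize`, `construct_snippets`, and `occurrence_vector` are hypothetical rather than drawn from the patent.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def construct_snippets(text, keyword, distance=2):
    """Return one snippet per keyword hit: the keyword together with the
    words lying within `distance` tokens on either side of it."""
    tokens = tokenize(text)
    key = keyword.lower()
    snippets = []
    for i, tok in enumerate(tokens):
        if tok == key:
            window = tokens[max(0, i - distance): i + distance + 1]
            # Keep only windows that contain at least one non-key word.
            if any(w != key for w in window):
                snippets.append(" ".join(window))
    return snippets


def occurrence_vector(text, vocabulary, keyword, distance=2):
    """Count how often each snippet of the vocabulary occurs in one document."""
    found = construct_snippets(text, keyword, distance)
    return [found.count(s) for s in vocabulary]


# Hypothetical two-document collection and keyword.
docs = [
    "A lithium battery pack with a battery monitor circuit.",
    "The solar panel charges a lithium battery pack.",
]
keyword = "battery"

vocabulary = sorted({s for d in docs for s in construct_snippets(d, keyword, 1)})
vectors = [occurrence_vector(d, vocabulary, keyword, 1) for d in docs]
# vocabulary -> ['a battery monitor', 'lithium battery pack']
# vectors    -> [[1, 1], [0, 1]]
```

Each document ends up with one count per snippet, which is the kind of numeric vector that a clustering step could consume.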

19. A computer program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform a method comprising the steps of:
- assembling a first set of target documents and a second set of target documents using one or more keywords, each of said first set of target documents and the second set of target documents including a predefined subject matter;
  
  analyzing each of said first set of target documents and second set of target documents to derive a count of occurrences of said keywords in each of said first set of target documents and second set of target documents;
  
  partitioning said first set of target documents and the second set of target documents into a plurality of categories based on non-key words or phrases appearing within a specified distance of one of said keywords;
  
  determining a centroid for each category of the plurality of categories as an average of values of numeric vectors representing features of the non-key words and the keywords in the category of the plurality of categories, wherein each numeric vector represents occurrences of one of the non-key words or one of the keywords within one of the first set of target documents or the second set of target documents;
  
  creating a first taxonomy for the first set of target documents, wherein the creation of the first taxonomy is based on less than all of the plurality of categories;
  
  creating a second taxonomy for the second set of target documents, wherein the second taxonomy is based on a number of documents in each of the plurality of categories, percentage similarity of the non-key words and the keywords between different ones of the second set of target documents in one of the plurality of categories, and percentage difference that the centroids for the plurality of categories differ from one another;
  
  creating a third taxonomy that includes both the first set of target documents and the second set of target documents, wherein the creation of the third taxonomy is based on both the first taxonomy and the second taxonomy, such that each one of the first set of target documents and each one of the second set of target documents are assigned to a corresponding category among the plurality of categories, wherein a total number of the plurality of categories is based on a combined size of the first set of target documents and the second set of target documents; and
  
  merging a selected two of the plurality of categories that have a closeness of centroids between the selected two of the plurality of categories such that the closeness is less than a threshold.
- Dependent claims (20):
  - 20. The computer program storage device of claim 19 wherein said method includes:
    - creating a contingency table by comparing said first taxonomy to said third taxonomy; and
      
      accepting input for applying domain expertise to recognize one or more relationships of interest in said contingency table.
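
The centroid and merge steps of claim 19 can be illustrated with a short sketch. It assumes that "closeness" means Euclidean distance between centroids and that only the single closest pair is merged per call; the claim fixes neither choice, and the category names and vectors are invented for the example.

```python
from itertools import combinations
from math import dist


def centroid(vectors):
    """Average the numeric vectors of a category, component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def merge_closest(categories, threshold):
    """Merge the two categories with the nearest centroids, provided their
    centroid distance is below `threshold`; otherwise return a copy unchanged.
    `categories` maps a category name to its list of document vectors."""
    if len(categories) < 2:
        return dict(categories)
    centroids = {name: centroid(vs) for name, vs in categories.items()}
    a, b = min(
        combinations(categories, 2),
        key=lambda pair: dist(centroids[pair[0]], centroids[pair[1]]),
    )
    if dist(centroids[a], centroids[b]) >= threshold:
        return dict(categories)
    merged = {k: v for k, v in categories.items() if k not in (a, b)}
    merged[f"{a}+{b}"] = categories[a] + categories[b]
    return merged


# Hypothetical categories with two-dimensional occurrence vectors.
categories = {
    "charging": [[2, 0], [4, 0]],   # centroid [3.0, 0.0]
    "power": [[3, 1]],              # centroid [3.0, 1.0]
    "display": [[0, 9], [0, 11]],   # centroid [0.0, 10.0]
}

merged = merge_closest(categories, threshold=2.0)
```

With these values "charging" and "power" sit one unit apart, so they are combined, while "display" stays separate.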

21. A computer program product stored on a non-transitory computer storage medium for use with at least one keyword retrieval from a first set of documents corresponding to a predefined subject matter, wherein when executed on a computer the program product causes the computer to:
- construct snippets from textual material in said first set of documents, each of said constructed snippets including at least one non-key word appearing within a specified text distance of said at least one keyword;
  
  define a plurality of categories wherein each of said constructed snippets is assigned to one of said plurality of categories, only if said assigned snippet is not already assigned to another of said plurality of said categories, each of said plurality of categories designated for receiving at least one of said constructed snippets;
  
  create a mathematical model for each of said plurality of categories;
  
  analyze a second set of documents to determine an assignment for each document in said second set of documents to a first one of said plurality of categories, said assignment based on matching each of said documents in said second set of documents to said created mathematical models for the first one of said plurality of categories;
  
  analyze a third set of documents to determine an assignment for each document in said third set of documents to a second respective one of said plurality of categories, said assignment based on matching each of said documents in said third set of documents to said created mathematical model for the second respective one of said plurality of categories, wherein a total number of the categories is generated based on a size of the third set of documents;
  
  determine a centroid for each category of the plurality of categories as an average of values of numeric vectors representing features of both key and non-key words in the category of the plurality of categories, wherein each of the numeric vectors represents occurrences of one of the snippets within one of the first set of documents;
  
  perform interactive clustering of the plurality of categories using domain expertise;
  
  merge two of the plurality of categories that have a closeness of centroids between the plurality of categories such that the closeness is less than a threshold; and
  
  identify at least one white space in said second set of documents, said at least one white space including all of said plurality of categories, including the merged categories, with fewer than a specified number of documents.
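
The assignment and white-space steps of claim 21 can be sketched in a few lines. The sketch assumes that each category's mathematical model is simply its centroid and that "matching" means choosing the nearest centroid by Euclidean distance; the claim does not specify either, and the names and data below are hypothetical.

```python
from collections import Counter
from math import dist


def assign_to_category(vector, centroids):
    """Return the name of the category whose centroid is nearest the vector."""
    return min(centroids, key=lambda name: dist(vector, centroids[name]))


def find_white_space(assignments, category_names, min_documents):
    """Return categories holding fewer than `min_documents` documents,
    including categories that received no document at all."""
    counts = Counter(assignments)
    return sorted(name for name in category_names if counts[name] < min_documents)


# Hypothetical category models and second-set document vectors.
centroids = {
    "charging": [3.0, 0.0],
    "display": [0.0, 10.0],
    "cooling": [8.0, 8.0],
}
second_set = [[2, 1], [4, 0], [1, 9], [3, 2], [0, 8]]

assignments = [assign_to_category(v, centroids) for v in second_set]
white_space = find_white_space(assignments, centroids, min_documents=2)
```

In this example no document lands in "cooling", so it is the only category reported as white space at a threshold of two documents.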

Current Assignee: GlobalFoundries, Inc.
Original Assignee: GLOBALFOUNDRIES U.S. 2 LLC (GlobalFoundries, Inc.)
Inventors: Chen, Ying; Kreulen, Jeffrey Thomas; Rhodes, James J.; Spangler, William Scott
Primary Examiner(s): Ahn, Sangwoo
Application Number: US12/132,561
Publication Number: US 20080235220A1
Time in Patent Office: 2,716 Days
Field of Search: 707/708, 707/710, 707/731, 707/738, 707/749
US Class Current: 1/1
CPC Class Codes:
- G06F 16/353: into predefined classes
- G06F 2216/11: Patent retrieval
- Y10S 707/923: Intellectual property
