Concept-based searching of unstructured objects
First Claim
1. A computer-readable storage medium, comprising code representing instructions to cause a processor to:
identify a plurality of concepts present in an unstructured object present in a corpus of unstructured objects;
define a Gaussian distribution representing a number of occurrences of each concept in the plurality of concepts present in the unstructured object;
calculate a weighted value for a first concept from the plurality of concepts, the weighted value being based at least in part on at least one of:
a number of occurrences of the first concept in the unstructured object;
a ratio of a number of categories in which the first concept occurs to a total number of all categories;
a ratio of a frequency of occurrence of the first concept in the unstructured object to a frequency of occurrence of the first concept in the corpus;
or a ratio of a number of occurrences of the first concept in the unstructured object to a total number of all concepts, including the plurality of concepts, that occur in the unstructured object;
determine that the weighted value is greater than a first threshold value and less than a second threshold value, the first threshold value being five or fewer standard deviations below a mean weighted value of the Gaussian distribution, the second threshold value being five or fewer standard deviations above the mean weighted value of the Gaussian distribution; and
identify the first concept as a key concept associated with the unstructured object, the key concept representing a meaning of the unstructured object.
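The steps of claim 1 can be sketched in Python. This is only an illustrative reading, not the patented implementation. It uses the last listed weighting alternative (occurrences of a concept divided by the total concept occurrences in the object), and it assumes the Gaussian is summarized by the mean and standard deviation of those weights. The function name `key_concepts` and the band width `k` are hypothetical; the claim only requires that each threshold lie five or fewer standard deviations from the mean.

```python
from collections import Counter
from statistics import mean, pstdev


def key_concepts(doc_concepts, k=2.0):
    """Return concepts whose weight lies strictly inside mean +/- k*sigma.

    doc_concepts: list of concept labels extracted from one unstructured
                  object (the extraction step itself is out of scope here).
    k:            hypothetical band width in standard deviations (0 < k <= 5).
    """
    if not 0 < k <= 5:
        raise ValueError("k must be in (0, 5]")

    counts = Counter(doc_concepts)
    total = sum(counts.values())

    # Assumed weighting: occurrences of the concept in the object divided
    # by the total number of concept occurrences in the object.
    weights = {concept: n / total for concept, n in counts.items()}

    # Assumption: the Gaussian is described by the mean and (population)
    # standard deviation of the weights.
    mu = mean(weights.values())
    sigma = pstdev(weights.values())
    low, high = mu - k * sigma, mu + k * sigma

    # Strict inequalities, as in the claim. If every weight is identical,
    # sigma is 0 and no concept passes.
    return {c for c, w in weights.items() if low < w < high}


doc = ["loan"] * 3 + ["bank"] * 3 + ["rate"] * 2 + ["the"] * 12
selected = key_concepts(doc, k=1.0)
```

With `k=1.0`, the very frequent filler concept `"the"` falls above the upper threshold and is excluded, while the mid-frequency concepts remain. A wider band such as `k=5.0` keeps every concept in this small example.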
Abstract
A method, operating model, system, data structure, computer program and computer program product for analyzing, categorizing, and exploring or querying unstructured information, and for tracking trends and exceptions. A method for analytical processing of unstructured objects in a dimensional space. A method for tracking trends in concepts. A method for tracking exceptions in concepts. Tools and an interface for displaying concepts, query results, tracked trends and exceptions.
18 Claims
1. A computer-readable storage medium, as set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. A computer-readable storage medium, comprising code representing instructions to cause a processor to:
identify a plurality of concepts present in an unstructured object present in a corpus of unstructured objects;
define a reduced dimensionality vector representation for the unstructured object, the reduced dimensionality vector representation including at least one dimension corresponding to a seed concept from the plurality of concepts and a second concept from the plurality of concepts, the second concept being related to the seed concept, the at least one dimension having a dimension value based at least in part on a number of occurrences of the seed concept in the unstructured object;
calculate a weighted value for the at least one dimension, the weighted value being based at least in part on the dimension value;
define a Gaussian distribution representing a number of occurrences of each concept from the plurality of concepts present in the unstructured object;
determine that the weighted value is greater than a first threshold value and less than a second threshold value, the first threshold value being five or fewer standard deviations below a mean weighted value of the Gaussian distribution, the second threshold value being five or fewer standard deviations above the mean weighted value of the Gaussian distribution; and
identify the seed concept as a key concept associated with the unstructured object, the key concept representing a theme of the unstructured object.
Dependent claims: 12, 13, 14, 15, 16, 17, 18.
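The reduced dimensionality representation in claim 11 can be illustrated with a short Python sketch. It is one possible reading under stated assumptions, not the patented method. The `related` mapping from a seed concept to its related concepts is hypothetical input. Each dimension value is assumed to be the combined occurrence count of the seed and its related concepts. The weight is assumed to be the dimension's share of all occurrences, and the Gaussian is assumed to be summarized by the mean and standard deviation of those weights. The names `reduce_dimensions`, `key_seed_concepts`, and `k` are illustrative only.

```python
from collections import Counter
from statistics import mean, pstdev


def reduce_dimensions(doc_concepts, related):
    """Fold each related concept into the dimension of its seed concept.

    related: hypothetical mapping {seed concept: set of related concepts}.
    Concepts that are not listed keep a dimension of their own.
    """
    seed_of = {c: seed for seed, group in related.items() for c in group}
    vector = Counter()
    for concept in doc_concepts:
        vector[seed_of.get(concept, concept)] += 1
    return dict(vector)


def key_seed_concepts(doc_concepts, related, k=2.0):
    """Return seed concepts whose dimension weight is within mean +/- k*sigma."""
    if not 0 < k <= 5:
        raise ValueError("k must be in (0, 5]")

    vector = reduce_dimensions(doc_concepts, related)
    total = sum(vector.values())

    # Assumed weighting: the dimension value as a share of all occurrences.
    weights = {seed: value / total for seed, value in vector.items()}

    mu = mean(weights.values())
    sigma = pstdev(weights.values())
    low, high = mu - k * sigma, mu + k * sigma
    return {seed for seed, w in weights.items() if low < w < high}


doc = ["car", "automobile", "car", "engine"] + ["the"] * 6
related = {"car": {"automobile"}}
vector = reduce_dimensions(doc, related)
themes = key_seed_concepts(doc, related, k=1.0)
```

Here `"automobile"` is merged into the `"car"` dimension, so the vector has three dimensions instead of four. With `k=1.0`, only `"car"` falls between the two thresholds in this example.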
Specification