Concept-based method and system for dynamically analyzing unstructured information
Abstract
A method, operating model, system, data structure, computer program and computer program product for analyzing and categorizing unstructured information are provided such that conventional structured data access techniques can be utilized over unstructured objects. An analysis and categorization engine builds a set of concept groupings, each grouping consisting of related words and phrases. The concept groupings are augmented by user input. A set of categories is built. The analysis and categorization engine generates a vector representation of each object based on concepts and utilizes a statistical analysis to select concepts that represent each object and assign objects to categories. Information about users, objects, and categories is stored in an open architecture, such as a relational database. An object concept based search is provided to efficiently locate meaningful objects and to provide for updating of the object categorization based on search entries.
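For illustration only, the concept-based vector representation and category assignment mentioned in the abstract might be sketched as follows. The abstract does not say how vectors are compared, so cosine similarity is an assumption here, and the groupings, category names, and function names (`concept_vector`, `cosine`, `assign_category`) are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical concept groupings: seed concept -> related words and phrases.
CONCEPT_GROUPINGS = {
    "finance": ["finance", "bank", "loan", "interest rate"],
    "medicine": ["medicine", "doctor", "patient", "clinical trial"],
}


def concept_vector(text, groupings=CONCEPT_GROUPINGS):
    """Count, per seed concept, how often any term of its grouping occurs."""
    lowered = text.lower()
    vector = Counter()
    for seed, terms in groupings.items():
        for term in terms:
            # Whole-word (or whole-phrase) matches only.
            hits = re.findall(r"\b" + re.escape(term) + r"\b", lowered)
            vector[seed] += len(hits)
    return vector


def cosine(a, b):
    """Cosine similarity of two sparse vectors (assumed comparison measure)."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_category(text, categories):
    """Return the category whose prototype vector is closest to the object."""
    vec = concept_vector(text)
    return max(categories, key=lambda name: cosine(vec, categories[name]))
```

In this sketch, each category is described by a prototype vector over the same seed concepts, and an object is placed in the category with the highest similarity.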
40 Claims
1. A system for accessing and analyzing unstructured objects, where the system provides structured information through which a user can access the unstructured objects, the structured information including a set of concepts where each concept comprises at least one word, the system comprising:
a first storage medium storing at least one unstructured object;
a second storage medium storing an analysis and categorization engine procedure that, when executed, accesses the unstructured objects and generates structured information about the objects;
a third storage medium storing the structured information in a form of at least one relational database data structure, wherein the at least one relational database data structure comprises a relational database table having a global seed concept ID field, a seed concept text field, and a created date field; and
a computer processor accessible to the user, having access to the structured information.
2. A system for accessing and analyzing unstructured objects, where the system provides structured information through which a user can access the unstructured objects, the structured information including a set of concepts where each concept comprises at least one word, the system comprising:
a first storage medium storing at least one unstructured object;
a second storage medium storing an analysis and categorization engine procedure that, when executed, accesses the unstructured objects and generates structured information about the objects;
a third storage medium storing the structured information in a form of at least one relational database data structure, wherein the at least one relational database data structure comprises a relational database table having a user ID field, a global seed concept ID field, a related concept ID field, a type of relationship field, and a status field; and
a computer processor accessible to the user, having access to the structured information.
3. A system for accessing and analyzing unstructured objects, where the system provides structured information through which a user can access the unstructured objects, the structured information including a set of concepts where each concept comprises at least one word, the system comprising:
a first storage medium storing at least one unstructured object;
a second storage medium storing an analysis and categorization engine procedure that, when executed, accesses the unstructured objects and generates structured information about the objects;
a third storage medium storing the structured information in a form of at least one relational database data structure, wherein the at least one relational database data structure comprises a relational database table having an object ID field, a concept ID field, a cross-reference time stamp field, a cross-reference type field, an index start time field, and a total hits field; and
a computer processor accessible to the user, having access to the structured information.
4. A system for accessing and analyzing unstructured objects, where the system provides structured information through which a user can access the unstructured objects, the structured information including a set of concepts where each concept comprises at least one word, the system comprising:
a first storage medium storing at least one unstructured object;
a second storage medium storing an analysis and categorization engine procedure that, when executed, accesses the unstructured objects and generates structured information about the objects;
a third storage medium storing the structured information in a form of at least one relational database data structure, wherein the at least one relational database data structure comprises a relational database table having a user object id field, a key concept id field, a probability field, and a rank field; and
a computer processor accessible to the user, having access to the structured information.
5. A system for accessing and analyzing unstructured objects, where the system provides structured information through which a user can access the unstructured objects, the structured information including a set of concepts where each concept comprises at least one word, the system comprising:
a first storage medium storing at least one unstructured object;
a second storage medium storing an analysis and categorization engine procedure that, when executed, accesses the unstructured objects and generates structured information about the objects;
a third storage medium storing the structured information in a form of at least one relational database data structure, wherein the at least one relational database data structure comprises a relational database table having a user ID field, a user object ID field, an object ID field, a user object hierarchy pointer field, an object status field, and an object score field; and
a computer processor accessible to the user, having access to the structured information.
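As a non-authoritative illustration, the relational tables recited in claims 1 through 5 could be laid out as in the sketch below. Only the field names come from the claims. The table names, column types, the choice of SQLite, and the helper names `open_database` and `key_concepts_for` are assumptions, as is the reading that a key concept ID refers to a global seed concept ID.

```python
import sqlite3

# Field names follow claims 1-5; table names and column types are assumed.
SCHEMA = """
CREATE TABLE global_seed_concept (        -- claim 1
    global_seed_concept_id INTEGER PRIMARY KEY,
    seed_concept_text      TEXT NOT NULL,
    created_date           TEXT
);
CREATE TABLE user_concept_relationship (  -- claim 2
    user_id                INTEGER,
    global_seed_concept_id INTEGER,
    related_concept_id     INTEGER,
    type_of_relationship   TEXT,
    status                 TEXT
);
CREATE TABLE object_concept_xref (        -- claim 3
    object_id        INTEGER,
    concept_id       INTEGER,
    xref_time_stamp  TEXT,
    xref_type        TEXT,
    index_start_time TEXT,
    total_hits       INTEGER
);
CREATE TABLE user_object_key_concept (    -- claim 4
    user_object_id INTEGER,
    key_concept_id INTEGER,
    probability    REAL,
    rank           INTEGER
);
CREATE TABLE user_object (                -- claim 5
    user_id                       INTEGER,
    user_object_id                INTEGER,
    object_id                     INTEGER,
    user_object_hierarchy_pointer INTEGER,
    object_status                 TEXT,
    object_score                  REAL
);
"""


def open_database(path=":memory:"):
    """Create the illustrative tables and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def key_concepts_for(conn, user_object_id):
    """List (seed text, probability, rank) for one user object, best rank first.

    Assumes key_concept_id points at global_seed_concept_id.
    """
    return conn.execute(
        "SELECT g.seed_concept_text, k.probability, k.rank "
        "FROM user_object_key_concept AS k "
        "JOIN global_seed_concept AS g "
        "  ON g.global_seed_concept_id = k.key_concept_id "
        "WHERE k.user_object_id = ? ORDER BY k.rank",
        (user_object_id,),
    ).fetchall()
```

With the structured information held this way, an ordinary SQL query such as the one in `key_concepts_for` is enough to reach unstructured objects through their concepts.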
6. A relational database structure, embodied on a computer-readable medium, for storing structured information about an object, the database structure comprising:
at least one relational database table having a global seed concept ID field and a seed concept text field, and a created date field whereby the structured information about the object is made accessible to a user.
(Dependent claims: 7, 8, 9, 10)
11. A computer-based method for automatically assigning at least one key concept to represent an unstructured object, comprising:
automatically selecting at least one concept from the unstructured object without requiring user input to identify the concept;
expanding each selected concept into at least one concept grouping, wherein a concept grouping contains elements consisting of a seed concept and at least one related concept;
scoring each concept grouping to indicate the relevance of the concept grouping to the unstructured object; and
applying probabilistic analysis to the scores of the concept groupings to identify at least one relevant concept grouping, wherein for each relevant concept grouping, the seed concept of the relevant concept grouping is a key concept of the unstructured object, whereby at least one key concept is assigned to represent the unstructured object.
(Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20)
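The four steps of claim 11 can be illustrated with a minimal sketch. The claim does not define the selection rule, the scoring function, or the probabilistic analysis, so everything below is an assumption: candidate concepts are seeds from a hypothetical thesaurus (`RELATED`) that occur in the text, a grouping's score is a simple occurrence count, and the "probabilistic analysis" is stood in for by a cutoff at the mean score plus a multiple of the standard deviation.

```python
import re
import statistics

# Hypothetical thesaurus: seed concept -> related concepts (not from the patent).
RELATED = {
    "bank": ["loan", "deposit"],
    "river": ["stream", "water"],
    "rate": ["interest"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def select_concepts(text):
    """Step 1: pick concepts without user input (here: known seeds in the text)."""
    words = set(tokenize(text))
    return [seed for seed in RELATED if seed in words]


def expand(seed):
    """Step 2: a concept grouping is the seed plus its related concepts."""
    return [seed] + RELATED[seed]


def score(grouping, text):
    """Step 3: relevance as total occurrences of the grouping's elements."""
    words = tokenize(text)
    return sum(words.count(term) for term in grouping)


def key_concepts(text, z_cutoff=0.0):
    """Step 4: keep seeds whose grouping score reaches mean + z_cutoff * stdev."""
    scores = {seed: score(expand(seed), text) for seed in select_concepts(text)}
    if not scores:
        return {}
    values = list(scores.values())
    threshold = statistics.mean(values) + z_cutoff * statistics.pstdev(values)
    return {seed: s for seed, s in scores.items() if s >= threshold}
```

The seed concept of each surviving grouping is returned as a key concept together with the grouping's score, which mirrors how the claims tie the key concept to the seed of a relevant grouping.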
21. A computer-based method of processing a set of unstructured objects, comprising:
(1) selecting an unstructured object from the set of unstructured objects;
(2) automatically selecting at least one concept from the selected unstructured object without requiring user input to identify the concept;
(3) expanding each selected concept into at least one concept grouping, wherein a concept grouping contains elements consisting of a seed concept and at least one related concept;
(4) scoring each concept grouping to indicate the relevance of the concept grouping to the selected unstructured object;
(5) applying probabilistic analysis to the scores of the concept groupings to identify at least one relevant concept grouping, wherein for each relevant concept grouping, the seed concept of the relevant concept grouping is a key concept of the selected unstructured object and the score of the relevant concept grouping is the score of the key concept of the selected unstructured object, whereby at least one key concept is assigned to represent the selected unstructured object; and
(6) repeating steps (1)–(5) for each unstructured object in the set of unstructured objects.
(Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33)
34. A computer program product comprising a computer useable medium having computer readable program code means embedded in said medium for causing a computer to process a set of unstructured objects, comprising:
first computer readable program code means for causing the computer to select an unstructured object from the set of unstructured objects;
second computer readable program code means for causing the computer to select at least one concept from the selected unstructured object without requiring user input to identify the concept;
third computer readable program code means for causing the computer to expand each selected concept into at least one concept grouping, wherein a concept grouping contains elements consisting of a seed concept and at least one related concept;
fourth computer readable program code means for causing the computer to score each concept grouping to indicate the relevance of the concept grouping to the selected unstructured object;
fifth computer readable program code means for causing the computer to apply probabilistic analysis to the scores of the concept groupings to identify at least one relevant concept grouping, wherein for each relevant concept grouping, the seed concept of the relevant concept grouping is a key concept of the selected unstructured object and the score of the relevant concept grouping is the score of the key concept of the selected unstructured object, whereby at least one key concept is assigned to represent the selected unstructured object; and
sixth computer readable program code means for causing the first through fifth computer readable program code means to assign a key concept to represent each unstructured object in the set of unstructured objects.
(Dependent claims: 35, 36, 37, 38, 39, 40)
Specification