CONCEPTUAL DOCUMENT ANALYSIS AND CHARACTERIZATION
Abstract
Data files are received from data sources that include textual content. The data files are categorized using a taxonomy of categories, where each category has sample textual content that defines a concept for the category. The categorizing includes comparing the textual content of each data file with the sample textual content for the category. A file score is calculated for each data file indicating the degree of similarity between the defined concept of the category and a determined concept for the data file. Each data file is associated with the category if the file score is equal to or greater than a pre-determined minimum score for the category. A portion of the data file and/or the file score is provided.
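The abstract does not say how the "degree of similarity" between a file's concept and a category's concept is measured. A minimal sketch, assuming a concept is approximated by a bag of lowercase word counts and the file score is the cosine similarity of those counts; the representation and the names `term_counts` and `file_score` are illustrative assumptions, not taken from the patent.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Approximate a text's 'concept' as counts of lowercase word tokens (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def file_score(file_text: str, sample_text: str) -> float:
    """Cosine similarity in [0, 1] between a data file and a category's sample text."""
    a, b = term_counts(file_text), term_counts(sample_text)
    # Counter returns 0 for missing terms, so non-shared words add nothing to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


sample = "invoice payment due amount billing"
print(round(file_score("Payment is due on the attached invoice.", sample), 3))
print(file_score("Quarterly hiking trip photos", sample))
```

Cosine similarity is used here only because it is simple and insensitive to document length; the claims quoted below do not fix a particular measure, so TF-IDF weighting or learned embeddings would fit the same slot.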
20 Claims
1. A method comprising:
  receiving a plurality of data files from a plurality of data sources that comprise textual content;
  categorizing the plurality of data files using a taxonomy of categories in which each category has associated sample textual content defining a concept for the category, the categorizing comprising, for each category:
    comparing, for each of the plurality of data files, the textual content of the data file with the sample textual content for the category;
    calculating, based on the comparing and for each of the plurality of data files, a file score corresponding to the degree of similarity between the defined concept of the category and a determined concept for the data file; and
    associating, for each of the plurality of data files, the data file with the category if the file score is equal to or greater than a pre-determined minimum score for the category; and
  providing at least a portion of the data file and/or the associated file score.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
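Claim 1 combines three per-category steps (compare, score, associate against a minimum) with a final output step. The sketch below is hypothetical: the `Category` and `Match` types, the word-set Jaccard overlap standing in for the unspecified concept comparison, and the 80-character excerpt used as "a portion of the data file" are all illustrative assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Category:
    name: str
    sample_text: str   # sample textual content defining the category's concept
    min_score: float   # pre-determined minimum score for this category


@dataclass
class Match:
    file_id: str
    category: str
    score: float
    excerpt: str       # hypothetical stand-in for "at least a portion of the data file"


def words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(file_text: str, sample_text: str) -> float:
    """Jaccard overlap of word sets; an assumed stand-in for the concept comparison."""
    a, b = words(file_text), words(sample_text)
    return len(a & b) / len(a | b) if a | b else 0.0


def categorize(files: dict, taxonomy: list, excerpt_len: int = 80) -> list:
    """Associate each file with every category whose minimum score it meets or exceeds."""
    matches = []
    for category in taxonomy:                    # for each category
        for file_id, text in files.items():      # for each data file
            score = similarity(text, category.sample_text)
            if score >= category.min_score:      # equal to or greater than the minimum
                matches.append(Match(file_id, category.name, score, text[:excerpt_len]))
    return matches


files = {
    "a.txt": "Invoice 1142: payment due in 30 days",
    "b.txt": "Travel plans: flight and hotel booked",
}
taxonomy = [
    Category("Billing", "invoice payment due amount", min_score=0.3),
    Category("Travel", "travel flight hotel plans", min_score=0.3),
]
for m in categorize(files, taxonomy):
    print(m.file_id, m.category, round(m.score, 3), m.excerpt)
```

Putting categories in the outer loop mirrors the claim's "for each category" structure, and because each category has its own minimum score, a file may end up associated with several categories or with none.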
12. A non-transitory computer program product storing instructions that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising:
  receiving a plurality of data files from a plurality of data sources that comprise textual content;
  categorizing the plurality of data files using a taxonomy of categories in which each category has associated sample textual content defining a concept for the category, the categorizing comprising, for each category:
    comparing, for each of the plurality of data files, the textual content of the data file with the sample textual content for the category;
    calculating, based on the comparing and for each of the plurality of data files, a file score corresponding to the degree of similarity between the defined concept of the category and a determined concept for the data file; and
    associating, for each of the plurality of data files, the data file with the category if the file score is equal to or greater than a pre-determined minimum score for the category; and
  providing at least a portion of the data file and/or the associated file score.
Dependent claims: 13, 14, 15, 16.
17. A system comprising:
  at least one programmable data processor; and
  memory storing instructions which, when executed by the at least one programmable data processor, result in operations comprising:
    receiving a plurality of data files from a plurality of data sources that comprise textual content;
    categorizing the plurality of data files using a taxonomy of categories in which each category has associated sample textual content defining a concept for the category, the categorizing comprising, for each category:
      comparing, for each of the plurality of data files, the textual content of the data file with the sample textual content for the category;
      calculating, based on the comparing and for each of the plurality of data files, a file score corresponding to the degree of similarity between the defined concept of the category and a determined concept for the data file; and
      associating, for each of the plurality of data files, the data file with the category if the file score is equal to or greater than a pre-determined minimum score for the category; and
    providing at least a portion of the data file and/or the associated file score.
Dependent claims: 18, 19, 20.
Specification