Automatic data categorization with optimally spaced semantic seed terms
5 Assignments
0 Petitions
Abstract
A method and system for automatic data categorization in response to a user query. A document set is retrieved in response to the user query. A semantic parser parses the document set and produces semantic term-groups by parsing a semantic network of nodes. A seed ranker produces a plurality of advantageously spaced semantic seeds based on the semantic term-groups. A category accumulator stores the advantageously spaced semantic seeds. The semantic network of nodes is augmented with the advantageously spaced semantic seeds.
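One way to picture the seed ranker and category accumulator described in the abstract is a greedy pass that keeps prevalent candidates only when they sit far enough from the seeds already kept. The sketch below is an illustrative assumption, not the patented procedure: the function `rank_seeds`, the `min_spacing` threshold, the hop-count table `HOPS`, and all sample values are hypothetical.

```python
from typing import Callable, Dict, List


def rank_seeds(prevalence: Dict[str, int],
               distance: Callable[[str, str], float],
               min_spacing: float,
               max_seeds: int) -> List[str]:
    """Greedily keep prevalent candidates that stay at least min_spacing apart."""
    accumulator: List[str] = []  # stands in for the "category accumulator"
    # Visit candidates from most to least prevalent (ties broken alphabetically).
    for term in sorted(prevalence, key=lambda t: (-prevalence[t], t)):
        if len(accumulator) >= max_seeds:
            break
        # Accept the candidate only if it is well spaced from every kept seed.
        if all(distance(term, seed) >= min_spacing for seed in accumulator):
            accumulator.append(term)
    return accumulator


# Hypothetical symmetric hop counts between terms in a toy semantic network.
HOPS = {
    frozenset(("car", "automobile")): 1,
    frozenset(("car", "banana")): 6,
    frozenset(("automobile", "banana")): 6,
    frozenset(("car", "apple")): 6,
    frozenset(("automobile", "apple")): 6,
    frozenset(("banana", "apple")): 2,
}


def hop_distance(a: str, b: str) -> float:
    """Look up the toy distance between two terms."""
    return 0.0 if a == b else float(HOPS[frozenset((a, b))])


seeds = rank_seeds({"car": 9, "automobile": 7, "banana": 5, "apple": 4},
                   hop_distance, min_spacing=3, max_seeds=3)
print(seeds)
```

In this toy run, "automobile" is skipped because it sits one hop from "car", and "apple" is skipped because it sits two hops from "banana", so the kept seeds cover distinct regions of the network.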
110 Citations
26 Claims
1. A method, comprising:
- retrieving, by a computing device including a processor, a document set in response to receiving a query;
- parsing a semantic distance network according to a plurality of terms in the document set into a plurality of semantic term groups;
- determining respective balanced desirability values for respective semantic term groups of the plurality of semantic term groups, wherein the respective balanced desirability values comprise respective measures of prevalence of the respective semantic term groups as a function of respective measures of closeness of respective terms in the respective semantic term groups; and
- selecting a semantic term group of the respective semantic term groups as a category descriptor based on a balanced desirability value associated with the semantic term group.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
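The parsing step of claim 1 can be pictured as grouping the document's terms by how they connect in the semantic distance network. The following is a minimal sketch under stated assumptions: the network is represented as a dictionary of weighted term pairs, terms are grouped when they are linked by edges no longer than a hypothetical `max_distance`, and the function name `group_terms` and the sample data are invented for illustration. The claim itself does not specify a grouping rule.

```python
from collections import deque


def group_terms(doc_terms, edges, max_distance):
    """Group document terms joined by network edges no longer than max_distance."""
    terms = set(doc_terms)
    neighbours = {t: set() for t in terms}
    # Keep only edges whose endpoints both occur in the document set.
    for (a, b), dist in edges.items():
        if a in terms and b in terms and dist <= max_distance:
            neighbours[a].add(b)
            neighbours[b].add(a)

    groups, seen = [], set()
    for start in sorted(terms):
        if start in seen:
            continue
        # Breadth-first search collects one connected group of terms.
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            term = queue.popleft()
            group.append(term)
            for nxt in sorted(neighbours[term] - seen):
                seen.add(nxt)
                queue.append(nxt)
        groups.append(sorted(group))
    return groups


# Hypothetical semantic distances between term pairs.
edges = {
    ("car", "automobile"): 1,
    ("car", "truck"): 2,
    ("truck", "banana"): 7,
    ("banana", "apple"): 2,
}
term_groups = group_terms(["car", "truck", "banana", "apple", "automobile"],
                          edges, max_distance=3)
print(term_groups)
```

The long "truck" to "banana" edge is dropped by the threshold, so the vehicle terms and the fruit terms end up in separate groups.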
11. A system, comprising:
- means for retrieving a document set in response to a query;
- means for parsing a semantic distance network according to a set of terms in the document set into a set of semantic term groups;
- means for determining respective term values for respective semantic term groups of the set of semantic term groups, wherein the respective term values comprise respective measures of prevalence of the respective semantic term groups as a function of respective measures of closeness of respective terms in the respective semantic term groups;
- means for selecting a semantic term group of the respective semantic term groups as a category descriptor based on a term value associated with the semantic term group; and
- means for outputting the semantic term group in response to the semantic term group being selected.
Dependent claims: 12, 13, 14, 15, 16
17. A non-transitory computer-readable medium having stored thereon computer executable instructions that, in response to execution by at least one computing device, cause the at least one computing device to perform operations comprising:
- retrieving a document set in response to receiving a request;
- parsing a semantic distance network according to a plurality of terms in the document set into a plurality of semantic term groups;
- determining respective optimization values for respective semantic term groups of the plurality of semantic term groups, wherein the respective optimization values comprise respective measures of dominance of the respective semantic term groups divided by respective measures of closeness of respective terms in the respective semantic term groups;
- selecting a semantic term group as a category descriptor based on an optimization value associated with the semantic term group; and
- responding to the request with the semantic term group.
Dependent claims: 18, 19, 20, 21
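Claim 17 states the optimization value as a measure of dominance divided by a measure of closeness. The arithmetic can be sketched as below, with the caveat that the claim does not say how either measure is computed: the function names, the sample groups, and their numbers are hypothetical, and the sketch assumes closeness is a positive number and that the highest value wins.

```python
def optimization_value(dominance: float, closeness: float) -> float:
    """Return dominance divided by closeness, as recited in claim 17."""
    if closeness <= 0:
        # Assumption: closeness is strictly positive, so the ratio is defined.
        raise ValueError("closeness must be positive")
    return dominance / closeness


def select_descriptor(groups: dict) -> str:
    """Pick the group with the highest optimization value (assumed selection rule)."""
    return max(groups, key=lambda name: optimization_value(*groups[name]))


# Hypothetical (dominance, closeness) pairs for three semantic term groups.
groups = {
    "vehicle": (12, 4.0),  # 12 / 4.0 = 3.0
    "fruit": (9, 1.5),     # 9 / 1.5 = 6.0
    "tool": (3, 3.0),      # 3 / 3.0 = 1.0
}
descriptor = select_descriptor(groups)
print(descriptor)
```

With these invented numbers, "fruit" is selected even though "vehicle" has the larger dominance, because dividing by closeness changes the ordering.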
22. An apparatus, comprising:
- a processor; and
- a memory communicatively coupled to the processor, the memory having stored therein computer-executable instructions, comprising:
  - a semantic parser configured to parse a semantic distance network according to a plurality of terms in a document set into a plurality of semantic term groups; and
  - a seed ranker configured to:
    - determine respective balanced desirability values for respective semantic term groups of the plurality of semantic term groups, wherein the respective balanced desirability values comprise respective measures of prevalence of the respective semantic term groups divided by respective measures of closeness of respective terms in the respective semantic term groups, and
    - select a semantic term group as a category descriptor based on a balanced desirability value associated with the semantic term group.
Dependent claims: 23, 24, 25, 26
Specification