Deriving ontology based on linguistics and community tag clouds
First Claim
1. A method comprising:
receiving a tag cloud including a plurality of tags that hyperlink to web content, wherein the tag cloud is chosen by a user and wherein each of the plurality of tags are chosen by users of the web content from among words appearing in the web content;
separating each of the plurality of tags into linguistic categories;
assigning a weight to each of the plurality of tags, wherein the weight is based on a number of times the tag is selected by the users of the web content plus a number of times the tag appears as a title in the web content divided by a number of times the tag appears in the web content;
grouping at least a first subset of the plurality of tags that are associated with a noun linguistic category into tag clusters, wherein each tag in each tag cluster is associated with a common context;
determining a domain for each of the tag clusters, wherein each domain defines one or more of the tags of the noun linguistic category that belong to the tag cluster;
for each of the first subset of the plurality of tags that are associated with the noun linguistic category, determining, in accordance with the weights of the tags, a weighted ontology tree for the tags based on results from a visual thesaurus;
for each of a second subset of the plurality of tags that are associated with a verb linguistic category, identifying linguistic relationships between each tag of the second subset of the plurality of tags and each of the domains; and
determining properties associated with one or more of the plurality of tags and one or more of the domains, wherein the properties are determined using linguistic analysis.
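The weighting step in the claim can be read two ways, because the sentence does not say whether the division applies to the whole sum or only to the title count. The sketch below computes both readings so the difference is visible. The function names and the sample counts are hypothetical and are not taken from the patent.

```python
from fractions import Fraction


def tag_weight_grouped(selections: int, title_hits: int, occurrences: int) -> Fraction:
    """Reading A (assumed): (selections + title_hits) / occurrences."""
    if occurrences <= 0:
        raise ValueError("occurrences must be positive")
    return Fraction(selections + title_hits, occurrences)


def tag_weight_ungrouped(selections: int, title_hits: int, occurrences: int) -> Fraction:
    """Reading B (assumed): selections + (title_hits / occurrences)."""
    if occurrences <= 0:
        raise ValueError("occurrences must be positive")
    return selections + Fraction(title_hits, occurrences)


# Hypothetical counts for one tag: selected 12 times by users,
# appears 3 times as a title, appears 30 times in the web content.
weight_a = tag_weight_grouped(12, 3, 30)    # 15/30 = 1/2
weight_b = tag_weight_ungrouped(12, 3, 30)  # 12 + 3/30 = 121/10
print(weight_a, weight_b)
```

Exact fractions are used here only to keep the arithmetic easy to check by hand; a floating-point weight would serve equally well in practice.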
Abstract
In some embodiments, a method comprises receiving a tag cloud including tags that hyperlink to web content. The method can also comprise separating the tags into different linguistic categories, assigning a weight to each tag, and grouping the tags into clusters, wherein tags in a cluster are associated with a context. The method can also include determining one or more domains for the tag clusters, wherein a domain is a broadest class that defines one or more of the tags in a linguistic category, determining a hierarchy for the tags based on the weights of the tags, and identifying linguistic relationships between the tags. The method can also comprise determining properties associated with one or more of the tags and one or more of the domains, wherein the tag's properties are determined using linguistic analysis, and storing the tags, the hierarchies, the linguistic relationships, and the properties.
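As a rough illustration of the first stages described in the abstract, the sketch below separates tags into linguistic categories and then orders the noun tags by weight. It is a toy under stated assumptions: the hand-written `LEXICON` stands in for a real linguistic analyzer, the tag names and weights are invented, and ordering by descending weight is only one possible way a hierarchy could follow from the weights, since the abstract does not specify how.

```python
from collections import defaultdict

# Hypothetical part-of-speech lexicon (assumption); a real system would
# use a linguistic analyzer rather than a fixed table.
LEXICON = {"camera": "noun", "lens": "noun", "shoot": "verb", "edit": "verb"}


def separate_by_category(tags):
    """Group tags by linguistic category; unknown words get 'unknown'."""
    groups = defaultdict(list)
    for tag in tags:
        groups[LEXICON.get(tag, "unknown")].append(tag)
    return dict(groups)


def rank_by_weight(tags, weights):
    """Order tags by descending weight, breaking ties alphabetically."""
    return sorted(tags, key=lambda t: (-weights[t], t))


# Invented tags and weights, for illustration only.
tags = ["camera", "shoot", "lens", "edit", "bokeh"]
weights = {"camera": 0.4, "lens": 0.9, "shoot": 0.7, "edit": 0.2, "bokeh": 0.1}

groups = separate_by_category(tags)
ranked_nouns = rank_by_weight(groups["noun"], weights)
print(groups)
print(ranked_nouns)
```

In this sketch only the noun tags are ranked, mirroring the claims, where the weighted ontology tree is built for tags in the noun category and verb tags are instead related to the domains.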
25 Claims
1. A method comprising the steps recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A system comprising:
a tag cloud linguistic analyzer configured to receive a tag cloud including a plurality of tags that hyperlink to web content, wherein the tag cloud is chosen by a user and wherein each of the plurality of tags are chosen by users of the web content from among words appearing in the web content, the tag cloud linguistic analyzer configured to separate each of the plurality of tags into linguistic categories, and to assign a weight to each of the plurality of tags, wherein the weight is based on a number of times the tag is selected by the users of the web content plus a number of times the tag appears as a title in the web content divided by a number of times the tag appears in the web content;
a semantic domain analyzer configured to group at least a first subset of the plurality of tags that are associated with a noun linguistic category into tag clusters, wherein tags in each tag cluster are associated with a common context, and to determine a domain for each of the tag clusters, wherein each domain defines one or more of the tags of the noun linguistic category that belong to the tag cluster;
a taxonomy builder configured to determine, for each of the first subset of the plurality of tags that are associated with the noun linguistic category and in accordance with the weights of the tags, a weighted ontology tree for the subset of the plurality of tags based on results from a visual thesaurus;
a relationship analyzer configured to identify, for each of a second subset of the plurality of tags that are associated with a verb linguistic category, linguistic relationships between each tag of the second subset of the plurality of tags and each of the domains;
an attribute analyzer configured to determine properties associated with one or more of the plurality of tags and one or more of the domains, wherein the properties are determined using linguistic analysis;
an ontology repository to store the tags, the hierarchies, the linguistic relationships, and the properties; and
a processor configured to execute one or more of the tag cloud linguistic analyzer, the semantic domain analyzer, the taxonomy builder, the relationship analyzer, the attribute analyzer, and the ontology repository.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21
22. One or more machine-readable storage devices having stored therein a program product, which when executed by a set of one or more processor units causes the set of one or more processor units to perform operations comprising:
receiving a tag cloud including a plurality of tags that hyperlink to web content, wherein the tag cloud is chosen by a user and wherein each of the plurality of tags are chosen by users of the web content from among words appearing in the web content;
separating each of the plurality of tags into linguistic categories;
assigning a weight to each of the plurality of tags, wherein the weight is based on a number of times the tag is selected by the users of the web content plus a number of times the tag appears as a title in the web content divided by a number of times the tag appears in the web content;
grouping at least a first subset of the plurality of tags that are associated with a noun linguistic category into tag clusters, wherein each tag in each tag cluster is associated with a common context;
determining a domain for each of the tag clusters, wherein each domain defines one or more of the tags of the noun linguistic category that belong to the tag cluster;
for each of the first subset of the plurality of tags that are associated with the noun linguistic category, determining, in accordance with the weights of the tags, a weighted ontology tree for the first subset of the plurality of tags based on results from a visual thesaurus;
for each of a second subset of the plurality of tags that are associated with a verb linguistic category, identifying linguistic relationships between each tag of the second subset of the plurality of tags and each of the domains; and
determining properties associated with one or more of the plurality of tags and one or more of the domains, wherein the properties are determined using linguistic analysis.
Dependent claims: 23, 24, 25
Specification