System and Method for Creating Labels for Clusters
First Claim
1. A system for creating at least one label for at least one cluster in a computing environment, the system comprising:
a processor; and
a memory coupled to the processor, wherein the processor is capable of executing a plurality of modules stored in the memory, and wherein the plurality of modules comprise:
a receiving module configured to receive an input data;
a candidate items selector configured to select a plurality of candidate items occurring repetitively in the input data using an n-gram selection technique for a predefined value of n to generate a sorted list of the plurality of candidate items with a frequency of occurrence of the plurality of candidate items based on the input data;
a combination array generator configured to select a predefined number of the plurality of candidate items from the sorted list of the plurality of candidate items to populate a two-dimensional array having a plurality of elements, wherein each element of the plurality of elements of the two-dimensional array represents a pair of the plurality of candidate items;
a coverage value analyzer configured to determine a coverage value for each pair of the plurality of candidate items present in the two-dimensional array to further populate a sorted two-dimensional array;
a candidate pair selector configured to select a predefined number of pairs of the plurality of candidate items from the sorted two-dimensional array to further process and generate a list of the pairs of the plurality of candidate items;
a unique word filter configured to accept the list of the pairs of the plurality of candidate items to determine a number of unique words in each of the pairs of the plurality of candidate items; and
a cluster label selector configured to sort the list of the pairs of the plurality of candidate items using the coverage value and the number of unique words to create a sorted list of the pairs of the plurality of candidate items for selecting a cluster label from the sorted list of the pairs of the plurality of candidate items.
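The first two processing modules of the claim (the candidate items selector and the combination array generator) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `select_candidate_items` and `build_combination_array`, the parameter `top_k`, whitespace tokenization, and the reading of "occurring repetitively" as a frequency greater than one are all assumptions made for illustration.

```python
from collections import Counter


def select_candidate_items(records, n=2):
    """Return (n-gram, frequency) tuples for n-grams seen more than once,
    sorted by descending frequency (ties broken alphabetically).

    Assumption: input data is a list of text records split on whitespace.
    """
    counts = Counter()
    for record in records:
        words = record.lower().split()
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    # Assumption: "occurring repetitively" means a frequency above one.
    repeated = [(gram, freq) for gram, freq in counts.items() if freq > 1]
    return sorted(repeated, key=lambda item: (-item[1], item[0]))


def build_combination_array(sorted_items, top_k=3):
    """Populate a top_k x top_k array whose element [i][j] is the pair
    (item_i, item_j) built from the top_k most frequent candidate items."""
    top = [gram for gram, _ in sorted_items[:top_k]]
    return [[(a, b) for b in top] for a in top]


# Hypothetical input data for one cluster of support tickets.
records = [
    "printer not working after update",
    "printer not working in office",
    "email password reset request",
    "password reset request for email",
]
items = select_candidate_items(records, n=2)
array = build_combination_array(items, top_k=2)
print(items)
print(array[0][1])
```

Here `n` and `top_k` stand in for the claim's "predefined value of n" and "predefined number" of candidate items. The array also holds diagonal elements that pair an item with itself; whether those are used is not stated in the claim.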
Abstract
Disclosed is a method and system for creating labels for clusters in a computing environment. The system comprises a receiving module, a candidate items selector, a combination array generator, a coverage value analyzer, a candidate pair selector, a unique word filter, and a cluster label selector. The receiving module receives input data, and the candidate items selector selects candidate items that occur repetitively, using an n-gram technique, to generate a list of candidate items with their frequency of occurrence. The combination array generator selects candidate items to populate a two-dimensional array in which each element represents a pair of n-grams. The coverage value analyzer determines a coverage value for each pair of n-grams in the array. The candidate pair selector selects pairs of n-grams from the two-dimensional array for further processing and generates a list of candidate pairs. The unique word filter determines the number of unique words in each candidate pair. The cluster label selector sorts the list of candidate pairs using the coverage value and the number of unique words to select a cluster label.
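The later stages named in the abstract (coverage value, unique word count, and label selection) can be sketched in Python as follows. The abstract does not define the coverage value or the sort order, so this sketch assumes that coverage is the fraction of input records containing at least one n-gram of the pair, and that pairs are ranked by descending coverage and then by descending unique word count. The function names, the `top_pairs` parameter, and the use of unordered pairs in place of a full two-dimensional array are likewise hypothetical.

```python
from itertools import combinations


def coverage(pair, records):
    """Assumed definition: fraction of records that contain at least one
    n-gram of the pair (simple substring match on lower-cased text)."""
    hits = sum(1 for r in records if any(item in r.lower() for item in pair))
    return hits / len(records)


def unique_word_count(pair):
    """Number of distinct words across both n-grams of the pair."""
    return len({word for item in pair for word in item.split()})


def select_cluster_label(candidates, records, top_pairs=3):
    """Score every unordered pair, keep the top_pairs by coverage, then
    rank the shortlist by (coverage, unique words), both descending."""
    scored = [(pair, coverage(pair, records))
              for pair in combinations(candidates, 2)]
    scored.sort(key=lambda pc: -pc[1])  # stable: ties keep input order
    shortlist = scored[:top_pairs]
    ranked = sorted(shortlist,
                    key=lambda pc: (-pc[1], -unique_word_count(pc[0])))
    return ranked[0][0], ranked


# Hypothetical cluster contents and candidate n-grams.
records = [
    "printer not working after update",
    "printer not working in office",
    "email password reset request",
    "password reset request for email",
]
candidates = ["not working", "password reset", "printer not", "reset request"]
label, ranked = select_cluster_label(candidates, records)
print(label)
```

Preferring pairs with more unique words is one plausible reading: a pair such as "password reset" and "reset request" repeats a word and so says less than two disjoint n-grams. The source only states that both quantities are used in the sort.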
20 Claims
9. A method for creating at least one label for at least one cluster in a computing environment, the method comprising:
receiving an input data;
selecting a plurality of candidate items occurring repetitively in the input data using an n-gram selection technique for a predefined value of n to generate a sorted list of the plurality of candidate items with a frequency of occurrence of the plurality of candidate items;
selecting a predefined number of the plurality of candidate items from the sorted list of the plurality of candidate items to populate a two-dimensional array having a plurality of elements, wherein each element of the plurality of elements of the two-dimensional array represents a pair of the plurality of candidate items;
determining a coverage value for each pair of the plurality of candidate items from the two-dimensional array to further populate a sorted two-dimensional array;
selecting a predefined number of pairs of the plurality of candidate items from the sorted two-dimensional array to further process and generate a list of the pairs of the plurality of candidate items;
accepting the list of the pairs of the plurality of candidate items to determine a number of unique words in each of the pairs of the plurality of candidate items; and
sorting the list of the pairs of the plurality of candidate items using the coverage value and the number of unique words to create a sorted list of the pairs of the plurality of candidate items for selecting a cluster label from the sorted list of the pairs of the plurality of candidate items;
wherein the receiving, the selecting the plurality of candidates, the selecting the predefined number of the plurality of candidate items, the determining the coverage value, the selecting the predefined number of pairs, the accepting the list, and the sorting the list are performed by a processor of a computerized device.
17. A computer program product having embodied thereon a computer program for creating at least one label for at least one cluster, the computer program product comprising:
a program code for receiving an input data;
a program code for selecting a plurality of candidate items occurring repetitively in the input data using an n-gram selection technique for a predefined value of n to generate a sorted list of the plurality of candidate items with a frequency of occurrence of the plurality of candidate items;
a program code for selecting a foremost predefined number of the plurality of candidate items from the sorted list of the plurality of candidate items to populate a two-dimensional array having a plurality of elements, wherein each element of the plurality of elements of the two-dimensional array represents a pair of the plurality of candidate items;
a program code for determining a coverage value for each pair of the plurality of candidate items from the two-dimensional array to further sort the two-dimensional array in a descending order of the coverage value for each pair of the plurality of candidate items to populate a sorted two-dimensional array;
a program code for selecting a predefined number of pairs of the plurality of candidate items from the sorted two-dimensional array occurring foremost to further process and generate a list of the pairs of the plurality of candidate items;
a program code for accepting the list of the pairs of the plurality of candidate items to determine a number of unique words in each of the pairs of the plurality of candidate items; and
a program code for sorting the list of the pairs of the plurality of candidate items using the coverage value and the number of unique words to create a sorted list of the pairs of the plurality of candidate items for selecting a cluster label from the sorted list of the pairs of the plurality of candidate items.
Specification