Computer implemented methods and apparatus for identifying similar labels using collaborative filtering
Abstract
Disclosed are methods, apparatus, systems, and computer-readable storage media for identifying similar labels. In some implementations, one or more servers maintain a plurality of data entries in one or more database tables storing textual data, each data entry of a first portion of the data entries including: a text sequence, a label, and a text-to-label association score, and each data entry of a second portion of the data entries including: a first label, a second label, and a similarity score. The one or more servers analyze the data of the first portion of data entries to generate one or more pairs, each pair including information identifying a first label and a second label. The one or more servers calculate a similarity score for each of the one or more pairs and store the respective similarity scores in the second portion of the data entries.
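The flow described above can be illustrated with a minimal sketch. It assumes cosine similarity as the collaborative filtering measure, which the text does not specify, and it uses in-memory lists in place of database tables. The sample data and the names `text_label_entries`, `label_vectors`, `cosine`, and `similarity_entries` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical "first portion" entries:
# (text sequence, label, text-to-label association score).
text_label_entries = [
    ("password reset", "login", 3),
    ("password reset", "account", 2),
    ("locked out", "login", 1),
    ("invoice", "billing", 4),
    ("refund", "billing", 1),
    ("refund", "account", 1),
]


def label_vectors(entries):
    """Build one sparse vector of text-sequence counts per label."""
    vectors = defaultdict(dict)
    for text_sequence, label, score in entries:
        vectors[label][text_sequence] = vectors[label].get(text_sequence, 0) + score
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors (assumed measure, not from the source)."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = sqrt(sum(weight * weight for weight in u.values()))
    norm_v = sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_entries(entries):
    """Generate label pairs and return "second portion" entries:
    (first label, second label, similarity score), stored in both directions.
    Pairs that share no text sequence (score 0) are skipped."""
    vectors = label_vectors(entries)
    rows = []
    for first, second in combinations(sorted(vectors), 2):
        score = cosine(vectors[first], vectors[second])
        if score > 0:
            rows.append((first, second, score))
            rows.append((second, first, score))
    return rows


label_pair_entries = similarity_entries(text_label_entries)
```

Storing each pair in both directions is a design choice of this sketch: it lets a later lookup filter on the first label alone. A real deployment would more likely compute and store these rows in database tables, as the abstract describes.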
19 Claims
1. A system for identifying similar labels, the system comprising:
- a database system implemented using a server system comprising one or more hardware processors, the database system configurable to cause:
maintaining, through one or more databases, a plurality of data entries, each data entry of a first portion of the data entries identifying:
a text sequence, a label, and a text-to-label association score indicating a number of times that the text sequence appears in one or more previous incoming texts associated with the label, and each data entry of a second portion of the data entries identifying:
a first label, a second label, and a similarity score;
generating a plurality of pairs based on the first portion of data entries, each pair comprising information identifying a first label and a second label;
calculating a similarity score for each of the pairs comprising calculating a collaborative filtering similarity score for the first label and the second label identified by the pair using a first vector of text sequences associated with the first label and a second vector of text sequences associated with the second label, wherein a text sequence is associated with a label when the text sequence appears in a previous incoming text associated with the label; and
updating the second portion of the data entries to identify the pairs and the respective similarity scores;
processing a request for labels having similar associated text sequences;
identifying, based on the pairs and the respective similarity scores, a set of pairs having the same first label; and
selecting a pair of the identified set of pairs as having a higher respective similarity score than one or more other pairs of the identified set of pairs.
18. One or more computing devices for identifying similar labels to a user, the one or more computing devices comprising:
- one or more hardware processors configurable to cause:
maintaining, by one or more servers, a plurality of data entries, each data entry of a first portion of the data entries identifying:
a text sequence, a label, and a text-to-label association score indicating a number of times that the text sequence appears in one or more previous incoming texts associated with the label, and each data entry of a second portion of the data entries identifying:
a first label, a second label, and a similarity score;
generating a plurality of pairs based on the first portion of data entries, each pair comprising information identifying a first label and a second label;
calculating a similarity score for each of the pairs comprising calculating a collaborative filtering similarity score for the first label and the second label identified by the pair using a first vector of text sequences associated with the first label and a second vector of text sequences associated with the second label, wherein a text sequence is associated with a label when the text sequence appears in a previous incoming text associated with the label; and
updating the second portion of the data entries to identify the pairs and the respective similarity scores;
processing a request for labels having similar associated text sequences;
identifying, based on the pairs and the respective similarity scores, a set of pairs having the same first label; and
selecting a pair of the identified set of pairs as having a higher respective similarity score than one or more other pairs of the identified set of pairs.
19. A non-transitory computer-readable storage medium storing instructions executable by a computing device for identifying similar labels to a user, the instructions being configurable to cause:
maintaining, through one or more databases, a plurality of data entries, each data entry of a first portion of the data entries identifying:
a text sequence, a label, and a text-to-label association score indicating a number of times that the text sequence appears in one or more previous incoming texts associated with the label, and each data entry of a second portion of the data entries identifying:
a first label, a second label, and a similarity score;
generating a plurality of pairs based on the first portion of data entries, each pair comprising information identifying a first label and a second label;
calculating a similarity score for each of the pairs comprising calculating a collaborative filtering similarity score for the first label and the second label identified by the pair using a first vector of text sequences associated with the first label and a second vector of text sequences associated with the second label, wherein a text sequence is associated with a label when the text sequence appears in a previous incoming text associated with the label; and
updating the second portion of the data entries to identify the pairs and the respective similarity scores;
processing a request for labels having similar associated text sequences;
identifying, based on the pairs and the respective similarity scores, a set of pairs having the same first label; and
selecting a pair of the identified set of pairs as having a higher respective similarity score than one or more other pairs of the identified set of pairs.
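The request-handling steps recited at the end of each independent claim (identifying the pairs that share a first label, then selecting a pair with a higher similarity score) might look like the following sketch. The stored scores, the label names, and the function `most_similar` are hypothetical, and picking the single highest-scoring pair is only one way to satisfy "a higher respective similarity score than one or more other pairs".

```python
# Hypothetical "second portion" entries:
# (first label, second label, similarity score).
label_pair_entries = [
    ("login", "account", 0.85),
    ("login", "billing", 0.10),
    ("account", "login", 0.85),
    ("account", "billing", 0.11),
]


def most_similar(entries, requested_label):
    """Return the stored pair whose first label matches the request and whose
    similarity score is highest, or None if no pair has that first label."""
    # Identify the set of pairs having the same first label.
    candidates = [entry for entry in entries if entry[0] == requested_label]
    if not candidates:
        return None
    # Select the pair with the highest similarity score in that set.
    return max(candidates, key=lambda entry: entry[2])


best = most_similar(label_pair_entries, "login")
```

Here `best` is the `("login", "account", 0.85)` entry, since it outscores the other pair whose first label is "login".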
Specification