SYSTEMS AND METHODS FOR RECOGNIZING AMBIGUITY IN METADATA
Abstract
A method for estimating artist ambiguity in a dataset is performed at a device with a processor and memory storing instructions for execution by the processor. The method includes applying a statistical classifier to a first dataset including a plurality of media items, wherein each media item is associated with one of a plurality of artist identifiers, each artist identifier identifies a real world artist, and the statistical classifier calculates a respective probability that each respective artist identifier is associated with media items from two or more different real world artists based on a respective feature vector corresponding to the respective artist identifier. The method further includes providing a report of the first dataset, including the calculated probabilities, to a user of the electronic device. Each respective feature vector includes a plurality of features that indicate likelihood of artist ambiguity.
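The flow described in the abstract (build a feature vector per artist identifier, score it with a statistical classifier, report the probabilities) can be sketched as follows. This is only an illustration under stated assumptions: the `ArtistStats` record, the threshold values, the weights and bias, and the choice of a logistic model are all hypothetical and are not taken from the patent, which does not fix a particular classifier or parameter values here.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class ArtistStats:
    """Aggregated metadata for one artist identifier (hypothetical layout)."""
    name: str
    external_matches: int  # entries with this name in a second dataset
    countries: int         # distinct countries of registration
    labels: int            # distinct record labels
    languages: int         # distinct album languages
    earliest: date         # earliest release date
    latest: date           # latest release date


# Assumed thresholds; the claims only say they are "predetermined".
COUNTRY_T, CHAR_T, LABEL_T, SPAN_YEARS_T = 2, 12, 3, 30

# Assumed logistic-model parameters, one weight per feature below.
WEIGHTS = [1.8, 0.9, -1.1, 0.7, 1.2, 0.8]
BIAS = -2.5


def feature_vector(a: ArtistStats) -> list[float]:
    """Binary features in the order the abstract's claim lists them."""
    span_years = (a.latest - a.earliest).days / 365.25
    return [
        1.0 if a.external_matches > 1 else 0.0,   # multiple external entries
        1.0 if a.countries > COUNTRY_T else 0.0,  # many countries
        1.0 if len(a.name) > CHAR_T else 0.0,     # long identifier
        1.0 if a.labels > LABEL_T else 0.0,       # many record labels
        1.0 if a.languages >= 2 else 0.0,         # at least two languages
        1.0 if span_years > SPAN_YEARS_T else 0.0,  # long release span
    ]


def ambiguity_probability(a: ArtistStats) -> float:
    """Logistic score: probability the identifier covers 2+ real artists."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, feature_vector(a)))
    return 1.0 / (1.0 + math.exp(-z))


def report(artists: list[ArtistStats]) -> list[tuple[str, float]]:
    """Return (identifier, probability) pairs, most ambiguous first."""
    rows = [(a.name, ambiguity_probability(a)) for a in artists]
    return sorted(rows, key=lambda r: r[1], reverse=True)
```

The negative weight on identifier length encodes an assumption that short names collide more often than long ones; a real deployment would learn all weights from labeled data rather than set them by hand.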
22 Claims
1. A method for estimating artist ambiguity in a dataset, comprising:
at an electronic device having one or more processors and memory storing one or more programs for execution by the one or more processors:
applying a statistical classifier to a first dataset including a plurality of media items, wherein each media item is associated with one of a plurality of artist identifiers, each artist identifier identifies a real world artist, and the statistical classifier calculates a respective probability that each respective artist identifier is associated with media items from two or more different real world artists based on a respective feature vector corresponding to the respective artist identifier; and
providing a report of the first dataset, including the calculated probabilities, to a user of the electronic device;
wherein each respective feature vector includes features selected from the group consisting of:
whether the corresponding respective artist identifier matches multiple artist entries in one or more second datasets;
whether a respective number of countries of registration of media items associated with the corresponding respective artist identifier exceeds a predetermined country threshold;
whether a respective number of characters in the corresponding respective artist identifier exceeds a predetermined character threshold;
whether a respective number of record labels associated with the corresponding respective artist identifier exceeds a predetermined label threshold;
whether the corresponding respective artist identifier is associated with albums in at least two different languages; and
whether a difference between an earliest release date and a latest release date of media items associated with the corresponding respective artist identifier exceeds a predetermined time span threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A method for estimating artist ambiguity in a dataset, comprising:
at an electronic device having one or more processors and memory storing one or more programs for execution by the one or more processors:
applying a statistical classifier to a first dataset including a plurality of media items, wherein each media item is associated with one of a plurality of artist identifiers, each artist identifier identifies a real world artist, and the statistical classifier calculates a respective probability that each respective artist identifier is associated with media items from two or more different real world artists based on a respective feature vector corresponding to the respective artist identifier; and
providing a report of the first dataset, including the calculated probabilities, to a user of the electronic device;
wherein each respective feature vector includes features selected from the group consisting of:
whether the corresponding respective artist identifier matches multiple artist entries in one or more second datasets;
a respective number of countries of registration of media items associated with the corresponding respective artist identifier;
a respective number of characters in the corresponding respective artist identifier;
a respective number of record labels associated with the corresponding respective artist identifier;
a respective number of languages of albums associated with the corresponding respective artist identifier; and
a respective difference between an earliest release date and a latest release date of media items associated with the corresponding respective artist identifier.
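Claim 15 uses raw counts and differences as features instead of thresholded yes/no values. One way such a classifier could be fitted is sketched below: a logistic regression trained by plain batch gradient descent. Everything specific here is an assumption for illustration only: the scale factors, the learning rate and epoch count, the choice of logistic regression, and above all the `TOY_ROWS` training set, which is invented toy data and not drawn from any real catalog or from the patent.

```python
import math

# Raw feature order: (multi_match 0/1, countries, name characters,
# record labels, album languages, release span in years).
# Hypothetical scale factors that bring each raw feature to roughly [0, 1].
SCALES = (1.0, 10.0, 30.0, 10.0, 5.0, 50.0)

# Invented toy training rows: (raw features, label), label 1 = ambiguous.
TOY_ROWS = [
    ((1, 6, 5, 8, 3, 45), 1),
    ((1, 4, 4, 5, 2, 38), 1),
    ((0, 5, 6, 6, 2, 40), 1),
    ((0, 1, 22, 1, 1, 8), 0),
    ((0, 2, 18, 2, 1, 12), 0),
    ((1, 1, 25, 1, 1, 5), 0),
]


def scale(raw):
    """Divide each raw feature by its scale factor."""
    return [v / s for v, s in zip(raw, SCALES)]


def predict(weights, bias, raw):
    """Probability that the identifier covers two or more real artists."""
    z = bias + sum(w * v for w, v in zip(weights, scale(raw)))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, bias, rows):
    """Mean negative log-likelihood of the labels under the model."""
    total = 0.0
    for raw, label in rows:
        p = predict(weights, bias, raw)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(rows)


def train(rows, lr=0.5, epochs=500):
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    weights, bias = [0.0] * len(SCALES), 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(SCALES)
        grad_b = 0.0
        for raw, label in rows:
            err = predict(weights, bias, raw) - label  # dLoss/dz for one row
            for i, v in enumerate(scale(raw)):
                grad_w[i] += err * v
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias
```

Calling `train(TOY_ROWS)` returns a weight list and bias that can be passed to `predict` for a new identifier's raw feature tuple. Scaling the features first keeps the fixed learning rate stable; with unscaled spans or character counts a much smaller step would be needed.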
16. A method for estimating artist ambiguity in a dataset, comprising:
at an electronic device having one or more processors and memory storing one or more programs for execution by the one or more processors:
applying a statistical classifier to a first dataset including a plurality of media items, wherein each media item is associated with one of a plurality of artist identifiers, each artist identifier identifies a real world artist, and the statistical classifier calculates a respective probability that each respective artist identifier is associated with media items from two or more different real world artists based on a respective feature vector corresponding to the respective artist identifier.
Dependent claims: 17, 18, 19, 20
21. A computer system, comprising:
one or more processors; and
memory storing one or more programs for execution by the one or more processors, the one or more programs including instructions for:
applying a statistical classifier to a first dataset including a plurality of media items, wherein each media item is associated with one of a plurality of artist identifiers, each artist identifier identifies a real world artist, and the statistical classifier calculates a respective probability that each respective artist identifier is associated with media items from two or more different real world artists based on a respective feature vector corresponding to the respective artist identifier; and
providing a report of the first dataset, including the calculated probabilities, to a user of the electronic device;
wherein each respective feature vector includes features selected from the group consisting of:
whether the corresponding respective artist identifier matches multiple artist entries in one or more second datasets;
whether a respective number of countries of registration of media items associated with the corresponding respective artist identifier exceeds a predetermined country threshold;
whether a respective number of characters in the corresponding respective artist identifier exceeds a predetermined character threshold;
whether a respective number of record labels associated with the corresponding respective artist identifier exceeds a predetermined label threshold;
whether the corresponding respective artist identifier is associated with albums in at least two different languages; and
whether a difference between an earliest release date and a latest release date of media items associated with the corresponding respective artist identifier exceeds a predetermined time span threshold.
22. A non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which, when executed by a portable electronic device or a computer system with one or more processors, cause the device to:
apply a statistical classifier to a first dataset including a plurality of media items, wherein each media item is associated with one of a plurality of artist identifiers, each artist identifier identifies a real world artist, and the statistical classifier calculates a respective probability that each respective artist identifier is associated with media items from two or more different real world artists based on a respective feature vector corresponding to the respective artist identifier; and
provide a report of the first dataset, including the calculated probabilities, to a user of the electronic device;
wherein each respective feature vector includes features selected from the group consisting of:
whether the corresponding respective artist identifier matches multiple artist entries in one or more second datasets;
whether a respective number of countries of registration of media items associated with the corresponding respective artist identifier exceeds a predetermined country threshold;
whether a respective number of characters in the corresponding respective artist identifier exceeds a predetermined character threshold;
whether a respective number of record labels associated with the corresponding respective artist identifier exceeds a predetermined label threshold;
whether the corresponding respective artist identifier is associated with albums in at least two different languages; and
whether a difference between an earliest release date and a latest release date of media items associated with the corresponding respective artist identifier exceeds a predetermined time span threshold.
Specification