Detection of spam using contextual analysis of data sources
Abstract
Aspects of the disclosure provide for detection of spam business listings. Aspects operate to identify business listing characteristics in trusted sources and untrusted sources. As untrusted sources are likely to contain more spam, characteristics that are present in untrusted sources but not present in trusted sources are typically indicative of spam listings, and vice versa. Thus, statistical analysis of the frequency of characteristics within each source may be used to identify common characteristics of spam listings. These characteristics may further be analyzed in specific listing contexts, as different listing contexts (e.g., different types of businesses) typically use different terms and vocabularies, such that terms that are indicative of spam in one context may not be indicative of spam in another. Various methods for leveraging this context-specific statistical information to improve spam detection operations are disclosed.
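The context dependence described in the abstract can be illustrated with a small sketch. It is not taken from the specification: the listing titles, the `context` and `title` field names, the choice of a title token as the "characteristic", and the helper `term_frequency_by_context` are all hypothetical. It only shows that the same term can have a large frequency gap between an untrusted and a trusted source in one context and no gap in another.

```python
from collections import defaultdict

def term_frequency_by_context(listings, term):
    """Return {context: fraction of that context's listings whose title contains term}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for listing in listings:
        ctx = listing["context"]
        totals[ctx] += 1
        # Assumption: a "characteristic" is a whitespace-separated title token.
        if term in listing["title"].lower().split():
            hits[ctx] += 1
    return {ctx: hits[ctx] / totals[ctx] for ctx in totals}

# Hypothetical sample data; real sources would be far larger.
trusted = [
    {"context": "locksmith", "title": "Main Street Locksmith"},
    {"context": "locksmith", "title": "Harbor Lock and Key"},
    {"context": "pharmacy", "title": "24 Hour Pharmacy"},
    {"context": "pharmacy", "title": "Corner Drug Store"},
]
untrusted = [
    {"context": "locksmith", "title": "24 Hour Emergency Locksmith"},
    {"context": "locksmith", "title": "Cheap 24 Hour Lockout Service"},
    {"context": "pharmacy", "title": "24 Hour Pharmacy Downtown"},
    {"context": "pharmacy", "title": "Family Pharmacy"},
]

trusted_freq = term_frequency_by_context(trusted, "24")
untrusted_freq = term_frequency_by_context(untrusted, "24")

# Per-context gap: untrusted frequency minus trusted frequency.
gap = {ctx: untrusted_freq[ctx] - trusted_freq[ctx] for ctx in trusted_freq}
print(gap)
```

In this toy data the term "24" appears in every untrusted locksmith listing and in no trusted one, while it is equally common in both pharmacy sources, so it would only be a candidate spam signal in the locksmith context.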
20 Claims
1. A computer implemented method for identifying business listings, the method comprising:
determining, using one or more processors, a first frequency value of a business listing characteristic within a first plurality of business listings received from a first source, the first plurality of business listings being associated with a particular business listing context;
determining, using the one or more processors, a second frequency value of the business listing characteristic within a second plurality of business listings received from a second source, the second plurality of business listings being associated with the particular business listing context;
determining, using the one or more processors, a frequency differential between the first frequency value and the second frequency value;
in response to the frequency differential exceeding a threshold differential, identifying, using the one or more processors, the business listing characteristic as a differential characteristic; and
identifying, using the one or more processors, a particular business listing of the plurality of business listings as a spam listing using the differential characteristic.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A processing system for identifying business listings, the processing system comprising:
one or more computing devices, each of the one or more computing devices having one or more processors; and
a memory, coupled to the one or more processors, for storing business listings;
wherein the one or more computing devices is configured to:
determine a first frequency value of a business listing characteristic within a first plurality of business listings received from a first source, the first plurality of business listings being associated with a particular business listing context;
determine a second frequency value of the business listing characteristic within a second plurality of business listings received from a second source, the second plurality of business listings being associated with the particular business listing context;
determine a frequency differential between the first frequency value and the second frequency value;
in response to the frequency differential exceeding a threshold differential, identify the business listing characteristic as a differential characteristic; and
identify a particular business listing of the plurality of business listings as a spam listing using the differential characteristic.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A non-transitory, computer-readable medium on which instructions are stored, the instructions, when executed by one or more computing devices performs a method, the method comprising:
determining a first frequency value of a business listing characteristic within a first plurality of business listings received from a first source, the first plurality of business listings being associated with a particular business listing context;
determining a second frequency value of the business listing characteristic within a second plurality of business listings received from a second source, the second plurality of business listings being associated with the particular business listing context;
determining a frequency differential between the first frequency value and the second frequency value;
in response to the frequency differential exceeding a threshold differential, identifying the business listing characteristic as a differential characteristic; and
identifying a particular business listing of the plurality of business listings as a spam listing using the differential characteristic.
Dependent claims: 18, 19, 20
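The steps recited in the independent claims can be sketched for a single listing context as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the first source is the untrusted one and the second the trusted one, treats a title token as the "characteristic", uses a signed differential (first minus second), and flags a listing if it contains any differential characteristic. The function names, the sample titles, and the threshold of 0.4 are all hypothetical.

```python
def tokens(title):
    """Assumed characteristic extractor: lowercase title tokens."""
    return set(title.lower().split())

def frequency(listings, characteristic):
    """Fraction of listings that contain the characteristic."""
    if not listings:
        return 0.0
    return sum(characteristic in tokens(t) for t in listings) / len(listings)

def differential_characteristics(first, second, threshold):
    """Map each characteristic whose frequency differential exceeds the threshold to that differential."""
    candidates = set().union(*(tokens(t) for t in first + second))
    result = {}
    for c in candidates:
        diff = frequency(first, c) - frequency(second, c)
        if diff > threshold:
            result[c] = diff
    return result

def flag_spam(listings, differential):
    """Return the listings that contain at least one differential characteristic."""
    return [t for t in listings if tokens(t) & set(differential)]

# Hypothetical listings, all in one context (locksmiths).
untrusted_locksmiths = [
    "Cheap 24 Hour Locksmith",
    "Cheap Emergency Locksmith",
    "Cheap Lockout Locksmith",
    "Main Street Locksmith",
]
trusted_locksmiths = [
    "Main Street Locksmith",
    "Harbor Locksmith",
    "Oak Avenue Locksmith",
    "Cheap Keys Locksmith",
]

differential = differential_characteristics(
    untrusted_locksmiths, trusted_locksmiths, threshold=0.4
)
flagged = flag_spam(untrusted_locksmiths, differential)
print(differential)
print(flagged)
```

Flagging on a single matching characteristic is deliberately crude; the claims only require that the differential characteristic be used in identifying the spam listing, so a real system could equally treat it as one signal among several.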
Specification