Method and system for providing a user agent string database

US 10,025,847 B2
Filed: 11/25/2014
Issued: 07/17/2018
Est. Priority Date: 11/25/2014
Status: Active Grant

Abstract

Method, system, and programs for determining a keyword from user agent strings are disclosed. In one example, a plurality of user agent strings is received. The plurality of user agent strings is grouped into one or more clusters. The one or more clusters comprise a first cluster that includes two or more user agent strings. The two or more user agent strings in the first cluster are compared. Based on the comparing, a keyword is determined from the first cluster. The keyword represents a type of user agent information.

17 Claims

1. A method, implemented on at least one computing device each of which has at least one processor, storage, and a communication platform connected to a network for determining a keyword from user agent strings, the method comprising:
- receiving a plurality of user agent strings;
  
  grouping the plurality of user agent strings into one or more clusters, wherein the one or more clusters comprise a first cluster that includes two or more user agent strings;
  
  comparing the two or more user agent strings in the first cluster;
  
  extracting a longest common subsequence among the two or more user agent strings;
  
  removing the longest common subsequence from each user agent string to obtain a remaining subsequence; and
  
  determining a keyword from the first cluster based on at least one of the longest common subsequence and the remaining subsequence, wherein the keyword represents a type of user agent information.
  - 2. The method of claim 1, wherein the grouping comprises:
    - assigning each user agent string into a cluster;
      
      calculating a distance between each pair of clusters;
      
      identifying a pair of clusters having a minimum distance;
      
      merging the pair of clusters into one cluster if the minimum distance is less than a threshold; and
      
      repeating the calculating, the identifying, and the merging, until a minimum distance exceeds the threshold to generate the one or more clusters.
  - 3. The method of claim 2, wherein the calculating comprises:
    - calculating a distance between each pair of user agent strings, each of which is from one of the pair of clusters to obtain calculated distances; and
      
      determining the distance between the pair of clusters based on a minimum distance among the calculated distances.
  - 4. The method of claim 2, wherein the threshold is predetermined or dynamically modified based on a machine learning model.
  - 5. The method of claim 1, further comprising:
    - ranking the merged clusters based on number of user agent strings in each cluster; and
      
      selecting the one or more clusters from the merged clusters based on the ranking.
  - 6. The method of claim 1, wherein the type of user agent information includes at least one of operating system, browser, crawler, e-mail client, and game console.
  - 7. The method of claim 1, further comprising:
    - providing the keyword to an administrator for confirmation; and
      
      storing the keyword into a database along with the type of user agent information upon confirmation from the administrator.
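
The grouping steps of claims 2 and 3 can be read as threshold-bounded agglomerative clustering with single linkage. The following is a minimal sketch under that reading, not the patented implementation. The claims do not name a distance metric, so `string_distance` (one minus the `difflib` similarity ratio) is an assumption, as are all function names and the example threshold.

```python
from difflib import SequenceMatcher
from itertools import combinations, product


def string_distance(a: str, b: str) -> float:
    """Assumed metric: 1 - difflib similarity ratio (0.0 means identical)."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def cluster_distance(c1: list[str], c2: list[str]) -> float:
    """Claim 3: minimum distance over all cross-cluster string pairs."""
    return min(string_distance(a, b) for a, b in product(c1, c2))


def cluster_user_agents(user_agents: list[str], threshold: float) -> list[list[str]]:
    """Claim 2: merge the closest pair of clusters until none is close enough."""
    # Start with one cluster per user agent string.
    clusters = [[ua] for ua in user_agents]
    while len(clusters) > 1:
        # Identify the pair of clusters with the minimum distance.
        (i, j), best = min(
            (
                ((i, j), cluster_distance(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        # Stop once the closest pair is no longer below the threshold.
        if best >= threshold:
            break
        # i < j, so deleting index j leaves index i valid.
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```

Calling `cluster_user_agents(strings, threshold=0.3)` on a small sample would be expected to keep near-identical browser strings together while leaving an unrelated crawler string in its own cluster. Recomputing every pairwise distance on each pass is quadratic per iteration, which is acceptable for a sketch but would need caching for large inputs.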

8. A system having at least one processor, storage, and a communication platform for determining a keyword from user agent strings, the system comprising:
- a user agent receiver configured for receiving a plurality of user agent strings;
  
  a user agent clustering unit configured for grouping the plurality of user agent strings into one or more clusters, wherein the one or more clusters comprise a first cluster that includes two or more user agent strings;
  
  a user agent comparing unit configured for comparing the two or more user agent strings in the first cluster;
  
  a subsequence extractor configured for extracting a longest common subsequence among the two or more user agent strings;
  
  a subsequence removing unit configured for removing the longest common subsequence from each user agent string to obtain a remaining subsequence; and
  
  a keyword determiner configured for determining a keyword from the first cluster based on at least one of the longest common subsequence and the remaining subsequence, wherein the keyword represents a type of user agent information.
  - 9. The system of claim 8, wherein the user agent clustering unit comprises:
    - a distance calculation unit configured for assigning each user agent string into a cluster and calculating a distance between each pair of clusters;
      
      a cluster merging determiner configured for identifying a pair of clusters having a minimum distance; and
      
      a cluster merging unit configured for merging the pair of clusters into one cluster if the minimum distance is less than a threshold, wherein the calculating, the identifying, and the merging are repeated until a minimum distance exceeds the threshold to generate the one or more clusters.
  - 10. The system of claim 9, wherein calculating a distance between each pair of clusters comprises:
    - calculating a distance between each pair of user agent strings, each of which is from one of the pair of clusters to obtain calculated distances; and
      
      determining the distance between the pair of clusters based on a minimum distance among the calculated distances.
  - 11. The system of claim 9, wherein the threshold is predetermined or dynamically modified based on a machine learning model.
  - 12. The system of claim 8, further comprising:
    - a cluster ranking unit configured for ranking the merged clusters based on number of user agent strings in each cluster; and
      
      a cluster filter configured for selecting the one or more clusters from the merged clusters based on the ranking.
  - 13. The system of claim 8, wherein the type of user agent information includes at least one of operating system, browser, crawler, e-mail client, and game console.
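
The subsequence extractor and subsequence removing unit of claim 8 can be illustrated with a character-level sketch. Several choices here are assumptions rather than anything the claims state: the comparison is done on characters (not tokens), the longest common subsequence for three or more strings is approximated by folding the two-string case across the cluster, and removal matches the subsequence greedily from left to right. All function names are hypothetical.

```python
from functools import reduce


def lcs(a: str, b: str) -> str:
    """Longest common subsequence of two strings via dynamic programming."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover one LCS.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def common_subsequence(strings: list[str]) -> str:
    """Fold pairwise LCS across a cluster (an approximation for 3+ strings)."""
    return reduce(lcs, strings)


def remove_subsequence(s: str, sub: str) -> str:
    """Drop the characters of `sub` from `s`, matching greedily left to right."""
    remaining, k = [], 0
    for ch in s:
        if k < len(sub) and ch == sub[k]:
            k += 1  # this character belongs to the common subsequence
        else:
            remaining.append(ch)
    return "".join(remaining)
```

For instance, `remove_subsequence("Chrome/39.0", "Chrome/")` leaves `"39.0"`. The folded result is always a subsequence common to every string in the cluster, though for three or more strings it is not guaranteed to be the longest one.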

14. A non-transitory machine-readable medium having information recorded thereon for determining a keyword from user agent strings, wherein the information, when read by the machine, causes the machine to perform the following:
- receiving a plurality of user agent strings;
  
  grouping the plurality of user agent strings into one or more clusters, wherein the one or more clusters comprise a first cluster that includes two or more user agent strings;
  
  comparing the two or more user agent strings in the first cluster;
  
  extracting a longest common subsequence among the two or more user agent strings;
  
  removing the longest common subsequence from each user agent string to obtain a remaining subsequence; and
  
  determining a keyword from the first cluster based on at least one of the longest common subsequence and the remaining subsequence, wherein the keyword represents a type of user agent information.
  - 15. The medium of claim 14, wherein the grouping comprises:
    - assigning each user agent string into a cluster;
      
      calculating a distance between each pair of clusters;
      
      identifying a pair of clusters having a minimum distance;
      
      merging the pair of clusters into one cluster if the minimum distance is less than a threshold; and
      
      repeating the calculating, the identifying, and the merging, until a minimum distance exceeds the threshold to generate the one or more clusters.
  - 16. The medium of claim 15, wherein the calculating comprises:
    - calculating a distance between each pair of user agent strings, each of which is from one of the pair of clusters to obtain calculated distances; and
      
      determining the distance between the pair of clusters based on a minimum distance among the calculated distances.
  - 17. The medium of claim 14, wherein the information, when read by the machine, further causes the machine to perform the following:
    - ranking the merged clusters based on number of user agent strings in each cluster; and
      
      selecting the one or more clusters from the merged clusters based on the ranking.
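
The ranking and selecting steps recited in claims 5, 12, and 17 can be sketched as a sort by cluster size followed by a cut-off. The claims say only that selection is "based on the ranking", so the `top_k` parameter and the function name below are assumptions made for illustration.

```python
def select_top_clusters(clusters: list[list[str]], top_k: int) -> list[list[str]]:
    """Rank clusters by number of user agent strings and keep the largest ones."""
    # Largest clusters first; sorted() is stable, so ties keep their input order.
    ranked = sorted(clusters, key=len, reverse=True)
    return ranked[:top_k]
```

A size-based cut-off like this favors clusters backed by many observed strings, which is one plausible way to focus keyword extraction on the most common user agents.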

Current Assignee: Yahoo Assets LLC
Original Assignee: Oath Inc. (Verizon Communications Inc.)
Inventors: Zhu, Ling; He, Min; Yu, Fei; Wei, Minzhang
Primary Examiner(s): Jacob, Ajith
Application Number: US14/410,702
Publication Number: US 20160350400A1
Time in Patent Office: 1,330 Days

CPC Class Codes

G06F 16/313   Selection or weighting of t...
G06F 16/335   Filtering based on addition...
G06F 16/35   Clustering; Classification
G06F 16/358   Browsing; Visualisation the...
G06F 16/90344   by using string matching te...
G06N 20/00   Machine learning
H04L 67/02   based on web technology, e....
