Algorithms for selecting subsequences
First Claim
Patent Images
1. A computer-implemented method for selecting shingles, the method comprising:
- parsing, by one or more processors, a first sequence of n tokens into shingles of k tokens, where k and n are numerical values and k<
=n, and where each of the n tokens has a numerical value;
selecting a first set of shingles of k tokens from the shingles of k tokens parsed from the first sequence of n tokens, where each shingle in the first set includes one of (i) a token in a first position in the shingle having a largest numerical value of all tokens in the shingle or (ii) a token in a last position in the shingle having the largest numerical value of all tokens in the shingle;
parsing a second sequence of x tokens into shingles of k tokens, where x is a numerical value and k<
=x, and where each of the x tokens has a numerical value;
selecting a second set of shingles of k tokens from the shingles of k tokens parsed from the second sequence of x tokens, where each shingle in the second set includes one of (i) a token in a first position in the shingle having a largest numerical value of all tokens in the shingle or (ii) a token in a last position in the shingle having the largest numerical value of all tokens in the shingle; and
comparing at least a portion of the first set of shingles of k tokens with at least a portion of the second set of shingles of k tokens.
2 Assignments
0 Petitions
Accused Products
Abstract
The present disclosure includes, among other things, systems, methods and program products for selecting subsequences (shingles or tuples) generated from sequences of tokens.
12 Citations
34 Claims
-
1. A computer-implemented method for selecting shingles, the method comprising:
-
parsing, by one or more processors, a first sequence of n tokens into shingles of k tokens, where k and n are numerical values and k<
=n, and where each of the n tokens has a numerical value;selecting a first set of shingles of k tokens from the shingles of k tokens parsed from the first sequence of n tokens, where each shingle in the first set includes one of (i) a token in a first position in the shingle having a largest numerical value of all tokens in the shingle or (ii) a token in a last position in the shingle having the largest numerical value of all tokens in the shingle; parsing a second sequence of x tokens into shingles of k tokens, where x is a numerical value and k<
=x, and where each of the x tokens has a numerical value;selecting a second set of shingles of k tokens from the shingles of k tokens parsed from the second sequence of x tokens, where each shingle in the second set includes one of (i) a token in a first position in the shingle having a largest numerical value of all tokens in the shingle or (ii) a token in a last position in the shingle having the largest numerical value of all tokens in the shingle; and comparing at least a portion of the first set of shingles of k tokens with at least a portion of the second set of shingles of k tokens. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A computer-implemented method for selecting shingles, the method comprising:
-
parsing, by one or more processors, a first sequence of n tokens into shingles of k tokens, where k and n are numerical values and k<
=n, and where each of the n tokens has a numerical value;for each shingle of k tokens from the first sequence; determining modulo values of the tokens in the shingle based on the numerical values of the tokens in the shingle modulo k; determining positions of the tokens in the shingle, wherein the position of a token in the shingle specifies an order of the token in the shingle with respect to other tokens in the shingle; comparing the modulo values of the tokens in the shingle with the respective positions of the tokens in the shingle; selecting a first set of the shingles of k tokens from the shingles of k tokens from the first sequence based on the comparison, where for each shingle of k tokens in the first set the modulo value of at least one token in the shingle of the first set matches the respective position of the at least one token in the shingle of the first set; parsing a second sequence of x tokens into shingles of k tokens, where x is a numerical value and k<
=x, and where each of the x tokens has a numerical value;for each shingle of k tokens from the second sequence; determining modulo values of the tokens in the shingle based on the numerical values of the tokens in the shingle modulo k; determining positions of the tokens in the shingle, wherein the position of a token in the shingle specifies an order of the token in the shingle with respect to other tokens in the shingle; comparing the modulo values of the tokens in the shingle with the respective positions of the tokens in the shingle; selecting a second set of shingles of k tokens from the shingles of k tokens from the second sequence based on the comparison, where for each shingle of k tokens in the second set the modulo value of at least one token in the shingle of the second set matches the respective position of the at least one token in the shingle of the second set; and comparing at least a portion of the first set of shingles of k tokens with at least a portion of the second set of shingles of k tokens. - View Dependent Claims (10, 11, 12, 13, 14, 28)
-
-
15. A computer-implemented method for selecting shingles, the method comprising:
-
parsing, by one or more processors, a first sequence of n tokens into shingles of k tokens, where k and n are numerical values and k<
=n, and where each of the n tokens has a value from a set of numerical values;selecting a group S of one or more shingles of m tokens, where m is a numerical value and m<
=k and where each token of each shingle in the group S has a value from the set of numerical values;selecting an offset in reference to a starting position of a shingle of k tokens; and selecting a first set of the shingles of k tokens from the shingles of k tokens parsed from the first sequence of n tokens, where for each of the shingles in the first set; a shingle of m tokens starting at the offset in the shingle of k tokens in the first set is in the group S;
orno shingle of m tokens in the shingle of k tokens in the first set is in the group S; parsing a second sequence of x tokens into shingles of k tokens, where x is a numerical value and k<
=x, and where each of the x tokens has a value from a set of numerical values;selecting a second set of shingles of k tokens from the shingles of k tokens parsed from the second sequence of x tokens, where for each of the shingles in the second set; a shingle of m tokens starting at the offset in the shingle of k tokens in the second set is in the group S;
orno shingle of m tokens in the shingle of k tokens in the second set is in the group S; and comparing at least a portion of the first set of shingles of k tokens with at least a portion of the second set of shingles of k tokens. - View Dependent Claims (16, 17, 18, 19, 20, 21)
-
-
22. A computer program product, encoded on a computer-readable medium, operable to cause one or more processors to perform operations comprising:
-
parsing, by one or more processors, a first sequence of n tokens into shingles of k tokens, where k and n are numerical values and k<
=n, and where each of the n tokens has a numerical value; andselecting a first set of shingles of k tokens from the shingles of k tokens parsed from the first sequence of n tokens, where each shingle in the first set includes one of (i) a token in a first position in the shingle having a largest numerical value of all tokens in the shingle or (ii) a token in a last position in the shingle having the largest numerical value of all tokens in the shingle; parsing a second sequence of x tokens into shingles of k tokens, where x is a numerical value and k<
=x, and where each of the x tokens has a numerical value;selecting a second set of shingles of k tokens from the shingles of k tokens parsed from the second sequence of x tokens, where each shingle in the second set includes one of (i) a token in a first position in the shingle having a largest numerical value of all tokens in the shingle or (ii) a token in a last position in the shingle having the largest numerical value of all tokens in the shingle; and comparing at least a portion of the first set of shingles of k tokens with at least a portion of the second set of shingles of k tokens.
-
-
23. A computer program product, encoded on a computer-readable medium, operable to cause one or more processors to perform operations comprising:
-
parsing, by one or more processors, a first sequence of n tokens into shingles of k tokens, where k and n are numerical values and k<
=n, and where each of the n tokens has a numerical value; andfor each shingle of k tokens from the first sequence; determining modulo values of the tokens in the shingle based on the numerical values of the tokens in the shingle modulo k; determining positions of the tokens in the shingle, wherein the position of a token in the shingle specifies an order of the token in the shingle with respect to other tokens in the shingle; comparing the modulo values of the tokens in the shingle with the respective positions of the tokens in the shingle; selecting a first set of the shingles of k tokens from the shingles of k tokens from the first sequence based on the comparison, where for each shingle of k tokens in the first set the modulo value of at least one token in the shingle of the first set matches the respective position of the at least one token in the shingle of the first set; parsing a second sequence of x tokens into shingles of k tokens, where x is a numerical value and k<
=x, and where each of the x tokens has a numerical value;for each shingle of k tokens from the second sequence; determining modulo values of the tokens in the shingle based on the numerical values of the tokens in the shingle modulo k; determining positions of the tokens in the shingle, wherein the position of a token in the shingle specifies an order of the token in the shingle with respect to other tokens in the shingle; comparing the modulo values of the tokens in the shingle with the respective positions of the tokens in the shingle; selecting a second set of shingles of k tokens from the shingles of k tokens from the second sequence based on the comparison, where for each shingle of k tokens in the second set the modulo value of at least one token in the shingle of the second set matches the respective position of the at least one token in the shingle of the second set; and comparing at least a portion of the first set of shingles of k tokens with at least a portion of the second set of shingles of k tokens.
-
-
24. A computer program product, encoded on a computer-readable medium, operable to cause one or more processors to perform operations comprising:
-
parsing, by one or more processors, a first sequence of n tokens into shingles of k tokens, where k and n are numerical values and k<
=n, and where each of the n tokens has a value from a set of numerical values;selecting a group S of one or more shingles of m tokens, where m is a numerical value and m<
=k and where each token of each shingle in the group S has a value from the set of numerical values;selecting an offset in reference to a starting position of a shingle of k tokens; and selecting a first set of the shingles of k tokens from the shingles of k tokens parsed from the first sequence of n tokens, where for each of the shingles in the first set; a shingle of m tokens starting at the offset in the shingle of k tokens in the first set is in the group S;
orno shingle of m tokens in the shingle of k tokens in the first set is in the group S; parsing a second sequence of x tokens into shingles of k tokens, where x is a numerical value and k<
=x, and where each of the x tokens has a value from a set of numerical values;selecting a second set of shingles of k tokens from the shingles of k tokens parsed from the second sequence of x tokens, where for each of the shingles in the second set; a shingle of m tokens starting at the offset in the shingle of k tokens in the second set is in the group S;
orno shingle of m tokens in the shingle of k tokens in the second set is in the group S; and comparing at least a portion of the first set of shingles of k tokens with at least a portion of the second set of shingles of k tokens.
-
-
25. A server system comprising:
-
a computer readable medium comprising a computer program product; and one or more processors operable to execute the computer program product to perform operations comprising; parsing a first sequence of n tokens into shingles of k tokens, where k and n are numerical values and k<
=n, and where each of the n tokens has a numerical value;selecting a first set of shingles of k tokens from the shingles of k tokens parsed from the first sequence of n tokens, where each shingle in the first set includes one of (i) a token in a first position in the shingle having a largest numerical value of all tokens in the shingle or (ii) a token in a last position in the shingle having the largest numerical value of all tokens in the shingle; parsing a second sequence of x tokens into shingles of k tokens, where x is a numerical value and k<
=x, and where each of the x tokens has a numerical value;selecting a second set of shingles of k tokens from the shingles of k tokens parsed from the second sequence of x tokens, where each shingle in the second set includes one of (i) a token in a first position in the shingle having a largest numerical value of all tokens in the shingle or (ii) a token in a last position in the shingle having the largest numerical value of all tokens in the shingle; and comparing at least a portion of the first set of shingles of k tokens with at least a portion of the second set of shingles of k tokens.
-
-
26. A server system comprising:
-
a computer readable medium comprising a computer program product; and one or more processors operable to execute the computer program product to perform operations comprising; parsing, by one or more processors, a first sequence of n tokens into shingles of k tokens, where k and n are numerical values and k<
=n, and where each of the n tokens has a numerical value;for each shingle of k tokens from the first sequence; determining modulo values of the tokens in the shingle based on the numerical values of the tokens in the shingle modulo k; determining positions of the tokens in the shingle, wherein the position of a token in the shingle specifies an order of the token in the shingle with respect to other tokens in the shingle; comparing the modulo values of the tokens in the shingle with the respective positions of the tokens in the shingle; selecting a first set of the shingles of k tokens from the shingles of k tokens from the first sequence based on the comparison, where for each shingle of k tokens in the first set the modulo value of at least one token in the shingle of the first set matches the respective position of the at least one token in the shingle of the first set; parsing a second sequence of x tokens into shingles of k tokens, where x is a numerical value and k<
=x, and where each of the x tokens has a numerical value;for each shingle of k tokens from the second sequence; determining modulo values of the tokens in the shingle based on the numerical values of the tokens in the shingle modulo k; determining positions of the tokens in the shingle, wherein the position of a token in the shingle specifies an order of the token in the shingle with respect to other tokens in the shingle; comparing the modulo values of the tokens in the shingle with the respective positions of the tokens in the shingle; selecting a second set of shingles of k tokens from the shingles of k tokens from the second sequence based on the comparison, where for each shingle of k tokens in the second set the modulo value of at least one token in the shingle of the second set matches the respective position of the at least one token in the shingle of the second set; and comparing at least a portion of the first set of shingles of k tokens with at least a portion of the second set of shingles of k tokens.
-
-
27. A server system comprising:
-
a computer readable medium comprising a computer program product; and one or more processors operable to execute the computer program product to perform operations comprising; parsing a first sequence of n tokens into shingles of k tokens, where k and n are numerical values and k<
=n, and where each of the n tokens has a value from a set of numerical values;selecting a group S of one or more shingles of m tokens, where m is a numerical value and m<
=k and where each token of each shingle in the group S has a value from the set of numerical values;selecting an offset in reference to a starting position of a shingle of k tokens; and selecting a first set of the shingles of k tokens from the shingles of k tokens parsed from the first sequence of n tokens, where for each of the shingles in the first set; a shingle of m tokens starting at the offset in the shingle of k tokens in the first set is in the group S;
orno shingle of m tokens in the shingle of k tokens in the first set is in the group S; parsing a second sequence of x tokens into shingles of k tokens, where x is a numerical value and k<
=x, and where each of the x tokens has a value from a set of numerical values;selecting a second set of shingles of k tokens from the shingles of k tokens parsed from the second sequence of x tokens, where for each of the shingles in the second set; a shingle of m tokens starting at the offset in the shingle of k tokens in the second set is in the group S;
orno shingle of m tokens in the shingle of k tokens in the second set is in the group S; and comparing at least a portion of the first set of shingles of k tokens with at least a portion of the second set of shingles of k tokens.
-
-
29. A computer-implemented method for selecting shingles, the method comprising:
-
parsing, by one or more processors, a first sequence of n tokens into shingles of k tokens, where k and n are numerical values and k<
=n, and where each of the n tokens has a numerical value; andselecting a first set of shingles of k tokens from the shingles of k tokens parsed from the first sequence of n tokens, where each shingle in the first set includes one of (i) a token in a first position in the shingle having a smallest numerical value of all tokens in the shingle or (ii) a token in a last position in the shingle having the smallest numerical value of all tokens in the shingle; parsing a second sequence of x tokens into shingles of k tokens, where x is a numerical value and k<
=x, and where each of the x tokens has a numerical value;selecting a second set of shingles of k tokens from the shingles of k tokens parsed from the second sequence of x tokens, where each shingle in the second set includes one of (i) a token in a first position in the shingle having a smallest numerical value of all tokens in the shingle or (ii) a token in a last position in the shingle having the smallest numerical value of all tokens in the shingle; and comparing at least a portion of the first set of shingles of k tokens with at least a portion of the second set of shingles of k tokens. - View Dependent Claims (30, 31, 32, 33, 34)
-
Specification