Segmenting a string using similarity values
First Claim
1. A method for segmenting a string comprising one or more segments into discrete segments, wherein each of the one or more segments comprises data that is the same as or similar to a marker string, the method comprising:
- generating a similarity vector comprising a plurality of similarity values and associated locations within the string wherein a similarity value represents a comparison of the marker string and at least a portion of the string and an associated location associated with the similarity value is the location within the string of the start of the at least a portion of the string used in the comparison;
- identifying a set of ideal segmentation locations based upon an expected number of discrete segments within the string;
- using the similarity vector to identify a set of candidate segmentation locations;
- responsive to a candidate segmentation location having a similarity value less than another candidate segmentation location within a local window, removing the candidate segmentation location from the set of candidate segmentation locations;
- responsive to a candidate segmentation location and a closest ideal segmentation location being at a distance that is greater than the distance threshold, removing the candidate segmentation location from the set of candidate segmentation locations; and
- using the set of candidate segmentation locations and the set of ideal segmentation locations to generate a set of segmentation locations; and
- using the set of segmentation locations to segment the string.
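The steps of claim 1 can be sketched in Python roughly as follows. This is an illustrative reading, not the patented implementation: the position-wise character comparison, the evenly spaced ideal locations, the fixed `min_similarity` cutoff, the default parameter values, and the rule of falling back to an ideal location when no candidate survives near it are all assumptions. The claim itself only requires "a comparison" and does not say how the two sets are combined. All function and parameter names are hypothetical.

```python
def similarity_vector(string, marker):
    """Pair each start location with the fraction of marker characters matched there."""
    width = len(marker)
    vector = []
    for start in range(len(string) - width + 1):
        window = string[start:start + width]
        matches = sum(a == b for a, b in zip(marker, window))
        vector.append((start, matches / width))
    return vector


def ideal_locations(length, expected_segments):
    """Assumption: segments are about equally long, so ideal starts are evenly spaced."""
    return [round(k * length / expected_segments) for k in range(expected_segments)]


def segment_string(string, marker, expected_segments,
                   min_similarity=0.6, local_window=3, distance_threshold=4):
    ideal = ideal_locations(len(string), expected_segments)

    # Candidate locations: starts whose similarity clears an assumed fixed cutoff.
    candidates = {loc: sim for loc, sim in similarity_vector(string, marker)
                  if sim >= min_similarity}

    # Local-window pruning: drop a candidate when a strictly stronger one lies nearby.
    def beaten_locally(loc):
        return any(other != loc and abs(other - loc) <= local_window
                   and candidates[other] > candidates[loc]
                   for other in candidates)

    survivors = [loc for loc in candidates if not beaten_locally(loc)]

    # Distance pruning: drop a candidate that is too far from its closest ideal location.
    survivors = [loc for loc in survivors
                 if min(abs(loc - i) for i in ideal) <= distance_threshold]

    # Combine both sets (assumed rule): nearest surviving candidate, else the ideal location.
    cuts = set()
    for i in ideal:
        near = [loc for loc in survivors if abs(loc - i) <= distance_threshold]
        cuts.add(min(near, key=lambda loc: abs(loc - i)) if near else i)

    # Segment the string; location 0 is always a boundary so no leading text is lost.
    bounds = sorted(cuts | {0}) + [len(string)]
    return [string[a:b] for a, b in zip(bounds, bounds[1:])]
```

For example, `segment_string("ID=alpha;ID=be;1D=gamma12;", "ID=", 3)` returns `["ID=alpha;", "ID=be;", "1D=gamma12;"]`: the variant marker `1D=` at location 15 scores 2/3 and lies within the distance threshold of the ideal location 17, so it replaces it.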
Abstract
Disclosed are systems and methods for segmenting a string comprised of one or more string segments using similarity values. In embodiments, each string segment may contain at least a variation of a marker string that may be used to separate string segments in the string. In embodiments, a similarity value representing the result of comparing the marker string to substrings of the string may be computed, and a similarity vector representing the set of comparisons for the locations on the string may be generated. In embodiments, the similarity vector may be used to identify candidate segmentation locations in the string. In embodiments, a set of segmentation locations in the string may be derived from the candidate segmentation locations in the string, and the string may be segmented according to the set of segmentation locations.
18 Claims
1. Independent method claim, set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method for segmenting a string comprising one or more segments into discrete segments, wherein each of the one or more segments comprises data that is at least a variant of a marker string, the method comprising:
- generating a similarity vector comprising a plurality of similarity values and associated locations within the string wherein a similarity value represents a comparison of the marker string and at least a portion of the string and an associated location associated with the similarity value is the location within the string of the start of the at least a portion of the string used in the comparison;
- identifying a set of ideal segmentation locations in the string based upon an expected number of discrete segments within the string;
- using the similarity vector to generate a set of candidate segmentation locations for segmenting the string based on a comparison of each of a plurality of elements of the similarity vector to a similarity value threshold obtained from a smoothed similarity vector;
- using the set of candidate segmentation locations and the set of ideal segmentation locations to generate the set of segmentation locations; and
- using the set of segmentation locations to segment the string.
Dependent claims: 9, 10, 11, 12
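The candidate-generation step of claim 8 compares each similarity value to a threshold obtained from a smoothed similarity vector. The claim does not fix the smoothing method or how the threshold is derived, so the sketch below makes two assumptions: smoothing is a centered moving average (truncated at the ends), and the threshold is the maximum of the smoothed values. The function names and the `width` parameter are hypothetical, and the input is assumed to be a non-empty list of similarity values indexed by location.

```python
def smooth(values, width=3):
    """Centered moving average; windows are truncated at both ends of the vector."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half):i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def candidate_locations(values, width=3):
    """Keep locations whose raw similarity exceeds the peak of the smoothed vector."""
    threshold = max(smooth(values, width))  # assumed way of deriving the threshold
    return [i for i, value in enumerate(values) if value > threshold]
```

Because averaging flattens isolated peaks, a sharp marker match stays above the smoothed maximum while low, noisy values fall below it. With `[1.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.9, 0.0, 0.0, 0.2, 0.0, 0.8, 0.0, 0.0]` the threshold is 0.5 and the candidates are locations 0, 6 and 11.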
13. A system for segmenting a string comprising one or more segments into discrete segments, wherein each of the one or more segments comprises data that is the same as or similar to a marker string, the system comprising:
- a similarity vector generator, coupled to receive the string and the marker string, that generates a similarity vector comprising a plurality of similarity values and associated locations within the string wherein a similarity value represents a comparison of the marker string and at least a portion of the string and an associated location associated with the similarity value is the location within the string of the start of the at least a portion of the string used in the comparison;
- a segment location set generator, coupled to receive the similarity vector, that identifies a set of ideal segmentation locations based upon an expected number of discrete segments within the string, uses the similarity vector to identify a set of candidate segmentation locations, responsive to a candidate segmentation location having a similarity value less than another candidate segmentation location within a local window, removes the candidate segmentation location from the set of candidate segmentation locations, responsive to a candidate segmentation location and a closest ideal segmentation location being at a distance that is greater than a distance threshold, removes the candidate segmentation location from the set of candidate segmentation locations, and uses the set of candidate segmentation locations and the set of ideal segmentation locations to generate a set of segmentation locations, wherein a segmentation location marks the beginning of a discrete segment in the string; and
- a string segmenter, coupled to receive the set of segmentation locations, that uses the set of segmentation locations to segment the string.
Dependent claims: 14, 15, 16, 17, 18
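The coupling between the three components of claim 13 can be pictured as a small pipeline in which each stage receives the previous stage's output. The sketch below shows only that wiring. The class name, the callable signatures, and the three stand-in components (exact-match scoring, keeping the highest-scoring locations, and slicing at each start) are hypothetical placeholders, much simpler than the pruning the claim recites.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

SimilarityVector = List[Tuple[int, float]]  # (location, similarity value) pairs


@dataclass
class SegmentationSystem:
    vector_generator: Callable[[str, str], SimilarityVector]
    location_generator: Callable[[SimilarityVector, int, int], List[int]]
    segmenter: Callable[[str, List[int]], List[str]]

    def run(self, string: str, marker: str, expected_segments: int) -> List[str]:
        vector = self.vector_generator(string, marker)
        locations = self.location_generator(vector, len(string), expected_segments)
        return self.segmenter(string, locations)


def exact_match_vector(string: str, marker: str) -> SimilarityVector:
    """Stand-in generator: similarity is 1.0 where the marker starts, else 0.0."""
    last_start = len(string) - len(marker)
    return [(i, 1.0 if string.startswith(marker, i) else 0.0)
            for i in range(last_start + 1)]


def strongest_locations(vector: SimilarityVector, length: int,
                        expected_segments: int) -> List[int]:
    """Stand-in generator: keep the expected number of highest-scoring locations."""
    ranked = sorted(vector, key=lambda pair: -pair[1])  # stable, so ties keep order
    return sorted(loc for loc, _ in ranked[:expected_segments])


def split_at(string: str, locations: List[int]) -> List[str]:
    """Stand-in segmenter: each location marks the beginning of a segment."""
    bounds = sorted(locations) + [len(string)]
    return [string[a:b] for a, b in zip(bounds, bounds[1:])]


system = SegmentationSystem(exact_match_vector, strongest_locations, split_at)
```

Calling `system.run("ID=a;ID=bb;ID=c;", "ID=", 3)` yields `["ID=a;", "ID=bb;", "ID=c;"]`. Passing the components as callables means any one stage can be swapped, for instance for a fuzzier similarity measure, without touching the other two.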
Specification