Segmenting a String Using Similarity Values
Abstract
Disclosed are systems and methods for segmenting a string comprised of one or more string segments using similarity values. In embodiments, each string segment may contain at least a variation of a marker string that may be used to separate string segments in the string. In embodiments, a similarity value representing the result of comparing the marker string to substrings of the string may be computed, and a similarity vector representing the set of comparisons for the locations on the string may be generated. In embodiments, the similarity vector may be used to identify candidate segmentation locations in the string. In embodiments, a set of segmentation locations in the string may be derived from the candidate segmentation locations in the string, and the string may be segmented according to the set of segmentation locations.
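The similarity vector described in the abstract can be illustrated with a minimal sketch. The abstract does not name a similarity measure, so this example assumes the ratio from Python's `difflib.SequenceMatcher`, and it assumes each compared substring has the same length as the marker string. The function name `similarity_vector` is hypothetical.

```python
from difflib import SequenceMatcher


def similarity_vector(string, marker):
    """Return (location, similarity) pairs, one per substring start.

    Assumption: each substring has the same length as the marker, and
    similarity is difflib's ratio in [0.0, 1.0]. The source does not
    prescribe either choice.
    """
    width = len(marker)
    vector = []
    for start in range(len(string) - width + 1):
        window = string[start:start + width]
        score = SequenceMatcher(None, marker, window).ratio()
        vector.append((start, score))
    return vector


if __name__ == "__main__":
    # "IDx" is a variant of the marker "ID:", so it scores below 1.0.
    for location, score in similarity_vector("ID:aaaID:bbbIDxccc", "ID:"):
        print(location, round(score, 2))
```

Exact copies of the marker score 1.0, and near copies score lower but still stand out from unrelated text, which is what lets the vector point to candidate segmentation locations.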
20 Claims
1. A method for segmenting a string comprising one or more segments into discrete segments, wherein each of the one or more segments comprises data that is the same as or similar to a marker string, the method comprising:
generating a similarity vector comprising a plurality of similarity values and associated locations within the string wherein a similarity value represents a comparison of the marker string and at least a portion of the string and an associated location associated with the similarity value is the location within the string of the start of the at least a portion of the string used in the comparison;
generating a set of segmentation locations identified using a set of ideal segmentation locations and a set of candidate segmentation locations obtained from a set of locations in the similarity vector corresponding to local maximum similarity values within a distance threshold of the locations in the similarity vector corresponding to ideal segmentation locations from the set of ideal segmentation locations; and
using the set of segmentation locations to segment the string.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
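One way to read the second step of claim 1 is sketched below. The code takes local maxima of the similarity values as candidates and keeps, for each ideal location, the best candidate within the distance threshold. The tie-breaking rule (highest similarity, then nearest), the decision to skip an ideal location with no nearby candidate, and all function names are assumptions for illustration; the claim does not specify them.

```python
def local_maxima(scores):
    """Indices whose score exceeds the left neighbour and is at least
    the right neighbour (so a plateau yields its first index)."""
    peaks = []
    for i, score in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("-inf")
        if score > left and score >= right:
            peaks.append(i)
    return peaks


def segmentation_locations(scores, ideal_locations, distance_threshold):
    """Pick, per ideal location, the strongest local maximum nearby."""
    candidates = local_maxima(scores)
    chosen = set()
    for ideal in ideal_locations:
        nearby = [c for c in candidates
                  if abs(c - ideal) <= distance_threshold]
        if nearby:  # assumption: no nearby candidate means no cut here
            chosen.add(max(nearby,
                           key=lambda c: (scores[c], -abs(c - ideal))))
    return sorted(chosen)


def segment_at(string, locations):
    """Split the string so each location starts a new segment.

    Assumption: location 0 always starts the first segment.
    """
    starts = sorted(set(locations) | {0})
    bounds = starts + [len(string)]
    return [string[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    scores = [1.0, 0.1, 0.0, 0.2, 0.9, 0.1, 0.0, 0.3, 0.1, 0.8, 0.2]
    cuts = segmentation_locations(scores, [0, 5, 10], 2)
    print(cuts, segment_at("abcdefghijk", cuts))
```

In this example the weak peak at index 7 lies within the threshold of ideal location 5, but the stronger peak at index 4 is preferred.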
9. A method for segmenting a string comprising one or more segments into discrete segments, wherein each of the one or more segments comprises data that is at least a variant of a marker string, the method comprising:
generating a similarity vector comprising a plurality of similarity values and associated locations within the string wherein a similarity value represents a comparison of the marker string and at least a portion of the string and an associated location associated with the similarity value is the location within the string of the start of the at least a portion of the string used in the comparison;
identifying a set of ideal segmentation locations in the string based upon an expected number of discrete segments within the string;
using the similarity vector to generate a set of candidate segmentation locations for segmenting the string;
using the set of candidate segmentation locations and the set of ideal segmentation locations to generate the set of segmentation locations; and
using the set of segmentation locations to segment the string.
Dependent claims: 10, 11, 12, 13
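Claim 9 derives the ideal segmentation locations from an expected number of discrete segments. The sketch below assumes segments of roughly equal length, so ideal locations are evenly spaced. It also assumes candidates are locations whose similarity meets a minimum value, and that each ideal location is replaced by its nearest candidate. None of these choices is stated in the claim, and the function names are hypothetical.

```python
def ideal_locations(string_length, expected_segments):
    """Evenly spaced segment starts (assumes equal-length segments)."""
    if expected_segments < 1:
        raise ValueError("expected_segments must be at least 1")
    return [(k * string_length) // expected_segments
            for k in range(expected_segments)]


def candidate_locations(scores, min_similarity):
    """Locations whose similarity value meets an assumed minimum."""
    return [i for i, score in enumerate(scores) if score >= min_similarity]


def snap_to_candidates(ideal, candidates):
    """Replace each ideal location with its nearest candidate.

    Assumption: with no candidates, the ideal locations are used as-is.
    Ties in distance go to the smaller location.
    """
    if not candidates:
        return list(ideal)
    return sorted({min(candidates, key=lambda c: (abs(c - i), c))
                   for i in ideal})


if __name__ == "__main__":
    scores = [1.0, 0, 0, 0, 0, 0.9, 0, 0, 0.7, 0, 0, 0]
    ideal = ideal_locations(len(scores), 3)
    candidates = candidate_locations(scores, 0.6)
    print(ideal, candidates, snap_to_candidates(ideal, candidates))
```

Here the ideal location 4 moves to the candidate at 5, reflecting a second segment that starts one position later than an even split would predict.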
14. A system for segmenting a string comprising one or more segments into discrete segments, wherein each of the one or more segments comprises data that is the same as or similar to a marker string, the system comprising:
a similarity vector generator, coupled to receive the string and the marker string, that generates a similarity vector comprising a plurality of similarity values and associated locations within the string wherein a similarity value represents a comparison of the marker string and at least a portion of the string and an associated location associated with the similarity value is the location within the string of the start of the at least a portion of the string used in the comparison;
a segment location set generator, coupled to receive the similarity vector, that uses the similarity vector to generate a set of segmentation locations wherein a segmentation location marks the beginning of a discrete segment in the string; and
a string segmenter, coupled to receive the set of segmentation locations, that uses the set of segmentation locations to segment the string.
Dependent claims: 15, 16, 17, 18, 19, 20
Specification