Method to resolve an incorrectly entered uniform resource locator (URL)
Abstract
A method and a carrier medium carrying code segments to cause a processor to implement a method for resolving a possibly incorrectly entered URL. The method includes accepting the entered URL, parsing the accepted URL into URL parts, and carrying out a conventional URL lookup. In one embodiment, for any part of the accepted URL that is not valid, the method includes determining a signature for the accepted URL part and conducting a fuzzy search for at least one valid URL part that is close to the invalid URL part according to a distance measure that combines at least one local measure, each measure suited for a particular type of URL part. At least one valid URL may be formed from the URL parts found in the fuzzy search. In one implementation, conducting the fuzzy search includes determining at least one cluster, of a set of pre-formed clusters, in which the accepted URL part is likely to lie. Each cluster includes a set of valid URL parts that are close according to a distance measure, and has a representative URL part having a known signature. Determining the cluster(s) includes finding at least one signature of representative URLs close to the signature of the accepted URL part. The method includes further searching for a valid URL part within the at least one determined cluster.
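The overall flow in the abstract (parse, conventional lookup, fuzzy search only for parts that fail the lookup, then reassemble) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `VALID_PARTS` tables, the three-way split into scheme, host, and path, and the helper names are all assumptions, and `difflib.get_close_matches` is used purely as a stand-in for the signature-and-cluster search the abstract describes.

```python
from difflib import get_close_matches
from urllib.parse import urlsplit

# Hypothetical tables of valid URL parts; a real system would hold far more.
VALID_PARTS = {
    "scheme": ["http", "https", "ftp"],
    "host": ["www.example.com", "www.example.org", "mail.example.com"],
    "path": ["/index.html", "/products/list.html", "/about.html"],
}


def parse_url(url):
    """Split an entered URL string into named parts (an assumed split)."""
    pieces = urlsplit(url)
    return {"scheme": pieces.scheme, "host": pieces.netloc, "path": pieces.path}


def resolve(url):
    """Return a corrected URL, or None if some part has no close valid match."""
    resolved = {}
    for kind, part in parse_url(url).items():
        if part in VALID_PARTS[kind]:
            # Conventional (exact) lookup succeeded for this part.
            resolved[kind] = part
            continue
        # Stand-in fuzzy search: difflib similarity, NOT the signature search.
        close = get_close_matches(part, VALID_PARTS[kind], n=1, cutoff=0.6)
        if not close:
            return None
        resolved[kind] = close[0]
    return f"{resolved['scheme']}://{resolved['host']}{resolved['path']}"
```

Calling `resolve("htp://www.exmaple.com/indx.html")` repairs each part independently, which mirrors the abstract's point that a valid URL may be formed from the parts found in the fuzzy search.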
142 Citations
26 Claims
1. A method comprising:
- accepting a string of characters representing a possibly incorrectly entered URL;
- parsing the string into a set of URL parts, a part formed from characters having values in a first space of characters, each part having a corresponding distance measure of closeness for measuring distances between URL parts;
- forming a signature of each URL part, forming said signature including, in the case the distance measure of closeness for a URL part in the first space is integer valued, transforming the characters of the URL part whose values are in the first space into characters in a second space such that the distance measure of closeness is transformed to a distance measure of closeness that is not necessarily integer valued;
- for each URL part, searching for at least one cluster of a set of pre-formed clusters, the set of pre-formed clusters being clusters of valid URL parts that are close according to the distance measure of closeness that is not necessarily integer valued, each cluster in the set of pre-formed clusters having a representative URL part and signature thereof, the searching using the signature of the URL part;
- further searching for a valid URL part within each cluster found in the searching step,
wherein at least one URL part includes one or more non-text non-numerical characters.
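Claim 1 turns on moving a URL part from a character space, where a measure such as edit distance takes only integer values, into a second space where distances are real valued. The claim text here does not say which transform is used, so the sketch below assumes one plausible choice: a unit-length vector of character-bigram counts compared by Euclidean distance. The function names and the `^`/`$` boundary markers are hypothetical.

```python
import math
from collections import Counter


def signature(part, n=2):
    """Map a URL part to a unit-length vector of character n-gram counts.

    Assumed transform for illustration; the vector is stored sparsely as a
    dict from n-gram to weight.
    """
    padded = f"^{part}$"  # hypothetical boundary markers
    grams = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    norm = math.sqrt(sum(count * count for count in grams.values()))
    return {gram: count / norm for gram, count in grams.items()}


def signature_distance(sig_a, sig_b):
    """Euclidean distance between two signatures; generally not an integer."""
    keys = sig_a.keys() | sig_b.keys()
    return math.sqrt(
        sum((sig_a.get(k, 0.0) - sig_b.get(k, 0.0)) ** 2 for k in keys)
    )
```

Because the signature treats every character alike, parts containing non-text, non-numerical characters such as `~`, `-`, or `/` need no special handling. A real-valued distance of this kind is also what makes clustering of valid parts around representatives practical.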
2. A method for resolving a possibly incorrectly entered URL comprising:
- accepting the entered URL;
- parsing the accepted URL into URL parts;
- carrying out a conventional URL lookup; and
- for any part of the accepted URL that is not valid:
- determining a signature for the accepted URL part; and
- conducting a fuzzy search for at least one valid URL part that is close to the invalid URL part according to a distance measure that combines at least one local measure, each measure suited for a particular type of URL part,
wherein in the case the not-valid accepted URL part includes characters in a first space wherein a distance measure of closeness is integer-valued, the determining of the signature of the accepted URL part includes converting the first space into a second space such that the signature of the URL part is represented by one or more elements in the second space, the second space being a space wherein the distance measure for comparing signatures of URL parts is non-integer or a general distance function in a metric space such that cluster analysis can be performed on signatures of valid URLs or URL parts, and
wherein at least one URL part includes one or more non-text non-numerical characters.
Dependent claims: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
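Claim 2 refers to a distance measure that combines local measures, each suited to a particular type of URL part. One way to picture this is a weighted sum of per-part measures, sketched below. The choice of measures (case-insensitive match for the scheme, length-normalized edit distance for host and path) and the weights are assumptions made for illustration; the claim does not specify them.

```python
def edit_distance(a, b):
    """Levenshtein distance: an integer-valued measure on character strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def normalized(a, b):
    """Edit distance scaled to [0, 1] by the longer string's length."""
    return edit_distance(a, b) / max(len(a), len(b), 1)


# Hypothetical local measures, one per type of URL part.
LOCAL_MEASURES = {
    "scheme": lambda a, b: 0.0 if a.lower() == b.lower() else 1.0,
    "host": lambda a, b: normalized(a.lower(), b.lower()),  # hosts ignore case
    "path": normalized,                                     # paths keep case
}
# Hypothetical weights; a real system would tune these.
WEIGHTS = {"scheme": 0.2, "host": 0.5, "path": 0.3}


def combined_distance(parts_a, parts_b):
    """Weighted sum of the local measures over the shared part types."""
    return sum(
        WEIGHTS[kind] * LOCAL_MEASURES[kind](parts_a[kind], parts_b[kind])
        for kind in WEIGHTS
    )
```

Keeping the measures in a table keyed by part type makes it straightforward to give, say, a query string or port its own measure without touching the combination step.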
15. A computer-readable medium having encoded thereon at least one computer readable code segment for instructing a processor of a processing system, the at least one code segment when executed carrying out a method for resolving a possibly incorrectly entered URL, the method comprising:
- accepting the entered URL;
- parsing the accepted URL into URL parts;
- carrying out a conventional URL lookup; and
- for any part of the accepted URL that is not valid:
- determining a signature for the accepted URL part; and
- conducting a fuzzy search for at least one valid URL part that is close to the invalid URL part according to a distance measure that combines at least one local measure, each measure suited for a particular type of URL part,
wherein in the case the not-valid accepted URL part includes characters in a first space wherein a distance measure of closeness is integer-valued, the determining of the signature of the accepted URL part includes converting the first space into a second space such that the signature of the URL part is a sequence of values in the second space, the second space being a space wherein the distance measure for comparing signatures of URL parts is non-integer or a general distance function in a metric space such that cluster analysis can be performed on signatures of valid URLs or URL parts, and
wherein at least one URL part includes one or more non-text non-numerical characters.
Dependent claims: 16, 17, 18, 19, 20, 21
22. A method of conducting a fuzzy search for a source URL part that closely matches a valid URL part, comprising:
- determining a signature for the source URL part;
- determining at least one cluster of a set of pre-formed clusters wherein the source URL part is likely to be, each cluster comprising a set of valid URL parts that are close according to a distance measure and having a representative URL part having a known signature, the determining of the likely clusters including finding at least one signature of representative URLs close to the signature of the accepted URL part; and
- further searching for a valid URL part within the at least one determined cluster,
wherein the source URL part includes at least one non-text non-numerical symbol.
Dependent claims: 23, 24, 25
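The two-stage search of claim 22 (pick likely clusters by comparing against representative signatures, then search only inside those clusters) can be sketched as below. Everything concrete here is assumed for illustration: the bigram-count signature, the cosine distance, the hand-made `CLUSTERS` table, and the `n_clusters` parameter. In practice the clusters would be pre-formed by cluster analysis over the valid URL parts.

```python
import math
from collections import Counter


def signature(part):
    """Hypothetical signature: character-bigram counts of the padded part."""
    padded = f"^{part}$"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def distance(sig_a, sig_b):
    """Cosine distance between two bigram-count signatures (real valued)."""
    dot = sum(count * sig_b[gram] for gram, count in sig_a.items())
    norm_a = math.sqrt(sum(c * c for c in sig_a.values()))
    norm_b = math.sqrt(sum(c * c for c in sig_b.values()))
    return 1.0 - dot / (norm_a * norm_b)


# Pre-formed clusters (hand-made here): representative -> member valid parts.
CLUSTERS = {
    "www.example.com": ["www.example.com", "www.example.org", "www.examples.com"],
    "mail.server.net": ["mail.server.net", "mail.servers.net", "mail2.server.net"],
}
# Representative signatures are computed once, ahead of any search.
REP_SIGNATURES = {rep: signature(rep) for rep in CLUSTERS}


def fuzzy_search(source_part, n_clusters=1):
    """Return the closest valid part found in the nearest cluster(s)."""
    source_sig = signature(source_part)
    # Stage 1: rank clusters by distance to their representative's signature.
    ranked = sorted(
        REP_SIGNATURES, key=lambda rep: distance(source_sig, REP_SIGNATURES[rep])
    )
    # Stage 2: search only the members of the nearest cluster(s).
    candidates = [m for rep in ranked[:n_clusters] for m in CLUSTERS[rep]]
    return min(candidates, key=lambda m: distance(source_sig, signature(m)))
```

The benefit of the first stage is that the number of signature comparisons grows with the number of clusters plus the size of the chosen clusters, rather than with the total number of valid parts.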
26. A method comprising:
- accepting a string of symbols representing a possibly incorrectly entered URL, at least one symbol being a non-text non-numerical symbol;
- parsing the string into a set of URL parts, a part formed from characters having values in a first space of characters, each part having a corresponding distance measure of closeness for measuring distances between URL parts;
- forming a signature of each URL part, forming said signature including, in the case that the corresponding distance measure of closeness for the first space for at least one part is integer valued, transforming the characters of the URL part whose values are in the first space into characters in a second space such that the distance measure of closeness is transformed to a distance measure of closeness that is not necessarily integer valued;
- for each URL part, searching for one or more clusters of a set of pre-formed clusters, the set of pre-formed clusters being clusters of valid URL parts that are close according to the distance measure of closeness that is not necessarily integer valued, but a general distance function in a metric space, each cluster in the set of pre-formed clusters having a representative URL part and signature thereof, the searching using the signature of the URL part;
- further searching for a valid URL part within each cluster found in the searching step.
Specification