Methods and systems for augmenting a token lexicon
First Claim
1. A computer-implemented method, comprising:
receiving a character string in an alphanumeric format having no token-delineating breaks and comprising one or more tokens in the alphanumeric format; and
for each of the one or more tokens, parsing the received character string into a first portion containing a first token and a second portion containing the remaining tokens;
identifying the first token in one or more logs associated with multiple previously received search requests;
determining a frequency with which the identified first token appears in the one or more logs;
determining whether the determined frequency with which the identified first token appears in the one or more logs exceeds a first threshold level; and
storing the identified first token in a lexicon data storage based on the determination of whether the determined frequency with which the identified first token appears in the one or more logs exceeds the first threshold level, wherein the lexicon data storage comprises an ontology associating at least one of a misspelling of the first token with a correct spelling, or an alternate spelling of the first token with a different spelling.
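The frequency test in claim 1 can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the patent. The logs are plain query strings. A token "appears" in a log when it occurs as a whitespace-separated word. The parse tries the longest possible first portion before shorter ones. The lexicon is a dictionary mapping each token to its preferred spelling, which stands in for the misspelling/alternate-spelling ontology. The function names `log_token_counts` and `augment_lexicon` are hypothetical.

```python
from collections import Counter


def log_token_counts(search_logs):
    """Count how often each whitespace-separated token appears across logged queries."""
    counts = Counter()
    for query in search_logs:
        counts.update(query.lower().split())
    return counts


def augment_lexicon(char_string, search_logs, threshold, lexicon):
    """Hypothetical sketch of claim 1.

    Repeatedly splits the unbroken string into a first portion (the candidate
    first token) and a second portion (the remaining tokens). A candidate is
    kept only if its frequency in the search logs exceeds `threshold`.
    `lexicon` maps each token to its preferred spelling (an assumed
    representation of the ontology).
    """
    counts = log_token_counts(search_logs)
    remaining = char_string.lower()
    while remaining:
        # Try the longest possible first portion before shorter ones.
        for end in range(len(remaining), 0, -1):
            first_token, rest = remaining[:end], remaining[end:]
            if counts[first_token] > threshold:
                # Keep any existing ontology entry; otherwise map the token to itself.
                lexicon.setdefault(first_token, first_token)
                remaining = rest
                break
        else:
            # No first portion clears the threshold: skip one character and retry.
            remaining = remaining[1:]
    return lexicon
```

Consider the logs `["cheap flights", "cheap hotels", "cheap flights paris", "flights to paris"]` with a threshold of 1. Under these inputs the string `cheapflightsparis` adds `cheap`, `flights`, and `paris` to the lexicon. By contrast, `hotels`, which occurs only once, would not clear the threshold. The greedy longest-prefix split is only one possible parsing strategy. A production system might instead score several alternative segmentations.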
3 Assignments
0 Petitions
Abstract
Methods and systems for augmenting a token lexicon are presented. In one embodiment, a method is set forth that comprises identifying a first token from a search request, storing the first token in a lexicon data storage, receiving a character string comprising a second token that is substantially similar to the first token, and parsing the character string using the lexicon data storage to resolve the second token. According to another embodiment, a method is set forth that comprises identifying a first token from an internet article, storing the first token in a lexicon data storage, receiving a character string comprising a second token that is substantially similar to the first token, and parsing the character string using the lexicon data storage to resolve the second token.
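The abstract does not say how a second token is judged "substantially similar" to a stored first token. The sketch below therefore assumes a similarity measure: Python's standard `difflib.get_close_matches`, which ranks candidates by `SequenceMatcher` ratio. It uses an arbitrary cutoff of 0.8. The lexicon is assumed to be a dictionary from stored tokens, including known misspellings, to their preferred spellings. `resolve_token` is a hypothetical name.

```python
import difflib


def resolve_token(second_token, lexicon, cutoff=0.8):
    """Map a token to the preferred spelling of the most similar lexicon entry.

    Exact matches are returned directly. Otherwise the closest stored token
    with a similarity ratio of at least `cutoff` is used. Returns None when
    nothing in the lexicon is similar enough.
    """
    token = second_token.lower()
    if token in lexicon:
        return lexicon[token]
    matches = difflib.get_close_matches(token, list(lexicon), n=1, cutoff=cutoff)
    return lexicon[matches[0]] if matches else None
```

Take the lexicon `{"flights": "flights", "flihgts": "flights", "colour": "color"}`. With it, `resolve_token("flihgt", lexicon)` returns `"flights"` by way of the stored misspelling, and `resolve_token("colour", lexicon)` returns `"color"`. An unrelated token such as `"zebra"` returns `None`.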
130 Citations
28 Claims
1. A computer-implemented method, comprising:
receiving a character string in an alphanumeric format having no token-delineating breaks and comprising one or more tokens in the alphanumeric format; and
for each of the one or more tokens, parsing the received character string into a first portion containing a first token and a second portion containing the remaining tokens;
identifying the first token in one or more logs associated with multiple previously received search requests;
determining a frequency with which the identified first token appears in the one or more logs;
determining whether the determined frequency with which the identified first token appears in the one or more logs exceeds a first threshold level; and
storing the identified first token in a lexicon data storage based on the determination of whether the determined frequency with which the identified first token appears in the one or more logs exceeds the first threshold level, wherein the lexicon data storage comprises an ontology associating at least one of a misspelling of the first token with a correct spelling, or an alternate spelling of the first token with a different spelling.
Dependent claims: 2, 3, 4, 5.

6. A computer-implemented method, comprising:
identifying a character string in an alphanumeric format having no token-delineating breaks and comprising one or more tokens in the alphanumeric format from an internet-accessible article; and
for each of the one or more tokens, parsing the identified character string into a first portion containing a first token and a second portion containing the remaining tokens;
determining a first frequency with which the first token appears in the internet-accessible article, or a second frequency with which the first token appears at least once in a number of different internet-accessible articles;
determining whether the determined first frequency with which the first token appears in the internet-accessible article exceeds a first threshold level, or whether the determined second frequency with which the first token appears at least once in the number of different internet-accessible articles exceeds a second threshold level; and
storing the first token in a lexicon data storage based on the determination of whether the determined first frequency with which the first token appears in the internet-accessible article exceeds the first threshold level, or whether the determined second frequency with which the first token appears at least once in the number of different internet-accessible articles exceeds the second threshold level, wherein the lexicon data storage comprises an ontology associating at least one of a misspelling of the first token with a correct spelling, or an alternate spelling of the first token with a preferred spelling.
Dependent claims: 7, 8, 9, 10.

11. A computer-implemented method comprising:
identifying a character string in an alphanumeric format having no token-delineating breaks and comprising one or more tokens in the alphanumeric format, wherein the identified character string is included in a plurality of previously received search requests;
for each of the one or more tokens, parsing the identified character string into a first portion containing a first token and a second portion containing the remaining tokens;
determining whether the first token is already included in a lexicon comprising an ontology of interrelated tokens and whether the first token occurs in the plurality of previously received search requests with at least a threshold frequency;
based upon the determination of whether the first token is already included in the lexicon and the determination of whether the first token occurs in the plurality of previously received search requests with at least the threshold frequency, identifying a second token that comprises a correct spelling of the first token, or an alternate spelling of the first token; and
adding the first token to the lexicon with an association to the identified second token;
receiving an alphanumeric string of characters comprising a domain name and having no token-delineating breaks;
matching a portion of the received alphanumeric string of characters to the first token using the lexicon; and
replacing the matched portion of the received alphanumeric string of characters with the identified second token contained in the lexicon.
Dependent claims: 12, 13, 14.

15. A non-transitory computer-readable storage device comprising program code that, when executed, causes a processor to perform operations comprising:
receiving a character string in an alphanumeric format having no token-delineating breaks and comprising one or more tokens in the alphanumeric format; and
for each of the one or more tokens, parsing the received character string into a first portion containing a first token and a second portion containing the remaining tokens;
identifying the first token in one or more logs associated with multiple previously received search requests;
determining a frequency with which the identified first token appears in the one or more logs;
determining whether the determined frequency with which the identified first token appears in the one or more logs exceeds a first threshold level; and
storing the identified first token in a lexicon data storage based on the determination of whether the determined frequency with which the identified first token appears in the one or more logs exceeds the first threshold level, wherein the lexicon data storage comprises an ontology associating at least one of a misspelling of the first token with a correct spelling, or an alternate spelling of the first token with a preferred spelling.
Dependent claims: 16, 17, 18, 19.

20. A non-transitory computer-readable storage device comprising program code that, when executed, causes a processor to perform operations comprising:
identifying a character string in an alphanumeric format having no token-delineating breaks and comprising one or more tokens in the alphanumeric format from an internet-accessible article; and
for each of the one or more tokens, parsing the identified character string into a first portion containing a first token and a second portion containing the remaining tokens;
determining a first frequency with which the first token appears in the internet-accessible article, or a second frequency with which the first token appears at least once in a number of different internet-accessible articles;
determining whether the determined first frequency with which the first token appears in the internet-accessible article exceeds a first threshold level, or whether the determined second frequency with which the first token appears at least once in the number of different internet-accessible articles exceeds a second threshold level; and
storing the first token in a lexicon data storage based on the determination of whether the determined first frequency with which the first token appears in the internet-accessible article exceeds the first threshold level, or whether the determined second frequency with which the first token appears at least once in the number of different internet-accessible articles exceeds the second threshold level, wherein the lexicon data storage comprises an ontology associating at least one of a misspelling of the first token with a correct spelling, or an alternate spelling of the first token with a preferred spelling.
Dependent claims: 21, 22, 23, 24.

25. A non-transitory computer-readable storage device comprising program code that, when executed, causes a processor to perform operations comprising:
identifying a character string in an alphanumeric format having no token-delineating breaks and comprising one or more tokens in the alphanumeric format, wherein the identified character string is included in a plurality of previously received search requests;
for each of the one or more tokens, parsing the identified character string into a first portion containing a first token and a second portion containing the remaining tokens;
determining whether the first token is already included in a lexicon comprising an ontology of interrelated tokens and whether the first token occurs in the plurality of previously received search requests with at least a threshold frequency;
based upon the determination of whether the first token is already included in the lexicon and the determination of whether the first token occurs in the plurality of previously received search requests with at least the threshold frequency, identifying a second token that comprises a correct spelling of the first token, or an alternate spelling of the first token; and
adding the first token to the lexicon with an association to the identified second token;
receiving an alphanumeric string of characters comprising a domain name and having no token-delineating breaks;
matching a portion of the received alphanumeric string of characters to the first token using the lexicon; and
replacing the matched portion of the received alphanumeric string of characters with the identified second token contained in the lexicon.
Dependent claims: 26, 27, 28.
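Claims 6 and 20 accept a token if either of two tests passes. The first test checks whether the token's frequency within one internet-accessible article exceeds a first threshold, which is a term-frequency test. The second checks whether the number of different articles containing the token at least once exceeds a second threshold, which is a document-frequency test. A minimal sketch follows. The regular-expression tokenizer, the function names `tokenize`, `should_store`, and `store_if_frequent`, and the dictionary lexicon are illustrative assumptions, not details from the patent.

```python
import re
from collections import Counter


def tokenize(text):
    """Split article text into lowercase alphanumeric words (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def should_store(token, article, articles, first_threshold, second_threshold):
    """Return True if `token` passes either frequency test described in claims 6 and 20."""
    token = token.lower()
    # First frequency: occurrences of the token within this one article.
    first_frequency = Counter(tokenize(article))[token]
    # Second frequency: number of distinct articles containing the token at least once.
    second_frequency = sum(1 for text in articles if token in set(tokenize(text)))
    return first_frequency > first_threshold or second_frequency > second_threshold


def store_if_frequent(token, article, articles, first_threshold, second_threshold, lexicon):
    """Add the token to a dict-based lexicon (token -> preferred spelling) when it qualifies."""
    if should_store(token, article, articles, first_threshold, second_threshold):
        lexicon.setdefault(token.lower(), token.lower())
    return lexicon
```

Under these assumptions, a token repeated three times in one article passes a first threshold of 2. A token that appears once in each of three separate articles passes a second threshold of 2, even if the current article never mentions it.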
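Claims 11 and 25 add a misspelled or alternate token to the lexicon only when two conditions hold: it is not already present, and it occurs in previously received search requests with at least a threshold frequency. The lexicon association is then used to repair domain-name strings. The sketch below rests on several assumptions. The lexicon is a dictionary mapping each first token to its second token. The domain label is scanned with a greedy longest match. The domain is supplied as a label followed by its top-level domain, with no `www.` prefix. `maybe_add` and `correct_domain` are hypothetical names.

```python
def maybe_add(lexicon, first_token, second_token, query_counts, threshold):
    """Associate first_token with second_token if it is new and frequent enough."""
    if first_token not in lexicon and query_counts.get(first_token, 0) >= threshold:
        lexicon[first_token] = second_token
    return lexicon


def correct_domain(domain, lexicon):
    """Replace lexicon matches in an unbroken domain label with their associated tokens."""
    label, dot, suffix = domain.lower().partition(".")
    pieces = []
    i = 0
    while i < len(label):
        # Greedy longest match: try the longest substring starting at i first.
        for end in range(len(label), i, -1):
            candidate = label[i:end]
            if candidate in lexicon:
                pieces.append(lexicon[candidate])
                i = end
                break
        else:
            # No lexicon entry starts here; keep the character unchanged.
            pieces.append(label[i])
            i += 1
    return "".join(pieces) + dot + suffix
```

Suppose the lexicon holds `cheap`, `flights`, and `paris`, and the association `fligths -> flights` has been added. Then `correct_domain("cheapfligthsparis.com", lexicon)` returns `cheapflightsparis.com`. Greedy matching can mis-split ambiguous labels. A fuller system would likely compare alternative segmentations before replacing anything.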
Specification