Method and apparatus for automatically processing a user's communication
Abstract
The invention concerns a method and apparatus for processing a user's communication. The invention may include receiving a list of recognized symbol strings of one or more recognized entries. The list of recognized symbol strings may include a first similarity score associated with each recognized entry. From each recognized symbol string one or more contiguous sequences of N-symbols may be extracted. One of the extracted contiguous sequences of N-symbols may be matched with at least one stored contiguous sequence of N-symbols from a first database. A preliminary set of symbol strings and associated second similarity scores may be generated. The preliminary set of symbol strings may include one or more stored symbol strings from a second database that correspond to the at least one matched contiguous sequence of N-symbols. A third similarity score associated with the one or more stored symbol strings included in the preliminary set of symbol strings may be computed. A refined set of symbol strings from the preliminary set of symbol strings based on the computed third similarity score may be output.
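The flow summarized in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: symbols are taken to be characters, N is fixed at 3, both scores are sums of -log frequency-ratio weights (natural log), and the names `ngrams`, `two_pass_match` and `keep` are hypothetical. The first similarity score is carried along but not used here.

```python
import math
from collections import defaultdict


def ngrams(text, n=3):
    """All contiguous runs of n characters in text (assumption: symbol = character)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def two_pass_match(recognized, listings, n=3, keep=2):
    """recognized: list of (symbol string, first similarity score) pairs."""
    # "First database": N-gram -> indices of the listings that contain it.
    index = defaultdict(set)
    for idx, entry in enumerate(listings):
        for gram in ngrams(entry, n):
            index[gram].add(idx)

    # Extract N-grams from every recognized string and keep those that match.
    query = set()
    for text, _first_score in recognized:
        query |= ngrams(text, n)
    matched = {gram for gram in query if gram in index}

    # First pass: second similarity score, weighted by rarity in the whole database.
    total = len(listings)
    second = defaultdict(float)
    for gram in matched:
        weight = -math.log(len(index[gram]) / total)
        for idx in index[gram]:
            second[idx] += weight
    preliminary = sorted(second)  # every listing sharing at least one N-gram

    # Second pass: third similarity score, weighted by rarity inside the preliminary set.
    m = len(preliminary)

    def refined_weight(gram):
        m_g = sum(1 for idx in preliminary if idx in index[gram])
        return -math.log(m_g / m)

    third = {
        idx: sum(refined_weight(g) for g in matched & ngrams(listings[idx], n))
        for idx in preliminary
    }
    ranked = sorted(preliminary, key=lambda idx: third[idx], reverse=True)
    return [(listings[idx], third[idx]) for idx in ranked[:keep]]


listings = ["main street pizza", "main street bakery", "ocean pizza", "harbor cafe"]
recognized = [("maine street pizza", 0.7), ("main street pisa", 0.5)]
result = two_pass_match(recognized, listings)
```

With this toy data the listing "harbor cafe" shares no N-gram with either recognized string, so it never enters the preliminary set, and "main street pizza" ranks first in the refined set.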
61 Citations
66 Claims
-
1. A method for processing a user's communication comprising:
-
receiving a list of recognized symbol strings of one or more recognized entries and a first similarity score associated with each recognized entry;
extracting from each recognized symbol string one or more contiguous sequences of N-symbols;
matching at least one of the extracted contiguous sequence of N-symbols with at least one stored contiguous sequence of N-symbols from a first database;
generating a preliminary set of symbol strings and associated second similarity scores, the preliminary set of symbol strings including one or more stored symbol strings from a second database that correspond to the at least one of the matched contiguous sequence of N-symbols;
computing a third similarity score associated with the one or more stored symbol strings included in the preliminary set of symbol strings; and
outputting a refined set of symbol strings from the preliminary set of symbol strings based on the computed third similarity score.
generating from a received user's communication the list of recognized symbol strings of the one or more recognized entries; and
computing the first similarity score associated with each recognized entry contained in the generated list of recognized symbol strings.
-
-
9. The method of claim 1, wherein the third similarity score is computed based on associated information theoretic importance measures.
-
10. The method of claim 9, wherein the associated information theoretic importance measures is calculated using the formula:
−log(m(g)/m), where m(g) represents refined N-gram frequency scores and m represents number of stored symbol strings included in the preliminary set of symbol strings.
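As a hedged illustration of the formula in claim 10, the sketch below counts, for each recognized N-gram g, the number m(g) of preliminary-set strings containing it and returns -log(m(g)/m). The claim does not state the base of the logarithm; natural log is assumed, and the function name `refined_importance` and the sample strings are hypothetical.

```python
import math
from typing import Dict, Iterable, List


def refined_importance(recognized_ngrams: Iterable[str],
                       preliminary: List[str]) -> Dict[str, float]:
    """Map each N-gram g found in the preliminary set to -log(m(g)/m)."""
    m = len(preliminary)  # number of strings in the preliminary set
    weights: Dict[str, float] = {}
    for gram in set(recognized_ngrams):
        # m(g): how many preliminary strings contain this N-gram.
        m_g = sum(1 for entry in preliminary if gram in entry)
        if m_g:  # N-grams absent from the preliminary set get no weight
            weights[gram] = -math.log(m_g / m)
    return weights


preliminary = ["main street pizza", "main street bakery",
               "maple pizza", "main st pizza"]
weights = refined_importance(["piz", "bak", "mai", "xyz"], preliminary)
```

Here "bak" occurs in one of four strings and so carries more weight than "piz" or "mai", which each occur in three; an N-gram present in every preliminary string would carry zero weight.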
-
11. The method of claim 1, wherein the second similarity score is computed based on associated information theoretic importance measures.
-
12. The method of claim 11, wherein the associated information theoretic importance measures is calculated using the formula:
−log(M(g)/M), where M(g) represents listings N-gram frequency scores and M represents total number of stored symbol strings in the second database.
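A worked example with made-up numbers may help show how the formula in claim 12 behaves. The figures (a database of M = 1000 stored strings, one N-gram found in 10 of them and another in 500) are purely illustrative, and natural log is assumed because the claim does not state a base.

```latex
-\log\frac{M(g_1)}{M} = -\log\frac{10}{1000} = \log 100 \approx 4.61,
\qquad
-\log\frac{M(g_2)}{M} = -\log\frac{500}{1000} = \log 2 \approx 0.69
```

The rarer N-gram therefore contributes roughly six to seven times as much to a second similarity score as the common one.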
-
13. The method of claim 1, further comprising:
extracting from the one or more symbol strings stored in the second database the at least one stored contiguous sequence of N-symbols.
-
14. The method of claim 13, further comprising:
mapping the extracted at least one stored contiguous sequence of N-symbols with corresponding one or more symbol strings stored in the second database.
-
15. The method of claim 14, further comprising:
storing mapping information relating at least one of the stored contiguous sequences of N-symbols to the corresponding one or more symbol strings stored in the second database and the second similarity scores associating this particular stored contiguous sequence of N-symbols with the corresponding symbol strings containing it.
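Claims 13 to 15 describe extracting N-grams from the stored strings, mapping each N-gram to the strings that contain it, and storing that mapping together with the associated scores. A minimal sketch of such a map follows. It assumes character symbols, N = 3, and a stored score of -log(M(g)/M) with natural log as in claim 12; the names `build_ngram_map` and `ngram_map` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def ngrams(text: str, n: int) -> Set[str]:
    """All contiguous runs of n characters in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_ngram_map(listings: List[str],
                    n: int = 3) -> Dict[str, Tuple[List[int], float]]:
    """Map each stored N-gram to (indices of listings containing it, score)."""
    postings: Dict[str, List[int]] = defaultdict(list)
    for idx, entry in enumerate(listings):
        # ngrams() returns a set, so each listing is recorded once per N-gram.
        for gram in ngrams(entry, n):
            postings[gram].append(idx)
    total = len(listings)
    return {
        gram: (ids, -math.log(len(ids) / total))
        for gram, ids in postings.items()
    }


listings = ["main street pizza", "main street bakery", "ocean pizza"]
ngram_map = build_ngram_map(listings)
```

Building the map once, ahead of any query, means that matching a recognized N-gram later is a single dictionary lookup rather than a scan of the listings database.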
-
16. The method of claim 1, further comprising:
computing the associated second similarity scores for the one or more symbol strings stored in the second database included in the preliminary set of symbol strings as a function of at least a number of the contiguous sequences of N-symbols from the list of recognized symbol strings of one or more recognized entries encountered in the symbol string for which the associated second similarity score is being computed.
-
17. The method of claim 1, further comprising:
computing the associated second similarity scores for the one or more symbol strings stored in the second database included in the preliminary set of symbol strings based on at least a ratio of a number of the one or more symbol strings stored in the second database that contain the matched stored contiguous sequence of N-symbols and a total number of the one or more symbol strings stored in the second database.
-
18. The method of claim 1, further comprising:
computing the associated third similarity score as a function of at least a number of the one or more contiguous sequences of N-symbols extracted from each recognized symbol string that appear in the one or more symbol strings stored in the second database included in the preliminary set of symbol strings.
-
19. The method of claim 1, further comprising:
computing the third similarity score based on a ratio of a number of the one or more symbol strings stored in the second database included in the preliminary set of symbol strings containing the extracted contiguous sequences of N-symbols and a total number of stored symbol strings that appear in the preliminary set of symbol strings.
-
20. A machine-readable medium having stored thereon a plurality of executable instructions, the plurality of instructions comprising instructions to:
-
receive a list of recognized symbol strings of one or more recognized entries and a first similarity score associated with each recognized entry;
extract from each recognized symbol string one or more contiguous sequences of N-symbols;
match at least one of the extracted contiguous sequence of N-symbols with at least one stored contiguous sequence of N-symbols from a first database;
generate a preliminary set of symbol strings and associated second similarity scores, the preliminary set of symbol strings including one or more stored symbol strings from a second database that correspond to the at least one of the matched contiguous sequence of N-symbols;
compute a third similarity score associated with the one or more stored symbol strings included in the preliminary set of symbol strings; and
output a refined set of symbol strings from the preliminary set of symbol strings based on the computed third similarity score.
generate from a received user's communication the list of recognized symbol strings of the one or more recognized entries; and
compute the first similarity score associated with each recognized entry contained in the generated list of recognized symbol strings.
-
-
22. The machine-readable medium of claim 20 having stored thereon additional executable instructions, the additional instructions comprising instructions to:
extract from the one or more symbol strings stored in the second database the at least one stored contiguous sequence of N-symbols.
-
23. The machine-readable medium of claim 22 having stored thereon additional executable instructions, the additional instructions comprising instructions to:
map the extracted at least one stored contiguous sequence of N-symbols with corresponding one or more symbol strings stored in the second database.
-
24. The machine-readable medium of claim 23 having stored thereon additional executable instructions, the additional instructions comprising instructions to:
storing mapping information relating at least one of the stored contiguous sequences of N-symbols to the corresponding one or more symbol strings stored in the second database and the second similarity scores associating this particular stored contiguous sequence of N-symbols with the corresponding symbol strings containing it.
-
25. The machine-readable medium of claim 20 having stored thereon additional executable instructions, the additional instructions comprising instructions to:
compute the associated second similarity scores for the one or more symbol strings stored in the second database included in the preliminary set of symbol strings as a function of at least a number of the contiguous sequences of N-symbols from the list of recognized symbol strings of one or more recognized entries encountered in the symbol string for which the associated second similarity score is being computed.
-
26. The machine-readable medium of claim 20 having stored thereon additional executable instructions, the additional instructions comprising instructions to:
compute the associated second similarity scores for the one or more symbol strings stored in the second database included in the preliminary set of symbol strings based on at least a ratio of a number of the one or more symbol strings stored in the second database that contain the matched stored contiguous sequence of N-symbols and a total number of the one or more symbol strings stored in the second database.
-
27. The machine-readable medium of claim 20 having stored thereon additional executable instructions, the additional instructions comprising instructions to:
compute the associated third similarity score as a function of at least a number of the one or more contiguous sequences of N-symbols extracted from each recognized symbol string that appear in the one or more symbol strings stored in the second database included in the preliminary set of symbol strings.
-
28. The machine-readable medium of claim 20 having stored thereon additional executable instructions, the additional instructions comprising instructions to:
compute the third similarity score based on a ratio of a number of the one or more symbol strings stored in the second database included in the preliminary set of symbol strings containing the extracted contiguous sequences of N-symbols and a total number of stored symbol strings that appear in the preliminary set of symbol strings.
-
29. An apparatus for processing a user's communication comprising:
-
an N-gram map generator to extract one or more contiguous sequences of N-symbols from a list of recognized symbol strings of one or more recognized entries;
a first matcher to match at least one of the extracted contiguous sequence of N-symbols with at least one stored contiguous sequence of N-symbols and the first matcher further generates a preliminary set of symbol strings and associated second similarity scores, the preliminary set of symbol strings including one or more stored symbol strings that correspond to the matched contiguous sequence of N-symbols;
a second matcher to compute a third similarity score corresponding to the one or more stored symbol strings included in the preliminary set of symbol strings; and
an output manager to output a refined set of symbol strings from the preliminary set of symbol strings based on the computed third similarity score.
a first database to store a plurality of symbol string entries used to process the user's communication.
-
-
31. The apparatus of claim 30, wherein the N-gram map generator further extracts from each entry in the first database at least one stored contiguous sequence of N-symbols contained in each entry and the apparatus further comprises:
a second database to store the at least one stored contiguous sequence of N-symbols and a mapping for the corresponding database entries.
-
32. The apparatus of claim 30, wherein N-gram map generator further maps one or more stored contiguous sequences of N-symbols to at least one of the plurality of stored database entries that contain the one or more stored contiguous sequence of N-symbols.
-
33. The apparatus of claim 29, wherein the first matcher is to compute the associated second similarity scores for the preliminary set of symbol strings based on at least a ratio of a number of the one or more stored symbol strings from the first database that contain the stored contiguous sequence of N-symbols and a total number of the one or more stored symbol strings in the first database.
-
34. The apparatus of claim 29, wherein the second matcher is to compute the associated third similarity scores for the preliminary set of symbol strings based on at least a ratio of a number of the one or more stored symbol strings included in the preliminary set of symbol strings containing the extracted contiguous sequence of N-symbols and a total number of the one or more stored symbol strings included in the preliminary set of symbol strings.
-
35. The apparatus of claim 29, further comprising:
a recognizer to generate the list of recognized symbol strings of one or more recognized entries from received user's communication and to compute a first similarity scores associated with each entry of the generated list of recognized symbol strings.
-
36. The apparatus of claim 29, wherein the extracted sequences of N-symbols and the stored contiguous sequences of N-symbols include at least four symbols.
-
37. The apparatus of claim 36, wherein the extracted sequences of N-symbols and the stored contiguous sequences of N-symbols are of a fixed length.
-
38. The apparatus of claim 29, wherein the extracted sequences of N-symbols and the stored contiguous sequences of N-symbols include at least one of a one, two, three, four, five and six symbols.
-
39. The apparatus of claim 38, wherein the extracted sequences of N-symbols and the stored contiguous sequences of N-symbols are of the same fixed length.
-
40. The apparatus of claim 29, wherein the lengths of the extracted contiguous sequences of N-symbols and stored contiguous sequences of N-symbols are at least one of a fixed value, a range of values and a subset of the range of values.
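Claims 36 to 40 allow the N-gram length to be a fixed value, a range of values, or a subset of a range. The sketch below shows one way a single extractor could cover all three cases. It assumes symbols are characters (the claims do not say whether symbols are letters, phonemes or something else), and the name `extract_ngrams` is hypothetical.

```python
from typing import Iterable, Set


def extract_ngrams(symbols: str, lengths: Iterable[int] = (4,)) -> Set[str]:
    """Return every contiguous run of n symbols, for each n in lengths."""
    grams: Set[str] = set()
    for n in lengths:
        if n <= 0:
            raise ValueError("N-gram length must be positive")
        # A string shorter than n yields no N-grams of that length.
        for start in range(len(symbols) - n + 1):
            grams.add(symbols[start:start + n])
    return grams


fixed = extract_ngrams("pizza", lengths=(4,))          # fixed length
ranged = extract_ngrams("pizza", lengths=range(2, 4))  # range of lengths: 2 and 3
subset = extract_ngrams("pizza", lengths=(2, 5))       # subset of a range
```

The same `lengths` value would need to be used for both the recognized strings and the stored strings so that extracted and stored N-grams are comparable.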
-
41. The method for processing a user's communication comprising:
-
extracting one or more listings N-gram from each symbol string entry in a listings data base;
mapping one or more particular listings N-gram from the one or more listings N-gram with a list of listings symbol strings that contain the particular listings N-gram;
calculating an elementary second similarity score for each entry in the list of listings symbol strings that contain the particular listings N-gram;
receiving a list of recognized symbol strings of one or more recognized entries and a first similarity score associated with each recognized entry;
extracting from each recognized symbol string one or more recognized N-grams;
matching at least one of the recognized N-grams with at least one particular listings N-gram from the one or more particular listings N-gram mapped to the list of listings symbol strings;
generating a preliminary set of symbol strings and associated second similarity scores, the preliminary set of symbol strings including one or more symbol strings from the list of listings symbol strings mapped to the at least one of the matched particular listings N-gram;
computing a third similarity score associated with the one or more symbol strings included in the preliminary set of symbol strings; and
outputting a refined set of symbol strings from the preliminary set of symbol strings based on the computed third similarity score.
calculating a listings N-gram frequency score for the one or more mapped particular listings N-gram, wherein the listings N-gram frequency score represents a number of symbol string entries in the listings database in which the particular N-gram appears.
-
-
43. The method of claim 42, further comprising:
calculating an listings N-gram frequency ratio for the one or more mapped particular listings N-gram by dividing the listings N-gram frequency score by a total number of symbol string entries in the listings database.
-
44. The method of claim 43, wherein the elementary second similarity score for each N-gram is based on the calculated listings N-gram frequency ratio and the associated second similarity scores for symbol strings are calculated based on corresponding elementary second similarity scores for N-grams from the recognized symbol strings contained in these symbol strings.
-
45. The method of claim 41, wherein the preliminary set of symbol strings is generated based on an established threshold limit of the associated second similarity scores.
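Claims 44 and 45 can be read together as: add up the elementary second similarity scores of the matched N-grams for each listing, then keep only listings that clear a threshold. The sketch below follows that reading. Summation as the combining rule, the "greater than or equal" comparison, the name `preliminary_set` and the hand-written `toy_map` values are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# N-gram -> (indices of listings containing it, elementary second similarity score)
NgramMap = Dict[str, Tuple[List[int], float]]


def preliminary_set(recognized_ngrams: Iterable[str],
                    ngram_map: NgramMap,
                    threshold: float) -> Dict[int, float]:
    """Return {listing index: second similarity score} for listings at or above threshold."""
    scores: Dict[int, float] = defaultdict(float)
    for gram in set(recognized_ngrams):
        if gram not in ngram_map:
            continue  # recognized N-gram with no match in the listings
        listing_ids, elementary = ngram_map[gram]
        for listing_id in listing_ids:
            scores[listing_id] += elementary
    return {lid: score for lid, score in scores.items() if score >= threshold}


toy_map = {"piz": ([0, 2], 0.4), "mai": ([0, 1], 0.4), "bak": ([1], 1.1)}
kept = preliminary_set(["piz", "mai", "xyz"], toy_map, threshold=0.5)
```

In this toy run listing 0 matches two N-grams and survives, while listings 1 and 2 each match only one and fall below the threshold.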
-
46. The method of claim 41, wherein the list of listings symbol strings that contain the particular listings N-gram is a full list.
-
47. The method of claim 41, wherein the list of listings symbol strings that contain the particular listings N-gram is a short list.
-
48. The method of claim 41, further comprising:
calculating a refined frequency score for the one or more recognized N-grams, wherein the refined frequency score represents a number of symbol string entries contained in the preliminary set of symbol strings that contain the recognized N-gram.
-
49. The method of claim 48, further comprising:
calculating a refined N-gram frequency ratio for the one or more recognized N-grams by dividing the refined frequency score by a total number of symbol string entries contained in the preliminary set of symbol strings.
-
50. The method of claim 49, further comprising:
calculating an elementary third similarity score for a recognized N-gram and each entry in the preliminary set of symbol strings that contain the recognized N-gram.
-
51. The method of claim 50, wherein the elementary third similarity score for the recognized N-gram and each entry in the preliminary list of symbol strings that contain the recognized N-gram is based on the calculated refined N-gram frequency ratio and the associated third similarity scores are calculated based on corresponding elementary third similarity scores.
-
52. The method of claim 41, further comprising:
generating a refined set of symbol strings is based on an established refined threshold limit of the associated third similarity scores.
-
53. An apparatus for processing a user's communication comprising:
-
an N-gram map generator extracts one or more listings N-gram from each symbol string entry in a listings database, maps one or more particular listings N-gram from the one or more listings N-gram with a list of listings symbol strings that contain the particular listings N-gram and calculates an elementary second similarity score for each entry in the list of listings symbol strings that contain the particular listings N-gram;
a first matcher receives a list of recognized symbol strings of one or more recognized entries and a first similarity score associated with each recognized entry, extracts from each recognized symbol string one or more recognized N-grams, matches at least one of the recognized N-grams with at least one particular listings N-gram from the one or more particular listings N-gram mapped to the list of listings symbol strings and generates a preliminary set of symbol strings and associated second similarity scores, the preliminary set of symbol strings including one or more symbol strings from the list of listings symbol strings mapped to the at least one of the matched particular listings N-gram; and
a second matcher computes a third similarity score associated with the one or more symbol strings included in the preliminary set of symbol strings and outputs a refined set of symbol strings from the preliminary set of symbol strings based on the computed third similarity score.
an N-gram database stores the one or more listings N-gram and the mapped list of listings symbol strings that contain the particular listings N-gram.
-
-
57. The apparatus of claim 53, the second matcher further calculates a refined frequency score for the one or more recognized N-grams, wherein the refined frequency score represents a number of symbol string entries contained in the preliminary set of symbol strings that contain the recognized N-gram.
-
58. The apparatus of claim 57, the second matcher further calculates a refined N-gram frequency ratio for the one or more recognized N-grams by dividing the refined frequency score by a total number of symbol string entries contained in the preliminary set of symbol strings.
-
59. The apparatus of claim 53, the second matcher further calculates an elementary third similarity score for a recognized N-gram and each entry in the preliminary set of symbol strings that contain the recognized N-gram.
-
60. A machine-readable medium having stored thereon a plurality of executable instructions, the plurality of instructions comprising instructions to:
-
extract one or more listings N-gram from each symbol string entry in a listings database;
map one or more particular listings N-gram from the one or more listings N-gram with a list of listings symbol strings that contain the particular listings N-gram;
calculate an elementary second similarity score for each entry in the list of listings symbol strings that contain the particular listings N-gram;
receive a list of recognized symbol strings of one or more recognized entries and a first similarity score associated with each recognized entry;
extract from each recognized symbol string one or more recognized N-grams;
match at least one of the recognized N-grams with at least one particular listings N-gram from the one or more particular listings N-gram mapped to the list of listings symbol strings;
generate a preliminary set of symbol strings and associated second similarity scores, the preliminary set of symbol strings including one or more symbol strings from the list of listings symbol strings mapped to the at least one of the matched particular listings N-gram;
compute a third similarity score associated with the one or more symbol strings included in the preliminary set of symbol strings; and
output a refined set of symbol strings from the preliminary set of symbol strings based on the computed third similarity score.
calculate a listings N-gram frequency score for the one or more mapped particular listings N-gram, wherein the listings N-gram frequency score represents a number of symbol string entries in the listings database in which the particular N-gram appears.
-
-
62. The machine-readable medium of claim 61 having stored thereon additional executable instructions, the additional instructions comprising instructions to:
calculate a listings N-gram frequency ratio for the one or more mapped particular listings N-gram by dividing the listings N-gram frequency score by a total number of symbol string entries in the listings database.
-
63. The machine-readable medium of claim 60 having stored thereon additional executable instructions, the additional instructions comprising instructions to:
calculate a refined frequency score for the one or more recognized N-grams, wherein the refined frequency score represents a number of symbol string entries contained in the preliminary set of symbol strings that contain the recognized N-gram.
-
64. The machine-readable medium of claim 63 having stored thereon additional executable instructions, the additional instructions comprising instructions to:
calculate a refined N-gram frequency ratio for the one or more recognized N-grams by dividing the refined frequency score by a total number of symbol string entries contained in the preliminary set of symbol strings.
-
65. The machine-readable medium of claim 64 having stored thereon additional executable instructions, the additional instructions comprising instructions to:
calculate an elementary third similarity score for a recognized N-gram and each entry in the preliminary set of symbol strings that contain the recognized N-gram.
-
66. The machine-readable medium of claim 60 having stored thereon additional executable instructions, the additional instructions comprising instructions to:
generate a refined set of symbol strings is based on an established refined threshold limit of the associated third similarity scores.
Specification