System, method and apparatus for conducting a phrase search
Abstract
A phrase search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more sequences of terms. Next, a relational model of the query is created. The relational model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.
188 Citations
66 Claims
-
1. A method of searching a database comprising:
-
providing a plurality of relational models wherein each of the plurality of relational models includes a relational model of at least one subset of a database and a plurality of relations, wherein each of the plurality of relations includes at least one subset term pair and a subset plurality of types of relational summation metrics (RSMs) that include a summation of values of the corresponding type of relational metric of occurrences of the at least one subset term pair within at least one context window within the at least one subset and includes at least one of a right contextual metric (RCM) and a left contextual metric (LCM);
inputting a first query for the database;
creating a relational model of the first query, wherein the relational model of the first query includes at least one first query relation, each of the first query relations having a first query term pair and a first query plurality of types of relational summation metrics;
comparing the relational model of the first query to each one of the plurality of relational models of the subsets; and
outputting at least one identifier of the subsets relevant to the first query. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32)
T1 is a first term in said term pair;
T2 is a second term in said term pair;
C is equal to a number of terms in said context window; and
N is equal to a number of terms occurring between T1 and T2.
-
-
9. The method as recited in claim 1, further comprising
providing a value RCM(T1, T2) for said RCM for a single occurrence of a term pair (T1, T2) in said subset wherein: -
T1 is a first term in said term pair;
T2 is a second term in said term pair;
RCM(T1, T2)=0, if T2 precedes T1; and
RCM(T1, T2)=C−1−N, if T1 precedes T2, wherein;
C is equal to a number of terms in said context window; and
N is equal to a number of terms occurring between T1 and T2.
-
-
10. The method as recited in claim 1, further comprising providing a value LCM(T1, T2) for said LCM for a single occurrence of a term pair (T1, T2) in said subset wherein:
-
T1 is a first term in said term pair;
T2 is a second term in said term pair;
LCM(T1, T2)=0, if T2 follows T1; and
LCM(T1, T2)=C−1−N, if T1 follows T2, wherein;
C is equal to a number of terms in said context window; and
N is equal to a number of terms occurring between T1 and T2.
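The per-occurrence definitions in claims 9 and 10 can be illustrated with a minimal Python sketch. The function names `rcm` and `lcm`, the use of zero-based term positions, and the clamping to zero for pairs that fall outside the context window are assumptions made for illustration; the claims only state the values for a pair within a window.

```python
def rcm(pos_t1: int, pos_t2: int, window: int) -> int:
    """Right contextual metric for one occurrence of (T1, T2).

    pos_t1 and pos_t2 are zero-based term positions (an assumption);
    window is C, the number of terms in the context window.
    """
    if pos_t2 <= pos_t1:              # T2 precedes T1: RCM is 0
        return 0
    n_between = pos_t2 - pos_t1 - 1   # N, terms occurring between T1 and T2
    # Clamping at 0 for pairs outside the window is an assumption.
    return max(window - 1 - n_between, 0)


def lcm(pos_t1: int, pos_t2: int, window: int) -> int:
    """Left contextual metric for one occurrence of (T1, T2)."""
    if pos_t2 >= pos_t1:              # T2 follows T1: LCM is 0
        return 0
    n_between = pos_t1 - pos_t2 - 1   # N, terms occurring between T2 and T1
    return max(window - 1 - n_between, 0)


terms = "the quick brown fox".split()
i, j = terms.index("quick"), terms.index("fox")
print(rcm(i, j, window=5), lcm(i, j, window=5))  # C=5, N=1: prints "3 0"
```

With C=5, adjacent terms (N=0) receive the largest value, 4, and the value falls by one for each intervening term.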
-
-
11. The method as recited in claim 1, further comprising including in said types of relational metrics a directional contextual metric (DCM).
-
12. The method as recited in claim 11, further comprising providing at least one directional contextual metric (DCM) among said types of relational metrics, wherein:
-
the DCM for a single occurrence of a term pair (T1, T2) in the subset has a value DCM(T1, T2), wherein;
T1 is a first term in said term pair;
T2 is a second term in said term pair;
DCM(T1, T2)=RCM(T1, T2)−LCM(T1, T2), wherein;
RCM(T1, T2) is a right contextual metric for a single occurrence of said term pair (T1, T2) in said subset;
LCM(T1, T2) is a left contextual metric for a single occurrence of said term pair (T1, T2) in said subset; and
RCM(T1, T2)≧LCM(T1, T2).
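As a rough illustration of how per-occurrence values might be summed into relational summation metrics (RSMs) and a DCM derived from them, consider the Python sketch below. The function name `build_model`, the dictionary layout, the alphabetical ordering used while accumulating, and the final reordering that keeps RCM at least as large as LCM are all assumptions for illustration, not the patented implementation.

```python
from collections import defaultdict


def build_model(tokens, window):
    """Sum per-occurrence RCM/LCM values into RSMs keyed by term pair."""
    sums = defaultdict(lambda: {"RCM": 0, "LCM": 0})
    for i, earlier in enumerate(tokens):
        # Only later terms that share a context window of `window` terms.
        for j in range(i + 1, min(i + window, len(tokens))):
            later = tokens[j]
            if earlier == later:
                continue                       # skip a term paired with itself
            value = window - 1 - (j - i - 1)   # C - 1 - N
            if earlier <= later:               # alphabetical key (assumption)
                sums[(earlier, later)]["RCM"] += value   # T1 precedes T2
            else:
                sums[(later, earlier)]["LCM"] += value   # T1 follows T2
    model = {}
    for (t1, t2), m in sums.items():
        if m["LCM"] > m["RCM"]:
            # Reverse the pair and exchange RCM and LCM so that RCM >= LCM.
            t1, t2 = t2, t1
            m = {"RCM": m["LCM"], "LCM": m["RCM"]}
        model[(t1, t2)] = {**m, "DCM": m["RCM"] - m["LCM"]}
    return model


model = build_model(["red", "car", "red", "car", "fast"], window=3)
print(model[("red", "car")])  # {'RCM': 4, 'LCM': 2, 'DCM': 2}
```

In this example "red" immediately precedes "car" twice (2 + 2) and follows it once (2), so the summed DCM for the pair is 2.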
-
-
13. The method as recited in claim 1, further comprising providing said context window having a window size that is a function of an average sentence length.
-
14. The method as recited in claim 1, further comprising providing said context window having a window size that is a function of an average paragraph length.
-
15. The method as recited in claim 1, further comprising providing said context window having a window size that is a pre-selected number of terms.
-
16. The method as recited in claim 1, further comprising:
-
providing a relation threshold value for a selected one of said subset plurality of types of RSMs; and
eliminating all relations having a value of said selected type of RSM less than the relation threshold value.
-
-
17. The method as recited in claim 1, further comprising:
-
selecting one of said subset plurality of types of RSMs; and
selecting a pre-selected number of relations having a greatest value of the selected type of RSM from at least one of said plurality of relational models of said subsets.
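The two model-reduction steps of claims 16 and 17 can be sketched in Python as follows. The representation of a relational model as a dictionary from term pairs to RSM dictionaries, and the function names `prune_by_threshold` and `keep_top_n`, are assumptions made for illustration.

```python
def prune_by_threshold(model, metric, threshold):
    """Drop every relation whose selected RSM is below the threshold."""
    return {pair: rsms for pair, rsms in model.items()
            if rsms[metric] >= threshold}


def keep_top_n(model, metric, n):
    """Keep the n relations with the greatest value of the selected RSM."""
    ranked = sorted(model.items(),
                    key=lambda item: item[1][metric],
                    reverse=True)
    return dict(ranked[:n])


model = {
    ("engine", "fire"): {"RCM": 5},
    ("engine", "left"): {"RCM": 2},
    ("fire", "smoke"): {"RCM": 9},
}
print(sorted(prune_by_threshold(model, "RCM", 5)))
print(list(keep_top_n(model, "RCM", 1)))
```

Either reduction shrinks a model before comparison; which RSM type to select and what threshold or count to use are left open by the claims.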
-
-
18. The method as recited in claim 1, further comprising choosing said first query to include one or more query fields.
-
19. The method as recited in claim 18, further comprising creating a relational model of said first query by a process comprising:
creating one or more relational models of said one or more query fields wherein, each of said one or more relational models of said one or more query fields includes at least one relational model of one of said one or more query fields in said first query, wherein each one of said one or more relational models of said one or more query fields has one or more relations; and
combining said one or more relational models of said one or more query fields in said first query into a first query relational model.
-
20. The method as recited in claim 1, further comprising providing one or more stopterms, wherein, if neither a first term in said first query term pair nor a second term in said first query term pair is one of the one or more stopterms said RSMs are increased.
-
21. The method as recited in claim 1, further comprising, providing one or more stopterms, wherein, if both a first term in said first query term pair and a second term in said first query term pair are included in the one or more stopterms said RSMs are decreased.
-
22. The method as recited in claim 1, further comprising, providing one or more stopterms, wherein, if either but not both a first term in said first query term pair and a second term in said first query term pair is one of the one or more stopterms said RSMs are unchanged.
-
23. The method as recited in claim 1, further comprising providing one or more emphasis terms, wherein, if neither a first term in said first query term pair nor a second term in said first query term pair is one of the one or more emphasis terms said RSMs are decreased.
-
24. The method as recited in claim 1, further comprising providing one or more emphasis terms, wherein, if both a first term in said first query term pair and a second term in said first query term pair are included in the one or more emphasis terms said RSMs are increased.
-
25. The method as recited in claim 1, further comprising providing one or more emphasis terms, wherein, if either but not both a first term in said first query term pair and a second term in said first query term pair is one of the one or more emphasis terms said RSMs are unchanged.
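Claims 20 through 25 describe how stopterms and emphasis terms raise, lower, or leave unchanged the RSMs of a query relation. The Python sketch below is one possible reading; the function name `adjust_rsms` and the multiplicative `boost` and `damp` factors are hypothetical, since the claims do not say by how much the RSMs change.

```python
def adjust_rsms(pair, rsms, stopterms=frozenset(), emphasis=frozenset(),
                boost=2.0, damp=0.5):
    """Scale a relation's RSMs by stopterm and emphasis-term membership.

    boost and damp are illustrative values, not taken from the claims.
    """
    stop_hits = sum(term in stopterms for term in pair)
    emph_hits = sum(term in emphasis for term in pair)
    factor = 1.0
    if stopterms:
        if stop_hits == 0:
            factor *= boost   # neither term is a stopterm: increased
        elif stop_hits == 2:
            factor *= damp    # both terms are stopterms: decreased
    if emphasis:
        if emph_hits == 0:
            factor *= damp    # neither term is emphasized: decreased
        elif emph_hits == 2:
            factor *= boost   # both terms are emphasized: increased
    # Exactly one hit in either list leaves the RSMs unchanged.
    return {name: value * factor for name, value in rsms.items()}


stop = {"the", "of"}
print(adjust_rsms(("big", "dog"), {"RCM": 4}, stopterms=stop))  # {'RCM': 8.0}
print(adjust_rsms(("of", "the"), {"RCM": 4}, stopterms=stop))   # {'RCM': 2.0}
print(adjust_rsms(("the", "dog"), {"RCM": 4}, stopterms=stop))  # {'RCM': 4.0}
```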
-
26. The method as recited in claim 1, further comprising:
-
providing one or more stop relations, wherein each of the stop relations includes a first term and a second term and a plurality of types of relational metrics; and
eliminating the one or more stop relations from the relational model of said first query.
-
-
27. The method as recited in claim 1, further comprising inputting said first query by a process comprising transforming the first query.
-
28. The method as recited in claim 27, further comprising transforming said first query by a process comprising at least one of a group of processes consisting of:
-
not changing said first query; and
replacing a selected portion of said first query with an alternate portion from a substitution list.
-
-
29. The method as recited in claim 28, further comprising cross referencing said alternate portion to said selected portion of said first query in a look-up table.
-
30. The method as recited in claim 29, further comprising choosing said look-up table to comprise:
-
one or more non-empty hash chains, wherein each of the one or more non-empty hash chains corresponds to a first section of said selected portion of said first query and each of the one or more hash chains has one or more phrases, each phrase consisting of one or more of said terms, beginning with the first section of said selected portion of said first query; and
one or more alternate portions, wherein each one of the one or more alternate portions corresponds to one of the one or more phrases.
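The look-up table of claims 28 through 30 can be approximated in Python with a dictionary keyed by the first term of each phrase, where each entry plays the role of a hash chain of phrases and their alternate portions. The names `build_table` and `transform`, whitespace tokenization, and the longest-phrase-first matching rule are assumptions for illustration.

```python
def build_table(substitutions):
    """Group phrases into chains keyed by their first term."""
    table = {}
    for phrase, alternate in substitutions.items():
        terms = tuple(phrase.split())
        table.setdefault(terms[0], []).append((terms, alternate))
    for chain in table.values():
        # Try longer phrases first (an assumption, not stated in the claims).
        chain.sort(key=lambda entry: len(entry[0]), reverse=True)
    return table


def transform(query, table):
    """Replace each matching phrase in the query with its alternate portion."""
    terms, out, i = query.split(), [], 0
    while i < len(terms):
        for phrase, alternate in table.get(terms[i], []):
            if tuple(terms[i:i + len(phrase)]) == phrase:
                out.extend(alternate.split())
                i += len(phrase)
                break
        else:
            out.append(terms[i])   # no substitution: term is left unchanged
            i += 1
    return " ".join(out)


table = build_table({"engine failure": "eng fail", "engine": "eng"})
print(transform("left engine failure reported", table))
# left eng fail reported
```

A query with no matching phrase passes through unchanged, which corresponds to the "not changing said first query" branch of claim 28.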
-
-
31. The method as recited in claim 1 further comprising choosing at least one identifier of said subsets to correspond to at least one subset of said database.
-
32. The method as recited in claim 1 further comprising choosing said database to include at least one of a group consisting of:
- text, narratives, reports, literature, punctuation, messages, electronic mail, internet text, web site information, linguistic patterns, grammatical tags, alphabetic data, alphabetic strings, numeric data, numeric strings, alphanumeric data, alphanumeric strings, sound, music, voice, audio data, audio encoding, vocal encoding, biological information, biological data, biological representations, biological analogs, medical information, medical data, medical representations, medical sequences, medical patterns, genetic sequences, genetic representations, genetic analogs, protein sequences, protein representations, protein analogs, computer software, computer hardware, computer firmware, computer input, computer internal information, computer output, computer representations, computer analogs, sequential symbols, sequential data, sequential items, sequential objects, sequential events, sequential causes, sequential time spans, sequential actions, sequential attributes, sequential entities, sequential relations, sequential representations, patterned symbols, patterned data, patterned items, patterned objects, patterned events, patterned causes, patterned time spans, patterned actions, patterned attributes, patterned entities, patterned relations, and patterned representations.
-
33. A method of searching a database comprising:
-
providing a plurality of relational models wherein each of the plurality of relational models includes one relational model of at least one subset of a database;
inputting a first query, having one or more query fields, for the database;
creating a relational model of the first query, wherein the relational model of the first query includes at least one relation, each relation having a first query term pair and a first query plurality of types of relational summation metrics, by a process comprising;
creating one or more relational models of the one or more query fields wherein each of said one or more relational models of the one or more query fields includes at least one relational model of one of the one or more query fields in the first query, wherein each of the one or more relational models of the one or more query fields has one or more relations; and
combining the one or more relational models of the one or more query fields in the first query into a first query relational model by a process comprising;
analyzing a first one of the one or more relational models of the one or more query fields including;
determining if a first relation from the first one of the one or more relational models of the one or more query fields is included in the first query model by a process comprising;
selecting a first relation from the first one of the one or more relational models of the one or more query fields, wherein the selected first relation includes a first term pair;
determining if the first term pair is included in one of the one or more relations in the first query model;
when the first term pair is not included in one of the one or more relations in the first query model, then including the selected first relation in the first query model; and
when the first term pair is included in one of the one or more relations in the first query model, comparing a first order of the first term pair in the selected first relation with a second order of the first term pair in the relation from the first query model containing the first term pair;
when the first order and the second order are the same, combining a plurality of types of Relational Summation Metrics (RSMs) of the selected first relation in the first query field model, with a corresponding plurality of types of RSMs of the relation containing the first term pair in the first query model; and
when the first order and the second order are not the same, reversing the order of the term pair in the selected first relation and exchanging a right directional RSM of the selected first relation with a left directional RSM of the selected first relation; and
combining a plurality of types of RSMs of the selected first relation in the first query field model, with a corresponding plurality of types of RSMs of the relation containing the first term pair in the first query model; and
determining if a subsequent relation from the first one of the one or more relational models of the one or more query fields is included in the first query model; and
analyzing a subsequent one of the one or more relational models of the one or more query fields; and
comparing the relational model of the first query to each one of the plurality of relational models of the subsets; and
calculating a plurality of first relevance metric values corresponding to each of the subsets; and
outputting at least one identifier of the subsets relevant to the first query. - View Dependent Claims (34, 35, 36, 37)
selecting one of said plurality of types of RSMs;
selecting said relation from either of said first query field model or said first query model, wherein said selected relation has a greatest magnitude of said selected type of RSM; and
replacing said relation containing said first term pair in said first query model with the selected relation.
-
-
35. The method as recited in claim 33 further comprising selecting one of said plurality of types of said relevance metrics to include at least one of a group consisting of:
-
a combination of types of said relevance metrics;
a weighted sum of types of said relevance metrics; and
a weighted product of types of said relevance metrics.
-
-
36. The method as recited in claim 33, wherein combining said plurality of types of RSMs of said selected first relation in said query field model with said corresponding plurality of types of RSMs of said relation containing said first term pair in said first query model includes:
-
calculating a summation of values of said corresponding plurality of types of RSMs from said relation containing the first term pair in both said first query field model and said first query model; and
replacing said plurality of types of RSMs for the relation containing said first term pair in said first query model with the summation of values of said corresponding plurality of types of RSMs.
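The merging of a query field model into the query model (claim 33), with summation as the combining rule (claim 36), might look like the following Python sketch. The function name `merge_field_model`, the dictionary layout, and the restriction to the two directional metrics `RCM` and `LCM` are assumptions for illustration.

```python
def merge_field_model(query_model, field_model):
    """Fold one query field model into the query model by summing RSMs."""
    for (t1, t2), rsms in field_model.items():
        if (t1, t2) in query_model:
            target, incoming = (t1, t2), rsms          # same order
        elif (t2, t1) in query_model:
            # Opposite order: reverse the pair and exchange RCM with LCM.
            target = (t2, t1)
            incoming = {"RCM": rsms["LCM"], "LCM": rsms["RCM"]}
        else:
            query_model[(t1, t2)] = dict(rsms)          # new term pair
            continue
        existing = query_model[target]
        query_model[target] = {name: existing[name] + incoming[name]
                               for name in existing}
    return query_model


query_model = {("engine", "fire"): {"RCM": 3, "LCM": 1}}
field_model = {
    ("fire", "engine"): {"RCM": 2, "LCM": 5},
    ("smoke", "cabin"): {"RCM": 4, "LCM": 0},
}
print(merge_field_model(query_model, field_model))
```

Here the reversed pair ("fire", "engine") contributes its LCM of 5 to the RCM of ("engine", "fire"), giving RCM 8 and LCM 3, while ("smoke", "cabin") is added as a new relation.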
-
-
37. The method as recited in claim 33, further comprising:
-
selecting at least one of said one or more query fields in said first query; and
assigning a weight to the selected query field, wherein each one of said plurality of types of RSMs corresponding to the selected query field is scaled by a factor determined by the weight.
-
-
38. A method of searching a database comprising:
-
providing a plurality of relational models wherein each of the plurality of relational models includes one relational model of at least one subset of a database;
inputting a first query, having one or more query fields, for the database;
creating a relational model of the first query, wherein the relational model of the query includes at least one relation having a first query term pair and a first query plurality of types of relational summation metrics, by a process comprising;
creating one or more relational models of said one or more query fields wherein each of said one or more relational models of said one or more query fields includes at least one relational model of one of said one or more query fields in the first query, wherein each of said one or more relational models of said one or more query fields has one or more relations; and
calculating for each one of the one or more relations in each one of the one or more relational models of the one or more first query fields a summation of values of each of the corresponding types of the relational metrics of each one of one or more occurrences of a first query term pair within the query field, wherein, the plurality of types of the relational metrics include at least one of a non-directional contextual metric (NDCM), a right contextual metric (RCM), a left contextual metric (LCM), and a directional contextual metric (DCM); and
combining the one or more relational models of the one or more query fields in the first query into a first query relational model; and
comparing the relational model of the first query to each one of the plurality of relational models of the subsets; and
calculating a plurality of first relevance metric values corresponding to each of the subsets;
outputting at least one identifier of the subsets relevant to the first query.
-
-
39. A method of searching a database comprising:
-
providing a plurality of relational models wherein each of the plurality of relational models includes one relational model of at least one subset of a database;
inputting a first query, having one or more query fields, for the database;
creating a relational model of the first query, wherein the relational model of the first query includes at least one first query relation having a first query term pair and a first query plurality of types of relational summation metrics, by a process comprising;
creating one or more relational models of the one or more query fields wherein each of said one or more relational models of the one or more query fields includes at least one relational model of one of the one or more query fields in the first query, wherein each of the one or more relational models of the one or more query fields has one or more relations; and
combining the one or more relational models of the one or more query fields in the first query into a first query relational model;
comparing the relational model of the first query to each one of the plurality of relational models of the subsets by a process comprising;
calculating a plurality of first relevance metrics for a first one of said plurality of relational models of said subsets by a process comprising;
determining an intersection model of said relational model of said first query and a first one of said plurality of relational models of said subsets by a process comprising;
determining one or more intersection relations, wherein each of the intersection relations has;
a shared term pair that includes a term pair present in at least one relation in each one of said first query relational model and the first one of said plurality of the relational models of said subsets; and
a plurality of intersection metrics (IM), wherein each IM is a function fct(RSMQ1, RSMS1), wherein;
RSMQ1 is a type of Relational Summation Metric (RSM) in the relational model of said first query; and
RSMS1 is a corresponding type of said RSM in the relational model of the first one of said plurality of relational models of said subsets; and
calculating a first relevance metric for each of the plurality of types of said RSMs equal to a function of the plurality of corresponding IMs of all intersection relations; and
determining a subsequent plurality of first relevance metrics corresponding to each subsequent one of said plurality of relational models of said subsets; and
outputting a first list of one or more identifiers of the subsets relevant to the first query, wherein each identifier has a corresponding type of first relevance metric for each of the plurality of types of said RSM. - View Dependent Claims (40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66)
determining a first order of said shared term pair in said first query relational model and a second order of said shared term pair in said first one of said plurality of the relational models of said subsets; and
reversing the second order and exchanging an RCM and an LCM of the subset relation having said shared term pair in the first one of said plurality of the relational models of said subsets, when the first order and second order are not equal.
-
-
41. The method as recited in claim 39, further comprising choosing said function fct(RSMQ1, RSMS1)=(RSMQ1)*(RSMS1).
-
42. The method as recited in claim 39, further comprising applying a scale factor to said fct(RSMQ1, RSMS1).
-
43. The method as recited in claim 39, further comprising choosing said function of said plurality of corresponding IMs of all intersection relations to include a summation of said plurality of corresponding IMs of all intersection relations.
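Claims 39 through 43 can be read together as a scoring loop: find shared term pairs, form an intersection metric per RSM type as a product (claim 41), and sum the intersection metrics into a relevance metric (claim 43). The Python sketch below follows that reading; the function name `relevance`, the dictionary layout, and the restriction to `RCM` and `LCM` are assumptions for illustration.

```python
def relevance(query_model, subset_model, metrics=("RCM", "LCM")):
    """Sum product-form intersection metrics over all shared term pairs."""
    totals = {name: 0 for name in metrics}
    for (t1, t2), q in query_model.items():
        s = subset_model.get((t1, t2))
        if s is None and (t2, t1) in subset_model:
            # Orders differ: reverse the subset relation and exchange
            # its RCM and LCM before comparing (as in claim 40).
            rev = subset_model[(t2, t1)]
            s = {"RCM": rev["LCM"], "LCM": rev["RCM"]}
        if s is None:
            continue                           # term pair is not shared
        for name in metrics:
            totals[name] += q[name] * s[name]  # IM = RSMQ1 * RSMS1
    return totals


query = {("engine", "fire"): {"RCM": 2, "LCM": 0},
         ("left", "engine"): {"RCM": 2, "LCM": 0}}
subset = {("fire", "engine"): {"RCM": 1, "LCM": 6},
          ("cabin", "smoke"): {"RCM": 4, "LCM": 0}}
print(relevance(query, subset))  # {'RCM': 12, 'LCM': 0}
```

Running this function against every subset model and sorting subset identifiers by one of the returned values gives a ranked output list of the kind recited in claim 51.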
-
44. The method as recited in claim 39, further comprising choosing said function of said plurality of corresponding IMs of all intersection relations to include a summation of values of each of said plurality of types of RSMQ1 in all of said one or more first query relations having said shared term pair included in said one or more intersection relations.
-
45. The method as recited in claim 39, further comprising calculating said plurality of first relevance metrics for said first one of said plurality of relational models of said subsets by a process comprising assigning a zero relevance to the first one of the plurality of subsets if all term pairs of said relational model of said first query are not included in said relational model of the first subset.
-
46. The method as recited in claim 39, wherein determining said intersection model further comprises:
applying a scaling factor to said function of said plurality of corresponding intersection metrics.
-
47. The method as recited in claim 46, further comprising choosing said scaling factor to be a subset emphasis factor (SEF)=SS/R, wherein SS is equal to a summation of values of a selected said type of relational summation metric (RSM) from all subset relations having one of said shared term pairs in the first one of said plurality of relational models of said subsets and R is equal to a summation of values of the selected said type of relational summation metric (RSM) in all of said subset relations in the first one of said plurality of relational models of said subsets.
-
48. The method as recited in claim 46, further comprising choosing said scaling factor to be a query emphasis factor (QEF)=SQ/Q, wherein SQ is equal to a summation of values of a selected said type of relational summation metric (RSM) from all of said query relations having one of said shared term pairs in said relational model of said first query and Q is equal to a summation of values of the selected said type of relational summation metric (RSM) in all of said query relations in said relational model of said first query.
-
49. The method as recited in claim 46, further comprising choosing said scaling factor to be a length emphasis factor (LEF)=LS/T, wherein LS is equal to a number of terms in said subset and T is equal to a number greater than a number of terms in a largest subset of said database.
-
50. The method as recited in claim 46, further comprising choosing said scaling factor to be an alternate length emphasis factor (LEFalt)=Lcap/T, wherein, Lcap is equal to the lesser of either a number of terms in said subset or an average number of terms in each one of said plurality of subsets, and T is equal to a number greater than a number of terms in a largest subset of said database.
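The four scaling factors of claims 47 through 50 reduce to two small computations, sketched below in Python. The function names `emphasis_factor` and `length_factor` are hypothetical, term pairs are assumed to be stored in a common order in both models, and returning 0.0 for an empty model is an assumption.

```python
def emphasis_factor(model, shared_pairs, metric):
    """SEF = SS/R for a subset model, or QEF = SQ/Q for a query model."""
    total = sum(rsms[metric] for rsms in model.values())
    shared = sum(rsms[metric] for pair, rsms in model.items()
                 if pair in shared_pairs)
    return shared / total if total else 0.0


def length_factor(subset_len, t, average_len=None):
    """LEF = LS/T, or LEFalt = Lcap/T when average_len is given.

    t must exceed the number of terms in the largest subset.
    """
    capped = subset_len if average_len is None else min(subset_len, average_len)
    return capped / t


subset = {("engine", "fire"): {"RCM": 6}, ("cabin", "smoke"): {"RCM": 2}}
shared = {("engine", "fire")}
print(emphasis_factor(subset, shared, "RCM"))   # 0.75
print(length_factor(50, 200))                   # 0.25
print(length_factor(50, 200, average_len=40))   # 0.2
```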
-
51. The method as recited in claim 39, wherein outputting said at least one identifier of said subsets relevant to said first query comprises:
-
outputting a plurality of types of said relevance metrics corresponding to each of said subsets;
selecting one of said plurality of types of said relevance metrics; and
sorting identifiers of said subsets in order of magnitude of the selected one of said plurality of types of relevance metrics.
-
-
52. The method as recited in claim 51, further comprising choosing said selected one of said plurality of types of relevance metrics to include at least one of a group consisting of:
-
a combination of types of said relevance metrics;
a weighted sum of types of said relevance metrics; and
a weighted product of types of said relevance metrics.
-
-
53. The method as recited in claim 51, further comprising normalizing each of said plurality of corresponding intersection metrics of all of said intersection relations.
-
54. The method as recited in claim 51, further comprising outputting said relational model of said first query.
-
55. The method as recited in claim 51, further comprising displaying a pre-selected number of said subsets in order of magnitude of said selected type of relevance metric.
-
56. The method as recited in claim 55, further comprising highlighting one or more said shared term pairs in each of said subsets relevant to said first query, wherein the terms within each of said one or more shared term pairs occur within at least one context window in the subset.
-
57. The method as recited in claim 56, further comprising choosing one or more of said shared term pairs to consist of said shared term pairs having a greatest magnitude of a selected type of said relevance metric.
-
58. The method as recited in claim 51, further comprising displaying one or more of said shared term pairs that are included in each one of said subsets relevant to said first query, wherein terms within each one of said one or more shared term pairs occur within at least one context window in the subset.
-
59. The method as recited in claim 58, further comprising determining a typical order of each of said shared term pairs by a process comprising:
-
comparing a magnitude of an RCM of said shared term pair to a magnitude of an LCM of said shared term pair;
when the RCM is larger, said shared term pair is in typical order; and
when the LCM is larger, reversing an order of said shared term pair and exchanging the RCM and the LCM.
-
-
60. The method as recited in claim 58, further comprising, for each one of said one or more shared term pairs, displaying a feedback metric of the query (FBMQ1) equal to a combination of an LCMQ1 and an RCMQ1 and displaying a feedback metric of a subset (FBMS1) equal to a combination of an LCMS1 and an RCMS1 and a product equal to wherein the LCMQ1 is equal to a left contextual metric of said shared term pair in said query, the RCMQ1 is equal to a right contextual metric of said shared term pair in said query, LCMS1 is equal to a left contextual metric of said shared term pair in said subset and the RCMS1 is equal to a right contextual metric of said shared term pair in said subset.
-
61. The method as recited in claim 58, further comprising choosing said plurality of said shared term pairs to consist of one or more of said shared term pairs having a greatest magnitude of a selected type of said relevance metric.
-
62. The method as recited in claim 39, further comprising:
-
inputting a second query;
creating a relational model of the second query;
comparing the relational model of the second query to each one of said plurality of relational models of said subsets;
outputting a second list of one or more identifiers of said subsets relevant to the second query; and
determining a plurality of combined relevance metric values by combining, for each of said types of RSM, values of a second plurality of said relevance metrics for the second query with values of said first plurality of relevance metrics for said first query.
-
-
63. A method as recited in claim 62, further comprising determining a third list of one or more identifiers of said subsets to consist of at least one identifier of said subsets present in both of said first and second lists of one or more identifiers of said subsets, wherein, for a selected one of said types of RSM, said combined relevance metric values are greater than zero for each of the identifiers included in the third list of identifiers.
-
64. A method as recited in claim 63 further comprising computing at least one of said combined relevance metric values by a process comprising calculating a product of values of a first type of first relevance metric and a first type of a second relevance metric.
-
65. A method as recited in claim 62, further comprising determining a third list of one or more identifiers of said subsets to consist of at least one identifier of said subsets present in either or both of said first and second lists of one or more identifiers of said subsets, wherein, for a selected one of said types of RSM, said combined relevance metric values are greater than zero for each of the identifiers included in the third list of identifiers.
-
66. A method as recited in claim 65, further comprising computing at least one of said combined relevance metric values by a process comprising calculating a summation of values of a first type of first relevance metric and a first type of a second relevance metric.
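Claims 62 through 66 combine the results of two queries. One way to picture this, sketched in Python below, treats each result list as a dictionary from subset identifier to a relevance value for one selected RSM type. The function names `combine_and` and `combine_or` and the dictionary representation are assumptions for illustration.

```python
def combine_and(first, second):
    """Identifiers present in both lists, combined by product (claims 63, 64)."""
    combined = {doc: first[doc] * second[doc]
                for doc in first.keys() & second.keys()}
    return {doc: value for doc, value in combined.items() if value > 0}


def combine_or(first, second):
    """Identifiers present in either list, combined by summation (claims 65, 66)."""
    combined = {doc: first.get(doc, 0) + second.get(doc, 0)
                for doc in first.keys() | second.keys()}
    return {doc: value for doc, value in combined.items() if value > 0}


first = {"doc1": 12, "doc2": 3}
second = {"doc2": 4, "doc3": 5}
print(combine_and(first, second))          # {'doc2': 12}
print(sorted(combine_or(first, second)))   # ['doc1', 'doc2', 'doc3']
```

The product keeps only subsets that score for both queries, while the summation keeps any subset that scores for at least one of them.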
Specification