System, method and apparatus for conducting a keyterm search
Abstract
A keyterm search is a method of searching a database for subsets of the database that are relevant to an input query. First, a number of relational models of subsets of a database are provided. A query is then input. The query can include one or more keyterms. Next, a gleaning model of the query is created. The gleaning model of the query is then compared to each one of the relational models of subsets of the database. The identifiers of the relevant subsets are then output.
207 Citations
65 Claims
-
1. A method of searching a database comprising:
-
providing a plurality of relational models, wherein each of the plurality of relational models includes a relational model of at least one subset of a database and a plurality of relations, wherein each of the plurality of relations includes at least one term pair and one or more types of relational summation metrics (RSMs), each RSM type including a summation of values of the corresponding type of relational metric of occurrences of the at least one term pair within at least one context window within the at least one database subset and includes at least one of a right contextual metric (RCM) and a left contextual metric (LCM);
inputting a first query for the database;
creating a relational model of the first query;
comparing the relational model of the first query to each one of the plurality of relational models of the subsets; and
outputting a first plurality of identifiers of the subsets relevant to the first query.
T1 is a first term in said term pair, T2 is a second term in said term pair;
C is equal to a number of terms in said context window; and
N is equal to a number of terms occurring between T1 and T2.
-
-
7. The method as recited in claim 1, further comprising providing that, where said RCM has a single occurrence of said term pair (T1, T2) in said at least one subset, said RCM has a value RCM(T1, T2), wherein:
-
T1 is a first term in said term pair;
T2 is a second term in said term pair;
RCM(T1, T2)=0, if T2 precedes T1; and
RCM(T1, T2)=C−1−N, if T1 precedes T2, wherein:
C is equal to a number of terms in said context window; and
N is equal to a number of terms occurring between T1 and T2.
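The single-occurrence RCM of claim 7 can be illustrated with a minimal sketch, assuming terms are identified by 0-based positions in the subset. The function name `rcm_single` and the handling of pairs that do not fit in one window are illustrative assumptions, not claim language.

```python
def rcm_single(pos_t1: int, pos_t2: int, window: int) -> int:
    """Right contextual metric for one occurrence of the pair (T1, T2).

    pos_t1 and pos_t2 are 0-based term positions; window is C, the number
    of terms in the context window.
    """
    if pos_t2 <= pos_t1:
        # T2 precedes T1 (or the positions coincide): RCM is 0.
        return 0
    n_between = pos_t2 - pos_t1 - 1  # N, terms strictly between T1 and T2
    if n_between > window - 2:
        # Assumption: a pair that does not fit in one window contributes 0.
        return 0
    return window - 1 - n_between


print(rcm_single(0, 1, 5))  # adjacent terms, C = 5
print(rcm_single(0, 4, 5))  # T2 is the last term of the window
```

With C = 5, adjacent terms score 4 and a pair spanning the whole window scores 1, so closer pairs weigh more.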
-
-
8. The method as recited in claim 1, further comprising providing that, where said LCM has a single occurrence of said term pair (T1, T2) in said at least one subset said LCM has a value LCM(T1, T2), wherein:
-
T1 is a first term in said term pair;
T2 is a second term in said term pair;
LCM(T1, T2)=0, if T2 follows T1; and
LCM(T1, T2)=C−1−N, if T1 follows T2, wherein:
C is equal to a number of terms in said context window; and
N is equal to a number of terms occurring between T1 and T2.
-
-
9. The method as recited in claim 1, further comprising providing for said context window to have a window size that is a function of an average sentence length in said database.
-
10. The method as recited in claim 1, further comprising providing for said context window to have a window size that is a function of an average paragraph length in said database.
-
11. The method as recited in claim 1, further comprising providing for said context window to have a window size that is a pre-selected number of terms.
-
12. The method as recited in claim 1, further comprising:
-
providing a relation threshold value for a selected one of said one or more types of RSMs; and
eliminating all relations having a value of the selected type of said RSM that is less than the relation threshold value.
-
-
13. The method as recited in claim 1, further comprising:
-
selecting one of said one or more types of RSMs;
selecting a pre-selected number of relations having a greatest value of the selected type of RSM from at least one of said plurality of relational models of said subsets.
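The pruning steps of claims 12 and 13 might be sketched as follows. The dictionary layout (term pair mapped to a dictionary of RSM values) and the function names `apply_threshold` and `top_relations` are assumptions made for illustration only.

```python
import heapq


def apply_threshold(relations, rsm_type, threshold):
    """Drop relations whose selected RSM value is below the threshold."""
    return {
        pair: metrics
        for pair, metrics in relations.items()
        if metrics.get(rsm_type, 0) >= threshold
    }


def top_relations(relations, rsm_type, count):
    """Keep the `count` relations with the greatest selected RSM value."""
    best = heapq.nlargest(
        count, relations.items(), key=lambda item: item[1].get(rsm_type, 0)
    )
    return dict(best)


relations = {
    ("engine", "fire"): {"NDCM": 5},
    ("engine", "oil"): {"NDCM": 2},
    ("fire", "warning"): {"NDCM": 9},
}
print(apply_threshold(relations, "NDCM", 5))
print(top_relations(relations, "NDCM", 1))
```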
-
-
14. The method as recited in claim 1, wherein, each one of said plurality of identifiers of subsets corresponds to one of said plurality of said subsets.
-
15. The method as recited in claim 1 further comprising choosing said database to include at least one of a group consisting of:
- text, narratives, reports, literature, punctuation, messages, electronic mail, internet text, web site information, linguistic patterns, grammatical tags, alphabetic data, alphabetic strings, numeric data, numeric strings, alphanumeric data, alphanumeric strings, sound, music, voice, audio data, audio encoding, vocal encoding, biological information, biological data, biological representations, biological analogs, medical information, medical data, medical representations, medical sequences, medical patterns, genetic sequences, genetic representations, genetic analogs, protein sequences, protein representations, protein analogs, computer software, computer hardware, computer firmware, computer input, computer internal information, computer output, computer representations, computer analogs, sequential symbols, sequential data, sequential items, sequential objects, sequential events, sequential causes, sequential time spans, sequential actions, sequential attributes, sequential entities, sequential relations, sequential representations, patterned symbols, patterned data, patterned items, patterned objects, patterned events, patterned causes, patterned time spans, patterned actions, patterned attributes, patterned entities, patterned relations, and patterned representations.
-
16. The method as recited in claim 1, further comprising transforming said first query that is inputted.
-
17. The method as recited in claim 16, wherein said process of transforming said first query comprises at least one of a group of processes consisting of:
-
not changing said first query; and
replacing a selected portion of said first query with an alternative portion from a substitution list.
-
-
18. The method as recited in claim 17 further comprising cross referencing said alternative portion to said selected portion of said first query in a look-up table.
-
19. The method as recited in claim 18, further comprising providing said look-up table with:
-
one or more non-empty hash chains, wherein each of the one or more non-empty hash chains corresponds to a first section of said selected portion of said first query and each of the one or more non-empty hash chains has one or more phrases, each phrase consisting of one or more of said terms, beginning with a first section of said selected portion of said first query; and
one or more alternative portions, wherein each one of the one or more alternative portions corresponds to one of the one or more phrases.
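One possible reading of the look-up table in claims 17 to 19 is sketched below, assuming each hash chain is keyed by the first term of a phrase and stores (phrase, alternative) entries. The name `transform_query`, the longest-match rule, and the sample table are hypothetical choices, not details taken from the claims.

```python
def transform_query(terms, table):
    """Replace known phrases in a query with their alternatives.

    `table` maps a first term to its chain: a list of (phrase, alternative)
    pairs, where phrase and alternative are tuples of terms.
    """
    result = []
    i = 0
    while i < len(terms):
        match = None
        for phrase, alternative in table.get(terms[i], []):
            fits = tuple(terms[i:i + len(phrase)]) == phrase
            # Assumption: when several phrases fit, prefer the longest one.
            if fits and (match is None or len(phrase) > len(match[0])):
                match = (phrase, alternative)
        if match is None:
            result.append(terms[i])  # no substitution: term is left unchanged
            i += 1
        else:
            result.extend(match[1])
            i += len(match[0])
    return result


table = {"landing": [(("landing", "gear"), ("undercarriage",))]}
print(transform_query(["landing", "gear", "failure"], table))
```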
-
-
20. The method as recited in claim 1, wherein providing said plurality of relational models comprises transforming each one of said plurality of said subsets of said database.
-
21. The method as recited in claim 1, wherein creating said relational model of said first query comprises expanding said first query.
-
22. The method as recited in claim 21, further comprising expanding said first query by a process comprising:
-
comparing said first query to a selection of said plurality of models of said subsets of said database;
extracting a plurality of matching relations from said models of said subsets of said database wherein each one of said matching relations comprises;
a term pair comprising;
a term matching a term in said first query; and
a related term; and
one or more types of RSMs, each RSM type including a summation of values of a corresponding type of relational metric of occurrences of said at least one term pair within said subset.
-
-
23. The method as recited in claim 22, further comprising including, in said term matching said term in said first query, at least one of a group of terms consisting of:
-
a term that is identical to at least one term in said first query; and
a term that contains at least one term in said first query.
-
-
24. The method as recited in claim 22, further comprising reducing said plurality of matching relations to a plurality of unique relations.
-
25. The method as recited in claim 24, further comprising reducing said plurality of matching relations to said plurality of unique relations by a process comprising:
-
selecting one of said plurality of matching relations; and
determining if a term pair from the selected matching relation is included in one of said plurality of unique relations;
when the term pair is not included in one of said plurality of unique relations, including said matching relation among said plurality of unique relations; and
when the term pair is included in a selected one of said plurality of unique relations, comparing a first order of the term pair in the selected matching relation and a second order of the term pair in the selected unique relation;
when the first order and the second order of the term pair are the same, replacing said one or more types of RSMs of the selected unique relation with a summation of corresponding types of RSMs of the matching relation and the corresponding types of RSMs of the selected unique relation; and
when the first order and the second order of the term pair are not the same;
reversing the order of the term pair in the matching relation;
exchanging a right directional RSM of the matching relation with a left directional RSM of the matching relation; and
replacing said one or more types of RSMs for the selected unique relation with a summation of corresponding types of RSMs of the matching relation and the corresponding types of RSMs of the selected unique relation having the term pair.
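The reduction to unique relations in claim 25 can be sketched as below, assuming each matching relation is a (term pair, metrics) tuple whose metrics dictionary holds "RCM" and "LCM" entries. The function name `reduce_to_unique` and the data layout are illustrative assumptions.

```python
def reduce_to_unique(matching_relations):
    """Merge matching relations that share a term pair, in either order."""
    unique = {}
    for (t1, t2), metrics in matching_relations:
        merged = dict(metrics)
        if (t1, t2) not in unique and (t2, t1) in unique:
            # Same pair in the opposite order: reverse the pair and
            # exchange the right and left directional metrics.
            t1, t2 = t2, t1
            merged["RCM"], merged["LCM"] = merged.get("LCM", 0), merged.get("RCM", 0)
        if (t1, t2) in unique:
            target = unique[(t1, t2)]
            for rsm_type, value in merged.items():
                target[rsm_type] = target.get(rsm_type, 0) + value
        else:
            unique[(t1, t2)] = merged
    return unique


matches = [
    (("engine", "fire"), {"RCM": 3, "LCM": 1}),
    (("fire", "engine"), {"RCM": 2, "LCM": 5}),
]
print(reduce_to_unique(matches))
```

In this example the second relation is reversed before summing, so its LCM of 5 is added to the RCM of the first relation.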
-
-
26. The method as recited in claim 25, further comprising reducing said plurality of matching relations by a process comprising eliminating each one of said plurality of matching relations having a value of a corresponding type of RSM that is less than VT, wherein VT is a threshold value.
-
27. The method as recited in claim 25, further comprising reducing said plurality of matching relations by a process comprising:
-
extracting matching relations from a pre-selected plurality of relational models; and
eliminating each of said plurality of matching relations having a value of a corresponding type of RSM that is less than VT, wherein VT is a threshold value.
-
-
28. The method as recited in claim 25, further comprising reducing said plurality of matching relations by a process that comprises:
-
eliminating each one of said plurality of matching relations having a value of a corresponding type of RSM that is less than VT, wherein VT is a threshold value; and
selecting a pre-selected quantity of said matching relations having a greatest value of the corresponding type of RSM.
-
-
29. The method as recited in claim 24, further comprising sorting said plurality of unique relations in order of prominence, wherein prominence is equal to a magnitude of a value of a selected metric.
-
30. The method as recited in claim 22, further comprising determining a typical order of said term pair for each one of said plurality of matching relations.
-
31. The method as recited in claim 30, further comprising determining said typical order of said term pair for each one of said plurality of matching relations by a process that comprises:
-
comparing a magnitude of an RCM value of said matching relation to a magnitude of an LCM value of said matching relation;
when the RCM value is larger than the LCM value, the term pair of the matching relation is in a typical order; and
when the LCM value is larger than the RCM value, reversing the order of the term pair in the matching relation and exchanging the RCM value and the LCM value.
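A minimal sketch of the typical-order test in claim 31 follows. The function name `typical_order` is hypothetical, and leaving the pair unchanged when the two magnitudes are equal is an assumption, since the claim does not address ties.

```python
def typical_order(pair, rcm, lcm):
    """Return the pair in its typical order with matching RCM and LCM."""
    if abs(lcm) > abs(rcm):
        # LCM dominates: reverse the pair and exchange the two values.
        return (pair[1], pair[0]), lcm, rcm
    # RCM dominates (or a tie, by assumption): already in typical order.
    return pair, rcm, lcm


print(typical_order(("fire", "engine"), 2, 7))
```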
-
-
32. A method of searching a database, the method comprising:
-
providing a plurality of relational models, wherein each of the plurality of relational models includes a relational model of at least one subset of a database and a plurality of relations, wherein each of the plurality of relations includes at least one term pair and one or more types of relational summation metrics (RSMs);
inputting a first query for the database;
creating a relational model of the first query;
comparing the relational model of the first query to each one of the plurality of relational models of the subsets by a process comprising;
determining a plurality of first relevance metrics for a first one of the plurality of relational models of the subsets by a sub-process comprising;
determining an intersection model of the relational model of the first query and a first one of the plurality of relational models of the subsets by a process comprising;
determining a plurality of intersection relations, wherein each one of the plurality of intersection relations has;
a shared term pair, which includes a term pair present in at least one relation in each of the first query relational models and the first one of the plurality of the relational models of the subsets; and
a plurality of intersection metrics (IMs), each one of the plurality of intersection metrics being expressible as IM=fct(RSMQ1, RSMS1), wherein;
fct is a selected function of at least one of two arguments, RSMQ1 and RSMS1;
RSMQ1 is a value of a type of relational summation metric in the relational model of the first query; and
RSMS1 is a value of a corresponding type of relational summation metric in the relational model of the first one of the plurality of relational models of the subsets; and
calculating a first relevance metric value for each type of RSM, equal to a summation of the plurality of corresponding IM values of all intersection relations; and
determining a subsequent plurality of first relevance metric values corresponding to each subsequent one of the plurality of relational models of the subsets; and
outputting a first plurality of identifiers of the subsets relevant to the first query.
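The comparison in claim 32 might be sketched as follows, assuming models are dictionaries mapping a term pair to its RSM values. The names `im_sum`, `relevance_metrics` and `rank_subsets` are hypothetical, the default function is the sum IM=RSMQ1+RSMS1 recited in claim 58, and treating only exactly matching ordered pairs as shared and only positive scores as relevant are simplifying assumptions.

```python
def im_sum(rsm_q, rsm_s):
    """Intersection metric chosen as IM = RSMQ1 + RSMS1."""
    return rsm_q + rsm_s


def relevance_metrics(query_model, subset_model, fct=im_sum):
    """Sum the intersection metrics of all shared term pairs, per RSM type."""
    totals = {}
    for pair, q_metrics in query_model.items():
        s_metrics = subset_model.get(pair)
        if s_metrics is None:
            continue  # not a shared term pair
        for rsm_type, q_value in q_metrics.items():
            if rsm_type in s_metrics:
                im = fct(q_value, s_metrics[rsm_type])
                totals[rsm_type] = totals.get(rsm_type, 0) + im
    return totals


def rank_subsets(query_model, subset_models, rsm_type, fct=im_sum):
    """Return identifiers of subsets with a positive score, best first."""
    scores = {
        sid: relevance_metrics(query_model, model, fct).get(rsm_type, 0)
        for sid, model in subset_models.items()
    }
    relevant = [sid for sid, score in scores.items() if score > 0]
    return sorted(relevant, key=lambda sid: scores[sid], reverse=True)


query = {("engine", "fire"): {"NDCM": 2}}
subsets = {
    "report-1": {("engine", "fire"): {"NDCM": 5}},
    "report-2": {("gear", "up"): {"NDCM": 9}},
}
print(rank_subsets(query, subsets, "NDCM"))
```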
applying a scaling factor to said summation of said plurality of corresponding IM values.
-
-
37. The method as recited in claim 36, further comprising selecting said scaling factor to be a subset emphasis factor (SEF)=SS/R, wherein SS is equal to a sum of values of a selected type of relational metric from said relational model of said subset for all shared relations and R is equal to a sum of values of the selected type of relational metric in said relational model of said subset.
-
38. The method as recited in claim 36, further comprising selecting said scaling factor to be a query emphasis factor (QEF)=SQ/Q, wherein SQ is equal to a sum of values of a selected type of relational metric from said relational value of said first query for all shared relations and Q is equal to a sum of values of the selected type of relational metric in said relational model of said first query.
-
39. The method as recited in claim 36, further comprising selecting said scaling factor to be a length emphasis factor (LEF)=Ls/T wherein Ls is equal to a number of terms in said subset and T is equal to a number greater than a number of terms in a largest subset of said database.
-
40. The method as recited in claim 36, further comprising selecting said scaling factor to be an alternate length emphasis factor (LEFalt)=Lcap/T wherein Lcap is equal to the lesser of either a number of terms in said subset or an average number of terms in each one of said plurality of subsets, and T is equal to a number greater than a number of terms in a largest subset of said database.
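The scaling factors of claims 37 to 40 reduce to simple ratios, sketched here under the assumption that a model is a dictionary mapping term pairs to RSM values. The function names are hypothetical; `emphasis_factor` yields SEF when given the subset model and QEF when given the query model.

```python
def emphasis_factor(model, shared_pairs, rsm_type):
    """SS/R (or SQ/Q): share of the model's metric total held by shared pairs."""
    total = sum(metrics.get(rsm_type, 0) for metrics in model.values())
    shared = sum(
        model[pair].get(rsm_type, 0) for pair in shared_pairs if pair in model
    )
    # Assumption: an empty model yields a factor of 0 rather than an error.
    return shared / total if total else 0.0


def lef(subset_length, t):
    """Length emphasis factor Ls/T; T exceeds the largest subset length."""
    return subset_length / t


def lef_alt(subset_length, average_length, t):
    """Alternate factor Lcap/T, with Lcap capped at the average length."""
    return min(subset_length, average_length) / t


subset_model = {("engine", "fire"): {"NDCM": 3}, ("gear", "up"): {"NDCM": 1}}
print(emphasis_factor(subset_model, [("engine", "fire")], "NDCM"))
print(lef(50, 200), lef_alt(50, 40, 200))
```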
-
41. The method as recited in claim 32, further comprising outputting a plurality of identifiers of subsets relevant to said first query by a process comprising:
-
outputting a plurality of types of relevance metric values corresponding to each one of said plurality of subsets;
selecting one of the plurality of types of relevance metrics;
sorting the plurality of identifiers of subsets in order of magnitude of values of the selected type of relevance metric; and
outputting the plurality of identifiers of subsets in order of magnitude of values of the selected type of relevance metric.
-
-
42. The method as recited in claim 41, further comprising selecting one of said plurality of the types of relevance metrics from a group of metrics consisting of:
-
a combination of types of relevance metrics;
a weighted sum of types of relevance metrics; and
a weighted product of types of relevance metrics.
-
-
43. The method as recited in claim 41, further comprising normalizing each one of said plurality of corresponding intersection metrics of all intersection relations.
-
44. The method as recited in claim 41, further comprising outputting said relational model of said first query.
-
45. The method as recited in claim 41, further comprising displaying a pre-selected number of subsets in order of magnitude of values of said selected type of relevance metric.
-
46. The method as recited in claim 45, further comprising highlighting one or more of said shared term pairs in each of one or more of said plurality of subsets, wherein terms within each of the one or more of said shared term pairs occur within at least one context window.
-
47. The method as recited in claim 46, further comprising selecting said one or more shared term pairs to comprise said one or more shared term pairs having a greatest magnitude of a value of a selected type of said relevance metric.
-
48. The method as recited in claim 41, further comprising displaying one or more of said shared term pairs with each of one or more of said plurality of subsets, wherein terms within each of said one or more of shared term pairs occur within at least one context window in the subset.
-
49. The method as recited in claim 48, further comprising displaying, for each of said plurality of shared term pairs values, NDCMQ1 and NDCMS1, and a product equal to S * (ln|NDCMQ1|) * (ln |NDCMS1|), wherein:
-
NDCMQ1 is equal to a non-directional contextual metric value of said shared term pair in said query; and
NDCMS1 is equal to a non-directional contextual metric value of said shared term pair in said subset, wherein NDCMS1>1;
|NDCMQ1|>1;
S=1 if NDCMQ1>1; and
S=−1 if NDCMQ1<−1.
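The signed product displayed under claim 49 can be computed as in the sketch below. The function name `signed_log_product` is hypothetical, and raising an error when the stated conditions on NDCMQ1 and NDCMS1 do not hold is an assumption about how out-of-range values would be handled.

```python
import math


def signed_log_product(ndcm_q, ndcm_s):
    """Return S * ln|NDCMQ1| * ln|NDCMS1| for one shared term pair."""
    if abs(ndcm_q) <= 1 or ndcm_s <= 1:
        # Assumption: values outside the stated ranges are rejected.
        raise ValueError("requires |NDCMQ1| > 1 and NDCMS1 > 1")
    sign = 1 if ndcm_q > 1 else -1  # S follows the sign of the query value
    return sign * math.log(abs(ndcm_q)) * math.log(abs(ndcm_s))


print(signed_log_product(8.0, 20.0))
print(signed_log_product(-8.0, 20.0))
```

The sign S preserves the direction of the query value, which the logarithm of a magnitude would otherwise discard.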
-
-
51. The method as recited in claim 48, further comprising choosing said one or more shared term pairs to comprise one or more of said shared term pairs having a greatest magnitude of a value of a selected type of said relevance metric.
-
52. The method as recited in claim 32, further comprising:
-
inputting a second query;
creating a relational model of the second query;
comparing the relational model of the second query to each of said plurality of relational models of said subsets;
outputting a second plurality of identifiers of said subsets relevant to the second query; and
determining a plurality of combined relevance metric values by combining a second plurality of second relevance metric values for the second query with said plurality of first relevance metric values for said first query.
-
-
53. A method as recited in claim 52, further comprising determining a third plurality of identifiers of said subsets consisting of identifiers of said subsets present in both of said first and second pluralities of subsets, wherein said combined relevance metric values are greater than zero for each of said identifiers of said subsets that is present in both said first plurality of identifiers of said subsets and said second plurality of identifiers of said subsets.
-
54. A method as recited in claim 53, further comprising combining each of said combined relevance metric values by a process comprising calculating a product of a first type of said first relevance metric values and a first type of said second relevance metric values.
-
55. A method as recited in claim 52, further comprising determining a third plurality of identifiers of said subsets consisting of identifiers of said subsets present in at least one of said first and said second plurality of subsets, wherein said combined relevance metric values are greater than zero for each of said identifiers of said subsets in at least one of said first plurality of identifiers of said subsets and said second plurality of identifiers of said subsets.
-
56. A method as recited in claim 55, further comprising combining said relevance metric values by a process comprising calculating a summation of a first type of said first relevance metric values and a first type of said second relevance metric values.
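The two ways of combining query results in claims 53 to 56 might be sketched as follows, assuming each query yields a dictionary from subset identifier to a positive relevance value of one metric type. The names `combine_and` and `combine_or` are hypothetical labels for the product and summation variants.

```python
def combine_and(first, second):
    """Subsets present in both results; combined value is the product."""
    return {sid: first[sid] * second[sid] for sid in first.keys() & second.keys()}


def combine_or(first, second):
    """Subsets present in either result; combined value is the sum."""
    return {
        sid: first.get(sid, 0) + second.get(sid, 0)
        for sid in first.keys() | second.keys()
    }


first = {"report-1": 2.0, "report-2": 3.0}
second = {"report-2": 4.0, "report-3": 1.0}
print(combine_and(first, second))
print(combine_or(first, second))
```

A product is zero whenever either query misses a subset, which suits the "both" case, while a sum stays positive when only one query matches.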
-
57. The method as recited in claim 32, further comprising, outputting a representation from the group of representations consisting of:
-
a representation of said database;
a representation of said plurality of relational models;
a representation of said first query;
a representation of a plurality of said intersection models;
a representation of a plurality of said subsets relevant to said first query; and
a representation of a plurality of subsets of said database not included among said subsets relevant to said first query.
-
-
58. The method of claim 32, further comprising choosing said function IM=fct(RSMQ1, RSMS1) to be IM=RSMQ1+RSMS1.
-
59. The method of claim 32, further comprising choosing said function IM=fct(RSMQ1, RSMS1) to be IM=RSMQ1.
-
50. The method as recited in claim 48, further comprising displaying, for each of said plurality of shared term pairs values, NDCMQ1 and NDCMS1, and a product equal to (ln NDCMQ1) * (ln NDCMS1), wherein:
-
NDCMQ1 is equal to a non-directional contextual metric value of the shared term pair in said query;
NDCMS1 is equal to a non-directional contextual metric value of the shared term pair in said subset, wherein NDCMS1>1, and NDCMQ1>1.
-
-
60. A method of producing a model of a database comprising:
-
providing a database;
calculating a plurality of relations wherein, each one of the plurality of relations has a term pair and a plurality of types of relational summation metrics (RSMs), and wherein, each one of the plurality of RSMs includes a summation of the corresponding types of relational metrics of each one of a plurality of occurrences of the term pair within a context window within the database, wherein the types of relational metrics include;
a non-directional contextual metric;
a right contextual metric;
a left contextual metric; and
a directional contextual metric; and
outputting a model of the database.
-
-
62. A method of searching a database comprising:
-
providing a plurality of relational models, wherein each of the plurality of relational models includes a relational model of at least one subset of a database and a plurality of relations, wherein each of the plurality of relations includes at least one term pair and one or more types of relational summation metrics (RSMs), each RSM type including a summation of values of a corresponding type of relational metric of occurrences of the at least one term pair within at least one context window within the at least one database subset and includes at least one of a right contextual metric (RCM), a left contextual metric (LCM) and a directional contextual metric (DCM), wherein a DCM value for a single occurrence of a term pair (T1, T2) in the at least one subset is;
DCM(T1, T2)=RCM(T1, T2)−LCM(T1, T2), wherein;
T1 is a first term in the term pair;
T2 is a second term in the term pair;
RCM(T1, T2) is a right contextual metric value for the single occurrence of the term pair (T1, T2) in the at least one subset;
LCM(T1, T2) is a left contextual metric value for the single occurrence of a term pair (T1, T2) in the at least one subset; and
RCM(T1, T2)≧LCM(T1, T2);
inputting a first query for the database;
creating a relational model of the first query;
comparing the relational model of the first query to each one of the plurality of relational models of the subsets; and
outputting a first plurality of identifiers of the subsets relevant to the first query.
-
-
63. A method of searching a database comprising:
-
providing a plurality of relational models, wherein each of the plurality of relational models includes a relational model of at least one subset of a database and a plurality of relations, wherein each of the plurality of relations includes at least one term pair and one or more types of relational summation metrics (RSMs), each RSM type including a summation of values of a corresponding type of relational metric of occurrences of the at least one term pair within at least one context window within the at least one database subset and includes at least one scaled frequency metric (SFM) that is defined by;
SFM=(C−1−N) * ((2 FM−F1−F2)/(2 FM));
C is equal to a number of terms in the context window;
N is equal to a number of terms occurring between a first term and a second term of the term pair;
FM is equal to a frequency of occurrences of a most frequent term in the database;
F1 is equal to a frequency of occurrences of the first term of the term pair in the database; and
F2 is equal to a frequency of occurrences of the second term of the term pair in said database; and
inputting a first query for the database;
creating a relational model of the first query;
comparing the relational model of the first query to each one of the plurality of relational models of the subsets; and
outputting a first plurality of identifiers of the subsets relevant to the first query.
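The scaled frequency metric of claim 63 can be evaluated directly from its formula, as in this short sketch. The function name `sfm` and its parameter names are illustrative, and the sample frequencies are made-up values.

```python
def sfm(window, n_between, f_most, f1, f2):
    """SFM = (C - 1 - N) * ((2*FM - F1 - F2) / (2*FM))."""
    proximity = window - 1 - n_between
    rarity = (2 * f_most - f1 - f2) / (2 * f_most)
    return proximity * rarity


# Adjacent terms in a 5-term window; each term is half as frequent as FM.
print(sfm(window=5, n_between=0, f_most=100, f1=50, f2=50))
```

The second factor approaches 1 for pairs of rare terms and falls to 0 when both terms are as frequent as the most frequent term, so common terms are discounted.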
-
-
65. A method of searching a database comprising:
-
providing a plurality of relational models, wherein each of the plurality of relational models includes a relational model of at least one subset of a database and a plurality of relations, wherein each of the plurality of relations includes at least one term pair and one or more types of relational summation metrics (RSMs), each RSM type including a summation of values of a corresponding type of relational metric of occurrences of the at least one term pair within at least one context window within the at least one database subset and includes at least one of a right contextual metric (RCM), a left contextual metric (LCM) and a non-directional contextual metric (NDCM), wherein an NDCM value for a single occurrence of a term pair (T1, T2) in the at least one subset is NDCM(T1, T2)=RCM(T1, T2)+LCM(T1, T2), wherein;
T1 is a first term in the term pair;
T2 is a second term in the term pair;
RCM(T1, T2) is a right contextual metric value for the single occurrence of the term pair (T1, T2) in the at least one subset; and
LCM(T1, T2) is a left contextual metric value for the single occurrence of the term pair (T1, T2) in the at least one subset;
inputting a first query for the database;
creating a relational model of the first query;
comparing the relational model of the first query to each one of the plurality of relational models of the subsets; and
outputting a first plurality of identifiers of the subsets relevant to the first query.
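A relational model with summed RCM, LCM and NDCM values might be built as in the sketch below, using the per-occurrence value C−1−N from claims 7 and 8 and NDCM=RCM+LCM from claim 65. The name `build_model`, the alphabetical ordering used to fix T1 and T2 for each pair, and the skipping of a term paired with itself are assumptions made for illustration.

```python
from collections import defaultdict


def build_model(terms, window):
    """Sum RCM, LCM and NDCM over all pair occurrences within the window."""
    model = defaultdict(lambda: {"RCM": 0, "LCM": 0, "NDCM": 0})
    for i, first in enumerate(terms):
        # Later terms that still fall inside a window starting at position i.
        for j in range(i + 1, min(i + window, len(terms))):
            second = terms[j]
            if first == second:
                continue  # assumption: a term is not paired with itself
            value = window - 1 - (j - i - 1)  # C - 1 - N for this occurrence
            pair = tuple(sorted((first, second)))  # assumption: T1 sorts first
            side = "RCM" if pair[0] == first else "LCM"
            model[pair][side] += value
            model[pair]["NDCM"] += value
    return dict(model)


print(build_model(["engine", "fire", "engine"], window=3))
```

In the example, "engine" precedes "fire" once and follows it once, so the pair accumulates equal RCM and LCM values.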
-
Specification