Text-searching code, system and method
Abstract
Disclosed are a computer-readable code, system and method for comparing a target concept, invention, or event with each of a plurality of texts. Each of a plurality of non-generic words and, optionally, word groups characterizing the target concept, invention, or event is selected as a vector term if the term has an above-threshold selectivity value in at least one library of texts in a field, where the selectivity value of a term is a measure of the field-specificity of that term. There is then determined, for each of the plurality of texts, a match score related to the number of vector terms present in or derived from that text that match those in the target concept, invention, or event. Texts having the highest match scores are selected.
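The term-selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that "frequency of occurrence" means the fraction of texts in a library that contain the term, that the comparison is against the average over the other libraries, and that a small smoothing constant guards against division by zero. The function names, the `floor` constant, and the `threshold` value are all hypothetical.

```python
def doc_frequency(term, library):
    """Fraction of texts in `library` containing `term` (assumed frequency measure)."""
    if not library:
        return 0.0
    return sum(1 for text in library if term in text) / len(library)


def selectivity(term, field, libraries, floor=1e-3):
    """Frequency of `term` in one field's library relative to the other fields.

    `floor` is a hypothetical smoothing constant that avoids division by zero.
    """
    own = doc_frequency(term, libraries[field])
    others = [doc_frequency(term, lib)
              for name, lib in libraries.items() if name != field]
    other_avg = sum(others) / len(others) if others else 0.0
    return own / max(other_avg, floor)


def select_vector_terms(terms, libraries, threshold=2.0):
    """Keep terms whose selectivity exceeds `threshold` in at least one field."""
    return [t for t in terms
            if any(selectivity(t, f, libraries) > threshold for f in libraries)]


# Toy libraries: each text is a set of already-tokenized words.
libraries = {
    "surgery": [{"catheter", "balloon", "device"}, {"catheter", "stent", "device"}],
    "software": [{"device", "driver"}, {"compiler", "device"}],
}
selected = select_vector_terms(["catheter", "device"], libraries)
print(selected)
```

In this toy data "device" occurs equally often in both fields, so its selectivity is 1.0 in each and it is dropped, while "catheter" occurs only in the surgery library and is kept.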
18 Claims
1. A computer-executed method for matching a target document in the form of a digitally encoded natural-language text with a plurality of sample texts, comprising the steps of:
(a) for each of a plurality of terms selected from one of (i) non-generic words in the document, (ii) proximately arranged word groups in the document, and (iii) a combination of (i) and (ii), determining a selectivity value calculated as the frequency of occurrence of that term in a library of texts in one field, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively, and
(b) representing the document as a vector of terms, where the coefficient assigned to each term is a function of the selectivity value determined for that term,
(c) determining for each of a plurality of sample texts, a match score related to the number of terms present in or derived from that text that match those in the target document, and
(d) selecting one or more of the sample texts having the highest match scores.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
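Steps (b) through (d) of claim 1 can be illustrated with a short sketch. The claim only requires that each coefficient be "a function of" the selectivity value and that the match score be "related to" the number of matching terms; the square-root coefficient and the sum-of-coefficients score used here are assumptions chosen for illustration, as are all function names and the sample data.

```python
import math


def term_vector(selectivity_by_term, coeff=math.sqrt):
    """Step (b): map each term to a coefficient derived from its selectivity.

    The square root is a hypothetical choice of coefficient function.
    """
    return {term: coeff(value) for term, value in selectivity_by_term.items()}


def match_score(vector, sample_terms):
    """Step (c): sum the coefficients of vector terms found in the sample text."""
    return sum(c for term, c in vector.items() if term in sample_terms)


def top_matches(vector, samples, k=1):
    """Step (d): return the ids of the k samples with the highest match scores."""
    ranked = sorted(samples,
                    key=lambda sid: match_score(vector, samples[sid]),
                    reverse=True)
    return ranked[:k]


# Hypothetical selectivity values, as would be produced by step (a).
target = term_vector({"catheter": 9.0, "balloon": 4.0, "device": 1.0})
samples = {
    "A": {"catheter", "balloon", "inflation"},
    "B": {"device", "driver"},
    "C": {"catheter", "device"},
}
best = top_matches(target, samples, k=2)
print(best)
```

Weighting by coefficient rather than counting raw matches lets a single highly field-specific term outweigh several generic ones, which is one plausible reading of why the claim ties the coefficient to selectivity.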
13. An automated system for matching a target document in the form of a digitally encoded natural-language text with a plurality of sample texts, comprising
(1) a computer,
(2) accessible by said computer, a database of word records, where each record includes text identifiers of the library texts that contain that word, associated library identifiers for each text, and optionally, one or more selectivity values for each word, where the selectivity value of a term in a library of texts in a field is related to the frequency of occurrence of that term in said library, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively,
(3) a computer readable code which is operable, under the control of said computer, to perform steps comprising:
(a) for each of a plurality of terms selected from one of (i) non-generic words in the document, (ii) proximately arranged word groups in the document, and (iii) a combination of (i) and (ii), determining a selectivity value calculated as the frequency of occurrence of that term in a library of texts in one field, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively, and
(b) representing the document as a vector of terms, where the coefficient assigned to each term is a function of the selectivity value determined for that term,
(c) determining for each of a plurality of sample texts, a match score related to the number of vector terms present in or derived from that text that match those in the target document, and
(d) selecting one or more of the sample texts having the highest match scores.
Dependent claims: 14, 15, 16, 17
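The database of word records recited in element (2) resembles an inverted index. The sketch below shows one way such records might be laid out and queried; the `WordRecord` class, its field names, and the helper functions are hypothetical and are not taken from the specification.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class WordRecord:
    """One record per word; all field names here are hypothetical."""
    word: str
    # text identifier -> identifier of the library that text belongs to
    text_to_library: dict = field(default_factory=dict)
    # optional: library identifier -> selectivity value of the word in that library
    selectivity: dict = field(default_factory=dict)


def build_word_records(texts):
    """Build records from (text_id, library_id, words) triples."""
    records = {}
    for text_id, library_id, words in texts:
        for word in set(words):
            rec = records.setdefault(word, WordRecord(word))
            rec.text_to_library[text_id] = library_id
    return records


def count_matches(records, target_terms):
    """Count, for each text, how many of the target terms it contains."""
    hits = Counter()
    for term in target_terms:
        rec = records.get(term)
        if rec is not None:
            hits.update(rec.text_to_library.keys())
    return hits


records = build_word_records([
    ("T1", "surgery", ["catheter", "balloon"]),
    ("T2", "software", ["device", "driver"]),
    ("T3", "surgery", ["catheter", "device"]),
])
hits = count_matches(records, ["catheter", "device"])
print(hits.most_common(1))
```

Looking up each target term once and tallying the text identifiers in its record avoids scanning every sample text, which is the usual motivation for indexing by word.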
18. Computer readable code for use with an electronic computer and a database of word records for matching a target document in the form of a digitally encoded natural-language text with a plurality of sample texts, where each record in the word records database includes text identifiers of the library texts that contain that word, an associated library identifier for each text, and optionally, one or more selectivity values for each word, where the selectivity value of a term in a library of texts in a field is related to the frequency of occurrence of that term in said library, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively, said code being operable, under the control of said computer, to perform steps comprising:
(a) for each of a plurality of terms selected from one of (i) non-generic words in the document, (ii) proximately arranged word groups in the document, and (iii) a combination of (i) and (ii), determining a selectivity value calculated as the frequency of occurrence of that term in a library of texts in one field, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively, and
(b) representing the document as a vector of terms, where the coefficient assigned to each term is a function of the selectivity value determined for that term,
(c) determining for each of a plurality of sample texts, a match score related to the number of vector terms present in or derived from that text that match those in the target document, and
(d) selecting one or more of the sample texts having the highest match scores.
Specification