Method and apparatus for creating a customized summary of text by selection of sub-sections thereof ranked by comparison to target data items
Abstract
A system for summarizing data sets stores target data items and divides the data set into sections. Each section is compared against the target data items and a ranking value is calculated for each section dependent on the outcome of the comparisons. A summary of the data set is then compiled from sections having a ranking value past a pre-determined threshold value.
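The flow described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: sections are assumed to be paragraphs separated by blank lines, the ranking value is assumed to be a simple count of words matching a target item, "past the threshold" is read as "greater than or equal to", and the names `split_sections`, `rank_section`, and `summarize` are hypothetical.

```python
import re

def split_sections(text):
    """Divide the data set into sections (assumed here: blank-line-separated paragraphs)."""
    return [s.strip() for s in re.split(r"\n\s*\n", text) if s.strip()]

def rank_section(section, targets):
    """Ranking value (assumed measure): number of words in the section matching a target item."""
    words = re.findall(r"[a-z0-9']+", section.lower())
    wanted = {t.lower() for t in targets}
    return sum(1 for w in words if w in wanted)

def summarize(text, targets, threshold=1):
    """Keep, in original order, the sections whose ranking value reaches the threshold."""
    sections = split_sections(text)
    return [s for s in sections if rank_section(s, targets) >= threshold]
```

Calling `summarize(text, ["stock", "markets"], threshold=2)` would keep only those paragraphs that mention the target words at least twice in total.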
13 Claims
1. Apparatus for summarizing data sets, the apparatus comprising:
an input for receiving a data set to be summarized;
sectioning means for dividing said received data set into plural sections according to pre-determined criteria;
ranking means operable for each said section to compare data within the said section with one or more target data items and for calculating a ranking value for the said section, said ranking value being dependent on the outcome of said comparisons for the said section; and
compiling means for compiling a customized summary of the data set by selecting one or more of said one or more sections according to their respective ranking values.
means for identifying one or more key data items in each said section according to a pre-determined stop list;
calculating means operable for each said section to calculate one or more distribution values, each said distribution value representing a different pre-determined measure of the distribution, in said data set, of key data items identified in the said section; and
adjustment means for adjusting said ranking value for each said section according to the respective said one or more distribution values.
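Identifying key data items against a stop list might look like the sketch below. The contents of `STOP_LIST`, the word-level tokenization, and the function name `key_items` are assumptions made for illustration; the claims only require that the stop list be pre-determined.

```python
import re

# Hypothetical stop list; a real one would be much longer.
STOP_LIST = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "on"})

def key_items(section, stop_list=STOP_LIST):
    """Return the set of lower-cased words in a section that are not on the stop list."""
    words = re.findall(r"[a-z0-9']+", section.lower())
    return {w for w in words if w not in stop_list}
```

Returning a set per section makes the later distribution measures straightforward, since each only needs to know which sections contain a given key item.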
4. Apparatus as in claim 3, wherein:
said calculating means are operable to calculate a first distribution value for each said section, said first distribution value representing a measure of the number of sections of said data set, other than the said section, containing key data items of the said section, said first distribution value, as calculated for the said section, being proportional to the sum of the values of said measure of the number of sections determined for each key data item of the said section.
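One possible reading of this first distribution value is sketched below: for each key data item of a section, count the other sections that also contain it, then sum those counts. The representation of each section as a set of key items, the `scale` factor standing in for "proportional to", and the name `first_distribution_value` are assumptions.

```python
def first_distribution_value(index, section_keys, scale=1.0):
    """Sum, over the key items of section `index`, of the number of OTHER sections
    containing that item. `section_keys` is a list of sets, one per section."""
    total = 0
    for item in section_keys[index]:
        total += sum(
            1 for j, keys in enumerate(section_keys)
            if j != index and item in keys
        )
    return scale * total
```

With `section_keys = [{"cat", "mat"}, {"cat", "dog"}, {"dog"}, {"cat"}]`, section 0 shares "cat" with two other sections and "mat" with none, so its value is 2.0 at the default scale.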
5. Apparatus as in claim 4 wherein:
said calculating means are operable to calculate a second distribution value for each said section, said second distribution value representing a measure of the separation between the first occurrence within said data set of each key data item of the said section and the respective last occurrence, said second distribution value, as calculated for the said section, being proportional to the sum of the values of said measure of separation determined for each key data item of the said section.
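The second distribution value could be sketched as follows. The claim does not fix the unit of separation, so this example assumes it is measured in sections (index of the last section containing the item minus index of the first); the `scale` factor and the name `second_distribution_value` are likewise hypothetical.

```python
def second_distribution_value(index, section_keys, scale=1.0):
    """Sum, over the key items of section `index`, of the distance (in sections)
    between the first and last occurrence of that item in the whole data set."""
    total = 0
    for item in section_keys[index]:
        # Never empty: the item occurs at least in section `index` itself.
        positions = [j for j, keys in enumerate(section_keys) if item in keys]
        total += positions[-1] - positions[0]
    return scale * total
```

An item that appears in only one section contributes zero, while an item that first appears near the start and last appears near the end contributes close to the full length of the data set.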
6. Apparatus as in claim 5, wherein:
said selecting means are arranged to compile a summary having a pre-defined length by selecting, in order of decreasing rank, as determined by the corresponding ranking value, one or more of said one or more sections, beginning with the highest ranked section, and adding each selected section to the summary until the summary has attained said pre-defined length.
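Compiling a summary of pre-defined length by decreasing rank might be sketched as below. Several choices here are assumptions not stated in the claim: length is counted in words, selection stops before the limit would be exceeded, and the selected sections are returned in their original document order. The name `compile_summary` is hypothetical.

```python
def compile_summary(sections, rankings, max_words):
    """Pick sections from highest to lowest ranking value until the next one
    would push the summary past `max_words`; return them in document order."""
    # Indices sorted by ranking, highest first (ties keep document order).
    order = sorted(range(len(sections)), key=lambda i: rankings[i], reverse=True)
    chosen, length = [], 0
    for i in order:
        n = len(sections[i].split())
        if length + n > max_words:
            break  # assumed stopping rule: do not exceed the length limit
        chosen.append(i)
        length += n
    return [sections[i] for i in sorted(chosen)]
```

Stopping at the first section that does not fit, instead of skipping it and trying lower-ranked ones, keeps the selection strictly in rank order as the claim describes.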
7. Apparatus as in claim 3, wherein:
said calculating means are operable to calculate a second distribution value for each said section, said second distribution value representing a measure of the separation between the first occurrence within said data set of each key data item of the said section and the respective last occurrence, said second distribution value, as calculated for the said section, being proportional to the sum of the values of said measure of separation determined for each key data item of the said section.
8. Apparatus as in claim 1, wherein:
said selecting means are arranged to compile a summary having a pre-defined length by selecting, in order of decreasing rank, as determined by the corresponding ranking value, one or more of said one or more sections, beginning with the highest ranked section, and adding each selected section to the summary until the summary has attained said pre-defined length.
13. A method as in claim 8 wherein:
at step b), said one or more pre-determined measures of distribution include a measure of the separation between the first occurrence within said data set of each key data item of the said section and the respective last occurrence; and
the corresponding distribution value, as calculated for the said section, is proportional to the sum of the values of said measure of separation determined for each key data item of the said section.
9. A method for generating a customized summary of a data set, the method comprising:
i) receiving, as input, a data set to be summarized;
ii) dividing said data set into sections according to predetermined criteria;
iii) comparing data items in each said section against one or more target data items;
iv) calculating a ranking value for each said section in dependence upon the outcome of the respective said comparisons; and
v) compiling a customized summary of said data set by selecting one or more of said one or more sections according to their respective ranking values.
a) identifying key data items within each said section from step ii) according to a pre-determined stop list;
b) calculating, for each said section, one or more distribution values each representing a pre-determined measure of the distribution of the key data items of the said section in said data set; and
c) adjusting said ranking value from step iv) for each said section in dependence upon the respective said one or more distribution values.
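Step c) leaves the form of the adjustment open. One simple possibility, shown here purely as an assumption, is to add a weighted sum of the distribution values to the ranking value from step iv). The weights and the name `adjust_rankings` are hypothetical.

```python
def adjust_rankings(base_rankings, distribution_values, weights):
    """Return adjusted ranking values, one per section.

    base_rankings: ranking value per section from the target comparison.
    distribution_values: per section, a tuple of distribution values.
    weights: one weight per distribution measure (assumed linear combination).
    """
    adjusted = []
    for base, dists in zip(base_rankings, distribution_values):
        adjusted.append(base + sum(w * d for w, d in zip(weights, dists)))
    return adjusted
```

A linear combination is only one option; any function that depends on the distribution values would fit the wording of the step.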
11. A method as in claim 10, wherein:
at step b), said one or more pre-determined measures of distribution include a measure of the number of sections of said data set, other than the said section, containing key data items of the said section; and
the corresponding distribution value, as calculated for the said section, is proportional to the sum of the values of said measure of the number of sections determined for each key data item of the said section.
12. A method as in claim 11 wherein:
at step b), said one or more pre-determined measures of distribution include a measure of the separation between the first occurrence within said data set of each key data item of the said section and the respective last occurrence; and
the corresponding distribution value, as calculated for the said section, is proportional to the sum of the values of said measure of separation determined for each key data item of the said section.
Specification