Weighting method for use in information extraction and abstracting, based on the frequency of occurrence of keywords and similarity calculations
Abstract
The primary object of the invention is to extract and display keywords as an information abstract that are significant and effective in describing a topic common to a plurality of units, when a large number of character string data sets divided into prescribed units are given. The invention comprises an input section for accepting an input of character string data divided into prescribed units, with each individual character represented by a character code, and an output section for displaying the result of information abstracting. Keywords contained in each of the prescribed units are extracted by a keyword extracting section from the character string data input from the input section, a score is calculated for each keyword by a score calculating section so that a higher score is given to a keyword extracted from a larger number of units, and on the basis of the thus calculated scores, keywords are selected by an abstracting section and are output as an information abstract to the output section.
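The flow described in the abstract (extract keywords per unit, score each keyword higher the more units it appears in, then select by score) can be illustrated with a minimal sketch. The function names, the whitespace tokenizer, the length filter, and the `top_n` cutoff are all assumptions made for illustration; the source does not specify how keywords are extracted or how many are selected.

```python
from collections import Counter


def extract_keywords(unit: str) -> set[str]:
    """Hypothetical extractor: lowercase tokens longer than three characters."""
    tokens = (word.strip(".,;:").lower() for word in unit.split())
    return {token for token in tokens if len(token) > 3}


def score_keywords(units: list[str]) -> Counter:
    """Score each keyword by the number of units it was extracted from."""
    scores: Counter = Counter()
    for unit in units:
        # A set is used so that each unit counts a given keyword at most once.
        scores.update(extract_keywords(unit))
    return scores


def make_abstract(units: list[str], top_n: int = 3) -> list[str]:
    """Select the top_n keywords by score (ties broken alphabetically)."""
    ranked = sorted(score_keywords(units).items(), key=lambda kv: (-kv[1], kv[0]))
    return [keyword for keyword, _ in ranked[:top_n]]


units = [
    "Election results announced today",
    "Markets react to election results",
    "Weather: rain expected; election coverage continues",
]
selected = make_abstract(units, top_n=2)
```

Here a keyword that occurs in every unit outranks one that occurs in only two, which in turn outranks keywords unique to a single unit.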
-
50 Claims
-
1. An information abstracting method comprising the steps of:
-
accepting an input of character string data divided into prescribed units, with each individual character represented by a character code;
extracting a keyword for each of said prescribed units from said input character string data;
weighting said extracted keyword by taking into account a state of occurrence, in the other prescribed units, of keywords that are identical to said extracted keyword;
selecting at least one keyword from said extracted keywords on the basis of the weighted result; and
outputting said selected keyword as an information abstract relating to said character string data. - View Dependent Claims (15, 29, 38, 41)
-
-
2. An information abstracting apparatus comprising:
-
input means for accepting an input of character string data divided into prescribed units, with each individual character represented by a character code;
keyword extracting means for extracting a keyword for each of said prescribed units from the character string data input from said input means;
weighting means for weighting said extracted keyword by taking into account a state of occurrence, in the other prescribed units, of keywords that are identical to said extracted keyword;
keyword selecting means for selecting at least one keyword from said extracted keywords on the basis of the weighted result; and
output means for outputting said selected keyword as an information abstract relating to said character string data. - View Dependent Claims (16)
-
-
3. A weighting method comprising the steps of:
-
accepting an input of character string data divided into prescribed units, with each individual character represented by a character code;
extracting a keyword for each of said prescribed units from said input character string data; and
weighting said extracted keyword by taking into account the state of occurrence, in the other prescribed units, of keywords that are identical to said extracted keyword. - View Dependent Claims (17)
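One possible reading of the weighting step in claim 3 is sketched below: for a keyword extracted from a given unit, the weight is the number of other units in which an identical keyword was also extracted. This is an assumed interpretation of "state of occurrence", not the formula of the specification, and the name `weight_by_other_units` is hypothetical.

```python
from collections import Counter


def weight_by_other_units(unit_keywords: list[set[str]]) -> list[dict[str, int]]:
    """For each unit, map each of its keywords to the number of OTHER units
    that contain an identical keyword."""
    unit_frequency: Counter = Counter()
    for keywords in unit_keywords:
        unit_frequency.update(keywords)
    # unit_frequency counts the keyword's own unit too, so subtract one.
    return [
        {keyword: unit_frequency[keyword] - 1 for keyword in keywords}
        for keywords in unit_keywords
    ]


unit_keywords = [
    {"budget", "vote"},
    {"budget", "storm"},
    {"budget", "vote", "goal"},
]
weights = weight_by_other_units(unit_keywords)
```

A keyword that appears in only one unit receives weight 0 under this reading, so it contributes nothing toward describing a topic common to several units.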
-
-
4. A teletext broadcast receiving apparatus comprising:
-
teletext broadcast receiving means for receiving a teletext broadcast;
channel storing means for storing a plurality of channels of prescribed programs;
keyword extracting means for extracting a keyword from each of said prescribed programs received by said teletext broadcast receiving means on said channels stored in said channel storing means;
weighting means for weighting said extracted keyword by taking into account a state of occurrence, in the other prescribed programs, of keywords that are identical to said extracted keyword;
keyword selecting means for selecting keywords from said extracted keywords on the basis of the weighted result; and
display means for displaying all or part of said selected keywords as an information abstract relating to said teletext broadcast. - View Dependent Claims (18)
-
-
5. An information abstracting method comprising the steps of:
-
accepting an input of character string data divided into prescribed units each subdivided into prescribed paragraphs, with each individual character represented by a character code;
extracting a keyword for each paragraph in each of said prescribed units from said input character string data;
generating a keyword association by associating one keyword with another among keywords obtained from the same paragraph;
weighting said extracted keyword by taking into account a state of occurrence, in the other prescribed units, of keywords that are identical to said extracted keyword, and also weighting said generated keyword association by taking into account a state of occurrence, in the other prescribed paragraphs, of keyword associations that are identical to said generated keyword association;
selecting keywords and keyword associations from said extracted keywords and said generated keyword associations on the basis of the weighted results; and
outputting said selected keywords and keyword associations as an information abstract relating to said character string data. - View Dependent Claims (19, 44, 46)
-
-
6. An information abstracting apparatus comprising:
-
input means for accepting an input of character string data divided into prescribed units each subdivided into prescribed paragraphs, with each individual character represented by a character code;
keyword extracting means for extracting a keyword for each paragraph in each of said prescribed units from the character string data input from said input means;
keyword associating means for generating a keyword association by associating one keyword with another among keywords obtained from the same paragraph;
weighting means for weighting said extracted keyword by taking into account a state of occurrence, in the other prescribed units, of keywords that are identical to said extracted keyword, and for weighting said generated keyword association by taking into account a state of occurrence, in the other prescribed paragraphs, of keyword associations that are identical to said generated keyword association;
selecting means for selecting keywords and keyword associations from said extracted keywords and said generated keyword associations on the basis of the weighted results; and
output means for outputting said selected keywords and keyword associations as an information abstract relating to said character string data. - View Dependent Claims (20)
-
-
7. A weighting method comprising the steps of:
-
accepting an input of character string data divided into prescribed units each subdivided into prescribed paragraphs, with each individual character represented by a character code;
extracting a keyword for each paragraph in each of said prescribed units from said input character string data;
generating a keyword association by associating one keyword with another among keywords obtained from the same paragraph; and
weighting said extracted keyword by taking into account a state of occurrence, in the other prescribed units, of keywords that are identical to said extracted keyword, and also weighting said generated keyword association by taking into account a state of occurrence, in the other prescribed paragraphs, of keyword associations that are identical to said generated keyword association. - View Dependent Claims (21)
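The keyword association steps of claim 7 can be sketched as follows, assuming that an association is an unordered pair of keywords extracted from the same paragraph and that its weight is the number of other paragraphs producing an identical pair. Both assumptions, and the names `associations` and `weight_associations`, are illustrative and not taken from the specification.

```python
from collections import Counter
from itertools import combinations


def associations(paragraph_keywords) -> set[tuple[str, str]]:
    """All unordered keyword pairs from one paragraph, as sorted tuples."""
    return set(combinations(sorted(set(paragraph_keywords)), 2))


def weight_associations(units) -> dict[tuple[str, str], int]:
    """units: list of units, each a list of paragraphs, each a list of keywords.
    Returns, for each association, the number of paragraphs other than one
    generating paragraph in which the identical association also occurs."""
    paragraph_frequency: Counter = Counter()
    for unit in units:
        for paragraph in unit:
            paragraph_frequency.update(associations(paragraph))
    return {pair: count - 1 for pair, count in paragraph_frequency.items()}


units = [
    [["tokyo", "summit", "trade"], ["tokyo", "weather"]],
    [["summit", "trade"], ["tokyo", "summit"]],
]
association_weights = weight_associations(units)
```

Sorting each pair makes ("tokyo", "summit") and ("summit", "tokyo") count as the same association, which is one simple way to decide when two associations are "identical".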
-
-
8. A teletext broadcast receiving apparatus comprising:
-
teletext broadcast receiving means for receiving a teletext broadcast;
channel storing means for storing a plurality of channels of prescribed programs;
keyword extracting means for extracting a keyword from each of said prescribed programs received by said teletext broadcast receiving means on said channels stored in said channel storing means;
keyword associating means for generating a keyword association by associating one keyword with another among keywords obtained from the same paragraph in the same program;
weighting means for weighting said extracted keyword by taking into account a state of occurrence, in the other prescribed programs, of keywords that are identical to said extracted keyword, and for weighting said generated keyword association by taking into account a state of occurrence, in the other prescribed paragraphs, of keyword associations that are identical to said generated keyword association;
selecting means for selecting keywords and keyword associations from said extracted keywords and said generated keyword associations on the basis of the weighted results; and
display means for displaying all or part of said selected keywords and keyword associations as an information abstract relating to said teletext broadcast. - View Dependent Claims (22)
-
-
9. An information abstracting apparatus comprising:
-
input means for accepting an input of character string data divided into prescribed units, with each individual character represented by a character code;
keyword extracting means for extracting a keyword for each of said prescribed units from said input character string data;
similarity calculating means for calculating similarity between keywords thus extracted;
weighting means for weighting said extracted keyword by taking into account a state of occurrence, in the other prescribed units, of keywords that are identical or similar to said extracted keyword;
keyword selecting means for selecting keywords from said extracted keywords on the basis of the weighted result; and
output means for outputting said selected keywords as an information abstract relating to said character string data. - View Dependent Claims (35)
-
-
10. A weighting method comprising the steps of:
-
accepting an input of character string data divided into prescribed units, with each individual character represented by a character code;
extracting a keyword for each of said prescribed units from said input character string data;
calculating similarity between keywords thus extracted; and
weighting said extracted keyword by taking into account a state of occurrence, in the other prescribed units, of keywords that are identical or similar to said extracted keyword. - View Dependent Claims (36)
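Claim 10 extends the weighting to keywords that are "identical or similar". The claim does not say how similarity is calculated, so the sketch below substitutes a generic string similarity (`difflib.SequenceMatcher.ratio` from the Python standard library) and an assumed threshold of 0.8; both are stand-ins, not the method of the specification.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity measure in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def weight_with_similarity(
    unit_keywords: list[set[str]], threshold: float = 0.8
) -> list[dict[str, int]]:
    """For each keyword of each unit, count the other units containing a
    keyword whose similarity to it is at least `threshold` (assumed value)."""
    weights = []
    for i, keywords in enumerate(unit_keywords):
        unit_weights = {}
        for keyword in keywords:
            unit_weights[keyword] = sum(
                1
                for j, other in enumerate(unit_keywords)
                if j != i and any(similarity(keyword, o) >= threshold for o in other)
            )
        weights.append(unit_weights)
    return weights


similar_weights = weight_with_similarity([{"election"}, {"elections"}, {"weather"}])
```

With this measure, "election" and "elections" reinforce each other even though they are not identical strings, while "weather" gains nothing from either.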
-
-
11. A teletext broadcast receiving apparatus comprising:
-
teletext broadcast receiving means for receiving a teletext broadcast;
channel storing means for storing a plurality of channels of prescribed programs;
keyword extracting means for extracting a keyword from each of said prescribed programs received by said teletext broadcast receiving means on said channels stored in said channel storing means;
similarity calculating means for calculating similarity between keywords thus extracted;
weighting means for weighting said extracted keyword by taking into account a state of occurrence, in the other prescribed programs, of keywords that are identical or similar to said extracted keyword;
keyword selecting means for selecting keywords from said extracted keywords on the basis of the weighted result; and
display means for displaying all or part of said selected keywords as an information abstract relating to said teletext broadcast. - View Dependent Claims (31, 32, 33, 34, 37, 40, 43, 48, 49, 50)
-
-
12. An information abstracting apparatus comprising:
-
input means for accepting an input of character string data divided into prescribed units each subdivided into prescribed paragraphs, with each individual character represented by a character code;
keyword extracting means for extracting a keyword for each paragraph in each of said prescribed units from said character string data input from said input means;
keyword associating means for generating a keyword association by associating one keyword with another among keywords obtained from the same paragraph;
similarity calculating means for calculating similarity between keywords thus extracted, on the basis of a plurality of factors including said keyword association;
weighting means for weighting said extracted keyword by taking into account a state of occurrence, in the other prescribed units, of keywords that are identical or similar to said extracted keyword, and for weighting said generated keyword association by taking into account a state of occurrence, in the other prescribed paragraphs, of keyword associations that are identical to said generated keyword association;
selecting means for selecting keywords and keyword associations from said extracted keywords and said generated keyword associations on the basis of the weighted results; and
outputting said selected keywords and keyword associations as an information abstract relating to said character string data. - View Dependent Claims (30, 39, 42, 45, 47)
-
-
13. A weighting method comprising the steps of:
-
accepting an input of character string data divided into prescribed units each subdivided into prescribed paragraphs, with each individual character represented by a character code;
extracting a keyword for each paragraph in each of said prescribed units from said input character string;
generating a keyword association by associating one keyword with another among keywords obtained from the same paragraph;
calculating similarity between keywords thus extracted, on the basis of a plurality of factors including said keyword association; and
weighting said extracted keyword by taking into account a state of occurrence, in the other prescribed units, of keywords that are identical or similar to said extracted keyword, and also weighting said generated keyword association by taking into account a state of occurrence, in the other prescribed paragraphs, of keyword associations that are identical to said generated keyword association.
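Claim 13 calculates keyword similarity "on the basis of a plurality of factors including said keyword association" without naming the factors. As a hedged illustration only, the sketch below combines two assumed factors: a string similarity and the overlap (Jaccard index) of the keywords each one is associated with. The mixing weight `alpha` and all function names are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def build_cooccurrence(paragraphs) -> dict[str, set[str]]:
    """Map each keyword to the keywords it is associated with, i.e. those
    extracted from at least one common paragraph."""
    cooccur: dict[str, set[str]] = defaultdict(set)
    for paragraph in paragraphs:
        keywords = set(paragraph)
        for keyword in keywords:
            cooccur[keyword] |= keywords - {keyword}
    return dict(cooccur)


def combined_similarity(a: str, b: str, cooccur, alpha: float = 0.5) -> float:
    """alpha * string similarity + (1 - alpha) * overlap of associated keywords."""
    string_sim = SequenceMatcher(None, a, b).ratio()
    partners_a = cooccur.get(a, set()) - {b}
    partners_b = cooccur.get(b, set()) - {a}
    union = partners_a | partners_b
    association_sim = len(partners_a & partners_b) / len(union) if union else 0.0
    return alpha * string_sim + (1 - alpha) * association_sim


paragraphs = [["yen", "tokyo", "market"], ["currency", "tokyo", "market"]]
cooccur = build_cooccurrence(paragraphs)
yen_vs_currency = combined_similarity("yen", "currency", cooccur)
```

The association factor lets "yen" and "currency" score as similar because they share the same partners, even though the strings themselves have little in common.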
-
-
14. A teletext broadcast receiving apparatus comprising:
-
teletext broadcast receiving means for receiving a teletext broadcast;
channel storing means for storing a plurality of channels of prescribed programs;
keyword extracting means for extracting a keyword from each of said prescribed programs received by said teletext broadcast receiving means on said channels stored in said channel storing means;
keyword associating means for generating a keyword association by associating one keyword with another among keywords obtained from the same paragraph in the same program;
similarity calculating means for calculating similarity between keywords thus extracted, on the basis of a plurality of factors including said keyword association;
weighting means for weighting said extracted keyword by taking into account a state of occurrence, in the other prescribed programs, of keywords that are identical or similar to said extracted keyword, and for weighting said generated keyword association by taking into account a state of occurrence, in the other prescribed paragraphs, of keyword associations that are identical to said generated keyword association;
selecting means for selecting keywords and keyword associations from said extracted keywords and said generated keyword associations on the basis of the weighted results; and
display means for displaying all or part of said selected keywords and keyword associations as an information abstract relating to said teletext broadcast.
-
-
23. An information abstracting apparatus comprising:
-
input means for accepting an input of character string data divided into prescribed units each subdivided into prescribed paragraphs, with each individual character represented by a character code;
keyword extracting means for extracting a keyword for each paragraph in each of said prescribed units from the character string data input from said input means;
keyword associating means for generating a keyword association by associating one keyword with another among keywords obtained from the same paragraph;
weighting means for weighting said generated keyword association by taking into account a state of occurrence, in the other prescribed paragraphs, of keyword associations that are identical to said generated keyword association;
selecting means for selecting keyword associations from said generated keyword associations on the basis of the weighted result; and
output means for outputting an information abstract, generated based on the selection results, relating to said character string data.
-
-
24. A weighting method comprising the steps of:
-
accepting an input of character string data divided into prescribed units each subdivided into prescribed paragraphs, with each individual character represented by a character code;
extracting a keyword for each paragraph in each of said prescribed units from said input character string data;
generating a keyword association by associating one keyword with another among keywords obtained from the same paragraph; and
weighting said generated keyword association by taking into account a state of occurrence, in the other prescribed paragraphs, of keyword associations that are identical to said generated keyword association.
-
-
25. A teletext broadcast receiving apparatus comprising:
-
teletext broadcast receiving means for receiving a teletext broadcast;
channel storing means for storing a plurality of channels of prescribed programs;
keyword extracting means for extracting a keyword from each of said prescribed programs received by said teletext broadcast receiving means on said channels stored in said channel storing means;
keyword associating means for generating a keyword association by associating one keyword with another among keywords obtained from the same paragraph in the same program;
weighting means for weighting said generated keyword association by taking into account a state of occurrence, in the other prescribed paragraphs, of keyword associations that are identical to said generated keyword association;
selecting means for selecting keyword associations from said generated keyword associations on the basis of the weighted result; and
display means for displaying an information abstract, generated based on the selection results, relating to said teletext broadcast.
-
-
26. An information abstracting apparatus comprising:
-
input means for accepting an input of character string data divided into prescribed units each subdivided into prescribed paragraphs, with each individual character represented by a character code;
keyword extracting means for extracting a keyword for each paragraph in each of said prescribed units from said character string data input from said input means;
keyword associating means for generating a keyword association by associating one keyword with another among keywords obtained from the same paragraph;
similarity calculating means for calculating similarity between keywords thus extracted;
keyword association/similarity calculating means for, by using said similarity calculated between keywords constituting said generated keyword association and keywords constituting another keyword association, calculating similarity between the keyword associations;
weighting means for weighting said generated keyword association by taking into account a state of occurrence, in the other prescribed paragraphs, of keyword associations that are identical or similar to said generated keyword association;
selecting means for selecting keyword associations from said generated keyword associations on the basis of the weighted result; and
outputting said selected keyword associations as an information abstract relating to said character string data.
-
-
27. A weighting method comprising the steps of:
-
accepting an input of character string data divided into prescribed units each subdivided into prescribed paragraphs, with each individual character represented by a character code;
extracting a keyword for each paragraph in each of said prescribed units from said input character string;
generating a keyword association by associating one keyword with another among keywords obtained from the same paragraph;
calculating similarity between keywords thus extracted;
by using said similarity calculated between keywords constituting said generated keyword association and keywords constituting another keyword association, calculating similarity between the keyword associations; and
weighting said generated keyword association by taking into account a state of occurrence, in the other prescribed paragraphs, of keyword associations that are identical or similar to said generated keyword association.
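Claim 27 derives the similarity between two keyword associations from the similarities of their constituent keywords, but does not give a formula. One simple possibility, shown purely as an assumption, is to treat each association as a keyword pair, try both ways of matching the keywords of one pair to the other, and take the better average. The string similarity used here is again a stand-in.

```python
from difflib import SequenceMatcher


def keyword_similarity(a: str, b: str) -> float:
    """Stand-in keyword similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def association_similarity(x: tuple[str, str], y: tuple[str, str]) -> float:
    """Best average keyword similarity over the two possible pairings of the
    keywords in association x with the keywords in association y."""
    straight = (keyword_similarity(x[0], y[0]) + keyword_similarity(x[1], y[1])) / 2
    crossed = (keyword_similarity(x[0], y[1]) + keyword_similarity(x[1], y[0])) / 2
    return max(straight, crossed)


same = association_similarity(("tokyo", "summit"), ("summit", "tokyo"))
close = association_similarity(("election", "tokyo"), ("tokyo", "elections"))
```

Trying both pairings keeps the result independent of the order in which the keywords of an association happen to be stored; a weighting step could then count associations whose similarity exceeds a chosen threshold as "identical or similar".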
-
-
28. A teletext broadcast receiving apparatus comprising:
-
teletext broadcast receiving means for receiving a teletext broadcast;
channel storing means for storing a plurality of channels of prescribed programs;
keyword extracting means for extracting a keyword from each of said prescribed programs received by said teletext broadcast receiving means on said channels stored in said channel storing means;
keyword associating means for generating a keyword association by associating one keyword with another among keywords obtained from the same paragraph;
similarity calculating means for calculating similarity between keywords thus extracted;
keyword association/similarity calculating means for, by using said similarity calculated between keywords constituting said generated keyword association and keywords constituting another keyword association, calculating similarity between the keyword associations;
weighting means for weighting said generated keyword association by taking into account a state of occurrence, in the other prescribed paragraphs, of keyword associations that are identical or similar to said generated keyword association;
selecting means for selecting keyword associations from said generated keyword associations on the basis of the weighted result; and
display means for displaying all or part of said selected keyword associations as an information abstract relating to said teletext broadcast.
-
Specification