Data mining system, data mining method and data retrieval system
Abstract
Accurate concept information and relationships between concepts are extracted from a figure even when image processing cannot achieve sufficient character string recognition accuracy, or when lexical ambiguity remains because a single spelling has a plurality of meanings. Concepts and relationships between concepts are first prepared from the document that refers to the figure and from documents related or similar to that document. Candidates for concepts and relationships between concepts are then limited to those likely to appear in the figure by checking them against the prepared concepts and relationships. The false recognition rate is thereby lowered.
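The candidate-limiting idea in the abstract can be pictured with a minimal sketch. It assumes the prepared concepts are available as a plain list of strings and that a noisy character-recognition result is resolved by approximate string matching with Python's standard `difflib`. The function name `resolve_concept`, the cutoff value, and the sample vocabulary are illustrative assumptions, not details taken from the patent.

```python
from difflib import get_close_matches


def resolve_concept(ocr_string, prepared_concepts, cutoff=0.6):
    """Return the prepared concept closest to a noisy OCR string, or None.

    prepared_concepts stands in for the concepts gathered from the text that
    refers to the figure (hypothetical data layout).
    """
    # Compare case-insensitively but return the concept in its original form.
    lowered = {concept.lower(): concept for concept in prepared_concepts}
    matches = get_close_matches(ocr_string.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


# Hypothetical vocabulary prepared from the surrounding text.
prepared = ["insulin", "receptor", "glucose", "apoptosis"]

resolved = resolve_concept("apoptos1s", prepared)   # one misrecognized character
unresolved = resolve_concept("xyzzy", prepared)     # nothing plausible in the text
```

Restricting the candidates to concepts already seen in the text means a misread character is corrected only toward a term the document actually discusses, and strings with no plausible counterpart are left unresolved rather than guessed.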
19 Claims
1. A data mining system comprising:
an extraction part of figure information which performs image processing on a figure in a document and extracts information on concepts and relationships between the concepts in the figure;
an extraction part of text information which extracts information on concepts and relationships between the concepts from a text portion in the document; and
a storing part which stores, in association with each other, images in the figure, identification information of the figure, and the information on the concepts and the relationships between the concepts in the figure, which is extracted by the extraction part of figure information,
wherein the extraction part of figure information extracts the information on the concepts and/or the relationships between the concepts in the figure by utilizing the information extracted by the extraction part of text information,
wherein the extraction part of text information has a function of retrieving a paragraph referring to the figure from the document and the extraction part of figure information extracts a concept that cannot be specified due to insufficient accuracy of the image processing, and a concept having lexical ambiguity, by utilizing the information on concepts or the information on the concepts and the relationships between the concepts, which is extracted from the paragraph extracted by the extraction part of text information.
Dependent claims: 2, 3
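As an illustration of retrieving a paragraph that refers to a figure, the following is a minimal sketch. It assumes paragraphs are separated by blank lines and that references take the form "Fig. N" or "Figure N". Both conventions, and the function name `paragraphs_referring_to`, are assumptions made for this example and are not specified in the claim.

```python
import re


def paragraphs_referring_to(document_text, figure_number):
    """Return the paragraphs that mention the given figure number."""
    # Match "Fig 2", "Fig. 2" or "Figure 2", but not "Figure 12" or "Fig. 21".
    number = re.escape(str(figure_number))
    pattern = re.compile(rf"\b(?:Fig\.?|Figure)\s*{number}(?!\d)", re.IGNORECASE)
    # Assumed convention: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document_text) if p.strip()]
    return [p for p in paragraphs if pattern.search(p)]


sample_document = (
    "Insulin binds its receptor as shown in Fig. 2.\n\n"
    "Figure 12 shows an unrelated assay.\n\n"
    "No figure is mentioned here."
)
referring = paragraphs_referring_to(sample_document, 2)
```

Concepts extracted from the returned paragraphs could then serve as the candidate set against which uncertain figure labels are checked.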
4. A data mining system comprising:
an extraction part of figure information which performs image processing on a figure in a document and extracts information on concepts and relationships between the concepts in the figure;
an extraction part of text information which extracts information on concepts and relationships between the concepts from a text portion in the document; and
a storing part which stores, in association with each other, images in the figure, identification information of the figure, and the information on the concepts and the relationships between the concepts in the figure, which is extracted by the extraction part of figure information,
wherein the extraction part of figure information extracts the information on the concepts and/or the relationships between the concepts in the figure by utilizing the information extracted by the extraction part of text information,
wherein the extraction part of text information has a function of retrieving documents related and/or similar to the document including the figure from a document database and the extraction part of figure information extracts a concept that cannot be specified due to insufficient accuracy of the image processing, and a concept having lexical ambiguity, by utilizing the information on concepts or the information on the concepts and the relationships between the concepts, which is extracted from the related documents and/or similar documents retrieved by the extraction part of text information.
Dependent claims: 5, 6
7. A data mining system comprising:
an extraction part of figure information which performs image processing on a figure in a document and extracts information on concepts and relationships between the concepts in the figure;
an extraction part of text information which extracts information on concepts and relationships between the concepts from a text portion in the document; and
a storing part which stores, in association with each other, images in the figure, identification information of the figure, and the information on the concepts and the relationships between the concepts in the figure, which is extracted by the extraction part of figure information,
wherein the extraction part of figure information extracts the information on the concepts and/or the relationships between the concepts in the figure by utilizing the information extracted by the extraction part of text information,
wherein the extraction part of text information extracts, from a title, a caption or a main text of the figure, a concept representing a content of the figure by use of a dictionary, a unique expression recognition method, an extraction pattern or syntactic analysis, and then stores the extracted concepts in the storing part, in association with identification information of the figure.
Dependent claims: 8, 9
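Of the extraction options listed in this claim, the dictionary-based one is the simplest to sketch. The example below assumes the dictionary maps surface forms to concept names and that the storing part can be modelled as an in-memory mapping keyed by a figure identifier. The names `extract_caption_concepts` and `figure_store`, and the sample entries, are hypothetical.

```python
import re


def extract_caption_concepts(caption, dictionary):
    """Return the concepts whose surface forms occur in the caption.

    dictionary maps a surface form (as it may appear in text) to a concept name.
    """
    found = []
    for surface_form, concept in dictionary.items():
        # Whole-word, case-insensitive match of the surface form.
        if re.search(rf"\b{re.escape(surface_form)}\b", caption, re.IGNORECASE):
            if concept not in found:
                found.append(concept)
    return found


# Hypothetical dictionary and caption.
concept_dictionary = {
    "insulin": "Insulin",
    "insulin receptor": "Insulin receptor",
    "IR": "Insulin receptor",
}
figure_store = {}  # stands in for the storing part, keyed by figure identifier
figure_store["fig-3"] = extract_caption_concepts(
    "Figure 3. Insulin receptor signalling.", concept_dictionary
)
```

Keying the stored concepts by the figure identifier keeps them associated with the figure, so they can later be looked up alongside the concepts extracted from the figure image itself.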
10. A data mining method comprising:
a step of inputting a document including a figure to a processing part;
a text processing step of extracting information on concepts and relationships between the concepts from a text portion in the document in the processing part;
a figure processing step of extracting characters in the figure in the document by performing image processing on the figure, extracting a concept composed of a plurality of consecutive characters by considering the distances between adjacent characters, and extracting a relationship between concepts based on a shape of a symbol disposed between the concepts; and
an output step of outputting images in the figure, identification information of the figure and the information on the concepts and the relationships between the concepts in the figure, which is extracted in the figure processing step, in association with each other,
wherein, in the figure processing step, the information on the concepts and/or the relationships between the concepts in the figure is extracted by utilizing the information extracted in the text processing step,
wherein the text processing step includes a step of retrieving a paragraph referring to the figure from the document, and extracting information on concepts and relationships between the concepts from the paragraph, and in the figure processing step, a concept that cannot be specified due to insufficient accuracy of the image processing and a concept having lexical ambiguity are extracted by utilizing the information on concepts or the information on the concepts and the relationships between the concepts, which is extracted from the paragraph.
Dependent claims: 11
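The step of forming a concept from consecutive characters "by considering the distances between adjacent characters" can be illustrated with a minimal sketch. It assumes each recognized character comes with the left and right x-coordinates of its bounding box, that all characters lie on one text line, and that a fixed gap threshold separates concepts. The input layout, the threshold, and the name `group_characters` are assumptions for illustration only.

```python
def group_characters(char_boxes, max_gap):
    """Group recognized characters on one text line into concept strings.

    char_boxes: list of (character, x_left, x_right) tuples.
    max_gap: largest horizontal gap still treated as "within one concept".
    """
    concepts = []
    current = ""
    previous_right = None
    # Process characters from left to right.
    for char, x_left, x_right in sorted(char_boxes, key=lambda box: box[1]):
        # A gap wider than max_gap closes the current concept.
        if previous_right is not None and x_left - previous_right > max_gap:
            concepts.append(current)
            current = ""
        current += char
        previous_right = x_right
    if current:
        concepts.append(current)
    return concepts


# Hypothetical character boxes: "AB" and "CD" separated by a wide gap.
boxes = [("A", 0, 8), ("B", 10, 18), ("C", 60, 68), ("D", 70, 78)]
grouped = group_characters(boxes, max_gap=5)
```

A real figure would also need characters grouped by text line and a threshold scaled to the character size. Both are left out here to keep the sketch short.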
12. A data mining method comprising:
a step of inputting a document including a figure to a processing part;
a text processing step of extracting information on concepts and relationships between the concepts from a text portion in the document in the processing part;
a figure processing step of extracting characters in the figure in the document by performing image processing on the figure, extracting a concept composed of a plurality of consecutive characters by considering the distances between adjacent characters, and extracting a relationship between concepts based on a shape of a symbol disposed between the concepts; and
an output step of outputting images in the figure, identification information of the figure and the information on the concepts and the relationships between the concepts in the figure, which is extracted in the figure processing step, in association with each other,
wherein, in the figure processing step, the information on the concepts and/or the relationships between the concepts in the figure is extracted by utilizing the information extracted in the text processing step,
wherein in the text processing step, a concept representing a content of the figure is extracted from any of a title, a caption of the figure and a main text, and in the output step, a dictionary, a unique expression recognition method, an extraction pattern or syntactic analysis is used to output the concept representing the content of the figure in association with the identification information of the figure.
Dependent claims: 13
14. A retrieval system comprising:
a database which stores an information comprising; images in a figure included in a document, identification information of the figure and information on concepts and relationships between the concepts in the figure in association with each other information on concepts and relationships between the concepts from a text portion in the document, text information extracts, from a title, a caption or a main text of the figure, a concept representing a content of the figure by use of a dictionary, a unique expression recognition method, an extraction pattern or syntactic analysis, and then stores the extracted concepts in the storing part, in association with identification information of the figure;
an input part which inputs retrieval concepts;
a retrieval part which calculates relevance between the retrieval concepts inputted by the input part and concepts in the figure, which are associated with the figure and stored in the database, and outputs the images in the figure by ranking the images in terms of the relevance; and
a display part which displays the images in the figure in descending order of relevance, the images outputted from the retrieval part.
Dependent claims: 15, 16, 17, 18, 19
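The retrieval part's ranking by relevance can be sketched as follows. The claim does not define how relevance is calculated, so this example assumes a Jaccard overlap between the retrieval concepts and the concepts stored for each figure. The measure, the function names `relevance` and `rank_figures`, and the sample index are illustrative assumptions.

```python
def relevance(query_concepts, figure_concepts):
    """Jaccard overlap between two concept collections (assumed measure)."""
    query, figure = set(query_concepts), set(figure_concepts)
    union = query | figure
    return len(query & figure) / len(union) if union else 0.0


def rank_figures(query_concepts, figure_index):
    """Return (figure_id, score) pairs in descending order of relevance.

    figure_index maps a figure identifier to the concepts stored for it.
    Figures with zero relevance are omitted; ties are broken by identifier.
    """
    scored = [
        (relevance(query_concepts, concepts), figure_id)
        for figure_id, concepts in figure_index.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(figure_id, score) for score, figure_id in scored if score > 0]


# Hypothetical database contents.
figure_index = {
    "fig1": ["insulin", "receptor", "glucose"],
    "fig2": ["insulin", "glucagon"],
    "fig3": ["collagen"],
}
ranked = rank_figures(["insulin", "receptor"], figure_index)
```

A display part would then show the figure images in the order returned. Any other relevance measure, for example one that also weighs relationships between concepts, could replace `relevance` without changing the ranking logic.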
Specification