Text mining method, text mining device and text mining program
Abstract
Disclosed are a text mining method, device, and program capable of performing text mining on a specific topic with high precision. An element identification unit calculates a feature degree, an index of the degree to which an element of the text appears within a text set of interest, that is, the set of text to be analyzed. An output unit identifies distinctive elements within the text set of interest on the basis of the calculated feature degree and outputs the identified elements. The element identification unit corrects the feature degree on the basis of a topic relatedness degree, a value indicating the degree to which each text portion, obtained by partitioning the text being analyzed into predetermined units, relates to the analysis topic.
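The correction described in the abstract can be illustrated with a minimal sketch. It assumes each text portion already carries a topic relatedness degree between 0 and 1, and it applies the correction by weighting raw counts, which is how the claims phrase it (appearance degree multiplied by the degree). The function name `corrected_counts` and the input layout are hypothetical and are not taken from the patent.

```python
from collections import Counter


def corrected_counts(portions):
    """Weight each element's count by the relatedness of the portion it occurs in.

    portions: list of (tokens, relatedness) pairs, where tokens is a list of
    text elements and relatedness is an assumed score in [0, 1].
    """
    totals = Counter()
    for tokens, relatedness in portions:
        # Count the elements of this portion, then scale by its relatedness.
        for token, count in Counter(tokens).items():
            totals[token] += count * relatedness
    return dict(totals)


portions = [
    (["refund", "delay", "refund"], 1.0),   # portion squarely on the topic
    (["weather", "delay"], 0.25),           # portion only loosely on the topic
]
weighted = corrected_counts(portions)
```

Elements that occur mostly in off-topic portions, such as "weather" here, end up with a small corrected count even if their raw frequency is comparable to on-topic elements.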
12 Claims
1. A text mining device, comprising:
  a computer device that includes a processing device, a memory readable by the processing device, and a storage unit readable by that processing device, the memory having stored program code sufficient to cause the computer device, upon execution by the processing device, to operate as;
  a data input unit that receives, as input, audible speech and converts said speech to an input text set intended to be a target of text mining;
  a language processing unit that performs language processing for one or more portions of the input text set and outputs and stores a plurality of text elements;
  a topic involvement degree calculation unit that calculates and stores a topic relatedness degree that indicates a degree to which each text element relates to an analysis target topic received by the user and stored; and
  an element identification unit that, for each text element,
    calculates and stores a topic involvement degree on the analysis target topic with respect to the text element,
    calculates and stores an appearance degree by counting a number of times the text element appears in the input text set, said appearance degree indicating a degree to which the text element appears in each portion of the input text set corresponding to the analysis target topic,
    corrects the calculated appearance degree of the text element by multiplying the calculated appearance degree with the topic involvement degree to produce and store a corrected appearance degree,
    calculates and stores, using the corrected appearance degree, a feature degree as an index of a degree to which the text element appears within the input text set, and
    using the feature degree, identifies, stores and outputs, via an output unit, a distinctive text element within the input text set on the basis of the calculated feature degree,
  wherein the feature degree is a degree that a word of the input text set, a word n-Gram, a segment, or dependency thereof, or n consecutive dependency thereof, or each element divided into a unit of a partial tree of a syntax tree, or any combination of the foregoing appears within the input text set, where n is a natural number.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
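Of the element types named in claim 1, the word n-gram is the simplest to show concretely. The sketch below assumes the language processing step has already produced a list of words; the function name `word_ngrams` is hypothetical, and the other element types (segments, dependencies, partial syntax trees) would need a parser and are not covered here.

```python
def word_ngrams(words, n):
    """Return every run of n consecutive words as a tuple, where n is a natural number."""
    if n < 1:
        raise ValueError("n must be a natural number (n >= 1)")
    # Slide a window of width n over the word list; too-short input yields [].
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


bigrams = word_ngrams(["the", "order", "was", "late"], 2)
```

With n = 1 the result is simply each word as a one-element tuple, so single words and longer n-grams can be counted by the same downstream code.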
10. A text mining method, implemented by a computing machine with a processing device and an information storage device that stores instructions for operating the processing device, for analyzing text input to the computing machine, the method comprising:
  receiving, as input, audible speech, and converting and storing said speech as an input text set intended to be a target of text mining;
  performing language processing upon the input text set to generate and store a plurality of text elements;
  receiving and storing an analysis target topic; and
  for each text element,
    calculating and storing a topic involvement degree that indicates a degree to which the text element relates to the analysis target topic,
    calculating and storing an appearance degree as a number of times the text element appears in each corresponding part of the analysis target topic,
    using the topic involvement degree of the text element to correct the stored calculated appearance degree of the text element by multiplying the calculated appearance degree with the topic involvement degree to produce a corrected appearance degree,
    using the corrected appearance degree to calculate and store a feature degree of the text element on the input text set, the feature degree indicating a degree to which the text element appears within the input text set, and
    using the feature degree to identify, store, and output via an output unit, a distinctive text element within the input text set.
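The per-element steps of the method can be sketched end to end as follows. This is an illustration under stated assumptions, not the patented implementation: speech recognition and language processing are taken as already done, the topic involvement degrees are supplied by the caller, and the feature degree is assumed to be the element's share of the total corrected appearance degree, since the claim does not fix a formula. The names `distinctive_elements`, `topic_parts`, and `involvement` are hypothetical.

```python
from collections import Counter


def distinctive_elements(topic_parts, involvement, top_k=3):
    """Rank text elements by a feature degree built from corrected appearance degrees.

    topic_parts: list of element lists, one per part corresponding to the topic.
    involvement: dict mapping element -> topic involvement degree (assumed given).
    """
    # Appearance degree: how many times each element appears in the topic parts.
    appearance = Counter(el for part in topic_parts for el in part)
    # Corrected appearance degree: appearance degree times involvement degree.
    corrected = {el: n * involvement.get(el, 0.0) for el, n in appearance.items()}
    total = sum(corrected.values())
    if total == 0:
        return []
    # Feature degree (assumption): relative corrected frequency.
    feature = {el: value / total for el, value in corrected.items()}
    # Highest feature degree first; ties broken alphabetically for determinism.
    ranked = sorted(feature.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


parts = [["refund", "please", "refund"], ["please", "delay"]]
degrees = {"refund": 1.0, "delay": 0.5, "please": 0.0}
top = distinctive_elements(parts, degrees, top_k=2)
```

In this example "please" appears as often as "refund" but has an involvement degree of zero, so it drops out of the ranking, which is the effect the multiplication step is meant to produce.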
11. A non-transitory computer readable medium for analyzing text input to a computing machine having recorded thereon a program, said program operatively configured to cause, upon execution by a processing device of the computing machine, the computing machine to perform a method comprising the steps of:
  receiving and storing, as input to the computing machine, an analysis target topic;
  receiving audible speech, and converting and storing said audible speech as a target text set as input to the computing machine, and dividing said target text set into text elements for analysis; and
  for each text element,
    calculating and storing a topic involvement degree that indicates a degree to which the element relates to the analysis target topic;
    calculating and storing an appearance degree as a number of times the text element appears in each corresponding part of the analysis target topic;
    using the topic involvement degree of the text element to correct the stored calculated appearance degree of the text element by multiplying the calculated appearance degree with the topic involvement degree to produce a corrected appearance degree;
    using the corrected appearance degree to calculate and store a feature degree as an index that indicates a degree to which the text element appears within the target text set;
    identifying and storing, by using the feature degree, a distinctive element within the target text set; and
    outputting, via an output unit, the identified distinctive element.
12. A text mining device, comprising:
  a computer device that includes a processing device, a memory readable by the processing device, and a storage unit readable by that processing device, the memory having stored program code sufficient to cause the computer device, upon execution by the processing device, to operate as;
  a data input element that receives, as input, audible speech and converts said speech to an input text set intended to be a target of text mining;
  a language processing element that performs language processing for one or more portions of the input text set and outputs and stores a plurality of text elements;
  a topic involvement degree calculation element that calculates and stores a topic relatedness degree that indicates a degree to which each text element relates to an analysis target topic received and input from the user; and
  an element identification element that
    calculates and stores a topic involvement degree on the analysis target topic with respect to the text element,
    calculates and stores an appearance degree by counting a number of times the text element appears in the input text set, said appearance degree indicating a degree to which the text element appears in each portion of the input text corresponding to the analysis target topic,
    corrects, using the topic involvement degree of the text element, the calculated appearance degree of the text element by multiplying the calculated appearance degree with the topic involvement degree to produce and store a corrected appearance degree,
    calculates, using the corrected appearance degree, a feature degree as an index of a degree to which the text element appears within the input text set, and
    using the feature degree, identifies, stores, and outputs, via an output unit, a distinctive text element within the input text set on the basis of the calculated feature degree.
Specification