APPARATUS AND METHOD FOR EXECUTING AN AUTOMATED ANALYSIS OF DATA, IN PARTICULAR SOCIAL MEDIA DATA, FOR PRODUCT FAILURE DETECTION
Abstract
An apparatus and a method for executing an automated analysis of analysis input data (e.g., social media data and/or On-Board-Diagnosis data) for product failure detection are proposed. Data analysis processing is performed, including: word count processing to determine word count numbers indicative of occurrence frequencies for keywords of a database in user-created text documents of the social media data; correlation determination processing to determine, for each of a plurality of keyword pairs, a respective correlation coefficient; correlation-link identification processing to identify correlation-linked keyword pairs for which the determined correlation coefficient exceeds a correlation threshold; and correlation group identification processing to identify correlation groups of keywords based on the identified correlation-linked keyword pairs. If one or more correlation groups of keywords are identified, analysis result data indicative of at least one of the identified correlation groups is output.
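The first three processing steps named in the abstract (word counting, correlation determination, and correlation-link identification) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the patented implementation: the sample texts, the function names `count_keywords`, `pearson`, and `correlation_linked_pairs`, the simple regular-expression tokenizer, the choice of the Pearson coefficient, and the threshold value 0.8 are all hypothetical. The abstract only calls for some correlation coefficient and some correlation threshold.

```python
import math
import re
from itertools import combinations


def count_keywords(samples, keywords):
    """Return {keyword: [count in sample 0, count in sample 1, ...]}."""
    counts = {kw: [] for kw in keywords}
    for documents in samples:
        # Lower-case word tokens of all text documents in this data sample.
        tokens = [t for doc in documents
                  for t in re.findall(r"[a-z0-9']+", doc.lower())]
        for kw in keywords:
            counts[kw].append(tokens.count(kw))
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_linked_pairs(counts, threshold):
    """Keyword pairs whose correlation coefficient exceeds the threshold."""
    linked = []
    for a, b in combinations(sorted(counts), 2):
        if pearson(counts[a], counts[b]) > threshold:
            linked.append((a, b))
    return linked


# Hypothetical input: each data sample (e.g. one time period) holds text documents.
samples = [
    ["the brake makes a noise", "love the new paint"],
    ["brake noise again", "brake noise when stopping", "nice paint"],
    ["paint looks great"],
    ["loud brake noise", "brake noise", "brake noise in the rain"],
]
keywords = ["brake", "noise", "paint"]  # stands in for the keyword database

counts = count_keywords(samples, keywords)
linked = correlation_linked_pairs(counts, threshold=0.8)  # assumed threshold
```

In this toy data the counts for "brake" and "noise" rise and fall together across the four samples, so that pair is the only one whose coefficient exceeds the assumed threshold.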
15 Claims
1. A method for an automated data analysis, comprising:
providing one or more databases indicative of a plurality of keywords;
providing analysis input data obtained from one or more data sources, and pre-processing the analysis input data to generate pre-processed analysis input data available for data analysis processing, the analysis input data including a plurality of text documents respectively being associated with at least one of a plurality of data samples;
performing data analysis processing of the pre-processed analysis input data, including;
  word count processing to determine word count numbers indicative of occurrence frequencies for keywords of the one or more databases in the text documents of the pre-processed analysis input data for each of the plurality of data samples,
  correlation determination processing to determine, for each of a plurality of keyword pairs, a respective correlation coefficient being associated with the respective keyword pair, the respective correlation coefficient being indicative of a quantitative measure of correlation between the determined word count numbers of the keywords of the respective keyword pair for the plurality of data samples,
  correlation-link identification processing to identify correlation-linked keyword pairs, wherein keywords of a keyword pair are determined to be correlation-linked to each other based on a correlation criteria, the correlation criteria including a criteria whether the determined correlation coefficient associated with the respective keyword pair exceeds a correlation threshold, and
  correlation group identification processing to identify correlation groups of keywords based on the identified correlation-linked keyword pairs, each correlation group including keywords of at least one correlation-linked keyword pair and, for each keyword included in the respective correlation group, the respective correlation group further includes the other keywords identified to be correlation-linked to the respective keyword; and
outputting, if one or more correlation groups of keywords are identified, analysis result data indicative of at least one of the one or more identified correlation groups of keywords.
Dependent claims: 2 to 14.
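One way to read the correlation group identification step of claim 1 is as finding connected components in a graph whose nodes are keywords and whose edges are the correlation-linked keyword pairs: each resulting group then contains, for every keyword in it, all other keywords linked to that keyword. The Python sketch below follows that reading. The function name `correlation_groups` and the example pairs are hypothetical, and the claim itself does not prescribe a particular algorithm.

```python
from collections import defaultdict


def correlation_groups(linked_pairs):
    """Group keywords so each group holds every keyword linked to any member."""
    neighbours = defaultdict(set)
    for a, b in linked_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    groups, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable via links.
        group, stack = set(), [start]
        while stack:
            kw = stack.pop()
            if kw in group:
                continue
            group.add(kw)
            stack.extend(neighbours[kw] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical correlation-linked pairs, e.g. produced by a thresholding step.
linked_pairs = [("brake", "noise"), ("noise", "squeal"), ("battery", "drain")]
groups = correlation_groups(linked_pairs)

# Output only if at least one group was identified, as in the final claim step.
if groups:
    for group in groups:
        print(", ".join(group))
```

Keywords that appear in no correlation-linked pair never enter the graph, so they form no group, which matches the requirement that each group include the keywords of at least one linked pair.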
15. An apparatus for executing an automated data analysis, comprising:
a storage configured to store one or more databases indicative of a plurality of keywords;
a data input interface configured to provide analysis input data obtained from one or more data sources, the analysis input data including a plurality of text documents respectively being associated with at least one of a plurality of data samples;
a processing system configured to execute pre-processing the analysis input data to generate pre-processed analysis input data available for data analysis processing, and the processing system being configured to execute the data analysis processing of the pre-processed analysis input data, including;
  word count processing to determine word count numbers indicative of occurrence frequencies for keywords of the one or more databases in the text documents of the pre-processed analysis input data for each of the plurality of data samples,
  correlation determination processing to determine, for each of a plurality of keyword pairs, a respective correlation coefficient being associated with the respective keyword pair, the respective correlation coefficient being indicative of a quantitative measure of correlation between the determined word count numbers of the keywords of the respective keyword pair for the plurality of data samples,
  correlation-link identification processing to identify correlation-linked keyword pairs, wherein keywords of a keyword pair are determined to be correlation-linked to each other based on a correlation criteria, the correlation criteria including a criteria whether the determined correlation coefficient associated with the respective keyword pair exceeds a correlation threshold, and
  correlation group identification processing to identify correlation groups of keywords based on the identified correlation-linked keyword pairs, each correlation group including keywords of at least one correlation-linked keyword pair and, for each keyword included in the respective correlation group, the respective correlation group further includes the other keywords identified to be correlation-linked to the respective keyword; and
a data output interface configured to output, if one or more correlation groups of keywords are identified, analysis result data indicative of at least one of the one or more identified correlation groups of keywords.
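The division of the apparatus into a storage, a data input interface, a processing system, and a data output interface can be mirrored in software as a small container that wires four components together. The sketch below is purely illustrative: the class name `AnalysisApparatus`, its field names, the trivial pre-processing (trimming and lower-casing), and the stand-in `stub_analyze` function are assumptions and do not come from the patent. In particular, `stub_analyze` is a placeholder and does not perform the claimed correlation analysis.

```python
from dataclasses import dataclass
from typing import Callable, List

Samples = List[List[str]]   # one list of text documents per data sample
Groups = List[List[str]]    # identified correlation groups of keywords


@dataclass
class AnalysisApparatus:
    keyword_db: List[str]                              # storage
    read_input: Callable[[], Samples]                  # data input interface
    analyze: Callable[[Samples, List[str]], Groups]    # processing system
    write_output: Callable[[Groups], None]             # data output interface

    def run(self) -> Groups:
        samples = self.read_input()
        # Assumed pre-processing: trim whitespace and lower-case each document.
        prepared = [[doc.strip().lower() for doc in docs] for docs in samples]
        groups = self.analyze(prepared, self.keyword_db)
        if groups:  # output only when at least one group was identified
            self.write_output(groups)
        return groups


def stub_analyze(samples: Samples, keywords: List[str]) -> Groups:
    """Placeholder: one group if every keyword occurs in every data sample."""
    everywhere = all(
        any(kw in doc for doc in docs) for docs in samples for kw in keywords
    )
    return [sorted(keywords)] if everywhere else []


# Usage with stand-in components (all hypothetical).
emitted: List[Groups] = []
apparatus = AnalysisApparatus(
    keyword_db=["brake", "noise"],
    read_input=lambda: [["  Brake NOISE "], ["brake noise again"]],
    analyze=stub_analyze,
    write_output=emitted.append,
)
result = apparatus.run()
```

Passing the components in as callables keeps the conditional output step ("if one or more correlation groups of keywords are identified") in one place, independent of how the analysis itself is implemented.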
Specification