Managing a set of data
Abstract
Aspects of the disclosure include managing a set of data associated with a corpus. By analyzing the corpus, a domain is established to characterize the subject matter of the set of data. A user identifier is generated for a portion of the set of data. Based upon a credibility computation, a quality factor for a portion of the set of data is determined. The credibility computation includes using both the domain and the user identifier to determine the quality factor for the portion of the set of data. The quality factor for the portion of the set of data is compared with a threshold. In response to a quality factor for a portion of the set of data exceeding the threshold, the portion of the set of data is selected.
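The selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `Portion` type, the `CREDIBILITY` lookup table, the example scores, and the choice to use the credibility measure directly as the quality factor are all hypothetical, since the abstract does not specify how either value is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Portion:
    """A portion of the set of data, tagged with a generated user identifier."""
    text: str
    user_id: str


# Hypothetical credibility scores keyed by (domain, user identifier).
# The abstract only says that both inputs are used; the values are invented.
CREDIBILITY = {
    ("medicine", "user-a"): 0.9,
    ("medicine", "user-b"): 0.3,
}


def credibility(domain, user_id, default=0.0):
    """Look up a credibility measure for a user within a domain."""
    return CREDIBILITY.get((domain, user_id), default)


def quality_factor(portion, domain):
    """Assumption: the quality factor is simply the credibility measure."""
    return credibility(domain, portion.user_id)


def select_qualified(portions, domain, threshold):
    """Keep only portions whose quality factor exceeds the threshold."""
    return [p for p in portions if quality_factor(p, domain) > threshold]


portions = [
    Portion("Aspirin can thin the blood.", "user-a"),
    Portion("Aspirin cures everything.", "user-b"),
    Portion("Unattributed remark.", "user-c"),
]
qualified = select_qualified(portions, "medicine", threshold=0.5)
```

The comparison is strict (`>`), matching the "exceeding the threshold" wording; a portion whose quality factor equals the threshold is not selected in this sketch.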
19 Claims
1. A computer implemented method for generating a qualified set of data, the method comprising:
receiving, by at least one processor, an input set of data;
determining, by the at least one processor analyzing the input set of data, a domain that characterizes a subject matter of the input set of data;
computing, by extracting a common feature from the input set of data by the at least one processor, a probability that a specific user created a first portion of the input set of data;
identifying, by the at least one processor, the first portion of the input set of data based, at least in part, on the first portion of the input set of data having the common feature;
generating, by the at least one processor, based, at least in part, on the domain, on the probability and on the first portion of the input set of data having the common feature, a user identifier associated with the first portion of the input set of data;
storing, by the at least one processor, the user identifier in a data repository;
computing, by the at least one processor, based at least in part on the domain and the user identifier, a credibility measure;
computing, by the at least one processor, based at least in part on the credibility measure, a quality factor associated with the first portion of the input set of data;
generating, by the at least one processor, based at least in part on the quality factor exceeding a quality factor threshold, the qualified set of data comprising data, among the first portion of the input data, that exceeds the quality threshold; and
outputting, by the at least one processor, the qualified set of data.
Dependent claims: 2-17.
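The earlier steps of claim 1 (extracting a common feature, estimating an authorship probability, identifying the first portion, and generating a user identifier) might look roughly like the sketch below. Everything concrete here is an assumption made for illustration: the claim does not say what the common feature is, how the probability is computed, or how the identifier is formed. The sketch treats a trailing sign-off token as the feature, estimates the probability as the share of a user's known samples that carry it, and derives the identifier from a SHA-256 hash of the domain and the feature.

```python
import hashlib
from collections import Counter


def sign_off(record):
    """Hypothetical feature extractor: the last whitespace-separated token."""
    tokens = record.split()
    return tokens[-1] if tokens else ""


def common_feature(records):
    """Return the sign-off shared by the most records (records must be non-empty)."""
    counts = Counter(sign_off(r) for r in records)
    feature, _ = counts.most_common(1)[0]
    return feature


def authorship_probability(feature, known_samples):
    """Crude stand-in estimate: share of a user's known samples carrying the feature."""
    if not known_samples:
        return 0.0
    hits = sum(1 for s in known_samples if sign_off(s) == feature)
    return hits / len(known_samples)


def user_identifier(domain, feature, probability, min_probability=0.5):
    """Generate an identifier only when the authorship estimate is high enough."""
    if probability < min_probability:
        return None
    digest = hashlib.sha256(f"{domain}:{feature}".encode("utf-8")).hexdigest()
    return f"user-{digest[:8]}"


records = [
    "Take with food. --jd",
    "Check the dosage twice. --jd",
    "No idea, sorry.",
]
# Hypothetical samples already attributed to the specific user.
known_samples = ["Rest and fluids. --jd", "See a doctor. --jd", "Thanks!"]

feature = common_feature(records)
first_portion = [r for r in records if sign_off(r) == feature]
probability = authorship_probability(feature, known_samples)
uid = user_identifier("medicine", feature, probability)
```

The `min_probability` cutoff is an added design choice, not something the claim requires; it only shows one way the probability could influence whether an identifier is generated.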
18. A computer program for generating a qualified set of data, the computer program product comprising a computer readable storage medium having instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
receive an input set of data;
determine, by analyzing the input set of data, a domain that characterizes a subject matter of the input set of data;
compute, by extracting a common feature from the input set of data, a probability that a specific user created a first portion of the input set of data;
identify the first portion of the input set of data based, at least in part, on the first portion of the input set of data having the common feature;
generate, based, at least in part, on the domain, on the probability and on the first portion of the input set of data having the common feature, a user identifier associated with the first portion of the input set of data;
store the user identifier in a data repository;
compute a credibility measure, based at least in part on the domain and the user identifier;
compute, based at least in part on the credibility measure, a quality factor associated with the first portion of the input set of data; and
generate, based at least in part on the quality factor exceeding a quality factor threshold, the qualified set of data comprising data, among the first portion of the input data, that exceeds the quality threshold; and
output the qualified set of data.
19. A computer system for generating a qualified set of data, the computer system comprising a processor configured to:
receive an input set of data;
determine, by analyzing the input set of data, a domain that characterizes a subject matter of the input set of data;
compute, by extracting a common feature from the input set of data, a probability that a specific user created a first portion of the input set of data;
identify the first portion of the input set of data based, at least in part, on the first portion of the input set of data having the common feature;
generate, based, at least in part, on the domain, the probability and the first portion of the input set of data having the common feature, a user identifier associated with the first portion of the input set of data;
store the user identifier in a data repository;
compute, based at least in part on the domain and the user identifier, a credibility measure;
compute, based at least in part on the credibility measure, a quality factor associated with the first portion of the input set of data;
generate, based at least in part on the quality factor exceeding a quality factor threshold, the qualified set of data comprising data, among the first portion of the input data, that exceeds the quality threshold; and
output the qualified set of data.
Specification