Natural language processing with dynamic pipelines
First Claim
1. A method for natural language processing, the method comprising:
selecting, by a computer processor, a dynamic pipeline based, at least in part, on a corpus, wherein the dynamic pipeline links a first human language technology component and a second human language technology component, wherein the first human language technology component comprises a first set of algorithms and the second human language technology component comprises a second set of algorithms and wherein the corpus includes at least text, audio, and video;
identifying, by a computer processor, a first algorithm of the first set of algorithms associated with the first human language technology component and a second algorithm of the second set of algorithms associated with the second human language technology component;
applying, by the computer processor, the first algorithm based, at least in part, on the corpus to generate a first cluster space that reflects a dynamic determination of relationships within the corpus, wherein the first cluster space includes probabilities that each respective relationship within the corpus is true or untrue;
amending, by the computer processor, an evidence chain that includes one or more findings of true relationships associated with the corpus in response to applying the first algorithm, to reflect a most recent finding of a true relationship of the true relationships that supersedes a previous finding in light of a probabilistic determination from new determined relationships in the first cluster space;
standardizing, by the computer processor, a first ontology of the first cluster space, wherein the first ontology is a data structure on a computer;
applying, by the computer processor, the second algorithm based, at least in part, on the corpus and the first ontology of the first cluster space to generate a second cluster space that is associated with the corpus;
identifying, by the computer processor, a set of information of one or more corpora that has a relevance to the corpus that exceeds a pre-determined threshold based, at least in part, on the first and second cluster spaces of the corpus; and
generating, by the computer processor, a summary report based, at least in part, on the set of information of the one or more corpora.
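The selecting, identifying, applying, and standardizing steps of claim 1 can be sketched in code. The following is only an illustration under stated assumptions, not the patented implementation: the names HLTComponent, Pipeline, select_pipeline, and run_pipeline, the registry keyed by corpus modalities, the toy algorithms, and the fixed probabilities 0.9 and 0.5 are all hypothetical and do not come from the claim.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

Relationship = Tuple[str, str, str]        # (subject, relation, object)
ClusterSpace = Dict[Relationship, float]   # relationship -> probability it is true
Ontology = Dict[str, List[str]]            # relation name -> sorted objects


@dataclass
class HLTComponent:
    """A human language technology component holding a set of algorithms."""
    name: str
    algorithms: Dict[str, Callable[..., ClusterSpace]]


@dataclass
class Pipeline:
    """Links a first and a second HLT component."""
    first: HLTComponent
    second: HLTComponent


def extract_relations(corpus: dict) -> ClusterSpace:
    # Toy first algorithm: one relationship per sentence, placeholder probability.
    return {(corpus["id"], "mentions", s.strip().lower()): 0.9
            for s in corpus["text"].split(".") if s.strip()}


def standardize_ontology(space: ClusterSpace) -> Ontology:
    # Toy ontology: group the objects of the cluster space by relation name.
    ontology: Ontology = {}
    for (_, relation, obj) in space:
        ontology.setdefault(relation, []).append(obj)
    return {relation: sorted(objs) for relation, objs in ontology.items()}


def link_topics(corpus: dict, ontology: Ontology) -> ClusterSpace:
    # Toy second algorithm: relate the corpus to every term in the ontology.
    return {(corpus["id"], "about", term): 0.5
            for terms in ontology.values() for term in terms}


def select_pipeline(corpus: dict, registry: Dict[FrozenSet[str], Pipeline]) -> Pipeline:
    # Assumption: the pipeline is chosen by the modalities present in the corpus.
    modalities = frozenset(k for k in ("text", "audio", "video") if corpus.get(k))
    return registry[modalities]


def run_pipeline(corpus: dict, registry: Dict[FrozenSet[str], Pipeline]):
    pipeline = select_pipeline(corpus, registry)
    first_algorithm = pipeline.first.algorithms["extract"]
    second_algorithm = pipeline.second.algorithms["link"]
    first_space = first_algorithm(corpus)
    ontology = standardize_ontology(first_space)
    # The second algorithm sees both the corpus and the first ontology.
    second_space = second_algorithm(corpus, ontology)
    return first_space, ontology, second_space


extraction = HLTComponent("relation-extraction", {"extract": extract_relations})
linking = HLTComponent("topic-linking", {"link": link_topics})
registry = {frozenset({"text", "audio", "video"}): Pipeline(extraction, linking)}
corpus = {"id": "doc1", "text": "Alice met Bob. Bob left.",
          "audio": b"...", "video": b"..."}
first_space, ontology, second_space = run_pipeline(corpus, registry)
```

The design point worth noting is the data flow: the second algorithm receives the standardized ontology of the first cluster space in addition to the corpus, which mirrors the dependency stated in the second applying step.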
Abstract
Natural language processing is provided. A computer processor selects a pipeline based on an artifact that includes unstructured data, the pipeline identifying a first algorithm of a first set of algorithms of a first human language technology (HLT) component and a second algorithm of a second set of algorithms of a second HLT component; applies the first algorithm based on the artifact to generate a first cluster space associated with the artifact; amends an evidence chain associated with the artifact in response to applying the first algorithm, wherein the evidence chain includes one or more probabilistic findings of truth corresponding to the artifact; standardizes a first ontology of the first cluster space; applies the second algorithm based on the artifact to generate a second cluster space that is associated with the artifact; and identifies a set of information of one or more corpora that is relevant to the artifact.
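The amending of the evidence chain described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a finding counts as true when its probability meets a threshold of 0.5 and that a newer finding supersedes an older one with the same subject and relation; the names Finding, EvidenceChain, amend, and superseded are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Relationship = Tuple[str, str, str]   # (subject, relation, object)


@dataclass
class Finding:
    relationship: Relationship
    probability: float   # probability that the relationship is true


@dataclass
class EvidenceChain:
    # Most recent true finding per (subject, relation).
    current: Dict[Tuple[str, str], Finding] = field(default_factory=dict)
    # Earlier findings that a newer finding has replaced.
    superseded: List[Finding] = field(default_factory=list)

    def amend(self, cluster_space: Dict[Relationship, float],
              threshold: float = 0.5) -> None:
        for relationship, probability in cluster_space.items():
            if probability < threshold:
                continue  # not found to be true; the chain is left unchanged
            key = relationship[:2]
            previous = self.current.get(key)
            if previous is not None:
                self.superseded.append(previous)  # keep the replaced finding
            self.current[key] = Finding(relationship, probability)


chain = EvidenceChain()
chain.amend({("acme", "ceo", "bob"): 0.8})
chain.amend({("acme", "ceo", "alice"): 0.9, ("acme", "hq", "paris"): 0.2})
```

After the second call, the finding that alice is the ceo of acme supersedes the earlier finding for bob, and the low-probability hq relationship is never admitted to the chain.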
20 Claims
8. A computer program product for natural language processing, the computer program product comprising a computer readable storage medium having one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising:
program instructions to select a dynamic pipeline based, at least in part, on a corpus, wherein the dynamic pipeline links a first human language technology component and a second human language technology component, wherein the first human language technology component comprises a first set of algorithms and the second human language technology component comprises a second set of algorithms and wherein the corpus includes at least text, audio, and video;
program instructions to identify a first algorithm of the first set of algorithms associated with the first human language technology component and a second algorithm of the second set of algorithms associated with the second human language technology component;
program instructions to apply the first algorithm based, at least in part, on the corpus to generate a first cluster space that reflects a dynamic determination of relationships within the corpus, wherein the first cluster space includes probabilities that each respective relationship within the corpus is true or untrue;
program instructions to standardize a first ontology of the first cluster space, wherein the first ontology is a data structure on a computer;
program instructions to amend an evidence chain that includes one or more findings of true relationships associated with the corpus in response to applying the first algorithm, to reflect a most recent finding of a true relationship of the true relationships that supersedes a previous finding in light of a probabilistic determination from new determined relationships in the first cluster space;
program instructions to standardize a first ontology of the first cluster space;
program instructions to apply the second algorithm based, at least in part, on the corpus and the first ontology of the first cluster space to generate a second cluster space that is associated with the corpus;
program instructions to identify a set of information of one or more corpora that has a relevance to the corpus that exceeds a pre-determined threshold based, at least in part, on the first and second cluster spaces of the corpus; and
program instructions to generate a summary report based, at least in part, on the set of information of the one or more corpora.
14. A computer system for natural language processing, the computer system comprising:
a memory; and
a processor in communication with the memory, wherein the computer system is configured to perform a method, said method comprising:
selecting, by a computer processor, a dynamic pipeline based, at least in part, on a corpus, wherein the dynamic pipeline links a first human language technology component and a second human language technology component, wherein the first human language technology component comprises a first set of algorithms and the second human language technology component comprises a second set of algorithms and wherein the corpus includes at least text, audio, and video;
identifying a first algorithm of the first set of algorithms associated with the first human language technology component and a second algorithm of the second set of algorithms associated with the second human language technology component;
applying, by the computer processor, the first algorithm based, at least in part, on the corpus to generate a first cluster space that reflects a dynamic determination of relationships within the corpus, wherein the first cluster space includes probabilities that each respective relationship within the corpus is true or untrue;
amending, by the computer processor, an evidence chain that includes one or more findings of true relationships associated with the corpus in response to applying the first algorithm, to reflect a most recent finding of a true relationship of the true relationships that supersedes a previous finding in light of a probabilistic determination from new determined relationships in the first cluster space;
standardizing, by the computer processor, a first ontology of the first cluster space, wherein the first ontology is a data structure on a computer;
applying, by the computer processor, the second algorithm based, at least in part, on the corpus and the first ontology of the first cluster space to generate a second cluster space that is associated with the corpus;
identifying, by the computer processor, a set of information of one or more corpora that has a relevance to the corpus that exceeds a pre-determined threshold based, at least in part, on the first and second cluster spaces of the corpus; and
generating, by the computer processor, a summary report based, at least in part, on the set of information of the one or more corpora.
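The final identifying and generating steps shared by the independent claims can be sketched as a threshold filter followed by a report. The claims do not say how relevance is measured, so this sketch assumes Jaccard overlap between the terms of the two cluster spaces and the terms of each candidate corpus; the function names, the sample corpora, and the threshold of 0.25 are hypothetical.

```python
from typing import Dict, Set, Tuple

Relationship = Tuple[str, str, str]        # (subject, relation, object)
ClusterSpace = Dict[Relationship, float]   # relationship -> probability it is true


def terms_of(*spaces: ClusterSpace) -> Set[str]:
    # Collect every subject and object that appears in the cluster spaces.
    terms: Set[str] = set()
    for space in spaces:
        for (subject, _, obj) in space:
            terms.update((subject, obj))
    return terms


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Assumed relevance measure: size of intersection over size of union.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relevant_information(first_space: ClusterSpace, second_space: ClusterSpace,
                         corpora: Dict[str, Set[str]],
                         threshold: float) -> Dict[str, float]:
    query = terms_of(first_space, second_space)
    scores = {name: jaccard(query, terms) for name, terms in corpora.items()}
    # Keep only corpora whose relevance exceeds the pre-determined threshold.
    return {name: score for name, score in scores.items() if score > threshold}


def summary_report(relevant: Dict[str, float]) -> str:
    # One line per relevant corpus, most relevant first.
    ranked = sorted(relevant.items(), key=lambda item: -item[1])
    return "\n".join(f"{name}: relevance {score:.2f}" for name, score in ranked)


first_space = {("acme", "ceo", "alice"): 0.9}
second_space = {("acme", "hq", "paris"): 0.7}
corpora = {
    "news": {"acme", "alice", "merger"},
    "sports": {"football", "paris", "league", "cup"},
}
relevant = relevant_information(first_space, second_space, corpora, threshold=0.25)
report = summary_report(relevant)
```

With these sample inputs only the "news" corpus exceeds the threshold, so the report contains a single line for it. Any other similarity measure could be substituted for jaccard without changing the filtering or reporting steps.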
Specification