Natural language processing with dynamic pipelines

US 10,380,253 B2
Filed: 03/04/2014
Issued: 08/13/2019
Est. Priority Date: 03/04/2014
Status: Active Grant

Abstract

Natural language processing is provided. A computer processor selects a pipeline based on an artifact that includes unstructured data, the pipeline identifying a first algorithm of a first set of algorithms of a first human language technology (HLT) component and a second algorithm of a second set of algorithms of a second HLT component; applies the first algorithm based on the artifact to generate a first cluster space associated with the artifact; amends an evidence chain associated with the artifact in response to applying the first algorithm, wherein the evidence chain includes one or more probabilistic findings of truth corresponding to the artifact; standardizes a first ontology of the first cluster space; applies the second algorithm based on the artifact to generate a second cluster space that is associated with the artifact; and identifies a set of information of one or more corpora that is relevant to the artifact.
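
As a rough illustration of the flow the abstract describes, the following Python sketch selects a pipeline for an artifact and applies two linked HLT components in order. Everything in it is an assumption made for illustration: the names `entity_tagger`, `pair_linker`, `COMPONENTS`, `PIPELINES`, `select_pipeline`, and `run_pipeline` are hypothetical, a cluster space is modeled as a plain mapping from a label to a probability, and the toy algorithms stand in for whatever algorithms an actual implementation would use.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a "cluster space" is modeled as a mapping from a relationship
# or entity label to the probability that it is true.
ClusterSpace = Dict[str, float]
Algorithm = Callable[[str, ClusterSpace], ClusterSpace]


def entity_tagger(artifact: str, prior: ClusterSpace) -> ClusterSpace:
    """Toy first algorithm: treat capitalized words as candidate entities."""
    words = [w.strip(".,") for w in artifact.split()]
    return {w: 0.9 for w in words if w and w[0].isupper()}


def pair_linker(artifact: str, prior: ClusterSpace) -> ClusterSpace:
    """Toy second algorithm: relate every pair of entities in the prior space."""
    names = sorted(prior)
    return {
        f"{a}|{b}": min(prior[a], prior[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Each HLT component owns a set of named algorithms (hypothetical registry).
COMPONENTS: Dict[str, Dict[str, Algorithm]] = {
    "entity_extraction": {"tagger": entity_tagger},
    "relation_extraction": {"linker": pair_linker},
}

# Each pipeline is an ordered list of (component, algorithm) pairs.
PIPELINES: Dict[str, List[Tuple[str, str]]] = {
    "text": [("entity_extraction", "tagger"), ("relation_extraction", "linker")],
    "entities_only": [("entity_extraction", "tagger")],
}


def select_pipeline(artifact: str) -> List[Tuple[str, str]]:
    """Toy selection rule: multi-word artifacts get the two-stage pipeline."""
    key = "text" if len(artifact.split()) > 1 else "entities_only"
    return PIPELINES[key]


def run_pipeline(artifact: str) -> List[ClusterSpace]:
    """Apply each algorithm in order, feeding it the previous cluster space."""
    spaces: List[ClusterSpace] = []
    prior: ClusterSpace = {}
    for component, algorithm in select_pipeline(artifact):
        prior = COMPONENTS[component][algorithm](artifact, prior)
        spaces.append(prior)
    return spaces
```

Calling `run_pipeline("Alice met Bob in Paris.")` yields two cluster spaces, one per stage; the second stage only sees relationships between entities that the first stage produced, which is the sense in which the pipeline links the two components in this sketch.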

20 Claims

1. A method for natural language processing, the method comprising:
- selecting, by a computer processor, a dynamic pipeline based, at least in part, on a corpus, wherein the dynamic pipeline links a first human language technology component and a second human language technology component, wherein the first human language technology component comprises a first set of algorithms and the second human language technology component comprises a second set of algorithms and wherein the corpus includes at least text, audio, and video;
  
  identifying, by a computer processor, a first algorithm of the first set of algorithms associated with the first human language technology component and a second algorithm of the second set of algorithms associated with the second human language technology component;
  
  applying, by the computer processor, the first algorithm based, at least in part, on the corpus to generate a first cluster space that reflects a dynamic determination of relationships within the corpus, wherein the first cluster space includes probabilities that each respective relationship within the corpus is true or untrue;
  
  amending, by the computer processor, an evidence chain that includes one or more findings of true relationships associated with the corpus in response to applying the first algorithm, to reflect a most recent finding of a true relationship of the true relationships that supersedes a previous finding in light of a probabilistic determination from new determined relationships in the first cluster space;
  
  standardizing, by the computer processor, a first ontology of the first cluster space, wherein the first ontology is a data structure on a computer;
  
  applying, by the computer processor, the second algorithm based, at least in part, on the corpus and the first ontology of the first cluster space to generate a second cluster space that is associated with the corpus;
  
  identifying, by the computer processor, a set of information of one or more corpora that has a relevance to the corpus that exceeds a pre-determined threshold based, at least in part, on the first and second cluster spaces of the corpus; and
  
  generating, by the computer processor, a summary report based, at least in part, on the set of information of the one or more corpora.
  - 2. The method of claim 1, wherein standardizing the first ontology of the first cluster space comprises:
    - determining, by the computer processor, the first ontology of the first cluster space of the first algorithm; and
      
      standardizing, by the computer processor, the first ontology to a resource description framework representation.
  - 3. The method of claim 1, wherein standardizing the first ontology of the first cluster space comprises:
    - determining, by the computer processor, the first ontology of the first cluster space of the first algorithm;
      
      determining, by the computer processor, a second ontology of the second cluster space of the second algorithm; and
      
      standardizing, by the computer processor, the first ontology to the second ontology.
  - 4. The method of claim 1, further comprising:
    - modifying, by the computer processor, the first set of algorithms of the first human language technology component based, at least in part, on a user specification that specifies a third algorithm, by adding the third algorithm to the first set of algorithms.
  - 5. The method of claim 1, wherein identifying the set of information further includes:
    - standardizing, by the computer processor, an ontology of each of the one or more corpora;
      
      comparing, by the computer processor, the cluster space of the corpus to each of the one or more corpora;
      
      determining, by the computer processor, a relevance to the corpus of a first information item of the set of information; and
      
      determining, by the computer processor, an inference representing a relationship corresponding to the corpus, wherein the inference is based, at least in part, on the first information item of the set of information.
  - 6. The method of claim 1, wherein the pipeline includes an order of algorithms that identifies an order in which to apply a plurality of algorithms.
  - 7. The method of claim 1, wherein the summary report is based, at least in part, on the evidence chain associated with the corpus and the cluster space of the corpus.
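
The amending step of claim 1 can be read as "the newest probabilistic finding about a relationship replaces the older one". The sketch below shows one possible interpretation under stated assumptions: the `Finding` type, the `amend_evidence_chain` function, and the 0.5 cutoff for treating a relationship as true are all hypothetical and are not taken from the claims.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Finding:
    relationship: str
    is_true: bool
    probability: float  # probability that the relationship is true


def amend_evidence_chain(
    chain: List[Finding],
    cluster_space: Dict[str, float],
    threshold: float = 0.5,  # assumed cutoff for calling a relationship true
) -> List[Finding]:
    """Return a new chain in which the newest finding for each relationship wins."""
    latest = {f.relationship: f for f in chain}
    for relationship, probability in cluster_space.items():
        # A new determination supersedes any previous finding for the
        # same relationship, whether it confirms or contradicts it.
        latest[relationship] = Finding(
            relationship, probability >= threshold, probability
        )
    # The chain keeps only findings of true relationships.
    return [f for f in latest.values() if f.is_true]
```

For example, if the chain holds a true finding for "Alice works_at Acme" and a new cluster space assigns that relationship a probability of 0.2 while assigning "Alice works_at Initech" a probability of 0.8, the amended chain contains only the Initech finding.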

8. A computer program product for natural language processing, the computer program product comprising a computer readable storage medium having one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising:
- program instructions to select a dynamic pipeline based, at least in part, on a corpus, wherein the dynamic pipeline links a first human language technology component and a second human language technology component, wherein the first human language technology component comprises a first set of algorithms and the second human language technology component comprises a second set of algorithms and wherein the corpus includes at least text, audio, and video;
  
  program instructions to identify a first algorithm of the first set of algorithms associated with the first human language technology component and a second algorithm of the second set of algorithms associated with the second human language technology component;
  
  program instructions to apply the first algorithm based, at least in part, on the corpus to generate a first cluster space that reflects a dynamic determination of relationships within the corpus, wherein the first cluster space includes probabilities that each respective relationship within the corpus is true or untrue;
  
  program instructions to standardize a first ontology of the first cluster space, wherein the first ontology is a data structure on a computer;
  
  program instructions to amend an evidence chain that includes one or more findings of true relationships associated with the corpus in response to applying the first algorithm, to reflect a most recent finding of a true relationship of the true relationships that supersedes a previous finding in light of a probabilistic determination from new determined relationships in the first cluster space;
  
  program instructions to standardize a first ontology of the first cluster space;
  
  program instructions to apply the second algorithm based, at least in part, on the corpus and the first ontology of the first cluster space to generate a second cluster space that is associated with the corpus;
  
  program instructions to identify a set of information of one or more corpora that has a relevance to the corpus that exceeds a pre-determined threshold based, at least in part, on the first and second cluster spaces of the corpus; and
  
  program instructions to generate a summary report based, at least in part, on the set of information of the one or more corpora.
  - 9. The computer program product of claim 8, wherein the program instructions to standardize the first ontology of the first cluster space comprise:
    - program instructions to determine the first ontology of the first cluster space of the first algorithm; and
      
      program instructions to standardize the first ontology to a resource description framework representation.
  - 10. The computer program product of claim 8, wherein the program instructions to standardize the first ontology of the first cluster space comprise:
    - program instructions to determine the first ontology of the first cluster space of the first algorithm;
      
      program instructions to determine a second ontology of the second cluster space of the second algorithm; and
      
      program instructions to standardize the first ontology to the second ontology.
  - 11. The computer program product of claim 8, wherein the program instructions stored on the one or more computer readable storage media further comprise:
    - program instructions to modify the first set of algorithms of the first human language technology component based, at least in part, on a user specification that specifies a third algorithm, by adding the third algorithm to the first set of algorithms.
  - 12. The computer program product of claim 8, wherein the program instructions to identify the set of information of the one or more corpora further includes:
    - program instructions to standardize an ontology of each of the one or more corpora;
      
      program instructions to compare the cluster space of the corpus to each of the one or more corpora;
      
      program instructions to determine a relevance to the corpus of a first information item of the set of information; and
      
      program instructions to determine an inference representing a relationship corresponding to the corpus, wherein the inference is based, at least in part, on the first information item of the set of information.
  - 13. The computer program product of claim 8, wherein the pipeline includes an order of algorithms that identifies an order in which to apply a plurality of algorithms.
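
Claims 2, 9, and 15 recite standardizing the first ontology to a resource description framework (RDF) representation. A minimal sketch of that idea, assuming the ontology can be flattened into (subject, predicate, object) tuples, serializes them as N-Triples lines. The `BASE` namespace and the `to_ntriples` helper are hypothetical, and a real system would more likely rely on an RDF library than on hand-built strings.

```python
from typing import Iterable, Tuple
from urllib.parse import quote

# Hypothetical namespace used to mint IRIs for ontology terms.
BASE = "http://example.org/hlt/"


def to_ntriples(relations: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for subject, predicate, obj in relations:
        # Percent-encode each term so that spaces and other reserved
        # characters cannot break the IRI syntax.
        s, p, o = (f"<{BASE}{quote(term)}>" for term in (subject, predicate, obj))
        lines.append(f"{s} {p} {o} .")
    return "\n".join(lines)
```

With this helper, `to_ntriples([("Alice", "worksAt", "Acme Corp")])` produces a single triple whose object IRI ends in `Acme%20Corp`, so two algorithms that label the same relationship differently can be compared once both outputs are mapped into the same triple vocabulary.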

14. A computer system for natural language processing, the computer system comprising:
- a memory; and
  
  a processor in communication with the memory, wherein the computer system is configured to perform a method, said method comprising:
  
  selecting, by a computer processor, a dynamic pipeline based, at least in part, on a corpus, wherein the dynamic pipeline links a first human language technology component and a second human language technology component, wherein the first human language technology component comprises a first set of algorithms and the second human language technology component comprises a second set of algorithms and wherein the corpus includes at least text, audio, and video;
  
  identifying a first algorithm of the first set of algorithms associated with the first human language technology component and a second algorithm of the second set of algorithms associated with the second human language technology component;
  
  applying, by the computer processor, the first algorithm based, at least in part, on the corpus to generate a first cluster space that reflects a dynamic determination of relationships within the corpus, wherein the first cluster space includes probabilities that each respective relationship within the corpus is true or untrue;
  
  amending, by the computer processor, an evidence chain that includes one or more findings of true relationships associated with the corpus in response to applying the first algorithm, to reflect a most recent finding of a true relationship of the true relationships that supersedes a previous finding in light of a probabilistic determination from new determined relationships in the first cluster space;
  
  standardizing, by the computer processor, a first ontology of the first cluster space, wherein the first ontology is a data structure on a computer;
  
  applying, by the computer processor, the second algorithm based, at least in part, on the corpus and the first ontology of the first cluster space to generate a second cluster space that is associated with the corpus;
  
  identifying, by the computer processor, a set of information of one or more corpora that has a relevance to the corpus that exceeds a pre-determined threshold based, at least in part, on the first and second cluster spaces of the corpus; and
  
  generating, by the computer processor, a summary report based, at least in part, on the set of information of the one or more corpora.
  - 15. The computer system of claim 14, wherein standardizing the first ontology of the first cluster space comprises:
    - determining, by the computer processor, the first ontology of the first cluster space of the first algorithm; and
      
      standardizing, by the computer processor, the first ontology to a resource description framework representation.
  - 16. The computer system of claim 14, wherein standardizing the first ontology of the first cluster space comprises:
    - determining, by the computer processor, the first ontology of the first cluster space of the first algorithm;
      
      determining, by the computer processor, a second ontology of the second cluster space of the second algorithm; and
      
      standardizing, by the computer processor, the first ontology to the second ontology.
  - 17. The computer system of claim 14, wherein the method further comprises:
    - modifying, by the computer processor, the first set of algorithms of the first human language technology component based, at least in part, on a user specification that specifies a third algorithm, by adding the third algorithm to the first set of algorithms.
  - 18. The computer system of claim 14, wherein identifying the set of information of the one or more corpora further includes:
    - standardizing, by the computer processor, an ontology of each of the one or more corpora;
      
      comparing, by the computer processor, the cluster space of the corpus to each of the one or more corpora;
      
      determining, by the computer processor, a relevance to the corpus of a first information item of the set of information; and
      
      determining, by the computer processor, an inference representing a relationship corresponding to the corpus, wherein the inference is based, at least in part, on the first information item of the set of information.
  - 19. The computer system of claim 14, wherein the pipeline includes an order of algorithms that identifies an order in which to apply a plurality of algorithms.
  - 20. The computer system of claim 14, wherein the summary report is based, at least in part, on the evidence chain associated with the corpus and the cluster space of the corpus.
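
The identifying and report-generating steps (elaborated in claims 5, 12, and 18) compare the cluster space of the corpus against other corpora and keep what exceeds a pre-determined threshold. The claims do not name a relevance measure; the sketch below substitutes cosine similarity over probability vectors purely as an assumption, and the function names `cosine`, `relevant_corpora`, and `summary_report` are hypothetical.

```python
import math
from typing import Dict

ClusterSpace = Dict[str, float]


def cosine(a: ClusterSpace, b: ClusterSpace) -> float:
    """Cosine similarity between two cluster spaces keyed by relationship."""
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relevant_corpora(
    target: ClusterSpace, others: Dict[str, ClusterSpace], threshold: float
) -> Dict[str, float]:
    """Keep only corpora whose relevance to the target exceeds the threshold."""
    scores = {name: cosine(target, space) for name, space in others.items()}
    return {name: score for name, score in scores.items() if score > threshold}


def summary_report(relevant: Dict[str, float]) -> str:
    """Render one line per relevant corpus, most relevant first."""
    ranked = sorted(relevant.items(), key=lambda item: -item[1])
    return "\n".join(f"{name}: relevance {score:.2f}" for name, score in ranked)
```

In this sketch the `target` argument would be the merged first and second cluster spaces of the corpus, for example `{**first_space, **second_space}`, so that both stages contribute to the relevance score.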

Current Assignee: Kyndryl Incorporated
Original Assignee: International Business Machines Corporation
Inventors: Ahmed, Mohamed N.; Baughman, Aaron K.
Primary Examiner(s): Leland, III, Edwin S
Application Number: US14/196,002
Publication Number: US 20150254232A1
Time in Patent Office: 1,988 Days
Field of Search: 704/9
CPC Class Codes:
G06F 16/367 Ontology
G06F 40/30 Semantic analysis
