Extracting veiled meaning in natural language content

US 10,176,166 B2
Filed: 09/01/2017
Issued: 01/08/2019
Est. Priority Date: 07/09/2015
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

Mechanisms for identifying hidden meaning in a portion of natural language content are provided. A primary portion of natural language content is received and a secondary portion of natural language content is identified that references the natural language content. The secondary portion of natural language content is analyzed to identify indications of meaning directed to elements of the primary portion of natural language content. A probabilistic model is generated based on the secondary portion of natural language content modeling a probability of hidden meaning in the primary portion of natural language content. A hidden meaning statement data structure is generated for the primary portion of natural language content based on the probabilistic model.
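The first two steps of the abstract (receive a primary portion, then identify secondary portions that reference it) can be illustrated with a minimal Python sketch. The patent does not say how a reference is detected. In the spirit of claims 14 and 15, this sketch assumes a secondary document references the primary if it contains a direct quote (a verbatim run of words) or shares enough extracted features, where "features" are naively taken to be non-stopword tokens. The function names, stopword list, quote length, overlap threshold, and sample texts are all hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "for", "is", "are", "as", "be"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_features(text: str) -> set[str]:
    """Naive feature extraction (an assumption): the set of non-stopword tokens."""
    return {tok for tok in tokenize(text) if tok not in STOPWORDS}


def contains_direct_quote(primary: str, secondary: str, min_words: int = 5) -> bool:
    """True if any run of `min_words` consecutive primary tokens appears verbatim in the secondary."""
    p_tokens = tokenize(primary)
    s_padded = f" {' '.join(tokenize(secondary))} "  # pad so matches respect word boundaries
    for i in range(len(p_tokens) - min_words + 1):
        phrase = " ".join(p_tokens[i:i + min_words])
        if f" {phrase} " in s_padded:
            return True
    return False


def find_secondary_portions(primary: str, corpus: dict[str, str], min_shared: int = 3) -> list[str]:
    """Ids of corpus documents that quote the primary or share at least `min_shared` features."""
    features = extract_features(primary)
    hits = []
    for doc_id, text in corpus.items():
        shared = features & extract_features(text)
        if contains_direct_quote(primary, text) or len(shared) >= min_shared:
            hits.append(doc_id)
    return hits


primary = "The committee expects rates to remain low for an extended period."
corpus = {
    "analyst_note": "Analysts read 'remain low for an extended period' as a signal of no hike before next year.",
    "weather": "Sunny skies expected across the region this weekend.",
    "blog": "Committee language on rates and the extended period suggests patience.",
}
print(find_secondary_portions(primary, corpus))  # → ['analyst_note', 'blog']
```

Here `analyst_note` matches by direct quote (and by shared features), `blog` by shared features only, and `weather` by neither. A real system would replace the token overlap with proper entity or keyword extraction and an indexed corpus search.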

46 Citations

17 Claims

1. A method, in a data processing system comprising a processor and a memory, for identifying hidden meaning in a portion of natural language content, wherein the memory comprises instructions executed by the processor to cause the processor to be specifically configured to implement a hidden meaning translation engine, the method comprising:
- receiving, by the hidden meaning translation engine of the data processing system, a primary portion of natural language content from one or more corpora of electronic documentation;
  
  identifying, by the hidden meaning translation engine of the data processing system, a secondary portion of natural language content, in the one or more corpora of electronic documentation, that references the primary portion of natural language content;
  
  analyzing, by the hidden meaning translation engine of the data processing system, the secondary portion of natural language content to identify indications of meaning directed to elements of the primary portion of natural language content, wherein analyzing the secondary portion of natural language content further comprises correlating a first temporal characteristic of the secondary portion of natural language content with a second temporal characteristic of the primary portion of natural language content;
  
  generating and training, by the hidden meaning translation engine of the data processing system, a probabilistic model based on results of the analysis of the secondary portion of natural language content modeling a probability of hidden meaning in the primary portion of natural language content at least by weighting the secondary portion of natural language content based on whether the first temporal characteristic is at a prior time to the second temporal characteristic or at a later time than the second temporal characteristic;
  
  generating, by the hidden meaning translation engine of the data processing system, a hidden meaning statement data structure for the primary portion of natural language content based on the indications of meaning identified by the analysis of the secondary portion; and
  
  performing, by a cognitive system, a cognitive operation at least by performing natural language processing on a combination of the primary portion of natural language content and the hidden meaning statement data structure.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
  - 2. The method of claim 1, wherein generating the hidden meaning statement data structure for the primary portion of natural language content comprises generating a metadata data structure that is linked to the primary natural language content in a corpus of information, wherein the hidden meaning statement data structure explicitly specifies the hidden meaning as an explicit natural language statement in relation to the primary portion of natural language content.
  - 3. The method of claim 1, wherein the cognitive operation is a question answering operation performed by a question and answer pipeline implemented in the data processing system.
  - 4. The method of claim 1, wherein generating the probabilistic model comprises generating one or more pairing data structures that pairs a secondary portion of natural language content with a corresponding primary portion of natural language content, wherein the secondary portion specifies a potential hidden meaning in the corresponding primary portion.
  - 5. The method of claim 4, wherein generating the probabilistic model further comprises:
    - determining which pairing data structures to maintain in the probabilistic model based on an evaluation of an amount of corroborating evidence for each pairing data structure in the one or more pairing data structures.
  - 6. The method of claim 5, wherein generating the probabilistic model further comprises:
    - removing a pairing data structure from the probabilistic model in response to the pairing data structure failing to have a predetermined amount of corroborating evidence to support the pairing data structure.
  - 7. The method of claim 4, wherein generating the probabilistic model comprises associating with a pairing data structure, in the one or more pairing data structures, contextualizing features that specify domain conditions contemporaneous to the primary portion of natural language content.
  - 8. The method of claim 7, wherein the contextualizing features are key performance indicators (KPIs) and corresponding KPI values for a domain of the primary portion of natural language content.
  - 9. The method of claim 7, wherein the contextualizing features comprise annotations indicative of sentiment in at least one of the primary portion of natural language content or the secondary portion of natural language content.
  - 10. The method of claim 4, wherein generating the hidden meaning statement data structure comprises determining a consensus result based on analysis of the one or more pairing data structures and generating the hidden meaning statement data structure based on the consensus result.
  - 11. The method of claim 4, wherein generating the probabilistic model further comprises:
    - calculating a weight value for each pairing data structure in the one or more pairing data structures, wherein the calculation of the weight value is at least partially based on whether the secondary portion of natural language content is determined to be predictive or reactionary with regard to the primary portion of natural language content;
      
      associating calculated weight values with corresponding pairing data structures in the one or more pairing data structures; and
      
      for each pairing data structure in the one or more pairing data structures, determining whether to maintain the pairing data structure in, or discarding the pairing data structure from, the probabilistic model based on a weight value associated with the pairing data structure.
  - 12. The method of claim 11, wherein the calculating of the weight value is further based on whether or not the secondary portion of natural language content identifies information that is missing in the primary portion of natural language content.
  - 13. The method of claim 1, wherein generating the hidden meaning statement data structure comprises performing a machine learning operation utilizing a parallel text analysis algorithm to identify a most probable hidden meaning of the primary portion of natural language based on the probabilistic model.
  - 14. The method of claim 1, wherein identifying the secondary portion of natural language content that references the natural language content comprises:
    - extracting one or more features from the primary portion of natural language content; and
      
      performing a search of a corpus of information for one or more secondary portions of natural language content that reference at least one of the one or more features from the primary portion of natural language content or reference the primary portion of natural language content as a whole.
  - 15. The method of claim 14, wherein identifying the secondary portion of natural language content that references the natural language content further comprises analyzing the one or more secondary portions of natural language content to identify at least one of direct quotes from the primary portion of natural language content present in the one or more secondary portions of natural language content or explicit links to the primary portion of natural language content.
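Claims 4 through 6 and 10 through 12 describe pairing data structures that map a secondary portion to a candidate hidden meaning, weight each pairing by whether the secondary portion is predictive (written before the primary) or reactionary (written after it) and by whether it supplies information missing from the primary, drop pairings that lack corroboration, and then reduce the survivors to a consensus. A minimal Python sketch of that flow follows. The claims give no weight values, thresholds, or consensus rule, so everything here is an assumption: reactionary weight 1.0, predictive weight 0.5, a 0.5 bonus for missing information, a minimum of two pairings per candidate meaning, and a weighted plurality vote. The `Pairing` class and all function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Pairing:
    """Hypothetical pairing data structure: one secondary portion paired with a candidate hidden meaning."""
    secondary_id: str
    secondary_date: date
    candidate_meaning: str     # interpretation the secondary text attributes to the primary
    adds_missing_info: bool    # secondary supplies a detail absent from the primary
    weight: float = 0.0


# Assumed constants; the claims do not specify any values.
REACTIONARY_WEIGHT = 1.0   # secondary written after the primary
PREDICTIVE_WEIGHT = 0.5    # secondary written before the primary
MISSING_INFO_BONUS = 0.5
MIN_CORROBORATION = 2      # pairings needed to keep a candidate meaning


def score(pairings: list[Pairing], primary_date: date) -> None:
    """Set each weight from the temporal relation to the primary plus the missing-info bonus."""
    for p in pairings:
        base = REACTIONARY_WEIGHT if p.secondary_date > primary_date else PREDICTIVE_WEIGHT
        p.weight = base + (MISSING_INFO_BONUS if p.adds_missing_info else 0.0)


def prune(pairings: list[Pairing]) -> list[Pairing]:
    """Discard pairings whose candidate meaning has too little corroborating evidence."""
    counts: dict[str, int] = defaultdict(int)
    for p in pairings:
        counts[p.candidate_meaning] += 1
    return [p for p in pairings if counts[p.candidate_meaning] >= MIN_CORROBORATION]


def consensus(pairings: list[Pairing]) -> tuple[str, float] | None:
    """Weighted plurality vote: (meaning, share of total weight), or None if nothing survives."""
    totals: dict[str, float] = defaultdict(float)
    for p in pairings:
        totals[p.candidate_meaning] += p.weight
    if not totals:
        return None
    best = max(totals, key=totals.get)
    return best, totals[best] / sum(totals.values())


primary_day = date(2016, 3, 16)
pairings = [
    Pairing("n1", date(2016, 3, 17), "no rate hike this year", True),
    Pairing("n2", date(2016, 3, 18), "no rate hike this year", False),
    Pairing("n3", date(2016, 3, 10), "rate hike soon", False),
    Pairing("n4", date(2016, 3, 20), "rate hike soon", False),
    Pairing("n5", date(2016, 3, 20), "policy unchanged", False),
]
score(pairings, primary_day)
kept = prune(pairings)
result = consensus(kept)
print(result)  # → ('no rate hike this year', 0.625)
```

In this example `n5` is pruned as uncorroborated, and the predictive pairing `n3` counts for only half as much as the reactionary ones. The resulting share (2.5 of 4.0 total weight) is one crude stand-in for the "probability of hidden meaning" that the claims attribute to the probabilistic model.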

16. A computer program product comprising a non-transitory computer readable medium having a computer readable program stored therein, wherein the computer readable program, when executed in a data processing system, causes the data processing system to:
- receive a primary portion of natural language content from one or more corpora of electronic documentation;
  
  identify a secondary portion of natural language content, in the one or more corpora of electronic documentation, that references the primary portion of natural language content;
  
  analyze the secondary portion of natural language content to identify indications of meaning directed to elements of the primary portion of natural language content, wherein analyzing the secondary portion of natural language content further comprises correlating a first temporal characteristic of the secondary portion of natural language content with a second temporal characteristic of the primary portion of natural language content;
  
  generate and train a probabilistic model based on results of the analysis of the secondary portion of natural language content modeling a probability of hidden meaning in the primary portion of natural language content at least by weighting the secondary portion of natural language content based on whether the first temporal characteristic is at a prior time to the second temporal characteristic or at a later time than the second temporal characteristic;
  
  generate a hidden meaning statement data structure for the primary portion of natural language content based on the indications of meaning identified by the analysis of the secondary portion; and
  
  perform a cognitive operation at least by performing natural language processing on a combination of the primary portion of natural language content and the hidden meaning statement data structure.

17. An apparatus comprising:
- a processor; and
  
  a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to:
  
  receive a primary portion of natural language content from one or more corpora of electronic documentation;
  
  identify a secondary portion of natural language content, in the one or more corpora of electronic documentation, that references the primary portion of natural language content;
  
  analyze the secondary portion of natural language content to identify indications of meaning directed to elements of the primary portion of natural language content, wherein analyzing the secondary portion of natural language content further comprises correlating a first temporal characteristic of the secondary portion of natural language content with a second temporal characteristic of the primary portion of natural language content;
  
  generate and train a probabilistic model based on results of the analysis of the secondary portion of natural language content modeling a probability of hidden meaning in the primary portion of natural language content at least by weighting the secondary portion of natural language content based on whether the first temporal characteristic is at a prior time to the second temporal characteristic or at a later time than the second temporal characteristic;
  
  generate a hidden meaning statement data structure for the primary portion of natural language content based on the indications of meaning identified by the analysis of the secondary portion; and
  
  perform a cognitive operation at least by performing natural language processing on a combination of the primary portion of natural language content and the hidden meaning statement data structure.
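The final two steps of each independent claim produce a hidden meaning statement data structure and then perform natural language processing on the primary content combined with it. Claim 2 characterizes that structure as metadata linked to the primary content that states the hidden meaning as an explicit natural language statement. A minimal Python sketch follows; the `HiddenMeaningStatement` fields, the JSON serialization, and the way the two texts are concatenated before downstream processing are assumptions, not details taken from the claims.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class HiddenMeaningStatement:
    """Hypothetical metadata record linked to a primary portion by its id."""
    primary_id: str
    statement: str      # the hidden meaning, stated explicitly in natural language
    probability: float  # confidence taken from the probabilistic model (assumed field)

    def to_json(self) -> str:
        """Serialize so the record can be stored alongside the primary content in a corpus."""
        return json.dumps(asdict(self), sort_keys=True)


def combined_text(primary_text: str, hms: HiddenMeaningStatement) -> str:
    """Append the explicit statement to the primary text so a downstream NLP step sees both."""
    return f"{primary_text}\n[Hidden meaning, p={hms.probability:.2f}] {hms.statement}"


hms = HiddenMeaningStatement("primary-001", "The committee does not plan to raise rates this year.", 0.75)
doc = combined_text("The committee expects rates to remain low for an extended period.", hms)
print(doc)
print(hms.to_json())
```

Keeping the statement as a separate record, rather than rewriting the primary text, preserves the original wording while letting a question answering pipeline (claim 3) match questions against the explicit interpretation.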

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Byron, Donna K.; Johnson, Benjamin L.; Krishnamurthy, Lakshminarayanan; Kummamuru, Krishna; Winkler, Timothy P.
Primary Examiner(s): Serrou, Abdelali
Application Number: US15/694,037
Publication Number: US 20170364507A1
Time in Patent Office: 494 Days

CPC Class Codes:
G06F 16/3329   Natural language query form...
G06F 40/30   Semantic analysis
G06N 20/00   Machine learning
G06N 5/022   Knowledge engineering; Know...
G06N 7/01   Probabilistic graphical mod...
