System and method for supplementing a question answering system with mixed-language source documents
First Claim
1. A computer implemented method in a data processing system comprising a processor and a memory comprising instructions, which are executed by the processor to cause the processor to determine language independent candidate answers, the method comprising:
ingesting a corpus, wherein the corpus may comprise information in one or more languages;
using a cognitive system to convert the corpus into one or more acyclic graphs, wherein nodes in the one or more acyclic graphs represent facts and connectors in the one or more acyclic graphs represent connections between two or more facts;
generating an element data structure using the one or more acyclic graphs;
identifying clusters of elements in the element data structure based on characteristics of the elements;
storing the clusters in a cluster data structure;
in response to receiving a question, searching the cluster data structure to generate a listing of element clusters having elements involving the question;
generating a filtered listing of candidate element clusters, wherein the filtered listing comprises elements and their clusters from the listing of element clusters that are present in the question;
training the cognitive system with association rules, wherein the association rules specify compatibility of elements with knowledge domains;
applying the trained cognitive system to analyze the filtered listing of candidate element clusters to identify the candidate element clusters that are compatible with the knowledge domain of the question;
ranking the candidate element clusters; and
outputting the candidate element cluster corresponding to the highest ranked element cluster.
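The search, filter, domain-compatibility, and ranking steps recited above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `Element` fields, the sample elements, the lookup table that stands in for a cognitive system trained with association rules, the substring match used to decide whether an element is present in the question, and the ranking criterion (cluster size). Graph construction from the corpus is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    text: str     # surface form as found in the corpus, in any language
    concept: str  # hypothetical language-independent key used for clustering
    kind: str     # hypothetical characteristic checked by the association rules


# Hypothetical elements, as if extracted from a mixed-language corpus.
ELEMENTS = [
    Element("aspirin", "ASPIRIN", "drug"),
    Element("aspirina", "ASPIRIN", "drug"),
    Element("aspirine", "ASPIRIN", "drug"),
    Element("headache", "HEADACHE", "symptom"),
    Element("dolor de cabeza", "HEADACHE", "symptom"),
    Element("Paris", "PARIS", "city"),
]

# Stand-in for a trained system: element kinds compatible with each knowledge domain.
ASSOCIATION_RULES = {
    "medicine": {"drug", "symptom"},
    "geography": {"city"},
}


def build_clusters(elements):
    """Group elements by a shared characteristic (here, the concept key)."""
    clusters = defaultdict(list)
    for element in elements:
        clusters[element.concept].append(element)
    return dict(clusters)


def answer(question, domain, clusters, rules=ASSOCIATION_RULES):
    """Return the id of the highest ranked compatible cluster, or None."""
    q = question.lower()

    # Search and filter: per cluster, keep only elements present in the question.
    filtered = {}
    for cluster_id, members in clusters.items():
        present = [e for e in members if e.text.lower() in q]
        if present:
            filtered[cluster_id] = present

    # Keep clusters whose matched elements are compatible with the question's domain.
    allowed = rules.get(domain, set())
    compatible = [
        cluster_id
        for cluster_id, present in filtered.items()
        if all(e.kind in allowed for e in present)
    ]
    if not compatible:
        return None

    # Assumed ranking: clusters with more corpus elements rank higher.
    ranked = sorted(compatible, key=lambda cid: len(clusters[cid]), reverse=True)
    return ranked[0]


clusters = build_clusters(ELEMENTS)
top = answer("Does aspirin help with a headache in Paris?", "medicine", clusters)
```

Because clusters are keyed by a language-independent concept, the English, Spanish, and French surface forms in this sketch all land in the same cluster, which is what lets the returned candidate be independent of the language of any single source document.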
Abstract
Embodiments can provide a computer implemented method, in a data processing system comprising a processor and a memory comprising instructions which are executed by the processor to cause the processor to implement a mixed-language question answering supplement system, the method comprising receiving a question in a target language; determining the question cannot be answered using a target-language only corpus; applying natural language processing to parse the question into at least one focus; for each focus, determining if one or more target language verbs share direct syntactic dependency with the focus; for each of the one or more verbs sharing direct syntactic dependency, determining if one or more target language entities share direct syntactic dependency with the verb; determining one or more Abstract Universal Verbal Types associated with each verb; for each of the one or more Abstract Universal Verbal Types, determining whether a dependency between a source language entity and a source language verb is of the same type as the dependency between the target language verb and the target language entity; if the dependency is similar, returning the source language entity as a member of a set; populating the set of returned source language entities for each focus in the target language question; identifying one or more parallel passages wherein all core arguments are matched; for each parallel passage: identifying the presence or absence of oblique nominal arguments; and measuring the precision of the oblique nominal arguments in the parallel passages against those present in the target language question; and returning an answer to the target question in the target language based on a scoring of the parallel passages based on the accuracy of their respective oblique nominal arguments.
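The final scoring step of the abstract, which measures the precision of oblique nominal arguments in each parallel passage against those in the target language question, can be sketched as follows. This is an illustrative assumption, not the patented implementation: oblique arguments are modeled as hypothetical `(role, value)` pairs already normalized across languages, precision is taken as the fraction of a passage's oblique arguments that also occur in the question, and the handling of passages with no oblique arguments is a choice made here.

```python
def oblique_precision(passage_obliques, question_obliques):
    """Fraction of the passage's oblique arguments that also occur in the question."""
    if not passage_obliques:
        # Assumption: an absent set is a perfect match only if the question has none either.
        return 1.0 if not question_obliques else 0.0
    matched = passage_obliques & question_obliques
    return len(matched) / len(passage_obliques)


def best_passage(passages, question_obliques):
    """Return the parallel passage with the highest oblique-argument precision."""
    return max(
        passages,
        key=lambda p: oblique_precision(p["obliques"], question_obliques),
    )


# Hypothetical data: core arguments are assumed to have been matched already.
question_obliques = {("time", "1969"), ("place", "moon")}
passages = [
    {"id": "A", "obliques": {("time", "1969"), ("place", "moon")}},
    {"id": "B", "obliques": {("time", "1972"), ("place", "moon")}},
]

scores = {p["id"]: oblique_precision(p["obliques"], question_obliques) for p in passages}
winner = best_passage(passages, question_obliques)["id"]
```

In this sketch passage A matches both oblique arguments of the question while passage B matches only one of its two, so A would be preferred as the basis for the returned answer.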
18 Claims
1. A computer implemented method in a data processing system comprising a processor and a memory comprising instructions, which are executed by the processor to cause the processor to determine language independent candidate answers, the method comprising:
ingesting a corpus, wherein the corpus may comprise information in one or more languages;
using a cognitive system to convert the corpus into one or more acyclic graphs, wherein nodes in the one or more acyclic graphs represent facts and connectors in the one or more acyclic graphs represent connections between two or more facts;
generating an element data structure using the one or more acyclic graphs;
identifying clusters of elements in the element data structure based on characteristics of the elements;
storing the clusters in a cluster data structure;
in response to receiving a question, searching the cluster data structure to generate a listing of element clusters having elements involving the question;
generating a filtered listing of candidate element clusters, wherein the filtered listing comprises elements and their clusters from the listing of element clusters that are present in the question;
training the cognitive system with association rules, wherein the association rules specify compatibility of elements with knowledge domains;
applying the trained cognitive system to analyze the filtered listing of candidate element clusters to identify the candidate element clusters that are compatible with the knowledge domain of the question;
ranking the candidate element clusters; and
outputting the candidate element cluster corresponding to the highest ranked element cluster.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer program product for determining language independent candidate answers, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
ingest a corpus, wherein the corpus may comprise information in one or more languages;
use a cognitive system to convert the corpus into one or more acyclic graphs, wherein nodes in the one or more acyclic graphs represent facts and connectors in the one or more acyclic graphs represent connections between two or more facts;
generate an element data structure using the one or more acyclic graphs;
identify clusters of elements in the element data structure based on characteristics of the elements;
store the clusters in a cluster data structure;
in response to receiving a question, search the cluster data structure to generate a listing of element clusters having elements involving the question;
generate a filtered listing of candidate element clusters, wherein the filtered listing comprises elements and their clusters from the listing of element clusters that are present in the question;
train the cognitive system with association rules, wherein the association rules specify compatibility of elements with knowledge domains;
apply the trained cognitive system to analyze the filtered listing of candidate element clusters to identify the candidate element clusters that are compatible with the knowledge domain of the question;
rank the candidate element clusters; and
output the candidate element cluster corresponding to the highest ranked element cluster.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A system for determining language independent candidate answers, comprising:
a language independent candidate answer determination processor configured to:
ingest a corpus, wherein the corpus may comprise information in one or more languages;
use a cognitive system to convert the corpus into one or more acyclic graphs, wherein nodes in the one or more acyclic graphs represent facts and connectors in the one or more acyclic graphs represent connections between two or more facts;
generate an element data structure using the one or more acyclic graphs;
identify clusters of elements in the element data structure based on characteristics of the elements;
store the clusters in a cluster data structure;
in response to receiving a question, search the cluster data structure to generate a listing of element clusters having elements involving the question;
generate a filtered listing of candidate element clusters, wherein the filtered listing comprises elements and their clusters from the listing of element clusters that are present in the question;
train the cognitive system with association rules, wherein the association rules specify compatibility of elements with knowledge domains;
apply the trained cognitive system to analyze the filtered listing of candidate element clusters to identify the candidate element clusters that are compatible with the knowledge domain of the question;
rank the candidate element clusters; and
output the candidate element cluster corresponding to the highest ranked element cluster.
Dependent claims: 18
Specification