Dynamic Concept Based Query Expansion
First Claim
1. A method, in an information handling system comprising a processor and a memory to expand queries processed by a question/answer (QA) system, the method comprising:
extracting, by at least one of the processors, a plurality of concepts from a plurality of documents, wherein the extracting includes utilizing natural language processing (NLP) to identify the concepts included in natural language passages found in the documents, and wherein the concepts are stored in the memory;
generating, by at least one of the processors, a plurality of child level categories in a category hierarchy from the plurality of concepts, and storing the generated child level categories in the memory;
grouping, by at least one of the processors, the child level categories into a plurality of sets based on a related concept identified for each of the child level categories included in each of the sets;
creating, by at least one of the processors, a plurality of parent categories, wherein each of the parent categories corresponds to a plurality of child level categories included in one of the plurality of sets, and storing the parent categories in the memory;
dividing a corpus utilized by the QA system into a plurality of sub-corpora, wherein each of the sub-corpora corresponds to one of the child level categories, wherein each of the sub-corpora is stored in the memory; and
answering, by at least one of the processors, a question posed to the QA system by identifying one of the child level categories related to the question and searching the sub-corpora corresponding to the identified child level category.
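The extracting, generating, grouping, and creating steps recited above can be sketched roughly as follows. This is a minimal illustration only, not the claimed implementation: the lookup tables `CONCEPT_TERMS` and `RELATED_CONCEPT`, the function names, and the sample documents are all hypothetical, and a simple keyword match stands in for real natural language processing.

```python
from collections import defaultdict

# Hypothetical lookup tables (assumptions, not from the patent text).
# CONCEPT_TERMS stands in for an NLP concept extractor;
# RELATED_CONCEPT stands in for a related-concept resolver.
CONCEPT_TERMS = {
    "insulin": "diabetes",
    "glucose": "diabetes",
    "thyroxine": "hypothyroidism",
    "stent": "angioplasty",
    "statin": "cholesterol",
}
RELATED_CONCEPT = {
    "diabetes": "endocrinology",
    "hypothyroidism": "endocrinology",
    "angioplasty": "cardiology",
    "cholesterol": "cardiology",
}


def extract_concepts(documents):
    """Return the set of concepts whose trigger terms appear in any document."""
    concepts = set()
    for text in documents:
        for token in text.lower().split():
            concept = CONCEPT_TERMS.get(token.strip(".,;:?!"))
            if concept:
                concepts.add(concept)
    return concepts


def build_hierarchy(documents):
    """Map each parent category to the sorted child categories grouped under it."""
    # One child level category per extracted concept.
    child_categories = extract_concepts(documents)
    # Group child categories into sets keyed by their related concept.
    sets_by_related = defaultdict(set)
    for child in child_categories:
        sets_by_related[RELATED_CONCEPT[child]].add(child)
    # Each set becomes one parent category.
    return {parent: sorted(children) for parent, children in sets_by_related.items()}


docs = [
    "Insulin regulates glucose.",
    "Thyroxine dosing varies.",
    "A stent was placed.",
    "Statin therapy lowers risk.",
]
hierarchy = build_hierarchy(docs)
```

With these sample documents, each parent category in `hierarchy` covers two child level categories, mirroring the claim language that a parent corresponds to a plurality of children.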
Abstract
An approach is provided to expand queries processed by a question/answer (QA) system. In the approach, concepts are extracted from documents using natural language processing to identify the concepts included in passages found in the documents. The approach generates child level categories in a category hierarchy from the concepts and groups the child level categories into sets based on related concepts. The approach creates parent categories from the sets and divides a corpus used by the QA system into a number of sub-corpora, with each of the sub-corpora corresponding to one of the child level categories. The approach answers questions posed to the QA system by identifying a child level category related to the question and searching the sub-corpora corresponding to the child level category.
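The corpus division and question answering steps described in the abstract might look something like the sketch below. It is illustrative only and rests on stated assumptions: `CATEGORY_KEYWORDS`, `categorize`, `divide_corpus`, and `answer` are hypothetical names, keyword overlap stands in for NLP-based category identification, and the "search" is a simple word-overlap ranking rather than a real QA pipeline.

```python
from collections import defaultdict

# Hypothetical keyword table mapping child level categories to trigger words
# (an assumption; a real system would derive categories via NLP).
CATEGORY_KEYWORDS = {
    "diabetes": {"insulin", "glucose"},
    "cholesterol": {"statin", "ldl"},
}


def tokens(text):
    """Lowercase the words of a text and strip surrounding punctuation."""
    return {word.strip(".,;:?!").lower() for word in text.split()}


def categorize(text):
    """Return the child category sharing the most keywords with the text, or None."""
    words = tokens(text)
    best, best_overlap = None, 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best, best_overlap = category, overlap
    return best


def divide_corpus(corpus):
    """Split the corpus into sub-corpora, one per child level category."""
    sub_corpora = defaultdict(list)
    for passage in corpus:
        category = categorize(passage)
        if category is not None:
            sub_corpora[category].append(passage)
    return dict(sub_corpora)


def answer(question, sub_corpora):
    """Search only the sub-corpus of the category matched to the question."""
    category = categorize(question)
    candidates = sub_corpora.get(category, [])
    question_words = tokens(question)
    # Rank candidate passages by word overlap with the question.
    return max(candidates, key=lambda p: len(tokens(p) & question_words), default=None)


corpus = [
    "Insulin lowers blood glucose.",
    "A statin reduces LDL levels.",
    "Glucose is measured after fasting.",
]
sub_corpora = divide_corpus(corpus)
best_passage = answer("How does insulin affect glucose?", sub_corpora)
```

The point of the design is that `answer` never scans passages outside the matched category's sub-corpus, so the search space shrinks with the number of categories.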
20 Claims
1. A method, in an information handling system comprising a processor and a memory to expand queries processed by a question/answer (QA) system, the method comprising:
extracting, by at least one of the processors, a plurality of concepts from a plurality of documents, wherein the extracting includes utilizing natural language processing (NLP) to identify the concepts included in natural language passages found in the documents, and wherein the concepts are stored in the memory;
generating, by at least one of the processors, a plurality of child level categories in a category hierarchy from the plurality of concepts, and storing the generated child level categories in the memory;
grouping, by at least one of the processors, the child level categories into a plurality of sets based on a related concept identified for each of the child level categories included in each of the sets;
creating, by at least one of the processors, a plurality of parent categories, wherein each of the parent categories corresponds to a plurality of child level categories included in one of the plurality of sets, and storing the parent categories in the memory;
dividing a corpus utilized by the QA system into a plurality of sub-corpora, wherein each of the sub-corpora corresponds to one of the child level categories, wherein each of the sub-corpora is stored in the memory; and
answering, by at least one of the processors, a question posed to the QA system by identifying one of the child level categories related to the question and searching the sub-corpora corresponding to the identified child level category.
Dependent claims: 2, 3, 4, 5, 6, 7
8. An information handling system comprising:
one or more processors;
a memory coupled to at least one of the processors;
a set of instructions stored in the memory and executed by at least one of the processors to expand queries processed by a question/answer (QA) system, wherein the set of instructions perform actions of:
extracting a plurality of concepts from a plurality of documents, wherein the extracting includes utilizing natural language processing (NLP) to identify the concepts included in natural language passages found in the documents;
generating a plurality of child level categories in a category hierarchy from the plurality of concepts;
grouping the child level categories into a plurality of sets based on a related concept identified for each of the child level categories included in each of the sets;
creating a plurality of parent categories, wherein each of the parent categories corresponds to a plurality of child level categories included in one of the plurality of sets;
dividing a corpus utilized by the QA system into a plurality of sub-corpora, wherein each of the sub-corpora corresponds to one of the child level categories; and
answering a question posed to the QA system by identifying one of the child level categories related to the question and searching the sub-corpora corresponding to the identified child level category.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer program product stored in a computer readable storage medium, comprising computer instructions that, when executed by an information handling system, causes the information handling system to expand queries processed by a question/answer (QA) system by performing actions comprising:
extracting a plurality of concepts from a plurality of documents, wherein the extracting includes utilizing natural language processing (NLP) to identify the concepts included in natural language passages found in the documents;
generating a plurality of child level categories in a category hierarchy from the plurality of concepts;
grouping the child level categories into a plurality of sets based on a related concept identified for each of the child level categories included in each of the sets;
creating a plurality of parent categories, wherein each of the parent categories corresponds to a plurality of child level categories included in one of the plurality of sets;
dividing a corpus utilized by the QA system into a plurality of sub-corpora, wherein each of the sub-corpora corresponds to one of the child level categories; and
answering a question posed to the QA system by identifying one of the child level categories related to the question and searching the sub-corpora corresponding to the identified child level category.
Dependent claims: 16, 17, 18, 19, 20
Specification