Analyzing concepts over time
First Claim
1. A method, in an information handling system comprising a processor and a memory, for analyzing concept vectors over time to detect changes in a corpus, the method comprising:
generating, by the system, at least a first concept vector set $V_1, \ldots, V_k$ derived from a first set of concept sequences over k concepts that are extracted from the corpus and applied to a vector learning component;
generating, by the system, at least a second concept vector set $V'_1, \ldots, V'_{k+b}$ derived from a concatenation of the first set of concept sequences and a second set of concept sequences over k old and b new concepts that are extracted from the corpus and applied to the vector learning component, where the second set of concept sequences is effectively collected after collection of the first set of concept sequences; and
performing, by the system, a natural language processing (NLP) analysis of the first concept vector set and second concept vector set to detect changes in the corpus over time by analyzing relationship strengths between concepts that persist in the first set of concept sequences and the second set of concept sequences to identify market trends for answering questions submitted to the information handling system by identifying vector changes for one or more concepts included in the first and/or second set of concept sequences, wherein analyzing relationship strengths comprises:
computing, by the system, a first cosine distance between each vector pair $V_i, V_j$ from the first concept vector set $V_1, \ldots, V_k$ for all $i \neq j$, $1 \le i, j \le k$;
computing, by the system, a second cosine distance between each vector pair $V'_i, V'_j$ from the second concept vector set $V'_1, \ldots, V'_{k+b}$ for all $i \neq j$, $1 \le i, j \le k$; and
identifying concept pairs from the first set of concept sequences whose interrelationship has changed by reporting each concept pair $V_i, V_j$ whereby a subtraction of the second cosine distance from the first cosine distance exceeds a first specified reporting threshold.
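The relationship-strength test recited in this claim can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: cosine distance is taken to be 1 - cos(u, v) (the claim does not define it), vectors are plain lists of floats, concept indices are 0-based rather than 1-based, and the names `cosine_distance` and `changed_pairs` are hypothetical. Because cosine distance is symmetric, each unordered pair i < j is evaluated once instead of both orderings of i ≠ j.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Return 1 - cos(u, v). Assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def changed_pairs(
    first: Sequence[Vector],   # V_1..V_k (hypothetical layout: a list of k vectors)
    second: Sequence[Vector],  # V'_1..V'_{k+b}; the first k entries are the persisting concepts
    threshold: float,          # stands in for the "first specified reporting threshold"
) -> List[Tuple[int, int, float]]:
    """Report (i, j, d1 - d2) for persisting concept pairs where the first
    cosine distance minus the second cosine distance exceeds threshold."""
    k = len(first)
    if len(second) < k:
        raise ValueError("second set must start with the k persisting concepts")
    reported = []
    # Cosine distance is symmetric, so each unordered pair i < j is checked once.
    # Only indices below k are visited: the b new concepts have no earlier counterpart.
    for i, j in combinations(range(k), 2):
        d1 = cosine_distance(first[i], first[j])
        d2 = cosine_distance(second[i], second[j])
        if d1 - d2 > threshold:
            reported.append((i, j, d1 - d2))
    return reported


if __name__ == "__main__":
    # Toy data: k = 3 persisting concepts plus b = 1 new concept in the second set.
    first = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    second = [[1.0, 0.0], [1.0, 0.1], [1.0, 1.0], [0.0, 1.0]]
    pairs = changed_pairs(first, second, threshold=0.5)
    print([(i, j) for i, j, _ in pairs])  # [(0, 1)]
```

In this sketch, only distances computed within each set are compared, so the two vector sets do not need to share a coordinate system. As recited, the test reports pairs whose distance shrank by more than the threshold, that is, pairs whose relationship strengthened.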
1 Assignment
0 Petitions
Abstract
A method and apparatus are provided for automatically generating and processing first and second concept vector sets extracted, respectively, from a first set of concept sequences and from a second, temporally separated set of concept sequences, by performing a natural language processing (NLP) analysis of the first and second concept vector sets to detect changes in the corpus over time by identifying changes for one or more concepts included in the first and/or second set of concept sequences.
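The claims leave the "vector learning component" unspecified. The sketch below, in Python, illustrates only the data flow: concepts from the earlier collection keep indices 1 to k, concepts first seen in the later collection are appended as k+1 to k+b, and the second vector set is learned from the concatenation of both collections. Simple windowed co-occurrence counts are swapped in for the learning component purely as an assumption; a real system might use a neural embedding method instead. All names (`concept_index`, `cooccurrence_vectors`, `window`) and the toy concepts are hypothetical.

```python
from typing import List, Optional, Sequence

Concepts = Sequence[str]


def concept_index(sequences: Sequence[Concepts], base: Optional[List[str]] = None) -> List[str]:
    """Order concepts so that those in `base` keep their positions and
    concepts seen for the first time are appended in order of appearance."""
    order = list(base) if base else []
    seen = set(order)
    for seq in sequences:
        for concept in seq:
            if concept not in seen:
                seen.add(concept)
                order.append(concept)
    return order


def cooccurrence_vectors(
    sequences: Sequence[Concepts], order: List[str], window: int = 2
) -> List[List[float]]:
    """Stand-in 'vector learning component' (an assumption): the vector for
    concept c counts how often each indexed concept occurs within `window`
    positions of c. A concept with no neighbours gets an all-zero vector."""
    pos = {c: n for n, c in enumerate(order)}
    vecs = [[0.0] * len(order) for _ in order]
    for seq in sequences:
        for i, ci in enumerate(seq):
            for j in range(max(0, i - window), min(len(seq), i + window + 1)):
                if j != i:
                    vecs[pos[ci]][pos[seq[j]]] += 1.0
    return vecs


if __name__ == "__main__":
    first_seqs = [["cloud", "storage", "backup"], ["cloud", "security"]]  # earlier collection
    second_seqs = [["cloud", "ai", "security"], ["ai", "storage"]]       # later collection
    old = concept_index(first_seqs)                      # the k concepts
    V = cooccurrence_vectors(first_seqs, old)            # V_1..V_k
    combined = first_seqs + second_seqs                  # concatenation of both collections
    full = concept_index(combined, base=old)             # k old concepts, then b new ones
    V_prime = cooccurrence_vectors(combined, full)       # V'_1..V'_{k+b}
    k, b = len(old), len(full) - len(old)
    print(k, b, full[k:])  # 4 1 ['ai']
```

Because the two sets are learned separately, their coordinates are not directly comparable (here they even differ in dimension, k versus k+b), which is why comparing pairwise cosine distances within each set, rather than comparing vectors across sets, fits this setup.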
30 Citations
19 Claims
1. A method, in an information handling system comprising a processor and a memory, for analyzing concept vectors over time to detect changes in a corpus, the method comprising the steps set out in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7.
8. An information handling system comprising:
one or more processors;
a memory coupled to at least one of the processors;
a set of instructions stored in the memory and executed by at least one of the processors to analyze concept vectors over time to detect changes in a corpus, wherein the set of instructions is executable to perform actions of:
generating, by the system, at least a first concept vector set $V_1, \ldots, V_k$ derived from a first set of concept sequences over k concepts that are extracted from the corpus and applied to a vector learning component;
generating, by the system, at least a second concept vector set $V'_1, \ldots, V'_{k+b}$ derived from a concatenation of the first set of concept sequences and a second set of concept sequences over k old and b new concepts that are extracted from the corpus and applied to the vector learning component, where the second set of concept sequences is effectively collected after collection of the first set of concept sequences; and
performing, by the system, a natural language processing (NLP) analysis of the first concept vector set and second concept vector set to detect changes in the corpus over time by analyzing relationship strengths between concepts that persist in the first set of concept sequences and the second set of concept sequences to identify market trends for answering questions submitted to the information handling system by identifying vector changes for one or more concepts included in the first and/or second set of concept sequences, wherein analyzing relationship strengths comprises:
computing, by the system, a first cosine distance between each vector pair $V_i, V_j$ from the first concept vector set $V_1, \ldots, V_k$ for all $i \neq j$, $1 \le i, j \le k$;
computing, by the system, a second cosine distance between each vector pair $V'_i, V'_j$ from the second concept vector set $V'_1, \ldots, V'_{k+b}$ for all $i \neq j$, $1 \le i, j \le k$; and
identifying concept pairs from the first set of concept sequences whose interrelationship has changed by reporting each concept pair $V_i, V_j$ whereby a subtraction of the second cosine distance from the first cosine distance exceeds a first specified reporting threshold.
Dependent claims: 9, 10, 11, 12, 13.
14. A computer program product stored in a non-transitory computer readable storage medium, comprising computer instructions that, when executed by an information handling system, cause the system to analyze concept vectors over time to detect changes in a corpus by performing actions comprising:
generating, by the system, at least a first concept vector set $V_1, \ldots, V_k$ derived from a first set of concept sequences over k concepts that are extracted from the corpus and applied to a vector learning component;
generating, by the system, at least a second concept vector set $V'_1, \ldots, V'_{k+b}$ derived from a concatenation of the first set of concept sequences and a second set of concept sequences over k old and b new concepts that are extracted from the corpus and applied to the vector learning component, where the second set of concept sequences is effectively collected after collection of the first set of concept sequences; and
performing, by the system, a natural language processing (NLP) analysis of the first concept vector set and second concept vector set to detect changes in the corpus over time by analyzing relationship strengths between concepts that persist in the first set of concept sequences and the second set of concept sequences to identify market trends for answering questions submitted to the information handling system by identifying vector changes for one or more concepts included in the first and/or second set of concept sequences, wherein analyzing relationship strengths comprises:
computing, by the system, a first cosine distance between each vector pair $V_i, V_j$ from the first concept vector set $V_1, \ldots, V_k$ for all $i \neq j$, $1 \le i, j \le k$;
computing, by the system, a second cosine distance between each vector pair $V'_i, V'_j$ from the second concept vector set $V'_1, \ldots, V'_{k+b}$ for all $i \neq j$, $1 \le i, j \le k$; and
identifying concept pairs from the first set of concept sequences whose interrelationship has changed by reporting each concept pair $V_i, V_j$ whereby a subtraction of the second cosine distance from the first cosine distance exceeds a first specified reporting threshold.
Dependent claims: 15, 16, 17, 18, 19.
Specification