Themes surfacing for communication data analysis
First Claim
1. A method of processing e-communication data by a computer system to identify one or more themes within the communication data, the method comprising:
accessing, by a processing system of a computer system, a set of communication data stored in a storage system of the computer system;
identifying, by the processing system, terms in the set of communication data, wherein a term is a word or short phrase;
defining, by the processing system, relations in the set of communication data based on the terms, wherein a relation is a pair of terms that appear in proximity to one another;
calculating, by the processing system, a relation score for each relation based on a frequency that the terms of the relation appear together in the set of communication data, the number of letters in the terms of the relation, and/or the proximity of the terms to one another, wherein relations that appear relatively frequently in the set of communication data and/or have terms with more letters are given a higher score, wherein the score is lowered for those relations whose terms appear relatively far apart in the set of communication data, wherein scoring each relation includes multiplying a number of times that the terms of the relation appear in the set of communication data by a number of total characters in the terms of the relation, and dividing by 1 + an average distance between the terms of the relation as it appears in the set of communication data;
identifying, by the processing system, themes in the set of communication data based on the relations, wherein a theme is a group of one or more relations that have similar meaning; and
storing, by the processing system, the terms, the relations, and the themes in a database.
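The scoring step of the claim can be written as a single formula. The symbols are our own labels, not the patent's: f(r) is the number of times the two terms of relation r appear together, |t_1| and |t_2| are the character counts of its terms, and d̄(r) is the average distance between them (the claim does not fix the unit of distance). The numbers in the worked line are hypothetical.

```latex
s(r) = \frac{f(r)\,\bigl(\lvert t_1 \rvert + \lvert t_2 \rvert\bigr)}{1 + \bar{d}(r)}

% Worked example (hypothetical): r = (billing, error), f(r) = 4,
% |t_1| + |t_2| = 7 + 5 = 12, \bar{d}(r) = 2
s(r) = \frac{4 \cdot 12}{1 + 2} = 16
```

The "1 +" in the denominator keeps the score finite when the terms are adjacent (distance 0), and every extra unit of average distance shrinks the score, matching the claim's statement that relations whose terms appear far apart are scored lower.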
2 Assignments
0 Petitions
Abstract
An embodiment of a method of processing communication data to identify one or more themes within the communication data includes identifying terms in a set of communication data, wherein a term is a word or short phrase, and defining relations in the set of communication data based on the terms, wherein a relation is a pair of terms that appear in proximity to one another. The method further includes identifying themes in the set of communication data based on the relations, wherein a theme is a group of one or more relations that have similar meanings, and storing the terms, the relations, and the themes in a database.
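The first two steps of the abstract (identifying terms, then defining relations as term pairs in proximity) can be sketched as below. This is a minimal illustration, not the patented implementation: the stopword list, the single-word terms, the `max_gap` proximity window and the "words between the terms" distance are all assumptions, and `identify_terms` and `define_relations` are hypothetical names.

```python
import re
from collections import defaultdict

# Hypothetical stopword list: the patent does not say how terms are selected.
STOPWORDS = {"a", "an", "the", "i", "my", "on", "of", "to", "and", "is", "was"}

def identify_terms(text):
    """Return (position, term) pairs for the non-stopword words of one text.

    Simplification: every term here is a single word; the patent also allows
    short phrases.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return [(pos, word) for pos, word in enumerate(words) if word not in STOPWORDS]

def define_relations(texts, max_gap=2):
    """Map each unordered term pair to the distances at which it co-occurs.

    Assumption: two terms are "in proximity" when at most `max_gap` words
    separate them in the same text, and the distance is that word count.
    """
    relations = defaultdict(list)
    for text in texts:
        terms = identify_terms(text)
        for i, (pos_a, term_a) in enumerate(terms):
            for pos_b, term_b in terms[i + 1:]:
                gap = pos_b - pos_a - 1  # words lying between the two terms
                if gap > max_gap:
                    break  # terms are in position order, so later ones are farther away
                if term_a != term_b:
                    relations[tuple(sorted((term_a, term_b)))].append(gap)
    return dict(relations)

relations = define_relations(["billing error on my account", "another billing error today"])
print(relations[("billing", "error")])  # [0, 0]
```

Keeping every distance (rather than only a count) is a design choice that leaves enough information for a later scoring step to compute both the frequency and the average distance of each relation.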
15 Claims
1. A method of processing e-communication data by a computer system to identify one or more themes within the communication data, the method comprising:
accessing, by a processing system of a computer system, a set of communication data stored in a storage system of the computer system;
identifying, by the processing system, terms in the set of communication data, wherein a term is a word or short phrase;
defining, by the processing system, relations in the set of communication data based on the terms, wherein a relation is a pair of terms that appear in proximity to one another;
calculating, by the processing system, a relation score for each relation based on a frequency that the terms of the relation appear together in the set of communication data, the number of letters in the terms of the relation, and/or the proximity of the terms to one another, wherein relations that appear relatively frequently in the set of communication data and/or have terms with more letters are given a higher score, wherein the score is lowered for those relations whose terms appear relatively far apart in the set of communication data, wherein scoring each relation includes multiplying a number of times that the terms of the relation appear in the set of communication data by a number of total characters in the terms of the relation, and dividing by 1 + an average distance between the terms of the relation as it appears in the set of communication data;
identifying, by the processing system, themes in the set of communication data based on the relations, wherein a theme is a group of one or more relations that have similar meaning; and
storing, by the processing system, the terms, the relations, and the themes in a database.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
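The claimed relation score (occurrences multiplied by the total characters of the two terms, divided by 1 plus the average distance) can be sketched as follows. This assumes each relation is supplied as a term pair mapped to a list of observed distances, one per co-occurrence; the input shape, the distance unit and the function names `relation_score` and `score_relations` are hypothetical. For single lowercase words, "total characters" and "number of letters" coincide.

```python
from statistics import mean

def relation_score(pair, distances):
    """Claimed rule: occurrences x total characters / (1 + average distance).

    pair      -- the relation's two terms, e.g. ("billing", "error")
    distances -- one distance per co-occurrence of the pair (unit assumed: words between)
    """
    occurrences = len(distances)
    total_chars = len(pair[0]) + len(pair[1])
    return occurrences * total_chars / (1 + mean(distances))

def score_relations(relations):
    """Score every relation and return them from highest to lowest score."""
    scores = {pair: relation_score(pair, d) for pair, d in relations.items()}
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))

scores = score_relations({("account", "error"): [2], ("billing", "error"): [0, 0]})
print(scores)  # {('billing', 'error'): 24.0, ('account', 'error'): 4.0}
```

Here ("billing", "error") scores 2 x 12 / (1 + 0) = 24.0, while ("account", "error") has the same character count but appears once and two words apart, so it scores 1 x 12 / (1 + 2) = 4.0.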
11. A method of processing communication data by a computer system to identify one or more themes within the communication data, the method comprising:
accessing, by a processing system of a computer system, a set of communication data stored in a storage system of the computer system;
identifying, by the processing system, terms in the set of communication data, wherein a term is a word or short phrase;
defining, by the processing system, relations in the set of communication data based on the terms, wherein a relation is a pair of terms that appear in proximity to one another;
calculating, by the processing system, a relation score for each relation based on a frequency that the terms of the relation appear together in the set of communication data, the number of letters in the terms of the relation, and/or the proximity of the terms to one another, wherein relations that appear relatively frequently in the set of communication data and/or have terms with more letters are given a higher score, wherein the score is lowered for those relations whose terms appear relatively far apart in the set of communication data, wherein scoring each relation includes multiplying a number of times that the terms of the relation appear in the set of communication data by a number of total characters in the terms of the relation, and dividing by 1 + an average distance between the terms of the relation as it appears in the set of communication data;
identifying, by the processing system, themes in the set of communication data based on the relations and the relation scores; and
storing, by the processing system, the terms, the relations, scores and the themes in a database.
Dependent claims: 12, 13.
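Claim 11 identifies themes from the relations and their scores and then stores terms, relations, scores and themes. One way to sketch that, under loudly stated assumptions: relations scoring at least a hypothetical `threshold` are kept, kept relations that share a term are merged into one theme with a small union-find, and everything is written to an in-memory SQLite database through Python's standard `sqlite3` module. The merging rule, the table schema and the names `identify_themes` and `store` are illustrative only; the claim does not specify any of them.

```python
import sqlite3

def identify_themes(scores, threshold):
    """Keep relations scoring >= threshold; relations sharing a term join one theme.

    scores -- {(term_a, term_b): relation score}
    Returns a list of themes, each a list of pairs ordered by descending score.
    """
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving keeps trees shallow
            term = parent[term]
        return term

    kept = [pair for pair, s in scores.items() if s >= threshold]
    for a, b in kept:
        parent[find(a)] = find(b)  # union the two terms' groups

    themes = {}
    for pair in kept:
        themes.setdefault(find(pair[0]), []).append(pair)
    return [sorted(group, key=lambda p: scores[p], reverse=True) for group in themes.values()]

def store(db, scores, themes):
    """Persist terms, scored relations and themes (hypothetical schema)."""
    db.executescript("""
        CREATE TABLE IF NOT EXISTS terms (term TEXT PRIMARY KEY);
        CREATE TABLE IF NOT EXISTS relations (term_a TEXT, term_b TEXT, score REAL);
        CREATE TABLE IF NOT EXISTS themes (theme_id INTEGER, term_a TEXT, term_b TEXT);
    """)
    terms = {t for pair in scores for t in pair}
    db.executemany("INSERT OR IGNORE INTO terms VALUES (?)", [(t,) for t in terms])
    db.executemany("INSERT INTO relations VALUES (?, ?, ?)",
                   [(a, b, s) for (a, b), s in scores.items()])
    db.executemany("INSERT INTO themes VALUES (?, ?, ?)",
                   [(i, a, b) for i, group in enumerate(themes) for a, b in group])
    db.commit()

scores = {("billing", "error"): 24.0, ("error", "refund"): 10.0,
          ("late", "fee"): 14.0, ("account", "error"): 4.0}
themes = identify_themes(scores, threshold=8.0)
print(themes)  # [[('billing', 'error'), ('error', 'refund')], [('late', 'fee')]]
store(sqlite3.connect(":memory:"), scores, themes)
```

Sharing a term is only a crude stand-in for "similar meaning"; claim 14 describes a finer grouping based on context vectors.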
14. A non-transient computer readable medium programmed with computer readable code that upon execution by a processor causes the processor to execute a method of processing a set of communication data, the method comprising:
accessing a set of communication data;
identifying terms in the set of communication data, wherein a term is a word or short phrase;
defining relations in the set of communication data based on the terms, wherein a relation is a pair of terms that appear in proximity to one another;
identifying a context vector for each term based on the words appearing before and after that term;
grouping the terms with similar context vectors into a node;
grouping relations with the same nodes into groups of relations;
calculating a relation score for each relation based on a frequency that the terms of the relation appear together in the set of communication data, the number of letters in the terms of the relation, and/or the proximity of the terms to one another, wherein relations that appear relatively frequently in the set of communication data and/or have terms with more letters are given a higher score, wherein the score is lowered for those relations whose terms appear relatively far apart in the set of communication data, wherein scoring each relation includes multiplying a number of times that the terms of the relation appear in the set of communication data by a number of total characters in the terms of the relation, and dividing by 1 + an average distance between the terms of the relation as it appears in the set of communication data;
identifying theme candidates in the set of communication data based on the groups of relations;
calculating a theme score for each theme candidate;
identifying themes as those theme candidates having at least a threshold theme score; and
storing the terms, the relations, and the themes in a database.
Dependent claim: 15.
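The context-vector steps of claim 14 can be sketched as follows, under several assumptions that the claim leaves open. Each context vector counts the single word immediately before and after every occurrence of a term; terms are grouped into nodes greedily by cosine similarity to a node's first ("seed") term with a hypothetical `min_similarity` cutoff; relations whose terms fall in the same pair of nodes form one theme candidate; and the theme score is taken to be the sum of the candidate's relation scores. All function names and the example relation scores are hypothetical.

```python
import math
from collections import Counter, defaultdict

def context_vectors(token_lists, terms):
    """Count the words immediately before and after each occurrence of each term."""
    vectors = {t: Counter() for t in terms}
    for tokens in token_lists:
        for i, tok in enumerate(tokens):
            if tok in vectors:
                if i > 0:
                    vectors[tok]["before:" + tokens[i - 1]] += 1
                if i + 1 < len(tokens):
                    vectors[tok]["after:" + tokens[i + 1]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def group_into_nodes(vectors, min_similarity=0.5):
    """Greedily assign each term to the first node whose seed term has a similar context."""
    seeds, node_of = [], {}
    for term, vec in vectors.items():
        for node_id, seed in enumerate(seeds):
            if cosine(vec, vectors[seed]) >= min_similarity:
                node_of[term] = node_id
                break
        else:  # no similar node found: this term seeds a new node
            node_of[term] = len(seeds)
            seeds.append(term)
    return node_of

def surface_themes(relation_scores, node_of, min_theme_score):
    """Group relations by node pair, score each group, keep groups at or above the threshold."""
    candidates = defaultdict(list)
    for (a, b), score in relation_scores.items():
        candidates[tuple(sorted((node_of[a], node_of[b])))].append(((a, b), score))
    themes = []
    for group in candidates.values():
        theme_score = sum(s for _, s in group)  # assumed theme score: sum of relation scores
        if theme_score >= min_theme_score:
            themes.append((theme_score, [pair for pair, _ in group]))
    return sorted(themes, key=lambda t: t[0], reverse=True)

token_lists = [["cancel", "my", "order"], ["cancel", "my", "subscription"],
               ["refund", "my", "order"], ["refund", "my", "subscription"]]
nodes = group_into_nodes(context_vectors(token_lists, ["cancel", "refund", "order", "subscription"]))
print(nodes)  # {'cancel': 0, 'refund': 0, 'order': 1, 'subscription': 1}
relation_scores = {("cancel", "order"): 22.0, ("refund", "subscription"): 18.0,
                   ("cancel", "refund"): 3.0}
print(surface_themes(relation_scores, nodes, min_theme_score=10.0))
# [(40.0, [('cancel', 'order'), ('refund', 'subscription')])]
```

In this toy data "cancel" and "refund" share the context "after:my", so they land in one node, and "order" and "subscription" land in another. The relations ("cancel", "order") and ("refund", "subscription") therefore form a single theme candidate even though they share no term, which is the point of grouping by node rather than by term.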
Specification