Themes surfacing for communication data analysis

US 9,697,246 B1
Filed: 09/30/2014
Issued: 07/04/2017
Est. Priority Date: 09/30/2013
Status: Active Grant

Abstract

An embodiment of the method of processing communication data to identify one or more themes within the communication data includes identifying terms in a set of communication data, wherein a term is a word or short phrase, and defining relations in the set of communication data based on the terms, wherein the relation is a pair of terms that appear in proximity to one another. The method further includes identifying themes in the set of communication data based on the relations, wherein a theme is a group of one or more relations that have similar meanings, and storing the terms, the relations, and the themes in the database.
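
As a rough illustration of the first two steps in the abstract, the sketch below treats each whitespace-separated word as a term and pairs any two terms that fall within a fixed window of one another. The tokenizer, the `window` parameter, and the name `find_relations` are assumptions made for illustration; the text above does not specify how terms are extracted or how "proximity" is measured.

```python
from collections import defaultdict

def find_relations(text, window=3):
    """Map each ordered term pair to the distances at which it occurs.

    Assumptions: a term is a lowercase whitespace-separated word, and two
    terms are "in proximity" when they are at most `window` positions apart.
    """
    terms = text.lower().split()
    relations = defaultdict(list)
    for i, first in enumerate(terms):
        # Look ahead at most `window` positions from the current term.
        for j in range(i + 1, min(i + window + 1, len(terms))):
            relations[(first, terms[j])].append(j - i)
    return dict(relations)

relations = find_relations("please cancel my account and cancel the order", window=2)
```

Keeping the list of distances for each pair, rather than only a count, preserves what a later scoring step would need: how often the pair occurs and how far apart its terms are.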

15 Claims

1. A method of processing e-communication data by a computer system to identify one or more themes within the communication data, the method comprising:
- accessing, by a processing system of a computer system, a set of communication data stored in a storage system of the computer system;
  
  identifying, by the processing system, terms in the set of communication data, wherein a term is a word or short phrase;
  
  defining, by the processing system, relations in the set of communication data based on the terms, wherein a relation is a pair of terms that appear in proximity to one another;
  
  calculating, by the processing system, a relation score for each relation based on a frequency that the terms of the relation appear together in the set of communication data, the number of letters in the terms of the relation, and/or the proximity of the terms to one another, wherein relations that appear relatively frequently in the set of communication data and/or have terms with more letters are given a higher score, wherein the score is lowered for those relations whose terms appear relatively far apart in the set of communication data, wherein scoring each relation includes multiplying a number of times that the terms of the relation appear in the set of communication data by a number of total characters in the terms of the relation, and dividing by 1+an average distance between the terms of the relation as it appears in the set of communication data;
  
  identifying, by the processing system, themes in the set of communication data based on the relations, wherein a theme is a group of one or more relations that have similar meaning; and
  
  storing, by the processing system, the terms, the relations, and the themes in a database.
- - 2. The method of claim 1 further comprising identifying a context vector for each term based on the words appearing before and after that term;
    - grouping the terms into nodes based on the context vectors, wherein the terms with similar context vectors are put into the same node; and
      
      grouping the relations with the same nodes.
  - 3. The method of claim 2 further comprising assigning a unique node number to each node;
    - and associating the unique node number with each term grouped into that node;
      
      wherein the step of grouping the relations with the same term nodes includes grouping the relations containing terms with the same node numbers.
  - 4. The method of claim 2 further comprising:
    - identifying at least one ungrouped term, wherein the ungrouped term is one that is not grouped into the nodes;
      
      determining a similarity score of the ungrouped term to one or more of the terms grouped into the nodes by performing character trigram similarity; and
      
      grouping the ungrouped term into one of the nodes if the similarity score of the ungrouped term to one or more of the terms grouped into that node is at least a threshold similarity score.
  - 5. The method of claim 2 further comprising displaying at least a portion of the grouped relations to a user.
  - 6. The method of claim 2 wherein the step of identifying themes is further based on the grouped relations.
  - 7. The method of claim 1 further comprising eliminating low-scoring relations prior to the step of identifying the themes in the set of communication data.
  - 8. The method of claim 1 further comprising calculating a theme score for each theme by averaging the scores of each relation grouped into that theme, and eliminating low-scoring themes.
  - 9. The method of claim 1 further comprising displaying at least a portion of the themes to a user, prompting the user to eliminate themes.
  - 10. The method of claim 1 further comprising:
    - calculating a theme score for each theme based on the number of relations associated therewith, the number of unique terms associated therewith, and/or the number of classes associated therewith; and
      
      eliminating themes having a theme score below a threshold.
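
The scoring rule recited in claim 1 can be read as: occurrences multiplied by total characters, divided by one plus the average distance. The following is a minimal sketch of that arithmetic, not the patentee's implementation. It assumes that "number of times" means the number of co-occurrences of the pair, that character counts are plain string lengths, and that distance is measured in word positions; the function name `relation_score` is hypothetical.

```python
def relation_score(term_a, term_b, distances):
    """Score one relation: (occurrences * total characters) / (1 + average distance).

    `distances` holds one entry per co-occurrence of the two terms. This
    representation, and the unit of distance, are assumptions.
    """
    if not distances:
        return 0.0
    occurrences = len(distances)
    total_chars = len(term_a) + len(term_b)
    avg_distance = sum(distances) / occurrences
    return occurrences * total_chars / (1 + avg_distance)

# Three co-occurrences, 13 characters in total, average distance 2.
score = relation_score("cancel", "account", [1, 2, 3])   # 3 * 13 / (1 + 2) = 13.0

# The same pair seen farther apart scores lower.
far_score = relation_score("cancel", "account", [5, 5, 5])  # 39 / 6 = 6.5
```

Adding 1 to the average distance keeps the denominator at least 1, so a relation whose average distance is zero still receives a finite score.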

11. A method of processing communication data by a computer system to identify one or more themes within the communication data, the method comprising:
- accessing, by a processing system of a computer system, a set of communication data stored in a storage system of the computer system;
  
  identifying, by the processing system, terms in the set of communication data, wherein a term is a word or short phrase;
  
  defining, by the processing system, relations in the set of communication data based on the terms, wherein a relation is a pair of terms that appear in proximity to one another;
  
  calculating, by the processing system, a relation score for each relation based on a frequency that the terms of the relation appear together in the set of communication data, the number of letters in the terms of the relation, and/or the proximity of the terms to one another, wherein relations that appear relatively frequently in the set of communication data and/or have terms with more letters are given a higher score, wherein the score is lowered for those relations whose terms appear relatively far apart in the set of communication data, wherein scoring each relation includes multiplying a number of times that the terms of the relation appear in the set of communication data by a number of total characters in the terms of the relation, and dividing by 1+an average distance between the terms of the relation as it appears in the set of communication data;
  
  identifying, by the processing system, themes in the set of communication data based on the relations and the relation scores; and
  
  storing, by the processing system, the terms, the relations, scores and the themes in a database.
- - 12. The method of claim 11 further comprising:
    - calculating a theme score for each theme, wherein the theme score is based on the relation scores of the relations associated with the theme; and
      
      eliminating themes with theme scores below a threshold.
  - 13. The method of claim 12 further comprising:
    - comparing the themes to a list of important themes, raising the theme score for those themes appearing on the list of important themes.
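
Claims 12 and 13 describe scoring themes from their relation scores, raising the score of themes on a list of important themes, and eliminating themes below a threshold. The sketch below shows one way these steps might fit together. The use of a mean, the multiplicative `boost`, the order of boosting before thresholding, and all names and numeric values are assumptions; the claims do not fix any of them.

```python
def score_themes(themes, relation_scores, important=(), boost=1.5, threshold=5.0):
    """Return {theme: score} for themes whose score is at least `threshold`.

    Assumptions: a theme's score is the mean of its relations' scores, and
    themes on the `important` list have their score multiplied by `boost`
    before the threshold is applied.
    """
    kept = {}
    for name, relations in themes.items():
        if not relations:
            continue
        score = sum(relation_scores[r] for r in relations) / len(relations)
        if name in important:
            score *= boost
        if score >= threshold:
            kept[name] = score
    return kept

# Hypothetical relation scores and theme groupings.
relation_scores = {
    ("cancel", "account"): 13.0,
    ("close", "account"): 9.0,
    ("nice", "day"): 2.0,
    ("billing", "error"): 4.0,
}
themes = {
    "cancellation": [("cancel", "account"), ("close", "account")],
    "small talk": [("nice", "day")],
    "billing": [("billing", "error")],
}
kept = score_themes(themes, relation_scores, important={"billing"})
```

In this example "billing" would fall below the threshold on its own (4.0) but survives because it is on the important list (4.0 × 1.5 = 6.0), while "small talk" is eliminated.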

14. A non-transient computer readable medium programmed with computer readable code that upon execution by a processor causes the processor to execute a method of processing a set of communication data, the method comprising:
- accessing a set of communication data;
  
  identifying terms in the set of communication data, wherein a term is a word or short phrase;
  
  defining relations in the set of communication data based on the terms, wherein a relation is a pair of terms that appear in proximity to one another;
  
  identifying a context vector for each term based on the words appearing before and after that term;
  
  grouping the terms with similar context vectors into a node;
  
  grouping relations with the same nodes into groups of relations;
  
  calculating a relation score for each relation based on a frequency that the terms of the relation appear together in the set of communication data, the number of letters in the terms of the relation, and/or the proximity of the terms to one another, wherein relations that appear relatively frequently in the set of communication data and/or have terms with more letters are given a higher score, wherein the score is lowered for those relations whose terms appear relatively far apart in the set of communication data, wherein scoring each relation includes multiplying a number of times that the terms of the relation appear in the set of communication data by a number of total characters in the terms of the relation, and dividing by 1+an average distance between the terms of the relation as it appears in the set of communication data;
  
  identifying theme candidates in the set of communication data based on the groups of relations;
  
  calculating a theme score for each theme candidate;
  
  identifying themes as those theme candidates having at least a threshold theme score; and
  
  storing the terms, the relations, and the themes in a database.
- - 15. The non-transient computer readable medium of claim 14 wherein the theme score is based on the number of relations associated therewith, the number of unique terms associated therewith, and/or the number of classes associated therewith.
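
The context-vector and node-grouping steps of claim 14 might be sketched as follows. This is an illustration under stated assumptions: context is limited to the single word before and after each term, similarity is cosine similarity, and grouping is a greedy pass with a hypothetical `threshold`. None of these choices comes from the claim text.

```python
from collections import Counter
import math

def context_vectors(tokens):
    """Count the words immediately before and after each term."""
    vectors = {}
    for i, term in enumerate(tokens):
        vec = vectors.setdefault(term, Counter())
        if i > 0:
            vec["prev:" + tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            vec["next:" + tokens[i + 1]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def group_into_nodes(vectors, threshold=0.9):
    """Greedy grouping: a term joins the first node whose first term is similar enough."""
    nodes = []  # each node is a list of terms; its list index serves as the node number
    for term, vec in vectors.items():
        for node in nodes:
            if cosine(vec, vectors[node[0]]) >= threshold:
                node.append(term)
                break
        else:
            nodes.append([term])
    return nodes

tokens = "please cancel my account please close my account".split()
nodes = group_into_nodes(context_vectors(tokens))

# Relations whose terms map to the same pair of node numbers fall into one group.
node_of = {term: n for n, node in enumerate(nodes) for term in node}
same_group = (node_of["cancel"], node_of["account"]) == (node_of["close"], node_of["account"])
```

Here "cancel" and "close" share the same neighbors ("please" before, "my" after), so they land in one node, and the relations (cancel, account) and (close, account) therefore map to the same pair of node numbers.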

Current Assignee: Verint Systems Incorporated
Original Assignee: Verint Systems Limited (Verint Systems Incorporated)
Inventors: Horesh, Yair; Romano, Roni
Primary Examiner(s): Chen, Susan

Application Number: US14/501,519
Time in Patent Office: 1,008 Days
CPC Class Codes:
G06F 16/23   Updating
G06F 16/24578   using ranking
G06F 16/313   Selection or weighting of t...
G06F 16/367   Ontology