Systems and methods for determining atypical language

US 9,690,849 B2
Filed: 03/07/2014
Issued: 06/27/2017
Est. Priority Date: 09/30/2011
Status: Active Grant

Abstract

A method includes analyzing a cluster of conceptually-related portions of text to develop a model and calculating a novelty measurement between a first identified conceptually-related portion of text and the model. The method further includes transmitting a second identified conceptually-related portion of text and a score associated with the novelty measurement from a server to an access device via a signal. Another method includes determining at least two corpora of conceptually-related portions of text. The method also includes calculating a common neighbors similarity measurement between the at least two corpora of conceptually-related portions of text and if the common neighbors similarity measurement exceeds a threshold, merging the at least two corpora of conceptually-related portions of text into a cluster or if the common neighbors similarity measurement does not exceed a threshold, maintaining a non-merge of the at least two corpora of conceptually-related portions of text.
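
The merge decision in the second method of the abstract can be sketched as follows. The abstract does not define how the common neighbors similarity measurement is computed, so this sketch assumes each corpus has a precomputed set of nearest-neighbor identifiers and that similarity is the shared-neighbor count divided by the size of the smaller set. The function names, the neighbor sets, and the threshold values are hypothetical.

```python
def common_neighbors_similarity(neighbors_a, neighbors_b):
    """Assumed definition: shared neighbors divided by the smaller neighbor set."""
    a, b = set(neighbors_a), set(neighbors_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_or_keep(corpus_a, corpus_b, neighbors_a, neighbors_b, threshold):
    """Return one merged cluster if similarity exceeds the threshold,
    otherwise return the two corpora unmerged."""
    similarity = common_neighbors_similarity(neighbors_a, neighbors_b)
    if similarity > threshold:
        return [corpus_a + corpus_b]
    return [corpus_a, corpus_b]


# Hypothetical corpora (lists of text portions) and neighbor identifiers.
corpus_a = ["Risk factors for 2012."]
corpus_b = ["Risk factors for 2013."]
merged = merge_or_keep(corpus_a, corpus_b, {"c1", "c2", "c3"}, {"c2", "c3", "c4"}, 0.5)
```

With these inputs the two corpora share two of three neighbors, so a threshold of 0.5 leads to a merge, while a stricter threshold such as 0.9 would keep them separate.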

14 Claims

1. A computer-implemented method comprising:
- analyzing a first cluster of conceptually-related portions of text to identify a probability for each of the one or more portions of texts within the first cluster of conceptually-related portions of text, wherein the first cluster of conceptually-related portions of text comprises one or more financial documents and each of the one or more financial documents comprises one or more financial document sections and each of the one or more financial document sections comprises one or more sentences, and wherein the probability is calculated based on the number of occurrences of a given token of a given sentence of a given financial document of the first cluster of conceptually-related portions of text;
  
  developing a model based on the one or more probabilities corresponding to the one or more portions of texts within the first cluster of conceptually-related portions of text;
  
  calculating an abnormality score for each of the one or more sentences of the one or more financial document sections of a first identified conceptually-related portion of text as compared to the model; and
  
  transmitting a second identified conceptually-related portion of text based upon the abnormality score satisfying a threshold.
  - 2. The method of claim 1, wherein the first cluster of conceptually-related portions of text is generated by aggregating one or more financial documents according to an assigned key value.
  - 3. The method of claim 2, wherein the assigned key value is a time period of a given financial document.
  - 4. The method of claim 2, wherein the assigned key value is a sector for a given financial document.
  - 5. The method of claim 2, wherein the assigned key value is a market cap for a given financial document.
  - 6. The method of claim 1, wherein the second identified conceptually-related portion of text comprises one or more sentences identified as atypical.
  - 7. The method of claim 1, wherein the second identified conceptually-related portion of text comprises one or more sentences identified as typical.
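
The steps of claim 1 can be illustrated with a minimal sketch. The claim states only that the probability is based on token occurrence counts and that sentences are scored against the model. The unigram model, the add-one smoothing, the average negative log-probability score, the tokenizer, and the threshold value below are all assumptions made for illustration, not the patented method.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Hypothetical tokenizer: split on whitespace, strip punctuation, lowercase.
    tokens = (raw.strip(".,;:()\"'").lower() for raw in sentence.split())
    return [t for t in tokens if t]


def build_model(cluster):
    """cluster: list of documents; a document is a list of sections;
    a section is a list of sentences. Returns token counts and totals."""
    counts = Counter()
    for document in cluster:
        for section in document:
            for sentence in section:
                counts.update(tokenize(sentence))
    return counts, sum(counts.values()), len(counts)


def token_probability(model, token):
    # Assumed add-one smoothing, reserving one slot for unseen tokens.
    counts, total, vocab = model
    return (counts[token] + 1) / (total + vocab + 1)


def abnormality_score(model, sentence):
    # Assumed score: average negative log probability of the sentence's tokens.
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return -sum(math.log(token_probability(model, t)) for t in tokens) / len(tokens)


def select_sentences(model, section, threshold):
    # Keep sentences whose score meets the (hypothetical) threshold.
    return [s for s in section if abnormality_score(model, s) >= threshold]


cluster = [[["Revenue increased during the quarter.",
             "Revenue increased during the year."]]]
model = build_model(cluster)
section = ["Revenue increased during the quarter.",
           "Litigation may impair goodwill."]
flagged = select_sentences(model, section, threshold=2.5)
```

In this toy cluster, the sentence made entirely of unseen tokens scores higher than the sentence that repeats the cluster's wording, so only the former is selected. Inverting the comparison would select typical sentences instead, which corresponds to claim 7.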

8. A system comprising:
- a processor;
  
  a memory coupled to the processor; and
  
  a processing program stored in the memory for execution by the processor, the processing program comprising:
  
  an analysis module, the analysis module configured to analyze a cluster of conceptually-related portions of text to identify a probability for each of the one or more portions of texts within the first cluster of conceptually-related portions of text, wherein the first cluster of conceptually-related portions of text comprises one or more financial documents and each of the one or more financial documents comprises one or more financial document sections and each of the one or more financial document sections comprises one or more sentences, and wherein the probability is calculated based on the number of occurrences of a given token of a given sentence of a given financial document of the first cluster of conceptually-related portions of text and to develop a model based on the one or more probabilities corresponding to the one or more portions of texts within the first cluster of conceptually-related portions of text;
  
  a novelty module, the novelty module configured to calculate an abnormality score for each of the one or more sentences of the one or more financial document sections of a first identified conceptually-related portion of text as compared to the model; and
  
  a transmission module, the transmission module configured to transmit a second identified conceptually-related portion of text based upon the abnormality score satisfying a threshold.
  - 9. The system of claim 8, wherein the first cluster of conceptually-related portions of text is generated by aggregating one or more financial documents according to an assigned key value.
  - 10. The system of claim 9, wherein the assigned key value is a time period of a given financial document.
  - 11. The system of claim 9, wherein the assigned key value is a sector for a given financial document.
  - 12. The system of claim 9, wherein the assigned key value is a market cap for a given financial document.
  - 13. The system of claim 8, wherein the second identified conceptually-related portion of text comprises one or more sentences identified as atypical.
  - 14. The system of claim 8, wherein the second identified conceptually-related portion of text comprises one or more sentences identified as typical.
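
The aggregation by assigned key value described in claims 2 to 5 and 9 to 12 can be sketched as a simple grouping step. The `FinancialDocument` record, its field names, and the sample values are hypothetical; the claims name only the key types (time period, sector, market cap).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FinancialDocument:
    # Hypothetical record; the claims do not prescribe a data layout.
    text: str
    period: str
    sector: str
    market_cap: str


def cluster_by_key(documents, key):
    """Group documents into clusters that share the same value for `key`."""
    clusters = defaultdict(list)
    for doc in documents:
        clusters[getattr(doc, key)].append(doc)
    return dict(clusters)


docs = [
    FinancialDocument("Filing A", "2012", "Energy", "Large"),
    FinancialDocument("Filing B", "2012", "Retail", "Small"),
    FinancialDocument("Filing C", "2013", "Energy", "Large"),
]
by_sector = cluster_by_key(docs, "sector")
by_period = cluster_by_key(docs, "period")
```

Each resulting cluster could then serve as the first cluster from which a model is developed.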

Current Assignee
Thomson Reuters Enterprise Centre GmbH (The Woodbridge Co. Ltd.)
Original Assignee
Thomson Reuters Global Resources Unlimited Company (The Woodbridge Co. Ltd.)
Inventors
Al-Kofahi, Khalid; Shah, Sameena; Dorr, Dietmar; Sisk, Jacob
Primary Examiner(s)
Ries, Laurie

Application Number
US14/201,134
Publication Number
US 20140344279A1
Time in Patent Office
1,208 Days
Field of Search
US Class Current
CPC Class Codes

G06F 16/335   Filtering based on addition...

G06F 16/35   Clustering; Classification

G06F 40/242   Dictionaries

G06F 40/284   Lexical analysis, e.g. toke...

G06Q 10/10   Office automation; Time man...

G06Q 50/18   Legal services
