Systems and methods for determining atypical language
First Claim
1. A computer-implemented method comprising:
analyzing a first cluster of conceptually-related portions of text to identify a probability for each of the one or more portions of texts within the first cluster of conceptually-related portions of text, wherein the first cluster of conceptually-related portions of text comprises one or more financial documents and each of the one or more financial documents comprises one or more financial document sections and each of the one or more financial document sections comprises one or more sentences, and wherein the probability is calculated based on the number of occurrences of a given token of a given sentence of a given financial document of the first cluster of conceptually-related portions of text;
developing a model based on the one or more probabilities corresponding to the one or more portions of texts within the first cluster of conceptually-related portions of text;
calculating an abnormality score for each of the one or more sentences of the one or more financial document sections of a first identified conceptually-related portion of text as compared to the model; and
transmitting a second identified conceptually-related portion of text based upon the abnormality score satisfying a threshold.
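The steps recited in this claim can be illustrated with a minimal sketch. The claim states only that the probability is based on the number of occurrences of a token; everything else here is an assumption made for illustration: the whitespace tokenizer, the add-one smoothing, the use of average negative log-probability as the abnormality score, and the hypothetical function names `build_model`, `abnormality_score`, and `flag_sentences`. It is not the patented implementation.

```python
import math
from collections import Counter

PUNCT = ".,;:()\"'"

def tokenize(sentence):
    # Assumption: lowercase whitespace tokens with surrounding punctuation stripped.
    stripped = (tok.strip(PUNCT).lower() for tok in sentence.split())
    return [tok for tok in stripped if tok]

def build_model(cluster_sentences):
    # Count token occurrences across every sentence in the cluster.
    counts = Counter(tok for s in cluster_sentences for tok in tokenize(s))
    return {"counts": counts, "total": sum(counts.values()), "vocab": len(counts)}

def token_probability(model, token):
    # Assumption: add-one smoothing so unseen tokens get a small nonzero probability.
    return (model["counts"][token] + 1) / (model["total"] + model["vocab"] + 1)

def abnormality_score(model, sentence):
    # Assumption: score = average negative log-probability of the sentence's tokens.
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return -sum(math.log(token_probability(model, t)) for t in tokens) / len(tokens)

def flag_sentences(model, sentences, threshold):
    # Keep the sentences whose score satisfies (here: meets or exceeds) the threshold.
    return [s for s in sentences if abnormality_score(model, s) >= threshold]
```

Under these assumptions, a sentence built from tokens that are rare or absent in the cluster receives a higher score than one built from frequent tokens, so only the former passes the threshold and would be selected for transmission.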
Abstract
A method includes analyzing a cluster of conceptually-related portions of text to develop a model and calculating a novelty measurement between a first identified conceptually-related portion of text and the model. The method further includes transmitting a second identified conceptually-related portion of text and a score associated with the novelty measurement from a server to an access device via a signal. Another method includes determining at least two corpora of conceptually-related portions of text. The method also includes calculating a common neighbors similarity measurement between the at least two corpora of conceptually-related portions of text and if the common neighbors similarity measurement exceeds a threshold, merging the at least two corpora of conceptually-related portions of text into a cluster or if the common neighbors similarity measurement does not exceed a threshold, maintaining a non-merge of the at least two corpora of conceptually-related portions of text.
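The merge decision described in the abstract can be sketched as follows. The abstract does not define the common neighbors similarity measurement, so this sketch assumes one plausible form: the number of neighbors two corpora share, divided by the size of the smaller neighbor set. The neighbor sets, the function names, and the representation of a corpus as a list of text portions are all hypothetical.

```python
def common_neighbors_similarity(neighbors_a, neighbors_b):
    # Assumption: similarity = shared neighbors / size of the smaller neighbor set.
    a, b = set(neighbors_a), set(neighbors_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0
    return len(a & b) / smaller

def merge_or_keep(corpus_a, corpus_b, neighbors_a, neighbors_b, threshold):
    # Merge the two corpora into one cluster only when the similarity exceeds the threshold.
    if common_neighbors_similarity(neighbors_a, neighbors_b) > threshold:
        return [corpus_a + corpus_b]
    # Otherwise maintain the non-merge: both corpora stay separate.
    return [corpus_a, corpus_b]
```

For example, with neighbor sets `{1, 2, 3}` and `{2, 3, 4}` the similarity under this definition is 2/3, so a threshold of 0.5 merges the corpora while a threshold of 0.9 leaves them separate.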
14 Claims
1. A computer-implemented method comprising:
analyzing a first cluster of conceptually-related portions of text to identify a probability for each of the one or more portions of texts within the first cluster of conceptually-related portions of text, wherein the first cluster of conceptually-related portions of text comprises one or more financial documents and each of the one or more financial documents comprises one or more financial document sections and each of the one or more financial document sections comprises one or more sentences, and wherein the probability is calculated based on the number of occurrences of a given token of a given sentence of a given financial document of the first cluster of conceptually-related portions of text;
developing a model based on the one or more probabilities corresponding to the one or more portions of texts within the first cluster of conceptually-related portions of text;
calculating an abnormality score for each of the one or more sentences of the one or more financial document sections of a first identified conceptually-related portion of text as compared to the model; and
transmitting a second identified conceptually-related portion of text based upon the abnormality score satisfying a threshold.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system comprising:
a processor;
a memory coupled to the processor; and
a processing program stored in the memory for execution by the processor, the processing program comprising:
an analysis module, the analysis module configured to analyze a cluster of conceptually-related portions of text to identify a probability for each of the one or more portions of texts within the first cluster of conceptually-related portions of text, wherein the first cluster of conceptually-related portions of text comprises one or more financial documents and each of the one or more financial documents comprises one or more financial document sections and each of the one or more financial document sections comprises one or more sentences, and wherein the probability is calculated based on the number of occurrences of a given token of a given sentence of a given financial document of the first cluster of conceptually-related portions of text and to develop a model based on the one or more probabilities corresponding to the one or more portions of texts within the first cluster of conceptually-related portions of text;
a novelty module, the novelty module configured to calculate an abnormality score for each of the one or more sentences of the one or more financial document sections of a first identified conceptually-related portion of text as compared to the model; and
a transmission module, the transmission module configured to transmit a second identified conceptually-related portion of text based upon the abnormality score satisfying a threshold.
Dependent claims: 9, 10, 11, 12, 13, 14
Specification