DOCUMENT PROCESSOR AND ASSOCIATED METHOD
Abstract
A computer implemented method of processing a digitally encoded document having text composed by an author, in which a processor analyses the segmentation, punctuation and linguistics of the text and stores the results in a digitally accessible format. Author traits are then predicted by a machine learning system from the results of the segmentation, punctuation and linguistic analyses of the text.
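The three analyses named in the abstract can be pictured as three feature extractors whose results are merged and serialised. The following is only a minimal sketch: the specific features, the `FUNCTION_WORDS` list, the function names and the choice of JSON as the "digitally accessible format" are all illustrative assumptions, not details taken from the patent.

```python
import json
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny function-word list (assumption).
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "to", "in", "i", "you"}


def segmentation_features(text: str) -> dict[str, float]:
    """Layout statistics: how the text is split into paragraphs and sentences."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "paragraph_count": float(len(paragraphs)),
        "sentence_count": float(len(sentences)),
        "mean_words_per_sentence": len(words) / max(len(sentences), 1),
    }


def punctuation_features(text: str) -> dict[str, float]:
    """Per-character rates of a few punctuation marks."""
    counts = Counter(ch for ch in text if ch in string.punctuation)
    total_chars = max(len(text), 1)
    return {
        "exclamation_rate": counts["!"] / total_chars,
        "question_rate": counts["?"] / total_chars,
        "comma_rate": counts[","] / total_chars,
    }


def linguistic_features(text: str) -> dict[str, float]:
    """Simple lexical statistics standing in for a fuller linguistic analysis."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    n = max(len(words), 1)
    return {
        "type_token_ratio": len(set(words)) / n,
        "function_word_rate": sum(w in FUNCTION_WORDS for w in words) / n,
        "mean_word_length": sum(map(len, words)) / n,
    }


def analyse_document(text: str) -> dict[str, float]:
    """Run all three analyses and merge their results into one record."""
    features: dict[str, float] = {}
    features.update(segmentation_features(text))
    features.update(punctuation_features(text))
    features.update(linguistic_features(text))
    return features


def store_results(features: dict[str, float]) -> str:
    """Serialise the results; JSON is an assumed storage format."""
    return json.dumps(features, sort_keys=True)
```

Calling `store_results(analyse_document(text))` yields a single JSON record that a downstream machine learning system could take as input.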
25 Claims
1. A computer implemented method of processing a digitally encoded document having text composed by an author, said method including the steps of:
using a processor to analyse segmentation of the text and storing results of said segmentation analysis in a digitally accessible format;
using a processor to analyse punctuation of the text and storing results of said punctuation analysis in a digitally accessible format;
using a processor to linguistically analyse the text and storing results of said linguistic analysis in a digitally accessible format; and
predicting an author trait using a machine learning system that is adapted to receive the results of said linguistic analysis, said segmentation analysis and said punctuation analysis as input, said machine learning system having been trained to process said input so as to output at least one predicted author trait, wherein said at least one predicted author trait is a demographic trait.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 21, 22, 23
14. A computer implemented method of processing a digitally encoded document having text composed by an author, said method including the steps of:
using a processor to analyse segmentation of the text and storing results of said segmentation analysis in a digitally accessible format;
using a processor to analyse punctuation of the text and storing results of said punctuation analysis in a digitally accessible format;
using a processor to linguistically analyse the text and storing results of said linguistic analysis in a digitally accessible format; and
predicting an author trait using a machine learning system that is adapted to receive the results of said linguistic analysis, said segmentation analysis and said punctuation analysis as input, said machine learning system having been trained to process said input so as to output at least one predicted author trait, wherein said at least one predicted author trait is a psychometric trait.
Dependent claims: 15, 16, 17
18. A method of training a machine learning system, said method including:
compiling a representative sample of training documents, each training document being associated with known author trait information;
using a processor to linguistically analyse text of the training documents and storing the results of said linguistic analysis in a digitally accessible format;
using a processor to analyse segmentation of the text of the training documents and storing the results of said segmentation analysis in a digitally accessible format;
using a processor to analyse punctuation of the text of the training documents and storing the results of said punctuation analysis in a digitally accessible format; and
using the machine learning system in a training mode to process the results of said linguistic analysis, said segmentation analysis and said punctuation analysis, along with the associated known author trait information, so as to formulate a function for use by the machine learning system in an operational mode to process input documents so as to output at least one predicted author trait, wherein said at least one predicted author trait is a demographic trait and/or a psychometric trait.
Dependent claims: 19, 20
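The split between a training mode and an operational mode in claim 18 can be illustrated with a toy example. The claim does not name a learning algorithm, so the sketch below swaps in a nearest-centroid classifier purely as a stand-in; the function names, the two-element feature vectors and the trait labels are hypothetical and carry no empirical meaning.

```python
from collections import defaultdict
from math import dist


def train(feature_vectors: list[list[float]], labels: list[str]) -> dict[str, list[float]]:
    """Training mode: reduce labelled feature vectors to one mean vector per trait label.

    The returned centroids play the role of the "function" that the
    operational mode later applies (nearest-centroid is an assumption).
    """
    grouped: dict[str, list[list[float]]] = defaultdict(list)
    for vector, label in zip(feature_vectors, labels):
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids: dict[str, list[float]], vector: list[float]) -> str:
    """Operational mode: return the trait label whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


# Made-up training data: [mean words per sentence, exclamation rate].
training_vectors = [[10.0, 0.02], [12.0, 0.03], [22.0, 0.0], [24.0, 0.001]]
training_labels = ["group_a", "group_a", "group_b", "group_b"]

model = train(training_vectors, training_labels)
predicted = predict(model, [11.5, 0.02])
```

In practice the feature vectors would come from the stored segmentation, punctuation and linguistic analysis results, and any trained classifier or regressor could take the place of the centroid model.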
24. A machine learning system for processing a digitally encoded document having text composed by an author, said machine learning system having been trained to process said document so as to output at least three of the following six predicted author traits:
age;
gender;
educational level;
native language;
country of origin; and/or
geographic region.
25. A machine learning system for processing a digitally encoded document having text composed by an author, said machine learning system having been trained to process said document so as to output at least three of the following six predicted author traits:
extraversion;
agreeableness;
conscientiousness;
neuroticism;
psychoticism; and/or
openness.
Specification