Variables and method for authorship attribution

US 20070239433A1
Filed: 04/06/2006
Published: 10/11/2007
Est. Priority Date: 04/06/2006
Status: Active Grant


0 Assignments

0 Petitions

Abstract

A method uses linguistic units of analysis to identify the authorship of a document. The method is useful for determining the authorship of brief documents, and in situations where there are fewer than ten documents per known author, i.e., when text is scarce. The method analyzes parameters such as syntax, punctuation, and, optionally, average word and paragraph length; when these parameters are analyzed using statistical methods, the method achieves a high degree of reliability (>90% accuracy). The method can be applied to many languages other than English because the selected variables are characteristic of most languages. The reliability of the method is verified by cross-validation.
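The abstract mentions statistical analysis and cross-validation but does not name a classifier. The following is a minimal sketch. It assumes each document has already been reduced to a numeric feature vector, and it estimates accuracy with leave-one-out cross-validation. A nearest-centroid classifier stands in for whatever statistical method the patent actually uses. The function names are hypothetical.

```python
import math


def centroid(vectors):
    # Component-wise mean of equal-length feature vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest_centroid(train_x, train_y, x):
    # Stand-in classifier: predict the label whose centroid is closest to x.
    best_label, best_dist = None, math.inf
    for label in sorted(set(train_y)):
        members = [v for v, y in zip(train_x, train_y) if y == label]
        d = math.dist(centroid(members), x)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def leave_one_out_accuracy(xs, ys):
    # Hold out each document in turn, train on the rest, score the prediction.
    correct = 0
    for i in range(len(xs)):
        train_x = list(xs[:i]) + list(xs[i + 1:])
        train_y = list(ys[:i]) + list(ys[i + 1:])
        if nearest_centroid(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)
```

The features are left unscaled here, so a feature measured on a large scale would dominate the distance. In practice, the columns would likely be standardized before classification.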

56 Citations

5 Claims

1. A method to determine whether an unidentified author of a textual work corresponds to a known author, the method comprising the steps of:
- obtaining a known sample of text of the known author;
- selecting from the known sample a known grammatical unit;
- parsing and analyzing the known grammatical unit to produce known grammatical unit level data;
- selecting from the textual work an unknown grammatical unit;
- parsing and analyzing the unknown grammatical unit to produce unknown grammatical unit level data; and
- comparing the unknown grammatical unit level data to the known grammatical unit level data.
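Claim 1 does not specify which grammatical unit or which unit-level measurements are used. The following is a minimal sketch under stated assumptions. The grammatical unit is taken to be the sentence. Word count, mean word length, and comma count stand in for the unit-level data. The comparison step is approximated as the difference in per-unit means between the unknown and known samples. The regex splitter and all function names are hypothetical simplifications, not the patented parser.

```python
import re
import statistics


def split_units(text):
    # Assumption: the grammatical unit is the sentence, split naively after . ! ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def unit_data(unit):
    # Hypothetical unit-level measurements for one sentence.
    words = re.findall(r"[A-Za-z']+", unit)
    return {
        "words": len(words),
        "mean_word_len": sum(map(len, words)) / len(words) if words else 0.0,
        "commas": unit.count(","),
    }


def compare(known_text, unknown_text):
    """Return, per feature, mean over unknown units minus mean over known units."""
    known = [unit_data(u) for u in split_units(known_text)]
    unknown = [unit_data(u) for u in split_units(unknown_text)]
    return {
        feature: statistics.mean([d[feature] for d in unknown])
        - statistics.mean([d[feature] for d in known])
        for feature in known[0]
    }
```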

2. A set of characteristics of a textual work comprising a syntactic feature and a graphemic feature.
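Claim 2 pairs a syntactic feature with a graphemic feature. One possible way to hold such a set of characteristics for a single work is a small dataclass. It can be flattened into a vector whose positions line up across works. The class name, field names, and feature keys below are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class TextCharacteristics:
    """Hypothetical container for one work's syntactic and graphemic features."""
    syntactic: dict[str, float] = field(default_factory=dict)
    graphemic: dict[str, float] = field(default_factory=dict)

    def to_vector(self, syntactic_keys, graphemic_keys):
        # A fixed key order keeps vectors from different works aligned;
        # a feature absent from this work counts as 0.0.
        return [self.syntactic.get(k, 0.0) for k in syntactic_keys] + [
            self.graphemic.get(k, 0.0) for k in graphemic_keys
        ]
```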

3. A computer-aided method to determine whether an unidentified author of a textual work belongs to a group comprising the textual work of a known author, the method comprising the steps of:
- obtaining a sample of the textual work of the unidentified author;
- obtaining a sample of the textual work of the known author;
- entering the samples into a computer system, the computer system including a memory, a means for analyzing documents, and a means for determining belonging, stored within the memory;
- utilizing the means for analyzing documents, splitting the entered samples into individual sentences, the sentences each including a head, a plurality of words and punctuation, the punctuation defining a syntactic edge within the individual sentence, and the punctuation defining a discursive function emphatic within the individual sentence;
- categorizing the punctuation by determining the syntactic edge;
- indicating the discursive function emphatic, a graphemic feature being generated by the steps of categorizing and indicating;
- dividing each of the individual sentences into the words;
- labeling each of the words as a part of speech;
- listing the labeled words into phrases for each labeled word;
- identifying phrases for each said head;
- classifying the identified phrases as marked or unmarked;
- characterizing the identified phrases by markedness, thereby producing a plurality of syntactic features; and
- utilizing the means for determining belonging, inputting at least one of the syntactic features and at least one of the graphemic features for each said sample to determine whether the unidentified author of the textual work sample belongs to the known author group.
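The claims categorize punctuation by the syntactic edge it marks and separately flag an emphatic discursive function, but they do not list the categories. The sketch below makes three assumptions. First, it uses three edge categories: sentence, clause, and phrase. Second, it treats a comma as a clause edge when the next word appears in a small hypothetical list of conjunctions and relative words. Third, it treats `!` and `--` as emphatic. These rules are illustrative heuristics only, not the patent's categories.

```python
import re
from collections import Counter

# Hypothetical list of words taken to open a new clause after a comma.
CLAUSE_OPENERS = {"and", "but", "or", "so", "yet", "because", "although",
                  "which", "who", "when", "while"}


def split_sentences(text):
    # Naive splitter: break after . ! ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def graphemic_features(text):
    counts = Counter()
    for sentence in split_sentences(text):
        for m in re.finditer(r"[,;:!?.]|--", sentence):
            mark = m.group()
            # Categorize the mark by the syntactic edge it appears to sit on.
            if mark in ".!?" and m.end() == len(sentence):
                counts["sentence_edge"] += 1
            elif mark == ";":
                counts["clause_edge"] += 1
            elif mark == ",":
                following = sentence[m.end():].split()
                if following and following[0].lower() in CLAUSE_OPENERS:
                    counts["clause_edge"] += 1
                else:
                    counts["phrase_edge"] += 1
            elif mark == ":":
                counts["phrase_edge"] += 1
            # Separately flag marks assumed to carry an emphatic function.
            if mark in ("!", "--"):
                counts["emphatic"] += 1
    return counts
```

A sentence-final `!` is both categorized (sentence edge) and flagged as emphatic. This mirrors the claims' separate "categorizing" and "indicating" steps.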

4. A system for determining whether an unidentified author of a textual work belongs to a group comprising the textual work of a known author, the system comprising:
- a computer system including a memory, an input means, a means for analyzing documents, and a means for determining belonging, stored within the memory;
- a sample of the textual work of the unidentified author;
- a sample of the textual work of the known author, the samples being input into the computer system;
- the means for analyzing documents splitting the entered samples into individual sentences, the sentences each including a head, a plurality of words and punctuation, the punctuation defining a syntactic edge within the individual sentence, and the punctuation defining a discursive function emphatic within the individual sentence;
- the means for analyzing documents categorizing the punctuation by determining the syntactic edge and indicating the discursive function emphatic, thereby generating a graphemic feature;
- the means for analyzing documents dividing each of the individual sentences into the words;
- labeling each of the words as a part of speech;
- listing the labeled words into phrases for each labeled word, identifying phrases for each said head, classifying the identified phrases as marked or unmarked, and characterizing the identified phrases by markedness, thereby producing a plurality of syntactic features; and
- inputting at least one of the syntactic features and at least one of the graphemic features into the means for determining belonging, thereby determining whether the unidentified author of the textual work sample belongs to the known author group.
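The syntactic steps require a part-of-speech tagger and a definition of markedness, neither of which the claims spell out. This sketch assumes the words are already tagged with Universal-style tags (`DET`, `ADJ`, `NOUN`, `ADV`, `VERB`, `ADP`). It then applies two toy rules, both hypothetical, as is the name `markedness_features`:

- A noun-phrase chunk is marked when a preposition (a postmodifying prepositional phrase) immediately follows it.
- A verb head is marked when an adverb directly precedes it.

```python
from collections import Counter

NP_TAGS = {"DET", "ADJ", "NOUN"}


def markedness_features(tagged):
    """Count marked/unmarked phrases in a list of (word, tag) pairs.

    Assumes POS tags are already assigned; the markedness rules are
    hypothetical toy rules, not the patent's definitions.
    """
    counts = Counter()
    i = 0
    while i < len(tagged):
        tag = tagged[i][1]
        if tag in NP_TAGS:
            # Grow a noun-phrase chunk over a run of DET/ADJ/NOUN tokens.
            j = i
            while j < len(tagged) and tagged[j][1] in NP_TAGS:
                j += 1
            if any(t == "NOUN" for _, t in tagged[i:j]):
                # Marked if a preposition (postmodifying PP) follows the chunk.
                followed_by_pp = j < len(tagged) and tagged[j][1] == "ADP"
                counts["NP_marked" if followed_by_pp else "NP_unmarked"] += 1
            i = j
            continue
        if tag == "VERB":
            # Marked if an adverb directly precedes the verb head.
            preceded_by_adv = i > 0 and tagged[i - 1][1] == "ADV"
            counts["VP_marked" if preceded_by_adv else "VP_unmarked"] += 1
        i += 1
    return counts
```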

5. A method to determine whether an unidentified author of a textual work belongs to a group comprising the textual work of a known author, the method comprising the steps of:
- obtaining a sample of the textual work of the unidentified author;
- obtaining a sample of the textual work of the known author;
- analyzing the samples by splitting the samples into individual sentences, the sentences each including a head, a plurality of words and punctuation, the punctuation defining a syntactic edge within the individual sentence, and the punctuation defining a discursive function emphatic within the individual sentence;
- categorizing the punctuation by determining the syntactic edge;
- indicating the discursive function emphatic, a graphemic feature being generated by the steps of categorizing and indicating;
- dividing each of the individual sentences into the words;
- labeling each of the words as a part of speech;
- listing the labeled words into phrases for each labeled word;
- identifying phrases for each said head;
- classifying the identified phrases as marked or unmarked;
- characterizing the identified phrases by markedness, thereby producing a plurality of syntactic features; and
- utilizing a means for determining belonging, inputting at least one of the syntactic features and at least one of the graphemic features to determine whether the unidentified author of the textual work sample belongs to the known author group.
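The claims leave the "means for determining belonging" unspecified. The sketch below uses a hypothetical one-class rule. It assumes at least two known-author feature vectors, for example per-document counts produced by the graphemic and syntactic steps of the claims. Each feature is standardized by the known author's own mean and standard deviation. The unknown sample is then accepted when it lies no farther from the known centroid than the farthest known sample, scaled by an assumed `slack` factor.

```python
import math
import statistics


def belongs_to_group(known_vectors, unknown_vector, slack=1.0):
    """Hypothetical one-class membership rule (needs >= 2 known vectors)."""
    dims = range(len(unknown_vector))
    columns = [[v[d] for v in known_vectors] for d in dims]
    means = [statistics.mean(col) for col in columns]
    # A feature that never varies in the known sample gets unit spread.
    sds = [statistics.stdev(col) or 1.0 for col in columns]

    def z_distance(v):
        # Euclidean distance from the known centroid in standardized units.
        return math.hypot(*[(v[d] - means[d]) / sds[d] for d in dims])

    radius = max(z_distance(v) for v in known_vectors)
    return z_distance(unknown_vector) <= slack * radius
```

Raising `slack` above 1.0 makes the rule more permissive. With very few known documents, the radius is a noisy estimate, which is one reason to pair any such rule with cross-validation.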

Current Assignee
Carole Chaski
Original Assignee
Carole Chaski
Inventors
Chaski, Carole

Granted Patent

US 9,880,995 B2
Field of Search
US Class Current

704/9
CPC Class Codes

G06F 16/353   into predefined classes

G06F 40/205   Parsing

G06F 40/211   Syntactic parsing, e.g. bas...

G06F 40/237   Lexical tools

G06F 40/279   Recognition of textual enti...

G06F 40/30   Semantic analysis