TEXT ANALYSIS
1 Assignment
0 Petitions
Abstract
A method of processing text having an associated source type to generate data indicative of a property associated with said text, said text comprising a plurality of tokens. The method comprises generating a plurality of metrics of said text based upon said plurality of tokens, the plurality of metrics comprising token count data for said plurality of tokens, part of speech data for said plurality of tokens, semantic field data for said plurality of tokens and at least one metric indicative of a property of the text; selecting reference data from a plurality of reference data based upon the source type associated with the text; processing each of said plurality of metrics of said text based upon the reference data to generate data indicating a relationship between said plurality of metrics and said reference data; and combining the data indicating a relationship between the respective ones of the plurality of metrics and said reference data to generate the data indicative of a property associated with said text. The method may be applied to author profiling.
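The four steps in the abstract (metric generation, reference selection by source type, per-metric comparison, combination) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: tokens are assumed to arrive already tagged with a part of speech and a semantic field, reference data is assumed to be a per-metric mean and standard deviation, the per-metric relationship is an absolute z-score, and the combination is a plain average. All function names, tags and reference numbers are hypothetical.

```python
from statistics import mean

# Hypothetical pre-tagged token: (word, part-of-speech tag, semantic field tag).
# A real system would obtain these tags from a POS tagger and a semantic tagger.
Token = tuple[str, str, str]
# Hypothetical reference data: metric name -> (expected mean, standard deviation).
Reference = dict[str, tuple[float, float]]


def generate_metrics(tokens: list[Token]) -> dict[str, float]:
    """Token count, relative POS and semantic-field frequencies, plus one text property."""
    if not tokens:
        raise ValueError("text must contain at least one token")
    n = len(tokens)
    metrics: dict[str, float] = {"token_count": float(n)}
    for _, pos, field in tokens:
        metrics[f"pos:{pos}"] = metrics.get(f"pos:{pos}", 0.0) + 1.0 / n
        metrics[f"sem:{field}"] = metrics.get(f"sem:{field}", 0.0) + 1.0 / n
    # Illustrative "property of the text" metric: mean word length in characters.
    metrics["mean_word_length"] = float(mean(len(word) for word, _, _ in tokens))
    return metrics


def relate(metrics: dict[str, float], reference: Reference) -> dict[str, float]:
    """Per-metric relationship to the reference, here an absolute z-score."""
    return {
        name: abs(metrics.get(name, 0.0) - mu) / sigma
        for name, (mu, sigma) in reference.items()
    }


def combine(relations: dict[str, float]) -> float:
    """Combine the per-metric relationships into one value (here: their mean)."""
    return mean(relations.values())


def analyse(tokens: list[Token], source_type: str,
            reference_sets: dict[str, Reference]) -> float:
    metrics = generate_metrics(tokens)
    reference = reference_sets[source_type]  # select reference data by source type
    return combine(relate(metrics, reference))


# Invented example data, for illustration only.
tokens = [("I", "PRON", "people"), ("love", "VERB", "emotion"), ("cats", "NOUN", "animals")]
reference_sets = {
    "email": {"token_count": (3.0, 1.0), "mean_word_length": (4.0, 0.5)},
    "blog": {"token_count": (400.0, 150.0), "mean_word_length": (4.6, 0.4)},
}
score = analyse(tokens, "email", reference_sets)
```

In this sketch a lower score means the text sits closer to the reference profile for its source type; against the invented "email" profile the example text scores 1.0 (a z-score of 0 for token count and 2 for mean word length).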
24 Citations
47 Claims
1-27. (canceled)
28. A method of processing text having an associated source type to generate data indicative of a property associated with said text, said text comprising a plurality of tokens, the method comprising:
generating a plurality of metrics of said text based upon said plurality of tokens, the plurality of metrics comprising token count data for said plurality of tokens, part of speech data for said plurality of tokens, semantic field data for said plurality of tokens and at least one metric indicative of a property of the text;
selecting reference data from a plurality of reference data based upon the source type associated with the text;
processing each of said plurality of metrics of said text based upon the reference data to generate data indicating a relationship between respective ones of said plurality of metrics and said reference data; and
combining the data indicating a relationship between the respective ones of the plurality of metrics and said reference data to generate the data indicative of a property associated with said text;
wherein first data indicative of a first property of the author of said text is generated based upon a first selected reference data;
wherein second data indicative of a second property of the author of said text is selected based upon a second reference data; and
wherein one of said first and second properties is selected based upon said first and second data.
Dependent claims: 29-45.
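The three final "wherein" clauses of claim 28 can be read as scoring one text against two reference data sets, each tied to a candidate property of the author, and keeping the property whose reference fits better. The claim does not specify how that selection is made, so the sketch below is only one hedged interpretation: it assumes the metrics have already been computed, uses a mean absolute z-score against each reference as the "first data" and "second data" (lower means a closer match), and uses hypothetical property labels and invented numbers.

```python
# Hypothetical reference data: metric name -> (expected mean, standard deviation).
Reference = dict[str, tuple[float, float]]


def profile_distance(metrics: dict[str, float], reference: Reference) -> float:
    """Mean absolute z-score of the text's metrics against one reference profile."""
    scores = [abs(metrics.get(name, 0.0) - mu) / sigma
              for name, (mu, sigma) in reference.items()]
    return sum(scores) / len(scores)


def select_property(metrics: dict[str, float],
                    candidates: dict[str, Reference]) -> tuple[str, dict[str, float]]:
    """Score the text against one reference per candidate author property and
    return the best-matching (lowest-distance) property together with all scores."""
    distances = {label: profile_distance(metrics, ref) for label, ref in candidates.items()}
    best = min(distances, key=distances.get)
    return best, distances


# Invented metrics and reference profiles; the labels are hypothetical.
metrics = {"mean_word_length": 3.0, "pos:PRON": 0.3}
candidates = {
    "younger_author": {"mean_word_length": (3.2, 0.4), "pos:PRON": (0.25, 0.05)},
    "older_author": {"mean_word_length": (4.5, 0.5), "pos:PRON": (0.10, 0.05)},
}
label, distances = select_property(metrics, candidates)
```

With these invented numbers the text scores about 0.75 against the first profile and 3.5 against the second, so the first property is selected. Returning every distance, not just the winner, keeps the comparison inspectable.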
46. A non-transitory computer readable medium carrying a computer program comprising computer readable instructions configured to cause a computer to carry out a method of processing text having an associated source type to generate data indicative of a property associated with said text, said text comprising a plurality of tokens, the method comprising:
generating a plurality of metrics of said text based upon said plurality of tokens, the plurality of metrics comprising token count data for said plurality of tokens, part of speech data for said plurality of tokens, semantic field data for said plurality of tokens and at least one metric indicative of a property of the text;
selecting reference data from a plurality of reference data based upon the source type associated with the text;
processing each of said plurality of metrics of said text based upon the reference data to generate data indicating a relationship between respective ones of said plurality of metrics and said reference data; and
combining the data indicating a relationship between the respective ones of the plurality of metrics and said reference data to generate the data indicative of a property associated with said text;
wherein first data indicative of a first property of the author of said text is generated based upon a first selected reference data;
wherein second data indicative of a second property of the author of said text is selected based upon a second reference data; and
wherein one of said first and second properties is selected based upon said first and second data.
47. A computer apparatus for processing text having an associated source type to generate data indicative of a property associated with said text, said text comprising a plurality of tokens, the apparatus comprising:
a memory storing processor readable instructions; and
a processor arranged to read and execute instructions stored in said memory;
wherein said processor readable instructions comprise instructions arranged to control the computer to carry out a method of processing text having an associated source type to generate data indicative of a property associated with said text, said text comprising a plurality of tokens, the method comprising:
generating a plurality of metrics of said text based upon said plurality of tokens, the plurality of metrics comprising token count data for said plurality of tokens, part of speech data for said plurality of tokens, semantic field data for said plurality of tokens and at least one metric indicative of a property of the text;
selecting reference data from a plurality of reference data based upon the source type associated with the text;
processing each of said plurality of metrics of said text based upon the reference data to generate data indicating a relationship between respective ones of said plurality of metrics and said reference data; and
combining the data indicating a relationship between the respective ones of the plurality of metrics and said reference data to generate the data indicative of a property associated with said text;
wherein first data indicative of a first property of the author of said text is generated based upon a first selected reference data;
wherein second data indicative of a second property of the author of said text is selected based upon a second reference data; and
wherein one of said first and second properties is selected based upon said first and second data.
Specification