Labeling of work of art titles in text for natural language processing
First Claim
1. A computer comprising:
- a parser for parsing text comprising:
  a tokenizing module which divides the text into an ordered sequence of linguistic tokens,
  a morphological module for associating parts of speech with the linguistic tokens, the morphological module identifying verbs which are used to introduce direct speech,
  a detection module which applies rules for identifying expressions as candidate titles of works, each of the expressions comprising at least one of the linguistic tokens,
  a filtering module for filtering the candidate titles of works, the filtering module applying at least one rule which is formulated to exclude citations of direct speech from the candidate titles of works, the filtering module determining whether an expression identified by the detection module constitutes direct speech introduced by a verb identified by the morphological module as being one of the verbs which are used to introduce direct speech, and
  a comparison module for comparing remaining candidate titles of works with titles of works in an associated knowledge base and annotating the text to identify the candidate title as a nominative unit when a match with a title of a work is found in the associated knowledge base; and
- a digital processor for implementing the parser.
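The first three modules recited in claim 1 can be pictured with a minimal Python sketch. It is not the patented implementation: the speech-verb lexicon (`SPEECH_VERBS`), the toy tagger, the single detection rule (any span between straight double quotes is a candidate title) and all function names are hypothetical assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon of verbs that introduce direct speech; a real
# morphological module would rely on a full lexicon and disambiguation.
SPEECH_VERBS = {"say", "says", "said", "ask", "asked", "reply", "replied", "declared", "told"}

# A token is a run of word characters or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


@dataclass
class Token:
    text: str
    start: int                 # character offset in the input text
    pos: str = "UNK"           # coarse tag assigned by tag()
    speech_verb: bool = False  # set when the token introduces direct speech


def tokenize(text: str) -> list[Token]:
    """Tokenizing module: divide text into an ordered sequence of tokens."""
    return [Token(m.group(), m.start()) for m in TOKEN_RE.finditer(text)]


def tag(tokens: list[Token]) -> list[Token]:
    """Toy morphological module: coarse tags plus a speech-verb flag."""
    for tok in tokens:
        if tok.text.lower() in SPEECH_VERBS:
            tok.pos, tok.speech_verb = "VERB", True
        elif not tok.text[0].isalnum():
            tok.pos = "PUNCT"
        elif tok.text[0].isupper():
            tok.pos = "CAP"    # capitalized word, a possible title start
        else:
            tok.pos = "WORD"
    return tokens


def detect_candidates(tokens: list[Token]) -> list[tuple[int, int]]:
    """Detection module with one hypothetical rule: text between a pair of
    straight double quotes is a candidate title. Returns the indices of
    the opening and closing quote tokens."""
    spans, open_idx = [], None
    for i, tok in enumerate(tokens):
        if tok.text == '"':
            if open_idx is None:
                open_idx = i
            else:
                spans.append((open_idx, i))
                open_idx = None
    return spans


if __name__ == "__main__":
    toks = tag(tokenize('She said "I will come" before reading "War and Peace".'))
    for a, b in detect_candidates(toks):
        prev = toks[a - 1].text if a else ""
        print(" ".join(t.text for t in toks[a + 1:b]), "| preceded by:", prev)
```

Run as a script, it prints both quoted spans together with the word before each; a filtering module could then use the `speech_verb` flag set on `said` to discard the first span.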
Abstract
A parser for parsing text includes a tokenizing module which divides the text into an ordered sequence of linguistic tokens. A morphological module associates parts of speech with the linguistic tokens. A detection module identifies candidate titles of creative works, such as works of art. A filtering module filters the candidate titles of works to exclude citations of direct speech from the candidate titles of works. A comparison module compares any remaining candidate titles of works with titles of works in an associated knowledge base. The comparison module annotates the text when a match is found.
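The filtering and comparison steps summarized in the abstract could look roughly like the following sketch. It assumes, hypothetically, that candidates are quoted spans, that a quote counts as direct speech when a speech verb occurs among the last few words before it, and that the knowledge base is a small in-memory set of lower-cased titles; the `<title type="nominative_unit">` markup is likewise an invented annotation format, not one taken from the patent.

```python
import re

# Hypothetical resources: a toy speech-verb lexicon and a toy knowledge
# base of lower-cased titles of works.
SPEECH_VERBS = {"said", "says", "asked", "replied", "declared", "shouted"}
KNOWLEDGE_BASE = {"war and peace", "the night watch", "guernica"}

QUOTED = re.compile(r'"([^"]+)"')


def candidates(text):
    """Stand-in for the detection module: quoted spans as (start, end, title)."""
    return [(m.start(), m.end(), m.group(1)) for m in QUOTED.finditer(text)]


def is_direct_speech(text, start, window=3):
    """Filtering rule: treat the quote as direct speech when a speech verb
    occurs among the last `window` words before it, without crossing
    a . ; ! or ? character."""
    segment = re.split(r"[.;!?]", text[:start])[-1]
    words = re.findall(r"\w+", segment.lower())[-window:]
    return any(w in SPEECH_VERBS for w in words)


def annotate(text):
    """Filter candidates, look up the survivors in the knowledge base and
    wrap each match in a (hypothetical) nominative-unit tag."""
    out, last = [], 0
    for start, end, title in candidates(text):
        if is_direct_speech(text, start):
            continue  # excluded as a citation of direct speech
        if title.lower() not in KNOWLEDGE_BASE:
            continue  # no match in the knowledge base: left unannotated
        out.append(text[last:start])
        out.append(f'<title type="nominative_unit">{text[start:end]}</title>')
        last = end
    out.append(text[last:])
    return "".join(out)


if __name__ == "__main__":
    print(annotate('He said "I am tired." Later she read "War and Peace".'))
```

For the sample sentence, the quotation introduced by `said` is discarded and only `"War and Peace"` is wrapped in the tag. Running the filter before the lookup presumably keeps ordinary dialogue from being matched against titles that happen to share its wording.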
18 Claims
1. A computer comprising:
a parser for parsing text comprising: a tokenizing module which divides the text into an ordered sequence of linguistic tokens, a morphological module for associating parts of speech with the linguistic tokens, the morphological module identifying verbs which are used to introduce direct speech, a detection module which applies rules for identifying expressions as candidate titles of works, each of the expressions comprising at least one of the linguistic tokens, a filtering module for filtering the candidate titles of works, the filtering module applying at least one rule which is formulated to exclude citations of direct speech from the candidate titles of works, the filtering module determining whether an expression identified by the detection module constitutes direct speech introduced by a verb identified by the morphological module as being one of the verbs which are used to introduce direct speech, and a comparison module for comparing remaining candidate titles of works with titles of works in an associated knowledge base and annotating the text to identify the candidate title as a nominative unit when a match with a title of a work is found in the associated knowledge base; and a digital processor for implementing the parser.
Dependent claims: 2, 3, 4, 5, 6.
7. A method for natural language processing of input text comprising:
processing the text to identify a group of candidate titles of works; identifying verbs which are used to introduce direct speech; filtering the group of candidate titles of works, including applying at least one rule which is formulated to remove candidate titles of works which are citations of direct speech from the group of candidate titles of works, the filtering comprising filtering out expressions linked to a verb identified as being one of the verbs which are used to introduce direct speech; comparing remaining candidate titles of works with a knowledge base which identifies titles of works; for a candidate title of a work which matches a title of a work in the knowledge base, annotating the text to identify the candidate title of a work as a nominative unit; and wherein the filtering, processing, comparing, and annotating are all implemented by a digital processor.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15.
16. A tangible storage medium comprising instructions which when executed by a digital processor implement natural language processing of a text input comprising:
processing the text to identify candidate titles of works; identifying verbs which are used to introduce direct speech; filtering the candidate titles of works to remove candidate titles of works that are determined to be citations of direct speech from the candidate titles of works, the filtering comprising filtering out expressions linked to a verb identified as being one of the verbs which are used to introduce direct speech; comparing candidate titles of works with a knowledge base which identifies titles of works; and annotating text which includes a candidate title of a work for which a match is found in the knowledge base.
Dependent claims: 17.
18. A method for natural language processing of input text comprising:
processing the text to identify candidate titles of works; filtering the candidate titles of works, including applying at least one rule which is formulated to remove citations of direct speech from the candidate titles of works including identifying verbs which are used to introduce direct speech and filtering candidate titles as constituting direct speech based at least in part on whether the candidate title is introduced by an identified verb which is used to introduce direct speech; comparing remaining candidate titles of works with a knowledge base which identifies titles of works; for a candidate title of a work which matches a title of a work in the knowledge base, annotating the text to identify the candidate title of a work as being a nominative unit; and wherein the filtering, processing, comparing, and annotating are all implemented by a digital processor.
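Claim 18 only requires the direct-speech decision to rest "at least in part" on an introducing verb, which leaves room for other cues. The sketch below is one hypothetical way to combine cues: the first mirrors the claim (a speech verb among the two words just before the opening quote), while the second, a speech verb in an incise after the quote such as `"I am tired," he said`, is an added assumption that the claim does not recite. The lexicon and function names are illustrative only.

```python
import re

# Hypothetical speech-verb lexicon standing in for the morphological module.
SPEECH_VERBS = {"said", "says", "asked", "replied", "answered", "declared"}
QUOTED = re.compile(r'"([^"]+)"')
WORD = re.compile(r"\w+")


def speech_cues(text: str, m: re.Match) -> set[str]:
    """Collect cues suggesting that the quoted span `m` is direct speech.
    For brevity the look-back and look-ahead ignore sentence boundaries."""
    cues = set()
    # Cue from the claim: a speech verb introduces the quote, i.e. it is one
    # of the two words immediately before the opening quotation mark.
    before = WORD.findall(text[:m.start()].lower())[-2:]
    if any(w in SPEECH_VERBS for w in before):
        cues.add("introducing_verb")
    # Assumed extra cue (not recited in the claim): an incise after the quote,
    # as in '"I am tired," he said', where the quoted text ends in , ! or ?
    # and a speech verb is among the next two words.
    after = WORD.findall(text[m.end():].lower())[:2]
    if m.group(1)[-1] in ",!?" and any(w in SPEECH_VERBS for w in after):
        cues.add("incise_verb")
    return cues


def filter_candidates(text: str) -> list[str]:
    """Keep only the quoted spans for which no direct-speech cue fired."""
    return [m.group(1) for m in QUOTED.finditer(text) if not speech_cues(text, m)]


if __name__ == "__main__":
    sample = '"I am tired," he said. She asked "Why?" before we saw "Guernica" in Madrid.'
    print(filter_candidates(sample))
```

With these rules the sample keeps only `Guernica`: the first quotation is caught by the incise cue and the second by the introducing verb `asked`. Because the decision is a disjunction over cues, every extra cue removes more candidates and so risks discarding a genuine title that happens to sit next to a speech verb.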
Specification