Labeling of work of art titles in text for natural language processing
Abstract
A parser for parsing text includes a tokenizing module which divides the text into an ordered sequence of linguistic tokens. A morphological module associates parts of speech with the linguistic tokens. A detection module identifies candidate titles of creative works, such as works of art. A filtering module filters the candidate titles of works to exclude citations of direct speech from the candidate titles of works. A comparison module compares any remaining candidate titles of works with titles of works in an associated knowledge base. The comparison module annotates the text when a match is found.
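The stages named in the abstract can be sketched as a small pipeline. This is only an illustrative sketch, not the patented implementation: the knowledge base contents, the list of reporting verbs, the coarse tag names, the choice of quoted spans as candidates, and the direct-speech rule are all assumptions made for the example, and only straight double quotes are handled.

```python
import re
from dataclasses import dataclass

# Hypothetical data for illustration only.
KNOWLEDGE_BASE = {"the starry night", "guernica"}
REPORTING_VERBS = {"said", "says", "asked", "replied", "shouted"}


@dataclass
class Token:
    text: str
    pos: str  # coarse tag: QUOTE, PUNCT, VERB_SAY, PROPER or WORD


def tokenize(text):
    """Divide the text into an ordered sequence of token strings."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag(words):
    """Associate a coarse part of speech with each token (toy heuristics)."""
    tokens = []
    for w in words:
        if w == '"':
            pos = "QUOTE"
        elif not w[0].isalnum():
            pos = "PUNCT"
        elif w.lower() in REPORTING_VERBS:
            pos = "VERB_SAY"
        elif w[0].isupper():
            pos = "PROPER"
        else:
            pos = "WORD"
        tokens.append(Token(w, pos))
    return tokens


def detect_candidates(tokens):
    """Return (start, end) token ranges for non-empty spans enclosed in quotes."""
    quotes = [i for i, t in enumerate(tokens) if t.pos == "QUOTE"]
    pairs = zip(quotes[0::2], quotes[1::2])  # opening quote, closing quote
    return [(a + 1, b) for a, b in pairs if b > a + 1]


def is_direct_speech(tokens, start, end):
    """Assumed rule: a reporting verb within two tokens of the quotes,
    or punctuation as the last token inside the quotes."""
    before = tokens[max(0, start - 3):start - 1]
    after = tokens[end + 1:end + 3]
    near_verb = any(t.pos == "VERB_SAY" for t in before + after)
    return near_verb or tokens[end - 1].pos == "PUNCT"


def parse(text):
    """Tokenize, tag, detect, filter, then compare with the knowledge base."""
    tokens = tag(tokenize(text))
    annotations = []
    for start, end in detect_candidates(tokens):
        if is_direct_speech(tokens, start, end):
            continue  # filtered out as a citation of direct speech
        title = " ".join(t.text for t in tokens[start:end])
        if title.lower() in KNOWLEDGE_BASE:
            annotations.append((title, "NOMINATIVE_UNIT"))
    return annotations
```

For example, `parse('She admired "Guernica" in Madrid. "I love it," he said.')` keeps the first quoted span, which matches the toy knowledge base, and discards the second, which the assumed rule treats as direct speech. The heuristic is deliberately crude: it would also discard a genuine title that happens to sit next to a reporting verb.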
21 Claims
1. A parser for parsing text comprising:
- a tokenizing module which divides the text into an ordered sequence of linguistic tokens;
- a morphological module for associating parts of speech with the linguistic tokens;
- a detection module which applies rules for identifying expressions as candidate titles of works, each of the expressions comprising at least one of the linguistic tokens;
- a filtering module for filtering the candidate titles of works, the filtering module applying at least one rule which is formulated to exclude citations of direct speech from the candidate titles of works; and
- a comparison module for comparing remaining candidate titles of works with titles of works in an associated knowledge base and annotating the text to identify the candidate title as a nominative unit when a match with a title of a work is found in the associated knowledge base.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method for natural language processing of input text comprising:
- processing the text to identify candidate titles of works;
- filtering the candidate titles of works, including applying at least one rule which is formulated to remove citations of direct speech from the candidate titles of works;
- comparing remaining candidate titles of works with a knowledge base which identifies titles of works; and
- for a candidate title of a work which matches a title of a work in the knowledge base, annotating the text to identify the candidate title of a work as a nominative unit.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
19. A storage medium comprising instructions which when executed by a digital processor implement natural language processing of a text input comprising:
- processing the text to identify candidate titles of works;
- filtering the candidate titles of works to remove citations of direct speech from the candidate titles of works;
- comparing candidate titles of works with a knowledge base which identifies titles of works; and
- annotating text which includes a candidate title of a work for which a match is found in the knowledge base.
Dependent claims: 20, 21
Specification