Arc filtering in a syntactic graph
Abstract
The present disclosure provides methods and systems for performing syntactic analysis of a text. In some implementations, the method includes performing rough syntactic analysis of the text, generating a graph of generalized constituents of the text, and filtering arcs of the graph of generalized constituents with a combination classifier that includes a tree classifier and one or more linear classifiers. The combination classifier is trained using parallel analysis of an untagged two-language text corpus.
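The graph of generalized constituents can be pictured as a small directed graph whose nodes are word groups and whose arcs carry slot labels. The Python sketch below is a hypothetical model, not the patented data structure: the class names, the slot labels "Subject" and "Object", and the example sentence are illustrative assumptions. It also assumes that rough analysis may keep several competing arcs between the same pair of nodes, which gives a later filtering step something to prune.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Constituent:
    """Hypothetical node: one or more words that function as a unit."""
    node_id: int
    words: tuple[str, ...]


@dataclass(frozen=True)
class Arc:
    """Hypothetical directed arc between two constituents, labeled with a syntactic slot."""
    parent: int
    child: int
    slot: str  # type of relationship between the two constituents


@dataclass
class GeneralizedConstituentGraph:
    nodes: dict[int, Constituent] = field(default_factory=dict)
    arcs: list[Arc] = field(default_factory=list)

    def add_node(self, node_id: int, words: list[str]) -> None:
        self.nodes[node_id] = Constituent(node_id, tuple(words))

    def add_arc(self, parent: int, child: int, slot: str) -> None:
        # Arcs may only connect constituents that already exist in the graph.
        if parent not in self.nodes or child not in self.nodes:
            raise KeyError(f"unknown endpoint in arc {parent}->{child}")
        self.arcs.append(Arc(parent, child, slot))

    def arcs_from(self, parent: int) -> list[Arc]:
        return [arc for arc in self.arcs if arc.parent == parent]


# Illustrative sentence: "The old man saw a dog".
g = GeneralizedConstituentGraph()
g.add_node(0, ["saw"])
g.add_node(1, ["The", "old", "man"])
g.add_node(2, ["a", "dog"])
g.add_arc(0, 1, "Subject")
g.add_arc(0, 2, "Object")
g.add_arc(0, 2, "Subject")  # competing hypothesis assumed to survive rough analysis
```

Because the competing arcs are stored side by side in a flat list, a filter can score and discard each one independently before precise analysis selects a single structure.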
21 Claims
1. A method comprising:
identifying a sentence;
identifying a graph of generalized constituents of the sentence based on rough syntactic analysis of a lexical-morphological structure of the sentence, wherein the graph of generalized constituents comprises arcs and nodes, wherein each of the nodes represents a constituent of the sentence comprising one or more words in the sentence that function as a unit within the sentence, and wherein each of the arcs between a pair of the nodes represents a syntactic slot expressing a type of relationship between lexical values of the pair;
filtering, by a data processing apparatus, the arcs of the graph of generalized constituents using a combination classifier comprising a tree classifier and at least one linear classifier, wherein the tree classifier divides the arcs into clusters based on a predetermined set of symbolic features, and wherein the linear classifier filters the clusters of the arcs based on combinations of numerical features for each of the clusters; and
identifying, by the data processing apparatus, a syntactic structure of the sentence by performing precise syntactic analysis of the sentence based on the graph of generalized constituents of the sentence with the filtered clusters of the arcs.
Dependent claims: 2, 3, 4, 5, 6, 7.
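One way to read the filtering step of claim 1 is as a two-stage pipeline: a tree over symbolic features routes each arc to a cluster (a leaf), and a per-cluster linear classifier scores a weighted sum of that arc's numerical features. The Python sketch below is illustrative only. It assumes a fixed branching order over three hypothetical symbolic features (`slot`, `parent_pos`, `child_pos`), hand-set weights in place of trained ones, and a policy of keeping arcs from clusters that have no linear classifier; none of these choices comes from the patent. Grouping by the tuple of feature values is equivalent to a complete tree that branches on one feature per level; a learned tree could be shallower or uneven.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical predetermined symbolic features, in the order the tree branches on them.
SYMBOLIC_KEYS = ("slot", "parent_pos", "child_pos")


@dataclass
class ArcFeatures:
    arc_id: int
    symbolic: dict[str, str]  # categorical features used by the tree stage
    numerical: list[float]    # real-valued features used by the linear stage


def tree_classify(arcs: list[ArcFeatures], keys=SYMBOLIC_KEYS) -> dict[tuple, list[ArcFeatures]]:
    """Route each arc to a leaf; the leaf is the path of symbolic values, one per tree level."""
    clusters = defaultdict(list)
    for arc in arcs:
        path = tuple(arc.symbolic.get(key, "<missing>") for key in keys)
        clusters[path].append(arc)
    return dict(clusters)


@dataclass
class LinearFilter:
    weights: list[float]
    bias: float = 0.0

    def keep(self, x: list[float]) -> bool:
        if len(x) != len(self.weights):
            raise ValueError("feature vector length does not match weights")
        score = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return score > 0.0


def filter_arcs(arcs, linear_filters, keep_unscored=True) -> list[int]:
    """Return the ids of arcs that survive the tree stage plus the per-cluster linear stage."""
    kept = []
    for path, cluster in tree_classify(arcs).items():
        clf = linear_filters.get(path)
        for arc in cluster:
            if clf is None:
                if keep_unscored:  # assumed policy for clusters without a linear classifier
                    kept.append(arc.arc_id)
            elif clf.keep(arc.numerical):
                kept.append(arc.arc_id)
    return sorted(kept)


# Hypothetical numerical features: [word distance, lexical pair score].
arcs = [
    ArcFeatures(0, {"slot": "Subject", "parent_pos": "Verb", "child_pos": "Noun"}, [1.0, 0.9]),
    ArcFeatures(1, {"slot": "Object", "parent_pos": "Verb", "child_pos": "Noun"}, [1.0, 0.8]),
    ArcFeatures(2, {"slot": "Subject", "parent_pos": "Verb", "child_pos": "Noun"}, [4.0, 0.1]),
]
filters = {("Subject", "Verb", "Noun"): LinearFilter(weights=[-1.0, 5.0])}
```

With these hand-set weights, `filter_arcs(arcs, filters)` drops the distant, low-scoring Subject arc (id 2) and keeps ids 0 and 1; the Object arc passes only because its cluster has no linear stage under the assumed default.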
8. Non-transitory computer storage media having instructions stored therein that, when executed by a data processing apparatus, cause the data processing apparatus to:
identify a sentence;
identify a graph of generalized constituents of the sentence based on rough syntactic analysis of a lexical-morphological structure of the sentence, wherein the graph of generalized constituents comprises arcs and nodes, wherein each of the nodes represents a constituent of the sentence comprising one or more words in the sentence that function as a unit within the sentence, and wherein each of the arcs between a pair of the nodes represents a syntactic slot expressing a type of relationship between lexical values of the pair;
filter, by the data processing apparatus, the arcs of the graph of generalized constituents using a combination classifier comprising a tree classifier and at least one linear classifier, wherein the tree classifier divides the arcs into clusters based on a predetermined set of symbolic features, and wherein the linear classifier filters the clusters of the arcs based on combinations of numerical features for each of the clusters; and
identify, by the data processing apparatus, a syntactic structure of the sentence by performing precise syntactic analysis of the sentence based on the graph of generalized constituents of the sentence with the filtered clusters of the arcs.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A system comprising:
a data processing apparatus; and
a computer-readable medium having instructions stored therein that, when executed by the data processing apparatus, cause the data processing apparatus to:
identify a sentence;
identify a graph of generalized constituents of the sentence based on rough syntactic analysis of a lexical-morphological structure of the sentence, wherein the graph of generalized constituents comprises arcs and nodes, wherein each of the nodes represents a constituent of the sentence comprising one or more words in the sentence that function as a unit within the sentence, and wherein each of the arcs between a pair of the nodes represents a syntactic slot expressing a type of relationship between lexical values of the pair;
filter the arcs of the graph of generalized constituents using a combination classifier comprising a tree classifier and at least one linear classifier, wherein the tree classifier divides the arcs into clusters based on a predetermined set of symbolic features, and wherein the linear classifier filters the clusters of the arcs based on combinations of numerical features for each of the clusters; and
identify a syntactic structure of the sentence by performing precise syntactic analysis of the sentence based on the graph of generalized constituents of the sentence with the filtered clusters of the arcs.
Dependent claims: 16, 17, 18, 19, 20, 21.
Specification