Parsing rule generalization by n-gram span clustering
First Claim
1. A computer-implemented method performed by a data processing apparatus, comprising:
- accessing command sentences stored in a data store, wherein each command sentence is a collection of n-grams and each command sentence includes at least one n-gram that is a non-terminal n-gram that maps to a non-terminal type, and wherein the command sentences include non-terminal n-grams that collectively map to a plurality of different non-terminal types;
- for each of the non-terminal types:
  - identifying n-gram spans, each n-gram span being a proper subset of a set of n-grams that constitute a command sentence and including a non-terminal n-gram of the non-terminal type and one or more terminal n-grams that do not map to a non-terminal type;
  - determining clusters of the n-gram spans, each cluster including n-gram spans meeting a measure of similarity of n-gram spans that belong to the cluster; and
  - for each cluster of n-gram spans, determining, from the n-gram spans belonging to the cluster, a new non-terminal type to which the terminal n-grams of the n-gram spans map.
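The span-identification step of the claim can be illustrated with a minimal Python sketch. It rests on several assumptions that the claim does not state: a command sentence is modeled as a list of `(text, type)` pairs, a span is taken to be a contiguous window, and the names `Token`, `identify_spans`, and `max_len` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical token model: (n-gram text, non-terminal type), with None
# marking a terminal n-gram that maps to no non-terminal type.
Token = Tuple[str, Optional[str]]


def identify_spans(sentence: List[Token], nt_type: str,
                   max_len: int = 3) -> List[Tuple[str, ...]]:
    """Collect windows that contain a non-terminal of nt_type and at least
    one terminal, and that are proper subsets of the sentence.

    Assumption: spans are contiguous windows of 2..max_len n-grams.
    """
    spans = []
    n = len(sentence)
    for start in range(n):
        for end in range(start + 2, min(start + max_len, n) + 1):
            if end - start == n:
                continue  # proper subset: never the whole sentence
            window = sentence[start:end]
            has_nt = any(t == nt_type for _, t in window)
            has_terminal = any(t is None for _, t in window)
            if has_nt and has_terminal:
                # Render non-terminals by type so spans from different
                # sentences are comparable.
                spans.append(tuple(f"<{t}>" if t else text
                                   for text, t in window))
    return spans


sentence = [("text", None), ("to", None), ("Bob", "Recipient"),
            ("that", None), ("I'm late", "Message")]
spans = identify_spans(sentence, "Recipient", max_len=2)
```

With `max_len=2`, only the two windows that touch the `Recipient` non-terminal qualify; the window containing only `Message` is skipped because it has no non-terminal of the requested type.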
Abstract
A method includes accessing command sentences stored in a data store, wherein each command sentence is a collection of n-grams and each command sentence includes at least one n-gram that is a non-terminal n-gram that maps to a non-terminal type, and wherein the command sentences include non-terminal n-grams that collectively map to a plurality of different non-terminal types; for each of the non-terminal types: identifying n-gram spans; determining clusters of the n-gram spans, each cluster including n-gram spans meeting a measure of similarity of n-gram spans that belong to the cluster; and for each cluster of n-gram spans, determining, from the n-gram spans belonging to the cluster, a new non-terminal type to which the terminal n-grams of the n-gram spans map.
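The clustering and type-derivation steps summarized above might look like the following sketch. The abstract requires only "a measure of similarity", so the positional-match score, the greedy threshold clustering, the `<...>` rendering of non-terminals, and the names `similarity`, `cluster_spans`, `new_non_terminal`, and `<Prefix_1>` are all assumptions made for illustration, not the patent's stated method.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[str, ...]


def similarity(a: Span, b: Span) -> float:
    """Assumed measure: fraction of positions holding the same n-gram.
    Spans of different lengths are treated as dissimilar."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_spans(spans: List[Span], threshold: float = 0.5) -> List[List[Span]]:
    """Greedy clustering: a span joins the first cluster whose members
    all meet the similarity threshold, else it starts a new cluster."""
    clusters: List[List[Span]] = []
    for span in spans:
        for cluster in clusters:
            if all(similarity(span, other) >= threshold for other in cluster):
                cluster.append(span)
                break
        else:
            clusters.append([span])
    return clusters


def new_non_terminal(cluster: List[Span], name: str) -> Dict[str, Set[str]]:
    """Map the cluster's terminal n-grams to a new non-terminal type.
    Non-terminals are assumed to be rendered as '<Type>'."""
    terminals = {tok for span in cluster for tok in span
                 if not tok.startswith("<")}
    return {name: terminals}


spans = [("to", "<Recipient>"), ("for", "<Recipient>"),
         ("<Recipient>", "that"), ("<Recipient>", "saying")]
clusters = cluster_spans(spans)
rule = new_non_terminal(clusters[0], "<Prefix_1>")
```

Here the spans that place a terminal before `<Recipient>` fall into one cluster and those that place it after fall into another, so `to` and `for` end up mapped to the same hypothetical new type.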
21 Claims
1. A computer-implemented method performed by a data processing apparatus, comprising:
- accessing command sentences stored in a data store, wherein each command sentence is a collection of n-grams and each command sentence includes at least one n-gram that is a non-terminal n-gram that maps to a non-terminal type, and wherein the command sentences include non-terminal n-grams that collectively map to a plurality of different non-terminal types;
- for each of the non-terminal types:
  - identifying n-gram spans, each n-gram span being a proper subset of a set of n-grams that constitute a command sentence and including a non-terminal n-gram of the non-terminal type and one or more terminal n-grams that do not map to a non-terminal type;
  - determining clusters of the n-gram spans, each cluster including n-gram spans meeting a measure of similarity of n-gram spans that belong to the cluster; and
  - for each cluster of n-gram spans, determining, from the n-gram spans belonging to the cluster, a new non-terminal type to which the terminal n-grams of the n-gram spans map.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A non-transitory computer readable storage medium storing instructions executable by a data processing apparatus and upon such execution cause the data processing apparatus to perform operations comprising:
- accessing command sentences stored in a data store, wherein each command sentence is a collection of n-grams and each command sentence includes at least one n-gram that is a non-terminal n-gram that maps to a non-terminal type, and wherein the command sentences include non-terminal n-grams that collectively map to a plurality of different non-terminal types;
- for each of the non-terminal types:
  - identifying n-gram spans, each n-gram span being a proper subset of a set of n-grams that constitute a command sentence and including a non-terminal n-gram of the non-terminal type and one or more terminal n-grams that do not map to a non-terminal type;
  - determining clusters of the n-gram spans, each cluster including n-gram spans meeting a measure of similarity of n-gram spans that belong to the cluster; and
  - for each cluster of n-gram spans, determining, from the n-gram spans belonging to the cluster, a new non-terminal type to which the terminal n-grams of the n-gram spans map.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
21. A system, comprising:
- a data processing apparatus; and
- a non-transitory computer readable storage medium in data communication with the data processing apparatus and storing instructions executable by the data processing apparatus and upon such execution cause the data processing apparatus to perform operations comprising:
  - accessing command sentences stored in a data store, wherein each command sentence is a collection of n-grams and each command sentence includes at least one n-gram that is a non-terminal n-gram that maps to a non-terminal type, and wherein the command sentences include non-terminal n-grams that collectively map to a plurality of different non-terminal types;
  - for each of the non-terminal types:
    - identifying n-gram spans, each n-gram span being a proper subset of a set of n-grams that constitute a command sentence and including a non-terminal n-gram of the non-terminal type and one or more terminal n-grams that do not map to a non-terminal type;
    - determining clusters of the n-gram spans, each cluster including n-gram spans meeting a measure of similarity of n-gram spans that belong to the cluster; and
    - for each cluster of n-gram spans, determining, from the n-gram spans belonging to the cluster, a new non-terminal type to which the terminal n-grams of the n-gram spans map.
Specification