Parsing rule generalization by N-gram span clustering
First Claim
1. A computer-implemented method, comprising:
accessing command sentences stored in a data store, wherein each command sentence is a set of n-grams that constitute the command sentence and each command sentence includes a plurality of n-grams, wherein the command sentences include n-grams that collectively map to a plurality of n-gram types;
for each of the n-gram types:
identifying n-gram spans, each n-gram span being a proper subset of a set of n-grams that constitute a command sentence and including first n-grams of the n-gram type and one or more second n-grams that do not map to any of the plurality of n-gram types;
determining clusters of the n-gram spans, each cluster including n-gram spans meeting a measure of similarity of n-gram spans that belong to the cluster; and
for each cluster of n-gram spans, determining, from the n-gram spans belonging to the cluster, a new type to which the n-grams of the n-gram spans map.
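The span-identification step of the claim can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the `TYPE_LEXICON` mapping, the restriction to unigrams, the treatment of a span as a contiguous window, and the `window` parameter that limits how many untyped neighbours are taken on each side.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon mapping known n-grams (unigrams only, for brevity)
# to n-gram types. The patent does not prescribe this representation.
TYPE_LEXICON: Dict[str, str] = {
    "pizza": "FOOD",
    "sushi": "FOOD",
    "boston": "CITY",
}


def identify_spans(
    sentence: List[str], ngram_type: str, window: int = 1
) -> List[Tuple[str, ...]]:
    """Return spans that contain an n-gram of `ngram_type` plus up to
    `window` adjacent untyped n-grams on each side."""
    spans: List[Tuple[str, ...]] = []
    for i, tok in enumerate(sentence):
        if TYPE_LEXICON.get(tok) != ngram_type:
            continue
        # Grow left over untyped neighbours, at most `window` steps.
        lo = i
        while lo > 0 and i - lo < window and sentence[lo - 1] not in TYPE_LEXICON:
            lo -= 1
        # Grow right over untyped neighbours, at most `window` steps.
        hi = i
        while (
            hi < len(sentence) - 1
            and hi - i < window
            and sentence[hi + 1] not in TYPE_LEXICON
        ):
            hi += 1
        span = tuple(sentence[lo : hi + 1])
        has_untyped = any(t not in TYPE_LEXICON for t in span)
        is_proper_subset = len(span) < len(sentence)
        # Keep only spans that satisfy both conditions stated in the claim.
        if has_untyped and is_proper_subset:
            spans.append(span)
    return spans


sentence = "order a pizza in boston".split()
print(identify_spans(sentence, "FOOD"))  # [('a', 'pizza', 'in')]
print(identify_spans(sentence, "CITY"))  # [('in', 'boston')]
```

The two filters at the end mirror the claim language: a span must include at least one n-gram that maps to none of the types, and it must be shorter than the full command sentence.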
Abstract
A method includes accessing command sentences stored in a data store, wherein each command sentence is a collection of n-grams and each command sentence includes at least one n-gram that is a non-terminal n-gram that maps to a non-terminal type, and wherein the command sentences include non-terminal n-grams that collectively map to a plurality of different non-terminal types; for each of the non-terminal types: identifying n-gram spans; determining clusters of the n-gram spans, each cluster including n-gram spans meeting a measure of similarity of n-gram spans that belong to the cluster; and for each cluster of n-gram spans, determining, from the n-gram spans belonging to the cluster, a new non-terminal type to which the terminal n-grams of the n-gram spans map.
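The clustering and type-derivation steps summarized in the abstract might look like the following sketch. The abstract does not name a similarity measure or a clustering algorithm, so the Jaccard overlap of terminal n-grams, the greedy single-pass clustering, the `threshold` value, and the `FOOD_1`-style type names are all assumptions chosen for illustration.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# A token is (n-gram text, non-terminal type or None for a terminal n-gram).
Token = Tuple[str, Optional[str]]
Span = Tuple[Token, ...]


def terminals(span: Span) -> FrozenSet[str]:
    """Terminal n-grams of a span: those that carry no non-terminal type."""
    return frozenset(text for text, ntype in span if ntype is None)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_spans(spans: List[Span], threshold: float = 0.5) -> List[List[Span]]:
    """Greedy clustering (an assumed stand-in): a span joins the first
    cluster whose seed span is similar enough, else starts a new cluster."""
    clusters: List[List[Span]] = []
    for span in spans:
        for cluster in clusters:
            if jaccard(terminals(span), terminals(cluster[0])) >= threshold:
                cluster.append(span)
                break
        else:
            clusters.append([span])
    return clusters


def derive_new_types(
    clusters: List[List[Span]], base_type: str
) -> Dict[str, List[str]]:
    """Name one new type per cluster and list the terminal n-grams mapped to it."""
    new_types: Dict[str, List[str]] = {}
    for idx, cluster in enumerate(clusters, start=1):
        terms = sorted(set().union(*(terminals(s) for s in cluster)))
        new_types[f"{base_type}_{idx}"] = terms
    return new_types


food_spans: List[Span] = [
    (("a", None), ("pizza", "FOOD"), ("now", None)),
    (("a", None), ("sushi", "FOOD"), ("now", None)),
    (("deliver", None), ("pizza", "FOOD"), ("to", None)),
]
clusters = cluster_spans(food_spans)
print(derive_new_types(clusters, "FOOD"))
# {'FOOD_1': ['a', 'now'], 'FOOD_2': ['deliver', 'to']}
```

Comparing only against each cluster's seed span keeps the sketch short; a production system would more plausibly use a standard clustering method with a tuned similarity measure.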
17 Claims
1. A computer-implemented method, comprising the steps recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A non-transitory computer readable storage medium storing instructions executable by a data processing apparatus and upon such execution cause the data processing apparatus to perform operations comprising:
accessing command sentences stored in a data store, wherein each command sentence is a set of n-grams that constitute the command sentence and each command sentence includes a plurality of n-grams, wherein the command sentences include n-grams that collectively map to a plurality of n-gram types;
for each of the n-gram types:
identifying n-gram spans, each n-gram span being a proper subset of a set of n-grams that constitute a command sentence and including first n-grams of the n-gram type and one or more second n-grams that do not map to any of the plurality of n-gram types;
determining clusters of the n-gram spans, each cluster including n-gram spans meeting a measure of similarity of n-gram spans that belong to the cluster; and
for each cluster of n-gram spans, determining, from the n-gram spans belonging to the cluster, a new type to which the n-grams of the n-gram spans map.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A system, comprising:
a data processing apparatus; and
a non-transitory computer readable storage medium in data communication with the data processing apparatus and storing instructions executable by the data processing apparatus and upon such execution cause the data processing apparatus to perform operations comprising:
accessing command sentences stored in a data store, wherein each command sentence is a set of n-grams that constitute the command sentence and each command sentence includes a plurality of n-grams, wherein the command sentences include n-grams that collectively map to a plurality of n-gram types;
for each of the n-gram types:
identifying n-gram spans, each n-gram span being a proper subset of a set of n-grams that constitute a command sentence and including first n-grams of the n-gram type and one or more second n-grams that do not map to any of the plurality of n-gram types;
determining clusters of the n-gram spans, each cluster including n-gram spans meeting a measure of similarity of n-gram spans that belong to the cluster; and
for each cluster of n-gram spans, determining, from the n-gram spans belonging to the cluster, a new type to which the n-grams of the n-gram spans map.
Specification