Expedited techniques for generating string manipulation programs
First Claim
1. A method, performed using computing functionality, for generating subprograms that perform string manipulation tasks, comprising:
- receiving and storing input-output examples, the input-output examples providing input strings and corresponding output strings;
identifying tokens that are used in input strings of the input-output examples, to provide identified tokens;
identifying a first program-generating constraint;
determining whether substrings align with natural token boundaries;
generating, based on the first program-generating constraint, sets of subprograms for the respective input-output examples, each subprogram configured to transform at least one input string associated with a particular input-output example when the substrings align with natural token boundaries, otherwise excluding generation of loop-type expressions and subprograms for the respective input-output examples;
grouping the sets of subprograms into partitions, yielding at least one representative subprogram for each partition;
determining whether said generating and grouping provide results that satisfy a predefined success test; and
repeating said generating and grouping, as governed by a second program-generating constraint, if it is determined that said generating and grouping do not satisfy the predefined success test, the second program generating constraint permitting at least one generation of loop-type expressions or subprograms based on the substrings not aligning with natural token boundaries thereby imposing greater processing complexity and allowing for a larger set of subprograms compared to the first program generating constraint.
2 Assignments
0 Petitions
Accused Products
Abstract
A program creation system is described which generates sets of subprograms for respective input-output examples. The program creation system then groups the sets into partitions by performing an intersection operation. According to one aspect, the program creation system generates subprograms so as to exclude tokens that are not represented by the input strings of the input-output examples. According to another aspect, the program creation system first generates the subprograms without attempting to generate loop-type expressions. If this operation produces unsatisfactory results, the program creation system repeats its processing, this time including loop-type expressions. According to another aspect, the program creation system performs the grouping operation using an expedited graph-intersection operation. According to another aspect, the program creation system ranks programs (which are created based on the results of the grouping operation) based on the presence of preferred features found in the programs.
48 Citations
13 Claims
-
1. A method, performed using computing functionality, for generating subprograms that perform string manipulation tasks, comprising:
-
receiving and storing input-output examples, the input-output examples providing input strings and corresponding output strings; identifying tokens that are used in input strings of the input-output examples, to provide identified tokens; identifying a first program-generating constraint; determining whether substrings align with natural token boundaries; generating, based on the first program-generating constraint, sets of subprograms for the respective input-output examples, each subprogram configured to transform at least one input string associated with a particular input-output example when the substrings align with natural token boundaries, otherwise excluding generation of loop-type expressions and subprograms for the respective input-output examples; grouping the sets of subprograms into partitions, yielding at least one representative subprogram for each partition; determining whether said generating and grouping provide results that satisfy a predefined success test; and repeating said generating and grouping, as governed by a second program-generating constraint, if it is determined that said generating and grouping do not satisfy the predefined success test, the second program generating constraint permitting at least one generation of loop-type expressions or subprograms based on the substrings not aligning with natural token boundaries thereby imposing greater processing complexity and allowing for a larger set of subprograms compared to the first program generating constraint. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A computer readable storage memory comprising computer readable instructions stored thereon that, responsive to execution by one or more processing devices, are configured to:
-
receive input-output examples, the input-output examples providing input strings and corresponding output strings; identify tokens that are used in the input strings of the input-output examples, to provide identified tokens; identify a first program-generating constraint; determine whether substrings align with natural token boundaries; generate, based on the first program generating constraint, first sets of subprograms for the respective input-output examples, each subprogram configured to transform at least one input string associated with a particular input-output example into an output string associated with the particular input-output example; determine whether the generating provides results that satisfy a predefined success test; if it is determined that the generating does not satisfy the predefined success test, generate, based on a second program generating constraint, second sets of programs for the respective input-output examples, the first program-generating constraint excluding generation of loop-type expressions and excluding generation of subprograms based on a consideration of substrings that do not align with natural token boundaries, and the second program-generating constraint permitting at least one of generation of loop-type expressions or generation of subprograms based on a consideration of substrings that do not align with natural token boundaries. - View Dependent Claims (12)
-
-
13. A system comprising:
-
one or more processors; and one or more memories comprising instructions stored thereon that, responsive to execution by the one or more processors, perform operations comprising; receiving and storing input-output examples, the input-output examples providing input strings and corresponding output strings; identifying tokens that are used in input strings of the input-output examples, to provide identified tokens; identifying a first program-generating constraint; determining whether substrings align with natural token boundaries; generating, based on the first program generating constraint, sets of subprograms for the respective input-output examples, the first program-generating constraint excluding generation of loop-type expressions and excluding generation of subprograms based on a consideration of substrings that do not align with natural token boundaries; determining whether said generating provide results that satisfy a predefined success test; and repeating said generating, as governed by a second program-generating constraint, if it is determined that said generating does not satisfy the predefined success test, the second program-generating constraint permitting at least one of generation of loop-type expressions or generation of subprograms based on a consideration of substrings that do not align with natural token boundaries.
-
Specification