Query-by-example in large-scale code repositories
First Claim
1. A system configured to perform query-by-example, the system comprising a processor and a memory coupled with the processor, wherein the memory is configured to provide the processor with instructions for:
- maintaining, by a query module executing on the system, a source code repository containing a plurality of source code files, wherein each of the plurality of source code files is associated with a corresponding source syntax structure generated based on said each of the plurality of source code files and representative of a syntactic structure of said each of the plurality of source code files;
- receiving, by the query module, a query snippet;
- generating, by the query module, a query syntax structure based on the query snippet, wherein the query syntax structure represents a syntactic structure of the query snippet; and
- identifying, by the query module, a first source code file from the plurality of source code files for being relevant to the query snippet by:
  - extracting a source sub-structure from a first syntax structure associated with the first source code file for matching with a query sub-structure extracted from the query syntax structure,
  - identifying a matching pattern contained in the source sub-structure and in the query sub-structure,
  - calculating a matching ratio based on the matching pattern, the source sub-structure's size, and the query sub-structure's size, and
  - assigning the matching ratio as a first similarity score upon a second determination that the matching ratio is above a predetermined matching threshold, and
  - identifying the first source code file upon a first determination that the first similarity score is above a predetermined similarity threshold, wherein the being relevant to the query snippet is determined by a first relevance score which is calculated based on the query syntax structure and the first source code file's corresponding source syntax structure.
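The sub-structure matching steps recited above can be sketched in Python as follows. The claim fixes neither the matching pattern, the ratio formula, nor the threshold values, so everything here is an assumption for illustration: Python's `ast` module stands in for the syntax-structure generator, the "matching pattern" is taken to be the multiset of node types shared by both sub-structures, the ratio is the Dice-style 2|P| / (|S| + |Q|), and the names `node_labels`, `matching_ratio`, `similarity_score`, `is_relevant`, `MATCHING_THRESHOLD` and `SIMILARITY_THRESHOLD` are hypothetical.

```python
import ast
from collections import Counter
from typing import Optional

# Hypothetical values: the claim says only that both thresholds are "predetermined".
MATCHING_THRESHOLD = 0.6
SIMILARITY_THRESHOLD = 0.7


def node_labels(sub_structure: ast.AST) -> Counter:
    """Multiset of node-type names in a sub-structure; its total is the sub-structure's size."""
    return Counter(type(node).__name__ for node in ast.walk(sub_structure))


def matching_ratio(source_sub: ast.AST, query_sub: ast.AST) -> float:
    src, qry = node_labels(source_sub), node_labels(query_sub)
    # Assumed "matching pattern": node types present in both, counted with multiplicity.
    pattern = src & qry
    size_src, size_qry = sum(src.values()), sum(qry.values())
    # Dice-style ratio: 1.0 for identical label multisets, 0.0 for disjoint ones.
    return 2 * sum(pattern.values()) / (size_src + size_qry)


def similarity_score(source_sub: ast.AST, query_sub: ast.AST) -> Optional[float]:
    ratio = matching_ratio(source_sub, query_sub)
    # "Second determination": the ratio becomes a score only above the matching threshold.
    return ratio if ratio > MATCHING_THRESHOLD else None


def is_relevant(source_sub: ast.AST, query_sub: ast.AST) -> bool:
    score = similarity_score(source_sub, query_sub)
    # "First determination": identify the file only if the score clears the similarity threshold.
    return score is not None and score > SIMILARITY_THRESHOLD
```

Because only node types are counted, `for y in values: acc += y` matches the query `for x in items: total += x` with a ratio of exactly 1.0 even though every identifier differs. That renaming-insensitivity is the point of matching syntax structures rather than text.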
2 Assignments
0 Petitions
Abstract
Systems and methods for performing query-by-example are described. A query module executing on the system may maintain a source code repository containing a plurality of source code files. Each of the plurality of source code files is associated with a corresponding source syntax structure generated based on said each of the plurality of source code files. The query module may receive a query snippet and generate a query syntax structure based on the query snippet. The query module may then identify a first source code file from the plurality of source code files as being relevant to the query snippet. This relevance is determined by a first relevance score, which is calculated based on the query syntax structure and the first source code file's corresponding source syntax structure.
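The workflow summarized in the abstract (keep a syntax structure for every file, parse the query snippet, score each file against it) can be sketched end to end. This is not the patented scoring method: the `QueryModule` class, the use of Python's `ast` module as the syntax-structure generator, cosine similarity of node-type histograms as the relevance score, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import ast
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def histogram(tree: ast.AST) -> Counter:
    """Count node types in a syntax structure (identifiers and literals are ignored)."""
    return Counter(type(node).__name__ for node in ast.walk(tree))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class QueryModule:
    """Hypothetical query module: each file is stored with its source syntax structure."""
    repository: Dict[str, Tuple[str, ast.AST]] = field(default_factory=dict)

    def add_file(self, name: str, source: str) -> None:
        # The source syntax structure is generated once, when the file enters the repository.
        self.repository[name] = (source, ast.parse(source))

    def query(self, snippet: str, threshold: float = 0.8) -> List[Tuple[str, float]]:
        query_hist = histogram(ast.parse(snippet))  # from the query syntax structure
        scored = [(name, cosine(query_hist, histogram(tree)))
                  for name, (_, tree) in self.repository.items()]
        # Report files whose (assumed) relevance score clears the threshold, best first.
        return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)
```

Parsing each file once at insertion time mirrors the abstract's "maintain" step; a production index would also cache the histograms instead of recomputing them for every query.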
20 Claims
1. A system configured to perform query-by-example, the system comprising a processor and a memory coupled with the processor, wherein the memory is configured to provide the processor with instructions for:
maintaining, by a query module executing on the system, a source code repository containing a plurality of source code files, wherein each of the plurality of source code files is associated with a corresponding source syntax structure generated based on said each of the plurality of source code files and representative of a syntactic structure of said each of the plurality of source code files; receiving, by the query module, a query snippet; generating, by the query module, a query syntax structure based on the query snippet, wherein the query syntax structure represents a syntactic structure of the query snippet; and identifying, by the query module, a first source code file from the plurality of source code files for being relevant to the query snippet by: extracting a source sub-structure from a first syntax structure associated with the first source code file for matching with a query sub-structure extracted from the query syntax structure, identifying a matching pattern contained in the source sub-structure and in the query sub-structure, calculating a matching ratio based on the matching pattern, the source sub-structure's size, and the query sub-structure's size, and assigning the matching ratio as a first similarity score upon a second determination that the matching ratio is above a predetermined matching threshold, and identifying the first source code file upon a first determination that the first similarity score is above a predetermined similarity threshold, wherein the being relevant to the query snippet is determined by a first relevance score which is calculated based on the query syntax structure and the first source code file's corresponding source syntax structure.
9. A method for performing query-by-example, the method being performed in a system comprising a processor and a memory coupled with the processor, the method comprising:
maintaining, by a query module executing on the system, a source code repository containing a plurality of source code files, wherein each of the plurality of source code files is associated with a corresponding source syntax structure generated based on said each of the plurality of source code files and representative of a syntactic structure of said each of the plurality of source code files; receiving, by the query module, a query snippet; generating, by the query module, a query syntax structure based on the query snippet, wherein the query syntax structure represents a syntactic structure of the query snippet; and identifying, by the query module, a first source code file from the plurality of source code files for being relevant to the query snippet by: extracting a source sub-structure from a first syntax structure associated with the first source code file for matching with a query sub-structure extracted from the query syntax structure, identifying a matching pattern contained in the source sub-structure and in the query sub-structure, calculating a matching ratio based on the matching pattern, the source sub-structure's size, and the query sub-structure's size, and assigning the matching ratio as a first similarity score upon a second determination that the matching ratio is above a predetermined matching threshold, and identifying the first source code file upon a first determination that the first similarity score is above a predetermined similarity threshold, wherein the being relevant to the query snippet is determined by a first relevance score which is calculated based on the query syntax structure and the first source code file's corresponding source syntax structure.
17. A non-transitory computer-readable storage medium that includes a set of instructions which, in response to execution by a processor, cause the processor to perform a method for performing query-by-example, the method being performed in a system comprising a processor and a memory coupled with the processor, the method comprising:
maintaining, by a query module executing on the system, a source code repository containing a plurality of source code files, wherein each of the plurality of source code files is associated with a corresponding source syntax structure generated based on said each of the plurality of source code files and representative of a syntactic structure of said each of the plurality of source code files; receiving, by the query module, a query snippet; generating, by the query module, a query syntax structure based on the query snippet, wherein the query syntax structure represents a syntactic structure of the query snippet; and identifying, by the query module, a first source code file from the plurality of source code files for being relevant to the query snippet by: extracting a source sub-structure from a first syntax structure associated with the first source code file for matching with a query sub-structure extracted from the query syntax structure, identifying a matching pattern contained in the source sub-structure and in the query sub-structure, calculating a matching ratio based on the matching pattern, the source sub-structure's size, and the query sub-structure's size, and assigning the matching ratio as a first similarity score upon a second determination that the matching ratio is above a predetermined matching threshold, and identifying the first source code file upon a first determination that the first similarity score is above a predetermined similarity threshold, wherein the being relevant to the query snippet is determined by a first relevance score which is calculated based on the query syntax structure and the first source code file's corresponding source syntax structure.
19. The non-transitory computer-readable storage medium of claim 17, wherein the source characteristic vector is a 1-level characteristic vector, and generating the source characteristic vector comprises:
recursively generating a set of child characteristic vectors for child nodes in the source sub-structure; and generating the source characteristic vector by accumulating the set of child characteristic vectors.
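The recursive construction in claim 19 can be sketched as below. The claim does not define the vector's dimensions, so this assumes that a 1-level characteristic vector simply counts how often each node type occurs in the sub-structure; Python's `ast` module again stands in for the syntax structure, and `characteristic_vector` and `to_dense` are hypothetical names.

```python
import ast
from collections import Counter
from typing import List


def characteristic_vector(node: ast.AST) -> Counter:
    """Hypothetical 1-level characteristic vector: occurrences of each node type in the subtree."""
    # Recursively generate the set of child characteristic vectors ...
    child_vectors = [characteristic_vector(child) for child in ast.iter_child_nodes(node)]
    # ... then accumulate them, together with one count for the node itself.
    vector = Counter({type(node).__name__: 1})
    for child_vector in child_vectors:
        vector.update(child_vector)
    return vector


def to_dense(vector: Counter, vocabulary: List[str]) -> List[int]:
    """Project the sparse vector onto a fixed (assumed) ordering of node types."""
    return [vector[name] for name in vocabulary]
```

For the sub-structure of `x = a + b`, the accumulated vector counts three `Name` nodes and one `BinOp`, so `to_dense(vector, ["Name", "BinOp", "Call"])` yields `[3, 1, 0]`. Fixed-length vectors of this kind can then be compared with an ordinary distance measure.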
Specification