Systems and methods for finding project-related information by clustering applications into related concept categories
First Claim
1. A device comprising:
- a processor, at least partially implemented in hardware, to:
generate a similarity matrix defining a similarity between a plurality of computer applications according to a categorization of application programming interface (API) calls,
the similarity matrix being generated from a term document matrix using singular value decomposition,
the term document matrix including a first dimension of first entries corresponding to the plurality of computer applications and a second dimension of second entries corresponding to categories of the categorization,
elements of the term document matrix having values based on a quantity of API calls in a computer application corresponding to a first entry of the first dimension, and in a category, of the categories, corresponding to a second entry of the second dimension, and
at least one of the API calls corresponding to one of the categories,
the similarity being based on weights for the API calls contained in the plurality of computer applications,
a respective weight for a respective API call in a respective computer application being based on a quantity of API calls in the respective computer application and a quantity of computer applications, of the plurality of computer applications, that contain the respective API call;
receive a selection of a first computer application of the plurality of computer applications; and
provide an indication of at least one second computer application, of the plurality of computer applications, using the similarity matrix and based on the selection of the first computer application.
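The weighting clause of the claim (a weight based on the quantity of API calls in an application and on the quantity of applications containing that call) reads like a term frequency / inverse document frequency scheme. The following is a minimal sketch under that assumption; the claim does not state an exact formula, and the function name `api_call_weights`, the input layout, and the logarithmic form are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def api_call_weights(apps):
    """Compute a TF-IDF-style weight per API call per application.

    `apps` maps an application name to the list of API calls found in it
    (repeats allowed). This layout is an assumption for illustration.
    """
    n_apps = len(apps)

    # Quantity of applications that contain each API call at least once.
    apps_containing = Counter()
    for calls in apps.values():
        apps_containing.update(set(calls))

    weights = {}
    for name, calls in apps.items():
        counts = Counter(calls)
        total_calls = len(calls)  # quantity of API calls in this application
        weights[name] = {
            # Frequency within the application, scaled down for calls
            # that appear in many applications (assumed log form).
            call: (count / total_calls) * math.log(n_apps / apps_containing[call])
            for call, count in counts.items()
        }
    return weights
```

Under this assumed formula, a call that appears in every application receives a weight of zero, so ubiquitous calls contribute nothing to the similarity between applications.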
1 Assignment
0 Petitions
Abstract
A system, method, and computer-readable medium are described that find similarities among programming applications based on semantic anchors found within the source code of such applications. The semantic anchors may be API calls, such as Java's package and class calls of the JDK. Latent Semantic Indexing may be used to process the application and semantic anchor data and automatically develop a similarity matrix that contains numbers representing the similarity of one program to another.
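The step from application and semantic-anchor data to a similarity matrix can be written out in standard Latent Semantic Indexing notation. This is a sketch, not the patent's own formulation: the symbols $A$, $k$, $\hat{a}_i$, and $S$, and the use of cosine similarity, are assumptions introduced here. $A$ is the $m \times n$ term document matrix with one row per application and one column per API category, matching the orientation recited in the claims.

```latex
\begin{align*}
A &= U \Sigma V^{\mathsf{T}}
  && \text{(singular value decomposition)} \\
A_k &= U_k \Sigma_k V_k^{\mathsf{T}}
  && \text{(keep the $k$ largest singular values)} \\
\hat{a}_i &= \text{row } i \text{ of } U_k \Sigma_k
  && \text{(application $i$ in the reduced space)} \\
S_{ij} &= \frac{\hat{a}_i \cdot \hat{a}_j}{\lVert \hat{a}_i \rVert \, \lVert \hat{a}_j \rVert}
  && \text{(assumed cosine similarity, } S \in \mathbb{R}^{m \times m}\text{)}
\end{align*}
```

Given a selected application $i$, the applications most similar to it are those $j \neq i$ with the largest entries in row $i$ of $S$.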
64 Citations
20 Claims
-
1. A device comprising:
a processor, at least partially implemented in hardware, to:
generate a similarity matrix defining a similarity between a plurality of computer applications according to a categorization of application programming interface (API) calls,
the similarity matrix being generated from a term document matrix using singular value decomposition,
the term document matrix including a first dimension of first entries corresponding to the plurality of computer applications and a second dimension of second entries corresponding to categories of the categorization,
elements of the term document matrix having values based on a quantity of API calls in a computer application corresponding to a first entry of the first dimension, and in a category, of the categories, corresponding to a second entry of the second dimension, and
at least one of the API calls corresponding to one of the categories,
the similarity being based on weights for the API calls contained in the plurality of computer applications,
a respective weight for a respective API call in a respective computer application being based on a quantity of API calls in the respective computer application and a quantity of computer applications, of the plurality of computer applications, that contain the respective API call;
receive a selection of a first computer application of the plurality of computer applications; and
provide an indication of at least one second computer application, of the plurality of computer applications, using the similarity matrix and based on the selection of the first computer application.
Dependent claims: 2, 3, 4, 5, 6, 7
-
8. A non-transitory computer-readable medium for storing instructions, the instructions comprising:
a plurality of instructions which, when executed by one or more processors, cause the one or more processors to:
generate a similarity matrix defining a similarity between a plurality of computer applications according to a categorization of application programming interface (API) calls,
the similarity matrix being generated from a term document matrix using singular value decomposition,
the term document matrix including a first dimension of first entries corresponding to the plurality of computer applications and a second dimension of second entries corresponding to categories of the categorization,
elements of the term document matrix having values based on a quantity of API calls in a computer application corresponding to a first entry of the first dimension, and in a category, of the categories, corresponding to a second entry of the second dimension, and
at least one of the API calls corresponding to one of the categories,
the similarity being based on weights for the API calls contained in the plurality of computer applications,
a respective weight for a respective API call in a respective computer application being based on a quantity of API calls in the respective computer application and a quantity of computer applications, of the plurality of computer applications, that contain the respective API call;
receive a selection of a first computer application of the plurality of computer applications; and
provide an indication of at least one second computer application, of the plurality of computer applications, using the similarity matrix and based on the selection of the first computer application.
Dependent claims: 9, 10, 11, 12, 13, 14
-
15. A method comprising:
generating, by a device, a similarity matrix defining a similarity between a plurality of computer applications according to a categorization of application programming interface (API) calls,
the similarity matrix being generated from a term document matrix using singular value decomposition,
the term document matrix including a first dimension of first entries corresponding to the plurality of computer applications and a second dimension of second entries corresponding to categories of the categorization,
elements of the term document matrix having values based on a quantity of API calls in a computer application corresponding to a first entry of the first dimension, and in a category, of the categories, corresponding to a second entry of the second dimension, and
at least one of the API calls corresponding to one of the categories,
the similarity being based on weights for the API calls contained in the plurality of computer applications,
a respective weight for a respective API call in a respective computer application being based on a quantity of API calls in the respective computer application and a quantity of computer applications, of the plurality of computer applications, that contain the respective API call;
receiving, by the device, a selection of a first computer application of the plurality of computer applications; and
providing, by the device, an indication of at least one second computer application, of the plurality of computer applications, using the similarity matrix and based on the selection of the first computer application.
Dependent claims: 16, 17, 18, 19, 20
-
Specification