Skill extraction system
Abstract
In an example, disclosed is a machine automated method of identifying a set of skills. In some examples, the method includes extracting a plurality of skill seed phrases from a plurality of member profiles of a social networking site, creating a plurality of disambiguated skill seed phrases by disambiguating the plurality of skill seed phrases using one or more computer processors, and de-duplicating the plurality of disambiguated skill seed phrases to create a plurality of de-duplicated skill seed phrases.
24 Claims
1. A computer-implemented method comprising:
causing one or more computer processors to execute instructions, the instructions causing the one or more computer processors to perform operations of:
extracting a plurality of skill seed phrases from a skills section of a plurality of member profiles of a social networking service by at least:
tokenizing data in the skills section of each of the plurality of member profiles into a plurality of tokens; and
selecting, as the plurality of skill seed phrases, tokens from the plurality of tokens that have a frequency of occurrence that is above a predetermined threshold frequency;
disambiguating the plurality of skill seed phrases to create a plurality of disambiguated skill seed phrases by at least clustering the plurality of skill seed phrases based on a count of a number of times both skills of respective pairs of the plurality of skill seed phrases are present in a same member profile of the plurality of member profiles; and
de-duplicating the plurality of disambiguated skill seed phrases to create a plurality of de-duplicated skill seed phrases, the de-duplicated skill seed phrases identifying a plurality of skills.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
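The extraction step recited in the claims (tokenize each skills section, then keep tokens whose frequency exceeds a threshold) can be illustrated with a minimal sketch. This is not the patented implementation: the comma delimiter, the lowercasing, the function name extract_seed_phrases, and the min_count parameter are all assumptions made for illustration, since the claim language does not specify them.

```python
from collections import Counter


def extract_seed_phrases(skills_sections, min_count):
    """Return tokens that occur more than min_count times across all profiles."""
    counts = Counter()
    for section in skills_sections:
        # Assumption: skills are comma-separated and compared case-insensitively.
        tokens = [t.strip().lower() for t in section.split(",")]
        counts.update(t for t in tokens if t)
    # "Above a predetermined threshold frequency" is read here as a strict '>'.
    return {token for token, n in counts.items() if n > min_count}


# Hypothetical skills sections from three member profiles.
profiles = [
    "Java, Python, SQL",
    "python, sql, Public Speaking",
    "Java, SQL",
]
seeds = extract_seed_phrases(profiles, min_count=1)
```

With these made-up profiles, "public speaking" appears only once and is dropped, while the other three tokens survive as seed phrases.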
10. A system comprising:
a computer processor;
a non-transitory machine readable medium comprising instructions, which when executed by the computer processor, causes the computer processor to perform operations of:
extracting a plurality of skill seed phrases from a skills section of a plurality of member profiles of a social networking service by at least:
tokenizing data in the skills section of each of the plurality of member profiles into a plurality of tokens; and
selecting tokens from the plurality of tokens with a frequency of occurrence that is above a predetermined threshold frequency as the plurality of skill seed phrases;
disambiguating the plurality of skill seed phrases to create a plurality of disambiguated skill seed phrases by at least clustering the plurality of skill seed phrases based on a count of a number of times both skills of respective pairs of the plurality of skill seed phrases are present in a same member profile of the plurality of member profiles; and
de-duplicating the plurality of disambiguated skill seed phrases to create a plurality of de-duplicated skill seed phrases, the de-duplicated skill seed phrases identifying a plurality of skills.
Dependent claims: 11, 12, 13, 14, 15, 16.
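The disambiguation step clusters seed phrases based on how often both phrases of a pair appear in the same member profile. The sketch below counts those pairwise co-occurrences and then groups phrases with a simple union-find over pairs whose count reaches a cutoff. The claims do not name a clustering algorithm, so connected-component grouping is a stand-in chosen for brevity, and min_pair_count and both function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(profiles, seeds):
    """Count, per pair of seed phrases, the profiles that contain both."""
    counts = Counter()
    for skills in profiles:
        present = sorted(set(skills) & seeds)  # sorted so each pair has one key
        counts.update(combinations(present, 2))
    return counts


def cluster_by_cooccurrence(seeds, counts, min_pair_count):
    """Group seeds linked by pairs seen together at least min_pair_count times."""
    parent = {s: s for s in seeds}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in counts.items():
        if n >= min_pair_count:
            parent[find(a)] = find(b)

    clusters = {}
    for s in seeds:
        clusters.setdefault(find(s), set()).add(s)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical profiles, each already reduced to a set of seed phrases.
profiles = [
    {"java", "sql"},
    {"java", "sql", "python"},
    {"acting", "singing"},
    {"acting", "singing", "sql"},
]
seeds = {"java", "sql", "python", "acting", "singing"}
counts = cooccurrence_counts(profiles, seeds)
clusters = cluster_by_cooccurrence(seeds, counts, min_pair_count=2)
```

In this toy data, "java" and "sql" share two profiles and so do "acting" and "singing", which gives two two-phrase clusters and leaves "python" on its own. A production system would likely use a weighted or probabilistic clustering rather than a hard cutoff.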
17. A non-transitory machine-readable medium that stores instructions which when performed by a machine, cause the machine to perform operations comprising:
extracting a plurality of skill seed phrases from a skills section of a plurality of member profiles of a social networking service by at least:
tokenizing data in the skills section of each of the plurality of member profiles into a plurality of tokens; and
selecting, as the plurality of skill seed phrases, tokens from the plurality of tokens that have a frequency of occurrence that is above a predetermined threshold frequency;
disambiguating the plurality of skill seed phrases to create a plurality of disambiguated skill seed phrases by at least clustering the plurality of skill seed phrases based on a count of a number of times both skills of respective pairs of the plurality of skill seed phrases are present in a same member profile of the plurality of member profiles; and
de-duplicating the plurality of disambiguated skill seed phrases to create a plurality of de-duplicated skill seed phrases;
the de-duplicated skill seed phrases identifying a plurality of skills.
Dependent claims: 18, 19, 20, 21, 22, 23, 24.
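The independent claims recite de-duplication without saying how duplicates are detected. As one possible reading, the sketch below treats phrases as duplicates when they match after a simple normalization and keeps the most frequent surface form as the canonical name. The normalization rule, the function names, and the sample counts are assumptions for illustration only.

```python
import re
from collections import Counter, defaultdict


def normalize(phrase):
    """Hypothetical key: lowercase, punctuation to spaces, whitespace collapsed."""
    # '+' and '#' are kept so that names such as "C++" and "C#" stay distinct.
    cleaned = re.sub(r"[^a-z0-9+# ]", " ", phrase.lower())
    return " ".join(cleaned.split())


def deduplicate(phrase_counts):
    """Map each normalized key to its most frequently observed surface form."""
    groups = defaultdict(Counter)
    for phrase, n in phrase_counts.items():
        groups[normalize(phrase)][phrase] += n
    return {key: variants.most_common(1)[0][0] for key, variants in groups.items()}


# Made-up phrase frequencies, as might come out of the earlier steps.
observed = {
    "Project Management": 40,
    "project-management": 5,
    "Project  management": 3,
    "C++": 12,
    "c++": 2,
}
canonical = deduplicate(observed)
```

Here the three spellings of "project management" collapse to a single skill, and the two casings of "C++" collapse to another.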
Specification