Systems and methods to develop training set of data based on resume corpus

US 10,748,118 B2
Filed: 04/05/2016
Issued: 08/18/2020
Est. Priority Date: 04/05/2016
Status: Active Grant

First Claim

1. A computer-implemented method comprising:

acquiring, by a computing system, a resume corpus;

processing, by the computing system, the resume corpus to generate resume tokens from the resume corpus, wherein the processing comprises:

determining a ratio based on co-occurrence of a first word and a second word of the resume corpus versus individual occurrence of the first word and the second word; and

determining, based on the ratio, the existence of a bigram including the first word and the second word to be used as training data;

training, by the computing system, a machine learning model to recommend a job classification based at least in part on the bigram; and

applying, by the computing system, the machine learning model to recommend a job classification based on evaluation data.
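
The ratio test recited in the claim can be illustrated with a short sketch. The claim does not fix a formula, so the one used here (adjacent co-occurrences divided by the number of positions where either word occurs) is an assumption, as are the function names, the sample tokens, and the threshold value of 0.5.

```python
from collections import Counter

def cooccurrence_ratio(tokens, first, second):
    """Ratio of adjacent co-occurrences of (first, second) to the combined
    individual occurrences of the two words. Assumed formula, not the patent's."""
    unigrams = Counter(tokens)
    pairs = Counter(zip(tokens, tokens[1:]))
    together = pairs[(first, second)]
    denom = unigrams[first] + unigrams[second] - together
    if denom == 0:
        return 0.0
    return together / denom

def is_bigram(tokens, first, second, threshold=0.5):
    """Treat the pair as a bigram when the ratio meets a hypothetical threshold."""
    return cooccurrence_ratio(tokens, first, second) >= threshold

# Hypothetical resume tokens, already lowercased with stop words removed.
corpus = ("software engineer acme senior software engineer "
          "built data tools data engineer").split()

print(is_bigram(corpus, "software", "engineer"))
print(is_bigram(corpus, "data", "engineer"))
```

Pairs that almost always appear together score close to 1, while pairs whose words mostly appear apart score close to 0. On a very small corpus, words that occur only once can produce misleadingly high ratios, so a practical version would likely also require a minimum count.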

2 Assignments

0 Petitions

Abstract

Systems, methods, and non-transitory computer readable media are configured to acquire a resume corpus. The resume corpus is processed to generate resume tokens. A machine learning model is trained based on the resume tokens. The machine learning model is applied to recommend a job classification based on evaluation data.
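
The train-then-apply flow in the abstract can be sketched as follows. The patent text shown here does not name a model type, so the small naive Bayes classifier below is an assumption chosen for brevity. The function names `train` and `recommend`, the labels, and the sample tokens (with bigrams written as single joined tokens such as `software_engineer`) are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(labeled_resumes):
    """labeled_resumes: list of (tokens, job_classification) pairs.
    Collects per-class token counts for a naive Bayes model (assumed model type)."""
    token_counts = defaultdict(Counter)
    class_counts = Counter()
    for tokens, label in labeled_resumes:
        token_counts[label].update(tokens)
        class_counts[label] += 1
    vocab = {t for counts in token_counts.values() for t in counts}
    return token_counts, class_counts, vocab

def recommend(model, tokens):
    """Return the job classification with the highest log-probability score."""
    token_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        counts = token_counts[label]
        total = sum(counts.values())
        score = math.log(class_counts[label] / total_docs)
        for t in tokens:
            if t in vocab:  # ignore tokens never seen in training
                # Add-one smoothing so unseen class/token pairs do not zero out.
                score += math.log((counts[t] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: resume tokens paired with a job classification.
training = [
    (["software_engineer", "python", "backend"], "engineering"),
    (["data_engineer", "pipelines", "python"], "engineering"),
    (["account_executive", "sales", "quota"], "sales"),
    (["sales", "pipeline", "crm"], "sales"),
]
model = train(training)

# Evaluation data: tokens from a resume the model has not seen.
print(recommend(model, ["python", "software_engineer"]))
```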

19 Citations

20 Claims

1. A computer-implemented method comprising:
- acquiring, by a computing system, a resume corpus;
  
  processing, by the computing system, the resume corpus to generate resume tokens from the resume corpus, wherein the processing comprises:
  
  determining a ratio based on co-occurrence of a first word and a second word of the resume corpus versus individual occurrence of the first word and the second word; and
  
  determining, based on the ratio, the existence of a bigram including the first word and the second word to be used as training data;
  
  training, by the computing system, a machine learning model to recommend a job classification based at least in part on the bigram; and
  
  applying, by the computing system, the machine learning model to recommend a job classification based on evaluation data.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The computer-implemented method of claim 1, wherein the resume corpus is based on textual data from a plurality of resumes.
  - 3. The computer-implemented method of claim 1, wherein the resume tokens include one or more unigrams and one or more bigrams.
  - 4. The computer-implemented method of claim 1, wherein the processing the resume corpus comprises:
    - removing stop words from the resume corpus.
  - 5. The computer-implemented method of claim 4, wherein the stop words include at least one of pronouns, prepositions, articles, and conjunctions.
  - 6. The computer-implemented method of claim 1, wherein the processing the resume corpus comprises:
    - modifying capitalized letters of words in the resume corpus to have lowercase letters.
  - 7. The computer-implemented method of claim 1, wherein the processing the resume corpus comprises:
    - generating a whitelist of bigrams constituting job titles parsed from the resume corpus, wherein the training a machine learning model to recommend a job classification is further based at least in part on the whitelist of bigrams constituting job titles.
  - 8. The computer-implemented method of claim 7, wherein the processing the resume corpus further comprises:
    - including a bigram in the whitelist based on satisfaction of a threshold appearance value relating to the bigram.
  - 9. The computer-implemented method of claim 1, wherein the job classification includes a job title or a job pipeline.
  - 10. The computer-implemented method of claim 1, wherein the processing further comprises:
    - comparing the ratio to a threshold value; and
      
      determining the existence of a bigram including the first word and the second word to be used as training data when the ratio satisfies the threshold value.
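
The preprocessing steps in claims 4 through 8 (stop-word removal, lowercasing, and a job-title whitelist gated by an appearance threshold) might look like the following minimal sketch. The stop-word list, the `min_count` value, the function names, and the sample titles are illustrative assumptions. The claims do not specify them.

```python
from collections import Counter

# Hypothetical stop-word list covering pronouns, prepositions, articles, conjunctions.
STOP_WORDS = {"i", "we", "at", "in", "of", "for", "a", "an", "the", "and", "or"}

def preprocess(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def title_whitelist(job_titles, min_count=2):
    """Keep two-word job titles that appear at least min_count times.
    min_count stands in for the 'threshold appearance value' of claim 8."""
    counts = Counter()
    for title in job_titles:
        words = preprocess(title)
        if len(words) == 2:
            counts[tuple(words)] += 1
    return {bigram for bigram, n in counts.items() if n >= min_count}

# Hypothetical job titles parsed from resumes.
titles = ["Software Engineer", "software engineer", "Product Manager",
          "Data Scientist", "DATA SCIENTIST"]

print(preprocess("I worked at the Acme Lab"))
print(title_whitelist(titles))
```

This sketch splits on whitespace only, so punctuation stays attached to words. A fuller tokenizer would be needed for real resume text.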

11. A system comprising:
- at least one processor; and
  
  a memory storing instructions that, when executed by the at least one processor, cause the system to perform:
  
  acquiring a resume corpus;
  
  processing the resume corpus to generate resume tokens from the resume corpus, wherein the processing comprises:
  
  determining a ratio based on co-occurrence of a first word and a second word of the resume corpus versus individual occurrence of the first word and the second word; and
  
  determining, based on the ratio, the existence of a bigram including the first word and the second word to be used as training data;
  
  training a machine learning model to recommend a job classification based at least in part on the bigram; and
  
  applying the machine learning model to recommend a job classification based on evaluation data.
- Dependent claims (12, 13, 14, 15):
  - 12. The system of claim 11, wherein the resume corpus is based on textual data from a plurality of resumes.
  - 13. The system of claim 11, wherein the resume tokens include one or more unigrams and one or more bigrams.
  - 14. The system of claim 11, wherein the processing the resume corpus comprises:
    - removing stop words from the resume corpus.
  - 15. The system of claim 14, wherein the stop words include at least one of pronouns, prepositions, articles, and conjunctions.

16. A non-transitory computer-readable storage medium including instructions that, when executed by at least one processor of a computing system, cause the computing system to perform a method comprising:
- acquiring a resume corpus;
  
  processing the resume corpus to generate resume tokens from the resume corpus, wherein the processing comprises:
  
  determining a ratio based on co-occurrence of a first word and a second word of the resume corpus versus individual occurrence of the first word and the second word; and
  
  determining, based on the ratio, the existence of a bigram including the first word and the second word to be used as training data;
  
  training a machine learning model to recommend a job classification based at least in part on the bigram; and
  
  applying the machine learning model to recommend a job classification based on evaluation data.
- Dependent claims (17, 18, 19, 20):
  - 17. The non-transitory computer-readable storage medium of claim 16, wherein the resume corpus is based on textual data from a plurality of resumes.
  - 18. The non-transitory computer-readable storage medium of claim 16, wherein the resume tokens include one or more unigrams and one or more bigrams.
  - 19. The non-transitory computer-readable storage medium of claim 16, wherein the processing the resume corpus comprises:
    - removing stop words from the resume corpus.
  - 20. The non-transitory computer-readable storage medium of claim 19, wherein the stop words include at least one of pronouns, prepositions, articles, and conjunctions.

Current Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Original Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Inventors: Fang, Miaoqing
Primary Examiner(s): Ouellette, Jonathan P
Application Number: US15/091,077
Publication Number: US 20170286914A1
Time in Patent Office: 1,596 Days
Field of Search: 705/1.1-912, 705/320, 705/321

CPC Class Codes

G06N 20/00   Machine learning
G06N 5/04   Inference or reasoning models
G06Q 10/105   Human resources
G06Q 10/1053   Employment or hiring
