Global Employment Dynamics
Project Overview
- Duration: January 2019 - May 2019, August 2019 - December 2019
- My role: Research assistant working alongside Prof. Anastassia Fedyk and Dr. James Hodson
- Tools & Frameworks: Python, pandas, scikit-learn
In the spring and fall of 2019, I applied through UC Berkeley's Undergraduate Research Apprentice Program (URAP) and was given the opportunity to work with Professor Fedyk at the Haas School of Business. My project applied machine learning techniques to explore whether a firm's performance can be predicted from its talent pool.
Motivation
This project is motivated by questions ranging from predicting individual career outcomes to understanding firm performance, and it has further implications for job automation and the future of work. By analyzing employment shocks such as the collapse of Lehman Brothers, the project examines whether hiring patterns, as reflected in the skills and work experience listed on employees' resumes, can help predict a firm's performance. It also uses large amounts of resume data to investigate whether certain skill-sets are becoming increasingly obsolete in the modern economy.
Skills Classifier
Because the same skill can be worded in many ways on a resume, my spring project centered on building a skills classifier that assigns each employee primary and secondary skill-sets based on the skills listed on their resume. This simplified the problem: differently worded skills such as "software development" and "software engineering" fell into the same bucket, so later results were not skewed by wording. The classifier was a Latent Dirichlet Allocation (LDA) model that takes each employee's basket of skills, together with the number of skill-sets to fit across the entire population. It outputs a matrix of the words that best define each skill-set, along with the skills in each employee's basket that contributed most to the skill-set the employee was assigned. After tuning the number of skill-sets to find the grouping that gave the most interpretable results, I classified employees into one of forty-four skill-sets.
Skills Evolution
My fall project focused on skill prediction: inferring a person's future set of skills from his or her current set of skills. This was a much harder problem and involved experimenting with various language models and hidden Markov models to build an accurate skill predictor. The models were trained on time-series data that captures how skills evolve over a person's career, including skill acquisition and skill deletion.
At each point in time, I could see an employee's newly updated basket of skills. Since each skill change could have come from any of the previous skills, I built a time-series graph in which each time-stamp holds a bag of skills, and I added directed edges from every skill at the previous time-stamp to the skill that was changed or updated. For each employee, I then generated skill "sentences" by starting at the source nodes and following a randomly chosen directed edge at each node, producing sequences that trace skill evolution over time. With the data in sentence form, I used KenLM, an n-gram language modeling toolkit, to reduce the problem to next-word prediction, treating the next word in a sentence as the next skill to learn. This became my preliminary prediction model, which I am still refining.
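The graph construction, random-walk sampling, and next-skill prediction described above can be sketched in plain Python. This is a hedged illustration under several assumptions: nodes are hypothetical `(time-stamp index, skill)` pairs, only newly added skills receive incoming edges (skill deletion is ignored), and a simple bigram counter stands in for KenLM, which builds smoothed n-gram models rather than raw counts. The toy `history` and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical history for one employee: ordered time-stamps, each holding
# the full bag of skills on the resume at that point.
history = [
    {"excel", "accounting"},
    {"excel", "accounting", "sql"},
    {"excel", "sql", "python"},
    {"sql", "python", "machine learning"},
]

def build_skill_graph(history):
    """Each skill added at time t gets a directed edge from every skill held at t - 1."""
    edges = defaultdict(list)
    has_parent = set()
    for t in range(1, len(history)):
        added = history[t] - history[t - 1]
        for prev_skill in sorted(history[t - 1]):
            for new_skill in sorted(added):
                edges[(t - 1, prev_skill)].append((t, new_skill))
                has_parent.add((t, new_skill))
    nodes = {(t, s) for t, bag in enumerate(history) for s in bag}
    sources = sorted(n for n in nodes if n not in has_parent)  # no incoming edges
    return edges, sources

def sample_sentences(edges, sources, walks_per_source=3, seed=0):
    """From each source node, follow a random outgoing edge until none is left."""
    rng = random.Random(seed)
    sentences = []
    for source in sources:
        for _ in range(walks_per_source):
            node, sentence = source, [source[1]]
            while edges.get(node):
                node = rng.choice(edges[node])
                sentence.append(node[1])
            sentences.append(sentence)
    return sentences

def train_bigram(sentences):
    """Count skill -> next-skill transitions (a simple stand-in for KenLM)."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        for prev_skill, next_skill in zip(sentence, sentence[1:]):
            follows[prev_skill][next_skill] += 1
    return follows

def predict_next(follows, skill):
    """Most frequent follower of `skill`, or None if it was never followed."""
    if skill not in follows or not follows[skill]:
        return None
    return follows[skill].most_common(1)[0][0]

edges, sources = build_skill_graph(history)
sentences = sample_sentences(edges, sources)
model = train_bigram(sentences)
print(predict_next(model, "sql"))  # python
```

In this toy history every step adds a single skill, so each walk is deterministic. With several skills added at once, `rng.choice` picks among the outgoing edges, and repeated walks sample different paths.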