What are the skills required for entry-level data analysis jobs?

By Terenci Claramunt

After years of pursuing data analysis as a hobby, I have finally decided to turn my passion into a career. I'm halfway through my Data Science degree, which has enhanced my mathematical and statistical skills and introduced me to fascinating new areas of data science, such as data warehousing and the AWK programming language. However, while researching the job market I realized that some of the most sought-after skills for data analysts, such as Excel or Business Intelligence tools like Tableau, are not included in my university's curriculum, so I'll have to develop these on my own. Considering the multitude of skills required for data analyst positions, which ones should I prioritize?

Web scraping

To answer this question, I created a web scraper that extracts data from LinkedIn job postings and stores it in an SQL database. The scraper uses Python's Selenium library to automate the browser, simulating a user looking for job offers. It's structured in three classes: the first, LinkedIn, handles basic browsing and user login; the second, Search, gets the number of result pages for a given search and extracts the job posting URLs from a specific results page; and the third, Job, extracts a job posting's details (title, description, company, etc.). Here is a simple example of how they can be used:

from scraper.LinkedIn import LinkedIn
from scraper.Search import Search
from scraper.Job import Job

credentials = {
    'email': 'example@mail.com',
    'password': 'DontHackMePlz'
}

linkedin = LinkedIn(requests_per_minute=20)
linkedin.login(credentials)

query = {
    'keywords': 'data analyst',
    'location': 'European union'
}

search = Search(query)
# Visit every results page and print the details of each job posting
for page_number in search.page_range():
    search.go_to_page(page_number)
    urls = search.get_urls()
    for url in urls:
        job = Job(url).as_dict()
        print(job)

linkedin.close()
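The example above only prints each job, while the scraper itself stores the results in an SQL database. A minimal sketch of that storage step could look like the following. It assumes SQLite through Python's built-in sqlite3 module and a simplified table; the url column, the save_job helper and the example values are hypothetical and not taken from the actual scraper.

```python
import sqlite3


def save_job(conn, job):
    """Insert one scraped job (a dict) into the jobs table, skipping repeats."""
    # Assumed, simplified schema: the url column is a hypothetical unique key.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS jobs (
            url TEXT PRIMARY KEY,
            job_posting_title TEXT,
            company TEXT,
            location TEXT
        )
        """
    )
    # Named placeholders are filled from the dict; repeated URLs are ignored.
    conn.execute(
        "INSERT OR IGNORE INTO jobs "
        "VALUES (:url, :job_posting_title, :company, :location)",
        job,
    )
    conn.commit()


# In-memory database so the sketch runs without touching the disk.
conn = sqlite3.connect(":memory:")

# Made-up posting, standing in for the output of Job(url).as_dict().
example_job = {
    "url": "https://example.com/jobs/1",
    "job_posting_title": "Junior Data Analyst",
    "company": "Example Corp",
    "location": "Greater Milan Metropolitan Area",
}

save_job(conn, example_job)
save_job(conn, example_job)  # the second insert of the same URL is ignored

row_count = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(row_count)
```

Using the posting URL as the primary key is one simple way to avoid storing the same posting twice when the scraper runs daily.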

I limited my search to the set of jobs I'm personally interested in: entry-level jobs related to data analysis and business intelligence within the European Union (EU). The scraper collected information such as the job posting title and description, the company name, sector and location, the job title, and the list of skills required by the posting, which is automatically generated by LinkedIn from the job description. After an initial exploration of the data collected by the scraper I defined some additional questions that could be insightful to answer: Which countries and cities have the highest number of job postings? What's the proportion of remote jobs? Which industries are looking for the most data analysts?

Data cleaning

After two months of daily data gathering, I decided to dive into the dataset of LinkedIn job postings. Fortunately, the data was already well-structured and required very few adjustments before analysing it. Since the database included postings for roles unrelated to data analysis, I filtered it by job titles using the following SQL query:

SELECT
  job_posting_title, company, company_sector,
  location, job_title, mode, skills
FROM jobs
WHERE job_title IN (
  'Data Analyst', 'Data Specialist', 'Analyst',
  'Business Analyst', 'Business Intelligence Specialist',
  'Junior Business Analyst', 'Data Collector',
  'Analytics Specialist', 'Business Intelligence Engineer',
  'Digital Analyst', 'Data Consultant',
  'Quality Control Analyst', 'Business Intelligence Analyst',
  'Analytics Analyst', 'Insights Analyst',
  'Operations Analyst', 'Quality Assurance Analyst',
  'Data Associate', 'Sales Analyst',
  'Reporting Specialist', 'Business Intelligence Consultant',
  'Business Intelligence Developer', 'Functional Analyst',
  'Data Assistant', 'Business Data Analyst',
  'Research Analyst', 'Test Analyst',
  'Data Quality Analyst', 'Social Media Analyst',
  'Innovation Analyst', 'Reporting Analyst',
  'Assistant Business Analyst', 'Business Intelligence Expert',
  'Human Resources Analyst', 'Sales Operations Analyst',
  'Quality Analyst', 'Internet Analyst',
  'Associate Business Analyst', 'Performance Analyst',
  'Intelligence Analyst', 'Market Data Analyst',
  'Financial Analyst', 'Junior Financial Analyst',
  'Marketing Analyst', 'Junior Marketing Analyst'
)
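If the filter is run from Python instead of a SQL client, the long list of titles can be passed as query parameters rather than written into the query string. The sketch below assumes SQLite through the built-in sqlite3 module, a shortened list of titles and a tiny made-up table, so the data and the two-column schema are illustrative only.

```python
import sqlite3

# Shortened, illustrative version of the list of relevant job titles.
relevant_titles = ["Data Analyst", "Business Analyst", "Reporting Analyst"]

# Tiny in-memory table with made-up rows, standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_title TEXT, company TEXT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [
        ("Data Analyst", "A"),
        ("Software Engineer", "B"),
        ("Reporting Analyst", "C"),
    ],
)

# One "?" placeholder per title, so the values are bound safely by the driver.
placeholders = ", ".join("?" for _ in relevant_titles)
query = (
    "SELECT job_title, company FROM jobs "
    f"WHERE job_title IN ({placeholders}) ORDER BY company"
)

rows = conn.execute(query, relevant_titles).fetchall()
print(rows)
```

Keeping the titles in a Python list also makes it easy to add or remove a title without editing the SQL itself.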

After removing duplicates and a few postings that contained errors, I ended up with 3569 relevant jobs. There was only one problem left to solve: the job location extracted from LinkedIn postings doesn't have a strict format, so it can contain anything from the name of a city or a metropolitan area to just a country. Luckily, we can feed a location to Google's Geocoding API to get the city and the country it corresponds to. To do this in Python, we need to obtain an API key and then call the geocode function from the googlemaps package with our location, which returns the parsed JSON response with various components such as locality, country and the names of the administrative regions in between:

import googlemaps

gmaps = googlemaps.Client(key='D6o7N3T2s1t0E7-A6L')

location = "Greater Milan Metropolitan Area"

geocoded = {}
response = gmaps.geocode(location)
for component in response[0]['address_components']:
  if 'locality' in component['types']:
    geocoded['city'] = component['long_name']
  if 'country' in component['types']:
    geocoded['country'] = component['long_name']

print(geocoded)

This prints:

{'city': 'Milan', 'country': 'Italy'}
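With thousands of postings, many locations repeat and some may not match any place at all. The sketch below wraps the parsing logic shown above in a function that tolerates an empty response and adds a small cache so each distinct location is geocoded only once. The helper names parse_geocode and geocode_cached are hypothetical, and a fake geocoding function with a hand-written response replaces the real API call so that the example runs without an API key.

```python
def parse_geocode(response):
    """Extract the city and country from a list of geocoding results."""
    geocoded = {"city": None, "country": None}
    if not response:  # assumed: no match yields an empty list of results
        return geocoded
    for component in response[0]["address_components"]:
        if "locality" in component["types"]:
            geocoded["city"] = component["long_name"]
        if "country" in component["types"]:
            geocoded["country"] = component["long_name"]
    return geocoded


def geocode_cached(location, geocode_fn, cache):
    """Geocode a location, reusing the stored result for repeated names."""
    if location not in cache:
        cache[location] = parse_geocode(geocode_fn(location))
    return cache[location]


# Hand-written response shaped like the one used above (not real API output).
sample_response = [{
    "address_components": [
        {"long_name": "Milan", "short_name": "Milan",
         "types": ["locality", "political"]},
        {"long_name": "Italy", "short_name": "IT",
         "types": ["country", "political"]},
    ]
}]

calls = []


def fake_geocode(location):
    """Stand-in for gmaps.geocode that records how often it is called."""
    calls.append(location)
    return sample_response


cache = {}
first = geocode_cached("Greater Milan Metropolitan Area", fake_geocode, cache)
second = geocode_cached("Greater Milan Metropolitan Area", fake_geocode, cache)
print(first, len(calls))
```

In real use, gmaps.geocode would be passed as geocode_fn in place of fake_geocode.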

Data analysis

Question 1: Which countries and cities in the EU have the highest number of entry-level data analysis job postings?

Let's visualize the dataset using a map to represent two different kinds of data: The purple dots show the top 25 cities with the most job openings, with the size of the dot indicating the proportion of jobs from that city. Additionally, EU countries are shaded from yellow (low job density) to red (high job density), according to the number of jobs for each country in the dataset per 1 million people:

As we can observe, north-western Europe has the highest concentration of entry-level data analysis job openings, since 4 of the top 5 countries with the most jobs relative to their population are in this area: Luxembourg, with 20.14 jobs per 1 million inhabitants; Ireland, with 19.57; the Netherlands, with 13.70; and Belgium, with 9.37. Additionally, the city with the most job openings, Paris, is also in this area. The fifth country in the top 5 is Malta, which, despite having a relatively small population of 0.5 million people, has the same number of job openings in the dataset as larger countries such as Estonia.
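The density figures above are simply the number of postings divided by the population, scaled to 1 million inhabitants. A minimal sketch of that calculation follows; the country names, job counts and populations in it are made up for illustration and are not values from the dataset.

```python
def jobs_per_million(job_count, population):
    """Return the number of job postings per 1 million inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return job_count * 1_000_000 / population


# Hypothetical (job count, population) pairs, NOT taken from the dataset.
hypothetical_countries = {
    "Country A": (10, 500_000),
    "Country B": (120, 10_000_000),
}

density = {
    country: round(jobs_per_million(jobs, population), 2)
    for country, (jobs, population) in hypothetical_countries.items()
}

# Rank countries from the highest to the lowest job density.
ranked = sorted(density.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

Normalising by population is what lets a small country appear near the top of the ranking even with few postings in absolute terms.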

It's also interesting to note that jobs are mostly concentrated in large metropolitan areas. We can observe this in countries such as France, whose capital has the highest number of jobs in the dataset even though the country as a whole has a low density of job openings relative to its population. In fact, the list of the largest EU cities by population correlates closely with the list of top cities in our dataset. Only a few of the top 25 cities have many more data analyst job openings than their population sizes would suggest: Dublin, with 3.42% of job postings, Milan, with 3.37%, Amsterdam, with 3.15%, and Vilnius, with 0.65%.

Question 2: What's the proportion of remote entry-level data analysis jobs in the EU?

53% of job postings are for on-site jobs, 37% for hybrid jobs, and only 10% for remote jobs, as we can visualize in the following doughnut chart:

Question 3: Which industries are looking for the most entry-level data analysts in the EU?

Let's visualize the percentage of job openings for the top sectors in the dataset:

During the spring of 2023, the Information Technology & Services sector was leading the demand for entry-level data analysts in the EU, with 23.4% of total job postings. These companies provide a variety of technology-related services, including IT consulting and support, software development, cloud services, and data management. The companies Appen and TELUS stand out as the leading recruiters in this sector.

The Staffing & Recruiting industry comes second, accounting for 13.45% of job postings, which shows the importance of data in hiring processes. These companies specialize in sourcing, recruiting, and placing workers into positions across a wide range of industries, and also offer services such as HR consulting and employee training. The top posters in this sector are Michael Page, Page Personnel and Adecco.

Next is the Financial Services sector, with 6.02% of job postings. The main activity of these firms is money management; the sector includes investment banks, credit card companies, stock brokerages, financial information companies, investment funds, etc. Some notable examples from the dataset are BNP Paribas CIB and Société Générale CIB.

The following sectors are Computer Software, with 5.77% of job postings and join.com as the top poster; Human Resources, with 4.79% and Free-Work as the top poster; Management Consulting, with 4.37% and Deloitte as the top poster; Banking, with 3.73% and BNP Paribas as the top poster; and Retail, with 2.89% and Carrefour as the top poster. Note the diversity of industries in the dataset, which shows how ubiquitous data is in contemporary businesses across all domains.

Question 4: What are the skills required for entry-level data analysis jobs in the EU?

We can visualize the 50 most frequent skills using a word cloud, where the size of each term corresponds to its relative importance, measured by the proportion of job postings that require the skill:

Word Cloud
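The importance measure used for the word cloud, the proportion of job postings that require each skill, could be computed along these lines. The sketch assumes each posting's skills are available as a list of strings; how the skills column is actually stored in the database is not shown here, and the sample postings are made up.

```python
from collections import Counter


def skill_shares(postings):
    """Return the share of postings that list each skill, most common first."""
    total = len(postings)
    if total == 0:
        return {}
    counts = Counter()
    for skills in postings:
        # A set ensures a skill is counted at most once per posting.
        counts.update(set(skills))
    return {skill: count / total for skill, count in counts.most_common()}


# Made-up postings, each represented by its list of required skills.
sample_postings = [
    ["SQL", "Excel"],
    ["SQL", "Communication"],
    ["Excel", "SQL", "SQL"],
    ["Communication"],
]

shares = skill_shares(sample_postings)
print(shares)
```

The resulting shares can then be used as the term weights when drawing the word cloud.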

The word cloud clearly shows the importance of non-technical skills, such as critical thinking, problem-solving, and communication. I classified the 50 skills into 5 categories, each with its own subgroups; each skill is followed by the percentage of job listings it appears in:

1 Analytical skills:

Analytical skills are essential for any data analyst, since data analysis is not just about transforming and visualising data, but turning it into valuable insights that can be used by business stakeholders to guide data-driven decision-making. It's clear that employers value highly the ability to think critically and solve complex problems. This group includes skills such as:

2 Data skills:

Data skills form the technical foundation of a data analyst's job, encompassing a wide range of abilities necessary to handle and interpret data effectively. In this group, Visualization and Databases are the most sought-after skills, followed by Statistics and Data Modelling:

3 Soft skills:

A data analyst needs to collaborate with a diverse range of individuals within an organization; consequently, excellent communication skills top the list of most sought-after soft skills. English is also important, given that a significant portion of the companies in the dataset operate on a global scale and their teams are composed of people of diverse nationalities.

4 Software skills:

As we have seen previously, visualization and databases are some of the most crucial technical skills. For visualization, Excel and Power BI stand out amongst similar tools; for databases, SQL is the most sought-after skill in the dataset.

5 Domain knowledge:

Apart from soft and technical skills, companies also require knowledge of the field the analyst will be working in. These are the 4 most common in the database:

Conclusion

This project has been an enriching journey, both as a technical exercise and as a source of insight into the dynamic landscape of the data analytics job market. This has been my second venture into web scraping; learning how to use Python's Selenium library to extract data from LinkedIn has been a rewarding challenge, and I will continue to work on my web scraping skills. Additionally, I learned how to use Google's Geocoding API to convert a location name into structured data, which will surely prove a useful skill for future projects.

I conceived this project as a simple addition to my data analysis portfolio, but it has proven to be a valuable aid in deciding what skills I should work on in my journey towards becoming a data analyst. Among the many technical and soft skills highlighted by my analysis, I've chosen to focus my learning efforts on Excel and Power BI, since these tools appear in numerous job postings but are not part of my university's curriculum.

Additional resources

View the web scraper's code on GitHub

View a sample of the cleaned data on Kaggle

Contact

For any additional questions or business inquiries contact me through LinkedIn or the following form: