Data Cleaning Challenge
DATA CLEANING CHALLENGE WITH PANDAS
Tailored specifically for aspiring data scientists, data analysts, and any other data professionals, this data cleaning challenge is designed to provide you with hands-on experience in tackling the most common data cleaning tasks faced by professionals in the field. With a collection of six carefully curated files in various formats, including Excel and CSV, you will have the opportunity to practice and refine your data wrangling skills in a dynamic and realistic setting.
You are given two files with sales information from an industrial company that produces, distributes, and sells electronics worldwide.
The goal of the project is to clean the sales data stored in the two datasets, merge them, and produce a single aggregated dataframe, in both long and wide format, with the 2020 revenue broken down by product, branch, sector, and quarter. The final dataframe must be ready to load into well-known visualization tools like Qlik or Tableau so that senior managers can quickly check company data and create reports and charts.
In addition to the two files with retail (129,000 rows) and wholesale (9,400 rows) data, you receive four extra files needed to complete some of the tasks required to create the final dataset. Your task consists of cleaning both tables, combining them with the other files, and calculating the 2020 revenue according to several conditions.
Cleaning data with Pandas involves dealing with null values, unwanted characters, duplicates, column types, and string manipulation. You will also have to create new columns, change date formats, standardize and replace values through dictionaries, and apply list comprehensions, among other tasks, to build the final table.
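To give a feel for these tasks, here is a minimal sketch of a typical cleaning pass. It is not taken from the challenge or its solutions: the sample rows, the column names (`product`, `branch`, `revenue`, `date`) and the `branch_names` dictionary are all invented for illustration.

```python
import pandas as pd

# Hypothetical sample rows; the real challenge files have different columns.
raw = pd.DataFrame({
    "product": [" Laptop ", "laptop", "Phone*", None, "Phone*"],
    "branch": ["DE", "DE", "US", "US", "US"],
    "revenue": ["1,200", "1,200", "850", "300", "850"],
    "date": ["2020-01-15", "2020-01-15", "2020-04-02", "2020-07-20", "2020-04-02"],
})

# Hypothetical mapping used to standardize branch codes.
branch_names = {"DE": "Germany", "US": "United States"}

# Drop rows with a missing product name.
clean = raw.dropna(subset=["product"]).copy()

# Strip whitespace and unwanted characters, then normalize the casing.
clean["product"] = (
    clean["product"]
    .str.strip()
    .str.replace("*", "", regex=False)
    .str.title()
)

# Fix column types: revenue as float, date as datetime.
clean["revenue"] = clean["revenue"].str.replace(",", "", regex=False).astype(float)
clean["date"] = pd.to_datetime(clean["date"], format="%Y-%m-%d")

# Create a new column from the date and replace values through a dictionary.
clean["quarter"] = "Q" + clean["date"].dt.quarter.astype(str)
clean["branch"] = clean["branch"].replace(branch_names)

# Remove the duplicates that only became visible after standardizing.
clean = clean.drop_duplicates().reset_index(drop=True)

print(clean)
```

Note that the duplicates are dropped last: the two laptop rows only become identical once whitespace and casing have been standardized.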
Anastasia has divided the project into more than 30 data cleaning exercises so you can complete the challenge step by step. It is a great project for practicing the Pandas library and gaining confidence with data manipulation. You will work on most of the tasks that data scientists and data analysts across a wide range of industries must perform to prepare and exploit data.
Anastasia Migunova is a data scientist currently working at a Big Four firm. Based in Germany, she holds a Ph.D. in Applied Math and an M.A. in Computer Science.
WHAT CAN YOU EXPECT FROM THIS PANDAS PROJECT
- Remove, select, rename, and filter columns and rows: Gain mastery over data manipulation techniques by learning how to extract and transform the exact information you need.
- Handle null values: Discover effective strategies to deal with missing data and ensure the integrity of your analyses.
- Manage data types: Learn how to convert, validate, and manipulate data types to ensure consistency and accuracy.
- Perform conditional slicing: Harness the power of conditional statements to extract valuable insights from your datasets based on specific criteria.
- Utilize groupby: Unleash the power of groupby operations to aggregate and analyze data based on categorical variables.
- Tackle duplicates: Acquire the skills to identify and handle duplicate records, ensuring the reliability and quality of your data.
- Convert to long and wide format: Learn how to reshape your data to suit different analytical needs, whether it be long or wide format.
- Master handling dates: Discover techniques to handle dates and time series data effectively, enabling in-depth temporal analysis.
- Perform merges and joins: Learn how to combine multiple datasets to enrich your analyses and unlock hidden relationships.
- Harness the power of loops, lists, dictionaries, and list comprehension: Strengthen your Python skills by mastering these fundamental programming concepts.
- Unleash the power of apply and lambda functions: Explore advanced techniques to transform and manipulate data using these powerful tools.
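Several of the skills listed above (merges, groupby, and long/wide reshaping) often appear together. The following is a small sketch under assumed data: the `sales` and `products` tables and their column names are hypothetical and do not come from the challenge files.

```python
import pandas as pd

# Hypothetical sales records and a lookup table of product names.
sales = pd.DataFrame({
    "product_id": [1, 1, 2, 2, 2],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
    "revenue": [100.0, 150.0, 200.0, 50.0, 300.0],
})
products = pd.DataFrame({
    "product_id": [1, 2],
    "product": ["Laptop", "Phone"],
})

# Merge: enrich each sale with its product name.
merged = sales.merge(products, on="product_id", how="left")

# Groupby: total revenue per product and quarter (long format).
long_df = merged.groupby(["product", "quarter"], as_index=False)["revenue"].sum()

# Long to wide: one row per product, one column per quarter.
wide_df = long_df.pivot(index="product", columns="quarter", values="revenue").reset_index()
wide_df.columns.name = None

# Wide back to long with melt.
back_to_long = wide_df.melt(id_vars="product", var_name="quarter", value_name="revenue")

print(long_df)
print(wide_df)
```

Long format tends to suit visualization tools that expect one measure per row, while wide format is easier to read as a report table.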
DIGITAL DOWNLOAD / CONTENT
After purchase, you will receive an email with the Python challenge in a ZIP file. The download is always available on your Practity account. The zip includes:
- One PDF with the instructions and guidelines, including the Pandas challenge broken down into 34 exercises.
- Six data files: four spreadsheets and two CSV files.
- A Jupyter Notebook file with the solutions. It contains not only the source code but also detailed explanations and comments about how the code works. The code has been written by a senior developer so it is reliable, clean and easy to understand.
IMPORTANT: to open the solutions notebook you need Jupyter or the Anaconda distribution installed on your machine. Both are free to download.
If you need additional information, do not hesitate to contact us.