Data Cleansing Challenge

Anastasia Migunova is a data scientist currently working at a Big Four firm. Based in Germany, she holds a Ph.D. in Applied Math and an M.A. in Computer Science.

You are given two files with sales information from an industrial company that produces, distributes and sells electronics worldwide.
The goal of the project is to clean the sales data stored in the two data sets, merge them and produce a single aggregated data frame, in both long and wide format, with the 2020 revenue broken down by product, branch, sector and quarter. The final data frame must be ready to load into well-known visualization tools such as Qlik or Tableau so that senior managers can quickly check company data and create reports and charts.
In addition to the two files with retail (129,000 rows) and wholesale (9,400 rows) data, you are provided with four extra files needed to complete some of the exercises required to create the final data set. Your task consists of cleaning both tables, combining them with the other files and calculating the 2020 revenue according to several conditions.
Cleaning involves dealing with null values, unwanted characters, duplicates, column types, string manipulation, etc. You will also create new columns, change date formats, standardize and replace values through dictionaries, apply list comprehensions and more, with the aim of producing a final table that follows the required data model.
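
To give a feel for the kind of cleaning steps listed above, here is a minimal sketch in Pandas. It is not taken from the project's solution Notebook: the sample data, the column names (`Order Date `, `branch`, `sector`, `revenue`), the value formats and the helper `clean_sales` are all assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical sample; column names and values are assumptions, not the project's data.
raw = pd.DataFrame({
    "Order Date ": ["03/01/2020", "15/04/2020", "15/04/2020", None],
    "branch": ["berlin", "MUNICH ", "MUNICH ", "Hamburg"],
    "sector": ["ret", "whs", "whs", "ret"],
    "revenue": ["1.200,50 €", "980,00 €", "980,00 €", "150,00 €"],
})


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning pipeline (hypothetical helper)."""
    df = df.copy()
    # Normalize column names with a list comprehension.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    # Drop rows with a missing date, then remove exact duplicate rows.
    df = df.dropna(subset=["order_date"]).drop_duplicates()
    # Standardize text values and map codes through a dictionary.
    df["branch"] = df["branch"].str.strip().str.title()
    df["sector"] = df["sector"].replace({"ret": "Retail", "whs": "Wholesale"})
    # Remove unwanted characters and convert the column type to float.
    df["revenue"] = (
        df["revenue"]
        .str.replace("€", "", regex=False)
        .str.replace(".", "", regex=False)   # thousands separator
        .str.replace(",", ".", regex=False)  # decimal comma -> decimal point
        .str.strip()
        .astype(float)
    )
    # Parse the date strings and derive a quarter label.
    df["order_date"] = pd.to_datetime(df["order_date"], format="%d/%m/%Y")
    df["quarter"] = "Q" + df["order_date"].dt.quarter.astype(str)
    return df.reset_index(drop=True)


clean = clean_sales(raw)
print(clean)
```

The real files may use different separators, date formats and codes, so each step would need to be adapted after inspecting the data.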

Anastasia has divided the project into more than 30 assignments so you can complete the challenge step by step.
It is a great project for practicing the Pandas library and gaining confidence with data manipulation in Python. You will practice most of the tasks that data scientists and data analysts across a wide range of industries must perform to prepare and exploit data.

You will receive an email with a ZIP file. If you are a registered user, the download is always available in your account.
The downloadable ZIP file contains:
1) One PDF with the instructions and guidelines, including the project broken down into 34 exercises that you may follow in case you need guidance.
2) Six data files: four spreadsheets and two .csv files.
3) A Notebook file with the solutions. It contains not only the source code but also detailed explanations and comments about how the code works. The code has been written by a senior developer, so it is clean and easy to understand.
IMPORTANT: to see the solutions (Notebook) you need to have Jupyter or the Anaconda distribution installed on your machine. If you do not have it, you may download it here. It is free.

– Libraries: Pandas, Numpy, datetime.
– Import and read .csv and Excel files.
– Remove, select, rename, filter columns and rows.
– Nulls.
– Data types.
– Conditional slicing.
– Groupby.
– Duplicates.
– Convert to long and wide format.
– Dates.
– Merge and joins.
– Loops (for).
– Lists and dictionaries.
– apply + lambda.
– List comprehension.
– Melt.
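
As a rough illustration of how several of the topics above fit together (merge, groupby, wide and long format, melt), the sketch below combines two small tables, aggregates revenue and reshapes the result. The tables, column names and values are hypothetical and far simpler than the project files; this is not the solution from the Notebook.

```python
import pandas as pd

# Hypothetical, already-cleaned tables; names and values are assumptions.
retail = pd.DataFrame({
    "product_id": [1, 1, 2],
    "branch": ["Berlin", "Berlin", "Munich"],
    "quarter": ["Q1", "Q2", "Q1"],
    "revenue": [100.0, 150.0, 200.0],
})
wholesale = pd.DataFrame({
    "product_id": [1, 2],
    "branch": ["Berlin", "Munich"],
    "quarter": ["Q1", "Q1"],
    "revenue": [1000.0, 2000.0],
})
products = pd.DataFrame({"product_id": [1, 2], "product": ["Sensor", "Router"]})

# Stack both sources, tagging each row with its sector.
sales = pd.concat(
    [retail.assign(sector="Retail"), wholesale.assign(sector="Wholesale")],
    ignore_index=True,
)
# Left join to bring in the product name from the lookup table.
sales = sales.merge(products, on="product_id", how="left")

# Long format: one row per product, branch, sector and quarter.
keys = ["product", "branch", "sector"]
long_df = sales.groupby(keys + ["quarter"], as_index=False)["revenue"].sum()

# Wide format: one column per quarter, missing combinations filled with 0.
wide_df = long_df.pivot_table(
    index=keys, columns="quarter", values="revenue", aggfunc="sum", fill_value=0
).reset_index()
wide_df.columns.name = None  # drop the leftover "quarter" axis label

# melt turns the wide table back into long format.
back_to_long = wide_df.melt(id_vars=keys, var_name="quarter", value_name="revenue")

print(long_df)
print(wide_df)
```

Visualization tools often handle the long layout more easily for filtering, while the wide layout tends to be more convenient for side-by-side quarter comparisons, which may be why the project asks for both.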

Python 3.8

If you need additional information, do not hesitate to contact us.


Additional information

Time Estimate

2 Days
