Data Cleaning Challenge

Product is rated as #2 in category Python Data Science
9.4

$9.95


Description

DATA CLEANING CHALLENGE WITH PANDAS

 

INSTRUCTOR
Anastasia Migunova is a data scientist currently working at a Big 4 firm. Based in Germany, she holds a Ph.D. in Applied Math and an M.A. in Computer Science.

PROJECT DESCRIPTION
You are given two files with sales information from an industrial company that produces, distributes and sells electronics worldwide.
The goal of the project is to clean the sales data stored in the two datasets, merge them, and produce a single aggregated dataframe, in both long and wide format, with the 2020 revenue broken down by product, branch, sector and quarter. The final dataframe must be ready to load into well-known visualization tools like Qlik or Tableau so that senior managers can quickly check company data and create reports and charts.
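
As a rough illustration of what "long and wide format" means here, the sketch below reshapes a tiny made-up revenue table with Pandas. The column names (product, quarter, revenue) and all values are hypothetical and are not taken from the project files.

```python
import pandas as pd

# Hypothetical aggregated revenue in long format:
# one row per product and quarter (values are made up).
long_df = pd.DataFrame({
    "product": ["A", "A", "B", "B"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100.0, 150.0, 80.0, 120.0],
})

# Long -> wide: one row per product, one column per quarter.
wide_df = long_df.pivot_table(
    index="product", columns="quarter", values="revenue", aggfunc="sum"
).reset_index()
wide_df.columns.name = None  # drop the leftover "quarter" label on the column axis

# Wide -> long again with melt.
back_to_long = wide_df.melt(
    id_vars="product", var_name="quarter", value_name="revenue"
)

print(wide_df)
print(back_to_long)
```

The long layout is usually the easier one to feed into a visualization tool, while the wide layout reads more like a report table.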
In addition to the two files with retail (129000 rows) and wholesale (9400 rows) data, you are provided with four extra files needed to complete some of the exercises and build the final dataset. Your task consists of cleaning both tables, combining them with the other files and calculating the 2020 revenue according to several conditions.
Cleaning involves dealing with null values, unwanted characters, duplicates, column types, string manipulation, etc. Moreover, you will have to create new columns, change date formats, standardize and replace values through dictionaries, apply list comprehensions and much more, with the aim of creating a final table that follows a required data model.
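
To give a feel for these cleaning steps, here is a minimal sketch on a small made-up table. The column names, the currency format and the branch names are all assumptions for illustration; the real datasets have their own structure and the project's solutions may take a different approach.

```python
import pandas as pd

# Hypothetical raw rows with typical problems: a stray space in a header,
# inconsistent capitalization, a null, currency symbols and duplicates.
raw = pd.DataFrame({
    "Branch ": ["berlin", "Berlin", "MUNICH", None, "berlin"],
    "Sale Date": ["03/01/2020", "03/01/2020", "15/04/2020", "20/07/2020", "03/01/2020"],
    "Amount": ["$1,200.50", "$1,200.50", "$300", "$75.25", "$1,200.50"],
})

# List comprehension to standardize the column names.
raw.columns = [c.strip().lower().replace(" ", "_") for c in raw.columns]

# Null values: drop rows without a branch.
clean = raw.dropna(subset=["branch"]).copy()

# Unwanted characters and column types: "$1,200.50" -> 1200.5 (float).
clean["amount"] = clean["amount"].str.replace(r"[$,]", "", regex=True).astype(float)

# Dates: parse day/month/year strings into datetime values.
clean["sale_date"] = pd.to_datetime(clean["sale_date"], format="%d/%m/%Y")

# Standardize values through a dictionary.
branch_names = {"berlin": "Berlin", "munich": "Munich"}
clean["branch"] = clean["branch"].str.lower().map(branch_names)

# Duplicates: rows that became identical after cleaning are removed.
clean = clean.drop_duplicates().reset_index(drop=True)

print(clean)
print(clean.dtypes)
```

Note that the duplicates only become detectable after the values have been standardized, which is why drop_duplicates comes last in this sketch.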

Anastasia has divided the project into more than 30 assignments so you can complete the challenge step by step.
It is a great project to learn Python and practice the Pandas library and to get confident with data manipulation. You will practice the majority of tasks data scientists and data analysts from a wide range of industries must perform to prepare and exploit data.

DIGITAL DOWNLOAD / CONTENT
You will receive an email with a ZIP file. In addition, the download is always available on your account.
Content
1) One PDF with the instructions and guidelines, including the project broken down into 34 exercises that you may follow in case you need guidance.
2) 6 files with data: 4 spreadsheets and 2 .csv files.
3) A Notebook file with the solutions. It contains not only the source code but also detailed explanations and comments about how the code works. It is a Python tutorial about data cleaning. The code has been written by a senior developer, so it is clean and easy to understand.
IMPORTANT: to view the solutions (Notebook) you need to have Jupyter or the Anaconda distribution installed on your machine. If you do not have it, you may download it here. It is free.

WHAT YOU WILL PRACTICE
– Python Libraries: Pandas, Numpy, datetime.
– Import and read .csv and Excel files.
– Remove, select, rename, filter columns and rows.
– Nulls.
– Data types.
– Conditional slicing.
– Groupby.
– Duplicates.
– Convert to long and wide format.
– Dates.
– Merge and joins.
– Loops (for).
– Lists and dictionaries.
– apply + lambda.
– List comprehension.
– Melt.
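
Several of the topics above (merges, apply + lambda, dates, groupby) tend to appear together. The sketch below combines them on made-up data; the table names, the column names, the price lookup and the use of "sector" as a retail/wholesale label are assumptions for illustration only, not the project's actual data model.

```python
import pandas as pd

# Hypothetical retail and wholesale sales (values are made up).
retail = pd.DataFrame({
    "product_id": [1, 2, 1],
    "date": pd.to_datetime(["2020-01-15", "2020-05-02", "2020-05-20"]),
    "units": [10, 5, 4],
})
wholesale = pd.DataFrame({
    "product_id": [2, 1],
    "date": pd.to_datetime(["2020-02-10", "2020-11-30"]),
    "units": [100, 50],
})
# Hypothetical lookup table with product names and unit prices.
prices = pd.DataFrame({
    "product_id": [1, 2],
    "product": ["Router", "Switch"],
    "unit_price": [20.0, 35.0],
})

# Stack both tables, labelling each row with its origin.
sales = pd.concat(
    [retail.assign(sector="retail"), wholesale.assign(sector="wholesale")],
    ignore_index=True,
)

# Merge (left join) to bring in product names and prices.
sales = sales.merge(prices, on="product_id", how="left")
sales["revenue"] = sales["units"] * sales["unit_price"]

# apply + lambda to derive the quarter from each date.
sales["quarter"] = sales["date"].apply(lambda d: f"Q{d.quarter}")

# Groupby to aggregate revenue by product, sector and quarter.
revenue = sales.groupby(
    ["product", "sector", "quarter"], as_index=False
)["revenue"].sum()

print(revenue)
```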

VERSION
Python 3.8

CONTACT
If you need additional information, do not hesitate to contact us.


Additional information

Specification: Data Cleaning Challenge

Time Estimate

2 Days

Reviews (1)

1 review for Data Cleaning Challenge

  1. Roninis

    Pandas is the most important Python library if you want to jump into the data analysis domain. The workload is fair and it covers all the necessary Pandas topics, such as data wrangling, aggregation, merges and strings.
    The solutions to the exercises come in a Jupyter Notebook and they are concise, well structured and properly explained.
