PYTHON PANDAS EXERCISES
DATA MANAGEMENT AND DATA CEANING PRACTICE PROJECTS
This is a great compilation of Python Pandas exercises and projects to master data management and data analysis with Python and the popular Pandas library.
This practice contains the following assignments that are also available in the corresponding link.
Anastasia Migunova is a data scientist with wide experience in big consulting firms among others. Currently based in Germany, she holds a Ph.D. in Applied Maths and M.A. in Computer Science.
This practice consists of more than 40 exercises (focused on Pandas and Numpy) that replicate the tasks that data analysts or data scientists must perform to clean and analyze a dataset. You will practice the most used Pandas functions and tools. There are also visualization problems as well as some Python coding.
Once you finish it, you will acquire the basic knowledge about Pandas required for data analysis in the real -world.
You are given two files with sales information of an industrial company that produces, distributes and sells electronics worldwide.
The goal of the project is to clean sales data stored in the two datasets, merge them and get a single aggregated dataframe in long and wide format with the 2020 revenue broken down by product, branch, sector and quarter. The final dataframe must be ready to be loaded in well-known visualization tools like Qlik or Tableu so that senior managers can quickly check company data and create reports and charts.
In addition to the two files with retail (129000 rows) and wholesale (9400 rows) data, you are provided with four extra files to complete some of the exercises required to create the final dataset. Your task consists of cleaning both tables, combine them with the other files and calculate the revenue of 2020 based on several conditions.
Cleaning involves dealing with null values, unwanted characters, duplicates, column types, string manipulation, etc. Moreover, you will have to create new columns, change date formats, standardize and replace values through dictionaries, apply list comprehension and many more with the aim of creating a final table with a specific data model.
Anastasia has divided the project in more than 30 Pandas assignments so you can complete the challenge step by step.
It is a great project to learn Python, practice the Pandas library and get confident with data manipulation. You will practice the majority of tasks data scientists and data analysts from a wide range of industries must perform to prepare and exploit data.
These Pandas exercises are focused on Excel. Their goal is to help spreadsheet users migrate from Excel to Python. The practice comes with 17 assignments with more than 35 exercises to practice how to implement in Pandas and Python the most common functions and formulas of Excel.
Some of the problems are real with real datasets, based on the tasks different types of professionals must perform with the spreadsheet. The exercises replicate real-life problems so you can learn to solve them with Python instead of Excel, especially while dealing with large amounts of data. Some of the exercises are available in our Excel projects section.
Replacing Excel with Python is a hard step, many tasks may seem easier and faster to do in the spreadsheet but in the end it is a manual tool and therefore the risk of errors is very high. Pandas allows you to automatically do the same things with code, and export the result to Excel so the final output is the same except for the fact that the reliability of the result is unbeatable.
The practice is focused on common topics like:
- Consolidate in one data frame tables placed in different Excel worksheets or files.
- Implement functions and use methods to replace Excel formulas or functions, such as SUMIF, VLOOKUP, COUNTIF, INDEX/MATCH, PIVOT TABLES, etc. The problems replicate business cases where this type of functions is a must.
- Clean data stored in several worksheets. Deal with null values, unwanted characters, duplicates, column types, string manipulation and so on.
- Prepare tables for calculations and data validations. Combine, filter and merge files or different datasets, drop unnecessary fields. or apply functions to create new columns based on multiple conditions.
Exercises and Projects included
- Budget & Costs: Given two spreadsheets with the budget and expenses of 20 different departments of a multinational corporation, compare them and find teams with the highest deficit and surplus. Disclose also the most common type of expense among certain teams.
- Create a mortgage calculator. You will have to calculate monthly payments of a fix interest rate loan and build the amortization schedule.
- Exercises to compare and summarize data through breakdown tables.
- Problems about finding, matching and retrieving values.
- Problems where you will have to write code that yields the same output as lookup functions such as vlookup, hlookup, or reference functions like index and match.
DOWNLOAD / CONTENT
You will receive an email with a ZIP file. The download is always available on your Practity account.
The ZIP contains the following files:
- Instructions PDF
1) One PDF with the data wrangling exercises description.
2) One PDF with the data cleaning instructions along with the project broken down into 34 exercises that you may follow in case you need guidance.
3) One PDF with Excel exercises description
1) A link to the datasets you will use in the project. The datasets are public data.
2) 4 spreadsheets and 2 “.csv” files for the data cleaning challenge.
3) 17 Excel files with the data required for the Excel exercises
1) Three Jupyter Notebooks with the solutions for each assignment. It contains not only the source code but also explanations and comments about how the code works.
2) Excel solutions: 15 Excel files with the exercises solved in Excel.
WHAT YOU WILL PRACTICE
– Python Libraries: Pandas, numpy, datetime, matplotlib and seaborn.
– Import/export gzip, csv and Excels.
– Remove, select, rename, filter columns and rows.
– Data types.
– Outliers and duplicates.
– Convert long to wide format.
– Basic Regex.
– Date formats.
– Data engineering.
– One hot encoding.
– Merge and joins.
– Loops (for).
– Visualization: boxplot, bars, countplot, scatterplot, heatmaps, pairwise, etc.
– Import/export Excels files.
– Remove, select, rename, filter columns and rows.
– Conditional slicing.
– apply + lambda