How to Add New Columns to a Pandas DataFrame

In data analysis and manipulation, we often need to add new columns to a Pandas DataFrame, for example to store calculated values, enrich records with extra information, or flag rows that meet a condition. In this comprehensive guide, we will explore different methods to add new columns to an existing DataFrame in Pandas.

Method 1: Adding an Empty Column or a Column with a Constant Value

The first method is for adding a new column with either an empty value or a constant value for all rows in the DataFrame. This can be useful when we want to initialize a column before populating it with actual values.
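To add an empty column, we can assign a missing value such as "np.nan" (or an empty string). To add a constant column, we assign the constant itself. In both cases pandas repeats the single value in every row. Assigning an empty list "[]" only works on a DataFrame with no rows: otherwise pandas raises a ValueError because the number of values does not match the number of rows. For example: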

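import numpy as np

## Option 1: empty column filled with missing values (NaN)
df['new_column'] = np.nan

## Option 2: empty column filled with empty strings
df['new_column'] = ''

## Option 3: column with a constant value
df['new_column'] = 0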
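Here is a minimal illustration of these options, using a small hypothetical DataFrame (the column names are ours). It confirms that a single value is repeated in every row and that an empty list is rejected when the DataFrame already has rows.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row DataFrame used only for illustration
df = pd.DataFrame({'product': ['TV', 'phone', 'laptop']})

# A single value on the right-hand side is repeated in every row
df['empty_nan'] = np.nan      # missing values (float column)
df['empty_str'] = ''          # empty strings (text column)
df['constant'] = 0            # the same constant in every row

print(df)

# An empty list has length 0, which does not match the 3 rows
try:
    df['from_list'] = []
except ValueError as err:
    print('ValueError:', err)
```

The choice between the options mostly comes down to the dtype you want. "np.nan" produces a float column, and '' produces a text column.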

Method 2: New Column Derived from an Existing One

Another common scenario is adding a new column that is derived from an existing field in the DataFrame. This can involve performing calculations, applying functions, or mapping values from one column to another.
To create a new column from a calculation, we select the existing column with square brackets and apply the operation to it. pandas then performs the operation element-wise on every row. For example, given the following sales dataset, let's add two extra fields:

## Our dataset is sales
print(sales)

ID  product  category     price  currency
A1  TV       electronics  500.0  USD
A2  phone    electronics  175.0  EUR
A3  laptop   electronics  450.0  USD
A4  screen   electronics  NaN    NaN

# Add a new column with the final price including a tax (VAT) of 20%
sales['final_price'] = sales['price'] * 1.2
## Output
ID  product  category     price  currency  final_price
A1  TV       electronics  500.0  USD       600.0
A2  phone    electronics  175.0  EUR       210.0
A3  laptop   electronics  450.0  USD       540.0
A4  screen   electronics  NaN    NaN       NaN
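If you want to run this step yourself, the following self-contained version rebuilds the sales table above (values copied from it) and adds final_price. The missing price in the screen row simply produces a missing final price.

```python
import numpy as np
import pandas as pd

# The sales dataset from the table above
sales = pd.DataFrame({
    'ID': ['A1', 'A2', 'A3', 'A4'],
    'product': ['TV', 'phone', 'laptop', 'screen'],
    'category': ['electronics'] * 4,
    'price': [500.0, 175.0, 450.0, np.nan],
    'currency': ['USD', 'EUR', 'USD', np.nan],
})

# Arithmetic on a column is applied element-wise to every row;
# a missing price (NaN) produces a missing final price
sales['final_price'] = sales['price'] * 1.2

print(sales)
```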

To map values from one column to another based on a dictionary, we can use the "map()" method, which also accepts a Series or a function. For instance, we can add the origin country of each product through a dictionary:

## We define the dictionary
origin = {'TV':'Germany','phone':'China','laptop':'India','screen':'Taiwan'}

## We apply the map() function to fill in the new column, based on the existing "product" column.
## Products not included in the dictionary would get NaN, which "fillna()" replaces with 'unknown'.
sales['origin_country'] = sales['product'].map(origin).fillna('unknown')
## Output

ID  product  category     price  currency  final_price  origin_country
A1  TV       electronics  500.0  USD       600.0        Germany
A2  phone    electronics  175.0  EUR       210.0        China
A3  laptop   electronics  450.0  USD       540.0        India
A4  screen   electronics  NaN    NaN       NaN          Taiwan
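Every product in the sales example has an entry in the dictionary, so fillna() has nothing to replace there. The sketch below uses a hypothetical 'tablet' product that is missing from the dictionary to show the case fillna() is meant for. It also shows map() accepting a function, here Python's built-in len.

```python
import pandas as pd

origin = {'TV': 'Germany', 'phone': 'China', 'laptop': 'India', 'screen': 'Taiwan'}

# 'tablet' is a hypothetical product with no entry in the dictionary
products = pd.DataFrame({'product': ['TV', 'laptop', 'tablet']})

# Keys missing from the dictionary are mapped to NaN
products['origin_raw'] = products['product'].map(origin)

# fillna() replaces those NaN values with a default label
products['origin_country'] = products['origin_raw'].fillna('unknown')

# map() also accepts a function, called once per value
products['name_length'] = products['product'].map(len)

print(products)
```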

Method 3: Adding a New Column with loc[]

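The "loc[]" indexer accesses DataFrame elements by label. If we assign a value to a column label that does not exist yet, a new column is appended at the end of the DataFrame. If the label already exists, its values are overwritten instead. ("iloc[]", which works with integer positions, cannot create new columns.)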

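## "currency" already exists in sales, so this sets it to 'USD' in every row;
## with a label that does not exist yet, the same line would add a column at the end of the table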

sales.loc[:,'currency'] = 'USD'
## Output

ID  product  category     price  currency  final_price  origin_country
A1  TV       electronics  500.0  USD       600.0        Germany
A2  phone    electronics  175.0  USD       210.0        China
A3  laptop   electronics  450.0  USD       540.0        India
A4  screen   electronics  NaN    USD       NaN          Taiwan
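Whether loc[] adds a column or overwrites one depends only on whether the label already exists. A minimal sketch with a hypothetical two-row DataFrame:

```python
import pandas as pd

# Hypothetical two-row DataFrame without a 'currency' column
df = pd.DataFrame({'product': ['TV', 'phone'], 'price': [500.0, 175.0]})

# 'currency' does not exist yet, so loc[] appends it as the last column
df.loc[:, 'currency'] = 'USD'
print(list(df.columns))

# 'price' already exists, so the same syntax overwrites its values
df.loc[:, 'price'] = 0.0
print(df)
```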

Method 4: Adding a New Column with insert()

To add a new column at a specific position, we can use the insert() method:

## We add the VAT rate (20%) as the third column.
## We pass the number 0.2 rather than the string '0.2' so that the column can be used in calculations.
sales.insert(2, 'vat', 0.2)
## Output

ID  product  vat  category     price  currency  final_price  origin_country
A1  TV       0.2  electronics  500.0  USD       600.0        Germany
A2  phone    0.2  electronics  175.0  USD       210.0        China
A3  laptop   0.2  electronics  450.0  USD       540.0        India
A4  screen   0.2  electronics  NaN    USD       NaN          Taiwan

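In the code above, 2 is the position where the new column is inserted. Positions are zero-based, so 2 means the third column.
"vat" is the name of the new column, and 0.2 is the value assigned to every row. Note that insert() raises a ValueError if a column with that name already exists.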
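Here is a short sketch of insert() on a hypothetical three-column DataFrame. It shows the zero-based position, a numeric value that can be used in a calculation afterwards, and the error raised when the name is already taken. The allow_duplicates parameter of insert() controls that last check and is off by default.

```python
import pandas as pd

# Hypothetical DataFrame with three columns
df = pd.DataFrame({'ID': ['A1', 'A2'], 'product': ['TV', 'phone'], 'price': [500.0, 175.0]})

# Insert 'vat' at position 2 (zero-based), i.e. as the third column
df.insert(2, 'vat', 0.2)
print(list(df.columns))

# A numeric VAT column can be used directly in arithmetic
df['final_price'] = df['price'] * (1 + df['vat'])

# Inserting an existing column name raises ValueError by default
try:
    df.insert(0, 'vat', 0.1)
except ValueError as err:
    print('ValueError:', err)
```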

Method 5: New Column Based on Conditions with np.where()

Sometimes we need to add a new column based on a condition. NumPy's "np.where()" function returns one value where the condition is true and another where it is false, so it can build the new column in a single step.

import numpy as np
# We create a new field based on the "final_price" field
sales['discount'] = np.where(sales['final_price'] > 250, 'yes', 'no')
## Output

ID  product  vat  category     price  currency  final_price  origin_country  discount
A1  TV       0.2  electronics  500.0  USD       600.0        Germany         yes
A2  phone    0.2  electronics  175.0  USD       210.0        China           no
A3  laptop   0.2  electronics  450.0  USD       540.0        India           yes
A4  screen   0.2  electronics  NaN    USD       NaN          Taiwan          no

The new field indicates whether a discount applies, based on the final price. If the "final_price" value is greater than 250, the new column is 'yes'. If it is 250 or less, the new column is 'no'. Rows with a missing final price also get 'no', because any comparison with NaN evaluates to False.
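If missing prices should be labeled separately instead of falling into 'no', one option is to nest a second np.where() call. In the sketch below, the 'unknown' label and the discount_checked column name are our own choices for illustration. The final_price values are copied from the tables above.

```python
import numpy as np
import pandas as pd

# final_price values from the tables above
sales = pd.DataFrame({'product': ['TV', 'phone', 'laptop', 'screen'],
                      'final_price': [600.0, 210.0, 540.0, np.nan]})

# Two outcomes: NaN > 250 is False, so the missing price ends up as 'no'
sales['discount'] = np.where(sales['final_price'] > 250, 'yes', 'no')

# Three outcomes: check for missing values first, then apply the threshold
sales['discount_checked'] = np.where(
    sales['final_price'].isna(),
    'unknown',
    np.where(sales['final_price'] > 250, 'yes', 'no'),
)

print(sales)
```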

Method 6: Adding a New Column with apply() and a Python Function

In some cases, we need to apply a custom function to each row of a DataFrame to calculate the values of a new column. We can combine the "apply()" method with a Python function to do this.
This method is useful when the values of the new field depend on several conditions and none of the previous methods is a good fit.
To illustrate how "apply()" works, we use a dataset (df) with the prices of six products sold worldwide. We need a new field that shows the final price in the currency of the country where each product is sold. To do so, we implement the following conditions:
1) If the country is "usa" and the currency is not USD, convert the price to dollars.
2) If the country is in the EU and the currency is not EUR, convert the price to euros.
3) For any other country, show the price in USD.
The EUR/USD exchange rate is 1.058, i.e. 1 EUR = 1.058 USD.

## Our dataframe df:

product  price  currency  country
A        500    USD       usa
B        175    USD       germany
C        450    USD       india
D        50     USD       uk
E        100    EUR       usa
F        80     EUR       spain

## We create a list with European countries
eu = ['germany', 'spain']

## We define the Python function to implement the conditions.
## apply() passes each row to it as a Series, so we call the parameter "row".
def country_price(row):
    if row['country'] in eu:
        # Rule 2: EU prices are shown in euros (USD -> EUR)
        if row['currency'] != 'EUR':
            return row['price'] / 1.058
        return row['price']
    # Rules 1 and 3: the USA and any other country use dollars (EUR -> USD)
    if row['currency'] != 'USD':
        return row['price'] * 1.058
    return row['price']

## We run the apply() function to implement the conditions in the new column
df['sale_price'] = df.apply(country_price, axis=1)
## Output
product  price  currency  country  sale_price
A        500    USD       usa      500
B        175    USD       germany  165.4064
C        450    USD       india    450
D        50     USD       uk       50
E        100    EUR       usa      105.8
F        80     EUR       spain    80

Using df.apply() with axis=1, we apply the function to each row of the DataFrame. With the default axis=0, apply() would pass each column to the function instead.
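The following self-contained sketch runs the same conversion, with the df values copied from the table above. The constant name EUR_USD is our own. The second half is an optional vectorized alternative that reuses np.where() from Method 5 and is not part of the original example. On large DataFrames it is usually much faster than a row-wise apply(), which calls a Python function once per row.

```python
import numpy as np
import pandas as pd

EUR_USD = 1.058  # exchange rate from the text: 1 EUR = 1.058 USD
eu = ['germany', 'spain']

# The df table from above
df = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D', 'E', 'F'],
    'price': [500, 175, 450, 50, 100, 80],
    'currency': ['USD', 'USD', 'USD', 'USD', 'EUR', 'EUR'],
    'country': ['usa', 'germany', 'india', 'uk', 'usa', 'spain'],
})

def country_price(row):
    """Return the price in the currency of the country of sale."""
    if row['country'] in eu:
        # EU: convert dollar prices to euros
        return row['price'] if row['currency'] == 'EUR' else row['price'] / EUR_USD
    # USA and any other country: convert euro prices to dollars
    return row['price'] if row['currency'] == 'USD' else row['price'] * EUR_USD

# Row-wise: the function is called once per row
df['sale_price'] = df.apply(country_price, axis=1)

# Vectorized alternative: np.where() works on whole columns at once
in_eu = df['country'].isin(eu)
df['sale_price_vec'] = np.where(
    in_eu,
    np.where(df['currency'] == 'EUR', df['price'], df['price'] / EUR_USD),
    np.where(df['currency'] == 'USD', df['price'], df['price'] * EUR_USD),
)

print(df)
```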

Conclusion

In this comprehensive guide, we have explored different methods to add new columns to an existing Pandas DataFrame. We have learned how to add an empty column or a column with a constant value, derive a new column from an existing one, append or overwrite a column with "loc[]", insert a column at a specific position, add a column based on a condition, and add a column using "apply()" and a Python function. These methods give us the flexibility to manipulate and analyze data effectively with Pandas.
To gain a better understanding of these techniques, experiment with them on real data analysis and data wrangling projects, and adapt them to your specific use cases.
