How to Add New Columns to Dataframe

How to Add New Columns to a Pandas DataFrame


In data analysis and manipulation, it is often necessary to add new columns to a Pandas DataFrame. It helps expand the scope of the analysis, find patterns, and perform data transformations. In this comprehensive guide, we will explore different methods to add new columns to an existing DataFrame in Pandas.

Method 1: Adding an Empty Column or a Column with a Constant Value

The first method is for adding a new column with either an empty value or a constant value for all rows in the DataFrame. This can be useful when we want to initialize a column before populating it with actual values.
To add an empty column, we can simply assign an empty list or array to the desired column name. For example:

## Option 1
df['new_column'] = []

## Option 2
df['new_column'] = ''

Method 2: New Column Derived from an Existing One

Another common scenario is adding a new column that is derived from an existing field in the DataFrame. This can involve performing calculations, applying functions, or mapping values from one column to another.
To create a new column using calculations or functions, we can use brackets to apply the desired operation to the existing column. For example, given the following dataset about sales, let’s see how to add two extra fields:

## Our dataset is sales

# Add an new column with the final price including a tax (VAT) of 20% sales['final_price'] = sales['price'] * 1.2 ## Output

If we want to map values from one column to another based on a dictionary or a list, we can use the “map()" function. For instance, we can add the origin country of the product through a dictionary:

## We define the dictionary
origin = {'TV':'Germany','phone':'China','laptop':'India','screen':'Taiwan'}

# We apply the map funtion() to fill in the new column. It's based on the existing column, "product".

## For values not included in the dictionary, we apply the "fillna()" function.
sales['origin_country'] = sales['product'].map(origin).fillna('unknown')
## Output


Method 3: Adding a New Column with iloc[]

The "iloc[]" function allows to access DataFrame elements by their index. We use this function to add a new column at a specific position in the DataFrame.

## We create a new column with the currency at the end of the table

sales.loc[:,'currency'] = 'USD'
## Output


Method 4: New columns with insert()

To add a new column at a specific position, we can use the insert() function:

## We add the VAT percentaje in the third position
sales.insert(2, 'vat', '0.2')
## Output


In the above code, 2 represents the position where the new column will be inserted (bear in mind it starts at 0).
“vat” is the name of the new column and “0.2” is the value to input.

Method 5: New Column Based on Conditions with np.where()

Sometimes, we need to add a new column based on a condition. The Numpy "np.where()" function can create a new column based on a given condition.

import numpy as np
# We create a new filed based on the "final_price" field
sales['discount'] = np.where(sales['final_price'] > 250, 'yes', 'no')
## Output


The new field purpose is to inform if a discount applies based on the price. The condition is simple, if the price is greater than 250, a discount is applied, otherwise no. Therefore if the “final_price” value is greater than 250, the new column has the value ‘yes’, if it is equal or less than 250, ‘no’.

Method 5: Adding a New Column with apply() and a Python Function

In some cases, we may need to apply a custom function to each row of a DataFrame to calculate the values for a new column. We can use the “apply()" function in combination with a Python function to achieve this.
This method is very useful when the values of the new field depend on multiple conditions and none of the former methods are valid.
To illustrate how the “apply()” function works, we use a dataset (df) with prices of six products sold worldwide. We need a new field to display the final price in the currency of the country is sold. To do so, we need to implement the following conditions:
1) If the country is “usa” and the currency is not USD convert it to dollars.
2) If the country is in the UE and the currency is not EUR convert it to euros.
3) For any other country, apply prices in USD.
The exchange rate of the EUR/USD is 1.058

## Our dataframe df:

## We create a list with European countries eu = ['germany','spain'] ## We define the Python function to implement the conditions def country_price(df): if df['country'] = 'usa' and df['currency'] != 'USD': return df['price'] * 1.058 elif df['country'] in eu and df['currency'] !='EUR': return df['price'] /1.058 elif df['country'] in eu and df['currency']='EUR': return df['price'] else: return df['price'] ## We run the apply() function to implement the conditions to the new column df['sale_price'] = df.apply(country_price, axis=1) ## Output

Using df.apply() with axis=1, we apply the function to each row of the DataFrame.


In this comprehensive guide, we have explored different methods to add new columns to an existing Pandas DataFrame. We have learned how to add an empty column or a column with a constant value, derive a new column from an existing one, add a column at a specific position, add a column based on conditions, and add a column using "apply()" and a Python function. These methods provide us with the flexibility to manipulate and analyze data effectively using Pandas.
To gain a better understanding of these techniques, experiment with these methods working on real projects of data analysis and data wrangling and adapt them to your specific use cases.

Python and Excel Projects for practice
Register New Account
Shopping cart