
Introduction to NumPy

A quick tutorial on the Python library NumPy

In data science and machine learning, one library stands out as a fundamental tool for numerical computation: NumPy. Short for Numerical Python, it provides a powerful array object for efficient manipulation of large datasets and complex mathematical operations.
At its core, NumPy provides a multidimensional array object, along with a collection of functions that operate on these arrays. This makes it an essential tool for scientific computing, data analysis, and machine learning.
One of the main reasons for NumPy’s popularity among data scientists and Python developers is how well it handles large datasets. NumPy arrays store their elements in a compact, typed block of memory, which keeps manipulation of numerical data fast. NumPy also includes many mathematical functions and operations that can be applied to these arrays, making it a versatile library for numerical computation.

The N-Dimensional Array: Creating and Manipulating Data

The cornerstone of NumPy is the N-dimensional array, or ndarray. This powerful data structure allows for the representation and manipulation of multi-dimensional data in an efficient and intuitive manner.

Creating 1D, 2D, and 3D Arrays

Creating a NumPy array can be done in several ways. One common method is to convert a Python list into a NumPy array using the np.array() function. For example, to create a 1D array, we can do the following:

import numpy as np

list_a = [1, 2, 3, 4, 5]
array_a = np.array(list_a)

## Output
array([1, 2, 3, 4, 5])

We can also create arrays filled with zeros or ones using the np.zeros() and np.ones() functions, respectively. For example, to create a 2D array filled with zeros, we can use the following code:

array_zeros = np.zeros((3, 3))

## Output
array([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])

Similarly, to create a 3D array filled with ones, we can use the np.ones() function:

array_ones = np.ones((2, 2, 2))

## Output
array([[[1., 1.], [1., 1.]], [[1., 1.], [1., 1.]]])
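To see what "1D, 2D, and 3D" means for the arrays created above, it can help to inspect their attributes. The following is a minimal sketch that rebuilds the same three arrays and prints their number of axes, shape, element count, and data type. The integer dtype of array_a depends on the platform and NumPy version, so no output is shown for it.

```python
import numpy as np

# Rebuild the three arrays from the examples above
array_a = np.array([1, 2, 3, 4, 5])
array_zeros = np.zeros((3, 3))
array_ones = np.ones((2, 2, 2))

arrays = [("array_a", array_a), ("array_zeros", array_zeros), ("array_ones", array_ones)]
for name, arr in arrays:
    # ndim = number of axes, shape = length along each axis, size = total elements
    print(name, arr.ndim, arr.shape, arr.size, arr.dtype)
```

Note that np.zeros() and np.ones() produce floating-point arrays by default, which is why their printed values carry a trailing dot.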

NumPy Arrays vs. Python Lists

While NumPy arrays may seem similar to Python lists, there are several key differences that make arrays more suitable for numerical computations.
Firstly, NumPy arrays are homogeneous, meaning that they can only contain elements of the same data type. This allows for more efficient memory storage and faster computation compared to Python lists, which can contain elements of different types.
Secondly, NumPy arrays support vectorized operations, which allow efficient element-wise computations. This means that mathematical operations can be applied to entire arrays, rather than iterating over individual elements. This results in faster and more concise code compared to using traditional for loops.

# Element-wise multiplication using NumPy arrays
array_a = np.array([1, 2, 3])
array_b = np.array([4, 5, 6])
array_c = array_a * array_b

print(array_c)
## Output
[ 4 10 18]

# Equivalent code using Python lists 
list_a = [1, 2, 3] 
list_b = [4, 5, 6] 
list_c = [a * b for a, b in zip(list_a, list_b)] 
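The homogeneity point can be illustrated with a short sketch. The variable names here (mixed_list, mixed_array, and so on) are chosen for illustration only. It shows that a list keeps each element's own type while an array promotes every element to a single dtype, and that the * operator means something different for the two containers.

```python
import numpy as np

mixed_list = [1, 2.5, 3]            # a list keeps each element's own type
mixed_array = np.array(mixed_list)  # the array promotes everything to one dtype

print([type(x).__name__ for x in mixed_list])  # ['int', 'float', 'int']
print(mixed_array.dtype)                       # float64

# The * operator also behaves differently for the two containers
repeated_list = [1, 2, 3] * 2             # list repetition
doubled_array = np.array([1, 2, 3]) * 2   # element-wise multiplication
print(repeated_list)           # [1, 2, 3, 1, 2, 3]
print(doubled_array.tolist())  # [2, 4, 6]
```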

Important NumPy Functions: Examples and Applications

NumPy provides a vast array of functions that cover a wide range of mathematical operations. Here, we will explore some of the most commonly used functions and their applications.

Array Creation Functions

NumPy offers various functions for creating arrays with specific properties. The np.arange() function, for example, creates an array of evenly spaced values from a start value up to, but not including, a stop value, using a given step.

array_range = np.arange(0, 10, 2)
## Output 
array([0, 2, 4, 6, 8])

The np.linspace() function is another useful tool for creating arrays. It generates a specified number of equally spaced values between a start and an end point, both included by default.

array_linspace = np.linspace(0, 1, 10)

## Output
array([0. , 0.11111111, 0.22222222, 0.33333333, 0.44444444, 0.55555556, 0.66666667, 0.77777778, 0.88888889, 1. ])
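The difference between the two functions is easiest to see side by side. This is a small sketch with illustrative variable names: np.arange() is driven by a step size and leaves out the stop value, while np.linspace() is driven by a count and includes the stop value unless endpoint=False is passed.

```python
import numpy as np

# np.arange takes a step size and excludes the stop value
by_step = np.arange(0, 1, 0.25)
print(by_step.tolist())  # [0.0, 0.25, 0.5, 0.75]

# np.linspace takes a number of points and includes the stop value by default
by_count = np.linspace(0, 1, 5)
print(by_count.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]

# endpoint=False drops the stop value
half_open = np.linspace(0, 1, 4, endpoint=False)
print(half_open.tolist())  # [0.0, 0.25, 0.5, 0.75]
```

When the step is not an exact binary fraction, np.linspace() is usually the safer choice, because the number of elements is fixed up front rather than derived from floating-point arithmetic.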

Conditional Functions

The np.where() function checks a condition element by element and returns one value where the condition holds and another where it does not.

# Set the condition with np.where()
# If the value of array_a is less than 2, replace it with -1, otherwise use 100
print(np.where(array_a < 2, -1, 100))
## Output
[ -1 100 100]
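Two other common uses of np.where() are sketched below, with illustrative variable names. Called with only a condition, it returns the positions where the condition is true. Called with three arguments, the replacement values can themselves be arrays, so unmatched elements keep their original value.

```python
import numpy as np

array_a = np.array([1, 2, 3])

# With only a condition, np.where returns a tuple of index arrays (one per axis)
indices = np.where(array_a >= 2)
print(indices[0].tolist())  # [1, 2]

# The replacement values can also be arrays, picked element by element:
# cap values above 2 at 2, keep everything else unchanged
capped = np.where(array_a > 2, 2, array_a)
print(capped.tolist())  # [1, 2, 2]
```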

It is also extremely useful for validating data in Pandas dataframes or for creating new fields. For example:

import pandas as pd
import numpy as np

## We create a simple dataframe
df1 = pd.DataFrame({'product': ['table', 'phone', 'laptop'], 'price': [200, 120, 400]})

product  price
table    200
phone    120
laptop   400
## To replace the prices of a certain product:
df1['price'] = np.where(df1['product'] == 'table', 650, df1['price'])

## Data engineering: To create a new field
df1['category'] = np.where(df1['product'] == 'table', 'furniture', 'electronics')

## Output
product  price  category
table    650    furniture
phone    120    electronics
laptop   400    electronics

Mathematical and Statistical Functions

NumPy includes mathematical and statistical functions optimized for performance, making them essential tools for developers and data professionals. For example, to calculate the mean or the standard deviation of an array:

## To calculate the mean of an array
mean_value = np.mean(array_a)
## Output
2.0

## The standard deviation 
standard_deviation = np.std(array_a)
## Output
0.816496580927726
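Two details of these functions are worth a short sketch: np.std() uses the population formula by default (ddof=0), and both functions accept an axis argument for multi-dimensional arrays. The matrix below is made up for illustration.

```python
import numpy as np

array_a = np.array([1, 2, 3])

# np.std defaults to the population formula (divide by N)
population_std = np.std(array_a)
# ddof=1 switches to the sample formula (divide by N - 1)
sample_std = np.std(array_a, ddof=1)
print(population_std, sample_std)

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
col_means = np.mean(matrix, axis=0)  # collapse the rows: one mean per column
row_means = np.mean(matrix, axis=1)  # collapse the columns: one mean per row
print(col_means.tolist())  # [2.5, 3.5, 4.5]
print(row_means.tolist())  # [2.0, 5.0]
```

The ddof distinction matters when comparing results with Pandas, whose std() method defaults to the sample formula.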

Broadcasting and Vectorization

One of the key features of NumPy is broadcasting, which enables element-wise operations between arrays of different but compatible shapes, such as an array and a scalar. This eliminates the need for explicit loops and makes code more concise and efficient.
Additionally, NumPy supports vectorized operations, which allow mathematical operations to be performed on entire arrays rather than individual elements. This results in faster computation and simplified code.

# Broadcasting example
array_a = np.array([1, 2, 3])
scalar = 2
result = array_a * scalar
## Output
array([2, 4, 6]) 

# Vectorized operation example 
array_b = np.array([4, 5, 6]) 
result = array_a * array_b 
## Output
array([ 4, 10, 18])
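Broadcasting also applies when both operands are arrays. The sketch below, using made-up arrays, combines a column of shape (3, 1) with a row of shape (3,) and shows what happens when the shapes are not compatible.

```python
import numpy as np

column = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3])             # shape (3,)

# Shapes are compared from the right: (3, 1) and (3,) broadcast to (3, 3)
grid = column + row
print(grid.shape)     # (3, 3)
print(grid.tolist())  # [[1, 2, 3], [11, 12, 13], [21, 22, 23]]

# Incompatible shapes raise a ValueError instead of being silently stretched
try:
    np.array([1, 2, 3]) + np.array([1, 2])
except ValueError as exc:
    print("cannot broadcast:", exc)
```

Two dimensions are compatible when they are equal or when one of them is 1, which is why the column of length 3 and the row of length 3 combine into a full grid.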

Conclusion

NumPy is a powerful library that provides efficient numerical computation capabilities for data scientists through its N-dimensional array object and a wide range of mathematical and statistical functions.
Its ability to handle large datasets efficiently and support vectorized operations makes it an ideal choice for numerical work, and its seamless integration with other libraries, such as Pandas, further enhances its usefulness in data analysis and machine learning tasks.
