NumPy
NumPy is a scientific computing library for Python. It offers high-level mathematical functions and a multi-dimensional array structure (known as ndarray) for manipulating large data sets.
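As a minimal sketch of what a multi-dimensional ndarray looks like in practice, the example below builds a small two-dimensional array and inspects it. The variable names and values are made up for illustration; only `np.array` and the `ndim`, `shape`, and `dtype` attributes come from NumPy itself.

```python
import numpy as np

# A 2-D ndarray built from a nested list (illustrative values only).
sales = np.array([[10, 20, 30],
                  [40, 50, 60]])

print(sales.ndim)   # number of dimensions
print(sales.shape)  # (rows, columns)
print(sales.dtype)  # element type shared by every value in the array

# Arithmetic applies element by element, with no explicit loop.
doubled = sales * 2
print(doubled)
```

Every element of an ndarray shares one data type, which is part of what lets NumPy store and process the values compactly.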
While NumPy on its own offers limited functions for data analysis, many other libraries that are key to analysis, such as SciPy, matplotlib, and pandas, are heavily dependent on NumPy. SciPy, for instance, offers advanced mathematical functions built on top of NumPy's array data structure, ndarray.
NumPy, along with the libraries mentioned above, is a part of the core SciPy stack—a group of tools for scientific computing in Python.
NumPy tutorials
- NumPy tutorial (Nicolas P. Rougier) - Uses cellular automata as a way to illustrate the main differences between pure Python and NumPy.
- NumPy Basics: Arrays and Vectorized Computation (Wes McKinney) - An introduction to NumPy from a data analyst's perspective.
- NumPy: creating and manipulating numerical data (SciPy Lecture Notes) - Good overview of NumPy with exercises to try out.
- NumPy Discussion - A mailing list devoted only to the NumPy package (not the SciPy stack).
The NumPy array
NumPy's array (or ndarray) is a Python object used for storing data. The main advantage of NumPy over other Python data structures, such as Python's lists or pandas' Series, is speed at scale. It's most useful when you're creating large matrices with billions of data points.
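One rough way to see the speed difference is to time the same operation on a Python list and on a NumPy array. The sketch below squares every value in both and uses the standard library's `timeit` for timing. The array size and repeat count are arbitrary choices, and the actual timings depend on your machine, so no specific numbers are shown.

```python
import timeit

import numpy as np

n = 200_000  # arbitrary size chosen for illustration
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# The same computation, written two ways.
list_squares = [x * x for x in py_list]   # explicit Python-level loop
array_squares = np_array * np_array       # one vectorized NumPy operation

# Time each approach over a few repetitions.
list_seconds = timeit.timeit(lambda: [x * x for x in py_list], number=5)
array_seconds = timeit.timeit(lambda: np_array * np_array, number=5)

print(f"list comprehension: {list_seconds:.4f} s")
print(f"NumPy array:        {array_seconds:.4f} s")
```

On typical hardware the NumPy version tends to be noticeably faster, and the gap generally widens as the data grows.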
You don't need a deep understanding of NumPy's array for most analytical tasks—it's more often used for programming—but there are times when it's more efficient than other Python data structures. If you're interested in diving deeper into this issue, check out these resources:
- What advantages do NumPy arrays offer over (nested) Python lists? (SciPy.org)
- Why NumPy arrays instead of Python lists? (Stack Overflow)
- Performance of Pandas Series vs NumPy Arrays (Pen and Pants)
NumPy features
Linear algebra
- Computing the eigenvalues of a matrix
- Manipulating matrices
- Vectorization
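The linear algebra features listed above can be sketched as follows. The matrices, prices, and quantities are invented for illustration; the NumPy calls used are `np.linalg.eigvals`, `np.linalg.inv`, the `.T` attribute, and the `@` matrix-multiplication operator.

```python
import numpy as np

# A simple diagonal matrix, so its eigenvalues are its diagonal entries.
matrix = np.array([[2.0, 0.0],
                   [0.0, 3.0]])
other = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

eigenvalues = np.linalg.eigvals(matrix)  # eigenvalues of the matrix

transposed = other.T             # swap rows and columns
product = matrix @ other         # matrix multiplication
inverse = np.linalg.inv(matrix)  # matrix inverse

# Vectorization: operate on whole arrays instead of looping over items.
prices = np.array([1.5, 2.0, 4.0])
quantities = np.array([10, 3, 5])
revenue = prices * quantities    # element-wise multiplication
total_revenue = revenue.sum()

print(eigenvalues)
print(product)
print(total_revenue)
```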
Statistics
- Finding the min, max, and percentiles of a dataset
- Calculating averages and variances of a dataset, such as the mean, median, and standard deviation
- Computing the histogram of a dataset
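A short sketch of these statistics on a small, made-up dataset. The values and the choice of five histogram bins are assumptions for illustration. Note that `np.std` and `np.var` compute the population versions by default (`ddof=0`).

```python
import numpy as np

# Illustrative dataset: the integers 1 through 10.
data = np.arange(1, 11)

smallest = data.min()
largest = data.max()
q1 = np.percentile(data, 25)  # 25th percentile (linear interpolation by default)

mean = data.mean()
median = np.median(data)
variance = data.var()         # population variance (ddof=0)
std_dev = data.std()          # population standard deviation (ddof=0)

# Histogram: counts per bin, plus the bin edges that define the bins.
counts, bin_edges = np.histogram(data, bins=5)

print(smallest, largest, q1)
print(mean, median, variance, std_dev)
print(counts)
print(bin_edges)
```

Pass `ddof=1` to `var` or `std` if you want the sample versions instead.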