NumPy is a scientific computing library for Python. It offers high-level mathematical functions and a multi-dimensional array structure (known as ndarray) for manipulating large data sets.
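As a rough illustration of what a multi-dimensional array looks like in practice, here is a minimal sketch. The values and variable names (`data`, `col_means`, and so on) are made up for this example.

```python
import numpy as np

# A 2 x 3 array of made-up sample values (illustrative data only).
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

n_dims = data.ndim             # number of dimensions
shape = data.shape             # size along each dimension
col_means = data.mean(axis=0)  # mean of each column
roots = np.sqrt(data)          # element-wise square root

print(n_dims, shape)
print(col_means)
```

Functions such as `np.sqrt` and methods such as `mean` operate on the whole array at once, so no explicit loop is needed.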
While NumPy on its own offers limited functions for data analysis, many other libraries that are key to analysis (such as SciPy, matplotlib, and pandas) depend heavily on NumPy. SciPy, for instance, offers advanced mathematical functions built on top of NumPy's array data structure.
NumPy, along with the libraries mentioned above, is a part of the core SciPy stack—a group of tools for scientific computing in Python.
- NumPy tutorial (Nicolas P. Rougier) - Uses cellular automata as a way to illustrate the main differences between pure Python and NumPy.
- NumPy Basics: Arrays and Vectorized Computation (Wes McKinney) - An introduction to NumPy from a data analyst's perspective.
- NumPy: creating and manipulating numerical data (SciPy Lecture Notes) - Good overview of NumPy with exercises to try out.
- NumPy Discussion - A mailing list devoted only to the NumPy package (not the SciPy stack).
NumPy's array (or ndarray) is a Python object used for storing data. The main advantage of NumPy arrays over other Python data structures, such as Python's lists or pandas' Series, is speed at scale. They are most useful when you're working with very large matrices, up to billions of data points.
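To get a feel for the speed difference, the sketch below doubles one million integers with a list comprehension and again with a single array operation. The size, variable names, and repeat count are arbitrary choices for illustration, and the timings will vary by machine.

```python
import timeit

import numpy as np

n = 1_000_000
values = list(range(n))  # plain Python list
arr = np.arange(n)       # NumPy array holding the same integers

# Pure Python: the interpreter visits every element in turn.
doubled_list = [v * 2 for v in values]

# NumPy: one vectorized operation over the whole array.
doubled_arr = arr * 2

# Rough timing comparison over 5 runs each.
list_time = timeit.timeit(lambda: [v * 2 for v in values], number=5)
array_time = timeit.timeit(lambda: arr * 2, number=5)
print(f"list: {list_time:.4f}s  array: {array_time:.4f}s")
```

Both approaches produce the same numbers; only the time taken to compute them differs.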
You don't need a deep understanding of NumPy's array for most analytical tasks—it's more often used for programming—but there are times when it's more efficient than other Python data structures. If you're interested in diving deeper into this issue, check out these resources:
- What advantages do NumPy arrays offer over (nested) Python lists? (SciPy.org)
- Why NumPy arrays instead of Python lists? (Stack Overflow)
- Performance of Pandas Series vs NumPy Arrays (Pen and Pants)
- Computing the eigenvalues of a matrix
- Manipulating linear matrices
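The two matrix tasks listed above can be sketched with NumPy's `linalg` module. The 2 x 2 matrix here is an arbitrary example, and the variable names are illustrative only.

```python
import numpy as np

# An arbitrary symmetric 2 x 2 matrix chosen for illustration.
matrix = np.array([[2.0, 1.0],
                   [1.0, 2.0]])

# Eigenvalues of the matrix.
eigenvalues = np.linalg.eigvals(matrix)

# Common matrix manipulations: transpose, inverse, matrix product.
transposed = matrix.T
inverse = np.linalg.inv(matrix)
identity_check = matrix @ inverse  # should be close to the identity matrix

print(eigenvalues)
print(identity_check)
```

Because of floating-point rounding, results like `identity_check` are best compared with `np.allclose` rather than exact equality.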