NumPy, SciPy, and pandas

This post introduces NumPy, SciPy, and pandas. The structures these packages provide are more advanced and more efficient than those built into Python, in the sense that they can handle more demanding tasks, but it is best not to study them in isolation from Python itself. Look at both the differences and the connections; that is how you come to understand the packages deeply and use them freely. For example, Python provides the list, which can serve as an array: a = list(range(10)). To apply an operation such as squaring to every element, however, you need to write a loop, which is less convenient than np.arange(10)**2. The introductions and examples that follow take the same approach, comparing and connecting these concepts as they go.
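The squaring comparison can be sketched as follows. This is a minimal illustration; the variable names squares_list and squares_array are chosen here and do not come from the original text.

```python
import numpy as np

# Plain Python: squaring every element needs an explicit loop
# (written here as a list comprehension).
a = list(range(10))
squares_list = [x ** 2 for x in a]

# NumPy: the ** operator is applied to every element at once.
squares_array = np.arange(10) ** 2

print(squares_list)
print(squares_array)
```

Both approaches produce the same ten values; the NumPy version simply expresses the whole operation in one expression.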

  1. What is NumPy?

A standard Python installation offers the list for storing a group of values, and a list can be used as an array. Because list elements can be any object, however, what the list actually stores is pointers to objects. Storing a simple [1, 2, 3] therefore takes 3 pointers and 3 integer objects. For numerical work, this structure wastes memory and CPU time.

Python also provides the array module. Unlike a list, an array object stores the values directly, much like a one-dimensional array in C. It does not support multiple dimensions and offers no numerical functions, though, so it is not well suited to numerical computing either.
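The difference between a list and an array object can be seen in a short sketch. The names values, packed, and repeated are illustrative only, and the sizes printed depend on the platform and Python build, so none are stated here.

```python
import sys
from array import array

values = [1, 2, 3]
packed = array("i", values)  # "i" is the type code for a signed C int

# The list holds references; each element is a full Python int object.
print(type(values[0]), sys.getsizeof(values[0]))

# The array holds raw fixed-size values, itemsize bytes each.
print(packed.typecode, packed.itemsize)

# array is one-dimensional and has no element-wise arithmetic:
# multiplying by 2 repeats the sequence, exactly as it does for a list.
repeated = packed * 2
print(repeated)
```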

NumPy makes up for these deficiencies. It provides two basic objects: ndarray (the N-dimensional array object) and ufunc (the universal function object), which together supply the commonly needed numerical array and matrix functionality. An ndarray (referred to below simply as an array) is a multi-dimensional array that stores elements of a single data type, and a ufunc is a function that operates on arrays.
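A minimal sketch of the two objects, using np.sqrt and np.add as example ufuncs. The array contents and the names m, roots, and sums are made up for illustration.

```python
import numpy as np

# A two-dimensional ndarray; every element shares one dtype.
m = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
print(m.shape, m.dtype)

# np.sqrt and np.add are ufuncs: they process the array element by element.
roots = np.sqrt(m)
sums = np.add(m, 10)

print(isinstance(np.sqrt, np.ufunc))
print(roots)
print(sums)
```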

Because its operations are vectorized, a NumPy array is more efficient than a list for numerical work.
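One rough way to check the efficiency claim on your own machine is to time both approaches with the standard timeit module. This is only a sketch: the array size, the repeat count, and the function names are arbitrary choices made here, and the timings will vary from system to system.

```python
import timeit

import numpy as np

n = 100_000
data_list = list(range(n))
# int64 is requested explicitly so the squares cannot overflow.
data_array = np.arange(n, dtype=np.int64)


def square_with_loop():
    # Pure Python: one multiplication per loop iteration.
    return [x * x for x in data_list]


def square_vectorized():
    # NumPy: a single vectorized multiplication over the whole array.
    return data_array * data_array


loop_time = timeit.timeit(square_with_loop, number=20)
vector_time = timeit.timeit(square_vectorized, number=20)

print(f"list comprehension: {loop_time:.4f} s")
print(f"NumPy vectorized:   {vector_time:.4f} s")
```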

The section above draws mainly on the book "Using numpy for scientific computing".

  2. What is SciPy?

Strictly speaking, NumPy is a foundational library for scientific computing in Python. In the narrow sense it focuses on numerical computing, and it could even be described as a library for multi-dimensional array processing. SciPy is built on NumPy and provides a toolset for scientific computing in Python; in other words, it is a higher-level library. It mainly covers the following areas:

statistics
optimization
numerical integration
linear algebra
Fourier transforms
signal processing
image processing
ODE solvers
special functions
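
A brief sketch touching three of the areas listed above (numerical integration, optimization, and linear algebra) through scipy.integrate.quad, scipy.optimize.minimize_scalar, and scipy.linalg.solve. The functions and the small linear system are made-up examples chosen because their exact answers are known.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Numerical integration: the integral of sin(x) from 0 to pi is exactly 2.
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# Optimization: (x - 3)^2 has its minimum at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)

# Linear algebra: solve A @ x = b, whose exact solution is x = [2, 3].
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)

print(area, abs_error)
print(result.x)
print(x)
```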

  3. What is pandas?

pandas provides high-level data structures and tools, built on NumPy, for working with data quickly and easily. Its main features:

Data structures with labeled axes that support automatic or explicit data alignment.
Integrated time series functionality.
The same data structures handle both time series and non-time series data.
Arithmetic operations and reductions that preserve metadata (axis labels).
  For example, without pandas, manipulating or transposing the elements of a row might require many loops.

Flexible handling of missing data.
Merges and other relational operations found in common databases (e.g., SQL-based ones).
Two core data structures: Series and DataFrame.
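
Several of the features listed above can be tried out in a few lines. The following is a minimal sketch; all data, labels, and variable names are invented for illustration.

```python
import pandas as pd

# Series: arithmetic aligns on labels automatically.
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])
total = s1 + s2  # labels present in only one Series become NaN

# Missing data: fill it in or drop it.
filled = total.fillna(0.0)
dropped = total.dropna()

# Time series: a Series indexed by dates, resampled into 2-day bins.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series(range(6), index=dates)
two_day_sums = ts.resample("2D").sum()

# DataFrame: an SQL-style inner join on a shared key column.
people = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
scores = pd.DataFrame({"id": [2, 3, 4], "score": [88, 92, 75]})
joined = people.merge(scores, on="id", how="inner")

# Transposing needs no loops.
transposed = people.T

print(total)
print(two_day_sums)
print(joined)
```

The inner join keeps only the ids that appear in both tables, which mirrors the behavior of an SQL INNER JOIN.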
