IPython, NumPy, pandas

IPython: interactive Python command line

TAB: autocomplete

?: introspection and namespace search

!: Execute system commands

%run: execute the code in a file

Commonly used IPython magic commands:

%quickref: display the IPython quick reference

%magic: show detailed documentation of all magic commands

%debug: enter the interactive debugger at the bottom of the most recent exception traceback

%hist: print the command input history (and optionally the output history)

%pdb: automatically enter the debugger after an exception occurs

%paste: Execute Python code in the clipboard

%cpaste: Opens a special prompt for manual pasting of Python code to be executed

%reset: delete all variables/names in the interactive namespace

%page OBJECT: print out OBJECT through pager

%run script.py: execute a Python script file in IPython

%prun statement: execute the statement through cProfile and print the output of the analyzer

%time statement: reports the execution time of the statement

%timeit statement: execute the statement multiple times to calculate its average execution time; useful for code with a very short execution time

%who, %who_ls, %whos: display the variables defined in the interactive namespace, with different levels of detail

%xdel variable: delete the variable and try to clear all references to its object in IPython
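
The idea behind %timeit can be roughly sketched outside IPython with the standard-library timeit module. This is only an approximation of what the magic does; the statement and the loop counts below are arbitrary choices for illustration.

```python
import timeit

# Time a small statement many times, similar in spirit to %timeit.
# The statement and the loop counts are illustrative assumptions.
stmt = "sum(range(100))"
runs = timeit.repeat(stmt, number=1000, repeat=5)

# Each entry is the total time for 1000 executions; report the best per-loop time.
best_per_loop = min(runs) / 1000
print(f"best of 5: {best_per_loop * 1e6:.2f} microseconds per loop")
```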

IPython shortcut keys:

Ctrl+P or Up Arrow: Search backwards in the command history for commands that begin with the currently entered text

Ctrl+N or Down Arrow: Search forward in the command history for commands that begin with the currently entered text

Ctrl+R: readline-style reverse history search (partial match)

Ctrl+Shift+V: Paste text from clipboard

Ctrl+C: Abort currently executing code

Ctrl+A: move the cursor to the beginning of the line

Ctrl+E: move the cursor to the end of the line

Ctrl+K: delete the text from the cursor to the end of the line

Ctrl+U: Discard all text entered on the current line

Ctrl+F: move the cursor forward one character

Ctrl+B: move the cursor back one character

Ctrl+L: Clear the screen

NumPy: Array Operations:

NumPy is the basic package for high-performance scientific computing and data analysis, and it is the basis for various other tools such as pandas

The main features of NumPy:

ndarray, an efficient and space-saving multidimensional array structure

Mathematical functions that operate on entire arrays without explicit loops

Tools for reading and writing data on disk and for working with memory-mapped files

Linear algebra, random number generation and Fourier transform functions

Tools for integrating C, C++ and other code

Installation method: pip install numpy

Import convention: import numpy as np

ndarray: multidimensional array object

Create ndarray: np.array()

ndarray: can also be a multidimensional array, but the element types must be the same

Common properties:

T: Transpose of the array (for high-dimensional arrays)

dtype: The data type of the array element

size: the number of elements in the array

ndim: the number of dimensions of the array

shape: the size of each dimension of the array (as a tuple)

Common dtypes: bool_, int, float. Type conversion: astype()
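
The attributes above can be checked on a small array. The following is a minimal sketch; the array contents are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.shape)    # (2, 3)
print(a.ndim)     # 2
print(a.size)     # 6
print(a.T.shape)  # (3, 2)

b = a.astype(float)  # astype returns a new array; a itself is unchanged
print(b.dtype)       # float64
```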

Create an ndarray:

array(): converts a list to an array; the dtype can optionally be specified explicitly

arange(): numpy version of range, supports floating point numbers

?: Helps to view method parameters

linspace(): similar to arange(), but the third parameter is the number of elements in the array, and the end value is included by default

zeros(): Creates an all-zero array according to the specified shape and dtype

ones(): Creates an array of all 1s according to the specified shape and dtype

reshape(): returns the same data arranged in a new shape

NumPy: indexing and slicing
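
A short sketch of the creation functions listed above. The argument values are arbitrary and chosen so the results are easy to check by hand.

```python
import numpy as np

r = np.arange(0, 1, 0.25)         # start, stop (exclusive), step
lin = np.linspace(0, 1, 5)        # start, stop (inclusive), number of points
z = np.zeros((2, 3))              # 2x3 array filled with 0.0
o = np.ones(4, dtype=int)         # four integer ones
m = np.arange(12).reshape(3, 4)   # the same 12 elements as 3 rows, 4 columns

print(r)        # [0.   0.25 0.5  0.75]
print(lin)      # [0.   0.25 0.5  0.75 1.  ]
print(m.shape)  # (3, 4)
```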

Operations between arrays and scalars

a+1,a*3,1//a,a**0.5

Operations between arrays of the same size

a+b,a/b,a**b

Index of the array:

a[5],a2[2][3],a2[2,3]

Slice of array: a[5:8],a[:3],a2[1:2,:4],a2[:,:1] ,a2[:,1]
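
The expressions above can be tried on concrete arrays. In this sketch the names a, b and a2 follow the notes, while the values are assumptions chosen for illustration.

```python
import numpy as np

a = np.arange(1, 11)               # [1, 2, ..., 10]
b = np.arange(10, 0, -1)           # [10, 9, ..., 1]
a2 = np.arange(20).reshape(4, 5)   # 4 rows, 5 columns

print(a + 1)        # the scalar is applied to every element
print(a + b)        # element-wise: every entry is 11
print(a[5])         # 6
print(a[5:8])       # [6 7 8]
print(a2[2, 3])     # 13, the same element as a2[2][3]
print(a2[1:2, :4])  # [[5 6 7 8]]
print(a2[:, 1])     # [ 1  6 11 16]
```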

Boolean index:

Given an array, select all numbers greater than 5 in the array: a[a>5]

Principle: a>5 evaluates each element of a and returns a boolean array

Boolean index: passing a boolean array of the same size as the index returns an array of the elements at all True positions

Question 2: Given an array, select all even numbers greater than 5.

Question 3: Given an array, select all numbers that are greater than 5 or even.

Answers: a[(a>5)&(a%2==0)], a[(a>5)|(a%2==0)]
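
A minimal sketch of boolean indexing on a made-up array. The parentheses around each comparison are required because & and | bind more tightly than the comparison operators.

```python
import numpy as np

a = np.array([3, 8, 5, 12, 7, 6, 10, 1])

mask = a > 5                      # boolean array with the same shape as a
print(a[mask])                    # [ 8 12  7  6 10]
print(a[(a > 5) & (a % 2 == 0)])  # greater than 5 AND even: [ 8 12  6 10]
print(a[(a > 5) | (a % 2 == 0)])  # greater than 5 OR even:  [ 8 12  7  6 10]
```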

Fancy Index:

For an array, select the elements at indices 1, 3, 4, 6 and 7 to form a new array:

a[[1,3,4,6,7]]

For a two-dimensional array, select the columns at indices 1 and 3:

a[:,[1,3]]
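
Fancy indexing can be sketched as follows, with illustrative arrays. A list of integer positions is passed as the index, and the result is a new array holding the selected elements.

```python
import numpy as np

a = np.arange(10, 20)              # [10, 11, ..., 19]
picked = a[[1, 3, 4, 6, 7]]        # elements at indices 1, 3, 4, 6, 7
print(picked)                      # [11 13 14 16 17]

a2 = np.arange(20).reshape(4, 5)
cols = a2[:, [1, 3]]               # all rows, columns at indices 1 and 3
print(cols.shape)                  # (4, 2)
```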

NumPy: Universal Functions

Universal function: a function that can operate on all elements in an array at the same time

Common universal functions:

Unary functions: abs, sqrt, exp, log, ceil, floor, rint, trunc, modf, isnan, isinf, cos, sin

Binary functions: add, subtract, multiply, divide, power, mod, maximum, minimum
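
A few of these functions applied to small made-up arrays, as a sketch. Unary functions take one array, binary functions take two and work element by element.

```python
import numpy as np

a = np.array([1.5, -2.5, 4.0])
b = np.array([2.0, 2.0, 0.5])

print(np.abs(a))          # [1.5 2.5 4. ]
print(np.floor(a))        # [ 1. -3.  4.]
frac, whole = np.modf(a)  # fractional and integral parts, each keeping the sign
print(np.maximum(a, b))   # element-wise maximum: [2. 2. 4.]
print(np.power(b, 2))     # [4.   4.   0.25]
```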

Mathematical and Statistical Methods

sum: sum, mean: mean, std: standard deviation, var: variance, min: minimum, max: maximum

argmin: find the minimum index, argmax: find the maximum index
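
The statistical methods are available directly on the array. A small sketch with illustrative values:

```python
import numpy as np

a = np.array([4, 1, 7, 3])

print(a.sum())     # 15
print(a.mean())    # 3.75
print(a.min())     # 1
print(a.max())     # 7
print(a.argmin())  # 1, the position of the minimum
print(a.argmax())  # 2, the position of the maximum
print(a.var())     # 4.6875, the population variance
print(a.std())     # square root of the variance
```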

np.random: common functions

rand: Generates a random array of given shape (numbers between 0 and 1)

randint: Generates random integers for a given shape

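choice: draws random samples of a given shape from a given one-dimensional array (uniformly by default)

shuffle: like random.shuffle, randomly reorders the elements and modifies the original array in place

uniform: generates a random array of a given shape, drawn from a uniform distribution over a given interval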
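
A sketch of the np.random functions above. The seed is fixed only so that repeated runs give the same numbers; the shapes and ranges are arbitrary.

```python
import numpy as np

np.random.seed(0)  # fixed seed, used here only for repeatability

r = np.random.rand(2, 3)                 # floats in [0, 1), shape (2, 3)
i = np.random.randint(0, 10, size=5)     # integers in [0, 10), upper bound excluded
c = np.random.choice([1, 2, 3], size=4)  # samples drawn from the given values
u = np.random.uniform(-1, 1, size=3)     # floats in [-1, 1)

arr = np.arange(5)
np.random.shuffle(arr)                   # reorders arr in place and returns None
print(arr)
```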

pandas: data analysis

pandas is a powerful toolkit for data analysis in Python

pandas is built on NumPy

Main features of pandas: the DataFrame and Series data structures, integrated time series functionality,

a rich set of mathematical operations, and flexible handling of missing data

Installation method: pip install pandas

Import convention: import pandas as pd

A Series is an object similar to a one-dimensional array, consisting of a set of data and a set of data labels (indexes) associated with it

A Series behaves like a combination of a list (array) and a dictionary

Creation method: pd.Series([4,7,-5,3])

pd.Series([4,7,-5,3],index=['a','b','c','d'])

pd.Series({'a':1,'b':2})

pd.Series(0,index=['a','b','c','d'])

Get value array and index array: values property and index property
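
The four creation forms above, put side by side as a minimal sketch. The variable names s1 to s4 are only for illustration.

```python
import pandas as pd

s1 = pd.Series([4, 7, -5, 3])                              # default integer index 0..3
s2 = pd.Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])  # explicit labels
s3 = pd.Series({'a': 1, 'b': 2})                           # dictionary keys become the index
s4 = pd.Series(0, index=['a', 'b', 'c', 'd'])              # the scalar is repeated for each label

print(s2.values)       # [ 4  7 -5  3]
print(list(s2.index))  # ['a', 'b', 'c', 'd']
```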

Series Features:

Series supports NumPy array features (subscripts):

Create Series from ndarray: Series(arr)

Operations with a scalar: sr*2

Operations between two Series: sr1+sr2

Index: sr[0], sr[[1,2,4]] (fancy index)

Slice: sr[0:2]

Universal function: np.abs(sr)

Boolean filter: sr[sr>0]

Series supports dictionary features (labels):

Create a Series from a dictionary: Series(dic)

in operator: 'a' in sr

Key indexing: sr['a'], sr[['a','b','d']]
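
Both sets of features can be seen on one labeled Series. This is a sketch with made-up values; positional access is written as sr.iloc[0] here because, on a Series with string labels, plain sr[0] is deprecated in recent pandas versions.

```python
import numpy as np
import pandas as pd

sr = pd.Series([4, -7, 5, -3], index=['a', 'b', 'c', 'd'])

# Array-like behaviour
print((sr * 2).tolist())    # [8, -14, 10, -6]
print(np.abs(sr).tolist())  # [4, 7, 5, 3]
print(sr[sr > 0].tolist())  # [4, 5]
print(sr.iloc[0])           # 4, explicit positional access

# Dictionary-like behaviour
print('a' in sr)                     # True: membership is tested against the labels
print(sr['a'])                       # 4
print(sr[['a', 'b', 'd']].tolist())  # [4, -7, -3]
```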

pandas: Series data alignment

When pandas operates on two Series, it first aligns them by index and then performs the operation. If the indexes differ, the index of the result is the union of the two operands' indexes, and labels present in only one operand yield NaN.

Example:

sr1=pd.Series([12,23,34],index=['c','a','d'])

sr2=pd.Series([11,20,10],index=['d','c','a'])

sr1+sr2

sr3=pd.Series([11,20,10,14],index=['d','c','a','b'])

sr1+sr3
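
Running the example above shows the alignment. The sketch below reuses sr1, sr2 and sr3 from the notes; the names total12 and total13 are added for illustration.

```python
import pandas as pd

sr1 = pd.Series([12, 23, 34], index=['c', 'a', 'd'])
sr2 = pd.Series([11, 20, 10], index=['d', 'c', 'a'])
sr3 = pd.Series([11, 20, 10, 14], index=['d', 'c', 'a', 'b'])

total12 = sr1 + sr2  # values are matched by label, not by position
print(total12)       # a: 33, c: 32, d: 45

total13 = sr1 + sr3  # 'b' exists only in sr3, so its result is NaN
print(total13)
```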

pandas: Series with missing data

Missing data: NaN (Not a Number) represents missing data; its value is np.nan, and the built-in None is also treated as NaN

Relevant methods for handling missing data:

dropna(): filter out rows with NaN values

fillna(): fill in missing data

isnull(): Returns a boolean array with missing values corresponding to True

notnull(): Returns a boolean array with missing values corresponding to False

Filter missing data: sr.dropna() or sr[sr.notnull()]

Fill missing data: fillna(0)
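
A minimal sketch of these methods on a Series that contains both np.nan and None. The values are made up.

```python
import numpy as np
import pandas as pd

sr = pd.Series([1.0, np.nan, 3.0, None], index=['a', 'b', 'c', 'd'])

print(sr.isnull().tolist())       # [False, True, False, True]
print(sr.notnull().tolist())      # [True, False, True, False]
print(sr.dropna().tolist())       # [1.0, 3.0]
print(sr[sr.notnull()].tolist())  # [1.0, 3.0], equivalent to dropna()
print(sr.fillna(0).tolist())      # [1.0, 0.0, 3.0, 0.0]
```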

pandas: DataFrame

A DataFrame is a tabular data structure that contains an ordered set of columns

A DataFrame can be viewed as a dictionary of Series that share one index

How to create:

pd.DataFrame({'one':[1,2,3,4],'two':[4,3,2,1]})

pd.DataFrame({'one':pd.Series([1,2,3],index=['a','b','c']),'two':pd.Series([1,2,3,4],index=['b','a','c','d'])})

csv file reading and writing:

pd.read_csv('filename.csv')

df.to_csv()

View common properties and methods of data:

index: gets the row index; T: transposes; columns: gets the column index; values: gets the value array; describe(): gets quick summary statistics
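
The creation, CSV and attribute entries above can be tried together. This sketch uses an in-memory text buffer in place of a file on disk, which is an assumption made to keep the example self-contained; reading goes through pd.read_csv.

```python
import io
import pandas as pd

df = pd.DataFrame({
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['b', 'a', 'c', 'd']),
})
# Rows are aligned on the union of the two indexes; 'one' has no value for 'd'.
print(list(df.index))    # ['a', 'b', 'c', 'd']
print(list(df.columns))  # ['one', 'two']
print(df.T.shape)        # (2, 4)

# Round trip through CSV text; the buffer stands in for 'filename.csv'.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
df2 = pd.read_csv(buf, index_col=0)
print(df2.shape)         # (4, 2)
```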

DataFrame indexing and slicing

DataFrame has row index and column index

Get by label: df['a'], df[['a','b']], df['a'][0], df[0:10][['a','c']]

df.loc[:,['a','b']],df.loc[:,'a':'c'],df.loc[0,'a'],df.loc[0:10,['a','c']]

Get by position: df.iloc[3], df.iloc[3,3], df.iloc[0:3,4:6], df.iloc[1:5,:], df.iloc[[1,2,3],[0,3]]

Filter by boolean:

df[df['a']>0], df[df['a'].isin([1,3,5])]; assignment through a boolean mask: df[df<0]=0
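
A sketch of label-based, position-based and boolean selection on a small made-up DataFrame. Note that a loc slice includes its end label while an iloc slice excludes its end position.

```python
import pandas as pd

df = pd.DataFrame(
    {'a': [1, -2, 3, -4], 'b': [5, 6, -7, 8], 'c': [9, 10, 11, 12]}
)

print(df['a'].tolist())                   # [1, -2, 3, -4]
print(df.loc[0, 'a'])                     # 1
print(df.loc[0:2, ['a', 'c']].shape)      # (3, 2): rows 0, 1 and 2
print(df.iloc[0:2, 0:2].shape)            # (2, 2): rows 0 and 1
print(df[df['a'] > 0]['a'].tolist())      # [1, 3]
print(df[df['a'].isin([1, 3, 5])].shape)  # (2, 3)

df[df < 0] = 0                            # replace every negative entry with 0
print(df['a'].tolist())                   # [1, 0, 3, 0]
```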

Data alignment and missing data

When the DataFrame object is operated, the data will also be aligned. The row index and column index of the result are the union of the row index and column index of the two operands, respectively.

DataFrame methods for dealing with missing data:

dropna(axis=0, how='any', ...)

fillna(),isnull(),notnull()
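
A sketch of the missing-data methods on a DataFrame, using dropna with its axis and how parameters. The data is made up so that one row is entirely NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, np.nan, np.nan]})

print(df.dropna().shape)           # (1, 2): how='any' is the default
print(df.dropna(how='all').shape)  # (2, 2): only the all-NaN row is dropped
print(df.dropna(axis=1).shape)     # (3, 0): both columns contain a NaN
print(df.fillna(0)['b'].tolist())  # [4.0, 0.0, 0.0]
print(df.isnull().sum().tolist())  # [1, 2]: missing values per column
```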

pandas: other common methods

Common pandas methods (apply to both Series and DataFrame):

mean(axis=0, skipna=False), sum(axis=1)

sort_index(axis, ..., ascending): sort by row or column index

NumPy's universal functions also work with pandas
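
These methods can be sketched on a small DataFrame with one missing value. The data and labels are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'x': [1.0, 2.0, np.nan], 'y': [4.0, 5.0, 6.0]},
    index=['c', 'a', 'b'],
)

print(df.mean().tolist())                  # [1.5, 5.0]: NaN is skipped by default
print(df.mean(skipna=False)['y'])          # 5.0; the 'x' entry becomes NaN
print(df.sum(axis=1).tolist())             # [5.0, 7.0, 6.0]: one total per row
print(list(df.sort_index().index))         # ['a', 'b', 'c']
print(list(df.sort_index(ascending=False).index))  # ['c', 'b', 'a']
print(np.sqrt(df['y']).round(2).tolist())  # [2.0, 2.24, 2.45]
```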

pandas reads from file:

Read file: load data from filename, URL, file object

The default delimiter for read_csv is a comma

The default delimiter for read_table is \t

The main parameters of the read file function:

sep: Specify the separator, you can use regular expressions such as '\s+'

header=None: specifies that the file has no column names

names: specifies the column names

index_col: specifies a column as the index

skiprows: specifies rows to skip

na_values: specifies strings that represent missing values

parse_dates: specifies which columns to parse as dates; accepts a boolean or a list

Write to file: to_csv

The main parameters of the write file function:

sep: specifies the separator

na_rep: specifies the string written for missing values; the default is an empty string

header=False: do not save the column names

index=False: do not save the row index

columns: specifies the columns to output, passed as a list
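
The reading and writing parameters can be combined as in the sketch below. The file contents, column names and separators are hypothetical, and in-memory buffers stand in for real files.

```python
import io
import pandas as pd

# Hypothetical file contents: whitespace-separated, no header row, "NA" marks a gap.
text = "2020-01-01  1.5  NA\n2020-01-02  2.5  7\n"

df = pd.read_csv(
    io.StringIO(text),
    sep=r'\s+',                # regular expression: one or more whitespace characters
    header=None,               # the file has no column names
    names=['date', 'x', 'y'],  # so supply them here
    index_col='date',          # use the date column as the row index
    parse_dates=True,          # parse the index as dates
    na_values=['NA'],          # treat "NA" as missing
)
print(df)

# Write back with a different separator, a marker for missing values, no header.
out = io.StringIO()
df.to_csv(out, sep=';', na_rep='NULL', header=False)
print(out.getvalue())
```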