IPython, NumPy, pandas
IPython: interactive Python command line
Tab: autocompletion
?: introspection; combined with wildcards (e.g. np.*load*?) it also searches the namespace
!: execute system (shell) commands
%run: execute the code in a file
Commonly used IPython magic commands:
%quickref: display the IPython quick reference
%magic: show detailed documentation of all magic commands
%debug: enter the interactive debugger at the bottom of the most recent exception traceback
%hist: print the command input history (optionally with output)
%pdb: automatically enter the debugger after an exception occurs
%paste: Execute Python code in the clipboard
%cpaste: Opens a special prompt for manual pasting of Python code to be executed
%reset: delete all variables/names in the interactive namespace
%page OBJECT: print OBJECT through the pager
%run script.py: execute a Python script file in IPython
%prun statement: execute the statement through cProfile and print the profiler output
%time statement: report the execution time of the statement
%timeit statement: run the statement many times and report the average execution time; useful for code that runs very quickly
%who, %who_ls, %whos: display the variables defined in the interactive namespace, with increasing levels of detail
%xdel variable: delete the variable and try to clear all references to its object inside IPython
IPython keyboard shortcuts:
Ctrl+P or Up Arrow: search backward in the command history for commands that begin with the currently entered text
Ctrl+N or Down Arrow: search forward in the command history for commands that begin with the currently entered text
Ctrl+R: readline-style reverse history search (partial matching)
Ctrl+Shift+V: paste text from the clipboard
Ctrl+C: Abort currently executing code
Ctrl+A: move the cursor to the beginning of the line
Ctrl+E: move the cursor to the end of the line
Ctrl+K: delete the text from the cursor to the end of the line
Ctrl+U: discard all text on the current line
Ctrl+F: move the cursor forward one character
Ctrl+B: move the cursor back one character
Ctrl+L: Clear the screen
NumPy: array operations
NumPy is the fundamental package for high-performance scientific computing and data analysis, and it is the foundation of many other tools, such as pandas.
Main features of NumPy:
ndarray: a fast, space-efficient multidimensional array
Mathematical functions that operate on whole arrays quickly, without writing loops
Tools for reading/writing array data to disk and for working with memory-mapped files
Linear algebra, random number generation, and Fourier transform functions
Tools for integrating code written in C, C++, and other languages
Installation: pip install numpy
Import: import numpy as np
ndarray: multidimensional array object
Create ndarray: np.array()
An ndarray can be multidimensional, but all of its elements must be of the same type
Common attributes:
T: transpose of the array (for arrays with more than two dimensions, the axis order is reversed)
dtype: the data type of the array elements
size: the number of elements in the array
ndim: the number of dimensions of the array
shape: the size of each dimension, as a tuple
Common dtypes: bool_, int, float; type conversion: astype()
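A minimal sketch of these attributes, using a small made-up 2x3 array (the exact integer dtype name depends on the platform, so it is not spelled out):

```python
import numpy as np

# Hypothetical sample data: a 2x3 array built from a nested list
a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.shape)  # (2, 3)
print(a.ndim)   # 2
print(a.size)   # 6
print(a.dtype)  # an integer dtype; the exact name is platform-dependent
print(a.T)      # transpose: shape (3, 2)

# astype() returns a converted copy; the original array is unchanged
b = a.astype(float)
print(b.dtype)  # float64
```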
Create an ndarray:
array(): convert a list to an array; the dtype can optionally be specified explicitly
arange(): NumPy version of range(); supports floating-point steps
?: view a function's parameters in IPython (e.g. np.arange?)
linspace(): similar to arange(), but the third parameter is the number of elements; by default both endpoints are included
zeros(): create an all-zero array with the specified shape and dtype
ones(): create an all-ones array with the specified shape and dtype
reshape(): give an array a new shape without changing its data
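The creation functions side by side; a sketch with arbitrary example values:

```python
import numpy as np

a = np.array([1, 2, 3], dtype=float)  # list -> array with an explicit dtype
r = np.arange(0, 1, 0.25)             # float step: 0.0, 0.25, 0.5, 0.75 (stop excluded)
l = np.linspace(0, 1, 5)              # 5 evenly spaced values from 0.0 to 1.0 (stop included)
z = np.zeros((2, 3))                  # 2x3 array of 0.0 (float64 by default)
o = np.ones(4, dtype=int)             # four integer ones
m = np.arange(6).reshape(2, 3)        # same six values, now 2 rows x 3 columns
```

Note the difference in the last argument: arange() takes a step size, while linspace() takes a count.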
NumPy: indexing and slicing
Operations between arrays and scalars
a+1,a*3,1//a,a**0.5
Operations between arrays of the same size
a+b,a/b,a**b
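A quick sketch with two small made-up arrays; a contains no zeros here so that 1//a stays well defined:

```python
import numpy as np

a = np.array([1, 2, 4])
b = np.array([2, 2, 2])

# Array-scalar operations apply the scalar to every element
plus_one = a + 1    # 2, 3, 5
times_3 = a * 3     # 3, 6, 12
floor_div = 1 // a  # 1, 0, 0 (integer division)
roots = a ** 0.5    # 1.0, about 1.414, 2.0

# Operations between same-size arrays work element by element
total = a + b       # 3, 4, 6
ratio = a / b       # 0.5, 1.0, 2.0 (true division returns floats)
power = a ** b      # 1, 4, 16
```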
Array indexing:
a[5], a2[2][3], a2[2,3]
Array slicing: a[5:8], a[:3], a2[1:2,:4], a2[:,:1], a2[:,1]
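The expressions above applied to hypothetical arrays a (0 to 9) and a2 (4x5); the last line illustrates that basic slices are views of the original array, so writing to them changes it:

```python
import numpy as np

a = np.arange(10)                    # 0..9
a2 = np.arange(20).reshape(4, 5)     # 4 rows x 5 columns, values 0..19

x = a[5]              # 5
y1 = a2[2][3]         # row 2, then element 3 -> 13
y2 = a2[2, 3]         # same element in a single indexing step -> 13
s = a[5:8]            # 5, 6, 7
head = a[:3]          # 0, 1, 2
block = a2[1:2, :4]   # row 1, columns 0-3, still 2-D: shape (1, 4)
col2d = a2[:, :1]     # first column kept 2-D: shape (4, 1)
col1d = a2[:, 1]      # column 1 as a 1-D array: shape (4,)

# A basic slice is a view: assigning through it modifies a itself
s[:] = 0
```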
Boolean index:
Question 1: given an array, select all numbers greater than 5: a[a>5]
How it works: a>5 compares every element of a with 5 and returns a boolean array of the same shape.
Boolean indexing: passing a boolean array of the same size as the index returns an array of the elements at all True positions.
Question 2: given an array, select all even numbers greater than 5.
Question 3: given an array, select all numbers that are greater than 5 or even.
Answers: a[(a>5)&(a%2==0)], a[(a>5)|(a%2==0)] (the parentheses are required because & and | bind more tightly than comparisons)
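The three questions worked on a made-up array:

```python
import numpy as np

a = np.array([3, 6, 8, 1, 10, 7, 4])

mask = a > 5                              # boolean array with the same shape as a
gt5 = a[mask]                             # 6, 8, 10, 7
even_gt5 = a[(a > 5) & (a % 2 == 0)]      # 6, 8, 10
gt5_or_even = a[(a > 5) | (a % 2 == 0)]   # 6, 8, 10, 7, 4
```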
Fancy indexing:
For a one-dimensional array, select the elements at indices 1, 3, 4, 6, and 7 to form a new array:
a[[1,3,4,6,7]]
For a two-dimensional array, select the columns at indices 1 and 3:
a[:,[1,3]]
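A sketch of both fancy-indexing cases on hypothetical arrays; it also shows that, unlike a basic slice, a fancy index returns a copy:

```python
import numpy as np

a = np.arange(10) * 10            # 0, 10, 20, ..., 90
picked = a[[1, 3, 4, 6, 7]]       # 10, 30, 40, 60, 70

a2 = np.arange(12).reshape(3, 4)  # rows: 0-3, 4-7, 8-11
cols = a2[:, [1, 3]]              # columns 1 and 3 of every row

# Fancy indexing copies the data, so this does not change a
picked[0] = -1
```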
NumPy: universal functions
Universal function (ufunc): a function that operates element-wise on all elements of an array at once
Common universal functions:
Unary functions: abs, sqrt, exp, log, ceil, floor, rint, trunc, modf, isnan, isinf, cos, sin
Binary functions: add, subtract, multiply, divide, power, mod, maximum, minimum
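A few of the listed ufuncs applied to made-up arrays (unary ufuncs take one array, binary ufuncs combine two arrays element by element):

```python
import numpy as np

a = np.array([-1.5, 2.25, 4.0])
b = np.array([2.0, 1.0, 3.0])

# Unary ufuncs
absolute = np.abs(a)                            # 1.5, 2.25, 4.0
floors = np.floor(a)                            # -2.0, 2.0, 4.0
ceils = np.ceil(a)                              # -1.0, 3.0, 4.0
roots = np.sqrt(np.array([4.0, 9.0]))           # 2.0, 3.0
frac, whole = np.modf(a)                        # fractional parts and integral parts
nan_flags = np.isnan(np.array([1.0, np.nan]))   # False, True

# Binary ufuncs
bigger = np.maximum(a, b)                       # 2.0, 2.25, 4.0
diff = np.subtract(a, b)                        # same as a - b: -3.5, 1.25, 1.0
```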
Mathematical and Statistical Methods
sum: sum, mean: mean, std: standard deviation, var: variance, min: minimum, max: maximum
argmin: index of the minimum value, argmax: index of the maximum value
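A sketch of the statistical methods on made-up data; note that NumPy's std and var use ddof=0 (population statistics) by default:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])
total = m.sum()                       # 21
col_sums = m.sum(axis=0)              # 5, 7, 9 (collapse the rows)
row_means = m.mean(axis=1)            # 2.0, 5.0 (collapse the columns)
smallest, largest = m.min(), m.max()  # 1, 6

v = np.array([2, 4, 4, 4, 5, 5, 7, 9])
spread = v.std()      # 2.0 (population standard deviation)
variance = v.var()    # 4.0
i_min = v.argmin()    # 0: position of the minimum
i_max = v.argmax()    # 7: position of the maximum
```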
np.random: common functions
rand: generate a random array of the given shape (floats in [0, 1))
randint: generate random integers in a given range, with a given shape
choice: draw random samples from a given one-dimensional array (uniformly by default), with a given shape
shuffle: like random.shuffle, shuffles the array in place (the original array is modified)
uniform: generate a random array of a given shape, sampled uniformly from [low, high)
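A sketch of the listed np.random functions; the seed value and sample ranges are arbitrary choices for illustration. (Newer code often uses the generator returned by np.random.default_rng(), but the sketch sticks to the functions named above.)

```python
import numpy as np

np.random.seed(0)  # fixed seed so repeated runs produce the same numbers

r = np.random.rand(2, 3)                             # floats in [0, 1), shape (2, 3)
ints = np.random.randint(0, 10, size=5)              # integers in [0, 10)
picks = np.random.choice([1, 2, 3, 4], size=(2, 2))  # samples drawn from the list
u = np.random.uniform(-1, 1, size=4)                 # floats in [-1, 1)

arr = np.arange(5)
np.random.shuffle(arr)  # shuffles arr in place and returns None
```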
pandas: data analysis
pandas is a powerful toolkit for data analysis in Python.
pandas is built on top of NumPy.
Main features of pandas: the DataFrame and Series data structures, plus integrated time-series functionality.
It provides rich mathematical operations and flexible handling of missing data.
Installation: pip install pandas
Import: import pandas as pd
A Series is an object similar to a one-dimensional array, consisting of a set of data and a set of data labels (indexes) associated with it
A Series is like a combination of a list (array) and a dictionary.
Creating a Series: pd.Series([4,7,-5,3])
pd.Series([4,7,-5,3],index=['a','b','c','d'])
pd.Series({'a':1,'b':2})
pd.Series(0,index=['a','b','c','d'])
Get the value array and the index array: the values and index attributes
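The four creation forms above, with the values and index attributes inspected afterwards:

```python
import pandas as pd

s1 = pd.Series([4, 7, -5, 3])                              # default index 0..3
s2 = pd.Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])  # explicit labels
s3 = pd.Series({'a': 1, 'b': 2})                           # dict keys become the index
s4 = pd.Series(0, index=['a', 'b', 'c', 'd'])              # scalar repeated for every label

vals = s2.values  # NumPy array of the data: 4, 7, -5, 3
idx = s2.index    # Index object holding the labels 'a'..'d'
```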
Series Features:
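Series supports NumPy array features (positional subscripts):
Create a Series from an ndarray: Series(arr)
Operations with a scalar: sr*2
Operations between two Series: sr1+sr2
Indexing: sr[0], sr[[1,2,4]] (fancy indexing); recent pandas versions deprecate positional access through [] on a label index, so prefer sr.iloc[0] and sr.iloc[[1,2,4]]
Slicing: sr[0:2]
Universal functions: np.abs(sr)
Boolean filtering: sr[sr>0]
Series supports dictionary features (labels):
Create a Series from a dictionary: Series(dic)
in operator: 'a' in sr (tests the index labels, not the values)
Key (label) indexing: sr['a'], sr[['a','b','d']]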
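Both sets of features on one hypothetical labeled Series; positional access is written with iloc here, following the recommendation above:

```python
import numpy as np
import pandas as pd

sr = pd.Series([3, -1, 4, -2, 5], index=['a', 'b', 'c', 'd', 'e'])

# Array-like (positional) features
doubled = sr * 2             # 6, -2, 8, -4, 10
first = sr.iloc[0]           # 3
fancy = sr.iloc[[1, 2, 4]]   # -1, 4, 5
head = sr.iloc[0:2]          # positional slice: labels 'a', 'b'
absval = np.abs(sr)          # the ufunc keeps the index
positive = sr[sr > 0]        # 3, 4, 5

# Dictionary-like (label) features
has_a = 'a' in sr            # True: 'in' checks labels, not values
by_label = sr['a']           # 3
several = sr[['a', 'b', 'd']]
```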
pandas: Series data alignment
When operating on Series, pandas first aligns them by index label and then performs the operation. If the indexes differ, the index of the result is the union of the two operands' indexes, and labels that appear in only one operand get NaN.
Example:
sr1=pd.Series([12,23,34],index=['c','a','d'])
sr2=pd.Series([11,20,10],index=['d','c','a'])
sr1+sr2
sr3=pd.Series([11,20,10,14],index=['d','c','a','b'])
sr1+sr3
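The example above evaluated, with the per-label results as comments. The last line uses Series.add(..., fill_value=0), a standard pandas method not covered in these notes, to show one way of avoiding NaN for labels missing on one side:

```python
import pandas as pd

sr1 = pd.Series([12, 23, 34], index=['c', 'a', 'd'])
sr2 = pd.Series([11, 20, 10], index=['d', 'c', 'a'])
sr3 = pd.Series([11, 20, 10, 14], index=['d', 'c', 'a', 'b'])

r12 = sr1 + sr2   # matched by label: a 23+10=33, c 12+20=32, d 34+11=45
r13 = sr1 + sr3   # 'b' exists only in sr3, so r13['b'] is NaN

# Treat a label missing on one side as 0 instead of producing NaN
r13_filled = sr1.add(sr3, fill_value=0)  # b -> 14.0
```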
pandas: missing data in Series
Missing data: pandas uses NaN (Not a Number), i.e. np.nan, to represent missing data; the built-in None is also treated as NaN.
Methods for handling missing data:
dropna(): filter out entries with missing (NaN) values
fillna(): fill in missing data
isnull(): return a boolean array in which missing values are True
notnull(): return a boolean array in which missing values are False
Filter out missing data: sr.dropna() or sr[sr.notnull()]
Fill missing data: sr.fillna(0)
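A sketch of the missing-data methods on a hypothetical Series that contains both np.nan and None:

```python
import numpy as np
import pandas as pd

sr = pd.Series([1.0, np.nan, 3.0, None])  # None is stored as NaN in a float Series

mask = sr.isnull()        # False, True, False, True
kept1 = sr.dropna()       # 1.0, 3.0 (labels 0 and 2 survive)
kept2 = sr[sr.notnull()]  # the same result via boolean filtering
filled = sr.fillna(0)     # 1.0, 0.0, 3.0, 0.0
```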
pandas: DataFrame
A DataFrame is a tabular data structure that contains an ordered collection of columns.
A DataFrame can be viewed as a dictionary of Series that share a common index.
Creating a DataFrame:
pd.DataFrame({'one':[1,2,3,4],'two':[4,3,2,1]})
pd.DataFrame({'one':pd.Series([1,2,3],index=['a','b','c']),'two':pd.Series([1,2,3,4],index=['b','a','c','d'])})
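The two constructors above; in the second one the Series are aligned on their labels, so a label missing from one column shows up as NaN:

```python
import pandas as pd

df1 = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [4, 3, 2, 1]})  # default row index 0..3

df2 = pd.DataFrame({
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['b', 'a', 'c', 'd']),
})
# Row index is the union a, b, c, d; column 'one' has no value for 'd', so it is NaN
```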
Reading and writing CSV files:
pd.read_csv('filename.csv')
df.to_csv('filename.csv')
Commonly used attributes and methods for inspecting data:
index: the row index; T: transpose; columns: the column index; values: the value array; describe(): quick summary statistics
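A round trip through CSV followed by the inspection attributes. To stay self-contained, the sketch writes to an in-memory io.StringIO buffer instead of a real file; a path such as 'filename.csv' works the same way:

```python
import io
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.5, 6.0]})  # hypothetical data

buf = io.StringIO()
df.to_csv(buf, index=False)  # write without the row index
buf.seek(0)                  # rewind before reading
df2 = pd.read_csv(buf)       # read_csv is a pandas function, not a DataFrame method

print(df2.index)       # RangeIndex(start=0, stop=3, step=1)
print(df2.columns)     # column labels 'a' and 'b'
print(df2.values)      # underlying NumPy array
print(df2.T)           # transpose: 2 rows x 3 columns
print(df2.describe())  # count, mean, std, min, quartiles, max per numeric column
```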
DataFrame indexing and slicing
A DataFrame has both a row index and a column index.
Select by label: df['a'], df[['a','b']], df['a'][0], df[0:10][['a','c']] (an integer slice inside [] selects rows)
df.loc[:,['a','b']], df.loc[:,'a':'c'], df.loc[0,'a'], df.loc[0:10,['a','c']]
Select by position: df.iloc[3], df.iloc[3,3], df.iloc[0:3,4:6], df.iloc[1:5,:], df.iloc[[1,2,3],[0,3]]
Boolean filtering:
df[df['a']>0], df[df['a'].isin([1,3,5])], df[df<0]=0 (the last one sets every negative value to 0)
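A sketch of label, position, and boolean selection on a hypothetical 4x3 frame. Note that a loc slice includes its end label, while an iloc slice excludes its end position:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3) - 4, columns=['a', 'b', 'c'])
# rows: [-4, -3, -2], [-1, 0, 1], [2, 3, 4], [5, 6, 7]

col = df['a']                      # one column -> Series
two = df[['a', 'c']]               # list of columns -> DataFrame
rows = df[0:2]                     # integer slice in [] selects rows 0 and 1
sub = df.loc[0:2, ['a', 'c']]      # label-based: rows 0, 1, 2 (end included)
cell = df.loc[1, 'b']              # 0
pos = df.iloc[0:2, 1:3]            # position-based: rows 0-1, columns b-c (end excluded)
picked = df.iloc[[1, 3], [0, 2]]   # rows 1 and 3, columns a and c

positive_a = df[df['a'] > 0]       # rows where column a is positive
in_set = df[df['a'].isin([-4, 5])] # rows where a is -4 or 5
df[df < 0] = 0                     # set every negative value to 0 (modifies df)
```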
Data alignment and missing data
When DataFrame objects are combined in an operation, the data is also aligned: the row index and column index of the result are the unions of the row indexes and column indexes of the two operands, respectively.
DataFrame methods for dealing with missing data:
dropna(axis=0, how='any', ...)
fillna(),isnull(),notnull()
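A sketch of DataFrame alignment with two hypothetical frames that overlap in only one cell, followed by the missing-data methods:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['x', 'y'])
df2 = pd.DataFrame({'b': [10, 20], 'c': [5, 6]}, index=['y', 'z'])

total = df1 + df2
# Rows x, y, z and columns a, b, c; only ('y', 'b') exists in both: 4 + 10 = 14.
# Every other cell is NaN.

any_nan_dropped = total.dropna()          # how='any' (default): every row has a NaN -> empty
all_nan_dropped = total.dropna(how='all') # drops rows x and z, keeps y
filled = total.fillna(0)                  # NaN -> 0
mask = total.isnull()                     # True where a value is missing
```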
pandas: other common methods
Common pandas methods (for both Series and DataFrame):
mean(axis=0, skipna=False), sum(axis=1)
sort_index(axis, ..., ascending): sort by row or column index
NumPy's general functions also work with pandas
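A sketch of these methods on a hypothetical frame with one missing value; note that skipna defaults to True, so skipna=False is what makes a NaN propagate into the result:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, 6.0]},
                  index=['r2', 'r0', 'r1'])

col_means = df.mean()                       # per column, NaN skipped: a 2.0, b 5.0
col_means_strict = df.mean(skipna=False)    # a becomes NaN because it contains NaN
row_sums = df.sum(axis=1)                   # per row: r2 5.0, r0 5.0, r1 9.0
by_row_label = df.sort_index()              # rows ordered r0, r1, r2
by_col_desc = df.sort_index(axis=1, ascending=False)  # columns b, a
rooted = np.sqrt(df['b'])                   # NumPy ufunc on a Series keeps the index
```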
Reading files with pandas:
Read files: load data from a file name, URL, or file object
The default delimiter of read_csv is a comma (',')
The default delimiter of read_table is \t
Main parameters of the file-reading functions:
sep: the delimiter; a regular expression such as '\s+' can be used
header=None: the file has no header row of column names
names: specify the column names
index_col: use a column as the index
skiprows: skip certain rows
na_values: strings that should be treated as missing values
parse_dates: whether to parse certain columns as dates; accepts a boolean or a list of columns
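The reading parameters combined in one call. The text is a made-up, whitespace-separated sample read from an io.StringIO buffer rather than a real file:

```python
import io
import pandas as pd

raw = """# exported report
2024-01-01  10  x
2024-01-02  NA  y
2024-01-03  30  missing
"""

df = pd.read_csv(
    io.StringIO(raw),
    sep=r'\s+',                      # one or more whitespace characters
    header=None,                     # the data has no header row
    names=['date', 'value', 'tag'],  # so supply the column names
    skiprows=1,                      # skip the comment line
    na_values=['missing'],           # extra string treated as missing ('NA' is a default)
    parse_dates=['date'],            # parse this column as datetimes
    index_col='date',                # and use it as the row index
)
```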
Write to file: to_csv
Main parameters of the file-writing function:
sep: the delimiter; na_rep: the string written for missing values (the default is an empty string)
header=False: do not write the column names; index=False: do not write the row index; columns: a list of the columns to write
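The writing parameters together, again using an in-memory buffer and hypothetical data so the output can be inspected directly:

```python
import io
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan], 'b': ['x', 'y'], 'c': [3, 4]})

buf = io.StringIO()
df.to_csv(
    buf,
    sep=';',            # semicolon instead of comma
    na_rep='NULL',      # write missing values as NULL
    header=False,       # no column-name row
    index=False,        # no row index
    columns=['a', 'b'], # only these columns, in this order
)
lines = buf.getvalue().splitlines()  # two lines: '1.0;x' and 'NULL;y'
```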