pandas and numpy

Posted on 2022-04-02 by Admin

Pandas

group calculation

groupby usage

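import pandas as pd  # df below is assumed to be an existing DataFrame with a "type1" column

df.type1.value_counts() # count how many times each distinct value appears in the "type1" column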

grouped=df.groupby('type1')  # group the rows by the values of one column
grouped.size() # number of rows in each "type1" group
grouped.groups # index labels of the rows in each group
len(grouped) # total number of groups
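The calls above can be tried on a small table. This is a minimal sketch: the DataFrame contents are made up for illustration, and only the column names follow the post.

```python
import pandas as pd

# Hypothetical stand-in for df; the values are invented for this example.
df = pd.DataFrame({
    "type1": ["fire", "water", "fire", "grass", "water", "fire"],
    "against_bug": [0.5, 1.0, 0.5, 2.0, 1.0, 0.25],
})

counts = df.type1.value_counts()   # rows per distinct value, most frequent first
grouped = df.groupby("type1")
sizes = grouped.size()             # rows per group, ordered by group key
n_groups = len(grouped)            # number of distinct groups
fire_rows = list(grouped.groups["fire"])  # index labels of the "fire" rows

print(counts["fire"], sizes["water"], n_groups, fire_rows)
```

value_counts() and size() report the same counts; they differ mainly in ordering.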

Calculations

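grouped.sum()  # sum of each column per group, indexed by the grouping column
grouped.mean()  # mean of each column per group; pass numeric_only=True if df also has non-numeric columns
grouped['against_bug'].std()  # standard deviation of one column per group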
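grouped.agg(['mean','sum','std'])   # agg() computes several statistics at once; string names avoid the warning newer pandas raises for np.mean and friends
grouped.agg({'against_bug':'mean','against_dark':'sum','against_dragon':'std'})  # a dict applies a different statistic to each column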
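A small sketch of both agg() forms, using an invented table with two numeric columns. The statistic names are passed as strings here, which is an assumption about preferred style rather than something the post requires.

```python
import pandas as pd

# Hypothetical data; only the column names follow the post.
df = pd.DataFrame({
    "type1": ["fire", "fire", "water", "water"],
    "against_bug": [0.5, 1.5, 1.0, 1.0],
    "against_dark": [1.0, 2.0, 1.0, 0.5],
})
grouped = df.groupby("type1")

# Several statistics for every remaining column: the result has two-level columns.
multi = grouped.agg(["mean", "sum", "std"])

# One statistic per column, chosen with a dict.
per_col = grouped.agg({"against_bug": "mean", "against_dark": "sum"})

print(multi)
print(per_col)
```

In the first result a single value is addressed with a tuple, for example multi.loc["fire", ("against_bug", "mean")].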

combination of tables

pd.concat([df1,df2,df3])   # df1, df2 and df3 are DataFrames; concat() stacks DataFrames that share the same column names
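A minimal sketch of concat() on two invented DataFrames. The ignore_index option is not mentioned in the post; it is shown because the stacked result otherwise keeps each table's original index labels.

```python
import pandas as pd

# Hypothetical tables with identical column names.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

stacked = pd.concat([df1, df2])                        # index labels repeat: 0, 1, 0, 1
renumbered = pd.concat([df1, df2], ignore_index=True)  # fresh index: 0, 1, 2, 3

print(stacked)
print(renumbered)
```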

merge() can only combine two tables at a time.

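pd.merge(df1,df2,on=['C','D']) # join two tables that share only some column names; on= lists the shared key columns, and by default (inner join) only rows whose keys match in both tables are kept
pd.merge(df1,df3,how='left',on=['C','D'])  # keys may differ between the tables; how= picks whose rows are kept in full ('left' or 'right')
pd.merge(df1,df3,how='outer',on=['C','D'])  # keep every row from both tables; rows without a match are filled with NaN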
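The effect of how= is easiest to see from row counts. A minimal sketch with two invented tables that share the key columns C and D but agree on only two key pairs:

```python
import pandas as pd

# Hypothetical tables; the key pair (3, "z") exists only in df1, (4, "w") only in df3.
df1 = pd.DataFrame({"C": [1, 2, 3], "D": ["x", "y", "z"], "A": [10, 20, 30]})
df3 = pd.DataFrame({"C": [1, 2, 4], "D": ["x", "y", "w"], "E": [0.1, 0.2, 0.4]})

inner = pd.merge(df1, df3, on=["C", "D"])               # keys present in both tables
left = pd.merge(df1, df3, how="left", on=["C", "D"])    # every row of df1
outer = pd.merge(df1, df3, how="outer", on=["C", "D"])  # every key from either table

print(len(inner), len(left), len(outer))
```

Rows that have no partner keep their own columns and get NaN in the columns coming from the other table.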

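df1.set_index(['C','D'],inplace=True)  # to merge on the index, move the key columns into the index first so the join is not on the default integer index
df3.set_index(['C','D'],inplace=True)
pd.merge(df1,df3,left_index=True,right_index=True)    # join on the index; the two tables may have different numbers of rows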

pd.merge(df1,df3,left_on=['A','B'],right_on=['C','D'])  # join tables whose key columns have different names

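df1.set_index(['C','D'],inplace=True)  # run this on a df1 that still has C and D as columns; it raises KeyError if they were already moved into the index
pd.merge(df1,df3,left_index=True,right_on=['C','D'])   # join the index of one table against columns of the other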
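A sketch of the two index-based joins on invented tables. It uses the non-inplace form of set_index() so that the original tables stay unchanged and the snippet can be rerun; that choice is an assumption of this example, not the post's approach.

```python
import pandas as pd

# Hypothetical tables sharing the key columns C and D.
df1 = pd.DataFrame({"C": [1, 2, 3], "D": ["x", "y", "z"], "A": [10, 20, 30]})
df3 = pd.DataFrame({"C": [1, 2, 4], "D": ["x", "y", "w"], "E": [0.1, 0.2, 0.4]})

left_idx = df1.set_index(["C", "D"])   # returns a new DataFrame, df1 is untouched
right_idx = df3.set_index(["C", "D"])

# Index against index.
by_index = pd.merge(left_idx, right_idx, left_index=True, right_index=True)

# Index of the left table against columns of the right table.
mixed = pd.merge(left_idx, df3, left_index=True, right_on=["C", "D"])

print(by_index)
print(mixed)
```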

Pivot

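pd.pivot_table(df,values=['against_fairy'],index=['against_bug'],columns=['type1'],aggfunc=['sum','mean']) # use one column as the row index and another as the column labels, then aggregate the column named in values= (here: sum and mean); if values= is omitted every remaining column is aggregated, and if aggfunc= is omitted the mean is used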

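pd.crosstab(df['against_bug'],df['type1'],margins=True)  # use one column as the row index and another as the column labels, and count how often each combination occurs; margins=True adds an "All" row and column with the totals (off by default)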
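A minimal sketch contrasting the two functions on an invented table: pivot_table() aggregates a value column, while crosstab() only counts combinations.

```python
import pandas as pd

# Hypothetical data; only the column names follow the post.
df = pd.DataFrame({
    "type1": ["fire", "fire", "water", "water"],
    "against_bug": [0.5, 1.0, 0.5, 0.5],
    "against_fairy": [1.0, 2.0, 3.0, 5.0],
})

# Sum of against_fairy for each (against_bug, type1) combination.
table = pd.pivot_table(df, values="against_fairy", index="against_bug",
                       columns="type1", aggfunc="sum")

# Number of rows for each combination, plus "All" totals.
freq = pd.crosstab(df["against_bug"], df["type1"], margins=True)

print(table)
print(freq)
```

Combinations that never occur show up as NaN in the pivot table and as 0 in the crosstab.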

Numpy

Arrays

import numpy as np
a = np.array([[[1,3,4,6,10]]])  # create a multidimensional array (here 3-D)
a.shape  # dimensions of the array
np.zeros_like(a)   # array of zeros with the same shape as a

np.zeros([3,2,1])  # multidimensional array filled with zeros


np.ones([3,2,1])    # multidimensional array filled with ones

np.eye(3)    # identity matrix

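a=np.random.rand(3,4)   # random array with 3 rows and 4 columns
a.size   # total number of elements in a
len(a)    # number of rows in a (length of the first axis)
np.sum(a)   # sum of all elements
np.sum(a,axis=1) # sum of each row
np.sum(a,axis=0) # sum of each column
np.mean(a)   # mean of all elements
np.std(a)     # standard deviation of all elements
np.sort(a)   # sort each row in ascending order; the default is the last axis (axis=1 for a 2-D array)
np.sort(a,axis=0)  # sort each column in ascending order
np.argsort(a)  # indices that would sort each row
np.max(a,axis=0)  # maximum of each column
np.argmax(a,axis=0)   # row index of the maximum in each column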
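Because np.random.rand() gives different numbers on every run, the axis behaviour is easier to check on a fixed array. A minimal sketch with hand-picked values:

```python
import numpy as np

a = np.array([[3, 1, 2],
              [9, 7, 8]])

row_sums = np.sum(a, axis=1)        # one value per row: [6, 24]
col_sums = np.sum(a, axis=0)        # one value per column: [12, 8, 10]
sorted_rows = np.sort(a)            # each row sorted: [[1, 2, 3], [7, 8, 9]]
order = np.argsort(a)               # positions that sort each row: [[1, 2, 0], [1, 2, 0]]
col_argmax = np.argmax(a, axis=0)   # row holding each column's maximum: [1, 1, 1]

print(a.size, len(a))               # 6 elements, 2 rows
```

axis=0 collapses the rows (one result per column) and axis=1 collapses the columns (one result per row).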


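np.where(a>0.5)  # indices of the elements that satisfy the condition
Result:
(array([0, 0, 0, 1, 1, 2, 2, 2, 2], dtype=int64),
 array([1, 2, 3, 1, 3, 0, 1, 2, 3], dtype=int64))
Read the two arrays column-wise as (row, column) pairs: (0, 1) is row 0, column 1, followed by (0, 2), (0, 3), and so on.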
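The pairing of the two index arrays can be made explicit with zip(). A minimal sketch on a fixed array (the values are invented so that the result is reproducible):

```python
import numpy as np

a = np.array([[0.1, 0.9, 0.7],
              [0.6, 0.2, 0.8]])

rows, cols = np.where(a > 0.5)                   # one array of row indices, one of column indices
pairs = list(zip(rows.tolist(), cols.tolist()))  # [(0, 1), (0, 2), (1, 0), (1, 2)]
values = a[rows, cols]                           # the matching elements themselves

print(pairs)
print(values)
```

Indexing with a[rows, cols] returns the same elements as the boolean form a[a > 0.5].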

random number

np.random.randn(10)   # array of 10 samples from the standard normal distribution

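np.random.choice(10,size=(3,4))   # 3x4 integer array drawn from [0, 10): only the upper bound is given, and 10 itself is excluded
np.random.choice([1,4,5,7],(3,4))  # 3x4 integer array whose values are drawn only from the given list
np.random.randint(1,10,size=(3,4)) # 3x4 integer array drawn from [1, 10): lower bound included, 10 excluded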

np.random.rand(3,4) # 3x4 array of floats in [0, 1): 0 included, 1 excluded
np.random.random((3,4)) # same as above, but the shape is passed as a tuple
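The stated ranges can be checked directly. A minimal sketch; the call to np.random.seed() is an addition for reproducibility and is not part of the post.

```python
import numpy as np

np.random.seed(0)  # assumption: fix the seed so repeated runs give the same numbers

ints = np.random.randint(1, 10, size=(3, 4))    # integers in [1, 10)
picks = np.random.choice([1, 4, 5, 7], (3, 4))  # values taken from the list only
floats = np.random.rand(3, 4)                   # floats in [0, 1)

print(ints.min() >= 1, ints.max() < 10)
print(set(picks.ravel().tolist()) <= {1, 4, 5, 7})
print(floats.min() >= 0, floats.max() < 1)
```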

the index of the array

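a=np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])  # create a 2-D array
a[1:2,0:2]  # slice: row 1, columns 0 and 1
a[[1,2],[0,1]]   # index lists are paired up: picks the two elements at (1, 0) and (2, 1)
a[(1,2),(0,1)]   # tuples work the same way: elements at (1, 0) and (2, 1)
a[a>4]  # elements that satisfy the condition; a>4 is a boolean array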

a=np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
mask=np.array([0,1,1],dtype=bool)  # the three entries map to rows 0, 1 and 2: False skips the row, True selects it
a[mask,2]  # column 2 of the selected rows, i.e. the elements at (1, 2) and (2, 2)

a[2::2,::2]  # every second row starting at row 2, and every second column
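The indexing forms above give the following results on the same 3x4 array. A minimal sketch that collects them in one place:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

block = a[1:2, 0:2]          # slicing keeps two dimensions: [[5, 6]]
picked = a[[1, 2], [0, 1]]   # paired indices (1, 0) and (2, 1): [5, 10]
big = a[a > 4]               # boolean selection flattens to 1-D: [5, 6, ..., 12]

mask = np.array([0, 1, 1], dtype=bool)
col2 = a[mask, 2]            # column 2 of rows 1 and 2: [7, 11]

strided = a[2::2, ::2]       # row 2 only, columns 0 and 2: [[9, 11]]

print(block, picked, big, col2, strided, sep="\n")
```

Slices return sub-arrays with the original number of dimensions, while index lists and boolean masks select individual elements.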

Mathematical Calculations for Arrays

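a+2  # add 2 to every element of a
a*10   # multiply every element of a by 10
# in the lines below, b is assumed to be another array with the same shape as a
np.add(a,b)  # same as a+b; either form works
np.subtract(a,b)   # same as a-b
np.multiply(a,b)    # same as a*b (element-wise)
np.divide(b,a)   # same as b/a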

# if a and b are both arrays with 3 rows and 4 columns
a.dot(b.T)  # matrix product of the two arrays: (3x4)·(4x3) gives a 3x3 result
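A minimal sketch that checks the equivalences above and the shape of the matrix product. The arrays a and b are invented; b is all ones so that each entry of the product is simply a row sum of a.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)    # rows: [0..3], [4..7], [8..11]
b = np.ones((3, 4), dtype=int)     # hypothetical second array with the same shape

same_add = np.array_equal(np.add(a, b), a + b)        # function and operator agree
same_mul = np.array_equal(np.multiply(a, b), a * b)   # element-wise, not a matrix product

prod = a.dot(b.T)                  # (3x4)·(4x3) -> 3x3
print(prod.shape)
print(prod)
```

Note that a*b multiplies element by element, while a.dot(b.T) is the matrix product and changes the shape.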
