用熊猫绘制相关矩阵

我有一个具有大量特征的数据集，因此分析相关矩阵变得非常困难。我想绘制一个相关矩阵，我们使用dataframe.corr()函数从pandas库中获得。pandas库是否提供了任何内置函数来绘制这个矩阵?

当前回答

corrmatrix = df.corr()
corrmatrix *= np.tri(*corrmatrix.values.shape, k=-1).T
corrmatrix = corrmatrix.stack().sort_values(ascending = False).reset_index()
corrmatrix.columns = ['Признак 1', 'Признак 2', 'Корреляция']
corrmatrix[(corrmatrix['Корреляция'] >= 0.7) + (corrmatrix['Корреляция'] <= -0.7)]
drop_columns = corrmatrix[(corrmatrix['Корреляция'] >= 0.82) + (corrmatrix['Корреляция'] <= -0.7)]['Признак 2']
df.drop(drop_columns, axis=1, inplace=True)
corrmatrix[(corrmatrix['Корреляция'] >= 0.7) + (corrmatrix['Корреляция'] <= -0.7)]

2021-11-11 18:45:58

其他回答

除了其他方法，还有对图也很好，它将给出所有情况下的散点图

import pandas as pd
import numpy as np
import seaborn as sns
rs = np.random.RandomState(0)
df = pd.DataFrame(rs.rand(10, 10))
sns.pairplot(df)

2020-01-24 07:11:20

你可以使用matplotlib中的pyplot.matshow():

import matplotlib.pyplot as plt

plt.matshow(dataframe.corr())
plt.show()

编辑:

在评论中有一个关于如何更改轴勾标签的请求。这是一个豪华版，它画在一个更大的图形尺寸上，有轴标签来匹配数据框架，还有一个颜色条图例来解释颜色尺度。

我包括如何调整标签的大小和旋转，我正在使用一个图形比例，使颜色条和主要图形出来的高度相同。

编辑2: 由于df.corr()方法忽略非数值列，在定义x和y标签时应该使用.select_dtypes(['number'])，以避免不必要的标签移位(包括在下面的代码中)。

f = plt.figure(figsize=(19, 15))
plt.matshow(df.corr(), fignum=f.number)
plt.xticks(range(df.select_dtypes(['number']).shape[1]), df.select_dtypes(['number']).columns, fontsize=14, rotation=45)
plt.yticks(range(df.select_dtypes(['number']).shape[1]), df.select_dtypes(['number']).columns, fontsize=14)
cb = plt.colorbar()
cb.ax.tick_params(labelsize=14)
plt.title('Correlation Matrix', fontsize=16);

2015-04-03 13:04:18

corrmatrix = df.corr()
corrmatrix *= np.tri(*corrmatrix.values.shape, k=-1).T
corrmatrix = corrmatrix.stack().sort_values(ascending = False).reset_index()
corrmatrix.columns = ['Признак 1', 'Признак 2', 'Корреляция']
corrmatrix[(corrmatrix['Корреляция'] >= 0.7) + (corrmatrix['Корреляция'] <= -0.7)]
drop_columns = corrmatrix[(corrmatrix['Корреляция'] >= 0.82) + (corrmatrix['Корреляция'] <= -0.7)]['Признак 2']
df.drop(drop_columns, axis=1, inplace=True)
corrmatrix[(corrmatrix['Корреляция'] >= 0.7) + (corrmatrix['Корреляция'] <= -0.7)]

2021-11-11 18:45:58

你可以使用来自seaborn的heatmap()来查看b/w不同特征的相关性:

import matplot.pyplot as plt
import seaborn as sns

co_matrics=dataframe.corr()
plot.figure(figsize=(15,20))
sns.heatmap(co_matrix, square=True, cbar_kws={"shrink": .5})

2021-04-24 17:58:36

形成相关矩阵，在我的情况下，zdf是我需要执行相关矩阵的数据框架。

corrMatrix =zdf.corr()
corrMatrix.to_csv('sm_zscaled_correlation_matrix.csv');
html = corrMatrix.style.background_gradient(cmap='RdBu').set_precision(2).render()

# Writing the output to a html file.
with open('test.html', 'w') as f:
   print('<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8"><meta name="viewport" content="width=device-widthinitial-scale=1.0"><title>Document</title></head><style>table{word-break: break-all;}</style><body>' + html+'</body></html>', file=f)

然后我们可以截屏。或者将HTML转换为图像文件。

2020-03-05 04:56:26

用熊猫绘制相关矩阵

推荐文章

最新文章

标签