I have the following DataFrame (df):
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 5))
I add more columns by assignment:
df['mean'] = df.mean(1)
How can I move the column mean to the front, i.e. make it the first column while leaving the order of the other columns untouched?
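Current answer
I wanted to put two columns at the front of a DataFrame, but I did not know the exact names of all the columns because they were generated by an earlier pivot statement. If you are in the same situation, where the columns you know by name should come first, followed by "all the other columns", I came up with the following general solution: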
# known columns first, then every other column in its existing order
df = df.reindex(columns=['Col1', 'Col2'] + list(df.columns.drop(['Col1', 'Col2'])))
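The same "known columns first, everything else after" idea can also be written with plain column selection. This is only a minimal sketch: the frame and the names x and y are made up for illustration, and Col1/Col2 stand in for the columns whose names you know.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the reordering.
df = pd.DataFrame({"x": [1, 2], "Col2": [3, 4], "y": [5, 6], "Col1": [7, 8]})

front = ["Col1", "Col2"]                          # columns known by name
rest = [c for c in df.columns if c not in front]  # all others, original order kept
df = df[front + rest]

print(list(df.columns))
```

Selecting with a list raises a KeyError if one of the front names is missing, which can be a useful check after a pivot.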
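Other answers
I have a very specific use case for reordering column names in pandas. Sometimes I create a new column in a DataFrame based on an existing column. By default pandas appends the new column at the end, but I want it inserted next to the existing column it was derived from.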
def rearrange_list(input_list, input_item_to_move, input_item_insert_here):
    '''
    Helper function to re-arrange the order of items in a list.
    Useful for moving column in pandas dataframe.
    Inputs:
        input_list - list
        input_item_to_move - item in list to move
        input_item_insert_here - item in list, insert before
    returns:
        output_list
    '''
    # make copy for output, make sure it's a list
    output_list = list(input_list)
    # index of item to move
    idx_move = output_list.index(input_item_to_move)
    # pop off the item to move
    itm_move = output_list.pop(idx_move)
    # index of item to insert here
    idx_insert = output_list.index(input_item_insert_here)
    # insert item to move into here
    output_list.insert(idx_insert, itm_move)
    return output_list
import pandas as pd
# step 1: create sample dataframe
df = pd.DataFrame({
    'motorcycle': ['motorcycle1', 'motorcycle2', 'motorcycle3'],
    'initial_odometer': [101, 500, 322],
    'final_odometer': [201, 515, 463],
    'other_col_1': ['blah', 'blah', 'blah'],
    'other_col_2': ['blah', 'blah', 'blah']
})
print('Step 1: create sample dataframe')
display(df)
print()
# step 2: add new column that is difference between final and initial
df['change_odometer'] = df['final_odometer']-df['initial_odometer']
print('Step 2: add new column')
display(df)
print()
# step 3: rearrange columns
ls_cols = df.columns
ls_cols = rearrange_list(ls_cols, 'change_odometer', 'final_odometer')
df=df[ls_cols]
print('Step 3: rearrange columns')
display(df)
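If the derived column does not exist yet, it may be simpler to create it at the target position rather than append and reorder. The following is a rough sketch of that variation, not the approach above: it uses DataFrame.insert together with Index.get_loc, and it assumes the column names are unique so that get_loc returns a single integer.

```python
import pandas as pd

df = pd.DataFrame({
    "motorcycle": ["motorcycle1", "motorcycle2", "motorcycle3"],
    "initial_odometer": [101, 500, 322],
    "final_odometer": [201, 515, 463],
    "other_col_1": ["blah", "blah", "blah"],
})

# Position of the existing column; the new column is placed just before it.
pos = df.columns.get_loc("final_odometer")
df.insert(pos, "change_odometer", df["final_odometer"] - df["initial_odometer"])

print(list(df.columns))
```

Note that insert modifies df in place and raises a ValueError if the column name already exists.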
You can reorder the DataFrame columns with a list of names:
df=df.filter(list_of_col_name)
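One thing to keep in mind with filter is that it keeps only the names in the list, so any column left out of the list is dropped, and names that are not in the DataFrame are skipped without an error. A small sketch with made-up columns A, B, C (the name Z is deliberately absent):

```python
import pandas as pd

df = pd.DataFrame({"A": [1], "B": [2], "C": [3]})

list_of_col_name = ["C", "A", "Z"]  # "Z" does not exist in df, "B" is left out
reordered = df.filter(items=list_of_col_name)

print(list(reordered.columns))
```

So for a pure reordering, the list passed to filter needs to contain every column you want to keep.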
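I believe @Aman's answer is the best one if you know the position of the other column.
If you do not know the position of mean, only its name, you cannot use cols = cols[-1:] + cols[:-1] directly. The following is the next best thing I could come up with: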
meanDf = pd.DataFrame(df.pop('mean'))
# now df doesn't contain "mean" anymore. Order of join will move it to left or right:
meanDf.join(df) # has mean as first column
df.join(meanDf) # has mean as last column
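A related variation that also needs only the column name is to combine pop with DataFrame.insert, which moves the column in place without building a second DataFrame. A minimal sketch using the DataFrame from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5))
df["mean"] = df.mean(axis=1)

# pop removes the column and returns it as a Series;
# insert then puts it back at position 0 (the front).
df.insert(0, "mean", df.pop("mean"))

print(list(df.columns))
```

The pop call is evaluated before insert runs, so the column is already gone when insert checks for duplicate names.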
Here is a very simple answer (just one line).
After adding the "n" columns to your df, you can do the following.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 5))
df['mean'] = df.mean(1)
df
0 1 2 3 4 mean
0 0.929616 0.316376 0.183919 0.204560 0.567725 0.440439
1 0.595545 0.964515 0.653177 0.748907 0.653570 0.723143
2 0.747715 0.961307 0.008388 0.106444 0.298704 0.424512
3 0.656411 0.809813 0.872176 0.964648 0.723685 0.805347
4 0.642475 0.717454 0.467599 0.325585 0.439645 0.518551
5 0.729689 0.994015 0.676874 0.790823 0.170914 0.672463
6 0.026849 0.800370 0.903723 0.024676 0.491747 0.449473
7 0.526255 0.596366 0.051958 0.895090 0.728266 0.559587
8 0.818350 0.500223 0.810189 0.095969 0.218950 0.488736
9 0.258719 0.468106 0.459373 0.709510 0.178053 0.414752
# Add the line below to reorder the columns.
# Note the double parentheses: list() takes a single iterable, so the names are wrapped in a tuple. Otherwise it raises an error.
df = df[list(('mean', 0, 1, 2, 3, 4))]
df
mean 0 1 2 3 4
0 0.440439 0.929616 0.316376 0.183919 0.204560 0.567725
1 0.723143 0.595545 0.964515 0.653177 0.748907 0.653570
2 0.424512 0.747715 0.961307 0.008388 0.106444 0.298704
3 0.805347 0.656411 0.809813 0.872176 0.964648 0.723685
4 0.518551 0.642475 0.717454 0.467599 0.325585 0.439645
5 0.672463 0.729689 0.994015 0.676874 0.790823 0.170914
6 0.449473 0.026849 0.800370 0.903723 0.024676 0.491747
7 0.559587 0.526255 0.596366 0.051958 0.895090 0.728266
8 0.488736 0.818350 0.500223 0.810189 0.095969 0.218950
9 0.414752 0.258719 0.468106 0.459373 0.709510 0.178053
Suppose you have a df with columns A, B, C.
The simplest way is:
df = df.reindex(['B','C','A'], axis=1)
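One caveat with reindex is that a label that does not exist in the DataFrame does not raise an error; it produces a column filled with NaN instead. A short sketch with made-up values, where the lowercase "a" stands in for a typo:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

reordered = df.reindex(["B", "C", "A"], axis=1)

# "a" is not a column of df, so reindex creates it and fills it with NaN.
with_typo = df.reindex(["B", "C", "a"], axis=1)

print(list(reordered.columns))
print(with_typo["a"].isna().all())
```

If a missing name should be treated as an error, selecting with df[['B', 'C', 'A']] raises a KeyError instead.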