我有一个数据帧有一个(字符串)列,我想把它分成两个(字符串)列,其中一个列标题为“fips”和另一个“行”

我的数据框架df看起来是这样的:

          row
0    00000 UNITED STATES
1    01000 ALABAMA
2    01001 Autauga County, AL
3    01003 Baldwin County, AL
4    01005 Barbour County, AL

我不知道如何使用df.row。Str[:]来实现拆分行单元格的目标。我可以使用df['fips'] = hello添加一个新列,并用hello填充它。什么好主意吗?

         fips       row
0    00000 UNITED STATES
1    01000 ALABAMA 
2    01001 Autauga County, AL
3    01003 Baldwin County, AL
4    01005 Barbour County, AL

当前回答

没想到我还没见过这个。如果你只需要两段,我强烈推荐…

Series.str.partition

分区在分隔符上执行一次分割,通常性能相当好。

df['row'].str.partition(' ')[[0, 2]]

       0                   2
0  00000       UNITED STATES
1  01000             ALABAMA
2  01001  Autauga County, AL
3  01003  Baldwin County, AL
4  01005  Barbour County, AL

如果需要重命名行,

df['row'].str.partition(' ')[[0, 2]].rename({0: 'fips', 2: 'row'}, axis=1)

    fips                 row
0  00000       UNITED STATES
1  01000             ALABAMA
2  01001  Autauga County, AL
3  01003  Baldwin County, AL
4  01005  Barbour County, AL

如果你需要把它连接回原来的,使用join或concat:

df.join(df['row'].str.partition(' ')[[0, 2]])

pd.concat([df, df['row'].str.partition(' ')[[0, 2]]], axis=1)

                        row      0                   2
0       00000 UNITED STATES  00000       UNITED STATES
1             01000 ALABAMA  01000             ALABAMA
2  01001 Autauga County, AL  01001  Autauga County, AL
3  01003 Baldwin County, AL  01003  Baldwin County, AL
4  01005 Barbour County, AL  01005  Barbour County, AL

其他回答

没想到我还没见过这个。如果你只需要两段,我强烈推荐…

Series.str.partition

分区在分隔符上执行一次分割,通常性能相当好。

df['row'].str.partition(' ')[[0, 2]]

       0                   2
0  00000       UNITED STATES
1  01000             ALABAMA
2  01001  Autauga County, AL
3  01003  Baldwin County, AL
4  01005  Barbour County, AL

如果需要重命名行,

df['row'].str.partition(' ')[[0, 2]].rename({0: 'fips', 2: 'row'}, axis=1)

    fips                 row
0  00000       UNITED STATES
1  01000             ALABAMA
2  01001  Autauga County, AL
3  01003  Baldwin County, AL
4  01005  Barbour County, AL

如果你需要把它连接回原来的,使用join或concat:

df.join(df['row'].str.partition(' ')[[0, 2]])

pd.concat([df, df['row'].str.partition(' ')[[0, 2]]], axis=1)

                        row      0                   2
0       00000 UNITED STATES  00000       UNITED STATES
1             01000 ALABAMA  01000             ALABAMA
2  01001 Autauga County, AL  01001  Autauga County, AL
3  01003 Baldwin County, AL  01003  Baldwin County, AL
4  01005 Barbour County, AL  01005  Barbour County, AL
df[['fips', 'row']] = df['row'].str.split(' ', n=1, expand=True)

使用df。赋值以创建一个新的df。参见https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.assign.html

split = df_selected['name'].str.split(',', 1, expand=True)
df_split = df_selected.assign(first_name=split[0], last_name=split[1])
df_split.drop('name', 1, inplace=True)

或者以方法链的形式:

df_split = (df_selected
            .assign(list_col=lambda df: df['name'].str.split(',', 1, expand=False),
                    first_name=lambda df: df.list_col.str[0],
                    last_name=lambda df: df.list_col.str[1])
            .drop(columns=['list_col']))

也许有更好的方法,但这是一种方法:

                            row
    0       00000 UNITED STATES
    1             01000 ALABAMA
    2  01001 Autauga County, AL
    3  01003 Baldwin County, AL
    4  01005 Barbour County, AL
df = pd.DataFrame(df.row.str.split(' ',1).tolist(),
                                 columns = ['fips','row'])
   fips                 row
0  00000       UNITED STATES
1  01000             ALABAMA
2  01001  Autauga County, AL
3  01003  Baldwin County, AL
4  01005  Barbour County, AL

我发现没人用切片法,所以我把2美分写在这里。

df["<col_name>"].str.slice(stop=5)
df["<col_name>"].str.slice(start=6)

该方法将创建两个新列。