How to form a tuple column from two columns in Pandas
- 2025-03-18 08:54:00
- admin (original)
Problem description:
I have a Pandas DataFrame and I want to combine the 'lat' and 'long' columns to form a tuple.
<class 'pandas.core.frame.DataFrame'>
Int64Index: 205482 entries, 0 to 209018
Data columns:
Month 205482 non-null values
Reported by 205482 non-null values
Falls within 205482 non-null values
Easting 205482 non-null values
Northing 205482 non-null values
Location 205482 non-null values
Crime type 205482 non-null values
long 205482 non-null values
lat 205482 non-null values
dtypes: float64(4), object(5)
The code I tried was:
def merge_two_cols(series):
    return (series['lat'], series['long'])
sample['lat_long'] = sample.apply(merge_two_cols, axis=1)
However, this returned the following error:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-261-e752e52a96e6> in <module>()
2 return (series['lat'], series['long'])
3
----> 4 sample['lat_long'] = sample.apply(merge_two_cols, axis=1)
5
...
AssertionError: Block shape incompatible with manager
How can I solve this problem?
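Solution 1:
Get comfortable with zip. It comes in handy when working with column data.
df['new_col'] = list(zip(df.lat, df.long))
It is simpler and faster than using apply or map. Something like np.dstack is about twice as fast as zip, but it will not give you tuples.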
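As a minimal sketch of the zip approach, the snippet below builds a small frame with made-up coordinates (the values are illustrative and not taken from the question's dataset) and checks what ends up in the new column. It also shows, under the same assumptions, why np.dstack is not a drop-in replacement when tuples are required: it returns a NumPy array.

```python
import numpy as np
import pandas as pd

# Illustrative data: column names match the question, values are made up.
df = pd.DataFrame({"lat": [51.5, 48.9, 40.7], "long": [-0.1, 2.4, -74.0]})

# zip pairs the two columns element by element; list() materialises the
# pairs so that pandas stores one tuple per row in an object column.
df["lat_long"] = list(zip(df["lat"], df["long"]))

first = df["lat_long"].iloc[0]
print(first, type(first))

# np.dstack stacks the columns into one 3-D array instead of tuples.
stacked = np.dstack((df["lat"], df["long"]))
print(type(stacked), stacked.shape)
```

Bracket access (df["long"]) is used here instead of attribute access (df.long) only as a matter of habit; both refer to the same column.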
Solution 2:
In [10]: df
Out[10]:
A B lat long
0 1.428987 0.614405 0.484370 -0.628298
1 -0.485747 0.275096 0.497116 1.047605
2 0.822527 0.340689 2.120676 -2.436831
3 0.384719 -0.042070 1.426703 -0.634355
4 -0.937442 2.520756 -1.662615 -1.377490
5 -0.154816 0.617671 -0.090484 -0.191906
6 -0.705177 -1.086138 -0.629708 1.332853
7 0.637496 -0.643773 -0.492668 -0.777344
8 1.109497 -0.610165 0.260325 2.533383
9 -1.224584 0.117668 1.304369 -0.152561
In [11]: df['lat_long'] = df[['lat', 'long']].apply(tuple, axis=1)
In [12]: df
Out[12]:
A B lat long lat_long
0 1.428987 0.614405 0.484370 -0.628298 (0.484370195967, -0.6282975278)
1 -0.485747 0.275096 0.497116 1.047605 (0.497115615839, 1.04760475074)
2 0.822527 0.340689 2.120676 -2.436831 (2.12067574274, -2.43683074367)
3 0.384719 -0.042070 1.426703 -0.634355 (1.42670326172, -0.63435462504)
4 -0.937442 2.520756 -1.662615 -1.377490 (-1.66261469102, -1.37749004179)
5 -0.154816 0.617671 -0.090484 -0.191906 (-0.0904840623396, -0.191905582481)
6 -0.705177 -1.086138 -0.629708 1.332853 (-0.629707821728, 1.33285348929)
7 0.637496 -0.643773 -0.492668 -0.777344 (-0.492667604075, -0.777344111021)
8 1.109497 -0.610165 0.260325 2.533383 (0.26032456699, 2.5333825651)
9 -1.224584 0.117668 1.304369 -0.152561 (1.30436900612, -0.152560909725)
Solution 3:
Pandas has the itertuples method to do exactly this:
list(df[['lat', 'long']].itertuples(index=False, name=None))
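The one-liner above only builds the list. A short sketch of assigning it back as a column might look like the following; the frame and the extra "other" column are hypothetical and exist only to show that selecting two columns first keeps the rest out of the tuples.

```python
import pandas as pd

# Hypothetical frame; "other" is there only to show it is excluded.
df = pd.DataFrame({"lat": [51.5, 48.9], "long": [-0.1, 2.4], "other": ["a", "b"]})

# index=False leaves the row label out of each tuple;
# name=None yields plain tuples instead of namedtuples.
pairs = list(df[["lat", "long"]].itertuples(index=False, name=None))

df["lat_long"] = pairs
print(df["lat_long"].tolist())
```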
Solution 4:
You should try using the DataFrame method to_records(index=False):
import pandas as pd
df = pd.DataFrame({'language': ['en', 'ar', 'es'], 'greeting': ['Hi', 'اهلا', 'Hola']})
df
language greeting
0 en Hi
1 ar اهلا
2 es Hola
df['list_of_tuples'] = list(df[['language', 'greeting']].to_records(index=False))
df['list_of_tuples']
0 [en, Hi]
1 [ar, اهلا]
2 [es, Hola]
Enjoy!
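Note that the output above prints with square brackets: iterating over the record array yields NumPy record objects rather than Python tuples. If plain tuples are needed, one option (a sketch, not part of the original answer) is to call tolist() on the record array instead of wrapping it in list():

```python
import pandas as pd

df = pd.DataFrame({"language": ["en", "ar", "es"], "greeting": ["Hi", "اهلا", "Hola"]})

records = df[["language", "greeting"]].to_records(index=False)

# Iterating over the record array yields numpy.record objects, not tuples.
first_record = list(records)[0]
print(type(first_record))

# tolist() on the record array converts every row to a plain Python tuple.
df["list_of_tuples"] = records.tolist()
print(df["list_of_tuples"].iloc[0])
```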
Solution 5:
I'd like to add df.values.tolist() (as long as you don't mind getting a column of lists rather than tuples).
import pandas as pd
import numpy as np
size = int(1e+07)
df = pd.DataFrame({'a': np.random.rand(size), 'b': np.random.rand(size)})
%timeit df.values.tolist()
1.47 s ± 38.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit list(zip(df.a,df.b))
1.92 s ± 131 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
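If tuples are required after all, the inner lists can be converted afterwards. The sketch below uses a tiny made-up frame rather than the 10-million-row benchmark, and makes no claim about how the extra conversion step affects the timings above.

```python
import pandas as pd

# Small illustrative frame; the benchmark above uses random data instead.
df = pd.DataFrame({"a": [0.1, 0.2, 0.3], "b": [1.0, 2.0, 3.0]})

# values.tolist() gives one inner list per row ...
rows_as_lists = df[["a", "b"]].values.tolist()

# ... and map(tuple, ...) turns each inner list into a tuple.
df["ab"] = list(map(tuple, rows_as_lists))
print(df["ab"].tolist())
```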
Solution 6:
Suppose you have two columns, 'A' and 'B':
import pandas as pd
df = pd.DataFrame({'A': ['one', 'two', 'three'], 'B': [1, 2, 3]})
print(df)
A B
0 one 1
1 two 2
2 three 3
Now, to merge columns A and B together, you can do the following:
print(df[['A', 'B']].apply(list, axis=1))
0 [one, 1]
1 [two, 2]
2 [three, 3]
dtype: object
Or, if you want a nested list:
print(df[['A', 'B']].apply(list, axis=1).tolist())
#[['one', 1], ['two', 2], ['three', 3]]
Solution 7:
This is a variation of another answer, but since the original question asks for a tuple, here is the modified code:
import pandas as pd
df = pd.DataFrame({"A": ["one", "two", "three"], "B": [1, 2, 3]})
print(df[["A", "B"]].apply(tuple, axis=1))
# 0 (one, 1)
# 1 (two, 2)
# 2 (three, 3)
# dtype: object