Title: pd.DataFrame: how can consecutive rows that satisfy a condition be compared, returning the larger and smaller values? ...
reedwu
Resolved  Question points: 20  Replies: 2
pd.DataFrame: how can consecutive rows that satisfy a condition be compared, returning the larger and smaller values?
import pandas as pd

## the time column doesn't matter in your problem
df = pd.DataFrame({
    'time':['2021-3-19','2021-3-20','2021-3-21','2021-3-22',
    '2021-3-23','2021-3-24','2021-3-25','2021-3-26','2021-3-27'],
    'value':[10,11,9,5,4,2,4,9,5],
    'status':['X']*3+['Y']+['X']+['Y']*2+['X']*2
})

df_new = pd.DataFrame(columns=df.columns)

## perform a groupby on consecutive values
for _, g in df.groupby([(df.status != df.status.shift()).cumsum()]):
    g = g.sort_values(by='value')
    ## keep the highest value for X
    if g.status.values[0] == 'X':
        g = g.drop_duplicates(subset=['status'], keep='last')

    ## keep the lowest value for Y
    elif g.status.values[0] == 'Y':
        g = g.drop_duplicates(subset=['status'], keep='first')

    else:
        pass
    df_new = pd.concat([df_new, g])
df_new = df_new.reset_index(drop=True)
print(df_new)
--------
Program output:
        time value status
0  2021-3-20    11      X
1  2021-3-22     5      Y
2  2021-3-23     4      X
3  2021-3-24     2      Y
4  2021-3-26     9      X
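For readers unfamiliar with the grouping key used in the loop above, here is a minimal sketch of what `(df.status != df.status.shift()).cumsum()` produces. The names `starts_new_run` and `run_id` are illustrative only and do not appear in the original code.

```python
import pandas as pd

# Same status column as in the question
status = pd.Series(['X'] * 3 + ['Y'] + ['X'] + ['Y'] * 2 + ['X'] * 2)

# True wherever a row's status differs from the previous row's status.
# Row 0 is compared against a missing value, so it is always True.
starts_new_run = status != status.shift()

# The cumulative sum of those flags gives one integer label per consecutive run
run_id = starts_new_run.cumsum()

print(pd.DataFrame({'status': status, 'run_id': run_id}))
```

Rows that share a `run_id` form one uninterrupted run of the same status, which is why grouping on it treats the first three X rows separately from the later X rows.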
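How can I get the following result in the original table? Where status == X, take the larger value; where status == Y, take the smaller value.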
        time  value status  value_1
0  2021-3-19     10      X       10
1  2021-3-20     11      X       11
2  2021-3-21      9      X       11
3  2021-3-22      5      Y        5
4  2021-3-23      4      X        4
5  2021-3-24      2      Y        2
6  2021-3-25      4      Y        2
7  2021-3-26      9      X        9
8  2021-3-27      5      X        9
2021-03-28 16:48
zyb159357
Score: 20
import pandas as pd

## the time column doesn't matter in your problem
df = pd.DataFrame({
    'time':['2021-3-19','2021-3-20','2021-3-21','2021-3-22',
    '2021-3-23','2021-3-24','2021-3-25','2021-3-26','2021-3-27'],
    'value':[10,11,9,5,4,2,4,9,5],
    'status':['X']*3+['Y']+['X']+['Y']*2+['X']*2
})

df_new = pd.DataFrame(columns=df.columns)
## perform a groupby on consecutive values
for _, g in df.groupby([(df.status != df.status.shift()).cumsum()]):
    g = g.sort_values(by='value')
    ## keep the highest value for X
    if g.status.values[0] == 'X':
        g = g.drop_duplicates(subset=['status'], keep='last')

    ## keep the lowest value for Y
    elif g.status.values[0] == 'Y':
        g = g.drop_duplicates(subset=['status'], keep='first')

    else:
        pass
    df_new = pd.concat([df_new, g])
df_new = df_new.reset_index(drop=True)

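def inp(x):
    # Return the value only for rows whose time appears in df_new;
    # all other rows implicitly return None, which becomes NaN.
    if x.time in df_new.time.values:
        return x.value


df["value_1"] = df.apply(inp, axis=1)
# Using df_new.time as the reference, assign the matching rows of df.value_1.

df.loc[:0, 'value_1'] = df.head(1).value
# The first row always has df.value_1 == df.value.

df["value_1"] = df.value_1.ffill()
# Fill each NaN with the value_1 of the previous row.
# (.ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas.)

print(df)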
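A loop-free variant, offered as a sketch rather than a definitive answer. It assumes that value_1 should be the running maximum within each consecutive run of X and the running minimum within each consecutive run of Y, which is how the expected table in the question reads. The names `run_id`, `running_max` and `running_min` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'time': ['2021-3-19', '2021-3-20', '2021-3-21', '2021-3-22',
             '2021-3-23', '2021-3-24', '2021-3-25', '2021-3-26', '2021-3-27'],
    'value': [10, 11, 9, 5, 4, 2, 4, 9, 5],
    'status': ['X'] * 3 + ['Y'] + ['X'] + ['Y'] * 2 + ['X'] * 2,
})

# One integer label per run of consecutive identical status values
run_id = (df['status'] != df['status'].shift()).cumsum()

# Running maximum and running minimum of value, restarted at every run
grouped = df.groupby(run_id)['value']
running_max = grouped.cummax()
running_min = grouped.cummin()

# X rows take the running maximum, Y rows the running minimum;
# rows with any other status (none in this data) keep their own value.
df['value_1'] = (
    df['value']
    .mask(df['status'] == 'X', running_max)
    .mask(df['status'] == 'Y', running_min)
)

print(df)
```

On the sample data this gives 10, 11, 11, 5, 4, 2, 2, 9, 9 for value_1, the same as the expected table. On other data it may differ from the apply-and-forward-fill code above, because that code only marks the row holding each run's extreme value and fills the gaps from the previous row.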
2021-03-29 01:22
reedwu
Score: 0
Reply to post #2, zyb159357
Thanks
2021-03-29 11:25



To join the discussion, go to the original thread: https://bbs.bccn.net/thread-505276-1-1.html



