First, I grouped my data as follows:
df_group = df_train.groupby(['SaleCondition', 'Neighborhood'])['SalePrice'].sum()

Result:
# SaleCondition  Neighborhood
# Abnorml        BrDale            288900
#                BrkSide           309600
#                ClearCr           505000
#                CollgCr           562900
#                Crawfor           587000
#                Edwards           831900
#                Gilbert           181000
#                IDOTRR            499887
#                MeadowV            92000
#                Mitchel           417686
#                NAmes            3070950
#                NPkVill           140000
#                NWAmes            993000
#                NoRidge          1603000
#                OldTown          1180680
#                SWISU             489434
#                Sawyer            728300
#                SawyerW           739400
#                Somerst           791552
#                StoneBr           187500
#                Timber            599500
# AdjLand        Edwards           416500
# Alloca         Crawfor           559724
#                Edwards           453970
#                IDOTRR             55993
#                Mitchel           206300
#                OldTown            89471
#                Sawyer            108959
#                SawyerW           534112
# Family         BrDale             88000
# ...
# Normal         Gilbert         12121140
#                IDOTRR           3148700
#                MeadowV          1583800
#                Mitchel          6527250
#                NAmes           29211643

Then, we filter only the largest values for each SaleCondition:
df_group.groupby(level=0, group_keys=False).nlargest(5)

Result:
# SaleCondition  Neighborhood
# Abnorml        NAmes            3070950
#                NoRidge          1603000
#                OldTown          1180680
#                NWAmes            993000
#                Edwards           831900
# AdjLand        Edwards           416500
# Alloca         Crawfor           559724
#                SawyerW           534112
#                Edwards           453970
#                Mitchel           206300
#                Sawyer            108959
# Family         NAmes             533000
#                Gilbert           484000
#                OldTown           473000
#                NWAmes            404500
#                Crawfor           393500
# Normal         NAmes           29211643
#                CollgCr         25010162
#                NridgHt         12827100
#                OldTown         12518308
#                NWAmes          12403155
# Partial        NridgHt         11525738
#                Somerst          7920842
#                CollgCr          4121804
#                StoneBr          3337049
#                Gilbert          2449366

This should be useful for building Pareto charts.
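As a rough sketch of how the Pareto step might look, the snippet below repeats the two groupby steps on a tiny made-up table and then computes, for each SaleCondition, the cumulative share of the total that the top neighborhoods account for. The toy values and the names df_toy, totals, top and cum_share are assumptions for illustration only; they are not part of the original dataset or code.

```python
import pandas as pd

# Toy stand-in for df_train; the numbers are invented for illustration.
df_toy = pd.DataFrame({
    "SaleCondition": ["Normal", "Normal", "Normal",
                      "Abnorml", "Abnorml", "Abnorml"],
    "Neighborhood": ["NAmes", "CollgCr", "OldTown",
                     "NAmes", "Edwards", "OldTown"],
    "SalePrice": [300, 150, 50, 50, 30, 20],
})

# Step 1: total SalePrice per (SaleCondition, Neighborhood) pair.
totals = df_toy.groupby(["SaleCondition", "Neighborhood"])["SalePrice"].sum()

# Step 2: keep the N largest totals within each SaleCondition (N=2 here).
top = totals.groupby(level=0, group_keys=False).nlargest(2)

# Total per SaleCondition, repeated on every row, then aligned to the top rows.
group_total = totals.groupby(level=0).transform("sum").reindex(top.index)

# Share of the SaleCondition total (in percent) and its running sum,
# which is the cumulative line usually drawn on a Pareto chart.
share = top / group_total * 100
cum_share = share.groupby(level=0).cumsum()

print(cum_share)
```

The bars of the chart would come from top and the cumulative line from cum_share, one chart per SaleCondition. Dividing by the full SaleCondition total rather than the top-N total is a design choice: it shows how much of the whole the top neighborhoods cover.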