pyfan.panda.stats.cutting.pd_winsorize_columnwise
pyfan.panda.stats.cutting.pd_winsorize_columnwise(df, winsor_coln_list, coln_perc_cutoffs, return_type, print_array=False, json_debug=False)
Winsorize a DataFrame column by column, with no dependence across columns.
import numpy as np
import pandas as pd
cols = 5
rows = 20
np.random.seed(123)
data = (np.random.rand(rows, cols) - 0.5) * 100
df = pd.DataFrame(data, columns=['col' + str(ctr) for ctr in range(cols)])
winsor_coln_list = ['col0', 'col1', 'col3', 'col4']
- Parameters
- df: DataFrame
initial dataset
- winsor_coln_list: list
list of column names to winsorize, e.g. ['col0', 'col1', 'col3', 'col4']
- coln_perc_cutoffs: dictionary
A nested dictionary whose keys are elements of winsor_coln_list and whose values are dictionaries holding the min and max percentile cutoffs used for winsorizing. If min is 0, do not create cutcols. Example:
{'col0': {'q_ge': 0, 'q_le': 0.9, 'v_ll': 10},
 'col1': {'q_ge': 0.30, 'v_le': 50},
 'col3': {'q_ge': 0.01, 'q_le': 0.60, 'v_ll': 40},
 'col4': {'q_ge': 0.01, 'q_le': 1, 'v_ll': 33, 'v_gg': -5}}
- return_type: string
'winsorize' or 'cutsubset'
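The general idea of column-by-column winsorizing can be sketched with plain pandas. This is an illustration only, not pyfan's implementation: the helper `winsorize_columns_sketch` and its `quantile_bounds` argument are hypothetical, and it models only a lower and upper quantile per column rather than the `q_ge`, `q_le`, `v_ll`, `v_le` and `v_gg` keys shown above. It also assumes that 'winsorize' clips values to the cutoffs while 'cutsubset' drops rows that fall outside them, which this page does not spell out.

```python
import numpy as np
import pandas as pd


def winsorize_columns_sketch(df, quantile_bounds, return_type="winsorize"):
    """Hypothetical helper, not part of pyfan.

    quantile_bounds maps a column name to (lower_quantile, upper_quantile).
    """
    out = df.copy()
    keep = pd.Series(True, index=df.index)
    for col, (q_lo, q_hi) in quantile_bounds.items():
        # Cutoffs come from this column alone, so columns do not interact.
        lo = df[col].quantile(q_lo)
        hi = df[col].quantile(q_hi)
        if return_type == "winsorize":
            # Assumed meaning: pull out-of-range values back to the cutoffs.
            out[col] = df[col].clip(lower=lo, upper=hi)
        elif return_type == "cutsubset":
            # Assumed meaning: keep only rows inside every column's range.
            keep &= df[col].between(lo, hi)
        else:
            raise ValueError("return_type must be 'winsorize' or 'cutsubset'")
    return out if return_type == "winsorize" else out[keep]


# Same random test data as in the example at the top of this page.
cols, rows = 5, 20
np.random.seed(123)
data = (np.random.rand(rows, cols) - 0.5) * 100
df = pd.DataFrame(data, columns=["col" + str(ctr) for ctr in range(cols)])

bounds = {"col0": (0.0, 0.9), "col3": (0.01, 0.60)}
df_wins = winsorize_columns_sketch(df, bounds, "winsorize")
df_cut = winsorize_columns_sketch(df, bounds, "cutsubset")
```

Under these assumptions, `df_wins` has the same shape as `df` with only `col0` and `col3` altered, while `df_cut` keeps every column but may have fewer rows.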