pyfan.panda.stats.cutting

Created on Aug 13, 2018

@author: fan

import panda.cutting as pd_cut

Module Contents

Functions

pd_winsorize_columnwise(df, winsor_coln_list, coln_perc_cutoffs, return_type, print_array=False, json_debug=False)

Winsorizing column by column, no dependence across columns.

sample_run()

pyfan.panda.stats.cutting.logger
pyfan.panda.stats.cutting.pd_winsorize_columnwise(df, winsor_coln_list, coln_perc_cutoffs, return_type, print_array=False, json_debug=False)

Winsorizing column by column, no dependence across columns.

import numpy as np
import pandas as pd

cols = 5
rows = 20
np.random.seed(123)
data = (np.random.rand(rows, cols) - 0.5) * 100
df = pd.DataFrame(data, columns=['col' + str(ctr) for ctr in range(cols)])
winsor_coln_list = ['col0', 'col1', 'col3', 'col4']

Parameters
df: dataFrame

initial dataset

winsor_coln_list: list

list of column names to winsorize, e.g. ['col0', 'col1', 'col3', 'col4']

coln_perc_cutoffs: dictionary

a nested dictionary whose keys are elements of winsor_coln_list and whose values are dictionaries giving the min and max percentiles used for winsorizing. If min is 0, do not create cutcolss.

{'col0': {'q_ge': 0, 'q_le': 0.9, 'v_ll': 10},
 'col1': {'q_ge': 0.30, 'v_le': 50},
 'col3': {'q_ge': 0.01, 'q_le': 0.60, 'v_ll': 40},
 'col4': {'q_ge': 0.01, 'q_le': 1, 'v_ll': 33, 'v_gg': -5}}

return_type: string

'winsorize' or 'cutsubset'
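
The column-by-column idea described above can be illustrated with plain pandas. This is a minimal sketch, not pyfan's implementation: the helper name winsorize_columnwise_sketch is hypothetical, and it assumes that 'q_ge' and 'q_le' are the lower and upper quantile cutoffs, that 'winsorize' clips values to those cutoffs, and that 'cutsubset' drops rows outside them. The 'v_ll', 'v_le' and 'v_gg' keys are ignored here because their meaning is not stated in this page.

```python
import numpy as np
import pandas as pd


def winsorize_columnwise_sketch(df, cutoffs, return_type="winsorize"):
    """Hypothetical helper, NOT the pyfan function.

    Assumes 'q_ge' / 'q_le' are lower / upper quantiles per column.
    Each column is handled independently of the others.
    """
    if return_type not in ("winsorize", "cutsubset"):
        raise ValueError("return_type must be 'winsorize' or 'cutsubset'")

    out = df.copy()
    keep = pd.Series(True, index=df.index)

    for coln, spec in cutoffs.items():
        # Quantile cutoffs are computed from the original column only.
        lo = df[coln].quantile(spec["q_ge"]) if "q_ge" in spec else None
        hi = df[coln].quantile(spec["q_le"]) if "q_le" in spec else None

        if return_type == "winsorize":
            # Replace values beyond the cutoffs with the cutoff values.
            out[coln] = df[coln].clip(lower=lo, upper=hi)
        else:
            # Keep only rows that fall inside the cutoffs for this column.
            if lo is not None:
                keep &= df[coln] >= lo
            if hi is not None:
                keep &= df[coln] <= hi

    return out if return_type == "winsorize" else out[keep]


# Sample data built the same way as in the docstring above.
cols, rows = 5, 20
np.random.seed(123)
data = (np.random.rand(rows, cols) - 0.5) * 100
df = pd.DataFrame(data, columns=["col" + str(ctr) for ctr in range(cols)])

# Illustrative cutoffs (assumed values, quantile keys only).
cutoffs = {"col0": {"q_ge": 0.05, "q_le": 0.90}, "col1": {"q_ge": 0.30}}

df_wins = winsorize_columnwise_sketch(df, cutoffs, "winsorize")
df_cut = winsorize_columnwise_sketch(df, cutoffs, "cutsubset")
```

With 'winsorize' the result keeps every row and only changes the listed columns. With 'cutsubset' the result can have fewer rows, since a row is dropped as soon as any listed column falls outside its cutoffs.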

pyfan.panda.stats.cutting.sample_run()