Title: | Collapses Levels, Computes Information Value and WoE |
---|---|
Description: | Contains functions to help in selecting and exploring features (or variables) in binary classification problems. Provides functions to compute and display the information value and weight of evidence (WoE) of the variables, and to convert numeric variables to categorical variables by binning. Functions are also provided to determine which levels (or categories) of a categorical variable can be collapsed (or combined) based on their response rates. The functions provided only work for binary classification problems. |
Authors: | Krishanu Mukherjee |
Maintainer: | Krishanu Mukherjee |
License: | GPL-2 |
Version: | 0.3.0 |
Built: | 2025-02-11 03:41:39 UTC |
Source: | https://github.com/cran/CollapseLevels |
This function displays the Information Values of the levels of an attribute.
displayIV(dset, col = "xyz", resp = "y", adjFactor = 0.5, bins = 10)
dset | The data frame containing the data set. |
col | A character string giving the name of the attribute. The attribute can be either numeric or categorical. |
resp | A character string giving the name of the binary outcome variable. The outcome variable may be a factor with two levels or an integer (or numeric) variable with two unique values. |
adjFactor | A number (whole or decimal) to be added to the number of responses (outcome is 1) or to the number of non-responses (outcome is 0) when either of these is zero for a level of the attribute. Default value is 0.5. |
bins | The number of bins. Default value is 10. |
# Load the German_Credit data set supplied with this package
data("German_Credit")
displayIV(German_Credit, col = "Credit_History", resp = "Good_Bad")
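The quantities behind displayIV can be sketched in base R as follows. This is not the package's code: it assumes the common definitions WoE = ln(p0 / p1) and IV = sum over levels of (p0 - p1) * WoE, where p0 and p1 are a level's shares of all non-responses and all responses, assumes the outcome is coded 0/1, and assumes adjFactor replaces only zero counts. The helper ivByLevel and the toy data are hypothetical.

```r
# Sketch only (not the CollapseLevels implementation).
# Assumed definitions, for level i of attribute x and 0/1 outcome y:
#   p0_i = share of all non-responses (y == 0) falling in level i
#   p1_i = share of all responses     (y == 1) falling in level i
#   WoE_i = log(p0_i / p1_i),  IV_i = (p0_i - p1_i) * WoE_i
ivByLevel <- function(x, y, adjFactor = 0.5) {
  x <- droplevels(as.factor(x))
  n1 <- tapply(y == 1, x, sum)   # responses per level
  n0 <- tapply(y == 0, x, sum)   # non-responses per level
  # Assumption: adjFactor replaces only zero counts, keeping the log finite
  n1[n1 == 0] <- adjFactor
  n0[n0 == 0] <- adjFactor
  p1 <- n1 / sum(n1)
  p0 <- n0 / sum(n0)
  woe <- log(p0 / p1)
  data.frame(level = names(woe),
             woe = as.numeric(woe),
             iv = as.numeric((p0 - p1) * woe))
}

# Toy data (made up)
x <- c("a", "a", "a", "b", "b", "c", "c", "c")
y <- c(1, 0, 0, 1, 1, 0, 0, 0)
tab <- ivByLevel(x, y)
print(tab)
print(sum(tab$iv))   # Information Value of the whole attribute
```

Because (p0 - p1) and ln(p0 / p1) always share a sign, every per-level IV term is non-negative, and flipping the WoE convention to ln(p1 / p0) leaves the IV unchanged.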
This function displays the response percents of the levels of an attribute.
displayResponseRatebyLevels(dset, col = "job", resp = "Good_Bad", bins = 10, adjFactor = 0.5)
dset | The data frame containing the data set. |
col | A character string giving the name of the attribute. The attribute can be either numeric or categorical. |
resp | A character string giving the name of the binary outcome variable. The outcome variable may be a factor with two levels or an integer (or numeric) variable with two unique values. |
bins | The number of bins. Default value is 10. |
adjFactor | A number (whole or decimal) to be added to the number of responses (outcome is 1) or to the number of non-responses (outcome is 0) when either of these is zero for a level of the attribute. Default value is 0.5. |
# Load the German_Credit data set supplied with this package
data("German_Credit")
displayResponseRatebyLevels(German_Credit, col = "Credit_History", resp = "Good_Bad")
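A response rate here is taken to mean the percentage of observations in a level whose outcome is 1. The following is a minimal base-R sketch, assuming a numeric 0/1 outcome (a two-level factor would need recoding first); responseRateByLevel and the toy data are hypothetical, and the package's own table may contain other columns.

```r
# Sketch only (not the CollapseLevels implementation).
# Assumes y is numeric 0/1 with 1 = response.
responseRateByLevel <- function(x, y) {
  x <- droplevels(as.factor(x))
  data.frame(level = levels(x),
             n = as.vector(table(x)),                           # observations per level
             responseRate = as.vector(tapply(y, x, mean)) * 100) # % with y == 1
}

# Toy data (made up)
x <- c("a", "a", "a", "b", "b", "c", "c", "c")
y <- c(1, 0, 0, 1, 1, 0, 0, 0)
rr <- responseRateByLevel(x, y)
print(rr)
```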
This function displays the Weight of Evidence of the levels of an attribute.
displayWOE(dset, col = "xyz", resp = "y", adjFactor = 0.5, bins = 10)
dset | The data frame containing the data set. |
col | A character string giving the name of the attribute. The attribute can be either numeric or categorical. |
resp | A character string giving the name of the binary outcome variable. The outcome variable may be a factor with two levels or an integer (or numeric) variable with two unique values. |
adjFactor | A number (whole or decimal) to be added to the number of responses (outcome is 1) or to the number of non-responses (outcome is 0) when either of these is zero for a level of the attribute. Default value is 0.5. |
bins | The number of bins. Default value is 10. |
# Load the German_Credit data set supplied with this package
data("German_Credit")
displayWOE(German_Credit, col = "Credit_History", resp = "Good_Bad")
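The role of adjFactor is easiest to see in a sketch: without it, a level with zero responses or zero non-responses gets a WoE of plus or minus infinity. This base-R sketch assumes WoE = ln(share of non-responses / share of responses) (some texts use the reciprocal, which only flips the sign), a numeric 0/1 outcome, and that adjFactor replaces only the zero counts. woeByLevel is a hypothetical name, not part of the package's API.

```r
# Sketch only (not the CollapseLevels implementation).
# Assumed convention: WoE = log(share of non-responses / share of responses).
# Assumes y is numeric 0/1 and that adjFactor replaces only zero counts.
woeByLevel <- function(x, y, adjFactor = 0.5) {
  x <- droplevels(as.factor(x))
  n1 <- tapply(y == 1, x, sum)   # responses per level
  n0 <- tapply(y == 0, x, sum)   # non-responses per level
  n1[n1 == 0] <- adjFactor       # without this, level "c" below gives Inf
  n0[n0 == 0] <- adjFactor       # and level "b" gives -Inf
  log((n0 / sum(n0)) / (n1 / sum(n1)))
}

# Toy data (made up): level "b" has no non-responses, level "c" no responses
x <- c("a", "a", "a", "b", "b", "c", "c", "c")
y <- c(1, 0, 0, 1, 1, 0, 0, 0)
w <- woeByLevel(x, y)
print(round(w, 3))
```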
This data set classifies customers as "Good" or "Bad" according to their credit risk. It was contributed by Professor Dr. Hans Hofmann and can be downloaded from the UCI Machine Learning Repository.
data("German_Credit")
A data frame with 1000 observations on the following 21 variables.
Account_Balance | a factor with levels A11, A12, A13, A14 |
Duration | a numeric vector |
Credit_History | a factor with levels A30, A31, A32, A33, A34 |
Purpose | a factor with levels A40, A41, A410, A42, A43, A44, A45, A46, A48, A49 |
Credit_Amount | a numeric vector |
Saving_Accounts_Bonds | a factor with levels A61, A62, A63, A64, A65 |
Current_Employment_Length | a factor with levels A71, A72, A73, A74, A75 |
Installment_Rate | a numeric vector |
MaritalStatusnGender | a factor with levels A91, A92, A93, A94 |
Guarantors | a factor with levels A101, A102, A103 |
 | a numeric vector |
Valuable_Asset | a factor with levels A121, A122, A123, A124 |
Age | a numeric vector |
Other_Credit | a factor with levels A141, A142, A143 |
Housing | a factor with levels A151, A152, A153 |
Existing_Credits | a numeric vector |
Job | a factor with levels A171, A172, A173, A174 |
Dependents | a numeric vector |
Telephone | a factor with levels A191, A192 |
ForeignWorker | a factor with levels A201, A202 |
Good_Bad | a numeric vector |
Source: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)
data(German_Credit)
str(German_Credit)
This function displays the Information Values by level for every attribute in the data set.
IVCalc(dset, resp = "y", bins = 10, adjFactor = 0.5)
dset | The data frame containing the data set. |
resp | A character string giving the name of the binary outcome variable. The outcome variable may be a factor with two levels or an integer (or numeric) variable with two unique values. |
bins | The number of bins. Default value is 10. |
adjFactor | A number (whole or decimal) to be added to the number of responses (outcome is 1) or to the number of non-responses (outcome is 0) when either of these is zero for a level of the attribute. Default value is 0.5. |
Returns a list containing, for every attribute, a table of Information Values by level.
# Load the German_Credit data set supplied with this package
data("German_Credit")
l <- list()
# Call the function as follows
l <- IVCalc(German_Credit, resp = "Good_Bad", bins = 10)
# Information Value for the attribute Account_Balance in the German_Credit data
l$Account_Balance
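An IVCalc-style loop over all attributes can be sketched as follows, producing one IV-by-level table per attribute. The assumptions are loud ones: the outcome is numeric 0/1, numeric attributes are cut into equal-frequency (quantile) bins, adjFactor replaces only zero counts, and the names ivTable and ivAllAttributes and the toy data are hypothetical. The package may bin and tabulate differently.

```r
# Sketch only (not the CollapseLevels implementation) of an IVCalc-style loop.
# Assumptions: resp is numeric 0/1; numeric attributes are cut into
# equal-frequency bins; adjFactor replaces only zero counts.
ivTable <- function(x, y, adjFactor = 0.5) {
  x <- droplevels(as.factor(x))
  n1 <- tapply(y == 1, x, sum)
  n0 <- tapply(y == 0, x, sum)
  n1[n1 == 0] <- adjFactor
  n0[n0 == 0] <- adjFactor
  p1 <- n1 / sum(n1)
  p0 <- n0 / sum(n0)
  woe <- log(p0 / p1)
  data.frame(level = names(woe), iv = as.numeric((p0 - p1) * woe))
}

ivAllAttributes <- function(dset, resp, bins = 10, adjFactor = 0.5) {
  y <- dset[[resp]]
  cols <- setdiff(names(dset), resp)   # every attribute except the outcome
  out <- lapply(cols, function(col) {
    x <- dset[[col]]
    if (is.numeric(x)) {               # bin numeric attributes first
      probs <- seq(0, 1, length.out = bins + 1)
      brks <- unique(quantile(x, probs = probs, na.rm = TRUE))
      x <- cut(x, breaks = brks, include.lowest = TRUE)
    }
    ivTable(x, y, adjFactor)
  })
  names(out) <- cols
  out
}

# Toy data (made up)
df <- data.frame(
  grade  = c("a", "a", "a", "b", "b", "c", "c", "c"),
  amount = c(10, 20, 30, 40, 50, 60, 70, 80),
  y      = c(1, 0, 0, 1, 1, 0, 0, 0)
)
res <- ivAllAttributes(df, resp = "y", bins = 4)
print(res$amount)
```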
This function displays the Information Value of each attribute in the data set.
IVCalc2(dset, resp = "y", bins = 10, adjFactor = 0.5)
dset | The data frame containing the data set. |
resp | A character string giving the name of the binary outcome variable. The outcome variable may be a factor with two levels or an integer (or numeric) variable with two unique values. |
bins | The number of bins. Default value is 10. |
adjFactor | A number (whole or decimal) to be added to the number of responses (outcome is 1) or to the number of non-responses (outcome is 0) when either of these is zero for a level of the attribute. Default value is 0.5. |
Returns a data frame containing the Information Value of every attribute.
# Load the German_Credit data set supplied with this package
data("German_Credit")
d <- data.frame()
# Call the function as follows
d <- IVCalc2(German_Credit, resp = "Good_Bad", bins = 10)
# Information Value for all the attributes in the German_Credit data
d
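A one-row-per-attribute IV summary can be sorted to rank candidate features. The sketch below builds such a table in base R under the same assumptions as a typical IV calculation: numeric 0/1 outcome, equal-frequency binning of numeric attributes, and adjFactor replacing only zero counts. totalIV, ivSummary and the toy data are hypothetical; the column names and row order of the package's own data frame may differ.

```r
# Sketch only (not the CollapseLevels implementation) of an IVCalc2-style
# summary: one row per attribute with its total Information Value.
# Assumptions: resp is numeric 0/1; numeric attributes are cut into
# equal-frequency bins; adjFactor replaces only zero counts.
totalIV <- function(x, y, adjFactor = 0.5) {
  # Rows: levels of x; columns: outcome 0 and outcome 1
  counts <- table(droplevels(as.factor(x)), factor(y, levels = c(0, 1)))
  counts[counts == 0] <- adjFactor
  p0 <- counts[, "0"] / sum(counts[, "0"])
  p1 <- counts[, "1"] / sum(counts[, "1"])
  sum((p0 - p1) * log(p0 / p1))
}

ivSummary <- function(dset, resp, bins = 10, adjFactor = 0.5) {
  cols <- setdiff(names(dset), resp)
  iv <- vapply(cols, function(col) {
    x <- dset[[col]]
    if (is.numeric(x)) {
      probs <- seq(0, 1, length.out = bins + 1)
      x <- cut(x, breaks = unique(quantile(x, probs = probs, na.rm = TRUE)),
               include.lowest = TRUE)
    }
    totalIV(x, dset[[resp]], adjFactor)
  }, numeric(1))
  out <- data.frame(attribute = cols, IV = unname(iv))
  out[order(out$IV, decreasing = TRUE), ]   # strongest attributes first
}

# Toy data (made up)
df <- data.frame(
  grade  = c("a", "a", "a", "b", "b", "c", "c", "c"),
  amount = c(10, 20, 30, 40, 50, 60, 70, 80),
  y      = c(1, 0, 0, 1, 1, 0, 0, 0)
)
s <- ivSummary(df, resp = "y", bins = 4)
print(s)
```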
This function displays the response rates by level for every attribute in the data set. Levels with similar response rates may be combined.
levelsCollapser(dset, resp = "y", bins = 10)
dset | The data frame containing the data set. |
resp | A character string giving the name of the binary outcome variable. The outcome variable may be a factor with two levels or an integer (or numeric) variable with two unique values. |
bins | The number of bins. Default value is 10. |
Returns a list containing, for every attribute, a table of response rates by level.
# Load the German_Credit data set supplied with this package
data("German_Credit")
# Create an empty list
l <- list()
# Call the function as follows
l <- levelsCollapser(German_Credit, resp = "Good_Bad", bins = 10)
# Response rate by levels of Account_Balance in the German_Credit data
l$Account_Balance
# Collapse levels with similar response percentages.
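Once a response-rate table shows two levels with similar rates, they can be merged in base R by giving them the same level label. The sketch below uses made-up data and level names, and it leaves the judgement of which rates count as "similar" to the analyst; that workflow is an assumption about how the table is meant to be used, not something the package does automatically.

```r
# Made-up data: four levels and a 0/1 outcome.
x <- factor(c("a", "a", "a", "b", "b", "b", "c", "c", "d", "d"))
y <- c(1, 1, 0, 1, 1, 0, 0, 0, 0, 1)

# Response rate (%) by level: a and b both come out at about 66.7
rates <- tapply(y, x, mean) * 100
print(round(rates, 1))

# Collapse a and b: assigning the same label to two levels merges them
lv <- levels(x)
lv[lv %in% c("a", "b")] <- "a_b"
levels(x) <- lv
print(table(x))
print(round(tapply(y, x, mean) * 100, 1))
```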
This function converts a numeric variable into a categorical variable by binning.
numericToCategorical(dset, col = "job", resp = "y", bins = 10, adjFactor = 0.5)
dset | The data frame containing the data set. |
col | A character string giving the name of the numeric attribute to be categorized. |
resp | A character string giving the name of the binary outcome variable. The outcome variable may be a factor with two levels or an integer (or numeric) variable with two unique values. |
bins | The number of bins. Default value is 10. |
adjFactor | A number (whole or decimal) to be added to the number of responses (outcome is 1) or to the number of non-responses (outcome is 0) when either of these is zero for a level of the attribute. Default value is 0.5. |
Returns a list containing the categorized attribute (categoricalVariable), a table of Information Values for the levels of the categorized attribute (IVTable), the Information Value of the entire attribute (IV), and a table showing the response rates of the levels of the categorized attribute (collapseLevels).
# Load the German_Credit data set supplied with this package
data("German_Credit")
# Create an empty list
l <- list()
# Call the function as follows.
# This will categorize the numeric variable Duration in the German_Credit dataset.
l <- numericToCategorical(German_Credit, col = "Duration", resp = "Good_Bad")
# To view the categorized variable
l$categoricalVariable
# To view the IV table of the levels of the categorized variable
l$IVTable
# To view the total IV value of the categorized variable
l$IV
# To view the response rates of the levels of the categorized variable
l$collapseLevels
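One plausible way to do the binning step is equal-frequency (quantile) binning with quantile() and cut(). This is an assumption: the package's categoricalVariable may use different cut points or labels. binNumeric and the duration values below are hypothetical.

```r
# Sketch only (not the CollapseLevels implementation): equal-frequency
# binning of a numeric variable with quantile() and cut().
binNumeric <- function(x, bins = 10) {
  probs <- seq(0, 1, length.out = bins + 1)
  # unique() drops repeated cut points caused by tied values,
  # so fewer than `bins` bins may be returned
  brks <- unique(quantile(x, probs = probs, na.rm = TRUE))
  if (length(brks) < 2) return(factor(x))   # constant column: one level
  cut(x, breaks = brks, include.lowest = TRUE)
}

duration <- c(6, 12, 12, 18, 24, 24, 36, 48, 60, 72)   # made-up values
b <- binNumeric(duration, bins = 4)
print(table(b))   # bins need not be exactly equal in size when values tie
```

The resulting factor could then be passed to any per-level IV or response-rate calculation, which is what the IVTable and collapseLevels elements report for the package's own binning.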