Package 'CollapseLevels' reference manual

Title:	Collapses Levels, Computes Information Value and WoE
Description:	Contains functions to help in selecting and exploring features (or variables) in binary classification problems. Provides functions to compute and display information value and weight of evidence (WoE) of the variables, and to convert numeric variables to categorical variables by binning. Functions are also provided to determine which levels (or categories) of a categorical variable can be collapsed (or combined) based on their response rates. The functions provided only work for binary classification problems.
Authors:	Krishanu Mukherjee
Maintainer:	Krishanu Mukherjee <[email protected]>
License:	GPL-2
Version:	0.3.0
Built:	2025-03-13 03:46:11 UTC
Source:	https://github.com/cran/CollapseLevels

displayIV

Description

This function displays the Information Values of the levels of an attribute.

Usage

displayIV(dset, col = "xyz", resp = "y", adjFactor = 0.5, bins = 10)

Arguments

`dset`	The data frame containing the data set
`col`	A character string representing the name of the attribute. The attribute can be either numeric or categorical.
`resp`	A character string representing the name of the binary outcome variable. The binary outcome variable may be a factor with two levels or an integer (or numeric) with two unique values.
`adjFactor`	A number (integer or decimal) added to the number of responses (binary outcome variable is 1) or to the number of non-responses (binary outcome variable is 0) if either is zero for any level of the attribute.
`bins`	The number of bins. The default value is 10.

Examples


# Load the German_Credit data set supplied with this package


data("German_Credit")

displayIV(German_Credit,col="Credit_History",resp="Good_Bad")
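
The per-level Information Values can be reproduced by hand on a small example. The following is a minimal base R sketch, not the package's internal code. It assumes the common definitions WoE = ln(share of responses / share of non-responses) and IV = (share of responses - share of non-responses) * WoE. The helper name `iv_by_level` and the toy data are hypothetical. The package may use the opposite sign convention for WoE; the IV is the same under either convention.

```r
# Hypothetical helper (not part of CollapseLevels): per-level WoE and IV
# for a binary response coded as 0/1.
iv_by_level <- function(x, y) {
  tab <- table(x, y)                     # rows: levels, columns: "0" and "1"
  resp    <- tab[, "1"]                  # responses per level
  nonresp <- tab[, "0"]                  # non-responses per level
  p_resp    <- resp / sum(resp)          # share of all responses in each level
  p_nonresp <- nonresp / sum(nonresp)    # share of all non-responses in each level
  woe <- log(p_resp / p_nonresp)         # assumed sign convention
  data.frame(level = rownames(tab),
             woe   = as.numeric(woe),
             iv    = as.numeric((p_resp - p_nonresp) * woe))
}

# Toy data: level "a" has 3 responses out of 4, level "b" has 1 out of 4
x <- factor(c("a", "a", "a", "a", "b", "b", "b", "b"))
y <- c(1, 1, 1, 0, 1, 0, 0, 0)

res <- iv_by_level(x, y)
res
sum(res$iv)   # total IV of the attribute under the assumed definition
```

This sketch does not handle levels with zero responses or zero non-responses, which is the case `adjFactor` addresses.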

displayResponseRatebyLevels

Description

This function displays the response percentages of the levels of an attribute.

Usage

displayResponseRatebyLevels(
  dset,
  col = "job",
  resp = "Good_Bad",
  bins = 10,
  adjFactor = 0.5
)

Arguments

`dset`	The data frame containing the data set
`col`	A character string representing the name of the attribute. The attribute can be either numeric or categorical.
`resp`	A character string representing the name of the binary outcome variable. The binary outcome variable may be a factor with two levels or an integer (or numeric) with two unique values.
`bins`	The number of bins. The default value is 10.
`adjFactor`	A number (integer or decimal) added to the number of responses (binary outcome variable is 1) or to the number of non-responses (binary outcome variable is 0) if either is zero for any level of the attribute.

Examples


# Load the German_Credit data set supplied with this package

data("German_Credit")

displayResponseRatebyLevels(German_Credit,col="Credit_History",resp="Good_Bad")
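
For orientation, a response percentage per level is simply the share of observations in that level whose outcome is 1. The sketch below computes it in base R on toy data. The helper name `response_rate_by_level` and its output columns are hypothetical, and the table printed by the package may be laid out differently.

```r
# Hypothetical helper (not part of CollapseLevels): response percentage
# per level for a binary response coded as 0/1.
response_rate_by_level <- function(x, y) {
  n <- tapply(y, x, length)   # observations per level
  r <- tapply(y, x, sum)      # responses (y == 1) per level
  data.frame(level        = names(n),
             n            = as.integer(n),
             responses    = as.integer(r),
             response_pct = 100 * as.numeric(r) / as.numeric(n))
}

# Toy data with three levels
x <- factor(c("a", "a", "a", "a", "b", "b", "c", "c", "c", "c"))
y <- c(1, 0, 0, 0, 1, 1, 0, 1, 0, 0)

rr <- response_rate_by_level(x, y)
rr
```

In this toy data, levels "a" and "c" have the same response percentage, which is the kind of similarity that suggests two levels could be combined.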

displayWOE

Description

This function displays the Weight of Evidence of the levels of an attribute.

Usage

displayWOE(dset, col = "xyz", resp = "y", adjFactor = 0.5, bins = 10)

Arguments

`dset`	The data frame containing the data set
`col`	A character string representing the name of the attribute. The attribute can be either numeric or categorical.
`resp`	A character string representing the name of the binary outcome variable. The binary outcome variable may be a factor with two levels or an integer (or numeric) with two unique values.
`adjFactor`	A number (integer or decimal) added to the number of responses (binary outcome variable is 1) or to the number of non-responses (binary outcome variable is 0) if either is zero for any level of the attribute.
`bins`	The number of bins. The default value is 10.

Examples


# Load the German_Credit data set supplied with this package

data("German_Credit")

displayWOE(German_Credit,col="Credit_History",resp="Good_Bad")
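
The role of `adjFactor` is easiest to see on a level with no responses or no non-responses, where the unadjusted Weight of Evidence is infinite. The sketch below is an illustration in base R, not the package's implementation. The helper name `woe_adjusted` is hypothetical, the sign convention ln(share of responses / share of non-responses) is an assumption, and so is the choice to recompute the totals after the adjustment.

```r
# Hypothetical helper (not part of CollapseLevels): WoE per level from
# response / non-response counts, adding adjFactor only to zero counts.
woe_adjusted <- function(resp, nonresp, adjFactor = 0.5) {
  resp    <- ifelse(resp == 0, resp + adjFactor, resp)
  nonresp <- ifelse(nonresp == 0, nonresp + adjFactor, nonresp)
  # Assumption: shares are computed from the adjusted counts
  log((resp / sum(resp)) / (nonresp / sum(nonresp)))
}

# Toy counts: level B has no responses, level C has no non-responses
resp    <- c(A = 10, B = 0, C = 5)
nonresp <- c(A = 20, B = 8, C = 0)

# Without adjustment the WoE of B and C is not finite
woe_raw <- log((resp / sum(resp)) / (nonresp / sum(nonresp)))
woe_raw

# With the adjustment every level gets a finite WoE
woe_adj <- woe_adjusted(resp, nonresp, adjFactor = 0.5)
woe_adj
```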

German Credit data set

Description

This data set classifies customers as "Good" or "Bad" according to their credit risk. It was contributed by Professor Dr. Hans Hofmann and can be downloaded from the UCI Machine Learning Repository.

Usage

data("German_Credit")

Format

A data frame with 1000 observations on the following 21 variables.

Account_Balance: a factor with levels A11 A12 A13 A14
Duration: a numeric vector
Credit_History: a factor with levels A30 A31 A32 A33 A34
Purpose: a factor with levels A40 A41 A410 A42 A43 A44 A45 A46 A48 A49
Credit_Amount: a numeric vector
Saving_Accounts_Bonds: a factor with levels A61 A62 A63 A64 A65
Current_Employment_Length: a factor with levels A71 A72 A73 A74 A75
Installment_Rate: a numeric vector
MaritalStatusnGender: a factor with levels A91 A92 A93 A94
Guarantors: a factor with levels A101 A102 A103
`Duration in Current Address`: a numeric vector
Valuable_Asset: a factor with levels A121 A122 A123 A124
Age: a numeric vector
Other_Credit: a factor with levels A141 A142 A143
Housing: a factor with levels A151 A152 A153
Existing_Credits: a numeric vector
Job: a factor with levels A171 A172 A173 A174
Dependents: a numeric vector
Telephone: a factor with levels A191 A192
ForeignWorker: a factor with levels A201 A202
Good_Bad: a numeric vector

Source

https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)

Examples

data(German_Credit)
str(German_Credit)

IVCalc

Description

This function displays the Information Values by the levels of an attribute. This information is displayed for all attributes in the data set.

Usage

IVCalc(dset, resp = "y", bins = 10, adjFactor = 0.5)

Arguments

`dset`	The data frame containing the data set
`resp`	A character string representing the name of the binary outcome variable. The binary outcome variable may be a factor with two levels or an integer (or numeric) with two unique values.
`bins`	The number of bins. The default value is 10.
`adjFactor`	A number (integer or decimal) added to the number of responses (binary outcome variable is 1) or to the number of non-responses (binary outcome variable is 0) if either is zero for any level of the attribute.

Value

A list containing the tables of Information Values by levels for every attribute

Examples


# Load the German_Credit data set supplied with this package

data("German_Credit")

l<-list()

# Call the function as follows

l<-IVCalc(German_Credit,resp="Good_Bad",bins=10)

# Information Value for  the attribute Account_Balance in the German_Credit data

l$Account_Balance


IVCalc2

Description

This function displays the Information Values of all the attributes in the data set.

Usage

IVCalc2(dset, resp = "y", bins = 10, adjFactor = 0.5)

Arguments

`dset`	The data frame containing the data set
`resp`	A character string representing the name of the binary outcome variable. The binary outcome variable may be a factor with two levels or an integer (or numeric) with two unique values.
`bins`	The number of bins. The default value is 10.
`adjFactor`	A number (integer or decimal) added to the number of responses (binary outcome variable is 1) or to the number of non-responses (binary outcome variable is 0) if either is zero for any level of the attribute.

Value

A data frame containing the Information Values for every attribute

Examples


# Load the German_Credit data set supplied with this package

data("German_Credit")

d<-data.frame()

# Call the function as follows

d<-IVCalc2(German_Credit,resp="Good_Bad",bins=10)

# Information Value for all the attributes in the German_Credit data

d


levelsCollapser

Description

This function displays the response rates by the levels of an attribute. Levels with similar response rates may be combined.

Usage

levelsCollapser(dset, resp = "y", bins = 10)

Arguments

`dset`	The data frame containing the data set
`resp`	A character string representing the name of the binary outcome variable. The binary outcome variable may be a factor with two levels or an integer (or numeric) with two unique values.
`bins`	The number of bins. The default value is 10.

Value

A list containing the tables of response rate by levels for every attribute

Examples


# Load the German_Credit data set supplied with this package

data("German_Credit")

# Create an empty list

l<-list()

# Call the function as follows

l<-levelsCollapser(German_Credit,resp="Good_Bad",bins=10)

# response rate by levels of the Account_Balance in the German_Credit data

l$Account_Balance

# Collapse levels with similar response percentages.
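
Once two levels have been judged similar, they can be merged with base R alone by giving them the same level name. The sketch below uses a toy factor. The choice of which levels to merge and the merged name `A30_A31` are purely illustrative and are not a statement about the response rates in the German_Credit data.

```r
# Toy factor with five levels
f <- factor(c("A30", "A31", "A32", "A33", "A34", "A30", "A31"))

# Assigning one name to several levels merges them into a single level
collapsed <- f
levels(collapsed)[levels(collapsed) %in% c("A30", "A31")] <- "A30_A31"

levels(collapsed)
table(collapsed)
```

Working on a copy (`collapsed`) keeps the original factor available for comparison.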

numericToCategorical

Description

This function categorizes a numerical variable by binning.

Usage

numericToCategorical(dset, col = "job", resp = "y", bins = 10, adjFactor = 0.5)

Arguments

`dset`	The data frame containing the data set
`col`	A character string representing the name of the numeric attribute to be categorized.
`resp`	A character string representing the name of the binary outcome variable. The binary outcome variable may be a factor with two levels or an integer (or numeric) with two unique values.
`bins`	The number of bins. The default value is 10.
`adjFactor`	A number (integer or decimal) added to the number of responses (binary outcome variable is 1) or to the number of non-responses (binary outcome variable is 0) if either is zero for any level of the attribute.

Value

A list containing the categorized attribute (`categoricalVariable`), a table of Information Values for the levels of the categorized attribute (`IVTable`), the Information Value for the entire attribute (`IV`), and a table showing the response rates of the levels of the categorized attribute (`collapseLevels`).

Examples


# Load the German_Credit data set supplied with this package

data("German_Credit")

# Create an empty list

l<-list()

# Call the function as follows.
# This will categorize the numeric variable Duration in the German_Credit dataset.

l<-numericToCategorical(German_Credit,col="Duration",resp="Good_Bad")


# To view the categorized variable

l$categoricalVariable

# To view the IV table of the levels of the categorized variable

l$IVTable

# To view the total IV value of the categorized variable

l$IV

# To view the response rates of the levels of the categorized variable

l$collapseLevels
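
As a point of comparison, a numeric vector can be binned in base R with `quantile()` and `cut()`. The sketch below uses equal-frequency (quantile) bins, which is an assumption: this manual does not state which binning rule `numericToCategorical` applies internally. The helper name `bin_numeric` is hypothetical.

```r
# Hypothetical helper (not part of CollapseLevels): equal-frequency binning.
bin_numeric <- function(x, bins = 10) {
  # Quantile break points; unique() guards against ties producing
  # duplicate breaks, which cut() would reject
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = bins + 1),
                            na.rm = TRUE))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

# Toy data: the integers 1 to 100 split into 4 bins
x <- 1:100
b <- bin_numeric(x, bins = 4)

nlevels(b)
table(b)
```

When the data contain many tied values, fewer than `bins` levels may result, because duplicate break points are dropped.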
