In many use cases, like understanding spending behavior based on transaction history, such continuous variables are better understood when they are discretized into meaningful groups → youngsters, adults, and seniors.
This would result in one coefficient per feature, which would tell us the influence of each feature on the final prediction.
But if you think again, given our goal of understanding spending behavior, are we really interested in learning the correlation between exact age and spending behavior?
Instead, it makes more sense to learn the correlation between different age groups and spending behavior.
Now that we understand the rationale, there are two techniques that are widely preferred.
One way of discretizing a feature involves decomposing it into equal-width bins.
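For instance, here is a minimal sketch of equal-width binning with pandas (the age values and the bin count are made up for illustration):

```python
import pandas as pd

# Hypothetical age values; the bin count (4) is arbitrary, for illustration only.
ages = pd.Series([5, 17, 23, 31, 45, 52, 67, 80])

# Equal-width binning: the overall range is split into bins of identical width.
equal_width = pd.cut(ages, bins=4)
print(equal_width.value_counts().sort_index())
```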
Another technique involves decomposing a feature into equal-frequency bins.
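Here is the same sketch with equal-frequency binning, again assuming made-up age values:

```python
import pandas as pd

# Same hypothetical age values as before.
ages = pd.Series([5, 17, 23, 31, 45, 52, 67, 80])

# Equal-frequency binning: each bin holds (roughly) the same number of
# observations, so bin widths adapt to where the data is dense.
equal_freq = pd.qcut(ages, q=4)
print(equal_freq.value_counts().sort_index())
```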
After that, the discrete values are one-hot encoded.
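With scikit-learn, binning and one-hot encoding can be done in a single step via KBinsDiscretizer; the bin count and strategy below are illustrative choices, not a recommendation:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical age values as a single-column feature matrix.
ages = np.array([[5], [17], [23], [31], [45], [52], [67], [80]])

# strategy="uniform" gives equal-width bins; "quantile" gives equal-frequency bins.
binner = KBinsDiscretizer(n_bins=4, encode="onehot-dense", strategy="uniform")
age_onehot = binner.fit_transform(ages)
print(age_onehot)  # one column per bin, exactly one 1 per row
```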
One advantage of feature discretization is that it enables non-linear behavior even though the model is linear.
This can lead to better accuracy, which is also evident from the image below:
A linear model with feature discretization results in a:
• non-linear decision boundary.
• better test accuracy.
So, in a way, we get to use a simple linear model while still learning non-linear patterns.
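As a rough sketch of this effect, we can compare a logistic regression on raw features against the same model on binned, one-hot encoded features using a synthetic non-linear dataset (the dataset, bin count, and split below are arbitrary choices, not the setup from the image above):

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic data with a non-linear class boundary.
X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Plain linear model vs. the same model on discretized (binned + one-hot) features.
linear = LogisticRegression().fit(X_train, y_train)
binned = make_pipeline(
    KBinsDiscretizer(n_bins=10, encode="onehot-dense", strategy="uniform"),
    LogisticRegression(),
).fit(X_train, y_train)

print("raw features:   ", linear.score(X_test, y_test))
print("binned features:", binned.score(X_test, y_test))
```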
Simply put, “signal” refers to the meaningful or valuable information in the data.
Binning a feature helps us mitigate the influence of minor fluctuations, which are often mere noise.
Before I conclude, do remember that feature discretization with one-hot encoding increases the number of features → thereby increasing the data dimensionality.
And typically, as we progress towards higher dimensions, data become more easily linearly separable. Thus, feature discretization can lead to overfitting.
To avoid this, don’t overly discretize all features.
Instead, use it when it makes intuitive sense, as we saw earlier.
Of course, its utility can vastly vary from one application to another, but at times, I have found that:
• Discretizing geospatial data like latitude and longitude can be useful (see the sketch after this list).
• Discretizing age/weight-related data can be useful.
• Discretizing features that are typically constrained within a range, like savings/income (practically speaking), can also make sense.
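For example, here is a minimal sketch of discretizing latitude/longitude into a coarse grid; the coordinates, column names, and grid size are hypothetical:

```python
import pandas as pd

# Hypothetical transaction records with location coordinates.
df = pd.DataFrame({
    "latitude":  [40.71, 40.73, 40.75, 40.77, 40.80],
    "longitude": [-74.00, -73.98, -73.96, -73.99, -73.95],
})

# Bin each coordinate into a small grid; a (lat_bin, lon_bin) pair then acts
# as a coarse "zone" feature instead of the raw coordinates.
df["lat_bin"] = pd.cut(df["latitude"], bins=3, labels=False)
df["lon_bin"] = pd.cut(df["longitude"], bins=3, labels=False)
df["zone"] = df["lat_bin"].astype(str) + "_" + df["lon_bin"].astype(str)
print(df)
```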