CatBoost Part 1: Ordered Target Encoding
StatQuest with Josh Starmer

Published on Feb 26, 2023

One of the defining features of CatBoost is its concerted effort to avoid data leakage at all costs. In this video, we'll see how it eliminates a potential source of leakage in Target Encoding by ordering the data and encoding the rows sequentially, so that each row is encoded using only the rows that come before it. This ordered approach is central to everything CatBoost does, and we'll see it again in Part 2 when we talk about how it builds trees.
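A minimal Python sketch of ordered target encoding for a binary target. It assumes the rows are already in the order used for encoding (CatBoost shuffles the data first; here the given order is used as-is) and assumes the formula (OptionCount + prior) / (n + 1), where n is the number of earlier rows with the same category, OptionCount is how many of those earlier rows had a target of 1, and prior = 0.05. The formula and prior value are assumptions modeled on the example in the CatBoost documentation, and the function name `ordered_target_encode` is hypothetical, not part of CatBoost's API.

```python
from collections import defaultdict


def ordered_target_encode(categories, targets, prior=0.05):
    """Hypothetical helper: encode each row using only the rows before it.

    Assumed formula: (option_count + prior) / (n + 1), where n is the number
    of earlier rows with the same category and option_count is how many of
    those earlier rows had target == 1.
    """
    seen = defaultdict(int)       # category -> number of earlier rows
    positives = defaultdict(int)  # category -> earlier rows with target == 1
    encoded = []
    for cat, y in zip(categories, targets):
        # Encode first, using counts from earlier rows only.
        encoded.append((positives[cat] + prior) / (seen[cat] + 1))
        # Update counts afterwards, so a row never sees its own target.
        seen[cat] += 1
        positives[cat] += int(y == 1)
    return encoded


if __name__ == "__main__":
    colors = ["Blue", "Red", "Green", "Blue", "Green", "Green", "Blue"]
    loves_troll_2 = [1, 0, 1, 1, 0, 1, 0]
    print(ordered_target_encode(colors, loves_troll_2))
```

Because the counts are updated only after a row has been encoded, a row's own target never leaks into its encoding, and the first row of each category is encoded as exactly the prior.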

NOTE: This StatQuest is based on the original CatBoost manuscript... https://arxiv.org/abs/1706.09516
...and an example provided in the CatBoost documentation...
https://catboost.ai/en/docs/concepts/...

English
This video has been dubbed using an artificial voice via https://aloud.area120.google.com to increase accessibility. You can change the audio track language in the Settings menu.

Spanish
This video has been dubbed into Spanish with an artificial voice via https://aloud.area120.google.com to increase accessibility. You can change the audio track language in the Settings menu.

Portuguese
This video has been dubbed into Portuguese using an artificial voice via https://aloud.area120.google.com to improve its accessibility. You can change the audio language in the Settings menu.


If you'd like to support StatQuest, please consider...
Patreon: / statquest
...or...
YouTube Membership: / @statquest

...buying my book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
https://statquest.org/statquest-store/

...or just donating to StatQuest!
https://www.paypal.me/statquest

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:
/ joshuastarmer

0:00 Awesome song and introduction
1:56 A slight problem with k-fold target encoding
3:42 Ordered Target Encoding

Corrections:
4:09 It is also worth noting that if there were more than 2 target values, for example, if Loves Troll 2 could be 0, 1, or 2, then, when calculating the OptionCount for a sample with Loves Troll 2 = 1, we would include rows that had Loves Troll 2 = 1 and rows that had Loves Troll 2 = 2.
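To make this correction concrete, here is a hedged Python sketch in which an earlier row counts toward OptionCount when its target is at or above a threshold (1 here), so rows with Loves Troll 2 = 1 and rows with Loves Troll 2 = 2 both count. Treating a multi-valued target as "target >= 1" is an interpretation of the correction, and the function name `ordered_target_encode_threshold`, its `threshold` parameter, and the formula (OptionCount + 0.05) / (n + 1) are assumptions for illustration, not CatBoost's API.

```python
from collections import defaultdict


def ordered_target_encode_threshold(categories, targets, threshold=1, prior=0.05):
    """Hypothetical helper: ordered encoding where OptionCount counts
    earlier rows in the same category whose target is >= threshold."""
    seen = defaultdict(int)         # category -> number of earlier rows
    at_or_above = defaultdict(int)  # category -> earlier rows with target >= threshold
    encoded = []
    for cat, y in zip(categories, targets):
        # Only earlier rows contribute to this row's encoding.
        encoded.append((at_or_above[cat] + prior) / (seen[cat] + 1))
        seen[cat] += 1
        at_or_above[cat] += int(y >= threshold)
    return encoded


if __name__ == "__main__":
    # Three rows in one category with Loves Troll 2 = 2, 1, 0.
    print(ordered_target_encode_threshold(["A", "A", "A"], [2, 1, 0]))
```

For three rows in one category with targets 2, 1, and 0, the third row is encoded as (2 + 0.05) / (2 + 1), because both earlier rows (targets 2 and 1) count toward OptionCount.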

#StatQuest #CatBoost #dubbedwithaloud

