XGBoost With Python: Discover The Algorithm That Is Winning Machine Learning Competitions 2018 | ISBN: n/a | English | 115 pages | PDF | 1.2 MB
Why Is XGBoost So Powerful? … the secret is its “speed” and “model performance”
The Gradient Boosting algorithm has been around since 1999. So why is it so popular right now?
The reason is that we now have machines fast enough and enough data to really make this algorithm shine.
Academics and researchers knew it was a dominant algorithm, more powerful than random forest, but few people in industry knew about it.
This was due to two main reasons:
First, the implementations of gradient boosting in R and Python were not built for performance, so even modest-sized models took a long time to train. Second, because the algorithm received so little attention, there were few good heuristics for which parameters to tune and how to tune them.
Naive implementations are slow, because the algorithm requires one tree to be created at a time to attempt to correct the errors of all previous trees in the model.
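The sequential loop described above can be sketched in a few lines. This is a minimal illustration of the boosting idea, not XGBoost's actual implementation: it assumes scikit-learn's `DecisionTreeRegressor` as the base learner, and the function names and parameter values are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_trees=50, learning_rate=0.1):
    """Fit an additive ensemble of shallow trees, one at a time."""
    base = y.mean()
    prediction = np.full(len(y), base)      # start from the mean
    trees = []
    for _ in range(n_trees):
        residuals = y - prediction          # errors of all previous trees
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, residuals)              # each tree must wait for the last
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return trees, base

def gradient_boost_predict(trees, base, X, learning_rate=0.1):
    """Sum the (shrunken) contributions of every tree in the ensemble."""
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred

# Demo on a toy regression problem
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)
trees, base = gradient_boost_fit(X, y)
mse = np.mean((y - gradient_boost_predict(trees, base, X)) ** 2)
```

Note the inherent serial dependency: each tree's training data (the residuals) does not exist until all previous trees have been fit, which is exactly why naive implementations cannot simply train trees in parallel.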
This sequential procedure produces models with excellent predictive power, but it can be very slow to train when hundreds or thousands of trees must be built from large datasets.

XGBoost Changed Everything
XGBoost was developed by Tianqi Chen and collaborators for speed and performance.
Tianqi is a top machine learning researcher, so he knows deeply how the algorithm works. He is also a very good engineer, so he knows how to build high-quality software.
This combination of talents allowed him to re-frame the internals of the gradient boosting algorithm so that it can exploit the full potential of the memory and CPU cores of your hardware.
In XGBoost, individual trees are built using multiple cores, and data is organized to minimize lookup times: all good computer science tips and tricks.
The result is an implementation of gradient boosting in the XGBoost library that can be configured to squeeze the best performance from your machine, whilst offering all of the knobs and dials to tune the behavior of the algorithm to your specific problem.

This Power Did Not Go Unnoticed
Soon after the release of XGBoost, top machine learning competitors started using it.
More than that, they started winning competitions on sites like Kaggle. And they were not shy about sharing the news about XGBoost.