Gradient boosted trees (GBT) is a machine learning technique for classification and regression problems that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. Like other boosting methods, it builds the model in a stage-wise fashion, and it generalizes them by allowing optimization of an arbitrary differentiable loss function. GBT typically delivers strong predictive performance and is widely used in commercial applications.

The core idea of GBT is to combine many weak learners into a single strong learner. It differs from fitting a single decision tree in two key ways:

It builds the trees in a sequential, stage-wise fashion, where each successive tree aims to correct the errors of the ensemble built so far.

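It fits each tree not to the original target but to the negative gradient of the loss function with respect to the current ensemble's predictions (the so-called pseudo-residuals). This is what lets the method directly minimize the loss function; for squared-error loss, the negative gradient is proportional to the ordinary residual.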
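For two common losses, both the starting constant and the negative gradient have simple closed forms. The sketch below assumes squared-error loss for regression and binary log loss for classification, with the model's raw output on the log-odds scale. Under log loss, the loss-minimizing starting constant is the log-odds of the positive class rather than a class label. The function names are illustrative and do not come from any library.

```python
import math

def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))

# Squared-error loss L(y, f) = (y - f)**2 / 2
def init_squared_error(y):
    """Constant that minimizes squared error: the mean target."""
    return sum(y) / len(y)

def neg_grad_squared_error(y, f):
    """-dL/df = y - f, i.e. the ordinary residual."""
    return [yi - fi for yi, fi in zip(y, f)]

# Binary log loss with labels y in {0, 1} and f on the log-odds scale:
# L(y, f) = -[y*log(p) + (1 - y)*log(1 - p)], where p = sigmoid(f)
def init_log_loss(y):
    """Constant that minimizes log loss: the log-odds of the positive-class rate."""
    p = sum(y) / len(y)  # assumes both classes are present (0 < p < 1)
    return math.log(p / (1.0 - p))

def neg_grad_log_loss(y, f):
    """-dL/df = y - sigmoid(f): observed label minus predicted probability."""
    return [yi - sigmoid(fi) for yi, fi in zip(y, f)]

print(neg_grad_squared_error([3.0, 5.0], [4.0, 4.0]))  # [-1.0, 1.0]
print(neg_grad_log_loss([1, 0], [0.0, 0.0]))           # [0.5, -0.5]
print(init_log_loss([1, 1, 1, 0]))                     # log(3), about 1.0986
```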

The algorithm starts with an initial constant prediction that minimizes the loss on the training data. This is usually the mean of the target for regression or, for binary classification with log loss, the log-odds of the positive class. It then builds the trees sequentially as follows:

In the first iteration, it builds a tree that best predicts the negative gradient of the loss function with respect to the initial prediction on the training data. It does so by splitting the training data into regions based on the values of the predictor attributes. It then fits a simple model within each region (e.g. the mean of the pseudo-residuals, for squared-error regression), which yields a correction to the current predictions.

In each subsequent iteration, a new tree is fit to the negative gradient of the loss function with respect to the current ensemble’s predictions and is then added to the ensemble. This process continues for a fixed number of iterations or until no further improvement in predictive performance is observed on a validation set.
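The tree-fitting step described above (split the data into regions, then fit the mean of the pseudo-residuals in each region) can be illustrated with the smallest possible tree. This is a depth-1 regression tree, or "stump", on a single numeric feature. It is a minimal sketch: `fit_stump` is a hypothetical helper, and real implementations search over all features and grow deeper trees.

```python
def fit_stump(x, r):
    """Fit a depth-1 regression tree (a stump) to pseudo-residuals r on one feature x.

    Tries a threshold between each pair of consecutive distinct x values, keeps the
    one with the smallest squared error, and predicts the mean residual in each region.
    Returns (threshold, left_value, right_value).
    """
    def mean(values):
        return sum(values) / len(values)

    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = mean(left), mean(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all x values identical: no split, predict the overall mean
        m = mean(r)
        return (float("inf"), m, m)
    return best[1:]

print(fit_stump([1, 2, 3, 4], [-2, -2, 2, 2]))  # (2.5, -2.0, 2.0)
```

Here the best split falls between x = 2 and x = 3. The tree therefore corrects the first two predictions downward by 2 and the last two upward by 2.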

For squared-error loss, where the negative gradient is proportional to the residual, the process can be summarized as follows:

Fit tree $h_1(x)$ to residuals $r_0 = y - f_0(x)$, where $f_0(x)$ is the initial prediction (e.g. the mean of $y$)

Update model: $f_1(x) = f_0(x) + h_1(x)$

Compute residuals: $r_1 = y - f_1(x)$

Fit tree $h_2(x)$ to residuals $r_1$

Update model: $f_2(x) = f_1(x) + h_2(x)$

Compute residuals: $r_2 = y - f_2(x)$

Repeat until a terminal condition is met.
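The steps above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not a production implementation. It uses squared-error loss, a single numeric feature, and depth-1 trees (stumps) as the weak learners. It also adds a `learning_rate` parameter that the listed steps omit; `learning_rate=1.0` reproduces them exactly. All function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def fit_stump(x, r):
    """Fit a depth-1 regression tree to residuals r on a single numeric feature x.

    Returns a function mapping a feature value to the mean residual of its region.
    """
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold between consecutive distinct values
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = mean(left), mean(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all x values identical: no split, predict the overall mean
        m = mean(r)
        return lambda xi: m
    _, t, lv, rv = best
    return lambda xi: lv if xi <= t else rv

def gradient_boost(x, y, n_trees=100, learning_rate=0.1):
    """Squared-error gradient boosting with stumps; returns the fitted model as a function."""
    f0 = mean(y)              # initial prediction f_0: the mean of y
    pred = [f0] * len(y)      # current ensemble predictions on the training data
    trees = []
    for _ in range(n_trees):
        r = [yi - pi for yi, pi in zip(y, pred)]  # residuals r_{m-1} = y - f_{m-1}(x)
        h = fit_stump(x, r)                        # fit tree h_m to the residuals
        trees.append(h)
        # update: f_m(x) = f_{m-1}(x) + learning_rate * h_m(x)
        pred = [pi + learning_rate * h(xi) for pi, xi in zip(pred, x)]
    return lambda xi: f0 + learning_rate * sum(h(xi) for h in trees)
```

Consider `gradient_boost([1, 2, 3, 4, 5, 6], [1.0, 1.2, 0.9, 3.1, 3.0, 3.2], n_trees=1, learning_rate=1.0)`. It performs a single fit-and-update step: the stump splits at x = 3.5, so the model predicts about 1.033 (the mean of the first three targets) for any x ≤ 3.5. With a smaller learning rate, each tree moves the predictions only part of the way, so more trees are needed.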

The final model's prediction is the initial prediction plus the sum of the outputs of all the grown trees, each scaled by the learning rate if one is used. The trees themselves are ordinary regression trees, even for classification. What makes the procedure distinctive is that each tree approximates the negative gradient of the loss, which turns boosting into a form of gradient descent in function space that directly minimizes the loss function.
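The same procedure can be written in general notation for an arbitrary differentiable loss $L$. This follows a standard formulation of gradient boosting, in which $\nu$ denotes the learning rate (shrinkage). Setting $\nu = 1$ with squared-error loss $L(y, f) = \tfrac{1}{2}(y - f)^2$ makes $r_{i,m-1} = y_i - f_{m-1}(x_i)$, which recovers the steps listed above.

$$f_0(x) = \arg\min_{c} \sum_{i=1}^{n} L(y_i, c)$$

$$r_{i,m-1} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f = f_{m-1}}, \qquad i = 1, \dots, n$$

$$h_m \text{ fit to } \{(x_i, r_{i,m-1})\}_{i=1}^{n}, \qquad f_m(x) = f_{m-1}(x) + \nu\, h_m(x), \qquad m = 1, \dots, M$$

$$f_M(x) = f_0(x) + \nu \sum_{m=1}^{M} h_m(x)$$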

Some key aspects in which GBT can be optimized include:

Number of trees (boosting iterations): More trees generally lead to better performance, but too many can lead to overfitting. A value between 50 and 150 is a common starting point, but the right number depends strongly on the learning rate.

Learning rate: Shrinks the contribution of each tree. Lower values such as 0.1 reduce the risk of overfitting but require more trees to converge. It is usually tuned on a validation set together with the number of trees.

Tree depth: Deeper trees are more flexible but more prone to overfitting. A maximum depth of around 5 is common, but depth also needs tuning.

Minimum number of instances per leaf: Helps prevent overfitting by disallowing splits that would leave only a few training examples in a leaf.

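Subsample ratio of training data: Each tree is trained on a random subset of the training rows, which adds randomness and reduces overfitting (this variant is known as stochastic gradient boosting). Ratios between 0.5 and 1 are typical.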

Column or feature sampling: Samples a subset of features to consider for splits in trees.

Loss function: Cross-entropy (log loss) for classification and mean squared error (MSE) for regression. Other options exist, but these are the most widely used.

Extensive hyperparameter tuning is usually needed because of complex interactions between hyperparameters. Grid search, random search and Bayesian optimization are commonly applied techniques. Depending on the complexity of the problem, the trained model can consist of anywhere from a few tens to a few thousand trees.
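Random search, for example, can be run with scikit-learn's `RandomizedSearchCV`. The sketch below assumes scikit-learn is installed, and its candidate values are illustrative only. `GridSearchCV`, which takes a `param_grid` dictionary of the same shape, performs an exhaustive grid search instead.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Illustrative candidates; RandomizedSearchCV samples list entries uniformly
param_distributions = {
    "n_estimators": [100, 200, 300],
    "learning_rate": [0.03, 0.1, 0.3],
    "max_depth": [2, 3, 5],
    "subsample": [0.7, 1.0],
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions,
    n_iter=8,                          # number of random configurations to evaluate
    cv=3,                              # 3-fold cross-validation per configuration
    scoring="neg_mean_squared_error",  # higher is better, so MSE is negated
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
print(-search.best_score_)  # cross-validated MSE of the best configuration
```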

Gradient boosted trees rely on the stage-wise expansion of weak learners into an ensemble that directly optimizes a differentiable loss function. Careful hyperparameter tuning is needed to balance accuracy against model complexity and so achieve good generalization to new data. When implemented well, GBT can deliver state-of-the-art results on a broad range of tasks, particularly those involving structured (tabular) data.


Science education programs around the world have successfully boosted student comprehension of science through engaging hands-on learning experiences. Some notable examples include:

The Science Olympiad program in the United States encourages K-12 students to explore science concepts through a series of competitive events that require applying science knowledge. The program covers over 40 events, rotated annually, across diverse topics such as anatomy, astronomy, chemistry, physics, geology and technology. Participation in Science Olympiad has been shown to improve students’ critical thinking skills and long-term interest in STEM disciplines. A 2010 study found that Science Olympiad alumni were three times more likely than their non-participating peers to major in physical science or engineering.

Science Clubs, run both in schools and externally by organizations such as 4-H and Discovery Education, are another highly effective program. They engage students in weekly hands-on science activities and experiments driven largely by student curiosity. A 2019 study across 12 US states found that students who regularly participated in 4-H Science Clubs for one school year gained an average of 19 percentile points in science comprehension on state standardized tests relative to their non-participating peers. The social aspect of Science Clubs, combined with student choice in activities, also had a positive effect on student engagement and motivation in science.

Immersive summer programs are also proving increasingly effective at fostering deeper science learning. A well-known example is the Research Science Institute (RSI), hosted by MIT each summer. This highly selective program pairs rising high school seniors with MIT faculty on mentored research projects across a wide range of STEM fields for six weeks. Longitudinal tracking has shown that RSI alumni are over four times more likely than their peers to major in STEM and to pursue STEM careers. Similarly, programs such as the US Science & Engineering Festival’s summer STEM camps integrate project-based learning, field trips and mentorships to foster student enthusiasm for, and comprehension of, complex topics in fields like genetics, aerospace engineering and environmental science. Studies have found that participating students gain, on average, two full years of science learning relative to baseline.

Internationally, many countries have implemented national programs as part of the school curriculum to support science learning. Finland’s extensive investment in teacher training and classroom resources is widely credited with producing its top PISA science scores. Key elements of Finland’s success include an emphasis on student-centered, collaborative and applied learning through project work. Similarly, Singapore’s “Teach Less, Learn More” philosophy shifts traditional class time toward hands-on lab work, outdoor learning and other modes of inquiry. This places students at the center of actively constructing their understanding of scientific concepts and principles. Both Finland and Singapore also leverage community partnerships for field trips, mentorships and career exposure to put STEM learning in context.

Looking ahead, emerging practices such as design thinking and STEAM (Science, Technology, Engineering, Arts and Math) integration show promise for further advancing science comprehension when coupled with experiential learning. Design thinking engages students in tackling real-world problems through iterative design cycles that combine creativity and scientific reasoning. In doing so, it nurtures competencies such as collaboration, critical thinking and communication, all of which are increasingly important in the workforce. STEAM programs that let students study science through artistic media have also gained traction. For example, a 2019 Australian study found that middle schoolers who created science documentaries showed greater conceptual understanding than those taught with traditional lessons alone.

Successful science comprehension programs share key attributes: they are hands-on, student-centered, applied to real-world problems and social, and they are supported by community partnerships and adequate teacher development. National investments that enable these approaches can yield substantial returns by producing graduates with deeper STEM comprehension and an enthusiasm for lifelong science learning and careers. With continuous refinement guided by educational research, such programs can continue to advance science capacity and literacy worldwide.