Frederico J.J.B. Santos, Jose Manuel Muñoz Contreras, Berfin Sakallioglu, Damaris Nieri Melendez Hualinga, Alberto Tonda, Leonardo Trujillo
Conference Paper, ACM Genetic and Evolutionary Computation Conference (GECCO), San José, Costa Rica, 2026 (to appear)
Ensembles of decision trees, state-of-the-art for regression, share conceptual similarities with Geometric Semantic Genetic Programming (GSGP). Both combine weak learners to build a final ensemble. Boosting algorithms (XGBoost, LightGBM, Gradient Boosting) fit new decision trees to compensate for the current errors, while GSGP builds ensembles incrementally using stochastic mutations directed toward the target semantics. Other methods generate weak models in batch, ensuring diversity via bagging (Random Forest) or increased randomness (Extremely Randomized Trees). This paper explores visualization strategies to analyze how these methods navigate semantic space during learning. Kernel PCA projects the high-dimensional semantics into 2D representations. Heatmaps show fitness distributions; density plots illustrate which regions each method visits most frequently. For incremental methods, trajectory visualizations capture the sequence of semantic points visited during learning. Qualitative analysis reveals distinct patterns: batch ensembles cluster tightly near the target semantics; boosting methods follow directed incremental paths; GSGP explores more broadly than any other method. These insights suggest opportunities for hybrid approaches combining GSGP's exploration with the efficient exploitation of decision-tree-based methods.
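The core projection step described in the abstract can be sketched as follows. This is a minimal, self-contained illustration (not the paper's actual pipeline): it assumes each model's semantics is its vector of predictions on the training cases, stacks those vectors as rows, and applies RBF kernel PCA to obtain a 2D embedding; the function name and `gamma` value are hypothetical choices.

```python
import numpy as np

def rbf_kernel_pca_2d(S, gamma=0.01):
    """Project semantic vectors (rows of S) to 2D via RBF kernel PCA.

    Hypothetical helper for illustration; the paper's actual settings
    (kernel, bandwidth, library) are not specified here.
    """
    n = S.shape[0]
    # Pairwise squared distances, then the RBF kernel matrix
    sq = np.sum(S**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * S @ S.T
    K = np.exp(-gamma * D2)
    # Double-center the kernel matrix (kernel PCA requirement)
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    # Top-2 eigenpairs of the centered kernel give the 2D embedding
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:2]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Toy example: 50 models, each with predictions on 100 training cases
rng = np.random.default_rng(0)
semantics = rng.normal(size=(50, 100))
embedding = rbf_kernel_pca_2d(semantics)
print(embedding.shape)  # (50, 2)
```

Each row of the returned embedding is one model's position in the 2D semantic map; the heatmaps, density plots, and trajectories described above would then be drawn over these coordinates.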