@mastersthesis{sakallioglu2023study,
title={A Study of Geometric Semantic Genetic Programming with Linear Scaling},
author={Sakallioglu, Berfin},
year={2023},
school={Universidade NOVA de Lisboa (Portugal)}
}
Berfin Sakallioglu
Master's Thesis, 2023
Machine Learning (ML) is a scientific discipline that aims to enable computers to learn without being explicitly programmed. Evolutionary Algorithms (EAs), a subset of ML algorithms, mimic Darwin's Theory of Evolution by using natural selection mechanisms (i.e., survival of the fittest) to evolve a group of individuals (i.e., possible solutions to a given problem). Genetic Programming (GP) is the most recent type of EA, and it evolves computer programs (i.e., individuals) to map a set of input data onto known expected outputs. Geometric Semantic Genetic Programming (GSGP) extends this concept by allowing individuals to evolve and vary in the semantic space, where the output vectors are located, rather than being constrained by syntax-based structures. Linear Scaling (LS) is a method introduced to ease GP's search for the function that best matches a set of known data. GSGP and LS have each, independently, shown the ability to outperform standard GP on symbolic regression. GSGP replaces the standard genetic operators with Geometric Semantic Operators (GSOs) without altering the fitness, while LS modifies the fitness without altering the genetic operators. To the best of our knowledge, GSGP and LS have never been combined for classification problems. Furthermore, although they have been used together in one practical regression application, a methodological evaluation of the advantages and disadvantages of integrating these methods, for regression or classification problems, has never been performed. This dissertation presents a study of a system that integrates GSGP and LS (GSGP-LS). GSGP-LS was tested on six hand-tailored regression benchmarks, nine real-life regression problems, and three real-life classification problems.
The results indicate that GSGP-LS outperforms GSGP in the majority of cases, confirming the expected benefit of this integration. However, on some particularly hard regression datasets, GSGP-LS overfits the training data and is outperformed by GSGP on unseen data. This contradicts the idea that LS is always beneficial for GP and warns practitioners of its risk of overfitting in some specific cases.
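
The abstract notes that LS changes the fitness while leaving the genetic operators untouched. As a rough illustration, the Python sketch below assumes the usual least-squares formulation of linear scaling: before the error is measured, a program's raw outputs y are rescaled to a + b*y, with a and b chosen to minimise the squared error against the targets. The abstract does not state the exact variant used in the thesis, so the formulas, the function names (`linear_scaling`, `scaled_mse`) and the toy data are assumptions made for illustration only.

```python
from statistics import fmean


def linear_scaling(outputs, targets):
    """Return (a, b) minimising sum((t - (a + b * y)) ** 2).

    Hypothetical helper: assumes the common least-squares form of LS,
    not necessarily the exact variant used in the thesis.
    """
    y_mean = fmean(outputs)
    t_mean = fmean(targets)
    var_y = sum((y - y_mean) ** 2 for y in outputs)
    if var_y == 0.0:
        # A constant program output leaves the slope undefined;
        # fall back to predicting the mean target.
        return t_mean, 0.0
    cov = sum((y - y_mean) * (t - t_mean) for y, t in zip(outputs, targets))
    b = cov / var_y
    a = t_mean - b * y_mean
    return a, b


def scaled_mse(outputs, targets):
    """Mean squared error measured after the optimal linear rescaling."""
    a, b = linear_scaling(outputs, targets)
    return fmean((t - (a + b * y)) ** 2 for y, t in zip(outputs, targets))


# Toy data (made up): the targets are an exact linear function of the raw
# program outputs, so the scaled error vanishes although the raw error does not.
raw_outputs = [1.0, 2.0, 3.0, 4.0]
targets = [5.0, 7.0, 9.0, 11.0]
a, b = linear_scaling(raw_outputs, targets)
```

Under this formulation, a program whose output has the right shape but the wrong slope or offset is not penalised for it, which is the sense in which LS eases the search. Because a and b are fitted to the training targets, the same mechanism is also a plausible route to the overfitting reported above.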