The Use of Machine Learning Models and SHAP Interaction Values to Predict the Soil Swelling Index

Authors

  • Myriam Letif
    Affiliation
    LEEGO Laboratory, Department of Geotechnic and Hydraulic, Faculty of Civil Engineering, University of Sciences and Technology Houari Boumediene (USTHB), P. O. B. 32, El-Alia Bab-Ezzouar, 16111 Algiers, Algeria
  • Ramdane Bahar
    Affiliation
    LEEGO Laboratory, Department of Geotechnic and Hydraulic, Faculty of Civil Engineering, University of Sciences and Technology Houari Boumediene (USTHB), P. O. B. 32, El-Alia Bab-Ezzouar, 16111 Algiers, Algeria
  • Nourredine Mezouar
    Affiliation
    Département Microzonage Sismique, Centre National de Recherche Appliquée en Génie Parasismique (CGS), Rue Kaddour Rahim, P. O. B. 252, Hussein Dey, 16040 Algiers, Algeria
https://doi.org/10.3311/PPci.36880

Abstract

Predicting the soil swelling index (CS) is crucial for geotechnical engineers to ensure the stability of civil engineering designs. Recently, machine learning (ML) models have attracted great interest from researchers for predicting the soil swelling index. However, due to the black-box nature of ML models, their predictions remain difficult to interpret. This study aims to predict the soil swelling index using ML algorithms and to interpret the resulting predictions. First, it evaluates the predictive capability of the Gaussian process regression (GPR) algorithm and compares it with that of the artificial neural network (ANN) for predicting the soil swelling index. Second, the SHAP algorithm, a recent explainable artificial intelligence (XAI) method, is applied to interpret the predictions of the complex GPR and ANN models. The compiled experimental database covers 362 clayey samples gathered from different sites located in Northern Algeria. The modeling involved six input features to predict the soil swelling index: the liquid limit (LL), plastic limit (PL), plasticity index (PI), water content (ωn), dry density (γd), and void ratio (e). The findings based on statistical metrics showed good performance for both GPR (R² = 0.78) and ANN (R² = 0.79). A comparative study based on the Wilcoxon signed-rank test and the sign test indicated that the ANN outperformed the GPR. Based on the interpretations obtained with the SHAP algorithm, the liquid limit (LL) and plastic limit (PL) are the two main input features influencing CS, with higher LL and PL values increasing the model's output.
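
The two evaluation ideas named in the abstract, the R² metric and the paired sign test, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `r_squared` and `sign_test_p_value` and all numeric values are hypothetical, chosen only to show how the coefficient of determination and a two-sided sign test on paired absolute errors might be computed.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def sign_test_p_value(errors_a, errors_b):
    """Two-sided sign test on paired absolute errors (ties are dropped)."""
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    k = sum(1 for d in diffs if d > 0)  # pairs where model A has the larger error
    # Under the null hypothesis, the number of positive signs is Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Hypothetical measured and predicted values, for illustration only.
    measured = [0.02, 0.04, 0.06, 0.08, 0.10]
    model_a = [0.03, 0.03, 0.07, 0.07, 0.12]
    model_b = [0.02, 0.05, 0.06, 0.09, 0.10]
    err_a = [abs(m - p) for m, p in zip(measured, model_a)]
    err_b = [abs(m - p) for m, p in zip(measured, model_b)]
    print("R2 model A:", r_squared(measured, model_a))
    print("R2 model B:", r_squared(measured, model_b))
    print("sign test p-value:", sign_test_p_value(err_a, err_b))
```

A small p-value from the sign test would suggest that one model's errors are systematically smaller than the other's across the paired samples; the Wilcoxon signed-rank test additionally uses the magnitudes of the differences.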

Keywords:

soil swelling index, machine learning, Gaussian process regression, artificial neural network, SHAP algorithm

Published Online

2024-11-04

How to Cite

Letif, M., Bahar, R., Mezouar, N. “The Use of Machine Learning Models and SHAP Interaction Values to Predict the Soil Swelling Index”, Periodica Polytechnica Civil Engineering, 2024. https://doi.org/10.3311/PPci.36880

Section

Research Article