publications
2026
- ASOCCrafting Imperceptible On-Manifold Adversarial Attacks for Tabular DataZhipeng He, Alexander Stevens, Chun Ouyang, Johannes De Smedt, Alistair Barros, and Catarina MoreiraApplied Soft Computing, 186(Part D), 2026, 114286
Adversarial attacks on tabular data present unique challenges due to the heterogeneous nature of mixed categorical and numerical features. Unlike images where pixel perturbations maintain visual similarity, tabular data lacks intuitive similarity metrics, making it difficult to define imperceptible modifications. Additionally, traditional gradient-based methods prioritise -norm constraints, often producing adversarial examples that deviate from the original data distributions, making them detectable. To address this, we propose a latent-space perturbation framework using a mixed-input Variational Autoencoder (VAE) to generate statistically consistent adversarial examples. The proposed VAE integrates categorical embeddings and numerical features into a unified latent manifold, enabling perturbations that preserve statistical consistency. We introduce In-Distribution Success Rate (IDSR) to jointly evaluate attack effectiveness and distributional alignment. Evaluation across six publicly available datasets and three model architectures demonstrates that our method achieves substantially lower outlier rates and more consistent performance compared to traditional input-space attacks and other VAE-based methods adapted from image domain approaches, achieving substantially lower outlier rates and higher IDSR across six datasets and three model architectures. Our comprehensive analyses of hyperparameter sensitivity, sparsity control, and generative architecture demonstrate that the effectiveness of VAE-based attacks depends strongly on reconstruction quality and the availability of sufficient training data. When these conditions are met, the proposed framework achieves superior practical utility and stability compared with input-space methods. This work underscores the importance of maintaining on-manifold perturbations for generating realistic and robust adversarial examples in tabular domains.
- ESWATabAttackBench: A Benchmark for Adversarial Attacks on Tabular DataExpert Systems with Applications, 301, 2026, 130491
Adversarial attacks pose a significant threat to machine learning models by inducing incorrect predictions through imperceptible perturbations to input data. While these attacks are well studied in unstructured domains such as images, their behaviour on tabular data remains underexplored due to mixed feature types and complex inter-feature dependencies. This study introduces a comprehensive benchmark that evaluates adversarial attacks on tabular datasets with respect to both effectiveness and imperceptibility. We assess five white-box attack algorithms (FGSM, BIM, PGD, DeepFool, and C&W) across four representative models (LR, MLP, TabTransformer and FT-Transformer) using eleven datasets spanning finance, energy, and healthcare domains. The benchmark employs four quantitative imperceptibility metrics (proximity, sparsity, deviation, and sensitivity) to characterise perturbation realism. The analysis quantifies the trade-off between these two aspects and reveals consistent differences between attack types, with ℓ∞-based attacks achieving higher success but lower subtlety, and L2-based attacks offering more realistic perturbations. The benchmark findings offer actionable insights for designing more imperceptible adversarial attacks, advancing the understanding of adversarial vulnerability in tabular machine learning.
2025
- ISWAInvestigating Imperceptibility of Adversarial Attacks on Tabular Data: An Empirical AnalysisIntelligent Systems with Applications, 25, 2025, 200461
Adversarial attacks are a potential threat to machine learning models by causing incorrect predictions through imperceptible perturbations to the input data. While these attacks have been extensively studied in unstructured data like images, applying them to tabular data, poses new challenges. These challenges arise from the inherent heterogeneity and complex feature interdependencies in tabular data, which differ from the image data. To account for this distinction, it is necessary to establish tailored imperceptibility criteria specific to tabular data. However, there is currently a lack of standardised metrics for assessing the imperceptibility of adversarial attacks on tabular data. To address this gap, we propose a set of key properties and corresponding metrics designed to comprehensively characterise imperceptible adversarial attacks on tabular data. These are: proximity to the original input, sparsity of altered features, deviation from the original data distribution, sensitivity in perturbing features with narrow distribution, immutability of certain features that should remain unchanged, feasibility of specific feature values that should not go beyond valid practical ranges, and feature interdependencies capturing complex relationships between data attributes. We evaluate the imperceptibility of five adversarial attacks, including both bounded attacks and unbounded attacks, on tabular data using the proposed imperceptibility metrics. The results reveal a trade-off between the imperceptibility and effectiveness of these attacks. The study also identifies limitations in current attack algorithms, offering insights that can guide future research in the area. The findings gained from this empirical analysis provide valuable direction for enhancing the design of adversarial attack algorithms, thereby advancing adversarial machine learning on tabular data.
- MIMICCuration and Analysis of MIMICEL - An Event Log for MIMIC-IV Emergency DepartmentarXiv preprint arXiv:2505.19389, 2025, 200461
The global issue of overcrowding in emergency departments (ED) necessitates the analysis of patient flow through ED to enhance efficiency and alleviate overcrowding. However, traditional analytical methods are time-consuming and costly. The healthcare industry is embracing process mining tools to analyse healthcare processes and patient flows. Process mining aims to discover, monitor, and enhance processes by obtaining knowledge from event log data. However, the availability of event logs is a prerequisite for applying process mining techniques. Hence, this paper aims to generate an event log for analysing processes in ED. In this study, we extract an event log from the MIMIC-IV-ED dataset and name it MIMICEL. MIMICEL captures the process of patient journey in ED, allowing for analysis of patient flows and improving ED efficiency. We present analyses conducted using MIMICEL to demonstrate the utility of the dataset. The curation of MIMICEL facilitates extensive use of MIMIC-IV-ED data for ED analysis using process mining techniques, while also providing the process mining research communities with a valuable dataset for study.
2022
- KBSBuilding interpretable models for business process prediction using shared and specialised attention mechanismsKnowledge-Based Systems, 248, 2022, 108773
Predictive process analytics, often underpinned by deep learning techniques, is a newly emerged discipline dedicated for providing business process intelligence in modern organisations. Whilst accuracy has been a dominant criterion in building predictive capabilities, the use of deep learning techniques comes at the cost of the resulting models being used as ‘black boxes’, i.e., they are unable to provide insights into why a certain business process prediction was made. So far, little attention has been paid to interpretability in the design of deep learning-based process predictive models. In this paper, we address the ‘black-box’ problem in the context of predictive process analytics by developing attention-based models that are capable to inform both what and why is a process prediction. We propose i) two types of attentions—event attention to capture the impact of specific events on a prediction, and attribute attention to reveal which attribute(s) of an event influenced the prediction; and ii) two attention mechanisms—shared attention mechanism and specialised attention mechanism to reflect different design decisions between whether to construct attribute attention on individual input features (specialised) or using the concatenated feature tensor of all input feature vectors (shared). These lead to two distinct attention-based models, and both are interpretable models that incorporate interpretability directly into the structure of a process predictive model. We conduct experimental evaluation of the proposed models using real-life dataset and comparative analysis between the models for accuracy and interpretability, and draw insights from the evaluation and analysis results. The results demonstrate that i) the proposed attention-based models can achieve reasonably high accuracy; ii) both are capable of providing relevant interpretations (when validated against domain knowledge); and iii) whilst the two models perform equally in terms of prediction accuracy, the specialised attention-based model tends to provide more relevant interpretations than the shared attention-based model, reflecting the fact that the specialised attention-based model is designed to facilitate better interpretability.
- MIMICMIMICEL: MIMIC-IV Event Log for Emergency DepartmentPhysionet, 2022, 108773Version 1.0.0
In this work, we extract an event log from the MIMIC-IV-ED dataset by adopting a well-established event log generation methodology, and we name this event log MIMICEL. The data tables in the MIMIC-IV-ED dataset relate to each other based on the existing relational database schema, and each table records the individual activities of patients along their journey in the emergency department (ED). While the data tables in the MIMIC-IV-ED dataset catch snapshots of a patient journey in the ED, the extracted event log MIMICEL aims to capture an end-to-end patient journey process. This will enable us to analyse the existing patient flows, thereby improving the efficiency of an ED process.
2021
- HonoursInvestigating the Impact of Event Logs on Deep Learning-based Process Prediction PerformanceZhipeng HeQueensland University of Technology, 2021, 108773Honours Thesis
Business process predictive analytics exploit historical process execution logs, known as event logs, to generate predictions of running cases of a business process, such as next event or remaining time. In the state-of-the-art approaches, deep learning algorithms have attracted increasing attention and as a result deep learning-based prediction models become the mainstream of the research. Often encoding methods for event logs and neural network architectures have been considered as two factors that would impact models’ prediction performance. In fact, an event log, as the input data for prediction, also plays an important role in the predictive pipeline and should not be overlooked. However, there is no recent research concerning with the potential influence of event logs on prediction performance. This thesis aims to investigate how different event logs affect the performance of deep learning-based process prediction models. We propose and implement a benchmark on two different encoding methods and three Long Short-Term Memory (LSTM) models with seven real-life event logs for predicting next activity, next resource and next interval time. Based on the above benchmark, this thesis explores and analyses some key characteristics of event logs and extracts findings on relationships between the characteristics of event logs and performance of process prediction models.