publications | Zhipeng "Zippo" He

2025

ISWA

Investigating Imperceptibility of Adversarial Attacks on Tabular Data: An Empirical Analysis

Zhipeng He, Chun Ouyang, Laith Alzubaidi, Alistair Barros, and Catarina Moreira

Intelligent Systems with Applications, 2025


Adversarial attacks are a potential threat to machine learning models by causing incorrect predictions through imperceptible perturbations to the input data. While these attacks have been extensively studied in unstructured data like images, applying them to tabular data poses new challenges. These challenges arise from the inherent heterogeneity and complex feature interdependencies in tabular data, which differ from those of image data. To account for this distinction, it is necessary to establish tailored imperceptibility criteria specific to tabular data. However, there is currently a lack of standardised metrics for assessing the imperceptibility of adversarial attacks on tabular data. To address this gap, we propose a set of key properties and corresponding metrics designed to comprehensively characterise imperceptible adversarial attacks on tabular data. These are: proximity to the original input, sparsity of altered features, deviation from the original data distribution, sensitivity in perturbing features with narrow distribution, immutability of certain features that should remain unchanged, feasibility of specific feature values that should not go beyond valid practical ranges, and feature interdependencies capturing complex relationships between data attributes. We evaluate the imperceptibility of five adversarial attacks, including both bounded attacks and unbounded attacks, on tabular data using the proposed imperceptibility metrics. The results reveal a trade-off between the imperceptibility and effectiveness of these attacks. The study also identifies limitations in current attack algorithms, offering insights that can guide future research in the area. The findings gained from this empirical analysis provide valuable direction for enhancing the design of adversarial attack algorithms, thereby advancing adversarial machine learning on tabular data.
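To make the first two properties in this abstract concrete, here is a minimal sketch of how proximity and sparsity might be computed for a single record. It assumes purely numerical, already-scaled features, uses Euclidean distance for proximity and a count of changed features for sparsity, and the function names and sample values are hypothetical; the paper's own metric definitions may differ.

```python
import math


def proximity_l2(original, adversarial):
    """Euclidean distance between an input and its perturbed version (assumed metric)."""
    return math.sqrt(sum((a - o) ** 2 for o, a in zip(original, adversarial)))


def sparsity(original, adversarial, tol=1e-9):
    """Number of features whose value changed by more than tol (assumed metric)."""
    return sum(1 for o, a in zip(original, adversarial) if abs(a - o) > tol)


# Hypothetical scaled record and a perturbed copy that alters two features.
x = [0.20, 0.50, 0.90, 0.10]
x_adv = [0.20, 0.80, 0.50, 0.10]

print("proximity:", proximity_l2(x, x_adv))
print("altered features:", sparsity(x, x_adv))
```

Lower values on both measures would indicate a perturbation that stays closer to the original record and touches fewer features.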
arXiv

TabAttackBench: A Benchmark for Adversarial Attacks on Tabular Data

Zhipeng He, Chun Ouyang, Lijie Wen, Cong Liu, and Catarina Moreira

arXiv preprint arXiv:2505.21027, 2025

Under Review


Adversarial attacks pose a significant threat to machine learning models by inducing incorrect predictions through imperceptible perturbations to input data. While these attacks have been extensively studied in unstructured data like images, their application to tabular data presents new challenges. These challenges arise from the inherent heterogeneity and complex feature interdependencies in tabular data, which differ significantly from those in image data. To address these differences, it is crucial to consider imperceptibility as a key criterion specific to tabular data. Most current research focuses primarily on achieving effective adversarial attacks, often overlooking the importance of maintaining imperceptibility. To address this gap, we propose a new benchmark for adversarial attacks on tabular data that evaluates both effectiveness and imperceptibility. In this study, we assess the effectiveness and imperceptibility of five adversarial attacks across four models using eleven tabular datasets, including both mixed and numerical-only datasets. Our analysis explores how these factors interact and influence the overall performance of the attacks. We also compare the results across different dataset types to understand the broader implications of these findings. The findings from this benchmark provide valuable insights for improving the design of adversarial attack algorithms, thereby advancing the field of adversarial machine learning on tabular data.
MIMIC

Curation and Analysis of MIMICEL - An Event Log for MIMIC-IV Emergency Department

Jia Wei, Chun Ouyang, Bemali Wickramanayake, Zhipeng He, Keshara Perera, and Catarina Moreira

arXiv preprint arXiv:2505.19389, 2025


The global issue of overcrowding in emergency departments (ED) necessitates the analysis of patient flow through ED to enhance efficiency and alleviate overcrowding. However, traditional analytical methods are time-consuming and costly. The healthcare industry is embracing process mining tools to analyse healthcare processes and patient flows. Process mining aims to discover, monitor, and enhance processes by obtaining knowledge from event log data. However, the availability of event logs is a prerequisite for applying process mining techniques. Hence, this paper aims to generate an event log for analysing processes in ED. In this study, we extract an event log from the MIMIC-IV-ED dataset and name it MIMICEL. MIMICEL captures the process of patient journey in ED, allowing for analysis of patient flows and improving ED efficiency. We present analyses conducted using MIMICEL to demonstrate the utility of the dataset. The curation of MIMICEL facilitates extensive use of MIMIC-IV-ED data for ED analysis using process mining techniques, while also providing the process mining research communities with a valuable dataset for study.
arXiv

Crafting Imperceptible On-Manifold Adversarial Attacks for Tabular Data

Zhipeng He, Alexander Stevens, Chun Ouyang, Johannes De Smedt, Alistair Barros, and Catarina Moreira

arXiv preprint arXiv:2507.10998, 2025


Adversarial attacks on tabular data present fundamental challenges distinct from image or text domains due to the heterogeneous nature of mixed categorical and numerical features. Unlike images where pixel perturbations maintain visual similarity, tabular data lacks intuitive similarity metrics, making it difficult to define imperceptible modifications. Additionally, traditional gradient-based methods prioritise $\ell_p$-norm constraints, often producing adversarial examples that deviate from the original data distributions, making them detectable. We propose a latent space perturbation framework using a mixed-input Variational Autoencoder (VAE) to generate imperceptible adversarial examples. The proposed VAE integrates categorical embeddings and numerical features into a unified latent manifold, enabling perturbations that preserve statistical consistency. We specify In-Distribution Success Rate (IDSR) to measure the proportion of adversarial examples that remain statistically indistinguishable from the input distribution. Evaluation across six publicly available datasets and three model architectures demonstrates that our method achieves substantially lower outlier rates and more consistent performance compared to traditional input-space attacks and other VAE-based methods adapted from image domain approaches. Our comprehensive analysis includes hyperparameter sensitivity, sparsity control mechanisms, and generative architectural comparisons, revealing that VAE-based attacks depend critically on reconstruction quality but offer superior practical utility when sufficient training data is available. This work highlights the importance of on-manifold perturbations for realistic adversarial attacks on tabular data, offering a robust approach for practical deployment.

2022

KBS

Building interpretable models for business process prediction using shared and specialised attention mechanisms

Bemali Wickramanayake, Zhipeng He, Chun Ouyang, Catarina Moreira, Yue Xu, and Renuka Sindhgatta

Knowledge-Based Systems, 2022


Predictive process analytics, often underpinned by deep learning techniques, is a newly emerged discipline dedicated to providing business process intelligence in modern organisations. Whilst accuracy has been a dominant criterion in building predictive capabilities, the use of deep learning techniques comes at the cost of the resulting models being used as ‘black boxes’, i.e., they are unable to provide insights into why a certain business process prediction was made. So far, little attention has been paid to interpretability in the design of deep learning-based process predictive models. In this paper, we address the ‘black-box’ problem in the context of predictive process analytics by developing attention-based models that are capable of explaining both what a process prediction is and why it was made. We propose i) two types of attention: event attention to capture the impact of specific events on a prediction, and attribute attention to reveal which attribute(s) of an event influenced the prediction; and ii) two attention mechanisms, a shared attention mechanism and a specialised attention mechanism, to reflect different design decisions on whether to construct attribute attention on individual input features (specialised) or using the concatenated feature tensor of all input feature vectors (shared). These lead to two distinct attention-based models, and both are interpretable models that incorporate interpretability directly into the structure of a process predictive model. We conduct an experimental evaluation of the proposed models using a real-life dataset and a comparative analysis between the models for accuracy and interpretability, and draw insights from the evaluation and analysis results. The results demonstrate that i) the proposed attention-based models can achieve reasonably high accuracy; ii) both are capable of providing relevant interpretations (when validated against domain knowledge); and iii) whilst the two models perform equally in terms of prediction accuracy, the specialised attention-based model tends to provide more relevant interpretations than the shared attention-based model, reflecting the fact that the specialised attention-based model is designed to facilitate better interpretability.
MIMIC

MIMICEL: MIMIC-IV Event Log for Emergency Department

Jia Wei, Zhipeng He, Chun Ouyang, and Catarina Moreira

Physionet, 2022

Version 1.0.0


In this work, we extract an event log from the MIMIC-IV-ED dataset by adopting a well-established event log generation methodology, and we name this event log MIMICEL. The data tables in the MIMIC-IV-ED dataset relate to each other based on the existing relational database schema, and each table records the individual activities of patients along their journey in the emergency department (ED). While the data tables in the MIMIC-IV-ED dataset provide snapshots of a patient journey in the ED, the extracted event log MIMICEL aims to capture an end-to-end patient journey process. This will enable us to analyse the existing patient flows, thereby improving the efficiency of an ED process.
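The general idea of turning per-activity tables into one end-to-end event log can be sketched as follows. This is only an illustration under stated assumptions: the table names, the `stay_id` and `time` fields, and the sample rows are hypothetical stand-ins, not the actual MIMIC-IV-ED schema or the methodology used to build MIMICEL.

```python
from datetime import datetime

# Hypothetical rows standing in for relational tables, one table per activity.
tables = {
    "Triage": [
        {"stay_id": 1, "time": "2022-01-01 10:05"},
        {"stay_id": 2, "time": "2022-01-01 11:10"},
    ],
    "Enter ED": [
        {"stay_id": 1, "time": "2022-01-01 10:00"},
        {"stay_id": 2, "time": "2022-01-01 11:00"},
    ],
    "Discharge": [
        {"stay_id": 1, "time": "2022-01-01 14:30"},
    ],
}


def build_event_log(tables):
    """Flatten per-activity tables into one list of events ordered by case and time."""
    events = []
    for activity, rows in tables.items():
        for row in rows:
            events.append(
                {
                    "case_id": row["stay_id"],  # one case per ED stay (assumption)
                    "activity": activity,
                    "timestamp": datetime.strptime(row["time"], "%Y-%m-%d %H:%M"),
                }
            )
    # Ordering by case, then timestamp, yields each patient's journey as a trace.
    events.sort(key=lambda e: (e["case_id"], e["timestamp"]))
    return events


event_log = build_event_log(tables)
for event in event_log:
    print(event["case_id"], event["timestamp"], event["activity"])
```

Each case's events, read in order, form the trace that process mining techniques take as input.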

2021

Honours

Investigating the Impact of Event Logs on Deep Learning-based Process Prediction Performance

Zhipeng He

Queensland University of Technology, 2021

Honours Thesis


Business process predictive analytics exploits historical process execution logs, known as event logs, to generate predictions for running cases of a business process, such as the next event or the remaining time. Among state-of-the-art approaches, deep learning algorithms have attracted increasing attention, and as a result deep learning-based prediction models have become the mainstream of research. Encoding methods for event logs and neural network architectures are often considered the two factors that affect a model's prediction performance. In fact, an event log, as the input data for prediction, also plays an important role in the predictive pipeline and should not be overlooked. However, there is no recent research concerned with the potential influence of event logs on prediction performance. This thesis aims to investigate how different event logs affect the performance of deep learning-based process prediction models. We propose and implement a benchmark on two different encoding methods and three Long Short-Term Memory (LSTM) models with seven real-life event logs for predicting next activity, next resource and next interval time. Based on the above benchmark, this thesis explores and analyses some key characteristics of event logs and extracts findings on relationships between the characteristics of event logs and performance of process prediction models.