Targeted Deep Learning System Boundary Testing
Oliver Weißl, Amr Abdellatif, Xingcheng Chen, Giorgi Merabishvili, Vincenzo Riccio, Severin Kacianka, Andrea Stocco
This paper introduces a novel boundary testing approach for deep learning systems that combines feedback from the system under test (SUT) with controlled latent space manipulations. Unlike prior methods, Mimicry uses style-based GANs to enable precise, goal-driven test case generation.
Abstract:
Evaluating the behavioral boundaries of deep learning (DL) systems is crucial for understanding their reliability across diverse, unseen inputs. Existing solutions fall short because they rely on untargeted random, model-based, or latent-based perturbations, owing to the difficulty of generating controlled input variations. In this work, we introduce Mimicry, a novel black-box test generator for fine-grained, targeted exploration of DL system boundaries. Mimicry performs boundary testing by leveraging the probabilistic nature of DL outputs to identify promising directions for exploration. It uses style-based GANs to disentangle input representations into content and style components, enabling controlled feature mixing to approximate the decision boundary. We evaluated Mimicry’s effectiveness in generating boundary inputs for five widely used DL image classification systems of increasing complexity, comparing it against two baseline approaches. Our results show that Mimicry consistently identifies inputs closer to the decision boundary. It generates semantically meaningful boundary test cases that reveal new functional (mis)behaviors, while the baselines produce mainly corrupted or invalid inputs. Thanks to its enhanced control over latent space manipulations, Mimicry remains effective as dataset complexity increases, maintaining competitive diversity and higher validity rates, as confirmed by human assessors.
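To make the core idea from the abstract concrete, the sketch below illustrates how classifier feedback can guide style mixing in a GAN latent space toward the decision boundary. It is a minimal, hypothetical example, not Mimicry's actual implementation: the names boundary_mix, generator, classifier, w_src, w_tgt, and mixed_layers are illustrative assumptions, and the search is a simple bisection over a single mixing coefficient.

    # Illustrative sketch (assumed interfaces, not the paper's code):
    # approximate a decision boundary by blending selected style layers of a
    # source latent toward a target latent, bisecting on the mixing
    # coefficient until the classifier's prediction flips.
    import torch

    def boundary_mix(generator, classifier, w_src, w_tgt, mixed_layers, steps=20):
        """Binary-search the style-mixing coefficient alpha in [0, 1].

        generator: callable mapping per-layer latents (1, num_layers, dim) to images
        classifier: black-box SUT returning softmax probabilities per image
        mixed_layers: indices of the style layers blended toward the target latent
        """
        src_label = classifier(generator(w_src)).argmax(dim=1).item()
        lo, hi = 0.0, 1.0
        w_mix, probs = w_src, None
        for _ in range(steps):
            alpha = (lo + hi) / 2.0
            w_mix = w_src.clone()
            # Blend only the selected style layers; content layers stay untouched.
            w_mix[:, mixed_layers] = (1 - alpha) * w_src[:, mixed_layers] \
                                     + alpha * w_tgt[:, mixed_layers]
            probs = classifier(generator(w_mix))
            if probs.argmax(dim=1).item() == src_label:
                lo = alpha   # still classified as the source class; push further
            else:
                hi = alpha   # prediction flipped; back off toward the boundary
        return w_mix, probs

In this sketch, the classifier's output probabilities play the role of the SUT feedback mentioned in the abstract: the coefficient converges to a point where a small change in the mixed style features flips the predicted label, i.e., an input near the decision boundary.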