HyperNet-Adaptation for Diffusion-Based Test Case Generation

1Technical University of Munich, 2fortiss GmbH, 3University of Udine
Teaser image showing HyNeA applied to different tasks.

Inputs generated by HyNeA that induce failures in different SUTs (systems under test)
(Multi-Class Classification, Multi-Class Binary Classification, Object Detection)

Abstract

The increasing deployment of deep learning systems requires systematic evaluation of their reliability in real-world scenarios. Traditional gradient-based adversarial attacks introduce small perturbations that rarely correspond to realistic failures and mainly assess robustness rather than functional behavior. Generative test generation methods offer an alternative but are often limited to simple datasets or constrained input domains. Although diffusion models enable high-fidelity image synthesis, their computational cost and limited controllability restrict their applicability to large-scale testing. We present HyNeA, a generative testing method that enables direct and efficient control over diffusion-based generation. HyNeA provides dataset-free controllability through hypernetworks, allowing targeted manipulation of the generative process without relying on architecture-specific conditioning mechanisms or dataset-driven adaptations such as fine-tuning. HyNeA employs a distinct training strategy that supports instance-level tuning to identify failure-inducing test cases without requiring datasets that explicitly contain examples of similar failures. This approach enables the targeted generation of realistic failure cases at substantially lower computational cost than search-based methods. Experimental results show that HyNeA improves controllability and test diversity compared to existing generative test generators and generalizes to domains where failure-labeled training data is unavailable.

Testing vision models with realistic failures

Many vision testing methods create failures that look unrealistic to humans. Traditional adversarial attacks rely on tiny pixel changes, while newer generative methods aim for realism but often introduce visible artifacts or unintended changes. This limits their usefulness for understanding how and why models fail in practice. Qualitative comparisons show that HyNeA produces more realistic failure-inducing images than prior generative approaches such as GIFTBench and Mimicry. GIFTBench often distorts the overall structure of an image during its search process, making results harder to interpret. Mimicry can generate diverse outputs, but these frequently change multiple aspects of an image at once, leading to ambiguous failures. In contrast, HyNeA preserves the original image structure and semantics while making focused changes that reliably trigger model errors. Because the generated images remain visually coherent and easy to understand, the resulting failures are more informative for debugging and analysis. This makes HyNeA better suited for functional testing, where realistic and interpretable failures are more valuable than synthetic or heavily distorted examples.

Figure 2

Comparison of Generated Origin - Target Pairs for Different Methods
(HyNeA, GIFTBench, Mimicry)

Instance-level control through hypernetwork adaptation

HyNeA enables targeted failure generation by directly optimizing the image generation process against the system under test. It builds on diffusion models for high image quality, but introduces a hypernetwork that controls generation at the instance level rather than relying on pre-trained conditioning data. For each test case, HyNeA generates an image using a frozen diffusion model and evaluates it with the system under test. The model’s output is used to compute an objective that captures both the desired failure behavior and visual fidelity. This objective is then used to update only the hypernetwork parameters for that specific instance. The diffusion model itself remains unchanged. This design allows HyNeA to apply precise, task-specific control while preserving realism. Because control is learned per instance, HyNeA does not require paired datasets or predefined control signals. The same mechanism can be used across different vision tasks, such as classification, attribute prediction, and object detection, by adjusting only the objective function.
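The adaptation loop described above can be sketched in a few lines. The toy below is an illustrative NumPy stand-in, not the authors' implementation: `frozen_generator` replaces the diffusion model, `sut_score` replaces the real system under test, and the hypernetwork is reduced to a single linear map `theta` whose output is injected as a control signal. All names and the fidelity weight `LAM` are assumptions made for illustration. The key structural points from the paragraph are preserved: the generator is never updated, the objective combines a failure term with a visual-fidelity term, and only the per-instance hypernetwork parameters receive gradient updates.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8                       # toy "image" dimensionality (illustrative)
z = rng.normal(size=D)      # fixed latent for this test case
origin = np.tanh(z)         # unmodified generation (zero control signal)

def frozen_generator(z, control):
    # Stand-in for the frozen diffusion model; its weights are never updated.
    return np.tanh(z + control)

W_sut = rng.normal(size=D)  # stand-in SUT: a fixed linear scorer
def sut_score(image):
    return float(W_sut @ image)

# "Hypernetwork": a tiny linear map from a fixed instance embedding to the
# control signal injected into the generator. Only theta is trained.
embed = rng.normal(size=D)
theta = np.zeros((D, D))

LAM = 0.1  # fidelity weight (hypothetical value)

def loss(theta):
    control = theta @ embed
    image = frozen_generator(z, control)
    # Failure term: drive the SUT score down (toward the intended failure).
    # Fidelity term: stay close to the original image to preserve realism.
    return sut_score(image) + LAM * np.sum((image - origin) ** 2)

def num_grad(f, x, eps=1e-4):
    # Finite-difference gradient; a real implementation would backpropagate
    # through the generator and SUT instead.
    g = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    for _ in it:
        i = it.multi_index
        xp = x.copy(); xp[i] += eps
        xm = x.copy(); xm[i] -= eps
        g[i] = (f(xp) - f(xm)) / (2 * eps)
    return g

initial = loss(theta)
for _ in range(50):                      # per-instance adaptation steps
    theta -= 0.05 * num_grad(loss, theta)
final = loss(theta)
```

Swapping tasks (classification, attribute prediction, object detection) would, as the text notes, only change the failure term inside `loss`; the loop itself is unchanged.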

Figure 1

Component Interaction in HyNeA

Executive summary

HyNeA is a diffusion-based test generation framework that produces realistic images optimized to expose functional weaknesses in vision systems. Its key contribution is per-instance hypernetwork adaptation, which enables targeted control of model behavior without paired control data or retraining the diffusion model. Experiments show that HyNeA consistently induces intended failures across multiple vision tasks while preserving visual quality and semantic structure. Compared to generative and search-based baselines, HyNeA provides stronger and more precise control over failure outcomes and requires fewer interactions with the system under test. Human evaluations confirm that HyNeA’s outputs are more realistic and easier to interpret. Overall, HyNeA demonstrates how diffusion models can be used as an active testing tool, enabling realistic, controlled, and efficient functional testing of vision models in settings that better reflect real-world behavior.

BibTeX

@article{weissl2026hypernet,
  title={HyperNet-Adaptation for Diffusion-Based Test Case Generation},
  author={Wei{\ss}l, Oliver and Riccio, Vincenzo and Kacianka, Severin and Stocco, Andrea},
  journal={arXiv preprint arXiv:2601.15041},
  year={2026}
}