PhD Thesis Defenses 2022
PhD thesis defenses are a public affair and open to anyone who is interested. Attending them is a great way to get to know the work going on by your peers in the various research groups. On this page you will find a list of upcoming and past defense talks.
Please go here for electronic access to most of the doctoral dissertations from Saarbrücken Computer Science going back to about 1990.
Dezember
Michael HEDDERICH
(Advisor: Prof. Dietrich Klakow)
Weak Supervision and Label Noise Handling for Natural Language Processing in Low-Resource Scenarios
Tuesday, 20.12.22, 11:00h
Building C7 4, conference room
Also per video conference
Abstract:
The lack of large amounts of labeled data is a significant factor blocking many low-resource languages and domains from catching up with recent advancements in natural language processing. To reduce this dependency on labeled instances, weak supervision (semi-)automatically annotates unlabeled data. These labels can be obtained more quickly and cheaply than manual, gold-standard annotations. They also, however, contain more errors. Handling these noisy labels is often required to leverage the weakly supervised data successfully.
In this dissertation, we study the whole weak supervision pipeline with a focus on the task of named entity recognition. We develop a tool for automatic annotation, and we propose an approach to model label noise when a small amount of clean data is available. We study the factors that influence the noise model’s quality from a theoretic perspective, and we validate this approach empirically on several different tasks and languages. An important aspect is the aim for a realistic evaluation. We perform our analysis, among others, on several African low-resource languages. We show the performance benefits that can be achieved using weak supervision and label noise modeling. But we also highlight open issues that the field still has to overcome. For the low-resource settings, we expand the analysis to few-shot learning. For classification errors, we present a novel approach to obtain interpretable insights of where classifiers fail.
Alexander KAMPMANN
(Advisor: Prof. Andreas Zeller)
Leveraging Input Features for Testing and Debugging
Monday, 19.12.22, 11:00h
Building E9 1, room 0.01
Also per video conference
Abstract:
When debugging software, it is paramount to understand why the program failed in the first place. How can we aid the developer in this process, and automate the process of understanding a program failure? The behavior of a program is determined by its inputs. A common step in debugging is to determine how the inputs influences the program behavior. In this thesis, I present Alhazen, a tool which combines a decision tree and a test generator into a feedback loop to create an explanation for which inputs lead to a specific program behavior. This can be used to generate a diagnosis for a bug automatically. Alhazen is evaluated for generating additional inputs, and recognizing bug-triggering inputs. The thesis makes first advances towards a user study on whether Alhazen helps software developers in debugging. Results show that Alhazen is effective for both use cases, and indicate that machine-learning approaches can help to understand program behavior. Considering Alhazen, the question what else can be done with input features follows naturally. In the thesis I present Basilisk, a tool which focuses testing on a specific part of the input. Basilisk is evaluated on its own, and an integration with Alhazen is explored. The results show promise, but also some challenges. All in all, the thesis shows that there is some potential in considering input features.
Tobias ALT-VEIT
(Advisor: Prof. Joachim Weickert)
Connecting Mathematical Models for Image Processing and Neural Networks
Thursday, 15.12.22, 16:15h
Per video conference
Abstract:
This thesis deals with the connections between mathematical models for image processing and deep learning. While data-driven deep learning models such as neural networks are flexible and well performing, they are often used as a black box. This makes it hard to provide theoretical model guarantees and scientific insights. On the other hand, more traditional, model-driven approaches such as diffusion, wavelet shrinkage, and variational models offer a rich set of mathematical foundations. Our goal is to transfer these founda-tions to neural networks. To this end, we pursue three strategies. First, we design trainable variants of traditional models and reduce their parameter set after training to obtain transparent and adaptive models. Moreover, we investigate the architectural design of numerical solvers for partial differential equations and translate them into building blocks of popular neural network architectures. This yields criteria for stable networks and inspires novel design concepts. Lastly, we present novel hybrid models for inpainting that rely on our theoretical findings. These strategies provide three ways for combining the best of the two worlds of model- and data-driven approaches. Our work contributes to the overarching goal of closing the gap between these worlds that still exists in performance and understanding.
Heiko BECKER
(Advisor: Dr. Eva Darulová, now Uppsala)
Verified Compilation and Optimization of Floating-Point Kernels
Wednesday, 14.12.22, 10:15h
Building E1 5, room 002
Also per video conference
Abstract:
When verifying safety-critical code on the level of source code, we trust the compiler to produce machine code that preserves the behavior of the source code. Trusting a verified compiler is easy. A rigorous machine-checked proof shows that the compiler correctly translates source code into machine code. Modern verified compilers (e.g. CompCert and CakeML) have rich input languages, but only rudimentary support for floating-point arithmetic. In fact, state-of-the-art verified compilers only implement and verify an inflexible one-to-one translation from floating-point source code to machine code. This translation completely ignores that floating-point arithmetic is actually a discrete representation of the continuous real numbers.
This thesis presents two extensions improving floating-point arithmetic in CakeML. First, the thesis demonstrates verified compilation of elementary functions to floating-point code in: Dandelion, an automatic verifier for polynomial approximations of elementary functions; and libmGen, a proof-producing compiler relating floating-point machine codeto the implemented real-numbered elementary function. Second, the thesis demonstrates verified optimization of floating-point code in: Icing, a floating-point language extending standard floating-point arithmetic with optimizations similar to those used by unverified compilers, like GCC and LLVM; and RealCake, an extension of CakeML with Icing into the first fully verified optimizing compiler for floating-point arithmetic.
Jana HOFMANN
(Advisor: Prof. Bernd Finkbeiner)
Logical Methods for the Hierarchy of Hyperlogics
Monday, 12.12.22, 16:00h
Building E1 1, room 407
Abstract:
In this thesis, we develop logical methods for reasoning about hyperproperties. Hyperproperties describe relations between multiple executions of a system. Unlike trace properties, hyperproperties comprise relational properties like noninterference, symmetry, and robustness. While trace properties have been studied extensively, hyperproperties form a relatively new concept that is far from fully understood. We study the expressiveness of various hyperlogics and develop algorithms for their satisfiability and synthesis problems.
In the first part, we explore the landscape of hyperlogics based on temporal logics, first-order and second-order logics, and logics with team semantics. We establish that first-order/second-order and temporal hyperlogics span a hierarchy of expressiveness, whereas team logics constitute a radically different way of specifying hyperproperties. Furthermore, we introduce the notion of temporal safety and liveness, from which we obtain fragments of HyperLTL (the most prominent hyperlogic) with a simpler satisfiability problem.
In the second part, we develop logics and algorithms for the synthesis of smart contracts. We introduce two extensions of temporal stream logic to express (hyper)properties of infinite-state systems. We study the realizability problem of these logics and define approximations of the problem in LTL and HyperLTL. Based on these approximations, we develop algorithms to construct smart contracts directly from their specifications.
Gerrit GROßMANN
(Advisor: Prof. Verena Wolf)
Stochastic Spreading on Complex Networks
Monday, 12.12.22, 10:00h
Building E1 5, room 029
Abstract:
Complex interacting systems are ubiquitous in nature and society. Computational modeling of these systems is of great relevance for science and engineering. Complex networks are common representations of these systems (e.g., friendship networks or road networks). Dynamical processes (e.g., virus spreading, traffic jams) that evolve on these networks are shaped and constrained by the underlying connectivity. This thesis provides numerical methods to study stochastic spreading processes on complex networks. We consider the processes as inherently probabilistic and analyze their behavior through the lens of probability theory. The contributions include methods for efficient simulations, controlling the spreading using interventions, and inferring the latent interaction graph.
Janis KALOFOLIAS
(Advisor: Prof. Jilles Vreeken)
Subgroup Discovery for Structured Target Concepts
Thursday, 07.12.22, 10:30h
Building C9 3, conference room
Abstract:
In this work we study subgroup discovery, a theoretical framework for finding subgroups in data —i.e., named sub-populations— whose behaviour with respect to a specified target concept is exceptional when compared to the rest of the dataset.This is a powerful tool that conveys crucial information to a human audience, but despite past advances has been limited to simple target concepts. In this work we propose algorithms that bring this framework to novel application domains. Namely, we show how to efficiently discover representative subgroups, that ensure ensure fairness of the sub-population with regard to a sensitive trait; for entities with additional pairwise relational information we introduce a novel measure of connectedness beyond established measures, and use it to discover named robustly connected sub-populations. Further, we introduce kernelised subgroup discovery, which generalises this task to entities of arbitrary structure and with the additional flexibility of only requiring a Gramian appropriate for the given entities. To use within kernelised subgroup discovery, but also on any other kind of kernel method, we additionally introduce a novel random walk graph kernel, which allows the fine tuning of the alignment between the vertices of the two compared graphs, during the count of the random walks, and propose a meaningful structure-aware vertex labels to utilise this new capability. With these contributions we thoroughly extend the applicability of subgroup discovery and ultimately re-define it as a kernel-based method.
Josenildo COSTA DA SILVA
(Advisor: PD Dr. Matthias Klusch)
Privacy-Preserving Distributed Data Mining
Wednesday, 07.12.22, 14:00h
Video conference
Abstract:
This thesis is concerned with privacy-preserving distributed data mining algorithms. The main challenges in this setting are inference attacks and the formation of collusion groups. The inference problem is the reconstruction of sensitive data by attackers from non-sensitive sources, such as intermediate results, exchanged messages, or public information. Moreover, in a distributed scenario, malicious insiders can organize collusion groups to deploy more effective inference attacks. This thesis shows that existing privacy measures do not adequately protect privacy against inference and collusion. Therefore, in this thesis, new measures based on information theory are developed to overcome the identified limitations. Furthermore, a new distributed data clustering algorithm is presented. The clustering approach is based on a kernel density estimates approximation that generates a controlled amount of ambiguity in the density estimates and provides privacy to original data. Besides, this thesis also introduces the first privacy-preserving algorithms for frequent pattern discovery in a distributed time series. Time series are transformed into a set of n-dimensional data points and finding frequent patterns reduces to finding local maxima in the n-dimensional density space. The proposed algorithms are linear in the size of the dataset with low communication costs, validated by experimental evaluation using different datasets.
Marcel STEINMETZ
(Advisor: Prof. Jörg Hoffmann)
Conflict-Driven Learning in AI Planning State-Space Search
Monday, 05.12.22, 17:00h
Building E1 1, Room 4.01
Abstract:
Many combinatorial computation problems in computer science can be cast as a reachability problem in an implicitly described, potentially huge, graph: the state space. State-space search is a versatile and widespread method to solve such reachability problems, but it requires some form of guidance to prevent exploring that combinatorial space exhaustively. Conflict-driven learning is an indispensab-le search ingredient for solving constraint satisfaction problems (most prominent-ly, Boolean satisfiability). It guides search towards solutions by identifying con-flicts during the search, i.e., search branches not leading to any solution, learning from them knowledge to avoid similar conflicts in the remainder of the search.
In this thesis, we adapt the conflict-driven learning methodology to goal-reachability objectives in classical and probabilistic planning. The canonical form of conflicts in this context are dead-end states, i.e., states from which the desired goal property cannot be reached. We pioneer methods for learning sound and generalizable dead-end knowledge from conflicts encountered during forward sta-te-space search. We develop search methods to identify conflicts in deterministic and probabilistic state spaces, and we develop suitable refinement methods for the different unsolvability detectors so to recognize these and similar dead-end states. Arranged in a depth-first search, our techniques approach the elegance of conflict-driven learning in constraint satisfaction, featuring the ability to learn to refute search subtrees, and intelligent backjumping to the root cause of a con-flict. On the side, we contribute and explore a large design space of probabilistic state-space search methods, establishing new search algorithms, admissible sta-te-space reduction techniques, and goal-probability bounds suitable for heuristic state-space search.
November
Jie HUANG
(Advisor: Prof. Michael Backes)
Mitigating Security and Privacy Threats from Untrusted Application Components on Android
Thursday, 28.11.22, 12:00h
The defense will take place via video conference. Please inquire about the link from the research group.
Abstract:
While Android’s data-intensive and open-source nature, combined with its less-than-strict market approval process, has allowed the installation of flawed and even mali-cious apps, its coarse-grained security model and update bottleneck in the app ecosystem make the platform’s privacy and security situation more worrying.
This dissertation introduces a line of works that mitigate privacy and security threats from untrusted app components. The first work presents a compiler-based library com-partmentalization solution that utilizes privilege separation to establish a strong trust-worthy boundary between the host app and untrusted lib components, thus protecting sensitive user data from being compromised by curious or malicious ad libraries. While for vulnerable third-party libraries, we then build the second work that implements an API-compatibility-based library update framework using drop-in replacements of outdated libraries to minimize the open vulnerability window caused by libraries and we perform multiple dynamic tests and case studies to investigate its feasibility. Our latest work fo-cuses on the misusing of powerful accessibility (a11y) features in untrusted apps. We pre-sent a privacy-enhanced a11y framework that treats the a11y logic as a pipeline com-posed of multiple modules running in different sandboxes. We further enforce flow con-trol over the communication between modules, thus reducing the attack surface from abusing a11y APIs while preserving the a11y benefits.
Patrick FERBER
(Advisors: Profs. Jörg Hoffmann, UdS, & Malte Hemert, Uni Basel)
Machine Learning for Classical Planning: Neural Network Heuristics, Online Portfolios, and State Space Topologies
Thursday, 17.11.22, 17:00h
The defense will take place in Basel. Attendance via video conference is also possible.
Abstract:
Many real world problems can be solved by state space search. We uses machine learning to improve the state of the art in planning in three directions. Informed heuristics are a key component of most state space searches. In the first part, we train neural networks to become well-informed heuristics for a group of similar problems. There exists no single best planning algorithm. Also our trained heuristics exhibit complementary strengths. A priori it is unknown which algorithm to use on a new problem. Thus, we present new state of the art techniques for algorithm selection for state space search. Even if we select the right algorithm, some problems are too difficult for all available algorithms. The search space of a problem together with a heuristic forms a topology. Exploiting this topology greatly improves search, but the topology can only be computed after the search terminated. We present a new method to generalize the topology from a simple problem to similar, but harder problems.
Maryam GHAREGHANI
(Advisor: Prof. Tobias Marschall, now Düsseldorf)
Single-cell strand sequencing for structural variant analysis and genome assembly
Wednesday, 02.11.22, 15:00h
Building E2 1, room 0.01
Abstract:
Rapid advances of DNA sequencing technologies and development of computational tools to analyze sequencing data has started a revolution in the field of genetics. DNA sequencing has applications in medical research, disease diagnosis and treatment, and population genetic studies. Different sequencing techniques have their own advantages and limitations, and they can be used together to solve genome assembly and genetic variant detection. The focus of this thesis is on a specific single-cell sequencing technology, called strand sequencing. With its chromosome and haplotype-specific strand information, this technique has very powerful signals for discovery of genomic structural variations, haplotype phasing, and chromosome clustering. We developed statistical and computational tools to exploit this information from strand sequencing technology. I first present a computational framework for detecting structural variations in single cells using strand sequencing data. These variations and genomic rearrangements have been observed in cancer, therefore the discovery of such events within cell populations can lead to a more accurate picture of cancer genomes and help in diagnosis. In the remainder of this defense, I elaborate on two computational pipelines for clustering long DNA sequences by their original chromosome and haplotype in the absence of a reference genome. These pipelines are developed to facilitate genome assembly and de novo haplotype phasing in a fast and accurate manner. The resulting haplotype assemblies can be useful in studying genomic variations with no reference bias, gaining insights in population genetics, and detection of compound heterozygosity.
October
André ZENNER
(Advisor: Prof. Antonio Krüger)
Advancing Proxy-Based Haptic Feedback in Virtual Reality
Friday, 28.10.22, 15:00h
Building D3 2, „Reuse“ seminar room
Abstract:
This thesis advances haptic feedback for Virtual Reality (VR). Our work is guided by Sutherland’s 1965 vision of the ultimate display, which calls for VR systems to control the existence of matter. To push towards this vision, we build upon proxy-based haptic feedback, a technique characterized by the use of passive tangible props. The goal of this thesis is to tackle the central drawback of this approach, namely, its inflexibility, which yet hinders it to fulfill the vision of the ultimate display. Guided by four research questions, we first show-case the applicability of proxy-based VR haptics by employing the technique for data exploration. We then extend the VR system’s control over users‘ haptic impressions in three steps. First, we contribute the class of Dynamic Passive Haptic Feedback (DPHF) alongside two novel concepts for conveying kinesthetic properties, like virtual weight and shape, through weight-shifting and drag-changing proxies. Conceptually orthogonal to this, we study how visual-haptic illusions can be leveraged to unnoticeably redirect the user’s hand when reaching towards props. Here, we contribute a novel perception-inspired algorithm for Body Warping-based Hand Redirection (HR), an open-source framework for HR, and psychophysical insights. The thesis concludes by proving that the combination of DPHF and HR can outperform the individual techniques in terms of the achievable flexibility of the proxy-based haptic feedback.
Vinh Thinh Ho
(Advisor: Prof. Gerhard Weikum)
Entities with Quantities: Extraction, Search, and Ranking
Friday, 18.10.22, 10:30h
Building E1 4, Room 0.24
Abstract:
Quantities are more than numeric values. They denote measures of the world’s entities such as heights of buildings, running times of athletes, energy efficiency of car models or energy production of power plants, all expressed in numbers with associated units. Entity-centric search and question answering (QA) are well supported by modern search engines. However, they do not work well when the queries involve quantity filters, such as searching for athletes who ran 200m under 20 seconds or companies with quarterly reve-nue above $2 Billion. State-of-the-art systems fail to understand the quantities, including the condition (less than, above, etc.), the unit of interest (seconds, dollar, etc.), and the context of the quantity (200m race, quarterly revenue, etc.). QA systems based on structured knowledge bases (KBs) also fail as quantities are poorly covered by state-of-the-art KBs. In this dissertation, we developed new methods to advance the state-of-the-art on quantity knowledge extraction and search.
Patrick SPEICHER
(Advisor: Prof. Michael Backes)
Simulated Penetration Testing and Mitigation Analysis
Tuesday, 18.10.22, 11:00h
Building E9 1, Room 0.01
Abstract:
As corporate networks and Internet services are becoming increasingly more complex, it is hard to keep an overview over all deployed software, their potential vulnerabilities, and all existing security protocols. Simulated penetration testing was proposed to extend regular penetration testing by transferring gathered information about a network into a formal model and simulate an attacker in this model. Having a formal model of a network enables us to add a defender trying to mitigate the capabilities of the attacker with their own actions. We name this two-player planning task Stackelberg planning. The goal behind this is to help administrators, penetration testing consultants, and the management level at finding weak spots of large computer infrastructure and suggesting cost-effective mitigations to lower the security risk.
In this thesis, we first lay the formal and algorithmic foundations for Stackelberg planning tasks. By building it in a classical planning framework, we can benefit from well-studied heuristics, pruning techniques, and other approaches to speed up the search, for example symbolic search. We then design a theory for privilege escalation and demonstrate the applicability of our framework to local computer networks and to Internet-wide scenarios by investigating the robustness of both the email infrastructure and the web.
Attila KINALI-DOGAN
(Advisor: Dr. Christoph Lenzen)
On Time, Time Synchronization and Noise in Time Measurement Systems
Wednesday, 12.10.22, 10:00h
Building E1 4, Room 0.24
Abstract:
Time plays an important role in our modern lives. Having accurate time means having clocks synchronized to other clocks with as little time difference between them. We will look into the properties of noise in time keeping and measurement, introducing a variant of the Fourier Transform for power limited signals. We will then continue to noise propagation in electronics and the degradation of signal-to-noise ratio due to non-linearities. And finally we will look into the effect of metastabil-ity in distributed clock synchronization systems.
Ahmed SALEM
(Advisor: Prof. Michael Backes)
Adversarial Inference and Manipulation of Machine Learning Models
Tuesday, 04.10.22, 11:00h
Abstract:
Machine learning (ML) has established itself as a core component for various critical applications. However, with this increasing adoption rate of ML models, multiple attacks have emerged targeting different stages of the ML pipeline. Abstractly, the ML pipeline is divided into three phases, including training, updating, and inference.
In this thesis, we evaluate the privacy, security, and accountability risks of the three stages of the ML pipeline. Firstly, we explore the inference phase, where the adversary can only access the target model after deployment. In this setting, we explore one of the most severe attacks against ML models, namely the membership inference attack (MIA). We relax all the MIA’s key assumptions, thereby showing that such attacks are broadly applicable at low cost and thereby pose a more severe risk than previously thought. Secondly, we study the updating phase. To that end, we propose a new attack surface against ML models, i.e., the change in the output of an ML model before and after being updated. We then introduce four attacks, including data reconstruction ones, against this setting. Thirdly, we explore the training phase, where the adversary interferes with the target model’s training. In this setting, we propose the model hijacking attack, in which the adversary can hijack the target model to provide their own illegal task. Finally, we propose different defense mechanisms to mitigate such identified risks.
September
Jonas SCHNEIDER-BENSCH
(Advisor: Prof. Michael Backes)
New approaches to Privacy Preserving Signatures
Monday, 26.09.22, 16:30h
Abstract:
In this thesis we advance the theory and practice of privacy preserving digital signatures. Privacy preserving signatures such as group and ring signatures enable signers to hide in groups of potential signers. We design a cryptographic primitive called signatures with flexible public keys, which allows for modular construction of privacy preserving signatures. Its core is an equivalence relation between verification keys, such that key repre-sentatives can be transformed in their class to obscures their origin. The resulting con-structions are more efficient than the state of the art, under the same or weaker assump-tions. We show an extension of the security model of fully dynamic group signatures, which are those where members may join and leave the group over time. Our contribu-tion here, which is facilitated by the new primitive, is the treatment of membership status as potentially sensitive information. In the theory of ring signatures, we show a construc-tion of ring signatures which is the first in the literature with logarithmic signature size in the size of the ring without any trusted setup or reliance on non-standard assumptions. We show how to extend our techniques to the derived setting of linkable ring signatures, where different signatures of the same origin may be publicly linked. Here, we further revisit the notion of linkable anonymity, offering a significant strengthening compared to previous definitions.
Michael BACKENKÖHLER
(Advisor: Prof. Verena Wolf)
Analysis of Markovian Population Models
Tuesday, 20.09.22, 14;00h
Building E1 7, room 001
Abstract:
Markovian population models are a powerful paradigm to describe processes of stochastically interacting agents. Their dynamics is given by a continuous-time Markov chains over the population sizes. Such large state-spaces make their analysis challenging.
In this thesis we develop methods for this problem class leveraging their structure. We derive linear moment constraints on the expected occupation measure and exit probabilities. In combination with semi-definite constraints on moment matrices, we obtain a convex program. This way, we are able to provide bounds on mean first-passage times and reaching probabilities. We further use these linear constraints as control variates to improve Monte Carlo estimation of different quantities. Two different algorithm for the construction of efficient variate sets are presented and evaluated.
Another set of contributions is based on a state-space lumping scheme that aggregates states in a grid structure. Based on the probabilities of these approximations we iteratively refine relevant and truncate irrelevant parts of the state-space. This way, the algorithm learns a well-justified finite-state projection for different scenarios.
August
Jozef Hladký
(Advisor: Prof. Hans-Peter Seidel)
Latency Hiding through High Fidelity Novel View Synthesis on Thin Clients using Decoupled Streaming Rendering from Powerful Servers
Tuesday, 16.08.2022, 15:00 h
Building E1 4, Room 0.24
Abstract:
Highly responsive 3D applications with state-of-the-art visual fidelity have always been associated with heavy immobile workstation hardware. By offloading demanding computations to powerful servers in the cloud, streaming 3D content from the data center to a thin client can deliver high fidelity responsive experience that is indistinguishable from the content computed locally on a powerful workstation. We introduce methods suitable for this scenario that enable network latency hiding.
In the first part, we introduce a novel high-dimensional space—the camera offset space—and show how it can help to identify an analytical potentially visible set of geometry valid for a range of camera translational and rotational offsets. We demonstrate an efficient parallel implementation of the visibility resolution algorithm which leads to a first-ever method for computing a PVS that is valid for an analytical range of camera offsets, is computable in real-time without the need of pre-processing or spatial data structure construction and requires only raw triangle stream as an input.
In the second part of the thesis, we focus on capturing the scene appearance into structures that enable efficient encoding and decoding, transmission, low memory footprint, and high-fidelity high-framerate reconstruction on the client. Multiple strategies for shading sample distribution and texture atlas packing layouts are presented and analyzed for shading reconstruction quality, packing and compression efficiency.
The third part of the thesis presents a data structure that jointly encodes both appearance and geometry into a texture atlas.
The scene G-Buffer is processed to construct coarse low-resolution geometric proxies which capture the scene appearance and simple planar surfaces. These proxies can be locally augmented with high resolution data to capture complex geometry in sufficient detail, achieving efficient sample distribution and allocation. Capturing the scene from multiple views enables disocclusion support and allows network latency hiding on a thin client device.
July
Jonas FISCHER
(Advisor: Prof. Jilles Vreeken)
More than the sum of its parts – pattern mining, neural networks and how they complement each other
Thursday, 28.07.2022, 13:00 h
Building E1 4, Room 0.24
Abstract:
In this thesis we explore pattern mining and deep learning. Often seen as or-thogonal, we show that these fields complement each other and propose to combine them to gain from each other’s strengths. We, first, show how to ef-ficiently discover succinct and non-redundant sets of patterns that provide insight into data beyond conjunctive statements. We leverage the interpreta-bility of such patterns to unveil how and which information flows through neural networks, as well as what characterizes their decisions. Conversely, we show how to combine continuous optimization with pattern discovery, pro-posing a neural network that directly encodes discrete patterns, which allows us to apply pattern mining at a scale orders of magnitude larger than previ-ously possible. Large neural networks are, however, exceedingly expensive to train for which ‘lottery tickets’ – small, well-trainable sub-networks in ran-domly initialized neural networks – offer a remedy. We identify theoretical limitations of strong tickets and overcome them by equipping these tickets with the property of universal approximation. To analyze whether limitations in ticket sparsity are algorithmic or fundamental, we propose a framework to plant and hide lottery tickets. With novel ticket benchmarks we then conclude that the limitation is likely algorithmic, encouraging further developments for which our framework offers means to measure progress.
David STUTZ
(Advisors: Prof. Bernt Schiele and Prof. Matthias Hein, now Tübingen)
Understanding and Improving Robustness and Uncertainty Estimation in Deep Learning
Friday, 22.07.2022, 14:00 h
Building E1 4, Room 0.24 (and via video conference: https://zoom.us/j/92313014265?pwd=UUxaVXcvVEZVaUhucEt5dUlGVXVBdz09)
Abstract:
Deep learning is becoming increasingly relevant for many high-stakes applications such as medical diagnosis or autonomous driving where wrong decisions can have massive impact on human lives. Unfortunately, deep neural networks are typically assessed solely based on generalization, e.g., accuracy on a fixed test set. However, this is clearly insufficient for safe deployment as potential malicious actors and distribution shifts or the effects of quantization and unreliable hardware are disregarded. In such settings, it is also important to obtain reasonable estimates of the model’s confidence alongside its predictions.
This thesis studies robustness and uncertainty estimation in deep learning along three main directions: First, we consider so-called adversarial examples, slightly perturbed inputs causing severe drops in accuracy. Second, we study weight perturbations, focusing particularly on bit errors in quantized weights. This is relevant for deploying models on special-purpose hardware for efficient inference, so-called accelerators. Finally, we address uncertainty estimation to improve robustness and provide meaningful statistical performance guarantees for safe deployment.
Lukas LANGE
(Advisor: Prof. Dietrich Klakow)
Robust Input Representations for Low-Resource Information Extraction
Friday, 22.07.2022, 10:00 h
Building B3 1, Auditorium 0.11
Abstract:
Recent advances in the field of natural language processing were achieved with deep learning models. This led to a wide range of new research questions concerning the stabil-ity of such large-scale systems and their applicability beyond well-studied tasks and da-tasets, such as information extraction in non-standard domains and languages, in particu-lar, in low-resource environments.
In this work, we address these challenges and make important contributions across fields such as representation learning and transfer learning by proposing novel model architec-tures and training strategies to overcome existing limitations, including a lack of training resources, domain mismatches and language barriers.
In particular, we propose solutions to close the domain gap between representation mod-els. We explore domain-adaptive pre-training of multilingual languages models. Further-more, we present a novel meta-embedding architecture that creates joint representa-tions of multiple embedding methods.
Our broad set of experiments demonstrates state-of-the-art performance of our methods for various sequence tagging and classification tasks and highlight their robustness in chal-lenging low-resource settings across languages and domains.
Adwait SHARMA
(Advisor: Prof. Jürgen Steimle)
Design and Recognition of Microgestures for Always-Available Input
Thursday, 21.07.2022, 15:00 h
The defense will take place via video conference: https://cs-uni-saarland-de.zoom.us/j/82822036987
Abstract:
Gestural user interfaces for computing devices most commonly require the user to have at least one hand free to interact with the device, for example, moving a mouse, touching a screen, or performing mid-air gestures. Consequently, users find it difficult to operate compu-ting devices while holding or manipulating everyday objects. This limits the users from interac-ting with the digital world during a significant portion of their everyday activities, such as, using tools in the kitchen or workshop, carrying items, or workout with sports equipment.
This thesis pushes the boundaries towards the bigger goal of enabling always-available input. Microgestures have been recognized for their potential to facilitate direct and subtle interac-tions. However, it remains an open question how to interact using gestures with computing de-vices when both of the user’s hands are occupied holding everyday objects. We take a holistic approach and focus on three core contributions: i) To understand end-users preferences, we present an empirical analysis of users’ choice of microgestures when holding objects of diverse geometries. Instead of designing a gesture set for a specific object or geometry and to identify gestures that generalize, this thesis leverages the taxonomy of grasp types established from prior research. ii) We tackle the critical problem of avoiding false activation by introducing a novel gestural input concept that leverages a single-finger movement, which stands out from everyday finger motions during holding and manipulating objects. Through a data-driven ap-proach, we also systematically validate the concept’s robustness with different everyday ac-tions. iii) While full sensor coverage on the user’s hand would allow detailed hand-object inter-action, minimal instrumentation is desirable for real-world use. This thesis addresses the prob-lem of identifying sparse sensor layouts. We present the first rapid computational method, a-long with a GUI-based design tool that enables iterative design based on the designer’s high-level requirements. Furthermore, we demonstrate that minimal form-factor devices, like smart rings, can be used to effectively detect microgestures in hands-free and busy scenarios.
Overall, the presented findings will serve as both conceptual and technical foundations for enabling interaction with computing devices wherever and whenever users need them.
June
Andrea HORNAKOVA
(Advisor: Dr. Paul Swoboda)
Lifted Edges as Connectivity Priors for Multicut and Disjoint Paths
Wednesday, 29.06.2022, 14:00 h
Building E1 5, Room 0.29
Abstract:
We study two graph decompositions problems and their representation by 0/1 labeling of edges. The first is multicut (MC) which represents decompositions of undirected graphs (clustering of nodes into connected components). The second is disjoint paths (DP) in di-rected acyclic graphs where the clusters correspond to node-disjoint paths. Our main in-terest is to study connectivity priors represented by so-called lifted edges in the two prob-lems. We call the resulting problems lifted multicut (LMC) and lifted disjoint paths (LDP). Our study of lifted multicut concentrates on partial LMC represented by labeling of a sub-set of (lifted) edges. Given partial labeling, some NP-hard problems arise.
The main focus of the talk is LDP problem. We prove that this problem is NP-hard and propose an optimal integer linear programming (ILP) solver. The solver uses linear ine-qualities that produce a high-quality LP relaxation. LDP is a convenient model for multiple object tracking (MOT) because DP naturally leads to trajectories of objects and lifted ed-ges help to prevent id switches and re-identify persons. Our tracker using the optimal LDP solver was a leading tracker on three benchmarks of the MOT challenge MOT15/16/17, improving significantly over state-of-the-art at the time of its publication. In order to sol-ve even larger instances of a challenging dataset MOT20, we introduce an approximate LDP solver based on Lagrange decomposition. The new tracker achieved on all the four standard MOT benchmarks performance comparable or better than state-of-the-art me-thods (at the time of publication).
Michaela KLAUCK
(Advisor: Prof. Holger Hermanns)
On the Connection of Probabilistic Model Checking, Planning, and Learning for System Verification
Tuesday, 28.06.2022, 11:30 h
Building E1 7, Room 0.01
Abstract:
This thesis presents approaches using techniques from the model checking, planning, and learning community to make systems more reliable and perspicuous. First, two heuristic search and dynamic programming algorithms are adapted to be able to check extremal reachability probabilities, expected accumulated rewards, and their bounded versions, on general Markov decision processes (MDPs). Thereby, the problem space originally solvable by these algorithms is enlarged considerably. In a comprehensive case study it is shown that the implementation, called Modysh, is competitive with state-of-the-art model checkers and even outperforms them on very large state spaces. Second, Deep Statistical Model Checking (DSMC) is introduced, usable for quality assessment and learning pipeline analysis of systems incorporating trained decision-making agents, like neural networks (NNs). The idea of DSMC is to use statistical model checking to assess NNs resolving nondeterminism in systems modeled as MDPs. In a comprehensive scalability study it is demonstrated that DSMC is a lightweight technique tackling the complexity of NN analysis in combination with the state space explosion problem.
Gilles NIES
(Advisor: Prof. Holger Hermanns)
Mastering Satellite Operation: On Model-based and Data-driven Optimal Battery-Aware Scheduling
Wednesday, 22.06.2022, 17:00 h
Building E1 7, Room 0.01
Abstract:
Rechargeable batteries as a power source have become omnipresent. Especi-ally the lithium-ion variant powers virtually every contemporary portable device, it constitutes the central energy source in the domains of e-mobility, autonomous drones and robots, most earth-orbiting spacecraft like satellites and many more.
In this work, we take the perspective of the kinetic battery model, an intuitive analytic model, that keeps track of a battery’s charge as it is strained with a sequence of loads over time. We provide the mathematical foundation of the battery model, extend it with capacity limits and realistic charging behavior, as well as uncertainty in its initial state and the load it is strained with. We derive efficient energy budget analysis algorithms in the form of discre-tization and analytical bisection schemes, discuss their efficiency, empirically analyze their performance and identify their individual strengths and weaknesses. We show how our techniques can be used during design time and in-flight, in the context of the two nanosatellite missions GomX-1 and GomX-3.
Ben WIEDERHAKE
(Advisor: Dr. Christoph Lenzen)
Pulse Propagation, Graph Cover, and Packet Forwarding
Tuesday, 14.06.2022, 10:00 h
Building E1 4, Room 0.24
Abstract:
We study distributed systems, with a particular focus on graph problems and fault tolerance.
Fault-tolerance in a microprocessor or even System-on-Chip can be improved by using a fault-tolerant pulse propagation design. The existing design TRIX achieves this goal by being a distributed system consisting of very simple nodes. We show that even in the typical mode of operation without faults, TRIX performs significantly better than a regular wire or clock tree: Statistical evaluation of our simulated experiments show that we achieve a skew with standard deviation of O(log log H), where H is the height of the TRIX grid.
The distance-r generalization of classic graph problems can give us insights on how distance affects hardness of a problem. For the distance-r dominating set problem, we present both an algorithmic upper and unconditional lower bound for any graph class with certain high-girth and sparseness criteria. In particular, our algorithm achieves a O(r · f(r))-approximation in time O(r), where f is the expansion function, which correlates with density. For constant r, this implies a constant approximation factor, in constant time. We also show that no algorithm can achieve a (2r + 1 − δ)-approximation for any δ > 0 in time O(r), not even on the class of cycles of girth at least 5r. Furthermore, we ex-tend the algorithm to related graph cover problems and even to a different execution model.
Furthermore, we investigate the problem of packet forwarding, which address-es the question of how and when best to forward packets in a distributed sys-tem. These packets are injected by an adversary. We build on the existing algo-rithm OED to handle more than a single destination. In particular, we show that buffers of size O(log n) are sufficient for this algorithm, in contrast to O(n) for the naive approach.
Maximilian SCHWENGER
(Advisor: Prof. Bernd Finkbeiner)
Statically Analyzed Stream Monitoring for Cyber-Physical Systems
Monday, 13.06.2022, 16:00 h
Building E9 1, Room 0.01
Abstract:
Cyber-physical systems are digital systems interacting with the physical world. Even though this induces an inherent complexity, they are responsible for safety-critical tasks like governing nuclear power plants or controlling autonomous vehicles. To preserve trust into the safety of such systems, this thesis presents a runtime verification approach designed to generate trustworthy monitors from a formal specification. These monitors are responsible for observing the cyber-physical system during runtime and ensuring its safety. As underlying language, I present the asynchronous real-time specification language RTLola. It contains primitives for arithmetic properties and grants precise control over the timing of the monitor. With this, it enables specifiers to express properties relevant to cyber-physical systems. The thesis further presents a static analysis that identifies inconsistencies in the specification and provides insights into the dynamic behavior of the monitor. As a result, the resource consumption of the monitor becomes predictable. The generation of the monitor produces either a hardware description synthesizable onto programmable hardware, or Rust code with verification annotation. These annotations allow for proving the correctness of the monitor with respect to the semantics of RTLola. Last, I present the construction of a conservative hybrid model of the underlying system using information extracted from the specification. This model enables further verification steps.
Anna GUIMARAES
(Advisor: Prof. Gerhard Weikum)
Data Science Methods for the Analysis of Controversial Social Media Discussions
Friday, 10.06.2022, 13:00 h
Abstract:
Social media communities like Reddit and Twitter allow users to express their views on topics of their interest, and to engage with other users who may share or oppose these views. This can lead to productive discussions towards a consensus, or to contended debates, where disagreements frequently arise. Prior work on such settings has primarily focused on identifying notable instances of antisocial behavior such as hate-speech and „trolling“, which represent possible threats to the health of a community. These, however, are exceptionally severe phenomena, and do not encompass controversies stemming from user debates, differences of opinions, and off-topic content, all of which can naturally come up in a discussion without going so far as to compromise its development.
This dissertation proposes a framework for the systematic analysis of social media discussions that take place in the presence of controversial themes, disagreements, and mixed opinions from participating users. We start by building a feature model to characterize adversarial discussions surrounding political campaigns on Twitter, with a focus on the factual and sentimental nature of their topics and the role played by different users involved. We then extend our approach to Reddit discussions, leveraging community feedback signals to define a new notion of controversy and to highlight conversational archetypes that arise from frequent and interesting interaction patterns. We use our feature model to build logistic regression classifiers that can predict future instances of controversy in Reddit communities centered on politics, world news, sports, and personal relationships. Finally, our model also provides the basis for a comparison of different communities in the health domain, where topics and activity vary considerably despite their shared overall focus.
May
Preethi LAHOTI
(Advisor: Prof. Gerhard Weikum)
Operationalizing Fairness for Responsible Machine Learning
Friday, 20.05.2022, 16:15 h
Abstract:
As machine learning (ML) is increasingly used for decision making in scenarios that impact humans, there is a growing awareness of its potential for unfairness. A large body of recent work has focused on proposing formal notions of fairness in ML, as well as approaches to mitigate unfairness. However, there is a disconnect between the ML fairness literature and the needs to operationalize fairness in practice. This dissertation brings ML fairness closer to real-world applications by developing new models and methods that address key challenges in operationalizing fairness in practice.
First, we tackle a key assumption in the group fairness literature that sensitive demographic attributes such as race and gender are known upfront, and can be readily used in model training to mitigate unfairness. In practice, factors like privacy and regulation often prohibit ML models from collecting or using protected attributes in decision making. To address this challenge we introduce the novel notion of computationally-identifiable errors and propose Adversarially Reweighted Learning (ARL), an optimization method that seeks to improve the worst-case performance over unobserved groups, without requiring access to the protected attributes in the dataset.
Second, we argue that while group fairness notions are a desirable fairness criterion, they are fundamentally limited as they reduce fairness to an average statistic over pre-identified protected groups. In practice, automated decisions are made at an individual level, and can adversely impact individual people irrespective of the group statistic. We advance the paradigm of individual fairness by proposing iFair (individually fair representations), an optimization approach for learning a low dimensional latent representation of the data with two goals: to encode the data as well as possible, while removing any information about protected attributes in the transformed representation.
Third, we advance the individual fairness paradigm, which requires that similar individuals receive similar outcomes. However, similarity metrics computed over observed feature space can be brittle, and inherently limited in their ability to accurately capture similarity between individuals. To address this, we introduce a novel notion of fairness graphs to capture nuanced expert knowledge on which individuals should be treated similarly with respect to the ML objective. We cast the problem of individual fairness into graph embedding, and propose PFR (pairwise fair representations), a method to learn a unified pairwise fair representation of the data.
Fourth, we tackle the challenge that production data after model deployment is constantly evolving. As a consequence, in spite of the best efforts in training a fair model, ML systems can be prone to failure risks due to a variety of unforeseen failure risks. We propose Risk Advisor, a model-agnostic meta-learner to predict potential failure risks and to give guidance on the sources of uncertainty inducing the risks, by leveraging information theoretic notions of aleatoric and epistemic uncertainty.
Extensive experiments on a variety of real-world and synthetic datasets show that our proposed methods are viable in practice.
Lassaad CHEIKHROUHOU
(Advisor: PD Dr. Werner Stephan)
Inductive Verification of Cryptographic Protocols Based on Message Algebras – Trace and Indistinguishability Properties
Thursday, 19.05.2022, 16:30 h
Building E1 4, room 0.24
Abstract:
Seit 1981 wurden zahlreiche formale Methoden zur Analyse kryptographischer Protokolle entwickelt und erfolgreich angewendet. Trotz vieler Verbesserungen, beschränkt sich der Anwendungsbereich gerade induktiver Verfahren auf das einfache enc-dec Szenario (Entschlüsseln hebt Verschlüsseln ab) und auf Standardeigenschaften (Vertraulichkeit und Authentifizierung).
In dieser Arbeit erweitern wir den Anwendungsbereich der werkzeugunterstützten induktiven Methode auf Protokolle mit algebraisch spezifizierten kryptografischen Primitiven und auf Ununterscheidbarkeitseigenschaften wie die Resistenz gegen Offline-Testen.
Eine Axiomatisierung von Nachrichtenstrukturen, abgeleitet aus einem konstruktiven Modell (Termersetzung), liefert die Basis für die Definition rekursiver Funktionen und induktives Schließen (partielle Ordnungen, Fallunterscheidungen). Eine neue Beweistechnik
für Vertraulichkeitseigenschaften verwendet rekursive Testfunktionen, die beweisbar korrekt bzgl. eines induktiv definierten Angreifermodells sind. Die Formalisierung von Ununterscheidbarkeitseigenschaften durch generische Ableitungen und ein zentrales Theorem erlauben eine Reduktion auf Trace-Eigenschaften.
Die allgemeinen Aspekte unserer Techniken werden zusammen mit zwei vollständig ausgearbeiteten realen Fallstudien, PACE und TC-AMP, diskutiert, die für den deutschen Personalausweis entwickelt wurden. TC-AMP gehört sicher zu den komplexesten algebraisch
spezifizierten Protokollen, die formal verifiziert wurden. Insbesondere, sind uns keine Ansätze bekannt, die vergleichbare Fälle behandeln.
André NUSSER
(Advisor: Prof. Karl Bringmann)
Fine-Grained Complexity and Algorithm Engineering of Geometric Similarity Measures
Friday, 13.05.2022, 16:00 h
Abstract:
Point sets and sequences are fundamental geometric objects that arise in any application that considers movement data, geometric shapes, and many more. A crucial task on these objects is to measure their similarity.
Therefore, this thesis presents results on algorithms, complexity lower bounds, and algorithm engineering of the most important point set and sequence similarity measures like the Fréchet distance, the Fréchet distance under translation, and the Hausdorff distance under translation. As an extension to the mere computation of similarity, also the approximate near neighbor problem for the continuous Fréchet distance on time series is considered and matching upper and lower bounds are shown.
Nico HERBIG
(Advisor: Prof. Antonio Krüger)
Multi-Modal Post-Editing of Machine Translation
Thursday, 12.05.2022, 10:00 h
Abstract:
As MT quality continues to improve, more and more translators switch from traditional translation from scratch to PE of MT output, which has been shown to save time and reduce errors. Instead of mainly generating text, translators are now asked to correct errors within otherwise helpful translation proposals, where repetitive MT errors make the process tiresome, while hard-to-spot errors make PE a cognitively demanding activity. Our contribution is three-fold: first, we explore whether interaction modalities other than mouse and keyboard could well support PE by creating and testing the MMPE translation environment. MMPE allows translators to cross out or hand-write text, drag and drop words for reordering, use spoken commands or hand gestures to manipulate text, or to combine any of these input modalities. Second, our interviews revealed that translators see value in automatically receiving additional translation support when a high CL is detected during PE. We therefore developed a sensor framework using a wide range of physiological and behavioral data to estimate perceived CL and tested it in three studies, showing that multi-modal, eye, heart, and skin measures can be used to make translation environments cognition-aware. Third, we present two multi-encoder Transformer architectures for APE and discuss how these can adapt MT output to a domain and thereby avoid correcting repetitive MT errors.
Maximilian ALTMEYER
(Advisor: Prof. Antonio Krüger)
Understanding Personal and Contextual Factors to Increase Motivation in Gamified Systems
Tuesday, 03.05.2022, 15:00 h
Abstract:
Gamification, the use of game elements in non-game contexts, has been shown to help people reaching their goals, affect people’s behavior and enhance the users‘ experience within interactive systems. However, past research has shown that gamification is not always successful. In fact, literature reviews revealed that almost half of the interventions were only partially successful or even unsuccessful. Therefore, understanding the factors that have an influence on psychological measures and behavioral outcomes of gamified systems is much in need. In this thesis, we contribute to this by considering the context in which gamified systems are applied and by understanding personal factors of users interacting with the system. Guided by Self-Determination Theory, a major theory on human motivation, we investigate gamification and its effects on motivation and behavior in behavior change contexts, provide insights on contextual factors, contribute knowledge on the effect of personal factors on both the perception and effectiveness of gamification elements and lay out ways of utilizing this knowledge to implement personalized gamified systems. Our contribution is manifold: We show that gamification affects motivation through need satisfaction and by evoking positive affective experiences, ultimately leading to changes in people’s behavior. Moreover, we show that age, the intention to change behavior, and Hexad user types play an important role in explaining interpersonal differences in the perception of gamification elements and that tailoring gamified systems based on these personal factors has beneficial effects on both psychological and behavioral outcomes. Lastly, we show that Hexad user types can be partially predicted by smartphone data and interaction behavior in gamified systems and that they can be assessed in a gameful way, allowing to utilize our findings in gamification practice. Finally, we propose a conceptual framework to increase motivation in gamified systems, which builds upon our findings and outlines the importance of considering both contextual and personal factors. Based on these contributions, this thesis advances the field of gamification by contributing knowledge to the open questions of how and why gamification works and which factors play a role in this regard.
April
Cuong Xuan CHU
(Advisor: Prof. Gerhard Weikum)
Knowledge Extraction from Fictional Texts
Monday, 25.04.2022, 10:00 h
Abstract:
Knowledge extraction from text is a key task in natural language processing, which involves many sub-tasks, such as taxonomy induction, named entity recognition and typing, relation extraction, knowledge canonicalization and so on. By constructing structured knowledge from natural language text, knowledge extraction becomes a key asset for search engines, question answering and other downstream applications. However, current knowledge extraction methods mostly focus on prominent real-world entities with Wikipedia and mainstream news articles as sources. The constructed knowledge bases, therefore, lack information about long-tail domains, with fiction and fantasy as archetypes. Fiction and fantasy are core parts of our human culture, spanning from literature to movies, TV series, comics and video games. With thousands of fictional universes which have been created, knowledge from fictional domains are subject of search-engine queries – by fans as well as cultural analysts. Unlike the real-world domain, knowledge extraction on such specific domains like fiction and fantasy has to tackle several key challenges:
– Training data: Sources for fictional domains mostly come from books and fan-built content, which is sparse and noisy, and contains difficult structures of texts, such as dialogues and quotes. Training data for key tasks such as taxonomy induction, named entity typing or relation extraction are also not available.
– Domain characteristics and diversity: Fictional universes can be highly sophisticated, containing entities, social structures and sometimes languages that are completely different from the real world. State-of-the-art methods for knowledge extraction make assumptions on entity-class, subclass and entity-entity relations that are often invalid for fictional domains. With different genres of fictional domains, another requirement is to transfer models across domains.
– Long fictional texts: While state-of-the-art models have limitations on the input sequence length, it is essential to develop methods that are able to deal with very long texts (e.g. entire books), to capture multiple contexts and leverage widely spread cues.
This dissertation addresses the above challenges, by developing new methodologies that advance the state of the art on knowledge extraction in fictional domains.
– The first contribution is a method, called TiFi, for constructing type systems (taxonomy induction) for fictional domains. By tapping noisy fan-built content from online communities such as Wikia, TiFi induces taxonomies through three main steps: category cleaning, edge cleaning and top-level construction. Exploiting a variety of features from the original input, TiFi is able to construct taxonomies for a diverse range of fictional domains with high precision.
– The second contribution is a comprehensive approach, called ENTYFI, for named entity recognition and typing in long fictional texts. Built on 205 automatically induced high-quality type systems for popular fictional domains, ENTYFI exploits the overlap and reuse of these fictional domains on unseen texts. By combining different typing modules with a consolidation stage, ENTYFI is able to do fine-grained entity typing in long fictional texts with high precision and recall.
– The third contribution is an end-to-end system, called KnowFi, for extracting relations between entities in very long texts such as entire books. KnowFi leverages background knowledge from 142 popular fictional domains to identify interesting relations and to collect distant training samples. KnowFi devises a similarity-based ranking technique to reduce false positives in training samples and to select potential text passages that contain seed pairs of entities. By training a hierarchical neural network for all relations, KnowFi is able to infer relations between entity pairs across long fictional texts, and achieves gains over the best prior methods for relation extraction.
Ivan PRYVALOV
(Advisor: Prof. Michael Backes)
Cryptography with Anonymity in Mind
Monday, 11.04.2022, 14:00 h
Abstract:
Advances in information technologies gave a rise to powerful ubiquitous computing devices, and digital networks have enabled new ways of fast communication, which immediately found tons of applications and resulted in large amounts of data being transmitted. For decades, cryptographic schemes and privacy-preserving protocols have been studied and researched in order to offer end users privacy of their data and implement useful functionalities at the same time, often trading security properties for cryptographic assumptions and efficiency. In this plethora of cryptographic constructions, anonymity properties play a special role, as they are important in many real-life scenarios. However, many useful cryptographic primitives lack anonymity properties or imply prohibitive costs to achieve them.
In this thesis, we expand the territory of cryptographic primitives with anonymity in mind. First, we define Anonymous RAM, a generalization of a single-user Oblivious RAM to multiple mistrusted users, and present two constructions thereof with different trade-offs between assumptions and efficiency. Second, we define an encryption scheme that allows to establish chains of ciphertexts anonymously and verify their integrity. Furthermore, the aggregatable version of the scheme allows to build a Parallel Anonymous RAM, which enhances Anonymous RAM by supporting concurrent users. Third, we show our technique for constructing efficient non-interactive zero-knowledge proofs for statements that consist of both algebraic and arithmetic statements. Finally, we show our framework for constructing efficient single secret leader election protocols, which have been recently identified as an important component in proof-of-stake cryptocurrencies.
Marten OLTROGGE
(Advisor: Prof. Michael Backes)
TLS on Android – Evolution over the last decade
Thursday, 07.04.2022, 10:30 h
Abstract:
Smart devices and mobile platforms are omnipresent. Android OS has evolved to become the most dominating mobile operating system on the market with billions of devices and a platform with millions of apps. Apps increasingly offer solutions to everyday problems and have become an indispensable part of people’s daily life. Due to this, mobile apps carry and handle more and more personal and privacy-sensitive data which also involves communication with backend or third party services. Due to this, their network traffic is an attractive target for MitMAs. In my talk, I will present a line of research addressing security on Android with a main focus on the network security of Android apps. This work covers various approaches for improving network security on Android and investigates their efficacy as well as it puts findings in context with the general evolution of network security in a larger perspective.
March
Nikolas HAVRIKOV
(Advisor: Prof. Andreas Zeller)
Grammar-Based Fuzzing Using Input Features
Friday, 04.03.2022, 10:00 h
Abstract:
In grammar-based fuzz testing, a formal grammar is used to produce test inputs that are syntactically valid in order to reach the business logic of a program under test.
In this setting, it is advantageous to ensure a high diversity of inputs to test more of the program’s behavior. How can we characterize features that make inputs diverse and associate them with the execution of particular parts of the program?
We present a measure of input coverage called k-path coverage, which makes it possible to efficiently express, assess, and achieve input diversity. We demonstrate and evaluate how to systematically attain k-path coverage, how it correlates with code coverage and can thus be used as its predictor. By automatically inferring explicit associations between k-path features and the coverage of individual methods we further show how to generate inputs that specifically target the execution of given code locations.
Franziska LICHTBLAU
(Advisor: Prof. Anja Feldmann)
From the Edge to the Core: Towards Informed Vantage Point Selection for Internet Measurement Studies
Tuesday, 01.03.2022, 11:00 h
Abstract:
Since the early days of the Internet, measurement scientists are trying to keep up with the fast-paced development of the Internet. As the Internet grew organically over time and without build-in measurability, this process requires many workarounds and due diligence. As a result, every measurement study is only as good as the data it relies on. Moreover, data quality is relative to the research question—a data set suitable to analyze one problem may be insufficient for another. This is entirely expected as the Internet is decentralized, i.e., there is no single observation point from which we can assess the complete state of the Internet. Because of that, every measurement study needs specifically selected vantage points, which fit the research question. In this thesis, we present three different vantage points across the Internet topology — from the e.g., to the Internet core. We discuss their specific features, suitability for different kinds of research questions, and how to work with the corresponding data. The data sets obtained at the presented vantage points allow us to conduct three different measurement studies and shed light on the following aspects: (a) The prevalence of IP source address spoofing at a large European Internet Exchange Point (IXP), (b) the propagation distance of BGP communities, an optional transitive BGP attribute used for traffic engineering, and (c) the impact of the global COVID-19 pandemic on Internet usage behavior at a large Internet Service Provider (ISP) and three IXPs.
February
Harini P. HARIHARAN
(Advisor: Prof. Thorsten Herfet)
Non-Disruptive Use of Light Fields in Image and Video Processing
Wednesday, 23.02.2022, 10:00 h
Abstract:
In the age of computational imaging, cameras capture not only an image but also data. This captured additional data can be best used for photo-realistic renderings facilitating numerous post-processing possibilities such as perspective shift, depth scaling, digital refocus, 3D reconstruction, and much more. In computational photography, the light field imaging technology captures the complete volumetric information of a scene. This technology has the highest potential to accelerate immersive experiences towards close-to-reality. It has gained significance in both commercial and research domains. However, due to lack of coding and storage formats and also the incompatibility of the tools to process and enable the data, light fields are not exploited to its full potential. This dissertation approaches the integration of light field data to image and video processing. Towards this goal, the representation of light fields using advanced file formats designed for 2D image assemblies to facilitate asset re-usability and interoperability between applications and devices is addressed. The novel 5D light field acquisition and the on-going research on coding frameworks are presented. Multiple techniques for optimised sequencing of light field data are also proposed. As light fields contain complete 3D information of a scene, large amounts of data is captured and is highly redundant in nature. Hence, by pre-processing the data using the proposed approaches, excellent coding performance can be achieved.
Kireeti BODDUNA
(Advisor: Prof. Joachim Weickert)
Structure-aware Image Denoising, Super-resolution and Enhancement Methods
Tuesday, 22.02.2022, 16:15 h
Abstract:
Denoising, super-resolution and structure enhancement are lassical image processing applications. The motive behind their existence is to aid our visual analysis of raw digital images. Despite tremendous progress in these fields, certain difficult problems are still open to research. For example, denoising and super-resolution techniques which possess all the following properties, are very scarce: They must preserve critical structures like corners, should be robust to the type of noise distribution, avoid undesirable artefacts, and also be fast. The area of structure enhancement also has an unresolved issue: Very little efforts have been put into designing models that can tackle anisotropic deformations in the image acquisition process. In this thesis, we design novel methods in the form of partial differential equations, patch-based approaches and variational models to overcome the aforementioned obstacles. In most cases, our methods outperform the existing approaches in both quality and speed, despite being applicable to a broader range of practical situations.
Anna TIGUNOVA
(Advisor: Prof. Gerhard Weikum)
Extracting personal information from conversations
Friday, 18.02.2022, 11:00 h
Abstract:
Personal knowledge is a versatile resource that is valuable for a wide range of downstream applications, such as chatbot assistants, recommendation models and personalized search. A Personal Knowledge Base, populated with personal facts, such as demographic information, interests and interpersonal relationships, is a unique endpoint for storing and querying personal knowledge, providing users with full control over their personal information. To alleviate users from extensive manual effort to build such personal knowledge base, we can leverage automated extraction methods applied to the textual content of the users, such as dialogue transcripts or social media posts. Such conversational data is abundant but challenging to process and requires specialized methods for extraction of personal facts.
In this dissertation we address the acquisition of personal knowledge from conversational data. We propose several novel deep learning models for inferring speakers‘ personal attributes, such as demographic facts, hobbies and interpersonal relationships. Experiments with various conversational texts, including Reddit discussions and movie scripts, demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines.
Aditya NITTALA
(Advisor: Prof. Jürgen Steimle)
From Wearable Towards Epidermal Computing: Soft Wearable Devices for Rich Interaction on the Skin
Friday, 11.02.2022, 17:30 h
Abstract:
Human skin provides a large, always available, and easy to access real-estate for interaction. Recent advances in new materials, electronics, and human-computer interaction have led to the emergence of electronic devices that reside directly on the user’s skin. These conformal devices referred to as Epidermal Devices, have mechanical properties compatible with human skin: they are very thin, often thinner than human hair; they elastically deform when the body is moving, and stretch with the user’s skin.
Firstly, this thesis provides a conceptual understanding of Epidermal Devices in the HCI literature. We compare and contrast them with other technical approaches that enable novel on-skin interactions. Then, through a multi-disciplinary analysis of Epidermal Devices, we identify the design goals and challenges that need to be addressed for advancing this emerging research area in HCI. Following this, our fundamental empirical research investigated how epidermal devices of different rigidity levels affect passive and active tactile perception. Generally, a correlation was found between the device rigidity and tactile sensitivity thresholds as well as roughness discrimination ability. Based on these findings, we derive design recommendations for realizing epidermal devices.
Secondly, this thesis contributes novel Epidermal Devices that enable rich on-body interaction. SkinMarks contributes to the fabrication and design of novel Epidermal Devices that are highly skin-conformal and enable touch, squeeze, and bend sensing with co-located visual output. These devices can be deployed on highly challenging body locations, enabling novel interaction techniques and expanding the design space of on-body interaction. Multi-Touch Skin enables high-resolution multi-touch input on the body. We present the first non-rectangular and high-resolution multi-touch sensor overlays for use on skin and introduce a design tool that generates such sensors in custom shapes and sizes. Empirical results from two technical evaluations confirm that the sensor achieves a high signal-to-noise ratio on the body under various grounding conditions and has a high spatial accuracy even when subjected to strong deformations.
Thirdly, Epidermal Devices are in contact with the skin, they offer opportunities for sensing rich physiological signals from the body. To leverage this unique property, this thesis presents the rapid fabrication and computational design techniques for realizing Multi-Modal Epidermal Devices} that can measure multiple physiological signals from the human body. Devices fabricated through these techniques can measure ECG (Electrocardiogram), EMG (Electromyogram), and EDA (Electro-Dermal Activity). We also contribute a computational design and optimization method based on underlying human anatomical models to create optimized device designs that provide an optimal trade-off between physiological signal acquisition capability and device size. The graphical tool allows for easily specifying design preferences and to visually analyze the generated designs in real-time, enabling designer-in-the-loop optimization. Experimental results show high quantitative agreement between the prediction of the optimizer and experimentally collected physiological data.
Finally, taking a multi-disciplinary perspective, we outline the roadmap for future research in this area by highlighting the next important steps, opportunities, and challenges. Taken together, this thesis contributes towards a holistic understanding of textit{Epidermal Devices}: it provides an empirical and conceptual understanding as well as technical insights through contributions in DIY (Do-It-Yourself), rapid fabrication, and computational design techniques.
Maximilian FICKERT
(Advisor: Prof. Jörg Hoffmann)
Adaptive Search Techniques in AI Planning and Heuristic Search
Friday, 11.02.2022, 16:00 h
Abstract:
Heuristic state-space search is a common approach to solve problems appearing in artificial intelligence and other subfields of computer science. This is typically viewed as a static process: the heuristic function guiding the search is assumed to be unchanged, and its resulting values are directly used for guidance without applying any further reasoning to them. Yet critical aspects of the task may only be discovered during the search, e.g., regions of the state space where the heuristic does not yield reliable values.
Our work here aims to make this process more dynamic, allowing the search to adapt to such observations, e.g., by detecting weaknesses in the heuristic function and refining it accordingly during the search. We also consider settings that inherently require adaptation: In online replanning, a plan that is being executed must be amended for changes in the environment. Similarly, in real-time search, an agent must act under strict time constraints with limited information.
Our results show that the flexibility of these algorithms makes them more robust than traditional approaches on a wide range of benchmarks, and they often yield substantial improvements over current state-of-the-art planners.
January
Simon MOLL
(Advisor: Prof. Sebastian Hack)
Vectorization System for Unstructured Codes with a Data-parallel Compiler IR
Tuesday, 25.01.2022, 10:00 h
Abstract:
Data-parallel programming is an established programming model for Single Instruction Multiple Data (SIMD) CPUs, which underlies loop vectorization, function vectorization and explicitly data-parallel languages. SIMD programs can only follow one path of execution, which makes the vectorization of data-parallel code with control flow challenging. For the issue of control flow, state of the art vectorization systems either only operate on structured code or will not give any guarantees on unstructured codes. This thesis specifically addresses the reliable vectorization of unstructured data-parallel code with contributions in their representation, analysis and transformation. The novel P-LLVM intermediate representation provides a well-defined semantics to the data-parallel region at every stage of the vectorization pipeline.
Partial control-flow linearization is a new algorithm to remove divergent control flow with linear time and guarantees on the control flow retained. We present a new divergence analysis that considers all divergence effects of P-LLVM programs. Our new approach to identifying points of joining divergent control flow is optimal on directed acyclic graphs. We extend it further to a quadratic-time algorithm for arbitrary reducible CFGs. As we show earlier approaches are either less precise or incorrect.
Finally, we generalize single-dimensional vectorization of outer loops to multi-dimensional tensorization of loop nests. SIMD targets benefit from tensorization through more opportunities for re-use of loaded values and more efficient memory access behavior.
The techniques were implemented in the Region Vectorizer (RV) for vectorization and TensorRV for loop-nest tensorization. Our evaluation validates that the general-purpose RV vectorization system matches the performance of more specialized approaches. RV performs on par with the ISPC compiler, which only supports its structured domain-specific language, on a range of tree traversal codes with complex control flow. RV is able to outperform the loop vectorizers of state-of-the-art compilers, as we show for the SPEC2017 nab_s benchmark and the XSBench proxy application.