Roman Kern > Thesis

Thesis

Options

There are three main options for the thesis:

Choose a topic from the list below, or propose your own topic, and work on it independently (this is the default option)
Collaborate with local start-ups or companies (paid Master’s thesis)
Work together with a research partner organisation

Template

Feel free to use the LaTeX thesis template (based on input from Karl Voit and Keith Andrews):

Template, and preview

Topics

Analysis and Simulation of Driver Behaviour

Given vehicle trajectories (including lane changes), the aim is to derive prototypical driver behaviour (e.g., aggressive drivers). The work is supported by experts from the Institute of Highway Engineering and Transport Planning, who also supply the datasets and provide access to simulation software. Depending on the student's interest, the technical implementation could be a simulation or an analysis.
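
For the analysis variant, a trajectory could first be condensed into a few behaviour features. The sketch below assumes a hypothetical per-vehicle format (time in seconds, longitudinal position in metres, lane index) and hypothetical thresholds for labelling a driver as aggressive; the datasets supplied by the institute may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One trajectory point (hypothetical format)."""
    t: float   # time in seconds
    x: float   # longitudinal position in metres
    lane: int  # lane index


def driver_features(samples: list[Sample]) -> dict[str, float]:
    """Condense one vehicle trajectory into simple behaviour features."""
    # Speed per interval via finite differences.
    speeds = [(b.x - a.x) / (b.t - a.t) for a, b in zip(samples, samples[1:])]
    # Acceleration between the midpoints of consecutive intervals.
    accels = []
    for i in range(len(speeds) - 1):
        dt = (samples[i + 2].t - samples[i].t) / 2
        accels.append((speeds[i + 1] - speeds[i]) / dt)
    lane_changes = sum(a.lane != b.lane for a, b in zip(samples, samples[1:]))
    return {
        "mean_speed": sum(speeds) / len(speeds),
        "max_abs_accel": max((abs(a) for a in accels), default=0.0),
        "lane_changes": float(lane_changes),
    }


def is_aggressive(features: dict[str, float],
                  accel_limit: float = 3.0,   # m/s^2, hypothetical threshold
                  max_lane_changes: int = 3) -> bool:
    """Hypothetical rule-based label, for illustration only."""
    return (features["max_abs_accel"] > accel_limit
            or features["lane_changes"] > max_lane_changes)


track = [Sample(0, 0, 1), Sample(1, 20, 1), Sample(2, 42, 2), Sample(3, 66, 2)]
features = driver_features(track)
print(features, is_aggressive(features))
```

Instead of fixed thresholds, prototypical behaviours could be derived by clustering such feature vectors over all vehicles in the dataset.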

Excellence in the Austrian Research Ecosystem: A Quantitative Citation Analysis

This Master’s thesis analyses the excellence of the Austrian research ecosystem by means of a quantitative citation analysis. Citation data for articles from various institutions, including universities, universities of applied sciences and non-university research institutions, are meticulously collected via the Scopus and OpenCitations APIs and analysed. The analysis provides insights into the influence and quality of the research output of these institutions. This research requires knowledge of Python and a keen interest in the science of science.
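
The API retrieval itself is not shown here. The sketch assumes that per-article citation counts have already been collected from Scopus or OpenCitations, and computes two common indicators per institution: mean citations per article and the h-index. The institution names and counts are made up.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h articles have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


def summarise(citations_by_institution: dict[str, list[int]]) -> dict[str, dict[str, float]]:
    """Per-institution indicators from per-article citation counts."""
    summary = {}
    for name, counts in citations_by_institution.items():
        summary[name] = {
            "articles": float(len(counts)),
            "mean_citations": sum(counts) / len(counts) if counts else 0.0,
            "h_index": float(h_index(counts)),
        }
    return summary


# Hypothetical citation counts per article.
data = {"Institution A": [10, 8, 5, 4, 3], "Institution B": [25, 8, 5, 3, 3]}
summary = summarise(data)
print(summary)
```

Citation practices differ strongly between fields, so a real comparison of institutions would likely need field-normalised indicators rather than raw counts.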

Evaluating the Impact of Scientific Publications: A Citation Context Analysis Using Large Language Models

Citations are crucial for assessing the impact of scientific publications, yet quantitative analyses often overlook the varied contributions of individual citations. To address this limitation, this thesis introduces an automated classification of citation contexts using Large Language Models (LLMs), combining the strengths of quantitative and qualitative analyses. The method aims to improve the understanding of a publication’s impact by identifying the intent behind each citation. This research requires knowledge of Python and a keen interest in LLMs and the science of science.

Web Application for Machine Learning

The goal is to develop a web application for sensitivity analysis and uncertainty quantification of computational model data. The work is carried out in conjunction with UQtab. This topic can easily be combined with a range of scientific research questions.

Analyse the Research Groups in Austria

In an initial step, a crawler is to be built to collect information about Austrian research institutions. Code for the crawler is already available. The work is carried out in conjunction with an EU project. Next, the collected information is made available to an LLM via a retrieval-augmented generation (RAG) system.
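
The retrieval step of a RAG system can be illustrated with a minimal sketch. It assumes bag-of-words cosine similarity in place of the dense embeddings a production system would typically use, hypothetical crawled snippets, and it leaves out the actual LLM call: only the prompt that would be sent to the model is built.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    return sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Combine the retrieved context and the question into one LLM prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


# Hypothetical crawled snippets about research groups.
pages = [
    "The robotics group in Graz builds social robots.",
    "A Vienna institute studies climate models.",
    "Linz hosts a group on databases.",
]
print(build_prompt("Which group works on robotics in Graz?", pages, k=1))
```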

Technology Scouting

Similar to the previous topic, the goal here is to crawl websites discussing certain technologies. Again, the goal is to provide a user-friendly UI to operate an LLM in combination with a RAG system.

Information Extraction from Historical Text

The 19th century, also called the long nineteenth century, is known for its major changes in politics, technology and society. The goal of the thesis is to use NLP to support historians in better understanding events, including their causes and how they were perceived. To this end, sources such as historical newspapers are to be collected, and information extraction methods applied. Of particular interest is the extraction of causes and effects related to historical events.

For example, collect text from archive.org and analyse the change in language (frequency of words, frequency of phrases and/or grammar).
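
A first step towards analysing language change could be to compare relative word frequencies across time periods. The sketch assumes the texts have already been downloaded and grouped by period; the period labels and snippets are made up. Phrases could be handled the same way by counting n-grams instead of single tokens.

```python
import re
from collections import Counter


def relative_frequencies(texts_by_period: dict[str, list[str]],
                         words: list[str]) -> dict[str, dict[str, float]]:
    """Relative frequency of selected words per time period."""
    result = {}
    for period, texts in texts_by_period.items():
        tokens = [tok for text in texts for tok in re.findall(r"\w+", text.lower())]
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty periods
        result[period] = {w: counts[w] / total for w in words}
    return result


# Hypothetical snippets; real input would be (OCR-ed) newspaper text.
corpus = {
    "1820s": ["The steam engine is new.", "Horses pull the coach."],
    "1880s": ["The railway and the steam engine dominate."],
}
freqs = relative_frequencies(corpus, ["steam", "railway"])
print(freqs)
```

Historical OCR output is often noisy, so spelling normalisation would likely be needed before counting.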

Talking Head

Extend the social robot Furhat with the ability to listen and respond interactively in a fluent conversation. The goal is to use models such as SpiritLM in combination with the Furhat robot. The expected outcome is a study of a selected conversational setting. Optionally, the Furhat robot can be extended

Few-Shot LLM Pruning

Develop a few-shot prompt for a given task, e.g., named entity recognition or classification. Next, analyse the network activations and, in a subsequent step, prune away neurons that are not used for the target task. Study the impact of the level of pruning on the performance of the network.
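
The pruning step could be prototyped along the following lines. The sketch assumes that activations have already been recorded for the few-shot examples (in PyTorch, e.g., via forward hooks) as a matrix with one row per example and one column per neuron, and it uses mean absolute activation as the importance criterion; both choices are assumptions rather than a prescribed method.

```python
def neuron_importance(activations: list[list[float]]) -> list[float]:
    """Mean absolute activation per neuron; activations[i][j] = neuron j on example i."""
    n = len(activations)
    return [sum(abs(row[j]) for row in activations) / n for j in range(len(activations[0]))]


def pruning_mask(importance: list[float], prune_fraction: float) -> list[bool]:
    """True keeps a neuron, False prunes it; the least active fraction is pruned."""
    k = int(len(importance) * prune_fraction)
    least_active = sorted(range(len(importance)), key=lambda j: importance[j])[:k]
    return [j not in least_active for j in range(len(importance))]


def apply_mask(weights: list[list[float]], mask: list[bool]) -> list[list[float]]:
    """Zero the outgoing weights of pruned neurons (weights[j] belongs to neuron j)."""
    return [row if keep else [0.0] * len(row) for row, keep in zip(weights, mask)]


# Hypothetical activations of 4 neurons on 2 few-shot examples.
acts = [[0.9, 0.0, 0.2, -1.5],
        [1.1, 0.01, -0.1, 1.3]]
importance = neuron_importance(acts)
for fraction in (0.25, 0.5, 0.75):
    print(fraction, pruning_mask(importance, fraction))
```

Each mask would then be applied to the model and the task performance re-measured, which yields performance as a function of the pruning level.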

LLMs and Counterfactual Inference

Are LLMs able to understand causality? Using an existing dataset, a number of “old” LLMs have been analysed. The goal is to update this work and collect data for contemporary LLMs. In particular, it is of interest to find out which tests do not work well. In the best case, one can find patterns where even contemporary LLMs struggle.

Change in Authorship Style (e.g., Llama)

Writing style is unique to each person, but it is also subject to change. A long text might be authored by multiple people (or by a generative language model, such as Llama). The goal is to detect where these changes in style happen.
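
A simple baseline for locating style changes compares stylometric profiles of consecutive text segments. The sketch assumes paragraphs as segments, a small hand-picked list of English function words as features, an L1 distance and a hypothetical threshold; a thesis would likely use richer features or learned representations.

```python
import re
from collections import Counter

# Hypothetical feature set: function words tend to reflect style rather than topic.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "is",
                  "was", "i", "you", "we", "however", "moreover"]


def profile(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]


def change_points(paragraphs: list[str], threshold: float) -> list[int]:
    """Indices i where paragraph i differs in style from paragraph i - 1."""
    profiles = [profile(p) for p in paragraphs]
    points = []
    for i in range(1, len(profiles)):
        l1 = sum(abs(a - b) for a, b in zip(profiles[i - 1], profiles[i]))
        if l1 > threshold:
            points.append(i)
    return points


paragraphs = [
    "I think it is fine and I like it.",
    "I was sure it is good and I said so.",
    "Moreover, the analysis of the data demonstrates the effect of the treatment.",
    "The results of the study confirm the hypothesis of the authors.",
]
print(change_points(paragraphs, threshold=0.5))
```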

Shift in Reporting

How do newspapers report on certain topics, and when do they use certain words? Are articles that use “Europe” written differently from articles that use “European Union”? Are there events that change the way these topics are reported?

Causal Expression Extraction with LLMs

Develop prompt engineering techniques for extracting causal expressions from text, e.g., papers. In a next step, use LLMs to infer more general concepts and relations. Finally, convert the extracted causal information into knowledge graphs.

Causal Expressions in IPCC Reports

Build an automatic extraction of causal expressions and apply it to longer documents, such as the IPCC report. The extracted causal expressions are then collected and aggregated to give a quick overview of the main arguments.
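
A rule-based baseline can help to judge what a more elaborate (e.g., LLM-based) extractor adds. The sketch assumes a hypothetical, deliberately small list of English cue phrases and plain text already extracted from the report; it ignores negation and nested clauses, and a word such as "causes" can also occur as a noun. Aggregation here simply counts identical cause-effect pairs.

```python
import re
from collections import Counter

FORWARD_CUES = ["leads to", "results in", "causes"]  # <cause> CUE <effect>
BACKWARD_CUES = ["due to", "because of"]             # <effect> CUE <cause>


def extract_causal_pairs(text: str) -> list[tuple[str, str]]:
    """Naive cue-phrase extraction of (cause, effect) pairs, sentence by sentence."""
    pairs = []
    for sentence in re.split(r"[.!?;]\s*", text):
        s = sentence.strip().lower()
        for cue in FORWARD_CUES:
            if f" {cue} " in s:
                cause, effect = s.split(f" {cue} ", 1)
                pairs.append((cause.strip(), effect.strip()))
        for cue in BACKWARD_CUES:
            if f" {cue} " in s:
                effect, cause = s.split(f" {cue} ", 1)
                pairs.append((cause.strip(), effect.strip()))
    return pairs


def aggregate(text: str) -> Counter:
    """Count how often each cause-effect pair is stated."""
    return Counter(extract_causal_pairs(text))


report = ("Warming leads to sea level rise. Sea level rise leads to coastal flooding. "
          "Crop losses increase due to drought. Warming leads to sea level rise.")
for (cause, effect), n in aggregate(report).most_common():
    print(f"{cause} -> {effect} ({n}x)")
```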

Causal Inference in NLP

Measure the strength of causal effects using textual resources: how much does an event change the way people write about a topic? The event could be a governmental intervention, a natural disaster, an accident or a personal experience. Part of this project is to collect data via controlled experiments.

Extraction of Causal Patterns for Knowledge Base Completion

Extract causal knowledge from a specific domain and transform the extracted information into a structured form. The goal is to build (or extend) a knowledge graph. The domain can be freely chosen.

Privacy-Preservation

Based on a dataset, define a sensitive attribute x_s that has some predictive power for a target variable y. First, compute the part of x_s that is helpful for prediction (the correlation between x_s and y). Next, inject this information into 1) a new variable, 2) an existing variable (by changing its values), or 3) all other variables (by changing all values a tiny bit). Finally, evaluate the effectiveness of the methods (i.e., the same classification performance, but no remaining correlation between x_s and y) as well as the shift in the distribution of the other variables.
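
One possible reading of the first step, assuming a purely linear relationship: estimate the component of x_s that is linearly explained by y by regressing x_s on y, move it into a new variable (option 1), and subtract it from x_s. The remaining x_s then has zero sample correlation with y. Note that this estimation uses y, so it is a transformation of the training data only; the toy values and function names are hypothetical.

```python
from statistics import mean


def pearson(a: list[float], b: list[float]) -> float:
    """Sample Pearson correlation."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def split_predictive_part(x_s: list[float], y: list[float]) -> tuple[list[float], list[float]]:
    """Split x_s into the part linearly explained by y and a remainder uncorrelated with y."""
    mx, my = mean(x_s), mean(y)
    beta = (sum((a - mx) * (b - my) for a, b in zip(x_s, y))
            / sum((b - my) ** 2 for b in y))
    predictive = [beta * (b - my) for b in y]             # option 1: store as new variable
    remainder = [a - p for a, p in zip(x_s, predictive)]  # replaces the original x_s
    return predictive, remainder


x_s = [1.0, 2.0, 3.0, 5.0]  # hypothetical sensitive attribute
y = [0.0, 0.0, 1.0, 1.0]    # hypothetical binary target
new_var, x_s_new = split_predictive_part(x_s, y)
print(round(pearson(x_s, y), 3), round(pearson(x_s_new, y), 3))
```

The classification-performance criterion would be checked separately, by training the same classifier on the original and on the transformed data.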

Analyse Contracts

Analyse textual contracts in German and extract legally relevant terms and phrases, including dates and time ranges. In addition, one can create a reference list of expected phrases and match it against a given contract. Finally, this can be used to create a classification scheme for the contracts.
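
Dates and time ranges are a natural first target. The sketch assumes numeric German dates of the form 01.03.2024 and ranges phrased as "vom <date> bis (zum) <date>"; written-out months such as "1. März 2024" are not covered. The reference-list matching is a plain case-insensitive substring check, and the example contract and reference phrases are made up.

```python
import re
from datetime import date

DATE = r"(\d{1,2})\.\s?(\d{1,2})\.\s?(\d{4})"  # e.g. 01.03.2024
DATE_RE = re.compile(DATE)
RANGE_RE = re.compile(r"vom\s+" + DATE + r"\s+bis\s+(?:zum\s+)?" + DATE, re.IGNORECASE)


def to_date(day: str, month: str, year: str) -> date:
    return date(int(year), int(month), int(day))


def extract_dates(text: str) -> list[date]:
    return [to_date(*m.groups()) for m in DATE_RE.finditer(text)]


def extract_ranges(text: str) -> list[tuple[date, date]]:
    return [(to_date(*m.groups()[:3]), to_date(*m.groups()[3:]))
            for m in RANGE_RE.finditer(text)]


def missing_phrases(text: str, reference: list[str]) -> list[str]:
    """Expected phrases from the reference list that do not occur in the contract."""
    lowered = text.lower()
    return [p for p in reference if p.lower() not in lowered]


contract = ("Der Mietvertrag gilt vom 01.03.2024 bis zum 28.02.2026. "
            "Die Kündigungsfrist beträgt drei Monate.")
print(extract_ranges(contract))
print(missing_phrases(contract, ["Kündigungsfrist", "Kaution"]))
```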

Climate Change Efforts

There are many local efforts to address climate change at the regional level, which are reported on the respective web pages. The goal is to systematically crawl regional websites and identify climate change actions taken by local governments. This topic combines the engineering required for web crawling with NLP tools for information extraction.

Dataset Augmentation for Tabular Data

Based on a paper on causal GANs, reimplement the algorithm and evaluate it on your own datasets.

Split Features into Neighbourhood and Similarity

Many machine learning and data science tasks assume the features to be semantically equal. The idea is to split the feature set into two sets: the first encoding the closeness (neighbourhood) of instances, and the second encoding the similarity between instances. This approach can then be implemented in, for example, the Local Outlier Factor or other methods.
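
One way to prototype the idea is a k-nearest-neighbour outlier score in which the two feature sets play different roles: neighbours are searched using the neighbourhood features only, and the score is the mean distance to those neighbours in the similarity features. This is a simplified kNN score rather than LOF itself, and the feature split (position vs. temperature) and data are hypothetical.

```python
import math


def project(row: list[float], idx: list[int]) -> list[float]:
    return [row[i] for i in idx]


def split_feature_scores(rows: list[list[float]], neighbourhood_idx: list[int],
                         similarity_idx: list[int], k: int = 2) -> list[float]:
    """kNN outlier score: neighbours from one feature set, distances from the other."""
    scores = []
    for i, row in enumerate(rows):
        ctx = project(row, neighbourhood_idx)
        others = sorted((j for j in range(len(rows)) if j != i),
                        key=lambda j: math.dist(ctx, project(rows[j], neighbourhood_idx)))
        sim = project(row, similarity_idx)
        scores.append(sum(math.dist(sim, project(rows[j], similarity_idx))
                          for j in others[:k]) / k)
    return scores


# Hypothetical rows: [x position, y position, temperature].
rows = [
    [0.0, 0.0, 20.0], [0.0, 1.0, 21.0], [1.0, 0.0, 20.5], [1.0, 1.0, 35.0],
    [10.0, 10.0, 35.0], [10.0, 11.0, 34.0], [11.0, 10.0, 35.5],
]
scores = split_feature_scores(rows, neighbourhood_idx=[0, 1], similarity_idx=[2])
print([round(s, 2) for s in scores])
```

Rows 3 and 4 have the same temperature, but only row 3 receives a high score, because its spatial neighbours are much cooler.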

Custom Loss for Privacy-Preservation via Causality

Develop a loss function for training, e.g., a Variational Autoencoder that includes an additional term penalising the “leak” of sensitive information.
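
Such a loss could take the following form, shown here as a plain-Python evaluation for a single batch: reconstruction error plus the usual Gaussian KL term, plus a weighted leak penalty. The penalty used here, the squared correlation between each latent mean and the sensitive attribute, is an assumption that only captures linear leakage, and the weight `lam` is hypothetical. For actual training, the same expression would be written in an autodiff framework such as PyTorch or JAX.

```python
import math
from statistics import mean


def kl_to_standard_normal(mu: list[float], logvar: list[float]) -> float:
    """KL(N(mu, exp(logvar)) || N(0, 1)), summed over latent dimensions."""
    return sum(-0.5 * (1 + lv - m * m - math.exp(lv)) for m, lv in zip(mu, logvar))


def squared_corr(a: list[float], b: list[float]) -> float:
    """Squared Pearson correlation; 0 if either input is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov * cov / (var_a * var_b) if var_a > 0 and var_b > 0 else 0.0


def vae_loss_with_leak(x, x_rec, mu, logvar, s, lam=1.0):
    """Batch loss = reconstruction + KL + lam * leak of sensitive attribute s.

    x, x_rec: input and reconstruction vectors per sample;
    mu, logvar: encoder outputs per sample; s: sensitive attribute per sample.
    """
    n = len(x)
    recon = sum(sum((a - b) ** 2 for a, b in zip(xi, ri)) for xi, ri in zip(x, x_rec)) / n
    kl = sum(kl_to_standard_normal(m, lv) for m, lv in zip(mu, logvar)) / n
    # Leak: squared correlation between each latent mean and s (linear leakage only).
    leak = sum(squared_corr([m[d] for m in mu], s) for d in range(len(mu[0])))
    return recon + kl + lam * leak


loss = vae_loss_with_leak(x=[[1.0], [2.0]], x_rec=[[1.0], [2.0]],
                          mu=[[0.5], [-0.5]], logvar=[[0.0], [0.0]],
                          s=[1.0, 0.0], lam=2.0)
print(loss)
```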

Causal Outlier/Anomaly Detection

Goals: 1) Given a dataset (potentially including unlabelled outliers) and a causal structure, investigate to what extent knowledge of the causal structure helps to identify outliers. 2) Given a dataset and labelled outliers, investigate to what extent these help with causal discovery.
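
For goal 1, a natural baseline is to check each instance against the causal mechanisms rather than against the marginal distributions: regress every variable on its parents in the given graph and flag instances with large standardised residuals. The sketch assumes linear mechanisms with additive noise and a graph given as a parent list, and scores each instance by its largest absolute standardised residual; the data is synthetic.

```python
from statistics import pstdev


def solve(A: list[list[float]], b: list[float]) -> list[float]:
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


def residuals(data: list[dict[str, float]], child: str, parents: list[str]) -> list[float]:
    """Least-squares residuals of `child` regressed on its parents (with intercept)."""
    X = [[1.0] + [row[p] for p in parents] for row in data]
    y = [row[child] for row in data]
    d = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(d)] for i in range(d)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(d)]
    w = solve(XtX, Xty)
    return [yi - sum(wi * xi for wi, xi in zip(w, x)) for x, yi in zip(X, y)]


def causal_outlier_scores(data: list[dict[str, float]],
                          graph: dict[str, list[str]]) -> list[float]:
    """Per instance: largest absolute standardised residual over all mechanisms."""
    scores = [0.0] * len(data)
    for child, parents in graph.items():
        res = residuals(data, child, parents)
        sd = pstdev(res) or 1.0  # guard against a perfectly fitted mechanism
        for i, r in enumerate(res):
            scores[i] = max(scores[i], abs(r) / sd)
    return scores


# Synthetic data following y = 2x, except row 5, which breaks the mechanism.
data = [{"x": float(i), "y": 2.0 * i} for i in range(10)]
data[5]["y"] = 2.0
graph = {"x": [], "y": ["x"]}  # x -> y
scores = causal_outlier_scores(data, graph)
print(max(range(len(scores)), key=scores.__getitem__))
```

The y value of row 5 lies within the range of the other y values, so it is unremarkable on its own; it only stands out because it violates the mechanism y = 2x.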

Privacy-Preservation/Fairness via Causality

Based on existing datasets, define sensitive attributes x_s for which the relationship to the target attribute y is to be protected (e.g., the impact of gender on salary). Based on the knowledge about the dataset, derive a causal model, e.g., a causal graph. Investigate methods to remove the correlation between x_s and y (e.g., by introducing a new synthetic confounder attribute).

Web App for Patent RAG System

Goal: Develop a web application for a patent retrieval and analysis platform. The app allows users to search for specific patents or to perform an in-depth analysis of (temporal) changes in patents. In the backend, various techniques are used, including LLMs running in cluster environments. For this topic, only the frontend will be developed.