OML - Research

This page gives an overview of the current research topics of the OML project.

Fevziye Irem Yaman
I developed a system to check compliance with COVID-19 regulations. For
the face mask and face-hand interaction tasks, we published two datasets.
Currently, I am working on pointing gesture recognition to detect
which object a user is pointing at. The approach predicts the user's
hand and body poses and also uses the dialog input, which improves
human-robot interaction.
Link to the paper



Dogucan Yaman

I am currently working on object-based image manipulation: adding new
objects to an image, and removing or modifying existing ones, by
controlling their style attributes (color, material, and shape) through
text input. The goal is to generate new synthetic images with slight
modifications, both to provide training data and to represent the states of actions.



Leonard Bärmann

For humans, narrating past events and experiences to other people is a key part of social interaction. Thus, to be accepted by potential users, a humanoid robot must be able to verbalize its observations and actions in a similarly natural way. For instance, consider a person coming home and asking their autonomous kitchen robot “What did you do when I was at work?”. By responding with “I cleaned up the table and did the dishes. You’re welcome!”, the robot makes its actions transparent and thereby increases the user’s trust in it. Furthermore, robot experience verbalization is crucial in the case of failures. If the robot fails to execute some action (or worse, causes damage, e.g., by dropping a plate), it must be able to tell the human that something went wrong and ideally also explain how and why it happened.

To realize such functionality, a robot must be equipped with a component resembling human episodic memory, which stores and processes information about past events and their spatio-temporal relations. In particular, the robotic system needs to process, filter, and store data from a highly multimodal, continuous stream of experiences, including camera images, kinematic data, symbolic execution information, and more. Verbalization is then the process of understanding a natural language query from a user, retrieving the relevant information from the robot’s episodic memory, and synthesizing a natural language answer that fulfills the user’s request.
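To make the store-and-retrieve idea concrete, here is a deliberately simple, hand-written sketch. Each episode is a time interval with a symbolic summary; a query such as “What did you do when I was at work?” is assumed to have already been mapped to a time interval; and the answer comes from a fixed template. All class and function names are hypothetical. The toy covers only the symbolic part of the multimodal stream, and it is exactly the kind of rule-based pipeline that the proposed data-driven approach aims to replace.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Episode:
    start: float   # start time in seconds (assumed common clock)
    end: float     # end time in seconds
    summary: str   # symbolic execution info, phrased in the past tense


@dataclass
class EpisodicMemory:
    episodes: List[Episode] = field(default_factory=list)

    def store(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def retrieve(self, t0: float, t1: float) -> List[Episode]:
        """Episodes overlapping the interval [t0, t1], in temporal order."""
        hits = [e for e in self.episodes if e.start < t1 and e.end > t0]
        return sorted(hits, key=lambda e: e.start)


def verbalize(episodes: List[Episode]) -> str:
    """Template-based answer; stands in for a learned language generator."""
    if not episodes:
        return "I did not do anything during that time."
    return "I " + ", then ".join(e.summary for e in episodes) + "."


if __name__ == "__main__":
    memory = EpisodicMemory()
    memory.store(Episode(25, 40, "did the dishes"))
    memory.store(Episode(10, 20, "cleaned up the table"))
    memory.store(Episode(100, 110, "watered the plants"))
    # Assume "while I was at work" was resolved to the interval [0, 50].
    print(verbalize(memory.retrieve(0, 50)))
```

Every step here (interval lookup, fixed template) is hand-coded, which is why such a design struggles with open-ended questions and with raw sensor data like camera images.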

While previous work on episodic memory verbalization (EMV) used rule-based procedures both for recording episodic memories and for generating natural language representations of them, we propose to use data-driven methods, tackling the EMV task as an end-to-end learning problem. Research challenges include creating, storing, compressing, and reading episodic memories; natural language understanding and generation; and multimodal, memory-based reasoning.

Further information is provided in the published paper and a video on the subject.


Stefan Constantin, KIT 

I’m researching ways to detect when a user corrects an error and to use these corrections to fix the error. The user can make a correction by speech alone (see video) or multimodally, combining speech and pointing gestures. An example of such a correction is the user utterance “we are meeting at the kid” followed by the user correction “it’s k i t”. The corrected utterance is “we are meeting at the k i t”. In addition, the erroneous elements (the reparandum) and the correct elements (the repair) can be extracted; in our example, the reparandum is “kid” and the repair is “k i t”. The extracted reparandum-repair pairs can be used in a lifelong learning component.

A demonstration can be found here.


Oier Mees, Uni Freiburg
The goal of our work is to have a robot perform tabletop manipulation tasks from natural language instructions. Our approach is able to segment the objects in the scene, locate the objects referred to in language expressions, resolve ambiguities through dialog, and place objects in accordance with the spatial relations expressed by the user.