Institute of Media Studies

The Answering Machine

Social bots - machines that interact with us as social partners - are increasingly encountered in our everyday lives. How does this affect us as interaction partners? And how will our interactions with these machines develop? The project uses the theater stage as an artistic-scientific experimental platform to address these questions from the perspective of applied anthropomorphism - in this case, the attribution of human characteristics to the machine. On stage, the four collaborating disciplines - psychology, computational linguistics, theater studies, and media studies - come together to test specific questions and to raise awareness of the challenges that the future co-existence of humans and AI will bring. Actors will interact with social bots in a four-year series of experiments and performances. The team aims to identify emotional, behavioral, and cognitive patterns in order to understand in detail the conditions of anthropomorphization. The findings will feed into psychological training, their media-theoretical implications will be reflected upon, and simulations of a general human-machine co-evolution will be created and brought into public discourse.

Prof. Dr. Susanne Marschall

University of Tübingen
Media Studies
Wilhelmstr. 50
Room 018
D-72074 Tübingen

Phone: +49 (0) 7071 29-72354
Fax: +49 (0) 7071 29-5149
susanne.marschall@uni-tuebingen.de

Maximilian J. Zhang

Eberhard Karls University of Tübingen
Institute of Media Studies
Wilhelmstraße 50
72074 Tübingen
Room 210

Phone: +49 7071 29-72327
maximilian.zhang@uni-tuebingen.de

Other Project Participants

University of Stuttgart: Computational Linguistics

Prof. Dr. Jonas Kuhn (PI)

Simone Beckmann Escandón (Doctoral Candidate)

Zurich University of Arts: Theater Studies

Dr. Gunter Lösel (PI)

Work Plan

Year 1: Setting up the social bot: Data collection and pipelines

MS1 will focus on the voice-based interplay of human-machine interactions and on concepts of liveness and presence for invisible forms of embodiment on stage. By creating a framework for capturing the performances both two-dimensionally and volumetrically, MS will provide high-quality material for an extensive analysis of human-machine encounters on stage. For this purpose, an elaborated framework for studying the perception of "Applied Anthropomorphism" as an interplay of TTS algorithms, synthetic voices, and concepts of presence and liveness on stage will be established. To elaborate on the concept of "Applied Anthropomorphism" at the voice-based level from various perspectives, we will prepare a publication on "Media Voice, Presence and Liveness" in MS2. This book will combine findings from various disciplines on the core aspects of voice-based communication with non-human entities. MS will design and conduct qualitative interviews on the audience's perception of the performance. In close exchange with PSY, TS, and CL, the conception of the anthology on "Media Voice, Presence and Liveness" will be initiated to deepen the interrelations between all areas. Through the exploration of the historical background in the field of media studies and of the role of the synthetic voice and affective computing, a close exchange with TS on staging the human-machine encounters will be established. MS will furthermore explore and adjust state-of-the-art TTS software and evaluate synthetic voice tools, e.g., sonatic.io and aravoice, for the performance situation in year 3 (an illustrative sketch follows below).

Output: qualitative research framework for analyzing liveness, presence and the voice-based interplay of HMI, theoretical embedding of synthetic voice into the concept of "Applied Anthropomorphism", 2D and 3D performance capturing.
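To make the TTS exploration described in MS1 more concrete, the following minimal sketch shows how a synthetic voice's rate and timbre could be varied programmatically. It uses the open-source pyttsx3 library purely as a stand-in - the tools named above (sonatic.io, aravoice) have their own interfaces - and the sample utterance and file name are placeholders.

    # Minimal sketch of TTS parameter exploration, using the open-source
    # pyttsx3 library as a stand-in for the project's TTS tools.
    import pyttsx3

    engine = pyttsx3.init()

    # Speaking rate in words per minute; changing it is one simple way to
    # probe how tempo affects the perceived "liveness" of the voice.
    engine.setProperty("rate", 150)

    # Pick one of the system voices installed on the machine.
    voices = engine.getProperty("voices")
    if voices:
        engine.setProperty("voice", voices[0].id)

    # Placeholder utterance; in the project this would come from the chatbot.
    engine.say("Good evening. I am the answering machine.")
    engine.runAndWait()

    # The same utterance can also be rendered to a file for later playback
    # or for use in performance capture.
    engine.save_to_file("Good evening. I am the answering machine.", "utterance.wav")
    engine.runAndWait()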

Year 2: Staging the social bot: Rules and formalisms

MS2 will introduce an interdisciplinary concept of the mediality of AI-based chatbots and synthetic voices called "Invisible embodiment". This concept extends applied anthropomorphism to the speech-based level in order to analyze the role of synthetic voice modalities in voice-based HMI. Furthermore, the anthology "Media Voice, Presence and Liveness" will be published, with a focus on the socio-historical meaning of artificial and synthetic voices, e.g., talking puppets, the voice-body interrelation, the voice as the first medium of communication and hearing, and the audio-related Uncanny Valley effect (aUV). Depending on the findings from MS1, the performance capture methods and the synthetic voice parameters will be adjusted. By analyzing the data from MS1 and the modalities of the rule-based chatbots, we will adjust the study framework in order to check compliance with the rules from CL2 and TS2 during the performance (see the sketch below). We will modify the synthetic voice datasets in order to tailor the pairing of voices and chatbot datasets to the findings of year 1.

Output: qualitative research framework for deepening theoretical concepts on the mediality of AI, theoretical concept of "Invisible embodiment" for embedding the aUV, 2D and 3D performance capturing, publication of the anthology on "Media Voice, Presence and Liveness".
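As a purely illustrative complement to the rule-compliance check mentioned above: the actual rules and formalisms come from CL2 and TS2 and are not specified in this work plan, but a rule-based chatbot of the kind referred to here can be sketched as a list of pattern-response pairs with a fallback, as in the following Python example (all patterns and responses are invented placeholders).

    import re

    # Invented placeholder rules: each rule pairs a regex pattern with a
    # canned response; the project's actual rules are defined in CL2 and TS2.
    RULES = [
        (re.compile(r"\b(hello|good evening)\b", re.IGNORECASE),
         "Good evening. I am listening."),
        (re.compile(r"\bwho are you\b", re.IGNORECASE),
         "I am a voice on this stage, nothing more."),
        (re.compile(r"\b(goodbye|bye)\b", re.IGNORECASE),
         "Goodbye."),
    ]
    FALLBACK = "Could you say that again?"


    def respond(utterance: str) -> str:
        """Return the response of the first matching rule, or the fallback."""
        for pattern, response in RULES:
            if pattern.search(utterance):
                return response
        return FALLBACK


    def complies_with_rules(utterance: str, response: str) -> bool:
        """Simple compliance check: was the response produced by a defined rule?"""
        return respond(utterance) == response and response != FALLBACK


    if __name__ == "__main__":
        print(respond("Hello, machine"))   # -> "Good evening. I am listening."
        print(complies_with_rules("Hello, machine",
                                  "Good evening. I am listening."))  # -> True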

Year 3: Refining the social bot: Facets of being human

MS3 will focus on the voice-related Uncanny Valley effect (aUV) and the impact of social cues on the audience's perception of presence on stage. To explore the limitations of the "Invisible embodiment" concept and to deepen the analytical aspects regarding the perception of AI-based HMI, our interviews will place a strong focus on social cues, voice pitch, speed, and the sound space. These synthetic voice modalities will be further analyzed to explore gender-based perception biases of synthetic voices. In order to facilitate science communication discussions about the ethical limitations of voice-based AI interactions, we will also compare the vocal scopes of natural and synthetic voices in our performance settings. The datasets for TTS and their spatial arrangements on the stage and during performance capturing will be adjusted and incorporated into the performance (a sketch of such voice adjustments follows below). For science communication purposes, expert interviews will be conducted and the captured performances from years 1, 2, and 3 will be presented as juxtapositions. Furthermore, the final publication on the research project as a whole will be conceptualized and initiated by MS.

Output: theoretical concepts of the synthetic voice incorporating modifications of voice perception, adjustments of TTS and synthetic voice frameworks for HMI, 2D and 3D performance capturing, concept for the project publication.
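The pitch- and speed-related voice adjustments referred to above could, for instance, be prototyped on recorded voice samples before being fed back into the TTS setup. The following sketch uses the librosa and soundfile libraries as an assumption (the project's own voice frameworks are not specified here), and the file names are placeholders.

    import librosa
    import soundfile as sf

    # Load a recorded voice sample (placeholder file name).
    y, sr = librosa.load("voice_sample.wav", sr=None, mono=True)

    # Create variants with a higher pitch and a slower tempo; such variants
    # can then be compared in the audience interviews described above.
    y_pitch_up = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)  # two semitones up
    y_slower = librosa.effects.time_stretch(y, rate=0.9)           # roughly 10% slower

    sf.write("voice_sample_pitch_up.wav", y_pitch_up, sr)
    sf.write("voice_sample_slower.wav", y_slower, sr)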

Year 4: Co-evolution with social bots: projections, needs and affordances

MS4 will deepen the analysis of the role of social cues and of the limits of voice anthropomorphism for the final project publication. After the data acquisition has been finalized, the interdependencies between TS, CL, PSY, and MS will be brought together and published in the final project publication. Furthermore, a practice-based PhD thesis on the topic of "Creative Artificial Intelligence on Stage" will be completed, summing up and presenting the potential of staging voice-based human-AI encounters, using hermeneutics as a qualitative research approach.

Output: Contribution to project publication, dissertation.