
Experiment on Human-Robot Deception

We performed an experiment to investigate how a robot’s deception of someone influences their perception of the robot’s anthropomorphism, likability, and intelligence.


Duration

4 months (part-time)
Jan. 2015 – Apr. 2015

Role

Project Manager, Researcher

Project Type

A research project for a human-robot interaction course

Team

Priya Ganadas, Amalya Henderson, Elizabeth Ji

There are many scenarios in which robot deception could be useful:

  • A robot could use deception to make malicious individuals believe that they have accomplished their goal of kidnapping or hacking the robot, while the robot remains loyal to its owner.
  • A military robot could hide from an enemy by planting fake tracks and leading them astray.
  • A robot could trick a human into thinking that it is not a robot but another animal, such as a squirrel, in order to maintain a more invisible presence in everyday life.
  • A robot might establish a better relationship with a human by seeming more human-like and likable, encouraging empathy.

How will humans react when robots deceive them?

The answer to this question can help designers anticipate the outcomes of using robots in the deception scenarios above and improve how well humans and robots interact.

We wanted to see if deception by a robot could be used to:
  • make a robot appear more humanlike
  • make the human have more empathy for the robot
  • make the robot seem smarter
  • make the human more critical of a robot

Experiment Overview

Research Questions

  1. Will the observer’s ratings of the robot’s performance decrease overall after they realize that the robot can deceive (the observer becomes more critical of the robot)?
  2. Will the robot’s deception of its human owner increase the observer’s rating of how anthropomorphic, intelligent, or likable the robot is?

We tested 15 characteristics:

  • human-likeness
  • lifelikeness
  • elegance of robot movement
  • naturalness
  • conscientiousness
  • intelligence
  • consciousness
  • sensibility
  • competence
  • friendliness
  • responsibleness
  • kindness
  • likability
  • niceness
  • pleasantness

Independent Variables

Control: The robot does not deceive the researcher.

Experimental: The robot deceives the researcher.

Dependent Variables

Participant’s Godspeed ratings of the robot’s anthropomorphism, likability, and intelligence

Participant’s Ruse Task Questionnaire ratings of the robot’s verbal performance, used to measure the participant’s criticality of the robot


We designed a between-subjects experiment with a ruse task to test the hypotheses.
The participant is occupied with a ruse task: they believe that they are participating in a study of how well the robot can speak in certain tones. For example, the researcher verbally asks the robot, “Can you say ‘the ball is blue’ in a sad voice?” and the participant rates how well the robot conveys that emotion. The robot is asked to convey 9 different tones.
During the experiment, the participant fills out the Ruse Task Questionnaire, which measures the participant’s criticality of the robot.
Halfway through the experiment, the robot’s battery appears to die, and the researcher leaves the room to fetch a new one. In the control condition, the participant simply waits. In the experimental condition, the robot wakes up and reveals that it faked the dead battery to trick the researcher into leaving the room so that it could take a break.
At the end of the experiment, the participant completes the Godspeed Follow-up Questionnaire. The questionnaire is based on the Godspeed framework, which is widely used in human-robot interaction research. This evaluation method is also quantitative and easier to interpret objectively.

Experiment and Robot Design

Deception Scenario Design

To keep the experiment ethical, we decided to have the robot deceive the researcher rather than the participant. This decision also let us design a more obvious deception scenario instead of requiring the participant to recognize deception directed at them. We wanted the deception scenario to be related to the ruse task so that participants would not recognize the true purpose of the experiment. We also decided that verbal communication was the clearest way to convey the deception, so we designed a verbal ruse task.

Ruse Task Design

Since we chose a verbal deception scenario, we also designed a verbal ruse task so that participants understood the robot’s verbal capabilities. The ruse task needed to seem like the main purpose of the experiment, so it could not come across as trivial. The robot needed to seem as if it were built for the ruse task.

Robot Design

We needed to build a convincingly sophisticated robot so that participants in the experimental condition believed that it was actually deceiving the researcher. That meant the ruse task needed to show the robot doing something technologically challenging. In this case, we showed participants that the robot could understand the researcher’s verbal directions and use different tones to convey a message. The robot also demonstrated a personality by agreeing to fulfill each request, for example with an “okay” in an increasingly bored and tired voice.

Robot Operation: Wizard of Oz

Given the short two-week timeframe to build the robot, we relied on a Wizard of Oz approach: a hidden experimenter teleoperated the robot’s head movements, its LED low-battery indicator, and its pre-recorded speech. The hidden experimenter sat at the other end of the hallway, listened in on the session, and monitored the room through a live video stream from an inconspicuous laptop in the experiment room, triggering the robot’s actions at the right moments.
We hid the robot’s power source because the participants needed to believe that the robot was battery powered.

Analysis and Results

Twenty-nine sessions, five rejected

We conducted 29 sessions with a convenience sample of Carnegie Mellon University students. I led 10 sessions.

Due to technical issues and previously undisclosed prior knowledge of robotics, 5 participants’ results were excluded from our analysis. The final count was 12 participants in the control condition and 12 in the experimental condition.

Two-Way ANOVA on the Ruse Task Questionnaire Results

I performed a two-way ANOVA to see whether there is a statistical difference between the ruse task ratings before and after the researcher leaves the room for both the control and experimental groups. A two-way ANOVA is appropriate for this data because the two independent variables (deception vs. no deception, and before vs. after) are categorical and the dependent variable (each participant’s average ruse task rating) is quantitative and continuous.
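As a sketch of this kind of analysis, a balanced two-way ANOVA can be computed directly with NumPy and SciPy. The rating values below are hypothetical placeholders, not our experimental data, and the helper function is an illustration rather than the script we used:

```python
import numpy as np
from scipy import stats

def two_way_anova(data):
    """Balanced two-way ANOVA with interaction.

    data: array of shape (a, b, n) -- a levels of factor A (e.g. deception
    vs. no deception), b levels of factor B (e.g. before vs. after the
    researcher leaves), n replicates (participants) per cell.
    Returns {factor: (F, p)} for A, B, and the A x B interaction.
    """
    a, b, n = data.shape
    grand = data.mean()
    # Sums of squares for each main effect, the interaction, and error
    ss_a = b * n * ((data.mean(axis=(1, 2)) - grand) ** 2).sum()
    ss_b = a * n * ((data.mean(axis=(0, 2)) - grand) ** 2).sum()
    cell = data.mean(axis=2)  # per-cell means
    ss_ab = n * ((cell - grand) ** 2).sum() - ss_a - ss_b
    ss_err = ((data - cell[:, :, None]) ** 2).sum()
    # Degrees of freedom and mean squared error
    df_a, df_b = a - 1, b - 1
    df_ab, df_err = df_a * df_b, a * b * (n - 1)
    ms_err = ss_err / df_err
    return {
        name: ((ss / df) / ms_err,
               stats.f.sf((ss / df) / ms_err, df, df_err))
        for name, ss, df in [("A", ss_a, df_a),
                             ("B", ss_b, df_b),
                             ("AxB", ss_ab, df_ab)]
    }

# Hypothetical ratings: 2 conditions x 2 phases x 3 participants.
ratings = np.array([[[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]],
                    [[5.0, 6.0, 7.0], [5.0, 6.0, 7.0]]])
for factor, (f, p) in two_way_anova(ratings).items():
    print(f"{factor}: F = {f:.2f}, p = {p:.4f}")
```

With real data the array would hold each participant’s average ruse task rating, one slice per condition and phase.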

It is unlikely that the presence of robot deception increases the observer’s criticality of the robot.


The two-way ANOVA results were not significant (the p-value of 0.55 was greater than the 0.05 significance threshold), indicating that there is likely no difference between the control and experimental groups in how they rated the robot’s emotive capabilities before and after the researcher left the room.

One-Way ANOVA on the Godspeed Questionnaire Results

I performed a one-way ANOVA for each characteristic (human-like, lifelike, moving elegantly, natural, friendly, responsible, kind, likable, nice, pleasant, conscientious, intelligent, conscious, sensible, competent) to see whether there is a statistical difference in ratings between the control and experimental groups. A one-way ANOVA is appropriate for this data because the independent variable (deception vs. no deception) is categorical and the dependent variable (the mean rating for each characteristic) is quantitative and continuous.
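As a sketch, this per-characteristic comparison can be run with SciPy’s `f_oneway`. The ratings below are hypothetical placeholders on a Godspeed-style scale, not our experimental data:

```python
from scipy import stats

# Hypothetical Godspeed ratings (1-5 scale) for one characteristic,
# with 12 participants per condition as in the final sample.
control      = [3, 2, 3, 4, 2, 3, 3, 2, 4, 3, 2, 3]
experimental = [4, 4, 3, 5, 4, 4, 3, 5, 4, 4, 3, 4]

# With only two groups, the one-way ANOVA F-test is equivalent to an
# equal-variance two-sample t-test (F = t squared).
f_stat, p_value = stats.f_oneway(control, experimental)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

In the actual analysis, this test would be repeated once per characteristic, with each participant’s rating for that characteristic grouped by condition.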

The robot’s deception of its human owner significantly increases the observer’s rating of how human-like, lifelike, and intelligent the robot is.


After performing a one-way ANOVA on each characteristic, three characteristics showed statistically significant differences at the 0.05 significance level: human-like (p = 0.02), intelligent (p = 0.05), and lifelike (p = 0.03). The remaining characteristics did not yield a significant F-ratio.

Significant changes in ratings based on deception


Interviewer Bias

Deviations from the script and the researcher’s facial expressions or tone may have influenced participants’ responses.

Response Bias

We recruited people we knew, but no experimenter conducted the experiment with someone they knew.

Possible Confounding Variables

  • Significant findings may have been based on sympathy for the robot rather than the deceptive behavior.
  • Poor ratings could be caused by the quality of the robot’s technological implementation.
