How to control a robot with your hands: merging iOS Vision and ROS

19 Jul 2020

Often, while developing and discussing robot movements with colleagues, I find myself moving my hands to mimic the pose and motion of a manipulator. It's simply the most natural way for humans to describe movement, and that's where I got the idea for this demo: what if I could move a robot manipulator just by moving my hands? In this post I'll outline some of the techniques and resources employed in this demo.

Controlling the PRBT with hand gestures

High-Level Architecture

High level overview of the system

The main components in this setup are:

  • Sensor & Vision: iPad Pro 2018
  • Robot system & control: ROS PC

The two communicate through the rosbridge protocol (the one used by roslibjs), thanks to the great RBSManager Swift framework.
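Under the hood, this means every publish from the iPad is a small JSON envelope sent over a websocket, wrapping the ROS message. RBSManager handles this for you, but as a rough sketch of what goes over the wire (the `/cmd_vel` topic name here is a placeholder, not the demo's actual topic):

```swift
import Foundation

// Sketch of the rosbridge "publish" envelope carrying a geometry_msgs/Twist.
// The topic name is an assumption for illustration; use whatever topic your
// jog node actually subscribes to.
struct Vector3: Codable { var x, y, z: Double }
struct Twist: Codable { var linear, angular: Vector3 }
struct PublishEnvelope: Codable {
    var op = "publish"
    var topic: String
    var msg: Twist
}

func twistPublishJSON(topic: String, x: Double, y: Double, z: Double) throws -> Data {
    let twist = Twist(linear: Vector3(x: x, y: y, z: z),
                      angular: Vector3(x: 0, y: 0, z: 0))
    return try JSONEncoder().encode(PublishEnvelope(topic: topic, msg: twist))
}
```

This payload is what a `rosbridge_websocket` server decodes and republishes as a native ROS message on the other side.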

Vision System: iOS 14

The vision system leverages the new iOS 14 Vision APIs. To get started with them, I recommend watching this year's WWDC session.

Hand Pose landmarks, from WWDC 2020.

The interesting part here is the ability to detect specific landmarks within a hand. In this demo I decided to control the robot in the following manner:

  • Open hand -> no robot control; landmark trackers shown in red.
  • Closed fist -> control the robot. I chose this because it was both easy to develop and very intuitive, like grabbing the robot directly; trackers shown in green.

In order to detect these different hand poses I retrieved the handLandmarkKeyWrist landmark (for the wrist) and handLandmarkKeyLittleTIP (for the little finger's tip).

// Get points for the wrist and the little finger.
let wristPoints = try observation.recognizedPoints(forGroupKey: .all)
let littleFingerPoints = try observation.recognizedPoints(forGroupKey: .handLandmarkRegionKeyLittleFinger)

// Look for the specific landmarks we care about.
guard let wristPoint = wristPoints[.handLandmarkKeyWrist],
      let littleTipPoint = littleFingerPoints[.handLandmarkKeyLittleTIP] else {
    return
}

Applying some heuristics to these landmarks, I was able to detect the hand's closed/open status with reasonable reliability.
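Those heuristics aren't shown above, but the core idea can be sketched as a distance check in Vision's normalized coordinate space: in a closed fist, the little-finger tip ends up close to the wrist. The threshold below is an illustrative assumption, not the value used in the demo:

```swift
import Foundation

// Hypothetical open/closed heuristic: when the little-finger tip is close
// to the wrist (in normalized image coordinates, 0...1), treat the hand
// as a fist. The 0.25 threshold is an assumption for illustration.
enum HandState {
    case open
    case closed
}

func handState(wrist: CGPoint, littleTip: CGPoint, threshold: CGFloat = 0.25) -> HandState {
    let dx = wrist.x - littleTip.x
    let dy = wrist.y - littleTip.y
    let distance = (dx * dx + dy * dy).squareRoot()
    return distance < threshold ? .closed : .open
}
```

In the real app you would also want to gate on each recognized point's confidence before trusting its location, since low-confidence landmarks jitter badly.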

The next step is computing the positional difference with the help of a local variable, and finally sending it to the rosbridge server as a ROS Twist message.

wristCGPoint = CGPoint(x: wristPoint.location.x, y: 1 - wristPoint.location.y)
let oldWristValues = CGPoint(x: currentWristPoint?.x ?? 0.0, y: currentWristPoint?.y ?? 0.0)
currentWristPoint = wristCGPoint
if gestureProcessor.state == .pinched {
    // Note the multipliers at the end, which convert from pixels to reasonable Twist values.
    let dx = ((currentWristPoint?.x ?? 0.0) - oldWristValues.x) * 3550
    let dy = ((currentWristPoint?.y ?? 0.0) - oldWristValues.y) * 3450
    // Assigning x and y depends on screen orientation; definitely a WIP.
    self.sendTwistMessage(x: Float(-dy), y: 0, z: Float(-dx), roll: 0, pitch: 0, yaw: 0)
}
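As the comment notes, the axis assignment depends on screen orientation. One way to make that explicit, purely as a sketch of how it could be factored out (the second orientation's mapping is an assumption; the demo only hard-codes one), is a small mapping function:

```swift
import Foundation

// Maps a wrist delta in view coordinates to the robot's Cartesian jog axes.
// The first case mirrors the hard-coded assignment in the snippet above;
// the mirrored case is an illustrative assumption.
enum ScreenOrientation { case landscapeLeft, landscapeRight }

func jogDeltas(dx: CGFloat, dy: CGFloat,
               orientation: ScreenOrientation) -> (x: Float, z: Float) {
    switch orientation {
    case .landscapeLeft:
        return (x: Float(-dy), z: Float(-dx))
    case .landscapeRight:
        return (x: Float(dy), z: Float(dx))
    }
}
```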

At this point all that is needed is properly setting up the robot system to expect Twist messages.

Robot System: ROS & PRBT Robot

The robot I'm most familiar with is the PRBT, thanks to its native ROS support and simple driver (full disclosure: it's what I do for a living). In this demo, I used the pilz_teach package for teleoperation capabilities.

The PRBT robot shown in RViz

Once the pilz_teach package is compiled, you need to launch the prbt_jog_arm_support package to bring up the robot with the control interface ready for jogging. This is important: the standard prbt_moveit_config bringup will not work for jogging.

roslaunch prbt_jog_arm_support prbt_jog_arm_sim.launch

Lastly, you also need to bring up the teleoperation nodes:

roslaunch pilz_teleoperation key_teleop.launch

And for communication purposes, don't forget to run the rosbridge websocket server:

roslaunch rosbridge_server rosbridge_websocket.launch

And that's about it! You can now feel like a proper Jedi controlling your robot with hand movements 🧠.

The code in this article was generated using Codye