An eye for an eye-track!

9 December 2016 / Tim Holmes

Last week we had the Acuity Christmas party and, as usual, we closed the office down for the day to immerse ourselves in a range of activities as a well-earned break from all our research into visual attention and decision making. It turns out that it’s kind of hard to stop thinking about eye-movements and attention when you’re hurtling around a go-kart track with your co-workers trying to sneak past you on either side, or when you’re in a former cold-war bunker trying to target your laser gun on just the right part of someone’s body suit, or even when you’re standing on top of the O2 and surveying London from the most open and panoramic viewing platform in town.

Now, I’m not just listing everything from our fun day out as a way to solicit job applications from intelligent fun-seekers (although feel free to send me a CV if we sound like your kind of company). I’m listing these activities because they have a couple of things in common. Firstly, they all place specific demands on your visual system, and secondly, they all represent real-world examples of the type of experiences designers and marketers are currently attempting to create in immersive virtual reality. They all have something else in common too: they require you to move your eyes AND your head, and they are great examples of why head-tracking alone is insufficient if you really want to know how someone is engaging with, and making decisions about, an environment, whether that’s a first-person shooter game, a 360-degree video tour of a potential house purchase or a journey through a VR supermarket.

Before I talk about head and eye-movements, I should probably clarify what I mean by immersive virtual reality. For years eye-tracking was pretty much confined to screen-based set-ups, from small monochrome CRTs in dusty university labs to the fancy large-screen projected displays used in retail research facilities. Although the quality and ecological validity of these set-ups varied, they were all a form of virtual reality, because even if the images were life-size and photo-realistic they still weren’t the real thing and, most importantly, the edges of the image were usually quite visible, meaning that viewers could look outside of the image or scene. Moreover, research suggests that screen presentation introduces a range of biases in eye-movements, most notably a central bias (a tendency to look towards the middle of the screen), and so this kind of virtual reality should NEVER be confused with real-world eye-tracking!

With headsets like the Oculus Rift, HTC Vive, Fove and Daydream we can now experience VR in a very different way. The images pretty much fill the field of view, and in most cases the scene appears to wrap around the headset wearer, fully enclosing them in a virtual world. This wrap-around effect is, of course, just an effect, because the images remain on the screen in front of the viewer’s eyes, and head-movement sensors are used to change that image as the viewer turns their head up and down or from left to right to look at what’s around or behind them. It is this combination of filling the field of view and being able to look around a 360-degree scene that generates the sense of immersion, and from then on it’s up to the content provider to engage the wearer with that environment.

This ability to know which region of the 360-degree scene a person is oriented towards from their head movement alone can easily be used to suggest where a person might be looking, in the same way that point-of-view camera glasses like the new Snapchat Spectacles can. But, to pick up on a point I made in my last post, it is just that – a suggestion of where they MIGHT be looking. To be more accurate, it gives you an indication of the field of view, that is, the parts of the scene that are even candidates for being looked at.

Our eyes can move independently of our heads, and to be honest it’s a good job they can, because if they couldn’t we’d probably spend most of our waking hours feeling quite nauseous as we constantly moved our heads from side to side to explore the world around us. Our visual system uses eye-movements to compensate for those head movements, keeping objects of interest stable on the retina and allowing us to continue to interact with the world as we move through it. This independence of movement also means we can, of course, move our eyes without moving our head at all, something we do all the time when we’re watching TV and checking Twitter, for example.

The point here is that tracking head-movement simply gives you a “potential to be seen” measure rather than an actual “looked at” measure, in rather the same way that a website visit gives an ad the potential to be seen, but in no way guarantees that the ad will actually be looked at or clicked on by the user. Because we need to use the centre of the visual field, the part served by the fovea, to examine objects in detail, we need to know where in the visual scene the fovea is being directed in order to know what is actually being looked at, and for that you need to track the eyes. The fovea covers a diameter of around 2 degrees, or about the size of your thumbnail at arm’s length, whereas the human visual field without head-movement is around 120-140 degrees across, leaving an awful lot of room for error if you’re relying on head position alone. This is really important for anyone using head-mounted VR as a research tool or generating content for immersive VR, and presumably it’s the reason that Oculus recently issued an open call to universities to collect data about eye-movements in both the real and virtual worlds. In order to know if your 360-degree content is actually going to achieve its goal, you need to know whether you have built in sufficient cues to direct visual attention to the right part of the scene, and this requires an understanding of both the head AND eye-movements.
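To put rough numbers on that “room for error”, here is a back-of-the-envelope sketch in Python. It rests on a few assumptions of mine rather than measurements: the thumbnail width (2 cm) and arm’s length (60 cm) are illustrative guesses, the 130-degree field is just the midpoint of the range quoted above, and both the fovea and the visual field are treated as simple circular cones, which the real visual field is not.

```python
import math

def visual_angle_deg(size, distance):
    """Angle (in degrees) subtended by an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(size / (2 * distance)))

def cone_solid_angle(full_angle_deg):
    """Solid angle (steradians) of a circular cone with the given full opening angle."""
    half = math.radians(full_angle_deg / 2)
    return 2 * math.pi * (1 - math.cos(half))

FOVEA_DEG = 2.0    # foveal diameter quoted in the post
FIELD_DEG = 130.0  # assumed midpoint of the 120-140 degree range

# Hypothetical thumbnail: 2 cm wide, held 60 cm from the eye.
thumb_deg = visual_angle_deg(2.0, 60.0)

# Share of the field covered by the fovea, along one axis and by area.
linear_share = FOVEA_DEG / FIELD_DEG
area_share = cone_solid_angle(FOVEA_DEG) / cone_solid_angle(FIELD_DEG)

print(f"thumbnail at arm's length: {thumb_deg:.2f} degrees")
print(f"fovea share of field width: {linear_share:.1%}")
print(f"fovea share of field area:  {area_share:.3%}")
```

Under these assumptions the thumbnail comes out at a little under 2 degrees, the fovea spans roughly 1.5% of the width of the field, and it covers well under a tenth of a percent of its area. In other words, knowing where the head is pointing still leaves almost the whole field as a candidate for where the eyes actually are.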

In my previous post, I talked about the way predictive attention models can sometimes be made to look like eye-tracking, and of course the same is true of head-movements. If you make the clearly false assumption that attention is always located at the centre of the visual field, then you can use the head-movement data to position a cursor and generate a heat-map. But you are potentially missing a big part of the story, not least any measure of how well elements in the periphery are attracting attention, something the eye-movements will give you a clear indication of. The assumption that field-of-view tracking can accurately report whether a product was seen in a 360-degree video or a virtual store is simply wrong, and we only have to look at all the real-world shopper behaviour we have captured using eye-tracking glasses to know this. Virtual reality might seem super cool and sexy right now, but that’s not a reason to ignore what we know to be true: if you want to know where someone looked, you need to track eyes.
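For anyone who likes to see this in code, here is a toy sketch of how a head-only heat-map and a gaze heat-map can disagree. Everything in it is hypothetical: the sample format (head yaw, head pitch, eye-in-head yaw, eye-in-head pitch, all in degrees), the function names and the three made-up samples are mine, and simply adding eye angles to head angles is a rough approximation rather than how any particular headset computes gaze.

```python
from collections import Counter

def cell(yaw_deg, pitch_deg, cell_size_deg=10):
    """Map a direction to a grid cell of a 360-degree scene."""
    yaw = (yaw_deg + 180.0) % 360.0 - 180.0    # wrap yaw into [-180, 180)
    pitch = max(-90.0, min(90.0, pitch_deg))   # clamp pitch to [-90, 90]
    return (int(yaw // cell_size_deg), int(pitch // cell_size_deg))

def heatmaps(samples, cell_size_deg=10):
    """Return (head_only, gaze) hit counts per grid cell."""
    head_only, gaze = Counter(), Counter()
    for head_yaw, head_pitch, eye_yaw, eye_pitch in samples:
        # Head-only: assume attention sits at the centre of the view.
        head_only[cell(head_yaw, head_pitch, cell_size_deg)] += 1
        # Gaze: head direction plus eye-in-head offset (approximation).
        gaze[cell(head_yaw + eye_yaw, head_pitch + eye_pitch, cell_size_deg)] += 1
    return head_only, gaze

# Made-up samples: head roughly straight ahead, eyes about 25 degrees to the right.
samples = [
    (0.0, 0.0, 25.0, -5.0),
    (2.0, 1.0, 24.0, -6.0),
    (5.0, 0.0, 21.0, -4.0),
]

head_only, gaze = heatmaps(samples)
print("head-only hot cell:", head_only.most_common(1))
print("gaze hot cell:     ", gaze.most_common(1))
```

With these invented numbers the head-only map puts all three samples in the cell straight ahead, while the gaze map puts them two cells to the right and one down. The head data says the viewer was looking at the middle of the scene; the eyes say otherwise.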

If you want to keep up to date with all things eye-tracking or VR related, follow me on Twitter for much, much more, and I’ll be returning to the subject of eye-tracking in VR next year when we’ve had a chance to play with the Fove!
