Object recognition in humans and deep neural networks via Kate Storrs
LipNet artificial intelligence model for lipreading

Talk

Lipreading and visual brains

24 May 2017

Event times

6.30pm-8.30pm

Cost of entry

Free

IDEA London

London
England, United Kingdom

Address

69 Wilson Street
London
England
EC2A 2BB
United Kingdom

Kate Storrs (University of Cambridge) will talk about closing the loop between biological and artificial vision, followed by Yannis Assael and Brendan Shillingford (University of Oxford), who will present LipNet, an artificial intelligence technique for lipreading.

About

The event is part of a series designed to bring together artists, developers, designers, technologists and industry professionals to discuss the applications of artificial intelligence in the creative industries.

Kate Storrs, Postdoctoral Scientist, MRC Cognition and Brain Sciences Unit, Cambridge

"Closing the loop between biological and artificial vision"

The architecture of modern deep neural networks was inspired by the hierarchical visual systems found in mammalian brains. Where neuroscientists provided the inspiration, engineers perfected the implementation, so that in the last few years we have finally gained a working demonstration of how complex object recognition might be achieved in the brain. As a visual neuroscientist, I'll talk about some of the possibilities that deep learning opens up, from providing testable models of human vision to optimising visual images to create specific patterns of brain activity.
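The idea of optimising an image so that a model produces a chosen pattern of activity can be illustrated with a small sketch. This is not the speaker's method. The "brain" here is a hypothetical toy model: one layer of logistic units with random weights, standing in for measured or modelled responses. The image is tuned by projected gradient descent until the model's responses match a target pattern. The names `optimise_image` and `responses`, the 4×4 image size, the learning rate and the step count are all illustrative assumptions.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def responses(weights, image):
    """Toy stand-in for a brain area (hypothetical): each unit is a logistic
    function of a weighted sum of the pixel values."""
    return [sigmoid(sum(w * p for w, p in zip(row, image))) for row in weights]


def loss(weights, image, target):
    """Half the squared distance between the model's responses and the target pattern."""
    r = responses(weights, image)
    return 0.5 * sum((ri - ti) ** 2 for ri, ti in zip(r, target))


def optimise_image(weights, target, n_pixels, lr=0.1, steps=2000):
    """Projected gradient descent on the pixels so the responses approach `target`."""
    image = [0.5] * n_pixels  # start from a uniform grey image
    for _ in range(steps):
        r = responses(weights, image)
        # dL/dz_i for each unit: response error times the sigmoid derivative r(1 - r)
        delta = [(ri - ti) * ri * (1.0 - ri) for ri, ti in zip(r, target)]
        # Chain rule back to the pixels: dL/dx_j = sum_i delta_i * W_ij
        grad = [sum(d * row[j] for d, row in zip(delta, weights)) for j in range(n_pixels)]
        # Step downhill, then clip every pixel back into the valid range [0, 1]
        image = [min(1.0, max(0.0, p - lr * g)) for p, g in zip(image, grad)]
    return image


rng = random.Random(0)
N_PIXELS, N_UNITS = 16, 4  # a 4x4 "image" and 4 model units (hypothetical sizes)
W = [[rng.uniform(-1, 1) for _ in range(N_PIXELS)] for _ in range(N_UNITS)]
hidden = [rng.random() for _ in range(N_PIXELS)]
target = responses(W, hidden)  # a response pattern that some valid image can produce

start_loss = loss(W, [0.5] * N_PIXELS, target)
best = optimise_image(W, target, N_PIXELS)
print(f"loss: {start_loss:.4f} -> {loss(W, best, target):.4f}")
```

With a trained deep network, the same loop would normally rely on automatic differentiation rather than hand-written gradients. The toy model keeps the chain rule visible.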

Kate Storrs is a visual neuroscientist working towards a fully explicit computational model of what happens between the moment light hits your eye and the moment you consciously recognise an aardvark, a zebra or any other object. She recently finished a postdoctoral internship in the Magic Pony machine learning team at Twitter in London, and is an eager science communicator and artist.

----

Yannis M. Assael and Brendan Shillingford, PhD students at the University of Oxford

"LipNet: End-to-End Sentence-level Lipreading"

Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). However, existing work on models trained end-to-end performs only word classification, rather than sentence-level sequence prediction. Studies have shown that human lipreading performance increases for longer words (Easton & Basala, 1982), indicating the importance of features capturing temporal context in an ambiguous communication channel. Motivated by this observation, we present LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, a recurrent network and the connectionist temporal classification loss, trained entirely end-to-end. To the best of our knowledge, LipNet is the first end-to-end sentence-level lipreading model that simultaneously learns spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet achieves 95.2% accuracy on the sentence-level, overlapped-speaker split task, outperforming experienced human lipreaders and the previous 86.4% word-level state-of-the-art accuracy (Gergen et al., 2016).
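The connectionist temporal classification (CTC) loss scores a label sequence by summing over every frame-level alignment that collapses to it. An alignment collapses by first merging repeated symbols and then deleting blanks. Below is a minimal pure-Python sketch of that forward-algorithm computation, together with simple greedy (best-path) decoding. It assumes that per-frame class probabilities are already available; in a model like LipNet these would come from the convolutional and recurrent layers. It also assumes that class 0 is the blank. The function names, the three-symbol alphabet and the toy probabilities are hypothetical. Greedy decoding is a simplification and may differ from the decoder used in the LipNet work.

```python
import math

BLANK = 0  # index of the CTC blank symbol (assumption: the blank is class 0)


def greedy_decode(frame_probs, alphabet):
    """Best-path decoding: take the argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], None
    for k in best:
        if k != prev and k != BLANK:
            out.append(alphabet[k])
        prev = k
    return "".join(out)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_neg_log_likelihood(frame_probs, label):
    """-log p(label | frames), summed over all alignments via the forward algorithm."""
    ext = [BLANK]
    for k in label:
        ext += [k, BLANK]  # interleave blanks: b l1 b l2 b ...
    S, T = len(ext), len(frame_probs)
    logp = [[math.log(p) if p > 0 else -math.inf for p in row] for row in frame_probs]
    alpha = [-math.inf] * S
    alpha[0] = logp[0][ext[0]]  # a path may start with a blank...
    if S > 1:
        alpha[1] = logp[0][ext[1]]  # ...or with the first label
    for t in range(1, T):
        new = [-math.inf] * S
        for s in range(S):
            terms = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])  # advance by one position
            # Skip a blank only if it does not separate two identical labels
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(terms) + logp[t][ext[s]]
        alpha = new
    # Valid paths end on the last label or on the trailing blank
    end = [alpha[S - 1]] + ([alpha[S - 2]] if S > 1 else [])
    return -logsumexp(end)


alphabet = ["-", "a", "b"]  # index 0 is the blank
frames = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
print(greedy_decode(frames, alphabet))  # ab
nll = ctc_neg_log_likelihood(frames, [1, 2])  # label "ab" as class indices
print(f"{math.exp(-nll):.3f}")  # 0.712
```

Five length-3 alignments collapse to `ab`: `aab`, `abb`, `-ab`, `a-b` and `ab-`. Their probabilities sum to 0.712, and the forward recursion returns this total without enumerating the alignments individually.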

Yannis Assael graduated from the Dept. of Applied Informatics, University of Macedonia, Greece, in 2013. He was awarded a full scholarship to study for an MSc in Computer Science at the University of Oxford, where he received the Tony Hoare Prize for the best overall performance in 2014. In 2015, he continued with an MRes in ML at Imperial College London under the HiPEDS Scholarship. Having obtained the second-highest mark, he returned to the University of Oxford to study for a DPhil in ML under the Oxford-Google DeepMind Graduate Scholarship. Throughout his studies he has participated in more than 50 freelance and consulting projects. His machine learning research has focused on differentiable multi-agent communication and on improving speech recognition with lipreading.

Brendan Shillingford is currently studying for a DPhil specialising in deep learning in the Department of Computer Science at the University of Oxford, where he is a Clarendon Scholar. Previously, Brendan studied statistics and computer science at the University of British Columbia (Vancouver, Canada). His recent work focuses on various aspects of recurrent neural networks and, most recently, automated lipreading.

----

The schedule for the evening is as follows:

6.30pm-7pm Arrive

7pm-7.10pm Introduction

7.10pm-7.50pm First talk (Kate Storrs)

7.50pm-8.30pm Second talk (Yannis Assael and Brendan Shillingford)

#LDNcreativeAI

@elluba

Curators

Luba Elliott
