Place and Time: Abingdon, Thursday 17 August 2017 from 19:00 for 19:30
Barn Room, Crown and Thistle (18 Bridge St, Abingdon OX14 3HS)
Recent years have seen an explosion of Artificial Intelligence (AI)-based technologies, along with buzzwords like “machine learning”, “computer vision” and “self-driving cars”.
These leaps and bounds can largely be attributed to the neural network, an algorithmic structure inspired by the human brain, a biological tissue that functions through the synchronous and asynchronous activity of networks of elementary units, or neurons. In this talk I will step through how computer vision scientists are using neural networks to teach computers to automatically understand images and videos. The goal of my PhD, however, is not just to get a computer to understand images, but to enable it to converse with a human, and for this the computer must also have a grasp of language.
I will demonstrate how neural networks are being used to parse questions and produce meaningful, correct answers based on the images that the computer “sees”. My ultimate application is to use these advances to develop a smart AI-based chatbot with computer vision “eyes” that will make the lives of visually impaired people easier: from helping them navigate cities and new environments, to guiding them around their homes, helping them make a cup of tea and engaging with them in some chit-chat.
Speaker: Daniela Massiceti
Daniela is currently a DPhil student in the Department of Engineering Science and a graduate member of Pembroke College at the University of Oxford, United Kingdom. She is part of the Torr Vision Group, under the joint supervision of Professor Philip Torr and Dr Stephen Hicks.
Prior to this, she completed an MSc in Neuroscience, also at the University of Oxford, where she graduated with distinction. She obtained her undergraduate degree, a BSc in Electrical and Computer Engineering, at the University of Cape Town, South Africa.
Her current research interests include using vision and language models for semantic scene parsing, in particular towards developing visual prostheses to aid the visually impaired.