The Multimodal Deep Learning group aims to develop deep learning methods that observe and process multimodal input from the environment, draw further connections through inference, and communicate the system's output to the user. The group's current research focus is the interplay between vision and language across several tasks. The group's research is broadly divided into three subfields: zero-shot learning, conditional image synthesis, and deeply explainable artificial intelligence.
Zero-Shot Learning: Zero-shot learning for image classification involves learning a model of the environment from a set of observations that belong to a certain set of classes. The main challenge of the zero-shot learning task is that the sets of classes at training time and at test time are disjoint. Since classic supervised learning algorithms, which rely on the full set of class labels, cannot be employed for zero-shot learning, we use language as auxiliary information to build structure in the label space. This work has been published at CVPR 2015, CVPR 2016 and CVPR 2017.
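To make the label-space idea concrete, here is a minimal sketch of zero-shot prediction with a bilinear compatibility function between image features and language-derived class embeddings. The dimensions, the random data and the matrix `W` are illustrative placeholders rather than the published models.

```python
import numpy as np

# A minimal sketch of zero-shot classification with a bilinear
# compatibility function, in the spirit of label-embedding methods.
# All dimensions and data below are illustrative placeholders.

rng = np.random.default_rng(0)

d_img, d_txt = 2048, 300        # image and class-embedding dims (assumed)
n_test, n_unseen = 5, 10        # test images and unseen classes

W = rng.standard_normal((d_img, d_txt)) * 0.01  # learned on seen classes
x = rng.standard_normal((n_test, d_img))        # image features, e.g. from a CNN
phi = rng.standard_normal((n_unseen, d_txt))    # class embeddings from language

# Compatibility score F(x, y) = x^T W phi(y) for every image/class pair.
scores = x @ W @ phi.T                          # shape: (n_test, n_unseen)

# Each test image is assigned the unseen class with the highest compatibility.
pred = scores.argmax(axis=1)
print(pred)
```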
Conditional Image Synthesis: As deep learning frameworks have become deeper and more expressive, learning representations that can be used to visualize what a network has learned has become an increasingly active topic. Generative image synthesis is a by-product of these efforts, where the aim is to use deep networks to generate images from scratch that a human might mistake for real. The group's focus on this topic is automatic image synthesis, or image feature synthesis, conditioned on language in the form of detailed visual descriptions. This work has been published at ICML 2016 and NIPS 2016.
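As one illustration of how synthesis can be conditioned on language, the sketch below concatenates a noise vector with a compressed text embedding before upsampling it into an image. The layer sizes, the 16x16 output resolution and the `CondGenerator` name are assumptions made for this toy example, not the published architecture.

```python
import torch
import torch.nn as nn

# A minimal sketch of a text-conditional generator: the noise vector is
# concatenated with a compressed text embedding before deconvolution.
# All layer sizes here are illustrative, not the published architecture.

class CondGenerator(nn.Module):
    def __init__(self, z_dim=100, txt_dim=1024, txt_proj=128):
        super().__init__()
        self.project = nn.Linear(txt_dim, txt_proj)  # compress the description
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim + txt_proj, 256, 4, 1, 0), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(True),
            nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh(),  # 16x16 RGB output
        )

    def forward(self, z, txt):
        cond = torch.cat([z, self.project(txt)], dim=1)
        return self.net(cond.unsqueeze(-1).unsqueeze(-1))

g = CondGenerator()
img = g(torch.randn(2, 100), torch.randn(2, 1024))
print(img.shape)  # torch.Size([2, 3, 16, 16])
```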
Deeply Explainable Artificial Intelligence: One increasingly popular aspect of deep learning is understanding the internal decision process of a network. A deep network that can explain its own reasoning and decision process would be more trustworthy. These explanations come either in the form of language or as visual justifications such as machine attention. The group focuses on generating visual explanations and pointing to the evidence for a classification decision of a deep multimodal learning framework. This work has been published at ECCV 2016.
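The following toy example illustrates the attention-style "pointing" idea in its simplest form: each spatial region of a convolutional feature map is scored against a query vector, and a softmax over the scores indicates which regions support the decision. All shapes and data are illustrative placeholders, not the published method.

```python
import numpy as np

# A minimal sketch of attention-based "pointing": score each spatial
# region of a feature map against a query vector; the softmax weights
# indicate which regions provide evidence for the decision.

rng = np.random.default_rng(0)

regions = rng.standard_normal((49, 512))  # 7x7 feature map, 512-d per region
query = rng.standard_normal(512)          # e.g. the decision/class embedding

scores = regions @ query
attn = np.exp(scores - scores.max())
attn /= attn.sum()                        # attention weights over 49 regions

evidence = int(attn.argmax())
print(f"most supportive region: {evidence} (weight {attn[evidence]:.3f})")
```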
Members
- Dr. Zeynep Akata, Senior Researcher
- Mr. Yongqin Xian, PhD student
Publications
- Gaze Embeddings for Zero-Shot Image Classification. Nour Karessli, Zeynep Akata, Bernt Schiele, Andreas Bulling. IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 (Spotlight Presentation).
- Zero-Shot Learning – The Good, the Bad and the Ugly. Yongqin Xian, Bernt Schiele, Zeynep Akata. IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017.
- Exploiting Saliency for Object Segmentation from Image Level Labels. Seong Joon Oh, Rodrigo Benenson, Anna Khoreva, Zeynep Akata, Mario Fritz, Bernt Schiele. IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017.
- Learning What and Where to Draw. Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele and Honglak Lee. Neural Information Processing Systems, NIPS 2016 (Oral Presentation).
- Generating Visual Explanations. Lisa Hendricks, Zeynep Akata, Marcus Rohrbach, Jeff Donahue, Bernt Schiele and Trevor Darrell. European Conference on Computer Vision, ECCV 2016.
- Generative Adversarial Text to Image Synthesis. Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Honglak Lee and Bernt Schiele. International Conference on Machine Learning, ICML 2016 (Oral Presentation).
- Learning Deep Representations of Fine-Grained Visual Descriptions. Scott Reed, Zeynep Akata, Honglak Lee and Bernt Schiele. IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016 (Spotlight Presentation).
- Multi-Cue Zero-Shot Learning with Strong Supervision. Zeynep Akata, Mateusz Malinowski, Mario Fritz and Bernt Schiele. IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016 (Spotlight Presentation).
- Latent Embeddings for Zero-Shot Classification. Yongqin Xian, Zeynep Akata, Gaurav Sharma, Quynh Nguyen, Matthias Hein and Bernt Schiele. IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016 (Spotlight Presentation).
- Evaluation of Output Embeddings for Image Classification. Zeynep Akata, Scott Reed, Daniel Walter, Honglak Lee and Bernt Schiele. IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015.
- Label-Embedding for Image Classification. Zeynep Akata, Florent Perronnin, Zaid Harchaoui, Cordelia Schmid. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 38, No. 7, July 2016.
- Label-Embedding for Attribute-Based Classification. Zeynep Akata, Florent Perronnin, Zaid Harchaoui and Cordelia Schmid. IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2013.