Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh. **VQA: Visual Question Answering**. The IEEE International Conference on Computer Vision (ICCV), pp. 2425-2433, 2015.
Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh. **Hierarchical Question-Image Co-Attention for Visual Question Answering**. Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS’16), Daniel D. Lee, Ulrike von Luxburg, Roman Garnett, Masashi Sugiyama, Isabelle Guyon (Eds.). Curran Associates Inc., USA, pp. 289-297, 2016.
Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross B. Girshick. **Inferring and Executing Programs for Visual Reasoning**. The IEEE International Conference on Computer Vision (ICCV), pp. 3008-3017, 2017.