|
Research
My research interests revolve around Machine Learning and its applications. At present, I am working on diffusion-based image generative models with amazing colleagues at Brain. I have also been actively working on non-autoregressive generative models for text. As an undergrad, I interned at Mila, where I worked on analysing the sample complexity of learning algorithms for grounded language learning.
|
|
Cascaded Diffusion Models for High Fidelity Image Generation
Jonathan Ho*, Chitwan Saharia*, William Chan, David J. Fleet, Mohammad Norouzi, Tim Salimans
Pre-print
Paper / Page
We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation challenge, without any assistance from auxiliary
image classifiers to boost sample quality. We outperform BigGAN-deep and VQVAE-2 on FID and classification accuracy scores (CAS).
|
|
Image Super-Resolution via Iterative Refinement
Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, Mohammad Norouzi
Pre-print
arXiv / Page
We adapt score-matching-based diffusion models for image super-resolution. We achieve a fool rate of 50% on face super-resolution and 40% on ImageNet super-resolution. We cascade multiple super-resolution
models to efficiently generate 1024x1024 unconditional faces and 256x256 class-conditional natural images.
|
|
Non-Autoregressive Machine Translation with Latent Alignments
Chitwan Saharia*, William Chan*, Saurabh Saxena, Mohammad Norouzi
Empirical Methods in Natural Language Processing (EMNLP), 2020
arXiv / Talk
We apply latent-alignment-based models to non-autoregressive machine translation. We achieve SOTA on WMT14 EnDe for single-step generation using CTC, and SOTA for iterative generation using the Imputer.
|
|
Imputer: Sequence Modelling via Imputation and Dynamic Programming
William Chan, Chitwan Saharia, Geoffrey Hinton, Mohammad Norouzi, Navdeep Jaitly
International Conference on Machine Learning (ICML), 2020
arXiv / Talk / Media
We introduce a semi-autoregressive model for speech recognition that uses a tractable dynamic programming algorithm to approximately marginalize over all latent alignments and generation orders.
|
|
Combating False Negatives in Adversarial Imitation Learning
Konrad Zolna*, Chitwan Saharia*, Leonard Boussioux*, David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio
Association for the Advancement of Artificial Intelligence (AAAI) Student Abstract, 2020
(Extended Version) NeurIPS Deep RL Workshop, 2020
arXiv / Poster
We study the impact of false negatives in the GAIL algorithm and present a method to diagnose them. We further present a solution, Fake Conditioning, which improves the sample complexity of human demonstrations by an order of magnitude compared to Behavioral Cloning.
|
|
A Tale of Two Modalities for Video Captioning
Pankaj Joshi*, Chitwan Saharia*, Vishwajeet Singh, Digvijaysingh Gautam, Ganesh Ramakrishnan, Preethi Jyothi
Workshop on Multi-modal Video Analysis and Moments in Time Challenge, ICCV, 2019
Paper / Poster
We study the impact of audio and visual modalities in learning models for Video Captioning.
|
|
BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning
Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, Yoshua Bengio
International Conference on Learning Representations (ICLR), 2019
arXiv / Code / Poster
We present a platform to study the sample efficiency of grounded language learning. We include a number of tasks of varying complexity and present a rigorous sample complexity benchmark on each task.
|
Huge thanks to Jon Barron for the template!
|
|