Chitwan Saharia

I am a Research Engineer at Google Brain, Toronto.

I finished my undergrad at the Indian Institute of Technology (IIT) Bombay with a major in Computer Science and Engineering. I was advised by Prof. Preethi Jyothi for my bachelor thesis project on Analysing the Impact of Modalities on Video Captioning. I previously interned at Mila, Montreal under the supervision of Prof. Yoshua Bengio and Dzmitry Bahdanau.

Email  /  CV  /  Google Scholar  /  Twitter

Research

My research interests revolve around machine learning and its applications. At present, I am working on diffusion-based image generative models with amazing colleagues at Brain. I have also been actively working on non-autoregressive generative models for text. As an undergrad, I interned at Mila and worked on analysing the sample complexity of learning algorithms for grounded language learning.

Cascaded Diffusion Models for High Fidelity Image Generation
Jonathan Ho*, Chitwan Saharia*, William Chan, David J. Fleet, Mohammad Norouzi, Tim Salimans
Pre-print
Paper / Page

We show that cascaded diffusion models are capable of generating high-fidelity images on the class-conditional ImageNet generation challenge, without any assistance from auxiliary image classifiers to boost sample quality. We outperform BigGAN-deep and VQ-VAE-2 on FID and Classification Accuracy Score (CAS).
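
A rough sketch of the cascading pipeline, with random stand-in samplers in place of the trained diffusion models (the function names, resolutions, and shapes are illustrative assumptions, not our released code):

    import numpy as np

    def sample_base(class_label, rng):
        # Stand-in for the class-conditional base diffusion sampler (e.g. 32x32).
        # A real implementation would run the full reverse diffusion chain.
        return rng.standard_normal((32, 32, 3))

    def sample_sr(low_res, class_label, out_size, rng):
        # Stand-in for a super-resolution diffusion model conditioned on the
        # lower-resolution sample produced by the previous stage.
        return rng.standard_normal((out_size, out_size, 3))

    def cascade(class_label, rng):
        x = sample_base(class_label, rng)        # base sample at 32x32
        x = sample_sr(x, class_label, 64, rng)   # 32 -> 64 stage
        x = sample_sr(x, class_label, 256, rng)  # 64 -> 256 stage
        return x

    img = cascade(class_label=207, rng=np.random.default_rng(0))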

Image Super-Resolution via Iterative Refinement
Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, Mohammad Norouzi
Pre-print
arXiv / Page

We adapt score-matching-based diffusion models to image super-resolution. We achieve a fool rate of 50% on face super-resolution and 40% on ImageNet super-resolution. We cascade multiple super-resolution models to efficiently generate 1024x1024 unconditional face images and 256x256 class-conditional natural images.
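
The sampler in outline: start from Gaussian noise and repeatedly denoise while conditioning on the low-resolution input. This is a hedged sketch with a toy linear noise schedule and a random stand-in for the learned denoiser (predict_noise is an assumed name, not our code):

    import numpy as np

    rng = np.random.default_rng(0)
    T = 1000
    betas = np.linspace(1e-4, 0.02, T)   # toy linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    def predict_noise(y_t, x_lowres, t):
        # Stand-in for the learned U-Net that predicts the noise in y_t,
        # conditioned on the low-resolution input x_lowres and timestep t.
        return rng.standard_normal(y_t.shape)

    def sr_sample(x_lowres, shape):
        y = rng.standard_normal(shape)   # start from pure noise
        for t in reversed(range(T)):
            eps = predict_noise(y, x_lowres, t)
            # Standard DDPM-style posterior mean update.
            y = (y - (1 - alphas[t]) / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
            if t > 0:
                y += np.sqrt(betas[t]) * rng.standard_normal(shape)
        return y

    hi_res = sr_sample(x_lowres=np.zeros((16, 16, 3)), shape=(128, 128, 3))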

Non-Autoregressive Machine Translation with Latent Alignments
Chitwan Saharia*, William Chan*, Saurabh Saxena, Mohammad Norouzi
Empirical Methods in Natural Language Processing (EMNLP), 2020
arXiv / Talk

We apply latent-alignment models to non-autoregressive machine translation. We achieve state of the art on WMT14 En-De for single-step generation using CTC, and state of the art for iterative generation using the Imputer.
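
For the single-step CTC model, decoding is an independent argmax at each position of a length-expanded alignment, followed by the CTC collapsing rule (merge consecutive repeats, then drop blanks). A small self-contained illustration of that rule with toy tokens:

    BLANK = "_"

    def ctc_collapse(alignment):
        # Merge consecutive repeats, then drop blank tokens.
        out, prev = [], None
        for tok in alignment:
            if tok != prev and tok != BLANK:
                out.append(tok)
            prev = tok
        return out

    print(ctc_collapse(["das", "das", "_", "ist", "_", "_", "gut", "gut"]))
    # -> ['das', 'ist', 'gut']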

Imputer: Sequence Modelling via Imputation and Dynamic Programming
William Chan, Chitwan Saharia, Geoffrey Hinton, Mohammad Norouzi, Navdeep Jaitly
International Conference on Machine Learning (ICML), 2020
arXiv / Talk / Media

We introduce a semi-autoregressive model for speech recognition that uses a tractable dynamic programming algorithm to approximately marginalize over all latent alignments and generation orders.
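
Decoding with the Imputer takes a constant number of steps regardless of sequence length: at each step the model scores every still-empty alignment slot and the most confident predictions are committed. A rough sketch with a random stand-in network (the step count, masking scheme, and helper names are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    MASK, VOCAB, LENGTH, STEPS = -1, 100, 24, 8

    def model_logits(alignment):
        # Stand-in for the network that predicts a token distribution at every
        # position, conditioned on the partially filled alignment.
        return rng.standard_normal((len(alignment), VOCAB))

    align = np.full(LENGTH, MASK)
    for step in range(STEPS):
        logits = model_logits(align)
        probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
        conf, pred = probs.max(-1), probs.argmax(-1)
        masked = np.where(align == MASK)[0]
        # Commit roughly a 1/STEPS fraction of the remaining positions per step.
        k = max(1, len(masked) // (STEPS - step))
        commit = masked[np.argsort(-conf[masked])[:k]]
        align[commit] = pred[commit]
    # 'align' now holds a complete alignment; collapsing it (CTC-style) yields the output.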

Combating False Negatives in Adversarial Imitation Learning
Konrad Zolna*, Chitwan Saharia*, Leonard Boussioux*, David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio
Association for the Advancement of Artificial Intelligence (AAAI) Student Abstract, 2020
(Extended version) NeurIPS Deep RL Workshop, 2020
arXiv / Poster

We study the impact of false negatives in the GAIL algorithm and present a method to diagnose them. We further present a solution, Fake Conditioning, which reduces the number of human demonstrations required by an order of magnitude compared to behavioral cloning.

A Tale of Two Modalities for Video Captioning
Pankaj Joshi*, Chitwan Saharia*, Vishwajeet Singh, Digvijaysingh Gautam, Ganesh Ramakrishnan, Preethi Jyothi
Workshop on Multi-modal Video Analysis and Moments in Time Challenge, ICCV, 2019
Paper / Poster

We study the impact of audio and visual modalities on learning models for video captioning.

BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning
Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, Yoshua Bengio
International Conference on Learning Representations (ICLR), 2019
arXiv / Code / Poster

We present a platform to study the sample efficiency of grounded language learning. We include a number of tasks of varying complexity and present a rigorous sample complexity benchmark for each task. A minimal usage example follows below.
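
A minimal usage example, assuming the open-source babyai package and the (older) gym API it was released against; the level name is one of the registered BabyAI environments:

    import gym
    import babyai  # importing registers the BabyAI-* environments with gym

    env = gym.make("BabyAI-GoToRedBall-v0")
    obs = env.reset()
    print(obs["mission"])  # natural-language instruction, e.g. "go to the red ball"
    obs, reward, done, info = env.step(env.action_space.sample())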


Huge thanks to Jon Barron for the template!