I am a post-doc at MIT CSAIL, working with Prof. Costis Daskalakis and Prof. Antonio Torralba.

Prior to that, I spent four wonderful years at UT Austin, working with Prof. Alexandros Dimakis.

I received my undergraduate degree in ECE from the National Technical University of Athens.

I work on practical and theoretical questions revolving around deep generative models. A central thrust of my research is developing principled algorithms for training and sampling generative models in the presence of data corruption.

I am currently on the academic job market, looking for tenure-track faculty positions.

Email: gdaras [at] mit [dot] edu.

Talks

The talk most representative of my latest research is the one I gave at Columbia Engineering, as part of the Workshop on Emerging Trends in AI. You can watch the talk here.

For a full list of the talks that I have given over the years, see below.

• MIT, ML Tea Seminar, 2025
• Mila - Quebec AI Institute, 2025
• Ben-Gurion University, 2025
• Runway ML, 2025
• Applied Inverse Problems Conference (AIP), 2025
• NTU Singapore, 2025
• Simons Institute, Berkeley, 2025 [YouTube Video]
• Columbia University, 2025 [YouTube Video]
• Biomedical and Astronomical Signal Processing (BASP) Conference, 2025
• Harvard University, 2024
• Google DeepMind, London Office, 2024
• Grundfest Lecture Series (UCLA + Caltech), 2024 [YouTube Video]
• Learning on Graphs and Geometry (LoGG) Reading Group, 2024 [YouTube Video]
• Aalto University, 2024
• University of Wisconsin-Madison, MLOPT Idea Seminar, 2023 [YouTube Video]
• UT Austin, GenAI IFML Workshop, 2023
• EleutherAI Diffusion Reading Group, 2023 [YouTube Video]
• Uppsala University, 2023
• Archimedes Research Unit, 2023
• NeurIPS Workshop Oral Presentation, 2022
• Rice University, Imaging and Vision Seminar, 2022

Teaching

Diffusion Models: From Theory to Practice (6.S982): Spring 2025, CSAIL, MIT

I co-designed and co-taught this graduate class with Prof. Costis Daskalakis.

Link to class website.

Syllabus:
  • Lecture 1: Introduction to generative models and their applications (GANs, VAEs, Flows, Diffusion Models, and Inverse Problems).
  • Lecture 2: Deep dive into diffusion models (definition of the forward process, Itô integral, Itô formula, Fokker-Planck equation, reversibility, deterministic samplers, Tweedie's formula, denoising score matching); see the sketch after this list.
  • Lecture 3: Discretization error analysis for diffusion models.
  • Lecture 4: Part I: Learning diffusion models from corrupted data, Part II: Likelihoods and Latent Diffusion.
  • Lecture 5: Flow Matching.
  • Lecture 6: Diffusion models and inverse problems.
  • Lecture 7: Schrödinger bridges.
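
A minimal sketch of two Lecture 2 ingredients, in my own shorthand (noising process x_t = x_0 + σ_t ε with ε ~ N(0, I); not necessarily the notation used in class). Tweedie's formula expresses the posterior mean of the clean sample through the score of the noisy distribution:

\[
  \mathbb{E}[x_0 \mid x_t] \;=\; x_t + \sigma_t^2 \, \nabla_{x_t} \log p_t(x_t).
\]

Denoising score matching makes this learnable: the conditional score \(\nabla_{x_t} \log p_t(x_t \mid x_0) = -\varepsilon / \sigma_t\) is available in closed form, so a network \(s_\theta\) trained by

\[
  \min_{\theta} \; \mathbb{E}_{x_0, \varepsilon, t} \left\| s_\theta(x_t, t) + \frac{\varepsilon}{\sigma_t} \right\|_2^2
\]

recovers the true score \(\nabla_{x_t} \log p_t(x_t)\) at the optimum.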

Advanced Machine Learning: Topics in Unsupervised Learning: Spring 2023, ECE, UT Austin

I was the teaching assistant for this graduate UT Austin class and gave several lectures.

Class description: This is an advanced class focusing on topics in unsupervised learning. We will cover classical and modern techniques for modeling high-dimensional distributions, including directed and undirected graphical models, learning graphical models using feature selection, submodularity, polytopes and combinatorial optimization, and inference in graphical models. Deep generative models covered include Variational Autoencoders, Generative Adversarial Networks, autoregressive models, normalizing flows, and diffusion and score-based models. We will also explore problems in learning causal models and adversarial attacks.
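
As one concrete example of the kind of objective covered, here is the variational autoencoder's evidence lower bound (ELBO), in standard notation (my choice of symbols, not a class-specific formulation):

\[
  \log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] - \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right),
\]

which is maximized jointly over the decoder parameters θ and the encoder parameters φ.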

Publications

For a more comprehensive list of publications, please visit my Google Scholar page.

DataComp-LM: In search of the next generation of training sets for language models

Published in the NeurIPS 2024 Datasets and Benchmarks Track [Paper] [Code]

Citation: Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, Maciej Kilian, Hanlin Zhang, Rulin Shao, Sarah Pratt, Sunny Sanyal, Gabriel Ilharco, Giannis Daras, Kalyani Marathe, Aaron Gokaslan, Jieyu Zhang, Khyathi Chandu, Thao Nguyen, Igor Vasiljevic, Sham Kakade, Shuran Song, Sujay Sanghavi, Fartash Faghri, Sewoong Oh, Luke Zettlemoyer, Kyle Lo, Alaaeldin El-Nouby, Hadi Pouransari, Alexander Toshev, Stephanie Wang, Dirk Groeneveld, Luca Soldaini, Pang Wei Koh, Jenia Jitsev, Thomas Kollar, Alexandros G. Dimakis, Yair Carmon, Achal Dave, Ludwig Schmidt, Vaishaal Shankar, "DataComp-LM: In search of the next generation of training sets for language models", NeurIPS 2024 Datasets and Benchmarks Track

DataComp: In search of the next generation of multimodal datasets

Published as an Oral at NeurIPS 2023 [Paper] [Code]

Citation: Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt, "DataComp: In search of the next generation of multimodal datasets", NeurIPS 2023