I am a post-doc at MIT CSAIL, working with Prof. Costis Daskalakis and Prof. Antonio Torralba.

Prior to that, I spent four wonderful years at UT Austin, working with Prof. Alexandros Dimakis.

I received my undergraduate degree in ECE from the National Technical University of Athens.

I work on practical and theoretical questions revolving around deep generative models. A central thrust of my research is developing principled algorithms for training and sampling generative models in the presence of data corruption.

I am currently on the academic job market, looking for tenure-track faculty positions.

Email: gdaras [at] mit [dot] edu.

Talks

The talk most representative of my latest research is the one I gave at Columbia Engineering, as part of the Workshop on Emerging Trends in AI. You can watch the talk here.

For a full list of the talks that I have given over the years, see below.

• MIT, ML Tea Seminar, 2025
• Mila - Quebec AI Institute, 2025
• Ben-Gurion University, 2025
• Runway ML, 2025
• Applied Inverse Problems Conference (AIP), 2025
• NTU Singapore, 2025
• Simons Institute, Berkeley, 2025 [YouTube Video]
• Columbia University, 2025 [YouTube Video]
• Biomedical and Astronomical Signal Processing (BASP) Conference, 2025
• Harvard University, 2024
• Google DeepMind, London Office, 2024
• Grundfest Lecture Series (UCLA + Caltech), 2024 [YouTube Video]
• Learning on Graphs and Geometry (LoGG) Reading Group, 2024 [YouTube Video]
• Aalto University, 2024
• University of Wisconsin-Madison, MLOPT Idea Seminar, 2023 [YouTube Video]
• UT Austin, GenAI IFML Workshop, 2023
• EleutherAI Diffusion Reading Group, 2023 [YouTube Video]
• Uppsala University, 2023
• Archimedes Research Unit, 2023
• NeurIPS Workshop Oral Presentation, 2022
• Rice University, Imaging and Vision Seminar, 2022

Teaching

Diffusion Models: From Theory to Practice (6.S982): Spring 2025, CSAIL, MIT

I co-designed and co-taught this graduate class with Prof. Costis Daskalakis.

Link to class website.

Syllabus:
  • Lecture 1: Introduction to generative models and their applications (GANs, VAEs, Flows, Diffusion Models, and Inverse Problems).
  • Lecture 2: Deep dive into diffusion models (definition of the forward process, Itô integral, Itô formula, Fokker-Planck equation, reversibility, deterministic samplers, Tweedie's formula, denoising score matching); see the sketch after this list.
  • Lecture 3: Discretization error analysis for diffusion models.
  • Lecture 4: Part I: Learning diffusion models from corrupted data, Part II: Likelihoods and Latent Diffusion.
  • Lecture 5: Flow Matching.
  • Lecture 6: Diffusion models and inverse problems.
  • Lecture 7: Schrödinger bridges.
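
A minimal sketch of two Lecture 2 ingredients, in my own shorthand (noising process x_t = x_0 + σ_t ε with ε ~ N(0, I); not necessarily the notation used in class). Tweedie's formula expresses the posterior mean of the clean sample through the score of the noisy distribution:

\[
  \mathbb{E}[x_0 \mid x_t] \;=\; x_t + \sigma_t^2 \, \nabla_{x_t} \log p_t(x_t).
\]

Denoising score matching makes this learnable: the conditional score \(\nabla_{x_t} \log p_t(x_t \mid x_0) = -\varepsilon / \sigma_t\) is available in closed form, so a network \(s_\theta\) trained by

\[
  \min_{\theta} \; \mathbb{E}_{x_0, \varepsilon, t} \left\| s_\theta(x_t, t) + \frac{\varepsilon}{\sigma_t} \right\|_2^2
\]

recovers the true score \(\nabla_{x_t} \log p_t(x_t)\) at the optimum.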

Advanced Machine Learning: Topics in Unsupervised Learning: Spring 2023, ECE, UT Austin

I was the teaching assistant for this graduate UT Austin class and gave several lectures.

Class description: This is an advanced class focusing on topics in unsupervised learning. We will cover classical and modern techniques for modeling high-dimensional distributions, including directed and undirected graphical models, learning graphical models using feature selection, submodularity, polytopes and combinatorial optimization, and inference in graphical models. Deep generative models covered include Variational Autoencoders, Generative Adversarial Networks, autoregressive models, normalizing flows, and diffusion and score-based models. We will also explore problems in learning causal models and adversarial attacks.
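
As one concrete example of the kind of objective covered, here is the variational autoencoder's evidence lower bound (ELBO), in standard notation (my choice of symbols, not a class-specific formulation):

\[
  \log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] - \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right),
\]

which is maximized jointly over the decoder parameters θ and the encoder parameters φ.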

Publications

For a more comprehensive list of publications, please visit my Google Scholar page.

DataComp-LM: In search of the next generation of training sets for language models

Published in the NeurIPS 2024 Datasets and Benchmarks Track [Paper] [Code]

Citation: Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, Maciej Kilian, Hanlin Zhang, Rulin Shao, Sarah Pratt, Sunny Sanyal, Gabriel Ilharco, Giannis Daras, Kalyani Marathe, Aaron Gokaslan, Jieyu Zhang, Khyathi Chandu, Thao Nguyen, Igor Vasiljevic, Sham Kakade, Shuran Song, Sujay Sanghavi, Fartash Faghri, Sewoong Oh, Luke Zettlemoyer, Kyle Lo, Alaaeldin El-Nouby, Hadi Pouransari, Alexander Toshev, Stephanie Wang, Dirk Groeneveld, Luca Soldaini, Pang Wei Koh, Jenia Jitsev, Thomas Kollar, Alexandros G. Dimakis, Yair Carmon, Achal Dave, Ludwig Schmidt, Vaishaal Shankar, "DataComp-LM: In search of the next generation of training sets for language models", NeurIPS 2024 Datasets and Benchmarks Track

DataComp: In search of the next generation of multimodal datasets

Published as an Oral at NeurIPS 2023 [Paper] [Code]

Citation: Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt, "DataComp: In search of the next generation of multimodal datasets", NeurIPS 2023