Maschinelles Lernen Lernen

More memory-efficiency: conversion of weights for large models

June 29, 2023

Working with large language models means making the most of invariably limited hardware. Today we look at converting weights to improve memory efficiency.


Memory-efficient loading of model weights

April 18, 2023

I have been intrigued by what computers can do for most of my life, and these days that includes large language models (LLMs) running on my own computer. It seems that I am not the only one. But one thing about these LLMs is that they are not kidding when they say large. The amount of GPU and CPU memory needed to run them is huge. And, to be perfectly honest, I have a hunch that PyTorch, my favourite modelling library, is not always as economical with memory as it could be.


Making model initialization faster

March 4, 2023

Recently, large pretrained models are all the rage. But they take a long time to initialize. What can we do?


A PyTorch videocourse

Sept. 21, 2022

Earlier this year, I started recording a PyTorch course with the idea of finally offering a PyTorch 101 certificate for those interested. Today, I'm giving you the first lecture video and notebook for free.


What does it mean that Deep Learning is Non-Parametric?

March 19, 2022

Most days, I write here about getting my hands dirty with code and keep mathematical insights for my friends and clients. Today, we instead look at deep learning on a very high, almost philosophical level and discuss a small piece of intuition about it.


Interactive Visualization of PyTorch (JIT) models

March 16, 2022

PyTorch turned five years old recently. On this very day, five years ago, I joined the PyTorch forum and wrote my first post. To celebrate, I am sharing an interactive model visualization in Jupyter Notebooks and GraphViz with you.


PyTorch style advice: Avoid inplace

Oct. 26, 2021

Every now and then, people wonder whether using inplace operations is a good way to reduce the memory consumption of PyTorch models. Let us look at this in some detail.


How many models is ResNet?

Sept. 7, 2021

What is the model when we are using TorchVision's ResNet? And how many?


Book reading group and community resources

Aug. 27, 2021

Sanyam Bhutani organizes a reading group for our book. I had the special honour of chatting about community resources in PyTorch for the kick-off.


Launching my first online course: All about Autograd

July 30, 2021

Today, I'm launching my first course, all about autograd.


TorchDrift and Partial MMD Drift Detection

July 6, 2021

So I have not blogged about TorchDrift yet, even though I have done a lot of writing and talking about it since we released it in March.


Inlining Images in Jupyter Notebooks

June 29, 2021

I like Jupyter Notebooks a lot. While linking to images in Markdown is quite handy, I sometimes want my notebooks to work on their own, without zipping up a lot of files. Because I could not find a tool that does this, I wrote a quick and dirty script to (approximately) inline images.
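The script itself is in the post; as a sketch of the idea, assuming Markdown-style `![alt](path)` links and a caller-supplied byte reader (both choices mine, not necessarily the post's):

```python
import base64
import mimetypes
import re

def inline_images(markdown_text, read_bytes):
    """Replace Markdown image links ![alt](path) with base64 data URIs.

    read_bytes is a callable mapping a path to the raw image bytes,
    which keeps the function easy to test without touching the filesystem.
    """
    def replace(match):
        alt, path = match.group(1), match.group(2)
        mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
        data = base64.b64encode(read_bytes(path)).decode("ascii")
        return "![{}](data:{};base64,{})".format(alt, mime, data)

    return re.sub(r"!\[([^\]]*)\]\(([^)]+)\)", replace, markdown_text)
```

The resulting notebook Markdown is self-contained: viewers render the data URI just like an external image file.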


CoViD19 - Kommunikation, Verlässlichkeit und Unwägbarkeiten

Jan. 28, 2021

There is no shortage of proposals for CoViD19 strategies. But how should we communicate them?


Exploring Python fallback for the JIT

Jan. 26, 2021

One of the key difficulties of the "almost everything can be scripted" promise is what to do with functions the JIT doesn't understand. Instead of re-implementing all of Python, we need to selectively fall back to the Python we have. Join me today in looking at how that can be done.


Python graph operations in the JIT

Jan. 19, 2021

One part of achieving TorchScript's full potential is improving the developer experience; in particular, I would like to make it more accessible from Python. But to know the gap, we need to assess the (near-future, hopefully) status quo. I invite you to join me in this exploration.


TorchScript is what we have, ScriptTorch is what we need

Jan. 1, 2021

The PyTorch JIT and its TorchScript language cover important parts of PyTorch's core goals. But are we integrating them in the right way into PyTorch? Here is a wish and plan for 2021.


The JIT runtime - Calling a ScriptFunction

Dec. 28, 2020

In our series of PyTorch JIT blog posts, we take a close look at what happens behind the scenes when we call a TorchScript function from Python.


Optimizing models using the PyTorch JIT

Dec. 21, 2020

Today we look at TorchScript, the language implemented by the PyTorch JIT ("Just in Time compiler"), PyTorch's solution for deployment and model optimization.


Programming a Turtle in Python

Oct. 30, 2020

Today we'll do something totally different. We build a turtle graphics class in Python.
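As a taste of what such a class can look like, here is a minimal sketch of my own (not the class built in the post), tracking position, heading, and the line segments drawn:

```python
import math

class Turtle:
    """A tiny turtle: tracks position and heading, records line segments."""

    def __init__(self):
        self.x, self.y = 0.0, 0.0
        self.heading = 0.0          # degrees, 0 = pointing along +x
        self.segments = []          # list of ((x0, y0), (x1, y1)) lines drawn

    def forward(self, distance):
        rad = math.radians(self.heading)
        new_x = self.x + distance * math.cos(rad)
        new_y = self.y + distance * math.sin(rad)
        self.segments.append(((self.x, self.y), (new_x, new_y)))
        self.x, self.y = new_x, new_y

    def left(self, angle):
        self.heading = (self.heading + angle) % 360

# Drawing a square brings the turtle back to where it started.
t = Turtle()
for _ in range(4):
    t.forward(10)
    t.left(90)
```

Rendering the recorded segments (with matplotlib, SVG, or anything else) is then a separate concern from the turtle logic itself.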


Coding Video: PyTorch C++ Anomaly Detection

Oct. 29, 2020

Join me implementing PyTorch Anomaly Detection for C++!


PyTorch Autograd Course

Oct. 28, 2020

Coming soon: The ultimate PyTorch autograd course.


Computed Parameters in PyTorch - a hack

Sept. 17, 2020

Sometimes, we want our neural network's parameters to have constraints, e.g. to be positive. PyTorch notoriously doesn't provide an infrastructure for this. We present a gross hack to deliver a neat interface.
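Stripped of the PyTorch plumbing, the underlying idea can be sketched in plain Python; the class and names here are illustrative, not the post's. An unconstrained raw value is stored and optimized, while the constrained value is computed through a transform, here softplus for positivity:

```python
import math

class PositiveParameter:
    """Stores an unconstrained raw value; exposes a strictly positive one.

    An optimizer would update `raw` freely, while `value` is computed
    through softplus, log(1 + exp(raw)), so it is always > 0.
    """

    def __init__(self, initial_value):
        # invert softplus so that value == initial_value at the start
        self.raw = math.log(math.expm1(initial_value))

    @property
    def value(self):
        return math.log1p(math.exp(self.raw))

p = PositiveParameter(2.0)
```

The hack in the post is about making such a computed value look like an ordinary `nn.Parameter` to the rest of PyTorch; the transform itself is the easy part.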


PyTorch best practices

Sept. 11, 2020

We look at some best practices and also try to shed some light on the rationale behind them. Whether this becomes a series or an updated blog post, we will see.


A short guide to using other people's code

July 26, 2020

The other day, I noticed that someone had copied code from me and not cared much about licensing. I found that quite outrageous at first, but while I have some history with licenses and people caring about them, it seems that not many people are aware of licensing. So here are a few thoughts.


Bridging PyTorch and TVM

July 14, 2020

Today we look at how to bridge PyTorch with TVM, using BERT as an example.


Visualizing PyTorch model structure

June 16, 2020

Today we visualize some common neural network structures.


Schätzung der Reproduktionszahl durch das RKI

May 19, 2020

One of the currently most-discussed indicators of the epidemic is the reproduction number R and its estimation by the Robert-Koch-Institut. In this article we retrace, as best we can, the underlying estimation of the time series of new infections. This also allows us to compute analogous indicators for the individual federal states.


Den Rückgang der Epidemie messen

April 26, 2020

Depending on your perspective, the outrage of the week was either the "orgies of discussion about easing restrictions" or that very phrase itself.

In most countries of Western Europe we are fortunately, though probably only just, past the phase of exponential growth in case numbers. For the rising phase of the epidemic, newspaper and online editors have over time adopted or developed illustrative indicators and graphics. How might one meaningfully measure the situation and its development in the phase of decline?
Can this be done only with models, or also fairly directly from the case numbers?


The long tail of the epidemic

April 21, 2020

Thankfully, the measures taken to reduce the transmission of the coronavirus seem to work.
In Germany (and elsewhere) we're seeing a discussion on lifting restrictions.
It would be good to inform the discussion with some quantitative picture of the trade-off here.


Epidemiology - Modeling the Spread of Diseases

March 18, 2020

Epidemiology is the science of the distribution, patterns, and determinants of diseases. We are interested in the distribution here. As in many scientific fields, mathematical modeling plays a large role. We look at a basic model for the spread of disease.


Epidemiologie - Modellieren der Ausbreitung von Krankheiten

March 17, 2020

Epidemiology is the science of the spread of diseases. As in many scientific disciplines, there are mathematical models with which one can try to describe that spread. Today we take a look at a basic model.



Feb. 10, 2020

I have been more than usually quiet here. This is not because I have been writing little, but because I have been writing much. I'm very thrilled to work with Eli Stevens and Luca Antiga on our book, Deep Learning with PyTorch.


Sponsored Open Source Report Oct/Nov 2019

Nov. 22, 2019

I'm experimenting with low-barrier sponsoring of some of my work via GitHub Sponsors. Here is the first report for subscribers.


Building PyTorch on ROCm

Nov. 15, 2019

Today we look at how to build PyTorch on AMD's ROCm.


LibreOffice Translate

Nov. 8, 2019

An extension providing offline Neural Machine Translation in LibreOffice Writer.


Keeping an Eye on the AI

Oct. 2, 2019

Explaining AI outputs has been a topic I have worked on implementing quite a bit. Last May I gave a talk Der KI auf die Finger geschaut (Keeping an eye on the AI) to mathematicians and actuaries at the University of Göttingen.


An efficient implementation of the Sinkhorn algorithm for the GPU

July 5, 2019

Today we look at the Sinkhorn iteration for entropy-regularised Wasserstein distances as a loss function between histograms.
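For reference, here is a plain-Python (decidedly non-GPU, non-log-stabilized) sketch of the Sinkhorn iteration; the function name and defaults are mine, and the batched, numerically stabilized GPU version from the post looks quite different:

```python
import math

def sinkhorn(a, b, cost, eps=0.1, iters=200):
    """Entropy-regularized optimal transport between histograms a and b.

    Returns the transport plan P, whose row sums approach a and whose
    column sums approach b as the iteration converges.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # alternate scaling: u <- a / (K v),  v <- b / (K^T u)
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

The regularization strength eps trades off sharpness of the plan against speed of convergence; smaller eps needs the log-domain tricks the post is about.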


Traceable and Differentiable Extensions with PyTorch

June 26, 2019

Three of the most liked features of PyTorch are the extensible autograd mechanism, the ability to extend PyTorch with C++ efficiently, and the tracing/scripting mechanism. Which leads to the natural question - can we have all three at the same time?
In this post, we dive into the autograd internals and come out with a solution.


Tutorial case study: Fixing your first PyTorch bug

June 7, 2019

PyTorch is a great project and I have only met very helpful people when contributing to it. However, the code base can be quite intimidating. Here we look at fixing a simple bug in detail and see that it is a less daunting task than it might seem at first.


Fischertechnik TXT Proxy

May 24, 2019

And now for something completely different: In order to access the Fischertechnik Robotics TXT's camera functions under Wine, one needs to cope with the camera port being opened slowly. We provide a small Python proxy to solve this.


Exponentially weighted moving average and batch size

April 12, 2019

Exponentially weighted moving averages are used in several places in machine learning (often under the header of momentum). We look at the connection between batch size and momentum.
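The recursion itself fits in a few lines; this plain-Python sketch (names mine) also illustrates the rule of thumb that a decay factor beta corresponds to an effective averaging window of roughly 1/(1 - beta) steps:

```python
def ewma(values, beta):
    """Exponentially weighted moving average, as used for momentum.

    avg_t = beta * avg_{t-1} + (1 - beta) * x_t
    The effective averaging window is roughly 1 / (1 - beta) steps.
    """
    avg = 0.0
    history = []
    for x in values:
        avg = beta * avg + (1 - beta) * x
        history.append(avg)
    return history

# beta = 0.9 averages over roughly the last 10 values, so on a constant
# stream the average approaches that constant after a few dozen steps.
trace = ewma([1.0] * 100, beta=0.9)
```

Starting the average at zero biases the early iterates low, which is why some formulations divide by 1 - beta**t to correct for it.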


Optimization using Specialization of Arguments in the PyTorch JIT

April 8, 2019

In a second very technical PyTorch JIT article, we look at graphs, specialization, and the impact on optimizations in the JIT.


Fast LSTMs in PyTorch

March 16, 2019

Implementing fast recurrent neural networks is a challenging task. This is not only a hassle for training existing architectures - sometimes optimized implementations such as CuDNN's LSTM help there. More gravely, it also limits experimentation with new architectures.


PyTorch, JIT, Android

Dec. 14, 2018

This week, we had a PyTorch Meetup in Munich at Microsoft.
It was great to see more than 90 people visit for the two talks and PyTorch chat over Pizza and drinks afterwards! Piotr Bialecki gave a talk on semantic search on the PyTorch forums, and I had the honor of talking about PyTorch, the JIT, and Android.


PyTorch and Android

Oct. 27, 2018

Recently, I discussed the use of PyTorch on mobile / IoT-like devices. Naturally, the Caffe2 Android tutorial was a starting point. Getting it to work with Caffe2 from PyTorch and recent Android wasn't trivial, though. Apparently, other people have not had much luck either; I got a dozen questions about it on the first day after mentioning it in a discussion.

This should be easier. Here is how.


A selective excursion into the internals of PyTorch

July 28, 2018

The beauty of PyTorch is that it makes its magic so conveniently accessible from Python. But how does it do so? We take a peek inside the gears that make PyTorch tick.
(Note that this is a work in progress. I'd be happy to hear your suggestions for additions or corrections.)


Handwriting Generation using RNNs

June 26, 2018

Today I gave a talk on Alex Graves's classic RNN paper and what I took away from implementing the handwriting generation model in PyTorch. To me, the density of insights combined with the almost complete absence of mechanical bits as well as the relatively short training time, makes this a very worthwhile exercise that I can heartily recommend to anyone interested in RNNs.


Debugging CUDA device-side assert in PyTorch

June 15, 2018

The beautiful thing of PyTorch's immediate execution model is that you can actually debug your programs.
Sometimes, however, the asynchronous nature of CUDA execution makes it hard. Here is a little trick to debug your programs.
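A commonly used approach in this area, which may or may not be exactly the trick discussed in the post, is to make CUDA kernel launches synchronous via an environment variable so the Python traceback points at the operation that actually failed:

```python
import os

# CUDA_LAUNCH_BLOCKING=1 makes CUDA kernel launches synchronous, so a
# device-side assert surfaces at the Python line that triggered it
# instead of at some later, unrelated operation. It must be set before
# CUDA is initialized, i.e. before the first CUDA call in the process.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# ... import torch and run the failing code after this point ...
```

Setting the variable in the shell (`CUDA_LAUNCH_BLOCKING=1 python script.py`) achieves the same and avoids any ordering pitfalls inside the process.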


German LM for the Fast AI model zoo (work in progress)

June 4, 2018

At the excellent fast.ai course and website, they are training a language model zoo.

It's a charming idea, and here are the (not quite complete yet) code and model I got for German.


2D Wavelet Transformation in PyTorch

Oct. 29, 2017

The other day I got a question about how to do wavelet transformations in PyTorch in a way that allows computing gradients (that is, gradients of outputs w.r.t. the inputs, probably not the coefficients). I like PyTorch and I happen to have a certain fancy for wavelets as well, so here we go.


More Improved Training of Wasserstein GANs and DRAGAN

May 29, 2017

This is following up on my post on improved and semi-improved training of Wasserstein GANs. A few days ago, Kodali et al. published How to Train Your DRAGAN. They introduce an algorithmic game theory approach and propose to apply the gradient penalty only close to the real-data manifold. We take a look at their objective function, offer a new possible interpretation, and also consider what might be wrong in the Improved Training objective.
While doing so we introduce PRODGAN and SLOGAN.


Geometric Intuition on Improved Wasserstein GANs

April 13, 2017

We look at Improved Training of Wasserstein GANs and describe some geometric intuition on how it improves over the original Wasserstein GAN article.

Updated: We also introduce Semi-Improved Training of Wasserstein GANs, a variant that is simpler to implement as it does not need second derivatives.