Lernapparat

Did you ever wish for a concise picture of your PyTorch model's structure, only to find it too hard to get?

Recently, I did some work that involved looking at model structure in some detail (more on this very soon!). For my write-up, I wanted a diagram of some model structures. Even though it is a relatively common model, searching for a diagram didn't turn up anything in the shape I was looking for.

So how can we get the model structure for PyTorch models? The first stop is probably the neat string representation that PyTorch provides for nn.Modules. Without us doing anything, it also covers our custom models pretty well. It is, however, not without shortcomings.

import torchvision

m = torchvision.models.resnet18()
m.layer1[0]

BasicBlock(
  (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)

So we have two convs and two batch norms. But how are things connected? Is there one ReLU?

Looking at the forward method (you can get this using Python's inspect module, e.g. print(inspect.getsource(m.layer1[0].forward)), or ?? in IPython), we see some important details not in the summary:

def forward(self, x):
    identity = x

    out = self.conv1(x)
    out = self.bn1(out)
    out = self.relu(out)

    out = self.conv2(out)
    out = self.bn2(out)

    if self.downsample is not None:
        identity = self.downsample(x)

    out += identity
    out = self.relu(out)

    return out

So we missed the entire residual bit. Also, there are two ReLUs. (Arguably, it is wrong to re-use stateless modules like this. It will haunt you when you do things like quantization, because the module then becomes stateful due to the quantization parameters, and it mixes things too much. If you want stateless, use the functional interface.)
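To make the mismatch concrete, here is a minimal sketch that counts how often a shared ReLU module actually runs. TinyBlock is a hypothetical stand-in I made up for illustration (it is not torchvision's BasicBlock), and the forward hook is just one convenient way to count calls.

```python
import torch
from torch import nn


class TinyBlock(nn.Module):
    """Hypothetical stand-in for a residual block that re-uses one ReLU module."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(4, 4, kernel_size=3, padding=1, bias=False)
        self.conv2 = nn.Conv2d(4, 4, kernel_size=3, padding=1, bias=False)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = x
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        out = out + identity  # the residual connection, invisible in repr()
        return self.relu(out)  # second use of the same module


block = TinyBlock()

# Record every invocation of the shared ReLU module.
calls = []
handle = block.relu.register_forward_hook(
    lambda module, inputs, output: calls.append(tuple(output.shape))
)
with torch.no_grad():
    block(torch.randn(1, 4, 8, 8))
handle.remove()

# How many ReLU entries the string representation lists vs. how many ran.
relu_entries = repr(block).count("ReLU")
print(relu_entries, len(calls))
```

The string representation lists the module once, while the hook fires twice, which is exactly the information the summary hides.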

So I looked around a bit and, not finding anything that did what I wanted, I made up a small function producing a graph like this: (Made with make_graph(getattr(traced_model.layer1, "0")). I should say that I'm making the assumption that all interesting things happen in sub-modules, and I only show operation nodes when they are "junctions", i.e. when they have several inputs with distinct predecessors in the graph.)
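The "junction" rule can be illustrated without any tracing at all. The following is only a sketch of the idea, not the actual make_graph code: the producers dictionary is a hand-written, hypothetical dataflow for a residual block, and find_junctions is a name I invented for this example.

```python
# Hypothetical, hand-written dataflow of a residual block:
# each node maps to the nodes that produce its inputs.
producers = {
    "conv1": ["x"],
    "bn1": ["conv1"],
    "relu_1": ["bn1"],
    "conv2": ["relu_1"],
    "bn2": ["conv2"],
    "add": ["bn2", "x"],
    "relu_2": ["add"],
}


def find_junctions(producers):
    """Return the nodes whose inputs come from more than one distinct predecessor."""
    return [node for node, inputs in producers.items() if len(set(inputs)) > 1]


junctions = find_junctions(producers)
print(junctions)  # ['add']
```

Only the addition qualifies, so under this assumption it would be the one operation node drawn next to the sub-modules.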

A high-level representation looks like this: (Produced with make_graph(traced_model, classes_to_visit={'Sequential'}).)

More advanced vision models

We can also do this with TorchVision's FCN ResNet50 for semantic segmentation: (The graph can be made with make_graph(traced_model, classes_to_visit={'IntermediateLayerGetter', 'FCNHead'}), but tracing needs strict=False.)
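Why strict=False? By default the tracer does not record mutable container outputs such as dictionaries, and segmentation models return one. Here is a minimal sketch with a hypothetical DictOutput module standing in for the real model, so it runs without downloading anything.

```python
import torch
from torch import nn


class DictOutput(nn.Module):
    """Hypothetical stand-in for a model that returns a dict of tensors."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 2, kernel_size=1)

    def forward(self, x):
        return {"out": self.conv(x)}


model = DictOutput().eval()
example = torch.randn(1, 3, 8, 8)

# strict=False tells the tracer to accept the dict output; this is only safe
# when the container has a fixed structure and is not used for control flow.
traced_model = torch.jit.trace(model, example, strict=False)
result = traced_model(example)
print(type(result), result["out"].shape)
```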

Detection models are also fun, here FasterRCNN: (This needed a trick to make tracing work: I wrapped it in a model that takes apart the output dictionary, and tracing also needed check_trace=False. Then the graphs were made with make_graph(traced_model, classes_to_visit={}) and make_graph(traced_model, classes_to_visit={'RegionProposalNetwork', 'RoIHeads'}).)
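The wrapping trick might look roughly like the sketch below. Both classes are hypothetical: ToyDetector only imitates the list-of-dicts output style of detection models, and TupleOutputWrapper is my guess at the general shape of such a wrapper, not the code actually used for the figures.

```python
import torch
from torch import nn


class ToyDetector(nn.Module):
    """Hypothetical stand-in returning a list with one dict per image."""

    def __init__(self):
        super().__init__()
        self.head = nn.Linear(8, 4)

    def forward(self, x):
        boxes = self.head(x)
        scores = boxes.sum(dim=1)
        return [{"boxes": boxes, "scores": scores}]


class TupleOutputWrapper(nn.Module):
    """Takes the output dictionary apart so the tracer only sees tensors."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        detections = self.model(x)[0]
        return detections["boxes"], detections["scores"]


wrapped = TupleOutputWrapper(ToyDetector()).eval()
example = torch.randn(5, 8)

# check_trace=False skips the tracer's re-run-and-compare step.
traced_model = torch.jit.trace(wrapped, example, check_trace=False)
boxes, scores = traced_model(example)
print(boxes.shape, scores.shape)
```

The wrapped model stays available as traced_model.model, so the sub-module hierarchy is still there to be visited.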

BERT

Finally, we aren't restricted to vision models. Taking BERT from HuggingFace's great transformers library, we can make an overview (I used make_graph(traced_model, classes_to_visit={'BertEncoder'}))

and zoom into a BertLayer: (make_graph(getattr(traced_model.encoder.layer, "0"), classes_to_visit={'BertAttention', 'BertSelfAttention'}). I should send a PR to PyTorch to enable indexing there...)

Conclusion

I'm having way too much fun with this. Here are some points where one could reconsider or extend the design:

The code for the visualizations

The code was on the wish list of my single GitHub sponsor when I asked a while ago what code he would like to see published, so with all its limitations it is now available.

PyTorch training and consulting

Do you want some help getting models to do awesome things, or to generally give your PyTorch and Deep Learning skills a boost? I offer consulting as well as in-house and public workshops at beginner, intermediate, and expert PyTorch levels. If you are near Munich (say, in Europe) and need PyTorch training, I would love to hear from you! I also do bespoke development.

I hope this blog post is useful to you. I appreciate and read every mail you send to tv@lernapparat.de.

Visualizing PyTorch model structure
