# Inlining Images in Jupyter Notebooks

June 29, 2021

I like Jupyter Notebooks a lot. But while linking images in Markdown is quite handy, I sometimes want a notebook to work on its own without zipping it up with a lot of files. Because I could not find a tool that does this, I wrote a quick and dirty script to (approximately) inline images.

Jupyter notebooks are saved as JSON (not that a text-only primary format wouldn't be much nicer, but that is for another day), so we can read them with Python's `json` module.

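```python
import json
import re
import mimetypes
import os
import base64
import copy

fn = 'notebook.ipynb'  # hypothetical: path of the notebook to inline
with open(fn, encoding='utf-8') as f:
    origjson = json.load(f)
```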
This gives us a top-level dict in which the `cells` entry is a list of cells. Each cell is represented by a dict whose `cell_type` tells us whether it is `"markdown"`. Its `source` then holds the markdown, usually as a list of lines (the notebook format also allows a single string).
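To make that structure concrete, here is a small hand-written dict shaped like what `json.load` returns for a notebook (made-up example data, not a real file), together with a hypothetical helper `markdown_sources` that yields the markdown text of each cell. Since the format allows `source` to be either a list of strings or one string, the sketch accepts both.

```python
# Roughly what json.load returns for a small notebook (hypothetical example data).
nb = {
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Title\n", "![a plot](plot.png)\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["print(1)\n"]},
        {"cell_type": "markdown", "metadata": {}, "source": "Just one string"},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}


def markdown_sources(notebook):
    """Yield the full text of each markdown cell (hypothetical helper)."""
    for cell in notebook["cells"]:
        if cell.get("cell_type") != "markdown":
            continue  # code and raw cells are skipped
        src = cell["source"]
        # the format allows either a list of line strings or a single string
        yield src if isinstance(src, str) else "".join(src)


for text in markdown_sources(nb):
    print(repr(text))
```

The code cell is skipped, and both markdown cells come out as plain strings regardless of how their `source` was stored.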

Properly parsing Markdown is hard and brittle, but we only need an approximation, so we will use regular expressions. Matching `![text](image.img)` is reasonably easy; the regexp we use is `r'\!\[([^\]]*)\]\(([^\)]*)\)'`, where the first group captures the alt text and the second the image path.
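A quick check of what the Markdown pattern captures, on a made-up line of text. It also shows one place where the approximation shows: an optional link title ends up inside the path group.

```python
import re

# The Markdown image pattern: group 1 is the alt text, group 2 the path.
IMG_MD_RE = r'\!\[([^\]]*)\]\(([^\)]*)\)'

line = 'See ![a plot](figures/plot.png) and ![](logo.svg).\n'
found = [m.groups() for m in re.finditer(IMG_MD_RE, line)]
print(found)  # [('a plot', 'figures/plot.png'), ('', 'logo.svg')]

# An optional title is swallowed by group 2, so the "file exists" check
# will fail for it and such a link is left untouched.
print(re.search(IMG_MD_RE, '![x](a.png "A title")').group(2))  # a.png "A title"
```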

But my markdown sometimes has HTML `img` tags, too. These are extremely hard to parse (actually, for that other day, I also have a parser for HTML5 with Markdown-like syntax and math mode), but we try to get by with a regexp by assuming no one will use an escaped or quoted `>` before the `src=` attribute. The value of `src` could be in single quotes, in double quotes, or unquoted, and whitespace around the `=` is allowed. We are not interested in matching the entire tag, so stopping after the `src` attribute is OK. I also don't believe I need to handle escaped quotation characters. This lets me think I can work with `r"""(<img[^>]*src\s*=\s*)([^\s'"]+|"[^"]*"|'[^']*')"""`. Putting these in `re.sub` gets me something like

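```python
IMG_MD_RE = r'\!\[([^\]]*)\]\(([^\)]*)\)'
IMG_HTML_RE = r"""(<img[^>]*src\s*=\s*)([^\s'"]+|"[^"]*"|'[^']*')"""

newjson = copy.deepcopy(origjson)
for c in newjson['cells']:
    if c.get('cell_type') == 'markdown':
        src = c['source']
        if isinstance(src, str):  # the format also allows a single string
            src = src.splitlines(keepends=True)
        # inline HTML img tags first, then Markdown image links
        c['source'] = [re.sub(IMG_MD_RE, replace_md_link,
                              re.sub(IMG_HTML_RE, replace_html_img, par))
                       for par in src]

with open(os.path.splitext(fn)[0] + '_inlined.ipynb', 'w') as f:
    json.dump(newjson, f)
```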
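The HTML pattern is the more fragile of the two, so it is worth seeing what it captures for the quoting styles discussed above. This sketch runs it on a few made-up tags (the samples are assumptions, not taken from a real notebook) and strips quotes the same way `replace_html_img` does.

```python
import re

IMG_HTML_RE = r"""(<img[^>]*src\s*=\s*)([^\s'"]+|"[^"]*"|'[^']*')"""

samples = [
    '<img src="a.png" width="200">',
    "<img alt='x' src = 'b.png'>",
    '<img src=c.png width=200>',
    '<img src=d.png>',  # edge case: the unquoted value runs into the ">"
]

found = []
for s in samples:
    prefix, src = re.search(IMG_HTML_RE, s).groups()
    if src[:1] in "\"'":  # quoted value: drop the surrounding quotes
        src = src[1:-1]
    found.append((prefix, src))

for prefix, src in found:
    print(repr(prefix), repr(src))
```

The last sample shows where the approximation breaks: the captured value is `d.png>`. Because no file of that name exists, the existence check leaves such a tag untouched; excluding `>` as well, as in `[^\s'">]+`, would be one possible tweak.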


Now we just need the `replace_...` functions. They take the groups from the two regular expressions and check that the image source does not start with `data:` (in which case it is already inlined) and that the file exists. Then they guess the MIME type. If everything works out, they create an `img` tag (or, for HTML, replace the `src` value) with a `data:` source and base64-encoded content. This being a quick and dirty script, I didn't care about abstracting the common bits. Of course, if you want to copy-paste this code, you need to put these functions above the loop.
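The step both functions share, turning a file into a `data:` URI, can be tried in isolation. The helper `to_data_uri` below is hypothetical (not part of the script) and a minimal sketch of the checks just described. It also skips files that `mimetypes.guess_type` reports with an encoding such as gzip (for example `.svgz`), because a `data:` URI has no place to record that.

```python
import base64
import mimetypes
import os
import tempfile


def to_data_uri(path):
    """Return a base64 data: URI for the file at path, or None (hypothetical helper)."""
    if path.startswith("data:") or not os.path.exists(path):
        return None  # already inlined, or no such file
    typ, enc = mimetypes.guess_type(path)
    # Skip unknown types and compressed files such as .svgz (enc == 'gzip').
    if typ is None or enc is not None:
        return None
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    return f"data:{typ};base64,{payload}"


# Usage: write a tiny placeholder file and inline it.
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "dot.png")
    with open(p, "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n")  # only the PNG signature, not a full image
    uri = to_data_uri(p)

print(uri)  # data:image/png;base64,iVBORw0KGgo=
```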

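```python
def replace_md_link(match):
    txt, src = match.groups()
    if src.startswith('data:') or not os.path.exists(src):
        return match.group(0)  # already inlined or no such file: leave as is
    typ, enc = mimetypes.guess_type(src)
    if typ is None or enc is not None:
        return match.group(0)  # unknown type or compressed file (e.g. .svgz)
    with open(src, 'rb') as f:
        data = base64.b64encode(f.read())
    # sometimes it seems to want a newline to get the following paragraph right
    return f'''<img src="data:{typ};base64,{data.decode()}" alt="{txt}" />\n'''

def replace_html_img(match):
    prefix, src = match.groups()
    if src[:1] in '"'"'": # this string prints as "'
        src = src[1:-1]
    if src.startswith("data:"):
        return match.group(0)
    if os.path.exists(src):
        typ, enc = mimetypes.guess_type(src)
        if typ is not None and enc is None:
            with open(src, 'rb') as f:
                data = base64.b64encode(f.read())
            # keep everything up to "src=" and swap in a quoted data: URI
            return f'{prefix}"data:{typ};base64,{data.decode()}"'
    return match.group(0)
```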