TL;DR: I trained a character recognition model on EMNIST, quantized it, and tried running it on hardware with less RAM than a Chrome tab. Here’s how I simulated its specs in Docker, built a tiny PyTorch inference loop, and learned why deploying to the real Kindle OS is basically impossible (but very fun).

The repository with the corresponding PyTorch implementation is available here.


Lately, I’ve been trying to level up from just training PyTorch models and evaluating them locally to actually wrapping them as APIs and deploying them. That naturally had me thinking: how far can you actually push a model? Not just in terms of scalable deployment, but portability. I’ve been curious about edge deployment, and I know modern iPhones have neural chips that make running on-device models easy. But what about the opposite end?

I wanted to run an experiment: could a model trained on EMNIST (extended MNIST, a dataset from 2017) actually run on a Kindle? In this post, I’ll walk through how I trained and quantized the model, set up an interactive UI, and simulated decade-old hardware using Docker, all to answer a simple but oddly satisfying question:

Can a fossilized Amazon Kindle recognize handwritten letters and numbers?


Before going into EMNIST, I wanted to start simple and make sure that I was training properly and that my visualizations looked fine.

Training a simple MNIST model

Since I wanted to do everything from scratch in this project, I started by training on MNIST as practice before EMNIST, and to visualize the results with Streamlit.

This is pretty simple and you’ve probably seen this many times:

1) Setting up environment

We can either do this in a docker container or a new .venv environment. I’m going to do the latter for simplicity (we’re going to have to use docker later anyways so I took the easy route here).

python3 -m venv .venv
source .venv/bin/activate
pip install torch torchvision matplotlib

2) Download the dataset

I love torchvision because it makes this super simple.

from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
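To make the transform concrete, here is a small sketch of the arithmetic that ToTensor and Normalize apply to a single pixel. The helper name normalize_pixel is hypothetical (it is not part of torchvision); 0.1307 and 0.3081 are the mean and standard deviation passed to Normalize above.

```python
# Sketch of what ToTensor + Normalize do to one grayscale pixel.
# normalize_pixel is a hypothetical helper, not part of torchvision.
MEAN, STD = 0.1307, 0.3081  # the values passed to transforms.Normalize


def normalize_pixel(raw: int) -> float:
    """Map a raw 0-255 pixel value to the value the model sees."""
    scaled = raw / 255.0          # ToTensor: 0-255 -> [0, 1]
    return (scaled - MEAN) / STD  # Normalize: subtract mean, divide by std


background = normalize_pixel(0)   # black background pixel
stroke = normalize_pixel(255)     # brightest stroke pixel
print(f"background={background:.4f}, stroke={stroke:.4f}")
```

The practical takeaway is that anything fed to the model at inference time should go through the same two steps, otherwise the input distribution differs from what the model was trained on.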

3) Create dataloaders

from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

4) Building the model

import torch.nn as nn
import torch.nn.functional as F

class MNISTNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.out = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.out(x)

The hidden layer sizes (128 and 64) come from the original implementation.
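Since the whole point is to fit on tiny hardware, it helps to know how big this network is. Here is a rough sketch that counts the parameters by hand; count_linear_params is a hypothetical helper, and the size figure only covers raw float32 weights, not file overhead.

```python
# Count the trainable parameters of the 784 -> 128 -> 64 -> 10 MLP by hand.
layer_sizes = [28 * 28, 128, 64, 10]


def count_linear_params(sizes):
    """Each nn.Linear(in, out) holds in*out weights plus out biases."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


mlp_params = count_linear_params(layer_sizes)
print(f"MNISTNet parameters: {mlp_params:,}")          # 109,386
print(f"float32 size: {mlp_params * 4 / 1e6:.2f} MB")  # 0.44 MB of raw weights
```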

5) Training loop

import torch
import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = MNISTNet().to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    model.train()
    for X, y in train_loader:
        X, y = X.to(device), y.to(device)
        optimizer.zero_grad()
        pred = model(X)
        loss = loss_fn(pred, y)
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch+1} complete")

If you are training locally on an M-series Mac, you can swap the comment between the two device lines to use MPS.

6) Evaluate the model & saving it

We can first save the model

torch.save(model.state_dict(), "mnist_model.pth")

Then evaluate it

model.eval()
correct = 0
total = 0

with torch.no_grad():
    for X, y in test_loader:
        X, y = X.to(device), y.to(device)
        pred = model(X)
        predicted = pred.argmax(dim=1)
        correct += (predicted == y).sum().item()
        total += y.size(0)

print(f"Test Accuracy: {100 * correct / total:.2f}%")

7) Connecting to streamlit

To connect to streamlit and create a nice UI, we need some more packages

pip install streamlit streamlit-drawable-canvas numpy opencv-python

And now we can create a file for streamlit

mnist_streamlit.py:

import streamlit as st
from streamlit_drawable_canvas import st_canvas
import torch
import numpy as np
import pandas as pd
import cv2
from model import MNISTNet

st.title("Draw a Digit - MNIST Inference Demo")

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = MNISTNet().to(device)
model.load_state_dict(torch.load("mnist_model.pth", map_location=device))
model.eval()

# canvas for drawing
canvas_result = st_canvas(
    fill_color="white",
    stroke_width=15,
    stroke_color="black",
    background_color="white",
    height=280,
    width=280,
    drawing_mode="freedraw",
    key="canvas"
)

# when the user draws something
if canvas_result.image_data is not None:
    img = canvas_result.image_data[:, :, 0]  # grab only one channel (since they all have same val)
    img = cv2.resize(img, (28, 28))          # resize to MNIST dims
    img = 255 - img                          # invert: white digit on black, like MNIST
    img = img / 255.0                        # scale to [0, 1]
    img = (img - 0.1307) / 0.3081            # same normalization as in training
    img_tensor = torch.tensor(img, dtype=torch.float32).unsqueeze(0).unsqueeze(0).to(device)

    with torch.no_grad():
        logits = model(img_tensor)
        probs = torch.nn.functional.softmax(logits, dim=1)
        pred = torch.argmax(probs, dim=1)

    st.write(f"### Prediction: {pred.item()}")
    probs_np = probs.cpu().numpy()[0]
    prob_df = pd.DataFrame({
        "Digit": list(range(10)),
        "Confidence": probs_np
    })

    st.write("### Confidence for each digit:")
    prob_df = prob_df.sort_values("Confidence", ascending=False)
    st.bar_chart(prob_df.set_index("Digit"))
else:
    st.info("Draw a digit above to see the prediction.")

We can now view by running

streamlit run mnist_streamlit.py

at http://localhost:8501.

I was able to deploy my version to Streamlit’s cloud (Streamlit Community Cloud). Here is the link: https://your-username.streamlit.app

Let’s take a look at some more evals to make sure my model is fine

insert evals

Seems like it is generalizing fine and we can move on to scaling to EMNIST.

Scaling to EMNIST

The process would look kind of similar to what we had above but a little bit more complicated because EMNIST has 10 (digits) + 26 (lowercase letters) + 26 (uppercase letters) = 62 classes compared to 10 from MNIST.

1) Setting up environment (again)

Make sure to do this in a different directory, for modularity.

python3 -m venv .venv
source .venv/bin/activate
pip install torch torchvision

2) Download the dataset

After some thinking, I decided to go with the balanced EMNIST split instead of the full one. It has the same number of samples for every class, and it merges letters whose uppercase and lowercase forms look nearly identical (like O and o), so the model doesn’t have to tell those apart.

from torchvision import datasets, transforms

# preprocessing (same as MNIST)
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_data = datasets.EMNIST(
    root='./data',
    split='balanced',
    train=True,
    download=True,
    transform=transform
)

test_data = datasets.EMNIST(
    root='./data',
    split='balanced',
    train=False,
    download=True,
    transform=transform
)

The split='balanced' argument is what selects that variant.

3) Class Mapping

Since our balanced EMNIST has 47 classes, the labels are integers (0-46). We need to map each label to its character or digit if we want to get meaning out of a prediction. When downloading this, torchvision also downloads a .mapping file, but here’s a quick way to see it:

label_map = [
    '0','1','2','3','4','5','6','7','8','9',
    'A','B','C','D','E','F','G','H','I','J',
    'K','L','M','N','O','P','Q','R','S','T',
    'U','V','W','X','Y','Z',
    'a','b','d','e','f','g','h','n','q','r','t'
]
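Here is a small sketch of how the map gets used, plus a quick check of which lowercase letters do not get their own class. The decode helper is a hypothetical name; the list itself is the same one as above, just built more compactly.

```python
# EMNIST balanced label map: 10 digits, 26 uppercase letters, 11 lowercase letters.
label_map = list("0123456789") + list("ABCDEFGHIJKLMNOPQRSTUVWXYZ") + list("abdefghnqrt")


def decode(label: int) -> str:
    """Turn an integer class label (0-46) into its character."""
    return label_map[label]


# Lowercase letters without their own class share a class with the uppercase letter.
merged_lowercase = sorted(set("abcdefghijklmnopqrstuvwxyz") - set(label_map))

print(len(label_map))                      # 47
print(decode(0), decode(10), decode(36))   # 0 A a
print("".join(merged_lowercase))           # cijklmopsuvwxyz
```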

4) Building the model

import torch.nn as nn
import torch.nn.functional as F

class EMNIST_CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1) # 28x28 → 28x28
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1) # 14x14 → 14x14 (runs after the first pool)
        self.pool = nn.MaxPool2d(2, 2) # halves each side: 28x28 → 14x14, then 14x14 → 7x7
        self.fc1 = nn.Linear(32 * 7 * 7, 128) # two pools leave 32 channels of 7x7
        self.fc2 = nn.Linear(128, 47) # for EMNIST balanced

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x))) # conv1 + relu + pool
        x = self.pool(F.relu(self.conv2(x))) # conv2 + relu + pool
        x = x.view(-1, 32 * 7 * 7) # flatten
        x = F.relu(self.fc1(x))
        return self.fc2(x)

I went with a CNN here because we need more fine-grained detail to handle character variability. We are now dealing with curvy letters and characters that look super similar which would need spatial information (how features are arranged in space).

This CNN has roughly twice as many parameters as our previous MNIST MLP, so it is still a small model that is easy to quantize, while being much better suited to EMNIST.
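To double-check the layer dimensions, here is a sketch that traces the feature-map size through the network and counts parameters by hand. The helpers conv_out and pool_out are hypothetical names; they just apply the standard output-size formula for convolution and pooling layers.

```python
# Trace feature-map sizes through the CNN and count its parameters.
def conv_out(size, kernel=3, padding=1, stride=1):
    """Output width/height of a conv layer: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output width/height of a max-pool layer without padding."""
    return (size - kernel) // stride + 1


size = 28
size = pool_out(conv_out(size))   # conv1 + pool: 28 -> 28 -> 14
size = pool_out(conv_out(size))   # conv2 + pool: 14 -> 14 -> 7
flat_features = 32 * size * size  # 32 channels of 7x7

conv_params = (1 * 16 * 9 + 16) + (16 * 32 * 9 + 32)
fc_params = (flat_features * 128 + 128) + (128 * 47 + 47)
cnn_params = conv_params + fc_params

print(size, flat_features)   # 7 1568
print(f"{cnn_params:,}")     # 211,695
```

The flattened size of 32 * 7 * 7 = 1568 is what the first fully connected layer has to accept, and that layer accounts for almost all of the parameters.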

5) Training Loop

import torch
from torch.utils.data import DataLoader

device = torch.device("mps" if torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu")
model = EMNIST_CNN().to(device)

train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
test_loader  = DataLoader(test_data, batch_size=1000, shuffle=False)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5): # I used 5 epochs here for speed but we can increase this
    model.train()
    running_loss = 0.0
    for X, y in train_loader:
        X, y = X.to(device), y.to(device)
        optimizer.zero_grad()
        out = model(X)
        loss = loss_fn(out, y)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"Epoch {epoch+1}, Loss: {running_loss / len(train_loader):.4f}")

6) Evaluation & saving the model

model.eval()
correct = 0
total = 0

with torch.no_grad():
    for X, y in test_loader:
        X, y = X.to(device), y.to(device)
        out = model(X)
        preds = torch.argmax(out, dim=1)
        correct += (preds == y).sum().item()
        total += y.size(0)

print(f"Test Accuracy: {100 * correct / total:.2f}%")

and we can save this by doing the following

torch.save(model.state_dict(), "emnist_cnn.pth")

7) Streamlit deployment

Once the repository is structured like this

emnist_project/
├── model.py                # has EMNIST_CNN class
├── emnist_cnn.pth          # trained weights
├── emnist_streamlit.py     # Streamlit UI
├── requirements.txt
└── ...

we can do the following:

emnist_streamlit.py

import streamlit as st
from streamlit_drawable_canvas import st_canvas
import torch
import torch.nn.functional as F
import numpy as np
import cv2
import pandas as pd
from model import EMNIST_CNN  # ← use your own model file

# we need to copy over our label map
label_map = [
    '0','1','2','3','4','5','6','7','8','9',
    'A','B','C','D','E','F','G','H','I','J',
    'K','L','M','N','O','P','Q','R','S','T',
    'U','V','W','X','Y','Z',
    'a','b','d','e','f','g','h','n','q','r','t'
]

# load model
st.title("✍️ EMNIST Character Classifier")

device = torch.device("mps" if torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu")
model = EMNIST_CNN().to(device)
model.load_state_dict(torch.load("emnist_cnn.pth", map_location=device))
model.eval()

# set up drawing canvas
canvas_result = st_canvas(
    fill_color="white",
    stroke_width=15,
    stroke_color="black",
    background_color="white",
    height=280,
    width=280,
    drawing_mode="freedraw",
    key="canvas"
)

# inference logic
if canvas_result.image_data is not None:
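    img = canvas_result.image_data[:, :, 0]  # one channel is enough for a grayscale drawing
    img = cv2.resize(img, (28, 28))          # resize to EMNIST dims
    img = 255 - img                          # invert: white character on black, like EMNIST
    img = img.T                              # torchvision serves EMNIST images transposed, so match that
    img = (img / 255.0 - 0.1307) / 0.3081    # scale to [0, 1], then normalize as in training
    img_tensor = torch.tensor(img, dtype=torch.float32).unsqueeze(0).unsqueeze(0).to(device)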

    with torch.no_grad():
        logits = model(img_tensor)
        probs = F.softmax(logits, dim=1)
        pred_idx = torch.argmax(probs, dim=1).item()
        pred_char = label_map[pred_idx]

    st.markdown(f"### Prediction: **{pred_char}**")

    prob_df = pd.DataFrame({
        "Character": label_map,
        "Confidence": probs.cpu().numpy()[0]
    }).sort_values("Confidence", ascending=False)

    st.write("### Top Predictions")
    st.dataframe(prob_df.head(10).reset_index(drop=True))

    st.write("### Full Confidence Distribution")
    st.bar_chart(prob_df.set_index("Character"))
else:
    st.info("Draw a character above to see the model's prediction.")

Compressing the model

There are two main ways to quantize models: Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). PTQ is done after training, and given our .pth file it comes in two flavors. In static quantization, we run a calibration dataset to estimate activation ranges and then quantize both weights and activations ahead of time. In dynamic quantization, weights are quantized ahead of time but activations are quantized on the fly during inference. Since our model is already trained, I’m going to move ahead with static quantization (best for images / CNNs). Dynamic quantization mainly targets layers like Linear and LSTM, so it favors NLP models such as LSTMs or transformers.
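To give a feel for what "quantizing" a value means, here is a minimal sketch of 8-bit affine quantization, where a real value is approximated as scale * (q - zero_point). The helper names are hypothetical and the range of -1.0 to 3.0 is made up; PyTorch does the equivalent internally with more care (per-channel scales, different integer ranges).

```python
# Sketch of 8-bit affine quantization: real_value ~= scale * (q - zero_point).
# Helper names are hypothetical; PyTorch handles this internally.
def choose_qparams(min_val, max_val, qmin=0, qmax=255):
    """Pick scale and zero_point so [min_val, max_val] maps onto [qmin, qmax]."""
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)  # range must contain 0
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = round(qmin - min_val / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Round to the nearest integer step and clamp to the 8-bit range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    return scale * (q - zero_point)


scale, zp = choose_qparams(-1.0, 3.0)  # made-up value range
values = [-1.0, -0.25, 0.0, 0.7, 3.0]
restored = [dequantize(quantize(v, scale, zp), scale, zp) for v in values]
max_error = max(abs(v - r) for v, r in zip(values, restored))
print(scale, zp, max_error)
```

Each value is stored in one byte instead of four, and the price is a rounding error of at most half a step (scale / 2) for values inside the range.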

Let’s get to quantizing this:

Load in the model

import torch
from model import EMNIST_CNN  # your CNN

model_fp32 = EMNIST_CNN()
model_fp32.load_state_dict(torch.load("emnist_cnn.pth"))
model_fp32.eval()

Fuse layers (for performance; fuse_model is defined in the final model structure below)

model_fp32.fuse_model()

Prepare for quantization

import torch.quantization

model_fp32.qconfig = torch.quantization.get_default_qconfig('fbgemm')
model_prepared = torch.quantization.prepare(model_fp32)

Run a few calibration batches in eval mode (to estimate activation ranges)

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

calibration_dataset = datasets.EMNIST(root='./data', split='balanced', train=True, download=True, transform=transform)
calibration_loader = DataLoader(calibration_dataset, batch_size=32, shuffle=True)

with torch.no_grad():
    for i, (x, _) in enumerate(calibration_loader):
        model_prepared(x)
        if i > 10:  # 10–20 batches is enough
            break
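Conceptually, those calibration batches only exist so that observers can record the range of values flowing through each layer. Here is a rough sketch of that idea; RangeObserver is a hypothetical stand-in for PyTorch’s observer modules, and the activation values are made up.

```python
# Sketch of what calibration does: track the min/max of activations seen
# across batches, then derive quantization parameters from that range.
# RangeObserver is a hypothetical stand-in for PyTorch's observer modules.
class RangeObserver:
    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def observe(self, batch):
        """Update the running range with one batch of activation values."""
        self.min_val = min(self.min_val, min(batch))
        self.max_val = max(self.max_val, max(batch))

    def qparams(self, qmin=0, qmax=255):
        """Scale and zero point that map the observed range onto [qmin, qmax]."""
        lo, hi = min(self.min_val, 0.0), max(self.max_val, 0.0)
        scale = (hi - lo) / (qmax - qmin)
        zero_point = round(qmin - lo / scale)
        return scale, zero_point


observer = RangeObserver()
for batch in ([0.0, 0.4, 1.2], [0.1, 2.55, 0.9], [0.0, 0.3, 0.2]):  # made-up activations
    observer.observe(batch)

scale, zero_point = observer.qparams()
print(observer.min_val, observer.max_val, scale, zero_point)
```

This is also why a handful of batches is enough: once the observed range stops changing much, more data does not change the scale.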

Converting to quantized version

model_int8 = torch.quantization.convert(model_prepared)

Save the quantized model

torch.save(model_int8.state_dict(), "emnist_cnn_quantized.pth")

Final model structure

import torch
import torch.nn as nn

class EMNIST_CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> int8 at the input
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.relu1 = nn.ReLU(inplace=True)  # separate relus for fusing
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.relu2 = nn.ReLU(inplace=True)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 7 * 7, 128)
        self.relu3 = nn.ReLU(inplace=True)
        self.fc2 = nn.Linear(128, 47)
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> float at the output

    def forward(self, x):
        x = self.quant(x)  # no-op until the model is converted
        x = self.pool(self.relu1(self.conv1(x)))
        x = self.pool(self.relu2(self.conv2(x)))
        x = x.reshape(-1, 32 * 7 * 7)  # reshape instead of view, in case the quantized tensor is not contiguous
        x = self.relu3(self.fc1(x))
        return self.dequant(self.fc2(x))

    def fuse_model(self):
        # fuse each conv/linear layer with its ReLU (call this in eval mode)
        torch.quantization.fuse_modules(self, [['conv1', 'relu1'], ['conv2', 'relu2'], ['fc1', 'relu3']], inplace=True)

Size and accuracy before and after quantization:

| Model Type | Size (MB) | Accuracy |
| --- | --- | --- |
| EMNIST CNN (float) | 1.8 | 89.4% |
| EMNIST CNN (INT8) | 0.55 | 88.3% |

Containerization

Okay let’s get to the part we’ve been waiting for. Let’s try to spin up a docker container with the same specs as an Amazon Kindle from 2012. I found the specs online:

| Component | Spec |
| --- | --- |
| Release Year | 2012 |
| CPU | 800 MHz ARM Cortex-A8 |
| RAM | 256 MB |
| Storage | 2 GB internal flash |
| Display | 6" eInk (1024 × 758) with built-in frontlight |
| OS | Linux-based (custom Kindle OS) |
| Battery | ~1400 mAh (weeks of battery life) |
| Connectivity | Wi-Fi (some versions had 3G) |
| GPU | None (no acceleration, just framebuffer) |
| USB | Micro USB 2.0 |

Since I obviously can’t install PyTorch or run code directly on a Kindle (or even get my hands on a 2012 Kindle), I approximated its hardware constraints using Docker resource limits:

  • 256MB RAM
  • ~0.3 CPUs
  • No GPU (obviously)

We can run

docker run -it --cpus="0.3" --memory="256m" python:3.10 bash

to spin up our container with the correct size. Then we can do

apt update && apt install -y git
git clone https://github.com/akhilvreddy/emnist-on-a-potato
cd emnist-on-a-potato
pip install -r requirements.txt

and we technically have a simulated 2012 Kindle running in our terminal right now.
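Before trusting the simulation, it is worth confirming that the memory cap actually applies inside the container. Here is a small sketch that reads the limit from the cgroup filesystem. The two paths are the usual cgroup v2 and v1 locations, which I am assuming apply to this image; parse_limit and check_memory_limit are hypothetical helper names.

```python
# Sketch: read the container's memory limit from the cgroup filesystem.
# The two paths are the usual cgroup v2 and v1 locations (an assumption about
# the host setup); the helper names are hypothetical.
from pathlib import Path

CGROUP_FILES = (
    Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
)


def parse_limit(text):
    """Return the limit in MB, or None when the cgroup reports no limit."""
    text = text.strip()
    if text == "max":
        return None
    return int(text) / (1024 * 1024)


def check_memory_limit(paths=CGROUP_FILES):
    """Return the first readable limit in MB, or None if none is found."""
    for path in paths:
        if path.exists():
            return parse_limit(path.read_text())
    return None


limit = check_memory_limit()
print("no limit found" if limit is None else f"memory limit: {limit:.0f} MB")
```

Inside a container started with --memory="256m", I would expect this to report 256 MB. Keep in mind that Docker caps memory and CPU time but still runs on the host’s CPU architecture.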

To actually run the model we would have to then run

python run_kindle.py

We haven’t defined run_kindle.py yet. It is a loop that runs the model over a hidden set and then reports the eval metrics. We can use the /hidden_set data for this, or we could run it on a set whose metrics we already know and check that it gives us back the same numbers. Either way, we just want to run the model in eval mode.
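As a rough sketch of the shape of that loop, here is a framework-free version that reports accuracy and per-sample latency. The evaluate function and the toy predict function are hypothetical stand-ins; in the real script, predict would wrap the quantized model’s forward pass.

```python
# Sketch of an evaluation loop that reports accuracy and per-sample latency.
# evaluate and the toy predict function are hypothetical stand-ins; in
# run_kindle.py, predict would wrap the quantized model's forward pass.
import time


def evaluate(predict, samples):
    """samples is an iterable of (input, label) pairs."""
    correct, total, elapsed = 0, 0, 0.0
    for x, y in samples:
        start = time.perf_counter()
        pred = predict(x)
        elapsed += time.perf_counter() - start
        correct += int(pred == y)
        total += 1
    return {
        "accuracy": correct / total,
        "ms_per_sample": 1000 * elapsed / total,
    }


# Toy stand-in: "classify" a number by its parity.
toy_samples = [(1, 1), (2, 0), (3, 1), (4, 1)]  # last label is deliberately wrong
metrics = evaluate(lambda x: x % 2, toy_samples)
print(metrics)
```

Timing only the predict call keeps data loading out of the latency number, which matters when the CPU budget is this small.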

This brings us to our last (and biggest) problem: KindleOS can’t run PyTorch at all.

PyTorch relies on modern Linux kernel support and glibc compatibility, neither of which are present on KindleOS. That’s expected — the Kindle’s operating system is a stripped-down Linux variant optimized for ultra-low-power tasks like e-ink rendering and page flipping, not running deep learning frameworks.

Here are our options:

Option 1 - Wipe the OS and flash a new one

We can wipe the Kindle’s OS and flash it with a lightweight Linux distro like Debian (similar to our Docker setup). This would give us full control and compatibility with PyTorch, but would brick its default functionality and require jailbreaking.

Option 2 - Convert our .pth to a .onnx / .tflite

We can convert the trained PyTorch model into a portable format like ONNX or TFLite, then run it using a minimal C++ inference runtime, sidestepping the need for PyTorch entirely. This approach lets us keep KindleOS intact but KindleOS wasn’t designed to run arbitrary binaries or heavy numerical code. We would have to reverse engineer parts of the system.

Specifically, we would need to:

  • Identify the libc version and whether dynamic linking is supported
  • Confirm access to basic syscalls like mmap, mprotect, and fork
  • Understand the CPU instruction set & floating point support (the Cortex-A8 is an ARMv7-A core with limited math acceleration)
  • Write custom replacements for math operations (softmax and matmul)

This would be like creating a model inference pipeline through a system designed for flipping pages, not matmuls. Not ideal, but doable with a lot of time and effort (but out of the scope of what I am trying to simulate here).
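To show how little math is actually needed for the last item in that list, here are dependency-free versions of matmul and softmax. These are illustrative Python sketches with made-up numbers; a real Kindle port would write the same logic in C or C++.

```python
# Dependency-free versions of matmul and softmax.
# Illustrative sketches only; a real port would implement these in C or C++.
import math


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


features = [[1.0, 2.0]]                          # one sample, two features
weights = [[0.5, -1.0, 0.0], [0.25, 1.0, 0.0]]   # 2 inputs -> 3 classes
logits = matmul(features, weights)[0]            # [1.0, 1.0, 0.0]
probs = softmax(logits)
print(logits, probs)
```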

For this blog, I’m going to go with Option 1 for simplicity and because I don’t want to dive into system internals. With this, we don’t have to make any changes to our setup: the current Docker container we have works fine. If we were actually using a Kindle, its stock functionality would be gone because of the jailbreak and OS flash.

run_kindle.py

import torch
from torchvision import datasets, transforms
from model import EMNIST_CNN  # the quantization-ready architecture (must match the saved model)

# Step 1: Rebuild the quantized model skeleton. An int8 state_dict only loads
# into a model that was fused, prepared, and converted the same way.
model = EMNIST_CNN()
model.eval()
model.fuse_model()
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')  # use 'qnnpack' on ARM hosts
torch.quantization.prepare(model, inplace=True)
torch.quantization.convert(model, inplace=True)

# Step 2: Load the quantized weights
model.load_state_dict(torch.load('emnist_cnn_quantized.pth', map_location='cpu'))
model.eval()

# Step 3: Loop through the hidden_set and print predicted labels
# Stand-in for the hidden set: the EMNIST balanced test split, preprocessed as in training
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
hidden_set = datasets.EMNIST(root='./data', split='balanced', train=False, download=True, transform=transform)

for i, (x, _) in enumerate(hidden_set):
    with torch.no_grad():
        output = model(x.unsqueeze(0))  # add batch dim: (1, 28, 28) -> (1, 1, 28, 28)
        predicted_label = torch.argmax(output, dim=1).item()
        print(f"Sample {i}: Predicted label = {predicted_label}")

Here is what we got in return:

siyuhhh

I’m now confident to say this:

On a jailbroken Amazon Kindle flashed with a modern, lightweight Linux OS, an EMNIST recognition model can run, although the inference time is extremely slow.


Conclusion

This blog post wasn’t about Kindles or hardware from 2012 - it was about thinking at extremes. If I could get an EMNIST recognizer to almost run on an e-reader, deploying models to phones, Raspberry Pis, and embedded IoT chips doesn’t feel that bad anymore.