Hi, I’m Akhil
I started this page to write about software and machine learning.
I studied Electrical Engineering with minors in Math and Physics at Rutgers University, where I focused on signal processing, digital design, and embedded systems.
I then did my Master’s in Computer Science at Cornell University, specializing in machine learning. There, I developed and deployed both vision and language models while also studying representation learning, optimization techniques, and the theoretical foundations of modern AI systems.
I’ve previously interned at Johnson & Johnson (Risk Technology), JPMorgan Chase (Asset & Wealth Management), Crewcial Partners (Endowment Fund Management), Herbert J. Sims (Wealth Management), and Clifford Beers Clinic.
I’ve also done research in image noise reduction using variational autoencoders (VAEs), auto-labeling for supervised learning, and animal movement tracking in varied environments using computer vision.
I’m currently a new grad engineer @ a Fortune 500 company.
Feel free to reach out to me at akhilvaragamreddy@gmail.com or on LinkedIn.
I like explaining concepts with clarity and making hard ideas intuitive.
→ Read my posts
TL;DR: This post dives deeper into advanced CUDA performance techniques, from tiling and pipelining to occupancy tuning, loop unrolling, and warp-aware memory access. I explain how real-world matrix multiplies use shared memory and grid-stride loops to handle massive inputs efficiently, and how tricks like operator fusion and double buffering unlock GPU throughput. We also look at how OpenAI’s Triton gives you Python-first control over writing custom fused GPU kernels.
The repository with the corresponding PyTorch implementation is available here.
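As a taste of the tiling idea, here’s a minimal sketch of a shared-memory tiled matmul. This isn’t the post’s code; the kernel name, the fixed TILE size, and the square-matrix assumption are all illustrative:

```cuda
// Minimal sketch: each block computes one TILE x TILE tile of C,
// staging tiles of A and B through shared memory for reuse.
#define TILE 32

__global__ void matmul_tiled(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // March across the K dimension one tile at a time.
    for (int t = 0; t < N; t += TILE) {
        // Cooperative, coalesced loads: each thread fetches one element per tile,
        // padding with zeros past the matrix edge.
        As[threadIdx.y][threadIdx.x] = (row < N && t + threadIdx.x < N)
            ? A[row * N + t + threadIdx.x] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (t + threadIdx.y < N && col < N)
            ? B[(t + threadIdx.y) * N + col] : 0.0f;
        __syncthreads();  // tile fully staged before anyone reads it

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // done reading before the next tile overwrites
    }

    if (row < N && col < N)
        C[row * N + col] = acc;
}
```

Each tile gets loaded from global memory once and reused TILE times from shared memory, which is the data-reuse win tiling buys.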
...
TL;DR: This post demystifies the core concepts behind CUDA, walking through how GPU kernels work, how threads and memory hierarchies are structured, and how to write and launch a kernel. It’s a hands-on introduction to CUDA for engineers who want to understand how deep learning frameworks really run under the hood (and how writing a few lines of CUDA can unlock massive speedups).
The repository with the corresponding PyTorch implementation is available here.
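For a flavor of what the post covers, here’s roughly the smallest complete CUDA program: write a kernel, launch it over a grid of threads, and read back the result. Names and sizes here are illustrative, not taken from the post’s repo:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes one output element; the guard handles the ragged tail.
__global__ void vec_add(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) out[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *out;
    // Unified memory keeps the sketch short; production code typically
    // uses cudaMalloc plus explicit cudaMemcpy.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;    // enough blocks to cover n
    vec_add<<<blocks, threads>>>(a, b, out, n);  // <<<grid, block>>> launch syntax
    cudaDeviceSynchronize();                     // kernel launches are asynchronous

    printf("out[0] = %f\n", out[0]);  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}
```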
...
I’ve seen a lot of people use pre-made agentic frameworks to handle various tasks, but I wanted to create a framework from scratch to see the brain, actions, and tools behind these agents.
In the coming years, I believe Robotaxi will scale like Uber, while Waymo will scale like Lyft - if at all. This reflects a broader paradigm shift: lean, vision-based neural networks vs. bulky, sensor-heavy autonomy.
I’ve trained hundreds of models in school and on my own, but barely any of them were served or pushed to production - I wanted to document what usually happens after training.
How far can you compress adjacency matrices before everything falls apart? I pushed LoRA to its limits and watched adjacency melt into noise.
AlphaGo blew my mind, and AlphaGo Zero was the cherry on top. Recreating this in a simple environment was a long-term dream of mine.
What happens when you take a modern ML model, quantize it like crazy, and try to deploy it on decade-old hardware (a potato)? I tested EMNIST on the edge — literally.
I felt like I would never understand how LLMs really generate text until I built a transformer from scratch and coded up multi-head self-attention in non-modular, functional-style PyTorch.
Attention is really useful, but like any great invention, it was built out of necessity for the problems at hand. In this post, I dive into the issues researchers were dealing with between 2013 and 2017.