
Binary Classification: Intro to Convolutional Neural Nets

I've long felt a gap between my practice as a software engineer and my interest in data science.

The Gap Wasn't the Math

When you come out of a master's program in physical chemistry, you leave with a pretty deep relationship with linear algebra and calculus. Eigenvalues aren't abstract to you: they're how you solved the Schrödinger equation, how you decomposed spectra, how you understood the shape of molecular orbitals. Gradient descent isn't a mystery either. It's just optimization, and you've been doing that since your first potential energy surface. So the mathematical foundations of machine learning were never the barrier for me. The barrier was that I became a software engineer instead, and spent years going deep on distributed systems, API design, and infrastructure rather than data science.

I've been following the field, though: reading papers on attention mechanisms, watching the discourse around transformers, keeping an eye on what's actually changing versus what's hype. But following isn't the same as doing. I wanted to be intentional about engaging with data science as a practice, not just as a spectator sport. So I built catdog: a small, end-to-end Python project that trains a convolutional neural network to tell cats from dogs.

Classic problem, I know, but that's exactly the point. I wanted the dataset to be boring so the architecture could be interesting.

Standing on MobileNet's Shoulders

The core idea is transfer learning, and it's what made this project click for me. Instead of training a massive image classifier from scratch (which would've eaten my weekends), I pulled in MobileNet as a frozen base model. MobileNet already knows how to see edges, textures, shapes because someone at Google trained it on ImageNet. I just needed to teach it one new trick: is this furry thing a cat or a dog?

On top of MobileNet, after reading and re-reading chapter 3 of "Practical Deep Learning for Cloud, Mobile, and Edge" by Koul, Ganju, and Kasam, I wired up a small custom head: a GlobalAveragePooling2D to collapse the spatial dimensions, a Dense(64) with ReLU, a Dropout(0.5) to keep things honest, and a final softmax layer for the two-class prediction. That's it. The whole trainable surface is tiny, which means even with just 500 training samples, you can get a usable model out in minutes.
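In Keras, that head comes together in a few lines. Here's a sketch of the architecture as described; the 224x224 input size is MobileNet's usual default, and the details are illustrative rather than the repo's exact code:

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNet

# Frozen MobileNet base: ImageNet weights, classifier top removed.
base = MobileNet(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))
base.trainable = False  # keep the pretrained features fixed

# Small custom head: pool, one dense layer, dropout, two-way softmax.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),        # collapse the spatial map to a vector
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),                    # regularize the tiny trainable head
    layers.Dense(2, activation="softmax"),  # cat vs. dog
])
```

With the base frozen, only the head's weights (on the order of tens of thousands of parameters) actually get trained, which is why a few hundred samples can be enough.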

Turning Knobs, Building Intuition

The data pipeline uses Keras's ImageDataGenerator, which was a good learning exercise in itself. I set up augmentation on the training side (rotations, shifts, zoom) to squeeze more generalization out of a small dataset. The validation generator gets no augmentation, because you want your test conditions to be clean. If physical chemistry taught me anything, it's that your controls matter as much as your experiment.
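The two-generator setup looks roughly like this; the specific augmentation values below are illustrative choices of mine, not copied from the repo:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Training generator: augmentation to stretch a small dataset.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,       # random rotations
    width_shift_range=0.2,   # horizontal shifts
    height_shift_range=0.2,  # vertical shifts
    zoom_range=0.2,          # random zoom
)

# Validation generator: rescaling only, no augmentation,
# so evaluation conditions stay clean.
val_datagen = ImageDataGenerator(rescale=1.0 / 255)
```

Each generator would then feed the model via flow_from_directory pointed at class-labeled folders; the asymmetry between the two is the whole point, mirroring the control-vs-experiment split.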

One thing I appreciated was how tangible the abstractions became once I had to make real choices. Picking an optimizer (Adam), a loss function (categorical crossentropy), and a learning rate (0.001) forced me to commit: I understood the math behind all of these, but I'd never had to sit with the consequences of choosing one over another. (Maybe, in another post, I can get into the relationship between choice of activation function and decision boundary shape.) Watching accuracy climb epoch by epoch gave me an intuition for convergence that no amount of reading about loss landscapes ever did. It reminded me of watching a titration curve snap. You know the theory, but seeing it happen in front of you is different.
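Those three choices map directly onto the compile step. A sketch, using a tiny stand-in model so the snippet stands alone (the real one sits on the frozen MobileNet base):

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

# Stand-in two-class model; in the real project this is the
# frozen MobileNet base plus the custom head.
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.GlobalAveragePooling2D(),
    layers.Dense(2, activation="softmax"),
])

# The three knobs from the text: Adam, categorical crossentropy, lr=0.001.
model.compile(
    optimizer=optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```

Training then becomes a call to model.fit with the two generators, and the per-epoch accuracy printout is where the convergence intuition comes from.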

The Software Engineer Can't Help Himself

I also wrote a predict script and a convert module, because I wanted to understand the full lifecycle: not just training, but inference and model serialization. The conversion script freezes the graph into a protobuf file, which taught me how TensorFlow represents models as computation graphs under the hood. The mental model of weights just being constants baked into a graph was satisfying in the way that a good physical chemistry derivation is satisfying: everything reduces to something concrete.

I packaged the whole thing with setup.py (dated, I know) and proper console scripts so train and predict work as CLI commands. That's the software engineer in me: even a learning project should have a clean interface. If I can't pip install it into a virtualenv and run it from the command line, it's not done.
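The CLI wiring is the console_scripts entry point mechanism in setup.py. A sketch of the shape, where the package and module paths are assumptions rather than the repo's actual layout:

```python
from setuptools import setup, find_packages

setup(
    name="catdog",
    version="0.1.0",
    packages=find_packages(),
    install_requires=["tensorflow"],
    # Console scripts: after `pip install`, `train` and `predict`
    # become shell commands that call the named functions.
    entry_points={
        "console_scripts": [
            "train=catdog.train:main",      # module path assumed
            "predict=catdog.predict:main",  # module path assumed
        ]
    },
)
```

A pip install -e . in a virtualenv then puts both commands on the PATH, which is the "clean interface" test the paragraph describes.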

This project didn't make me a machine learning engineer. But it bridged two parts of my background that had been running in parallel for too long: the math I learned in grad school and the engineering discipline I've built since. Transfer learning, in a way, is a good metaphor for what I was doing: I didn't start from scratch; I brought a pretrained foundation and fine-tuned it for a new domain.

If you're like me, someone with the quantitative background who's been adjacent to data science but never quite in it, I'd encourage you to be deliberate about crossing over. Pick a toy problem, write the pipeline end-to-end, and resist the urge to use a high-level wrapper that hides the interesting parts. The math you already know is in there, waiting for you to recognize it.