I Made AI Sort a Decade of My Bad Photos

 

The Problem

I had an old iPhoto library sitting on an HFS-formatted drive from my Mac days — about 49 GB of photos and videos spanning 2008 to 2014. Wedding photos, vacations to Ireland and Amsterdam, years of phone camera snapshots, screenshots, pet photos, food pics — the full spectrum of a decade of digital life.

I'd already extracted the files from the HFS drive onto my Windows PC using an HFS utility, which preserved the original iPhoto Masters folder structure organized by year. But now I had a new problem: I wanted to upload the photos that actually mattered — the ones with people in them — to cloud storage. And I was staring down 8,271 image files.

Nobody wants to manually click through 8,000 photos to decide "person or landscape?" one at a time. That's easily 10-15 hours of mind-numbing work. There had to be a better way.

The Idea: Let a Vision AI Do the Looking

I'd been running LM Studio on my local machine — it's a desktop app that lets you run open-source AI models locally on your own GPU. I had Google's Gemma 4 e2b model loaded, which is a multimodal reasoning model. That means it can look at images and answer questions about them, and it does its own chain-of-thought reasoning before responding.

The plan was simple:

  1. Iterate over every image in the old iPhoto library
  2. Send each one to Gemma running locally and ask: "Are there any human faces or people visible in this image?"
  3. If yes, copy the photo (with all its original metadata intact) to a new folder
  4. If no, skip it

No cloud APIs. No uploading my personal photos to anyone's servers. Everything running on my own hardware.

The Claude Code Prompt

"I want to have a curated set of photos that include peoples faces. I have years of old photos that are in an old iPhoto format that won't easily convert to iPhoto. I have a local model running "Gemma 4 e2b" that is running in LM Studio on this PC. I want you to solve the problem of curating the images for me. Create a solution that will use the local model, and create output folder for all photos with faces and those that do not. Save the results for me to review later for record counts and summaries. Do all the work, research this PC setup, install any dependencies, and do the work. Report back when done."

The Technical Setup

Hardware: My regular Windows 11 desktop PC with a consumer GPU

Software Stack:

  • LM Studio running Google Gemma 4 e2b (multimodal vision + reasoning model)
  • Python 3.14 with Pillow for image processing and requests for API calls
  • Claude Code (Anthropic's CLI coding assistant) to help write and debug the script

The script was straightforward. For each image:

  1. Open it with Pillow and resize to max 1024px (keeps the API requests fast without losing enough detail to matter for face detection)
  2. Base64-encode the resized image
  3. POST it to LM Studio's local OpenAI-compatible API endpoint at localhost:1234
  4. Parse the YES/NO response
  5. If YES, copy the original full-resolution file using Python's shutil.copy2, which preserves the embedded EXIF metadata, timestamps, and all other file attributes

One important discovery during development: Gemma is a reasoning model. It uses part of its token budget for internal "thinking" before producing a visible answer. My first attempt set max_tokens to 150, and Gemma would use all 150 tokens thinking and return an empty response. Bumping it to 4096 gave the model plenty of room to reason and still produce a clear YES or NO answer.

The script also saved progress to a JSON state file every 25 images, so if it crashed or I needed to stop it, I could resume right where I left off without re-processing anything.

The Solution as Built by Claude Code


The Results

The scan ran for about 14 hours — I kicked it off in the morning and it finished that evening. Here's what it found:

MetricValue
Total images scanned8,271
Photos with people4,964 (60%)
Photos without people3,305 (40%)
Processing errors2 (0.02%)
Average time per image~6 seconds
Total runtime~14 hours


Faces Detected Samples Images


What Got Filtered Out

The 3,305 rejected photos broke down into predictable categories:

CategoryCount%
Landscape & Scenery40512.3%
Animals & Pets2708.2%
Screenshots & Digital2688.1%
Water, Boats & Beaches2678.1%
Buildings & Architecture2457.4%
Gardens, Plants & Trees2046.2%
Vehicles & Roads1594.8%
Food & Drink1564.7%
Interior & Furniture1364.1%
Dark or Blurry Shots1043.1%
Statues & Art762.3%
Cemetery & Memorials240.7%
Other99130.0%


Rejected Photos Sample


This all makes sense given the source material. The Ireland and Amsterdam vacation albums contributed a lot of the architecture, canal, and scenery photos. Years of phone camera use added the screenshots, food pics, and pet photos. The "Other" bucket was mostly close-ups of objects, hands holding things, abstract textures, and images where Gemma simply said "no human faces visible" without elaborating.

Accuracy

After the scan completed, I copied both the accepted and rejected photos into separate folders and reviewed the results. The solution worked well. The model correctly identified people in a wide variety of conditions — formal wedding dinner shots, casual vacation photos, selfies, group photos, and even images where people were small in the frame like pedestrians on a street.

The model may have been slightly generous in what it counted as "people" — for example, tiny figures in the distance or profile pictures visible in phone screenshots. But for my use case, false positives were fine. I'd rather have a few extra scenic shots with distant pedestrians than miss actual photos of friends and family.

What It Cost

This is the best part.

ItemCost
Cloud API fees$0
LM StudioFree and open source
Gemma modelFree (open weights from Google)
Electricity~$1 estimated
Total~$1

Everything ran locally. My photos never left my machine. Compare that to cloud vision APIs where you'd be paying per-image — at typical rates, 8,000+ images would cost $10-40 depending on the provider. And you'd be uploading all your personal photos to a third party.

Lessons Learned

Reasoning models need token headroom. Gemma's internal chain-of-thought reasoning consumes tokens before the visible output. If you set max_tokens too low, you get empty responses. Give it room to think.

Simple prompts work best. My first attempt asked for structured JSON output. The final version just asked for "YES or NO, then a 5-word description." Simpler prompt, more reliable parsing, same accuracy.

Image-first message ordering matters. Putting the image before the text in the API message produced better results with Gemma than text-first.

Build in resume capability from the start. With a 14-hour runtime, things can and will interrupt the process. Saving state to a JSON file every 25 images meant I could stop and restart without losing progress. This took maybe 10 extra lines of code and saved real headaches.

Resize before sending. Sending full-resolution 4000x3000 camera photos to a local model is slow and unnecessary for face detection. Resizing to 1024px max dimension kept inference fast with no impact on accuracy.

Local AI is practical for batch tasks. A year or two ago, this would have required a cloud API or a complex self-hosted setup. Now it's "install LM Studio, load a model, hit localhost." The ecosystem has matured to the point where a non-trivial batch processing job over thousands of images is genuinely easy to set up and run on consumer hardware.

The Bottom Line

I turned a long, manual photo review task into a 14-hour unattended automated process that cost about a dollar in electricity. The photos are now sorted, the metadata is preserved, and I can upload the people photos knowing I'm not wasting cloud storage on 3,000 pictures of buildings, food, and my dog.

If you've got an old photo library you've been putting off organizing — and you have a halfway decent GPU — this is a very solvable problem in 2026.

Comments