What is Tug

tl;dr

The Tug Thesis: To get better at large-scale refactors, AI coding assistants need tools that supply language semantics and a codebase-wide view they don’t have. Tug aims to provide that.

The project also has a meta-project angle. The way the work is pursued and accomplished is as interesting—and perhaps as instructive—as the actual work product. The journal tracks this progress.

The main meta-project is working with AI and figuring out what it’s good for—especially when it comes to software development.

Origin Story

In December 2025, I started working on a new software project to mash up ideas from some software I like a lot: JSON, Polars, jq, DuckDB. The idea was to build a framework that could process tree-structured data, store it in files, re-load it super fast, query it, reshape it... the works. Rust core with Python bindings. In about five weeks, working with Claude Code and Cursor with GPT 5.2, I wrote over 200k lines of code and 8000 tests. I think the software was, and is, cool. (Note to self: I should open-source this project.)

But I also decided to set it aside for now. My reason: the end goal was too abstract. I still don't really know who the software is for and who might want to use it.

Even so, I made three interesting discoveries:

  1. WOW! You can write a lot of code these days using AI coding assistants.
  2. Because the LLMs relieved me of having to concentrate on every line of code that needed to get written, I could think at a higher level about the work I was doing. This let me push boundaries more. I had more cognitive headroom available, which I used to discover new things, such as write amplification, multiversion concurrency control (MVCC), optimistic concurrency control (OCC), and the Arrow columnar format. This was great fun.
  3. Writing that much code in such a short amount of time led to far more refactoring than would have been necessary if the code had evolved more slowly and incrementally. I found I needed to move things around, do mass renames of files and functions, and trim off dead ends that weren't worth keeping.

This last discovery hit me while I was watching Claude Code do its work in the terminal.¹ I saw how the AI assistants really struggled at refactoring. They could write code fluently, almost effortlessly, but couldn't always read back what they had written. The problem grew more pronounced as the number of lines of code involved grew larger, which makes sense, given the size of context windows. But there was also a struggle with the semantic and conceptual connections between pieces of code, which led to observations like this:

Ever see your AI use grep for code "analysis" or sed/awk to do a complex refactor? It sees your software as text and not as code! Ewww!

In a way, this makes perfect sense. To the AI, software is just lines of code. It doesn't know what the software does. Code is merely a lexical entity, a stream of tokens.
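To make that concrete, here is a small Python sketch (the snippet and names are invented for illustration) contrasting the two views. A regex rename of `rate`, the text view, corrupts substrings and string literals; an AST walk, the semantic view, finds only the real references to the variable.

```python
import ast
import re

source = '''
rate = 1.0

def apply_rate(x):
    return x * rate

def irate_user():          # contains "rate" as a substring
    return "rate limit"    # "rate" inside a string literal
'''

# Text view: a blind regex rename hits substrings and string literals.
text_renamed = re.sub(r"rate", "factor", source)
# Produces apply_factor, ifactor_user, and "factor limit" -- all wrong.

# Semantic view: walk the syntax tree and collect only Name nodes
# that actually refer to the variable `rate`.
class CountRefs(ast.NodeVisitor):
    def __init__(self):
        self.refs = []

    def visit_Name(self, node):
        if node.id == "rate":
            self.refs.append(node.lineno)
        self.generic_visit(node)

counter = CountRefs()
counter.visit(ast.parse(source))
print(counter.refs)  # line numbers of the genuine references only
```

The AST pass reports just the assignment and the use inside `apply_rate`; the function names and the string literal are never touched.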

This brings me to the Tug Thesis: To get better at large-scale refactors, AI coding assistants need tools that supply language semantics and a codebase-wide view they don’t have. Tug aims to provide that.

This thesis seems like a more generally useful idea to work on as a project. It has the right mix of generality and specificity, demands enough technical depth to be interesting, and still looks achievable. Not only that: I want this software myself, which means I can use myself as a guide for how to manage the project, its goals, and its features.

To start, the first target language is Python. The plan is to provide a set of useful refactorings (rename-symbol, extract-class, move-module, and so on) and then use them myself on the code I'm working on while I write it.
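As a toy illustration of what rename-symbol involves, here is a minimal, scope-unaware sketch using Python's standard `ast` module (Python 3.9+ for `ast.unparse`). This is not how Tug works; a real refactoring tool has to track scopes and imports and preserve formatting and comments, all of which this sketch ignores.

```python
import ast

class RenameSymbol(ast.NodeTransformer):
    """Toy rename-symbol: renames every matching Name node and
    function definition. Deliberately ignores scoping rules."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_FunctionDef(self, node):
        if node.name == self.old:
            node.name = self.new
        self.generic_visit(node)  # keep descending into the body
        return node

src = "def area(r):\n    return 3.14159 * r * r\n\nprint(area(2))\n"
tree = RenameSymbol("area", "circle_area").visit(ast.parse(src))
new_src = ast.unparse(tree)
print(new_src)
```

Both the definition and the call site come back renamed, but note what was lost along the way: `ast.unparse` throws away the original layout and any comments, which is exactly why a serious tool needs a concrete (formatting-preserving) syntax tree rather than a plain AST.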

Much work to do. Read the journal for updates, thoughts on the work, the meta-projects, and their (hopefully interesting) side stories.

  1. I watch Claude Code a lot. In fact, I rarely look away. It's like a show being put on just for my benefit. A command performance.