LLMs for Software Developers (notes from my talk at NWRUG)
I recently gave a talk at the North West Ruby User Group about how I use LLMs for software development. This was an update on a previous demo I had given on Claude Code.
We didn't record the talk, but here are my (adapted) notes.
LLMs for Software Developers
This is an update on how I use LLMs - mainly Claude and Claude Code - in my day to day software development life. It's a follow-on to the demo I did a few months ago, but what I do now is very different to how I used them then.
A quick history lesson
I think I look at LLMs slightly differently to many other people. This is because I never formally studied computer science or software engineering; my degree was in Cognitive Science, as I was (and still am) interested in cognitive neuroscience, linguistics and philosophy of mind.
However, I've been a professional software developer for thirty years now; that's because there have been two occasions when computers have had a profound influence on my life.
Firstly, when I was a kid, in the 1980s, my dad got a PC from work. It ran DOS, so nothing was graphical. I got extremely frustrated trying to teach him how to use the word processor; he could not grasp that "blue text on-screen" meant "prints out in bold" and "green text on-screen" meant "prints out double-width". My friend Ben's dad was an academic and they got a Mac. I remember walking into their front room and seeing it - and as my (no doubt embellished) memory recalls it, Ben's dad was teaching Ben how to use the computer. This was sorcery - and since that moment, I've always tried (but often failed) to make the interfaces to my software as friendly and accommodating of human sensibilities as possible.
Secondly, when I discovered Ruby and Ruby on Rails, I loved it because the APIs were designed to look like English. Not bad, considering neither the language's author nor the framework's author was a native English speaker. The underlying issue is that code is easy to write but hard to read - so if you can make your code read like English, it reduces that little bit of friction in your head. Which, in theory, should make the code more understandable and more maintainable.
Then, I was watching the latest series of Black Mirror and there's an episode, called Eulogy, starring Paul Giamatti. He has to prepare some memories and recollections for a funeral and is guided through the process by an AI device. It's a great episode with a strong emotional pay-off. But I also realised that the AI device was basically doing a project configuration and data gathering exercise. In a few years' time, the idea that you would have to manually search through stuff, learn a load of settings and options and organise your information by hand will seem antiquated. A computer can simply have a conversation with you, ask the relevant questions, sift out the unimportant stuff and then put the relevant data into the right places.
Two very important ideas there - making computers more accessible to people who do not understand how they work and code being harder to read than it is to write.
Ethics
It's not possible to discuss LLMs without mentioning the ethics of these things.
Most importantly, be careful who you listen to. I was reading one very angry blog-post the other day, slating LLMs, saying anyone who uses them was a misguided fool. Then at the end, the author mentioned that their experience with them was 2 hours pasting some Python code into ChatGPT (they didn't say, but I assume it was the free version, using older, less capable models) and reading Google's famously terrible AI search result summaries. On the other hand, you have loud tech-bros who think "AI" is the second coming of crypto when actually they're just massive arseholes.
With regards to energy usage, it's hard to say - because the "AI" companies are generally private, so do not need to break down their spending. What we can do is think about the energy usage in two forms: inference and training. With regards to inference, even if they are subsidised, the costs of using OpenAI's or Anthropic's APIs are probably a good guide - and they have been trending downwards with each model that is released. However, for training costs, the attitude is "MOAR MOAR MOAR". But I feel that's probably because these companies are VC funded, meaning they are using the billions of dollars that were released to already rich people after 2008 and which never reached the normal economy. They want to justify their existence and so engage in this huge dick waving contest over who can spend the most. The Chinese DeepSeek models caused such a stir because, even if you can't trust their costings, they must be a fraction of the cost of the American models - simply because the Chinese do not have access to the same power-hungry hardware.
There's a narrative doing the rounds that AI is already causing massive job losses. I'm not sure this is true; I think it's just being used as cover for job-cutting. But there's a good chance that it will cause job losses in the future. That's because technology always creates change. Desktop publishing in the 1980s, the web in the 1990s, mobile in the 2000s and 2010s - they all destroyed entire industries, but also created new ones. Back in the 1800s, the Luddites smashed up machines - but not because they were anti-technology. It was because the technology was dehumanising and they had no other way to get their lofty overlords, the factory bosses, to listen.
With regards to copyright, I have a different view to most people. I absolutely believe you should get paid for what you create. But I also grew up in a time of musical "remix culture" - dub reggae, hip hop, British rave and house music - they all sampled liberally from other people's ideas, but they created brand new forms, the likes of which had never been heard before. The real issue I have with copyright violations is who those violations benefit.
LLMs are owned by VCs, tech bros and Silicon Valley giants. These people live in a different world to the rest of us; they don't look at ordinary people as something they need to worry about (with the likes of the transhumanists and effective altruists openly encouraging their followers to ignore the suffering of billions today because we may be able to alleviate the suffering of hundreds of billions in 20,000 years' time).
If it weren't for Big Tech, I would have no problem with the copyrighted training material. If LLMs were publicly owned, I wouldn't mind about the energy usage. 35% of the US stock exchange is in tech stocks and the value is rising because of LLMs - investors are pouring money into a technology that is demanding ever-growing capital expenditure and is yet to show any signs of making a profit. It's a bubble that's about to burst - and when it does, it will be the ordinary folk, who don't have billions, who will suffer.
The real problem here is how power and control is distributed across society. Which is as it's always been.
How I use LLMs (summer 2025 edition)
The most important thing that I've learnt about LLMs is that you have to control their "context". LLMs are stateless - as you have a "conversation" with them, the previous messages are sent back and forth, between you and the LLM, growing in size every time a new question or reply is added. The maximum size of the state they can pass is called the "context window". Newer models have bigger context windows. You might think that bigger is better, but there's actually a sweet spot.
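To make the "conversation grows every turn" point concrete, here's a rough Ruby sketch - `call_llm_api` is a made-up stand-in for whichever client you're using, not a real library call:

```ruby
# Stand-in for a real API call - in practice it would POST the entire
# message history to the provider on every turn.
def call_llm_api(messages)
  "(model reply, generated from #{messages.length} prior messages)"
end

# Each question appends to the history and resends the whole thing;
# the context window is the ceiling on how big this array can grow.
def ask(messages, user_text)
  messages << { role: "user", content: user_text }
  reply = call_llm_api(messages)
  messages << { role: "assistant", content: reply }
  reply
end

messages = []
ask(messages, "What does this class do?")
ask(messages, "Now refactor it") # both earlier messages go along for the ride
```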
Too little context and the LLM will "fill in the gaps" by delving into its training data - and if it can't find something directly relevant, it will choose the next best thing. This phenomenon is commonly known as "hallucinations" and is one of those things that people who don't really use LLMs cite as proof that these things are useless - when actually the problem is that you've not given it enough information. But too much context and the LLM gets overwhelmed: it gets stuck in loops, it goes off on a tangent and needs its context clearing before it can do anything useful.
That's why there's a new, emerging discipline called "Prompt Engineering". Some people scoff at this - "it's just a sentence, how can that be engineering?" Well, it may not be engineering in the strict sense, but it's definitely more than "just a sentence" - it's how you make sure that the context the LLM is working with is just the right size, with the important details it needs and none of the extraneous stuff. And arguably, like engineering, you need to understand the constraints and test the results to ensure that they are within acceptable tolerances.
When using Claude Code, I use the following files to control the context.
- `CLAUDE.md` - Claude's instruction file. Claude can generate this for you, but I find it puts too much in there. Mine basically says:
  - This is a Ruby on Rails application
  - Use `bundle exec standardrb --fix` to run the linter
  - Use `bin/rails spec` to run the full test suite
  - Use `bin/rspec path/to/file_spec.rb:LINE_NUMBER` to run an individual spec
  - Details on the models and application structure are in `docs/glossary.md`
  - Details on coding conventions are in `docs/style-guide.md`
- Glossary - describing the structure and ubiquitous language used in the application
- Style Guide - conventions and notes on how the code itself is structured (for example, using Phlex components or always using resourceful routes)
- Commands - Claude Code has a `commands` folder, where you can define specific commands (prompts); more on these later.
Note that the `CLAUDE.md` file is very short but includes references to the other files. If the LLM needs to know which models to look at, it can read the glossary; if it needs to write code, it can look at the style guide - but it won't load those files into its context unless they're necessary.
Writing Code
Ever had a ticket that says something like "the customer wants more widgets"? It's not really helpful - why do they want more widgets? What are they trying to achieve? So I've written some prompts for bug reports and feature specifications that ask the reporter for more details. Unlike filling out a generic form, the LLM has been instructed to ask certain questions based on the previous answers - plus it comes across as a conversation, so is much more natural for non-technical users.
When I'm designing a new function I often ask the LLM for advice, especially when it's something technical, such as an external API. Recently I had to amend the contents of a Word document's XML. The specification was a 5,000-page PDF (of which I have read about 600). But the LLM immediately knew which tags and structures I needed to look at.
Every now and then I need to make a change to a load of files. I could figure out a load of regexes, look up the syntax for `sed` and `awk`, and write a script to do it. Or I can say "find all Ruby classes that do something like this ... then adjust them like this and if they should be name-spaced, update the module and move them to the correct folder". Not an exact regex in sight.
In fact the LLM can correctly respond to instructions that are nothing like regexes - such as "find classes that are structured like this one". I used this to do a major refactoring on a large application. The test suite took over 40 minutes to run, but I knew how to speed it up. The problem was there were over 30,000 test cases - I just couldn't face doing the work. So instead, I updated a couple of the specs myself and got the LLM to do it for me. "Look at how I've edited these files and make the same changes across all the rest of the specs - after each one, run the linter, then the individual test, fixing if needed; then move on to the next spec". I started it running on Friday afternoon; by Sunday evening, the entire test suite took less than 15 minutes to run.
If you've got tests, the LLM is really good at bug fixing. I spent 3 hours banging my head on the desk trying to fix a weird routing error; then I asked Claude Code. At first it tried a load of things I had already tried, then it "thought" "the issue is to do with the routing, so I'll replace that". And the test passed! I looked at what it had done; it had replaced `documents_form_path(@form)` with `Rails.application.routes.url_helpers.documents_form_path(@path)`. I'm not really sure what caused the routing problem - but I wasted 3 hours while Claude fixed it in under 10 minutes. And as for that Word XML processor - Claude can spot typos and issues in the XML in seconds.
Speaking of tests, I often write entire specifications first. "I need a new class ... it will do X, it won't do Y, it will do Z". Then I go through them and make each one pass, one at a time. Except now I often ask Claude to make them pass - it goes away and writes code, running the test suite until it's got a working implementation. In a few cases it's ended up with better code than I would have written. And even in the cases where it doesn't, the tests pass, so I know it's safe to ship and safe to refactor later.
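As a rough illustration of what I mean (the class and behaviours below are made up), the spec starts life as a list of pending examples, which then get implemented and made green one at a time:

```ruby
# spec/models/widget_reorderer_spec.rb - illustrative names, not from a real project
require "rails_helper"

RSpec.describe WidgetReorderer do
  # Examples without a block are "pending" in RSpec, so the whole specification
  # exists up front before any implementation code is written.
  it "reorders widgets by priority"
  it "ignores archived widgets"
  it "never touches widgets belonging to another account"
end
```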
Reading Code
Remember, code is easy to write but hard to read. At least for humans. LLMs are actually quite good at reading code. So you can ask it "how does this work?" when looking at a new project, and it will probably give you a decent answer.
However, the thing I hate most about reading code is code reviews. So I thought I'd see if I could get the LLM to do the boring bits for me. I added a "code_review.md" file to Claude's commands folder which detailed the process for code reviews:
- Read the issue from Linear (our issue tracking system)
- Read the project style guide
- Run the linter and ensure all tests pass
- Do a diff between the feature branch and the develop branch
- Briefly evaluate if the diff implements everything required in the issue ticket
- Check that the changes match the style guide
- Perform a security check on the changed code - ensure all endpoints have automated tests to verify authentication and authorisation
- Ensure that the project glossary, README and other documentation have been updated to include details of these changes
- Make a final recommendation on the changes:
- Accepted - the code meets the requirements
- Accepted with UI Review - the code meets the requirements but includes user-interface changes, so requires a visual review
- Rejected - the code does not meet requirements and should be returned to the developer with feedback
It doesn't mean I can just trust Claude's code review. But it does do a lot of the boring, tedious stuff and tells me how much effort I need to put in. If Claude says it's OK, I do a scan of the diff to see what it's missed. But if Claude says it's not, I dive in and do a full review. Also, Claude updates the documentation for me.
Integrating LLMs into Rails applications
RubyLLM
The RubyLLM gem helped me understand how to make these things useful. I'd read the various APIs and bits of documentation, but nothing really made sense till I saw RubyLLM's Ruby-ish interface. It lets you connect to an LLM's API, send it messages and receive responses. In one of our projects, I started with some simple tasks - "if this image does not include ALT text then ask the LLM to summarise the image" and "extract the keywords from this document" (which then get used to generate a Postgres full-text search index).
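As a sketch of that first task - assuming an `image` record with a file attached; the exact attachment syntax may differ, so check the RubyLLM docs:

```ruby
require "ruby_llm"

RubyLLM.configure do |config|
  config.anthropic_api_key = ENV["ANTHROPIC_API_KEY"]
end

# Ask the model to describe the image so we can fall back to generated ALT text.
chat = RubyLLM.chat
response = chat.ask(
  "Describe this image in one sentence, suitable for use as ALT text.",
  with: { image: image.file.path } # attachment - see the RubyLLM docs for the exact form
)
image.update!(alt_text: response.content)
```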
Then I got started on a more complex task - importing a PDF document. The instructions included "see if there is an existing document with this filename; if there is then add a new revision, otherwise create a new document". To make this work, you add in "tools" - functions that the LLM can call. These are implemented as Ruby classes with an `execute` method; they also include a description and a list of parameters (and their types). When you start the RubyLLM chat, you pass it the tool instances and, as the LLM is working, it decides if it needs to make a tool call, based upon its current context and the tool descriptions.
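Something like this, give or take (the class, model and prompt here are illustrative, not the real import code):

```ruby
# An illustrative RubyLLM tool - the description and parameters tell the LLM
# when and how to call it; execute does the actual work.
class FindDocumentByFilename < RubyLLM::Tool
  description "Looks up an existing document by its original filename"
  param :filename, desc: "The filename of the uploaded PDF"

  def execute(filename:)
    document = Document.find_by(filename: filename)
    document ? { id: document.id, revisions: document.revisions.count } : { found: false }
  end
end

chat = RubyLLM.chat.with_tool(FindDocumentByFilename)
chat.ask "Import tmp/board-minutes.pdf - add a revision if a document with that filename already exists"
```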
MCP
The next step was to try building a "conversational" interface. As I mentioned at the start, this is the thing that could be transformative for computer interfaces. To make that work, I investigated the Model Context Protocol - a very simple JSON API that runs over `stdio` or streaming HTTP/SSE. The key thing about it is that it also includes discovery (think an OpenAPI specification, but much, much simpler).
The Fast-MCP gem is Rack middleware that implements a streaming HTTP/SSE server inside your Rack application. You add in "resources" (such as documents) and "tools" (functions that the LLM can call, just like for RubyLLM); the gem then publishes these, and the protocol allows the LLM to discover which resources and tools are available and call them whenever needed. In addition, MCP includes OAuth2 - so if a resource or tool returns a 401, the client tries OAuth2 discovery and then asks the user to authenticate.
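For flavour, a tool definition looks something like this, going by the gem's README (the tool itself and the `Document.search` scope are made up for illustration, so check Fast-MCP's documentation for the exact DSL):

```ruby
# config/initializers/fast_mcp.rb - an illustrative sketch, not production code.
class SearchDocuments < FastMcp::Tool
  description "Search the document library by keyword"

  arguments do
    required(:query).filled(:string).description("Keywords to search for")
  end

  def call(query:)
    Document.search(query).limit(10).map { |doc| { id: doc.id, title: doc.title } }
  end
end

# Mounts the MCP server as Rack middleware inside the Rails app and registers the tool.
FastMcp.mount_in_rails(Rails.application) do |server|
  server.register_tool(SearchDocuments)
end
```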
Unfortunately I've only had time to do a very basic investigation into Fast-MCP (one resource and one tool) - but I had added OAuth2 discovery and client registration to my application (which the doorkeeper gem does not include), which is an important starting point. But it looks pretty simple; the only thing that I'm not sure about is the best way to organise a lot of repeating functionality - HTML controllers, JSON API controllers and MCP tools and resources.
King Ludd
As I said at the start, the two big things about LLMs are that they are very good at reading code and they could be the basis of a computer interface that is much less alienating for a lot of people. But there still remain a lot of questions about this technology and the situation is changing very fast.
However, I'm from Nottingham; we call ourselves the Rebel City, because of Robin Hood, the Civil War, Brian Clough (and now Evangelos Marinakis) and the Luddites. I'm happy to call myself a Luddite - they didn't hate the technology, they wanted the owners of that technology to stop treating them with contempt.
And that's how I feel today. Embrace the technology, but don't trust the people who own it.