Fork the repo.
Ask an LLM to rename all the variables and add comments and docstrings. Give it your style guide (assuming you have one).
Ask another LLM to check their work.
Done.
Disclaimer: I’m not a programmer, I’m a network engineer who dabbles in automation and scripting. But it seems to me that grunt work like this is what LLMs are really good for.
Also I only use short variable names inside of loops (for i in iterable…). Is that not how it should be done?
i and j are acceptable in small loops. But it depends a lot on the language used. If you’re in C or bash maybe it’s fine. But if you’re in a higher-level language like C# you usually have built-in functions for iterating over something.
For example, say you have a list of movies you want to get the rating from. Instead of doing
for (i = 0; i < movies.length; i++) var movie = movies[i] ....
it’s often more readable to do
movies.forEach { movie -> var rating = movie.rating .... }
Also, if you work with tables it can be very helpful to name your iteration variables row and column.
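A rough Python version of the same point, as a sketch only: the Movie class, its fields, and the sample data are invented for illustration and don't come from anyone's real code.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    # Hypothetical record type, made up for this example.
    title: str
    rating: float


movies = [Movie("Alien", 8.5), Movie("Cats", 2.8)]

# Index-based loop: the short name i is fine, but the lookup is extra noise.
ratings_by_index = []
for i in range(len(movies)):
    movie = movies[i]
    ratings_by_index.append(movie.rating)

# Iterating directly names the thing you actually care about.
ratings = [movie.rating for movie in movies]

# For tables, row and column say more than i and j.
table = [[1, 2, 3], [4, 5, 6]]
cells = []
for row, values in enumerate(table):
    for column, value in enumerate(values):
        cells.append((row, column, value))
```

Both loops produce the same ratings; the second just reads closer to what you would say out loud.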
It’s all about making it readable, understandable, and correct. There’s no point having comments if you forget to update them when you change the code. And you better make sure the AI comments on the 2000 lines of three-letter variables are correct!
Yeah I script more than anything…python, bash, powershell, etc.
Only terrible code I inherit is the stuff I wrote >=3 months ago. I’ll keep saying that three months from now, too.
In Go, the recommended convention is for variable name length to be proportional to scope. It is common to use names one or a few letters long when they are local to a loop of a few lines or to a short function.
The only experience I have like this is when I wanted to see how the ARMA Life mod was doing certain things, but it was programmed by like 20 different people in 3 different languages. Most of it was in German and French.
It was easier to just find my own way of doing what I wanted to do.
longest file I have ever maintained contained 50,000 lines of code.
fifty THOUSAND.
forgive me for not weeping for 2000 lines.
my advice, don’t fucking touch it. pull out as much functionality out of it into other things over time.
there will come a day when you can throw it away. maybe not today, maybe not tomorrow… but some day.
Yeah, been there. The codebase I worked on also had a single method with 10k lines.
The database IDs were strings including the hostname of the machine that wrote to the DB. Since it was a centralized server, all IDs had the same hostname. The ID also included date and time accurate to the millisecond, and the table name itself.
Me: Mom, can we have UUIDs?
Mom: We have UUIDs at home
UUIDs at home: that shit
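For comparison, a small Python sketch of why an ID built from hostname, table name and a millisecond timestamp is weaker than a real UUID. The homemade_id function is a guess at the general shape of the scheme described above; the separator, field order, hostname and table name are all made up.

```python
import uuid
from datetime import datetime, timezone


def homemade_id(hostname: str, table: str, now: datetime) -> str:
    # Hypothetical reconstruction; the real format is not known.
    stamp = now.strftime("%Y%m%d%H%M%S") + f"{now.microsecond // 1000:03d}"
    return f"{hostname}-{table}-{stamp}"


now = datetime(2024, 1, 1, 12, 0, 0, 123000, tzinfo=timezone.utc)

# Two writes to the same table in the same millisecond get the same ID.
first = homemade_id("db01", "orders", now)
second = homemade_id("db01", "orders", now)

# uuid4 is random, so a collision is vanishingly unlikely.
a = uuid.uuid4()
b = uuid.uuid4()
```

With a single central server the hostname part never varies, so the timestamp is the only thing keeping rows in one table apart.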
You should add the local weather forecast, a random fun fact and the canteen menu of the day to the key to make it more interesting to read.
I was working on a project that had a 100,000-line Oracle PL/SQL procedure that ordered a work order from a subcontractor. It was just one single function, called by a classic ASP + Visual Basic COM component.
Oh Lord, I get Vietnam flashbacks about it.
ARGH this triggered a bit of PTSD for me…
“We’re going to convert these COBOL applications to C#, and you need to test that the new application works exactly the same, including the same bugs as the old application.”
“Ok, where’s the specifications and test reports of the old COBOL applications?”
“They were lost to time, we don’t know where they are.”
“Ok, so how are the developers going to write the C# code?”
“They’re going to read the COBOL scripts and recreate them into C#, we advise you do the same.”
Cue me spending a month trying to decipher the COBOL gobbledygook into inputs and outputs, and write testcases based on that. And after that month was up, and I had delivered my testcases, they told me that my services were no longer needed.
I had delivered my testcases, they told me that my services were no longer needed.
Gee, I wonder how all those specifications and test reports became “lost to time”…
more than half of the code is commented out but you’re not allowed to remove it
I didn’t even know we were hiring …
Honest question: would an LLM be able to write useful comments in code like this?
It would probably struggle to see the larger picture. I can see it being used to add comments in self-contained functions though without too much difficulty.
use the LLM to generate regression tests for the large file, then start refactoring it
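Whoever ends up writing them, the tests meant here are usually "characterization" (golden master) tests: record what the old code does today, bugs included, and fail if a refactor changes any output. A minimal Python sketch, where legacy_price and refactored_price are invented stand-ins for the real functions:

```python
import json


def legacy_price(quantity, customer_type):
    # Stand-in for an opaque legacy function; invented for this example.
    total = quantity * 9.99
    if customer_type == "vip":
        total *= 0.9
    if quantity > 100:
        total -= 5
    return round(total, 2)


def record_golden(func, cases):
    # Run the current code once and store whatever it returns.
    return {json.dumps(case): func(*case) for case in cases}


def check_against_golden(func, golden):
    # Return the inputs whose output no longer matches the recording.
    mismatches = []
    for key, expected in golden.items():
        args = json.loads(key)
        if func(*args) != expected:
            mismatches.append(args)
    return mismatches


cases = [(1, "regular"), (1, "vip"), (101, "regular"), (101, "vip")]
golden = record_golden(legacy_price, cases)


def refactored_price(quantity, customer_type):
    # A refactor that is supposed to behave exactly like legacy_price.
    discount = 0.9 if customer_type == "vip" else 1.0
    rebate = 5 if quantity > 100 else 0
    return round(quantity * 9.99 * discount - rebate, 2)
```

An empty result from check_against_golden(refactored_price, golden) means the refactor matches on every recorded case. It says nothing about inputs you never recorded, so the case list matters more than the harness.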
100% I use them a lot to ingest and understand shitty code for me. Of course it’s not perfect, it’s like having a colleague who’s not super strong but has infinite patience for bullshit
Honest question: would an LLM be able to write useful comments in code like this?
It can be better than nothing, but not really. The LLM faces the same challenge that any competent coder does: neither was present to learn the human, business and organization context when the code was first written.
That time I started a new job and my first task was “fix bash”…and then I discovered a multi megabyte monstrosity called “bash.sh”
vomit
omfg that’s over 1 MILLION characters 💀 💀 💀

Allow me to introduce a shit ton of jQuery into all the jsp files you got.
I literally told my boss that I was just going to rebuild the entire pipeline from the ground up when I took over the codebase. The legacy code is a massive pile of patchwork spaghetti that takes days just to track down where things are happening because someone, in their infinite wisdom, decided to just pass a dictionary around and add/remove shit from it so there is no actual way to find where or when anything is done.
Side-rant:
I rarely write Python code. One reason for that is the lack of type safety.
Whenever I’m automating something and try to use some 3rd party Python library, it feels like there’s a good 50/50 chance that front and center in its API is some method that takes a dict of strings. What the fuck. I feel like there’s perhaps also something of a cultural difference between users of scripting languages and those of backend languages.
What you described sounds so much worse though holy shit.
Yeah, the new pipeline is based HEAVILY on object inheritance and method/property calls so there is a paper trail for ALL of it. Also using Abstract Base Classes so future developers are forced to adhere to the architecture. It has to be in Python, but I am also trying to use the type hinting as much as humanly possible to force things into something resembling a typed codebase.
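A sketch of what that kind of structure might look like in Python, assuming nothing about the actual pipeline: Record, Stage, Double, Rename and run_pipeline are all hypothetical names.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical typed payload replacing the free-form dict.
    name: str
    value: int


class Stage(ABC):
    """Base class every pipeline stage must subclass."""

    @abstractmethod
    def run(self, record: Record) -> Record:
        """Return a new Record instead of mutating shared state."""


class Double(Stage):
    def run(self, record: Record) -> Record:
        return Record(record.name, record.value * 2)


class Rename(Stage):
    def run(self, record: Record) -> Record:
        return Record(record.name.upper(), record.value)


def run_pipeline(stages: list[Stage], record: Record) -> Record:
    # Each hand-off is an explicit method call you can grep for.
    for stage in stages:
        record = stage.run(record)
    return record


class Broken(Stage):
    # Forgot to implement run(); instantiating this raises TypeError.
    pass
```

The abstract method is what does the "forcing": a subclass that skips run() cannot be instantiated at all. The type hints are not enforced at runtime, so they only pay off if a checker such as mypy is part of the workflow.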
FUCK. Triggers me. Just got let go from a place that had this problem and wouldn’t let me make any changes whatsoever. I didn’t even push hard.
I did this once
I was generating a large fake dataset that had to make sense in certain ways. I created a neat thing in C# where you could index a hashmap by the type of model it stored, and it would give you the collection storing that data.
This made obtaining resources for generation trivial
However, it made figuring out the order I needed to generate things in an effing nightmare
Of note, a lot of these resource “Pools” depended on other resource Pools, and often times, adding a new Pool dependency to a generator meant more time fiddling with the Pool standup code
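The original was C#, but here is a rough Python sketch of the same shape: pools looked up by model class, plus one possible way to tame the ordering problem by declaring dependencies and letting a topological sort pick the generation order. This is not what the commenter did; the models and the DEPENDS_ON map are invented.

```python
from graphlib import TopologicalSorter


class User:
    pass


class Order:
    pass


class Invoice:
    pass


class Pools:
    """Collections of generated objects, looked up by their class."""

    def __init__(self) -> None:
        self._pools: dict[type, list] = {}

    def __getitem__(self, model: type) -> list:
        # Create the pool on first access so callers never see a KeyError.
        return self._pools.setdefault(model, [])


# Hypothetical dependency map: each model lists the pools it draws from.
DEPENDS_ON = {
    User: [],
    Order: [User],
    Invoice: [Order, User],
}

# static_order() yields every model after all of its dependencies.
generation_order = list(TopologicalSorter(DEPENDS_ON).static_order())

pools = Pools()
for model in generation_order:
    for dependency in DEPENDS_ON[model]:
        # The dependency pool has already been filled at this point.
        assert pools[dependency], f"{dependency.__name__} not generated yet"
    pools[model].append(model())
```

Adding a new Pool dependency then means editing one entry in the map rather than reshuffling the standup code by hand. graphlib needs Python 3.9 or later.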
The link is a proxied image link for some reason.

The next row would be “boss fires you thinking Claude can maintain the codebase.”
At least there’s a kind of happy ending when we walk past the old boss and don’t toss a dollar into his pan-handling hat.
I once worked with a guy who would actively remove everyone else’s comments any time he touched someone else’s code. Only comments he made during code reviews? “Does this comment need to be here?”. The code was a barren, commentless place.
Oh, it’s only the files that have over 2k lines of code? Hell, I’ll take that over what I’m dealing with now. I’ve got multiple FUNCTIONS that are over 2k lines. >:(
Yeah, I dont see a big problem with files over 2000 lines in some cases, as long as things remain well writrej, organized, abstractd.
One piece of garbage that I’ll never touch again had most functions this size. One was 50,000 lines! Hundreds of lines of if/else, half of the functions passed the same 60 arguments because he didn’t understand classes or even dictionaries, etc etc. And was used heavily.
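For the "same 60 arguments" problem, the usual fix is to bundle them into one object and pass that around. A tiny Python sketch with four made-up fields standing in for the sixty:

```python
from dataclasses import dataclass, replace


@dataclass
class JobSettings:
    # Hypothetical fields standing in for the long repeated argument list.
    input_path: str
    output_path: str
    retries: int = 3
    verbose: bool = False


def describe(settings: JobSettings) -> str:
    # One parameter instead of one per field.
    return f"{settings.input_path} -> {settings.output_path} ({settings.retries} retries)"


def with_more_retries(settings: JobSettings, extra: int) -> JobSettings:
    # replace() copies the object and changes only the named fields.
    return replace(settings, retries=settings.retries + extra)


settings = JobSettings("in.csv", "out.csv")
```

Adding a 61st setting then touches the class definition and the code that uses it, not every function signature in between.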
well writrej
oh dear
Lol I’m leaving it
Yeah, honestly overly splitting things up is worse sometimes, that’s how you end up in Java land. Any time you want to grok a specific function you end up down 30 abstracted code paths. Essentially need a compiler to unroll it all to actually see what it’s doing.
Java was exactly the negative use case I was thinking of. Trying to track down the flow of things for code I don’t look at regularly drives me insane.
every programmer I’ve seen who says their code is self documenting writes dogshit code
I think we’re all just dogshit but think we’re better than the next person, it’s like driving. I’m a “comment if there’s no way to make it readable” kinda guy, I work with some “comment and don’t bother to make it readable because there’s comments” people. We all suck. I probably forget to comment on unreadable places sometimes, or overestimate readability. They either don’t update comments so they’re out of date, or the code is such gibberish that a comment doesn’t help.
Ideally I guess you comment AND make it readable AND make sure the comments are up to date, but who do you think we are? Superman? And what’s the right level of commenting anyway? Probably depends on who is reading them.







