
Distribution Release: Home Assistant OS (HAOS) 18.0

19 June 2026 at 21:38

Home Assistant OS, or HAOS for short, has been upgraded to version 18.0. HAOS is an independently-developed, Linux-based operating system optimised to run Home Assistant, an open-source home automation tool. It is available for several popular platforms, including Raspberry Pi and ODROID, as well as generic x86_64 and....

Reproducible Builds (diffoscope): diffoscope 321 released

Planet Debian

19 June 2026 at 02:00

The diffoscope maintainers are pleased to announce the release of diffoscope version 321. This version includes the following changes:

[ Chris Lamb ]
* Fix compatibility with OCaml 5.4.1.

You can find out more by visiting the project homepage.

Reproducible Builds (diffoscope): diffoscope 320 released

Planet Debian

19 June 2026 at 02:00

The diffoscope maintainers are pleased to announce the release of diffoscope version 320. This version includes the following changes:

[ Chris Lamb ]
* Support androguard 4 and previous versions. Thanks, linsui!
  (Closes: #1140016)
* Use --long-form arguments when calling apktool in order to support apktool
  version 3. Thanks again to linsui. (Closes: #1140015)
* Update copyright years.

You can find out more by visiting the project homepage.

Under The Hood: In-Game Map QA

SCS Software

By: Alex

19 June 2026 at 17:00

Creating Euro Truck Simulator 2 and American Truck Simulator is a collaborative effort involving many talented teams across SCS Software. While map designers, artists, programmers and more build the driving experience, another team works alongside them to ensure everything functions exactly as intended before players hit the road.

In this Under the Hood blog, we'd like to introduce you to two members of our In-Game QA team, Ivan and David. We asked them about their day to day work, how testing fits into the development process, why quality assurance is about much more than simply playing the game and more!

David - ATS Map QA Lead

"Hey, fellow truckers! My name is David, and I'm 28 years old. I joined SCS as a junior tester when I was just 20, and at the time, I was the youngest employee in the entire company. Today, I'm the QA Lead for ATS map testing. That means I organize and oversee the testing of all ATS map DLCs, communicate with the leads of our map design teams, solve the most complex issues and bugs we encounter, and simply be there for my team whenever they need help. Over the years, I've seen SCS Software grow from a team of around 100 employees into a company of more than 400. When I joined, we were working on the Oregon DLC, and it has been incredible to see how our development and testing processes have evolved and improved alongside our expanding game worlds."

Ivan - World Map Design QA Lead

"Hi everyone! My name is Ivan, and I've been with SCS Software for a little over six years. I started out as a junior tester, but soon after, I took on the responsibility of overseeing map testing for Euro Truck Simulator 2. Today, my role is World Map Design QA Lead, and I manage our entire map testing team, which currently consists of 20 people. Together, we oversee testing for both American Truck Simulator and Euro Truck Simulator 2. While my colleague Davincillo handles the day to day management of ATS, my main focus over the years has remained on ETS2."

When people hear "game testing" they often imagine that you simply get to play games all day. How different is the reality?

"Map testing is definitely not just playing the game all day. That's a classic myth. While the 'playing' aspect certainly has its place, it really only happens during the final stages of our testing process. The reality is far more methodical. We spend hours, or even days, testing one specific part of the map. We drive through the same stretch of road multiple times, checking completely different things on each pass while using different camera views and debug tools.

Simply playing the game is not enough to be a good tester. You need a specific skill set: attention to detail, a logical and analytical mindset, a good understanding of game industry standards, and a passion for making games more enjoyable for others. Communication skills are also vital because finding a bug is only half of the job. The other half is making sure the right people understand the issue. Ultimately, a good tester should save developers time. Instead of simply reporting that 'something is wrong,' a proper report explains the issue, how to reproduce it, what causes it, and potentially how it could be fixed."

What does a typical day look like for a QA Lead?

"Every day is a little different, but it generally consists of a mix of meetings, coordination, and oversight. Most of my time is spent assigning work, tracking testing progress, reviewing reported bugs, and regularly syncing with developers. Some days are calm and focused on planning, while others are all about solving unexpected, fast-moving issues. A large part of the job involves working closely with the team, discussing the bugs we find, figuring out the best approach, and deciding together what needs the most urgent attention."

What are some of the main things your teams are looking for when testing the game?

"It heavily depends on the stage of production. In the early stages, we focus mostly on the road network itself, its layout, and ensuring the drive is smooth. A big part of this phase is also checking the functionality of the economy and verifying the placement of game elements such as gas stations, companies, and truck dealers. In the later stages, our focus shifts to the AI's ability to navigate the road network, alongside visual polish, correct signage, and core gameplay. This is also when we examine performance across different areas to identify and fix any problematic frame rate drops.

Broadly speaking, we focus on almost everything related to the map. That includes road layouts and collisions, the job economy, gas station distribution, sleep areas and service locations, the UI map and its icons, direction blockers, road markings, traffic signs, speed limits, traffic lights, navigation and voice guidance, garage cutscenes, AI trajectories, triggers, quality consistency, scene logic, terrain, vegetation, world and country borders, asset collisions, gaps in terrain, floating objects, performance-heavy locations, environmental sounds and more!"

What do you enjoy most about working in QA?

"Being a game tester is a dream job for many people, and in many ways, it really is. There is an incredibly rewarding feeling in knowing that you're the safety net protecting the player's immersion and helping make the game better for everyone. It's deeply satisfying to watch a messy, broken build gradually turn into a polished world that millions of people will enjoy driving through.

When a new DLC is released and you see players talking about how smooth the roads feel, how great the scenery looks, or how well everything runs, it's a fantastic feeling. You can look at that and think, 'Yeah, my team helped build that.'"

When a new map DLC or major update enters testing, how do you approach such a large project from start to finish?

"The QA process often begins before production even starts. We provide early feedback on concepts to avoid known issues before development kicks off. Once production begins, we use an agile testing approach, working through multiple iterations throughout development rather than waiting until the very end to deliver one massive list of issues.

Our systematic testing process is divided into four iterations and an economy test. The first iteration focuses entirely on road layouts, ensuring roads, turns, and slopes are safely drivable, even with the longest trailers and low-power engines. The economy test then verifies that companies generate jobs correctly and that cities provide a healthy variety of destinations. As development progresses, later iterations shift towards visual quality, gameplay consistency, and overall polish.

To make testing manageable, we divide each project into smaller sections, sometimes resulting in dozens or even hundreds of individual tasks covering specific roads and cities. These are tracked throughout development, allowing us to revisit the same areas at different stages. We use maps, checklists, internal tools, and bug-tracking systems to ensure every square mile is covered, while also encouraging testers to explore freely because unexpected issues are often found where nobody would think to look."

Many players only see the finished product. Roughly how much testing goes into a map expansion, update, or feature before release? Does it differ depending on what needs testing?

"There is a massive amount of testing involved, and it differs greatly depending on the project. Smaller projects, such as special event maps, can be thoroughly tested in just a few days. On the other hand, a huge project like the Nordic Horizons expansion takes thousands of hours of rigorous testing before it is ready for release.

Every single road, city, company, gas station, sleep area, tollgate, and ferry is tested at least four times, with a different tester each time. To give some insight into the scale, our Mantis bug tracker recorded 6,849 reports for the Illinois DLC, while South Dakota has generated 6,318 reports so far. These reports range from tiny holes in the terrain that are almost impossible to notice to major bugs that can cause the game to crash. Every report is assigned a priority and severity level so that the most serious issues are addressed first."

How closely do QA teams work with map designers, programmers, artists, and other departments throughout development?

"We work very closely across departments because testing is integrated throughout the entire development cycle. As map QA, we collaborate most closely with the map design and art teams. While the majority of our day-to-day communication happens through reports in the Mantis bug tracker, we also actively discuss issues through private messages on our internal chat system, and arrange direct meetings whenever an issue is important enough. Our interaction with the programming department is mostly on a need-to-know basis, usually when there is an issue involving erratic AI behaviour or when a brand-new code feature is being implemented directly into the map."

What tools or methods help you track, reproduce, and report issues efficiently?

"We rely on several internal systems that are connected to one another to track individual bugs and the overall progress of a DLC. We use a specialised internal reporting tool that allows a tester to submit a bug directly from the game or the map editor into our central bug-tracking database. Within a few minutes, the report appears and can even be viewed directly inside the map editor itself. This allows map designers to immediately see the exact issue within their active workspace and resolve it much more efficiently, saving a significant amount of time throughout development."

If there's one thing you'd like for people to better understand about QA and the work your teams do, what would it be?

"We'd like players to understand that map testing is a highly skilled, technical job, not simply driving around looking at the scenery or casually stumbling across a floating tree. In reality, a good tester is part detective and part data analyst. If we come across a strange physics bump on a highway or see AI traffic piling up at a roundabout, we don't just report it and move on. We have to understand exactly why it's happening. Translating what is broken on the road into actionable, structured information that our developers can easily understand and fix takes time, patience, and deep knowledge of the game."

What is one aspect of QA work that you think players would be most surprised to learn about?

"Players would probably be surprised by just how much knowledge about the game and real-world infrastructure you need to become a good tester. Our team has to maintain a solid understanding of complex internal game rules, real-world traffic laws, and regional layout standards across different countries.

It's similar to the difference between someone who owns a truck and knows how to drive it and a mechanic who can remove the entire engine, take it apart piece by piece, and put it back together again. Becoming a highly skilled map tester can take years, and many testers naturally become specialists in certain areas of the game because they spend so much time working with those specific systems behind the scenes."

Have you encountered any particularly memorable, unusual, or funny bugs during your time at SCS Software?

Ivan: "Absolutely. Simulators have incredibly complex physics engines, and when things go wrong, they go wrong hilariously. It never gets old seeing an AI vehicle catapulted straight into space. Sometimes, our map designers also leave creative little surprises or jokes for us to discover during development, although we always make sure they don't make it into the live version of the game."

David: "One memorable moment happened while I was parking at a company prefab. I heard a train horn somewhere in the distance, and the sound kept getting louder until suddenly it was right next to me. The only problem was that there was no train there, and there weren't even any railway tracks nearby. A moment later, something invisible hit my truck and launched it all the way across the company. For a few seconds, I genuinely thought I had discovered a haunted company prefab."

How valuable are bug reports and feedback from the community when helping improve the game?

"Community feedback is extremely valuable to us. While our internal QA process is thorough, there are always issues that slip through, and players help us catch them by spotting details or inconsistencies that we might miss. What makes community feedback especially useful is the context players provide. Many are very familiar with the real-world locations we recreate, so they can quickly point out inaccuracies that would otherwise be difficult for us to notice. They also encounter a huge variety of gameplay situations, which helps surface edge cases that are hard to reproduce internally.

"In many cases, a well-written report from the community can save us hours of investigation because players provide screenshots, videos, logs, save files, and clear reproduction steps."

Do you have a message for our community?

"A huge thank you for your support, feedback, and for riding along with us for so many years. It's an amazing feeling to work on a game where the players care just as much about the world as the people who build it. Your dedication pushes everyone at the studio to keep raising the bar with every new state, country, and feature. Safe travels, and we'll see you out on the road!"

We'd like to thank both David and Ivan for taking the time out of their busy days to chat with us about their roles in QA and how the team plays such an integral part in bringing our truck simulator titles to life. We hope you've learned a little more about the work that goes on behind the scenes. If you enjoyed this edition of Under the Hood, be sure to leave them a message in the comments below or on our social media channels. Until next time, keep on truckin'!

Wouter Verhelst: Agentic coding and Free Software

Planet Debian

19 June 2026 at 14:09

Through work, I have a paid license for windsurf (recently renamed to "devin"), an application for LLM-based (aka "Agentic") development.

I hadn't been using it that much, but in an effort to more clearly understand how this whole AI development thing works, I decided to give it a closer look recently.

My conclusions:

In its current form, this whole LLM wave is problematic for multiple reasons. But ignoring that, and looking at the technology only, I can say that:

it is a paradigm shift;
it is, at the technological level, a positive evolution;
and it is a threat to free software.

Problems

Lest someone (incorrectly) assume that I am arguing in favour of the current state of affairs with regards to LLMs, let me state this first.

The way LLMs are built today is highly parasitic. Websites are downloaded in whole, at unsustainable rates, regardless of the consent of the people who made the original content. The result is predictable: servers get overloaded, server administrators attempt to implement various mitigations. Some of these mitigations work; some do, for a while; some are entirely useless. In actual fact, the mitigations are an arms race -- if too many people implement the same mitigation, then the people who try to build yet another LLM so they can extract rent will just try to work around the mitigation; eventually they will succeed, and you'll just have to come up with another mitigation. It's a bit like spam: you introduce regex-based spam filters, they introduce spelling mistakes, you introduce Bayesian filters, they add a large batch of Markov chain-generated semi-nonsense words made invisible by markup, you add filters to block emails with such markup, they move the text into an image. We have working mitigations today, but eventually we'll run out of ideas.

LLMs glob up everything they can while ignoring the license of the source material. The people who push those LLMs claim that pushing the source material through the machine learning algorithms makes the output of the algorithm distinct enough from the source material that the license no longer applies; I'm not so sure that this is true. I guess the New York Times v OpenAI lawsuit will teach us some of the answer to that question here, but even so the ethical questions about "is it OK to bring down another server just so we can download the internet for another for-pay LLM" are still open. And regardless of what the law states, my opinion on "you're using my copyleft code to generate code under a different license" is not something you might like if you agree with the rent seekers' opinion on the subject.

That all being said and true, the technology works. You can have a "conversation" with an LLM that resembles a human one. If you pass it some data, you can use plain English to ask it questions about that data, which is a lot easier than asking about it in a formal way. You can request it to generate some code, and it will generate something that looks like what you need and that will be mostly correct for like 95% of the time.

Now, yes, 95% of the time is not 100% of the time, and no, you can't ask it to "write me a piece of software that implements this 300-page requirements document and get back to me when you're done", because it will fail, and you won't know where it has failed, and you'll take it into production and expect everything to be fine, but it won't be, and this one minor logic bug will cause half your servers to spin and consume credits with your infrastructure provider with nothing to show for it.

But that doesn't mean you can't use an LLM to build a large piece of software. It just means you have to understand the LLM's limitations and strengths, and use them correctly.

Here's what an LLM is good at:

Generating plausible text
Interpreting text to figure out what a plausible meaning or summary of that text is
Giving vague indications as to what the probable context of a given body of text is.

It turns out that that's enough to use the LLM to build a reliable piece of software, provided you do it right.

Paradigm shift

An LLM can generate text by the truckful. The generated text could be code. Given a good enough LLM, the generated text might even run and do something useful.

You can try to blindly run the code, and if it doesn't run correctly, you can paste the error message to the LLM, and it can tell you what went wrong and how you could possibly fix it. This creates a feedback loop: you ask it for an amount of code, you run the code, you receive an error, you tell it that the code is problematic and give it the error message, it makes changes to the code, now you have something that at least no longer fails at startup.

If you ask it to add tests to make sure that your code acts as per your specification, now you get an error if and when the code doesn't act as per your specification. Or, well, at least not as per the part of the specification that was correctly turned into a unit test by the LLM.

LLMs have a context window, so if the error message is pasted in the same conversation as where the code was generated, it is able to reuse the earlier prompts to refine how it should interpret the error message that you received.

You can't really paste the source code of an entire application into the prompt of your LLM; that would quickly overrun its context window. But LLMs also allow you to provide some form of background information -- a document, say -- on which you ask it to reason. It will interpret that document, but doing so uses less of the LLM's context window. So providing the LLM with your application's source code as background information can help it understand better how your code interacts. This is especially helpful if you only provide the LLM the background information relevant to the actual question.
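To make "only the relevant background information" a bit more concrete, here is a minimal sketch in Python. It is not how windsurf or any particular tool selects context; the `SourceFile` type, the word-overlap `score`, and the character budget are all assumptions made up for illustration. Real tools presumably use something much smarter than counting shared words.

```python
from dataclasses import dataclass


@dataclass
class SourceFile:
    # Hypothetical container for one file of the codebase.
    path: str
    text: str


def score(question: str, source: SourceFile) -> int:
    """Crude relevance: how many distinct longer question words occur in the file."""
    words = {w.lower() for w in question.split() if len(w) > 3}
    haystack = source.text.lower()
    return sum(1 for w in words if w in haystack)


def select_context(
    question: str, files: list[SourceFile], budget_chars: int
) -> list[SourceFile]:
    """Pick the most relevant files that still fit in a fixed character budget."""
    # sorted() is stable, so files with equal scores keep their original order.
    ranked = sorted(files, key=lambda f: score(question, f), reverse=True)
    chosen: list[SourceFile] = []
    used = 0
    for f in ranked:
        if score(question, f) == 0:
            break  # everything after this point is irrelevant to the question
        if used + len(f.text) <= budget_chars:
            chosen.append(f)
            used += len(f.text)
    return chosen
```

The budget stands in for the context window: anything that doesn't fit is simply left out, which is exactly why a question that touches the whole codebase at once tends to go badly.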

So now if you are able to:

Create background context with your application's source code
Have the LLM generate a first draft of your requested change, plus the tests to make sure it works
Compile (if applicable) the generated code (and tests) and run said tests
Return any error messages to the LLM with a request to correct the error

Then the combination of "getting it 95% right off the bat" and the above feedback loop means you can generate syntactically correct code, that probably does what you need, in minutes.
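The loop described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not what any real agentic tool does internally: `generate` is a hypothetical stand-in for the LLM call, the "tests" are just assertions at the bottom of the generated file, and `max_rounds` is an arbitrary cut-off.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Hypothetical stand-in for an LLM call: takes a prompt, returns source code.
Generate = Callable[[str], str]


def run_candidate(code: str) -> tuple[bool, str]:
    """Write the candidate (code plus its own assertions) to a file and run it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(code)
        result = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            timeout=30,
        )
    return result.returncode == 0, result.stderr


def feedback_loop(
    request: str, generate: Generate, max_rounds: int = 3
) -> Optional[str]:
    """Generate, run, and feed errors back until the run succeeds or rounds run out."""
    prompt = request
    for _ in range(max_rounds):
        code = generate(prompt)
        ok, errors = run_candidate(code)
        if ok:
            # Passing tests is not the same as being correct: review it anyway.
            return code
        prompt = (
            f"{request}\n\nThe previous attempt failed with:\n{errors}\n"
            "Please fix it."
        )
    return None
```

Note that the loop only ever learns from what the tests report. Anything the tests don't cover never makes it back into the prompt, which is where the "probably" comes from.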

I say "probably" for a reason. There are going to be cases where you specify a request without a number of details (because they are implied), and the LLM will get most of those details right but just not implement the one bit because it's an automaton and it doesn't think. Or you will ask it to make sure that two bits of the application look exactly the same, without specifying that they must act the same, now and in the future, and it will just generate the same block of code twice and then in a future change it will change one but not the other.

But if you review the changes, and you have experience as a programmer, you will be able to spot most cases where the LLM got it wrong. And so it's possible, if not necessarily easy at first, to use an LLM to generate mostly correct code.

There are certain places where "mostly correct" code is not desirable. But equally, there are also cases where "mostly correct" is good enough.

After all, most of the software you run today -- the bits of it that weren't, yet, generated by an LLM -- is only "mostly correct", too, because to err is human and we all make mistakes. If not, there wouldn't be any CVEs and your software would never do anything wrong.

Now, doing the feedback loop described above is certainly something you could do manually. You could open an account on one of the LLM websites, upload the source code of your application, ask it to generate some new feature, download the newly generated feature, run it, and then copy/paste any error messages back into the LLM.

But that's a lot of manual work of the type that computers are pretty good at. So that's what the "windsurf" tool helps you with: you run it inside your IDE -- either a VSCode-based tool that you download from their website which comes with their product preinstalled, or a separate JetBrains plugin that you can install. You can then open your entire relevant codebase in a workspace in your IDE. You then ask the LLM, through the IDE, to generate a new feature in your codebase, and to also generate the test while it's at it. It will use a mixture of LLM interpretation and non-LLM functionality to scoop out the relevant bits of your codebase to send to the LLM as background information, will send it your prompt, will download the generated code and patch or create files, will compile (if required) and run the newly generated code and tests, and will refine the generated code if the tests produce any errors. All mostly automatic; by default, running anything requires explicit confirmation. You can turn that off completely (probably not a good idea), or you can give it a whitelist of things that you don't want to confirm (perhaps OK), and the tool also passes standing instructions to the LLM to never generate any command that deletes a file (which, like with any LLM, can be overridden, but it requires you to be very stubborn and to use more credits than you'd probably like).
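As a rough illustration of the confirm-by-default and whitelist behaviour, a gate in front of command execution might look like the sketch below. Everything in it is an assumption: the names, the example command sets, and in particular the hard block on deletion commands, which is my own swap-in. The tool as described above relies on standing instructions to the LLM instead.

```python
import shlex
from typing import Callable

# Hypothetical policy values, for illustration only.
ALLOWLIST = {"pytest", "make", "ls"}  # run without asking
BLOCKED = {"rm", "rmdir", "shred"}    # never run, whatever the user answers


def decide(command: str, confirm: Callable[[str], bool]) -> bool:
    """Return True if the command may run, asking the user when in doubt."""
    argv = shlex.split(command)
    if not argv:
        return False
    program = argv[0]
    if program in BLOCKED:
        return False
    if program in ALLOWLIST:
        return True
    # Default: anything not explicitly listed needs explicit confirmation.
    return confirm(command)
```

Matching on the program name alone is naive (`sh -c "rm -rf ."` walks straight past it), so treat this as a picture of the policy rather than as a security boundary.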

All this put together means you can build something without writing any piece of code, provided you do it right.

A technically positive evolution

Don't go and say, "here's a 300-page document, read it and write whatever the document says". It will get it wrong, it will write a massive test suite that it will only run at the end, it will choke itself up trying to interpret the massive amount of failures it encounters, it will fill up its context window and it will start to forget some of the requirements. That won't work.

But what you can do -- what I did, in fact -- is this.

First, create an empty workspace. Don't put any code in it.

Then, tell the LLM to generate a backend framework using technology X and a frontend framework using technology Y that initially only says "hello, world". Also add tests to it, and run the tests.

It will do that. You'll not get much, but it will work.

Then, ask it to add some UI elements. A login page, perhaps. A navigation bar. Small things. Most of it doesn't have to be functional -- but tests must be there for the bits that are, and have it run the tests and evaluate the results.

Rinse, repeat, until you have a working application.

Importantly, in between the steps, you should also run the application yourself and see if the change was implemented correctly. Sometimes it won't be. Sometimes there will be a subtle bug -- I at one point had the application hang after a few minutes. Sometimes you tell it that there's a subtle bug, and it will discover it more quickly than you could, and it will fix it, and in implementing the fix it will uncover another bug, and then you have to fix that one -- the fix it came up with for the hang was to move something to an async process on the server, which caused the application to start spinning while trying to create hundreds of async jobs (this is when I realized that the hang was a deadlock due to some part of the codebase doing something that indirectly triggered itself). Sometimes it will try to fix the bug you tell it about, and you'll see that it's going off on a tangent that has nothing to do with what you're seeing. It's important to keep an eye on what it's doing, so you can guide it back on track when that happens -- when I told it about the hang, it started investigating the part of the code which sends out emails, thinking that it could hang while waiting for sendmail to finish, but the hang was happening when the application was idle, not when it was sending out emails, and only when I told it about it happening when it was idle did it find the deadlock.

So it's not a fully automatic process, and it needs to be guided by someone who knows what they're doing. But if that is the case, you can come up with something that works. I spent evenings and breaks for about a week, and I managed to create a working application which, had I written it by hand, would have taken me a few months of full-time work to come up with. And I now have a side project, fully complete and working, that I had been thinking about doing for more than a decade, but never got around to actually doing, because of all the work that would be involved and I just didn't see myself having the time for.

It's not perfect code. But it's mostly good enough, and it will perform the job it needs to. And it looks far slicker than most of the side projects I've done in the past, because in the past I would prioritize between implementing new features or making something look slick, and I would decide that the new feature was more important because it's only for me and there's only me and nobody cares if it looks good or not and I don't have three weeks to come up with something that looks better. But here, I found myself sometimes spending 10 minutes writing a prompt with instructions on making things look better. Because what's 10 minutes when you just spent an hour writing down and refining specifications for functionality and tests?

There are a number of other things in which an LLM can help a programmer.

For instance.

I received a bug report recently in a project I'm paid to maintain that I couldn't make heads or tails of. I opened the source code in my windsurf IDE, pasted the bug report in the prompt, and then requested the tool to analyze the source code and the associated logs and tell me how the described behavior could be happening. It turned out that I had overlooked something, but with the help of the tool, I found the bug in minutes.

I was trying to understand a particular part of a large codebase that I didn't really grasp very well. I loaded the codebase in the tool, and asked it to explain to me how a particular action is performed by the code. I requested specific functions and line numbers. I now have a far better understanding of how the code works, and will be able to write that patch that I've been wanting to write for years -- without using the LLM.

I have been struggling for, literally, years with understanding why another tool that I maintain was misbehaving in a particular way but only in Firefox. I opened the codebase in the tool, explained the buggy behavior in plain English, and asked it to explain how this could be happening. It picked up some obscure corner case behavior of ffmpeg and mp4 containers that I was not aware of and that perfectly explained why things were misbehaving in the way that they were.

At the same time, there are limitations. Giving an LLM a codebase that was originally generated by an LLM (either the same one or another one) seems to work well. Giving it a codebase that was written by a human and expecting it to correctly update it seems to be more error-prone. I did one or two of those as a trial, and it is more problematic than anything.

An LLM is also not intelligent, notwithstanding the popular term of "Artificial Intelligence". On multiple occasions, I've asked it to write a test case for some code that was not set up to do so; and rather than suggesting a refactor is required, it would instead copy the code that needed to be tested and then test the copy, rather than the original. The tool has made multiple similar errors. I have sometimes heard people describe agentic coding as "similar to interacting with junior programmers", but that is not the case. A junior programmer will either fill in the gaps in your specifications, or ask for clarification when something seems off. The LLM will not do that; it will do what you ask, exactly that and nothing more. If you missed a corner case in your specification, then all bets are off.
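To show why testing a copy is worthless, here is a small made-up example in Python. The function, the bug, and the rule that valid ports run from 1 to 65535 are all hypothetical; the only point is that a check on the copy keeps passing no matter what the original does.

```python
def parse_port(value: str) -> int:
    """The original code. Deliberately buggy for this example: it accepts port 0."""
    port = int(value)
    if port < 0 or port > 65535:
        raise ValueError(f"invalid port: {value}")
    return port


def check_copy() -> None:
    # Anti-pattern: the logic was pasted into the test (and happens to be right here).
    def parse_port_copy(value: str) -> int:
        port = int(value)
        if port < 1 or port > 65535:
            raise ValueError(f"invalid port: {value}")
        return port

    try:
        parse_port_copy("0")
    except ValueError:
        return  # passes, but says nothing about parse_port itself
    raise AssertionError("copy accepted port 0")


def check_original() -> None:
    # Exercises the real function, so the bug in the original shows up.
    try:
        parse_port("0")
    except ValueError:
        return
    raise AssertionError("original accepted port 0")
```

Calling `check_copy()` succeeds while `check_original()` raises an `AssertionError`: the first gives you a green test suite and a broken application.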

I remember learning about programming language generations in college. A first-generation language is "machine code", a second-generation language is "assembler", a third-generation language is any high-level language such as C, Perl, or Pascal. I've forgotten what set a 3rd-generation language apart from a 4th-generation language. But I remember the definition they gave me for a 5th-generation language: "you tell the computer what to do, and it will do it". At the time, I thought it was ridiculous. Nobody could ever write something like that.

But it's here.

And it's a threat to free software.

A threat to free software?

Yes.

There is the obvious part where most of the well-known LLMs are non-free software. I mean, there are some "open source" LLM models. The windsurf tool that I used doesn't allow you to use them (directly), but they're there. There are also open source applications that implement what the windsurf editor does. So it's definitely possible to work like this without resorting to non-free software and non-free services, even though the non-free LLMs might be a bit ahead of the curve of the free ones. But that's not what I mean.

And there is also the obvious thing which I mentioned earlier in this post, which is that the people who try to build LLMs are doing it in unethical, disgusting ways, causing downtimes and disregarding licenses for whatever they can get their grubby hands on. Ideally we wouldn't be in that situation, and ideally this wouldn't be a problem, but we are where we are.

And there's the obvious thing where the OSI sold itself out and declared that a machine learning program can be open source even when the very thing it was built from -- the training data -- is not available. That's a major issue that the free software community needs to fight against, but it is not really a threat to free software. You just build your own, free software, LLM, and you're done.

The actual threat is in funding and developer support.

Most large businesses do not care about free-as-in-freedom software. They like the free-as-in-beer part, and they appreciate that the free-as-in-freedom bits can make the software more customizable. They are (mostly) happy to do sponsorships of the free-as-in-freedom projects that they use if that means their free-as-in-beer usage of the software gets improved.

But why would you care about all that when you can just generate the code you need, rather than interacting with an open source community that may or may not care about your business's interests?

Where to go from here

Although I think the moral and environmental issues with LLMs are real and problematic, given the experiments I did, I am not convinced that the concept of interacting with a computer system in natural language and using it to generate code is necessarily deficient. There are pitfalls, but they can be managed. It is possible to use such a system to create throwaway, proof-of-concept type "good enough" code bases. It can be used to interpret code bases and to understand bug reports.

I believe that the major issue with LLMs has to do with that saying about hammers and nails:

If all you have is a hammer, then everything looks like a nail.

LLMs are an outgrowth of machine learning, pushed by large corporations. These large corporations have a lot of money. If all you have is money, then every problem can be fixed by throwing more money at it. The initial language models were promising but not (yet) good enough, and it seemed that one way in which they could be improved was to increase the scale of the statistics: throw more hardware (and thus money) at it, and rather than improving the efficiency of the models, just scale up.

Scaling up is something that megacorporations are very good at. It's only a money problem, after all. Does that mean that "scaling up" is the only way to improve the models, though? I'm not convinced.

Some hardware, such as most modern Apple and Samsung devices, ships with accelerator hardware for machine learning algorithms. There are some models that are small enough to be able to run on these devices. I don't see why it should not be possible to create a small(er) language model that can do some useful part of the above-described use cases; if not locally, then at least on a server that one can run on-prem rather than requiring that you pay rent to one of the LLM companies.

The Software Freedom Conservancy has published an aspirational statement on machine learning-assisted programming that, I think, gets a lot right. It's not quite a definition, but it's something to keep in mind.

Perhaps that's the way forward?

More questions than answers at this point, anyway.

Junichi Uekawa: looking for last.

Planet Debian

19 Juni 2026 om 07:30

looking for last. I realized it's gone. what's my replacement?