
Is it just me, or has Claude Code gotten really stupid over the last several days? I've been using it almost since it was publicly released, and the last several days it feels like it has reverted back 6 months. I was almost ready to start yolo-ing everything, and now it's doing weird hallucinations again and forgetting how to edit files. It used to go into plan mode automatically; now it won't unless I make it.

Exactly. There is a big difference in code quality with state-of-the-art models versus 6 months ago. I'm strongly resisting the urge to run Claude Code in dangerous mode, but it's getting so good I may eventually cave.

The difference here is the qualitative gap that has long existed between Google Search results and its competitors'. Switching away from Google Search is a high-friction move for most people. I'm not sure the same goes for AI chat.

I already don't use ChatGPT. I use Open WebUI with OpenRouter, and the API costs for my usage are peanuts. Switching to a different interface is so easy that many people will. (You don't need to self-host; T3 Chat, for example.) This is the difference between Google Search and ChatGPT.
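For context on how little friction there is: OpenRouter speaks the OpenAI wire protocol, so any OpenAI-compatible client works against it. A minimal sketch (the model ID is just an example; check OpenRouter's catalog for current names):

    from openai import OpenAI  # pip install openai

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",  # OpenRouter's OpenAI-compatible endpoint
        api_key="sk-or-...",                      # your OpenRouter API key
    )
    resp = client.chat.completions.create(
        model="openai/gpt-4o",  # example model ID; any model OpenRouter lists works here
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)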

There is a big waiting list for fab tools. You can't just spin that up out of nowhere. Modern chip fabs are the most complex things ever created, and by the time you've spun up your own fab, supply and demand will have balanced out.

Also, how is nationalizing something pro-competition? Nationalized companies have a history of using their government connections to squash competition.


The stupid thing about the experiment was that it's never been a secret that the kernel is vulnerable to malicious patches. The kernel community understood this long before these academics wasted kernel maintainer time with a silly experiment.

Agreed; to me this "research" is like proving grocery stores are vulnerable to theft by sending students to shoplift. If the review process guaranteed that vulnerabilities couldn't pass, wouldn't that mean the current kernel should be pristinely devoid of them?

Well, I didn't know, and thanks to them, now I do.

I believe most people assume that the Linux kernel couldn't be compromised because there are multiple approval processes and highly professional people vetting patches.

It seems like a big vulnerability: if a teaching assistant could do that, there is no doubt that government agencies can too.


Quite interesting to observe PyPI being used as a distro-agnostic binary package manager. Someone is going to create a NixOS competitor that uses PyPI for hosting and uv for installation.
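uv already gets you most of the way there for individual tools; a quick sketch, using ruff as a stand-in for any compiled tool published to PyPI:

    # install a tool from PyPI into its own isolated environment
    uv tool install ruff

    # or run it ephemerally without a permanent install
    uvx ruff --version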

I realize you're being tongue-in-cheek, but I hope people respect the logical limits of this sort of thing.

Years ago, there were some development tools coming out of the Ruby world – SASS for sure, and Vagrant if I remember correctly – whose standard method of installation was via a Ruby gem. Ruby on Rails was popular, and I am sure that for the initial users this had almost zero friction. But the tools began to be adopted by non-Ruby devs, and it was frustrating. Many Ruby libraries had hardcoded file paths that didn't jibe with your distro's conventions, and they assumed newer versions of Ruby than existed in your package repos. Since then I have seen the same issue crop up with PHP and server-side JavaScript software.

It’s less of a pain today because you can spin up a container or VM and install a whole language ecosystem there, letting it clobber whatever it wants to clobber. But it’s still nicer when everything respects the OS’s local conventions.


I think Golang in this context is better.

Golang has really fast compile times, unlike Rust, and it cross-compiles easily (usually; yes, I know CGo can be considered a menace).

Golang binary applications can also be installed rather simply (see the sketch below).

I really enjoy the Golang ecosystem.
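A rough sketch of both points (the module path is hypothetical):

    # cross-compile for another OS/arch with nothing but environment variables
    # (works out of the box for pure-Go code; CGo is where it gets hairy)
    GOOS=linux GOARCH=arm64 go build -o mytool ./cmd/mytool

    # install a binary straight from its module path
    go install example.com/user/mytool@latest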


For those who like the idea but don't want to use someone else's bandwidth for it: the PyPI API is described across several PEPs and documented on-site (https://docs.pypi.org/api/), and a mirroring tool, bandersnatch, is maintained under PyPA stewardship (https://pypi.org/project/bandersnatch/).
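Getting a private mirror going is roughly this (a minimal sketch; bandersnatch reads /etc/bandersnatch.conf by default and, as I understand it, writes a starter config on first run):

    pip install bandersnatch
    # edit /etc/bandersnatch.conf to set the target 'directory', then:
    bandersnatch mirror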

But at the individual project level this definitely isn't new. Aside from the examples cited in https://news.ycombinator.com/item?id=46561197, another fairly obvious example of a compiled binary hosted on PyPI is... uv.


General Caine specifically said they utilized CYBERCOM (the US joint cyber warfare command) to pave the way for the special-ops helicopters. I personally have no doubt that any lights being out (whether or not they all were) was due to a US hack. Some of the stuff that got blown up may well have been to prevent forensic recovery of US tools and techniques.


I have no doubt they used cyberattacks and electronic warfare.

Trump just seems like the worst person in the world to play a game of telephone with on such a subject.

For example: https://www.defensenews.com/air/2025/05/16/pentagon-silent-a...

> “The F-35, we’re doing an upgrade, a simple upgrade,” Trump said. “But we’re also doing an F-55, I’m going to call it an F-55. And that’s going to be a substantial upgrade. But it’s going to be also with two engines.”

> Frank Kendall, the secretary of the Air Force during former President Joe Biden’s administration, said in an interview with Defense News that it is unclear what Trump was referring to when he discussed an “F-22 Super,” but it may have been a reference to the F-47 sixth-generation fighter jet… Kendall said it is also unclear what Trump was referring to when he discussed the alleged F-55.


Also: “Everything’s computer!”


It's been well known to be a major part of world powers' war plans for like 20 years now. Yes, it's a terrifying concept.


> We all know that LLMs were used to find these vulnerabilities

How do we know that? You seem quite certain.


They pitch their company as finding bugs "with AI". It's not hard to point one of the coding agents at a repo URL and have it find bugs, even in code that's been in the wild for a long time; looking at their list, that seems likely to be what they're doing.


The list is pretty short, though, for 8 months. OSS-Fuzz has found a lot more, even with the fuzzers often not covering much of the code base.

Paying people to write fuzzers by hand would yield a lot more and be less expensive than data centers and burning money, but who wants to pay people in 2026?


Bugs are not equivalently findable and different techniques surface different bugs. The direct comparison you're trying to draw here doesn't hold.


It does not matter what purported categories buffer overflows are in when manual fuzzing finds 100 and "AI" finds 5.

If Google gave open source projects $100,000 per year for a competent QA person, it would cost less than this "AI" money bonfire and produce better results. Maybe the QA person would also find the 5 "AI"-detected bugs.


This would make sense if every memory corruption vulnerability was equivalently exploitable, which is of course not true. I think you'll find Google does in fact fuzz ffmpeg, though.


Google gives a pittance even for full OSS-Fuzz integration, which is why many projects have just the bare-minimum fuzz tests. My original point was that even with these bare-minimum tests, OSS-Fuzz has found way more than "AI" has.


Another weird assumption you've got here is that fuzzing outcomes scale linearly with funding, which, no. Further, the field of factory-scale fuzzing and triage is one Google security engineers basically invented, so it's especially odd to hold Google out as a bad actor here.

At any rate, Google didn't employ "AI" to find this vulnerability, and Google fuzzing probably wouldn't have outcompeted these researchers for this particular bug (totally different methods of bugfinding), so it's really hard to find a coherent point you'd be making about "fuzzers", "AI", and "Google" here.


My guess is the main "AI" contribution here is to automate some of the work around the actual fuzzing: setting up the test environment and harness, reading the code + commit history + published vulns for similar projects, identifying likely trouble spots, gathering seed data, writing scripts to generate more seed data reaching the identified trouble spots, adding instrumentation to the target to detect conditions ASan etc. don't, writing PoC code, writing draft patches... That's a lot of labor, and the coding agents can do a mediocre job of all of it for the cost of compute.
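To make the "harness" part concrete, the boilerplate usually looks something like this; a minimal sketch using Atheris (Google's coverage-guided Python fuzzer), where parse_packet is a hypothetical stand-in for the code under test:

    import sys

    import atheris  # pip install atheris

    def parse_packet(data: bytes) -> None:
        # hypothetical stand-in for the real target
        if data.startswith(b"\x89PNG") and len(data) < 8:
            raise ValueError("truncated header")

    def TestOneInput(data: bytes) -> None:
        try:
            parse_packet(data)
        except ValueError:
            pass  # expected errors; the fuzzer hunts for crashes and hangs

    atheris.instrument_all()  # enable coverage feedback
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()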


If it's finding exploitable bugs prior factory-scale fuzzing of ffmpeg hasn't, seems like a pretty big win to me.


For sure, and I think it expands the scope of what factory scale efforts can find. The big question of course being how to handle remediation because more bugs without more maintainer capacity is a recipe for tears.




I am a professional software developer and have been since the 1990s.


I can't speak to what exactly this team is doing, but I haven't seen any evidence that with-robot finds fewer bugs than without-robot. I do have some experience in this area.

