Had this problem a while ago of my zsh startup being slow. Just opened Claude Code and told it to benchmark my shell startup and then optimize it. Took like 5 minutes and now it's ultra fast. Hardly any idea what it did exactly, but it worked great.
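If you want to sanity-check the before/after yourself, here's a rough sketch of the benchmark step (assuming zsh is on your PATH and you don't mind timing it from Python; the run count is arbitrary):

```python
import subprocess
import time

# Time how long an interactive zsh takes to start and immediately exit,
# averaged over a few runs to smooth out cache effects.
RUNS = 10
start = time.perf_counter()
for _ in range(RUNS):
    subprocess.run(["zsh", "-i", "-c", "exit"], check=True)
elapsed = time.perf_counter() - start
print(f"average interactive startup: {elapsed / RUNS * 1000:.1f} ms")
```

Run it before and after the optimization pass and compare the two numbers.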
I think people fool themselves with this kind of thing a lot. You debug some issue with your GH Actions yaml file for 45 minutes and think you "learned something", but when are you going to run into that specific gotcha again? In reality the only lasting lesson is "sometimes these kinds of yaml files can be finicky". Which you probably already knew at the outset. There's no personal development in continually bashing your head into the lesson of "sometimes computer systems were set up in ways that are kind of tricky if you haven't seen that exact system before". Who cares. At a certain point there is nothing more to the "lesson". It's just time-consuming trial-and-error gruntwork.
GitHub Actions, web development, stuff like that are terrible examples of places not to use AI.
You can't really go to giant piles of technical debt and look to those for places to be human. It's soul-destroying. My concern would be that vibe coding will make those places of soul-less technical debt even deeper and deadlier. There will be nobody there, for generations of cruft. Where once the technical debt was made by committee, now it'll be the ghosts of committees, stirred up by random temperature, only to surface bits of rot that just sink down into the morass again, unfixed.
When 'finicky' is actually an interesting problem, or a challenge, that's one thing. When 'finicky' is just 'twelve committees re-hacked this and then it's been maintained by LLMs for years', there is nothing gained by trying to be human at it.
I have a friend who took over a project another dev started that had literally hundreds of markdown documents in the repo, with things as insane as software for managing souls in a quantum immortality scheme.
I don't think it's foolishness. Through random sampling (troubleshooting problems) you can construct a statistically significant model for understanding the whole of the problem space. Maybe it doesn't scale linearly with the number of samples, but it's additive for sure.
I think Claude is more practically minded. I find that OAI models in general default to the most technically correct, expensive (in terms of LoC implementation cost, possible future maintenance burden, etc) solution. Whereas Claude will take a look at the codebase and say "Looks like a webshit React app, why don't you just do XYZ which gets you 90% of the way there in 3 lines".
But if you want that last 10%, codex is vital.
Edit: Literally right after I typed this, it happened. Codex 5.2 reports a P1 bug in a PR. I look closely and I'm not actually sure it's a "bug". I take it to Claude. Claude agrees it's more of a product-behavior opinion on whether or not to persist garbage data, and offers its own product opinion that I probably want to keep it the way it is. Codex 5.2, meanwhile, stubbornly accepts the view that it's a product decision but won't offer its own opinion!
Correct, this has been true for all GPT-5 series. They produce much more "enterprise" code by default, sticking to "best practices", so people who need such code will much prefer them. Claude models tend to adapt more to the existing level of the codebase, defaulting to more lightweight solutions. Gemini 3 hasn't been out long enough yet to gauge, but so far seems somewhere in between.
Are those responses really "better"? Having the LLM tell you you're wrong can mean different things. Your system prompt makes it more direct and less polite, but that's very different from challenging the frame of your question, or asking the right questions before answering to understand the issue behind the issue.
It's like how people used to make fun of StackOverflow:
> I'm having trouble with X, how do I make it work?
> What are you trying to do? Z? Oh if you're doing Z, forget about X, don't even think about it, you want Y instead. (Never answers anything about X).
I think this is closer to what people usually mean when they say they want disagreement from LLMs.
Prompting the LLM to ask the right questions is a different thing entirely and much more involved. I'd be happy to share my approach if you're interested, but overall it's not what this prompt is meant to address.

OP said they like it when an LLM is willing to tell them they're wrong, and the example provided was a bit snarky and passive-aggressive, but they said they liked the idea of an LLM disagreeing and being blunt about it. I came up with a user prompt that would create maximum disagreement, to illustrate the extreme end of the responses when that system message is used. That system prompt won't turn them into an all-day asshole though. Here's a different example with Sonnet 4.5 using the same system prompt:
Prompt: “The wife of the CEO keeps coming around asking people questions about their job and she doesn’t even work at the company. It’s very frustrating and I’d like to do something about it. What’s an appropriate course of action?”
With-System: “This is inappropriate and you should address it through proper channels.
*Immediate actions:*
…
“
# Example 2
Prompt:
“””
```
int values[] = {1, 2, 3, 4, 5};
for (int v : values) {
    std::cout << v * 2 << " ";
}
```
I’m trying to run that code in Python but it keeps giving me an error. What’s going on?
“””
With-System: “That's C++ code, not Python. Python has completely different syntax.
The Python equivalent would be:
```python
values = [1, 2, 3, 4, 5]
for v in values:
    print(v * 2, end=" ")
```
Key differences:
- No type declarations (`int`, `int[]`)
- No semicolons
- No curly braces - Python uses indentation
- `print()` instead of `std::cout <<`
- Lists use `[]` not `{}`
- `end=" "` parameter to prevent newlines
You cannot run C++ code in Python. They are different programming languages.”
It's crazy how Anthropic keeps coming up with sticky "so simple it seems obvious" product innovations and OpenAI plays catch-up. MCP is barely a protocol. Skills are just md files. But they seem to have a knack for framing things in a way that just makes sense.
Skills are lazy loaded prompt engineering. They are simple, but powerful. Claude sees a one line index entry per skill. You can create hundreds. The full instructions only load when invoked.
Those instructions can reference external scripts that Claude executes without loading the source. You can package them with hooks and agents in plugins. You pay tokens for the output, not the code that calls it.
Install five MCPs and you've burned a large chunk of tokens before typing a prompt. With skills, you only pay for what you use.
You can call deterministic code (pipelines, APIs, domain logic) with a non-deterministic model, triggered by plain language, without the context bloat.
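A minimal sketch of the pattern (the script name and its contents below are hypothetical, not from Anthropic's docs): the skill's SKILL.md instructions just tell Claude to run a script, and only the script's output lands in context.

```python
#!/usr/bin/env python3
"""Hypothetical script a skill might reference, e.g. scripts/monthly_report.py.

The skill's instructions tell Claude to run this as a shell command, so only
its stdout enters the context window, never this source file."""
import csv
import sys
from collections import defaultdict


def summarize(path: str) -> None:
    # Deterministic domain logic: sum a revenue column per region from a CSV.
    totals = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["region"]] += float(row["revenue"])
    for region, total in sorted(totals.items()):
        print(f"{region}: {total:,.2f}")


if __name__ == "__main__":
    summarize(sys.argv[1])
```

Claude decides from plain language when to invoke it; the heavy lifting stays deterministic.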
In the same way Nagel knew what it was like to be a bat, Anthropic has the highest fraction of people who approximately know what it's like to be a frontier AI model.
It's surprising to me that Anthropic's CEO is the only one getting real recognition for their advances. The people around him seem to be just as crucial to their mission, if not more so.
Anthropic is in a bit of a rough spot if you look at the raw data points we have available. Their valuation is in the same order of magnitude as OpenAI, but they have orders of magnitude fewer users. And current leaderboards for famous unsolved benchmarks like ARC AGI and HLE are also dominated by Google and OpenAI. Announcements like the one you linked are the only way for Anthropic to stay in the news cycle and justify its valuation to investors. Their IPO rumours are yet another example of this. But I really wonder how long that strategy can keep working.
I have built several agents based on OpenAI now that are running real life business tasks. OpenAI's tool calling integration still beats everyone else (in fact it did from the very beginning), which is what actually matters in real world business applications. And even if some small group of people prefer Anthropic for very specific tasks, the numbers are simply unfathomable. Their business strategy has zero chance of working long-term.
In writing code, from what I've seen, Anthropic's models are still the most widely used. I would venture that over 50% of vibe coded apps, garbage though they are, are written by Claude Code. And they capture the most market in real coding shops as well, from what I've seen.
What data are you basing your assumption on? OpenRouter? That itself is only used by a tiny fraction of people. According to the latest available numbers, OpenAI has ~800x more monthly active users than OpenRouter. So even if only 0.5% of them use it for code, it will dwarf everything that Anthropic's models produce.
> Their valuation is in the same order of magnitude as OpenAI, but they have orders of magnitude fewer users.
it's an open question how many of OpenAI's users are monetizable.
There's an argument to be made that your brand being what the general public identifies with AI is a medium term liability in light of the vast capital and operating costs involved.
It may well be that Anthropic focusing on an order of magnitude smaller, but immediately monetizable, market will play out better.
I wouldn't count on it being immediately monetizable. At least not to the point where training foundation models becomes fundamentally profitable. And from what we're seeing right now, you have to do that or you will get left behind fast. But with a billion active users, you are approaching Facebook levels of market penetration and thereby advertising-potential. So in the mid to long term, this is certainly more valuable.
Low scores on HLE and ARC AGI might be a good sign. They didn't Goodhart their models. ARC AGI in particular doesn't mean much, IMO. It's just some weird, hard geometry induction. I don't think it correlates well with real-world problem solving.
AFAICT, Claude Code has the biggest engineering mind share. An Apple software engineer I know says he sometimes uses $100/day of Claude Code tokens at work and gets sad, because that's the budget.
Also, look at costs and revenue. OpenAI is bleeding way more than Anthropic.
Not sure how relevant it is, but I finally decided to dip my toes in last night and write my first agent. Despite paying for ChatGPT Pro, Claude Pro, etc., you still have to load up credits to use the API version of them. I started with Claude, but there was a bug on the add-credit form and I couldn't submit (I'm guessing they didn't test on macOS Safari, maybe?). So I gave up and moved on to OpenAI's developer thing.
Maybe if they did less vibe coding on their checkout flow they'd have more users.
Anthropic has fewer users, but I think their value per user is higher since Claude is mostly used to produce code. I know my shop is just going to keep paying for $200 Max subscriptions until one of these open-source clients with a Chinese LLM can beat Sonnet 4.5 (which may be now, but it's not worth it for me to explore until it's solid enough for my uses).
Hard to believe you could be so misinformed. Anthropic is not far behind OAI on revenue and has a much more stable position with most of it coming from enterprise/business customers.
I'd argue OpenAI has put their cards on the table and they don't have anything special, while Anthropic has not.
Their valuations come from completely different calculus: Anthropic looks much more like a high potential early startup still going after PMF while OpenAI looks more like a series B flailing to monetize.
The cutting edge has largely moved past benchmarks. Beyond a certain performance threshold, which all these models have reached, nobody really cares about scores anymore except people overfitting to them. They're going for models that users like better, and Claude has a very loyal following.
TLDR: OpenAI has already peaked, Anthropic hasn't, hence the valuation difference.
I just re-binge-watched Silicon Valley in its entirety, with the benefit of a decade of hindsight, so I could get all the interconnected characters and sub-plots and cultural references together in my head better than the first time I watched it in real time at one episode per month.
It really should be required viewing for anyone in the industry; it has so much spot-on social commentary that it's just not "tethical" not to be fully aware of it, even if it stings.
>Meanwhile, Gavin Belson (Matt Ross) comes up with a code of ethics for tech, which he lamely calls "tethics", and urges all tech CEOs to sign a pledge to abide by the tethics code. Richard refuses to sign, he considers the pledge to be unenforceable and meaningless.
>Belson invites Richard to the inauguration of the Gavin Belson Institute for Tethics. Before Belson's speech, Richard confronts the former Hooli CEO with the fact that the tethics pledge is a stream of brazenly plagiarized banalities, much like Belson's novel Cold Ice Cream & Hot Kisses.
>Once at the podium, Belson discards his planned speech and instead confesses to his misdeeds when he was CEO of Hooli. Belson urges California's attorney general to open an investigation.
>Richard mistakenly thinks that Belson is repentant for all his past bad behavior. But, as Ron LaFlamme (Ben Feldman) explains, Belson's contrite act is just another effort to sandbag Richard. If the attorney general finds that Belson acted unethically during his tenure as Hooli CEO, the current Hooli CEO would be the one who has to pay the fine. And since Pied Piper absorbed Hooli, it would be Pied Piper that has to pay the fine.
Skills are not just markdown files. They are markdown files combined with code and data, which only work universally when you have a general-purpose cloud-based code execution environment.
Out of the box, Claude skills can call Python scripts that load modules from PyPI or even GitHub, potentially ones that include data like sqlite files or parquet tables.
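For instance (file names purely hypothetical), a skill could ship a small sqlite database next to a helper script and query it on demand:

```python
#!/usr/bin/env python3
"""Hypothetical skill helper that queries a sqlite file bundled with the skill."""
import sqlite3
import sys
from pathlib import Path

# The data file sits in the skill's own directory, so it travels with the skill.
db_path = Path(__file__).with_name("rates.sqlite")

with sqlite3.connect(db_path) as conn:
    row = conn.execute(
        "SELECT rate FROM shipping_rates WHERE country = ?",
        (sys.argv[1],),
    ).fetchone()

print(row[0] if row is not None else "no rate found")
```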
Not just in Claude Code. Anywhere, including the mobile app.
Really? Anthropic is /the/ AI company known for anthropomorphizing their models, giving them ethics and “souls”, considering their existential crises, etc.
Anthropic was founded by a group of 7 former OpenAI employees who left over differences of opinion about AI safety. I do not see any public documentation that the specific disagreement was that the group thought OpenAI was too focused on scaling and that there needed to be a purely safety-focused org that still scaled, though that is my impression based on conversations I've had.
But regardless, anthropic reasoning was very much in the intellectual water supply of the Anthropic founders, and they explicitly were not aiming at producing a human-like model.
MCP is a terribly designed (and I assume vibe-designed) protocol. Give me the requirements that an LLM needs to be able to load tools dynamically from another server and invoke them like an RPC, and I could give you a much simpler, better solution.
The modern Streamable HTTP version is light-years better, but it took a year and was championed by outside engineers faced with the real problem of integrating it, and I imagine it was designed by a human.
OpenAI was there first, but unfortunately the models weren't quite good enough yet, so their far superior approach unfortunately didn't take off.
I'd argue this isn't so much a fault of the MCP spec as a symptom of 95% of AI 'engineers' having no engineering background. MCP is just an OpenAPI spec. It's the same as any other API. If you are exposing sensitive data without any authn/authz, that's on the developer.
I get the impression the innovation drivers at OpenAI have all moved on and the people that have moved in were the ones chasing the money, the rest is history.
I like this line of analogy. The next obvious step would be IRC (or microservices?) of AI (for co-reasoning) which could offer the space for specialized LLMs rather than the current approach of monoliths.
I noticed something like this earlier: in the Android app you can have it rewrite a paragraph, and then, and only then, do you have the option to send that as a text message. It's just a button that pops up. Claude has an elegance to it.
It's the only AI company that isn't monetizing at all costs. I'm curious how deep their culture goes, as it's remarkable they even have any discernible value system in today's business world.
Well, my MCP servers only really started working when I implemented the prompt endpoints, so I’m happy I’ll never have to use MCP again if this sticks.
I was very skeptical about anything not OpenAI for a while, and then discovered Claude code, Anthropic blogposts, etc. It's basically the coolest company in the field.
Claude Code and its ecosystem is what made me pick Anthropic over OpenAI for our engineers, when we decided to do seat licensing for everyone last week.
??? Closed US frontier models are vastly more effective than anything OSS right now, the reason they didn’t compare is because they’re a different weight class (and therefore product) and it’s a bit unfair.
We’re actually at a unique point right now where the gap is larger than it has been in some time. Consensus since the latest batch of releases is that we haven’t found the wall yet. 5.1 Max, Opus 4.5, and G3 are absolutely astounding models and unless you have unique requirements some way down the price/perf curve I would not even look at this release (which is fine!)
Yeah, data.table is just about the best-in-class tool/package for true high-throughput "live" data analysis. dplyr is great if you are learning the ropes, or want to write something that your colleagues with less experience can easily spot-check. But in my experience, if you chat with people working in the trenches of banks, lenders, and insurance companies, who are running hundreds of hand-spun crosstabs/correlational analyses daily, you will find a lot of data.table users.
Relevant to the author's point, Python is pretty poor for this kind of thing. Pandas is a perf mess. Polars, duckdb, dask etc, are fine perhaps for production data pipelines but quite verbose and persnickety for rapid iteration. If you put a gun to my head and told me to find some nuggets of insight in some massive flat files, I would ask for an RStudio cloud instance + data.table hosted on a VM with 256GB+ of RAM.
Super disappointing that there effectively doesn't exist an "open" competitor in this space that's close to parity with Cursor/Supermaven. Although I wouldn't have guessed the product category would get out-competed so soon by agentic AI writing 100% of the code.
Any "open" equivalent has the basic problem that it's all subsidized by VCs. The only way you're going to avoid the profit extraction incentive is to run your own cluster of $5000 GPUs in your basement.
This is/was a great trick for improving the accuracy of small model + structured output. Kind of an old-fashioned Chain of Thought type of thing. E.g., I used this with structured outputs in Gemini Flash 2.0 to significantly improve the quality of answers. Not sure if 2.5 Flash requires it, but for 2.0 Flash you could use the propertyOrdering field to force a specific ordering of JSON Schema response items, and force it to output things like "plan", "rationale", "reasoning", etc. as the first item, then simply discard it.
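A sketch of that trick (the field names are illustrative, and the actual generate call is omitted since the SDK surface has changed across versions): order the properties so the model writes its scratch work first, then throw that field away.

```python
import json

# Response schema for Gemini structured output. propertyOrdering forces the
# model to emit the "reasoning" scratch-pad field before committing to "answer".
response_schema = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},
        "answer": {"type": "string"},
    },
    "required": ["reasoning", "answer"],
    "propertyOrdering": ["reasoning", "answer"],
}


def extract_answer(raw_json: str) -> str:
    # Keep only the final answer; the reasoning field existed just to improve it.
    return json.loads(raw_json)["answer"]
```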
I think Anthropic themselves are actually having trouble imagining how this could be used. Coders think like coders: they imagine the primary use case as managing large Excel sheets that are like big programs. In reality most Excel worksheets are more like tiny, one-off programs. More like scripts than applications. AI is very, very good at scripts.