Rendered at 20:22:00 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.
ValentineC 2 hours ago [-]
> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers.
Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?
Many lower-budget individuals are now moving to Chinese open-weight models like DeepSeek. I wonder if China's really subsidising the providers, or if inferencing costs are actually much lower, and Anthropic/OpenAI are just making sure no money's left on the table for their eventual IPOs.
dgellow 1 hours ago [-]
One aspect Paul Kedrosky mentioned recently is the concept of „duration mismatch“. The price per token goes down over time (either because the AI vendor reduces it under competitive pressure, or because customers are now incentivized to use older, cheaper models). But datacenters are financed through debt, with the assumption that their revenue increases over time. Quoting him: „[AI vendors are] paying for a fixed cost with a depreciating commodity“[0].
So you have on one end the token revenue trending down, on the other end the training cost going up for the next frontier models, and you need to pay back your 10y debt.
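To make the mismatch concrete, here is a rough sketch. Every number in it (loan size, interest rate, price decline, volume growth) is a made-up assumption for illustration, not a figure from Kedrosky or any vendor. It compares a fixed yearly debt payment against revenue whose per-token price falls faster than volume grows.

```python
def annual_debt_payment(principal: float, rate: float, years: int) -> float:
    """Fixed yearly payment on a fully amortizing loan (standard annuity formula)."""
    return principal * rate / (1 - (1 + rate) ** -years)


def yearly_revenue(first_year_revenue: float, price_decline: float,
                   volume_growth: float, years: int) -> list[float]:
    """Revenue per year when the token price falls while token volume grows."""
    factor = (1 - price_decline) * (1 + volume_growth)
    return [first_year_revenue * factor ** year for year in range(years)]


# Hypothetical inputs: $10B borrowed at 8% over 10 years.
payment = annual_debt_payment(10e9, 0.08, 10)
# Hypothetical inputs: $2B first-year revenue, price -40%/yr, volume +30%/yr.
revenue = yearly_revenue(2e9, 0.40, 0.30, 10)

for year, rev in enumerate(revenue, start=1):
    status = "covers" if rev >= payment else "falls short of"
    print(f"year {year}: revenue ${rev / 1e9:.2f}B {status} "
          f"payment ${payment / 1e9:.2f}B")
```

With these particular assumptions the payment stays flat while revenue shrinks by about a fifth each year, so the gap opens within a few years. Different growth assumptions flip the result, which is presumably the bet being made.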
Do GPU chips really depreciate physically? There are no moving parts, and I don't think memory chips or GPU chips deteriorate naturally.
I think it's only accounting depreciation.
I have been using my laptop for a decade, so what is stopping datacenters from using the purchased GPU chips for a decade?
munk-a 31 minutes ago [-]
In addition to the physical depreciations other comments mentioned I'd also mention that old chips will settle into a low price and then actually go up on a per unit basis if you're trying to buy a significant amount of them. With a limitation on fabrication facilities continuing to pump out older cards is an opportunity cost to the manufacturers that would prefer to be producing newer cards. If you were in a place where you suddenly wanted to buy 10,000 3080s, as an example, I'm not certain if the market could actually fulfill that demand and no one with the ability to increase the available supply to meet that demand actually wants to do so.
Chips do wear out and need to be replaced (entropy do be like that and durability is not a primary concern for chip design) so you'll need to refresh your stock and, even if you don't need cutting edge models, the price of all chips at scale will go up over time. It may feel unintuitive since, when the PS3 was released PS1s were extremely cheap - but if you're struggling to understand this effect from your experiences in the consumer market you're actually looking at the price factor that starts making antiques increase in value since at a certain point they become scarce goods. The market price for an NES is higher today than it was in 2003 because the price had already bottomed out from demand from the general consumer market but the demand remaining (speedrunners and the like) is now fixed or growing while the supply is inevitably shrinking.
Aurornis 39 minutes ago [-]
There are data centers that use and rent out 10 year old server GPUs.
They can't run larger modern models. They can't run smaller models as fast as newer servers. So their remaining market is applications where customers are okay with older, smaller models and slower performance.
They have to price the service lower than competitors due to the lower performance. The older GPUs are less efficient so it costs them more to keep them running. They're paid off, but they're taking up valuable power, space, and cooling in a data center.
Eventually there is a tipping point where it's better to replace that space and power budget with something new that has more demand.
The parts are sold off on the open market. There's an equilibrium demand for the parts from other data centers keeping older servers running and from hobby people who are okay with a jet engine sounding toaster of a GPU running in their home.
jmalicki 15 minutes ago [-]
As long as the demand for GPUs keeps increasing, there are more data centers being built to house them.
When you have waitlists for many many months for Blackwell GPUs, keeping the old ones around as long as customers are willing to pay for them is great.
If I as a customer have a use case for a machine learning model I developed a while ago, say an insect identification model that I had an ML researcher/eng develop back in 2019, and it runs fine on a 2018-era T4 GPU (NVidia 2080 era), why mess with it?
vb-8448 28 minutes ago [-]
I used to work in datacenters. During the spinning disk era we had technicians from vendors in basically every couple of days to replace some broken part. When the massive switch to SSD happened, instead of having them every couple of days it was 3 or 4 times per month.
Despite no moving parts things broke anyway and, even if it doesn't break, the vendor can make you change the technology just by playing with maintenance cost of the older one, limiting or removing spare parts from the market.
malfist 16 minutes ago [-]
They do degrade physically, but the bigger thing is they stop being competitive quickly. Each year or so we see doubling of GPU speeds for the same amount of power.
If you build a 100MW data center with GPU compute and three years later a new data center opens with the same GPU cost and the same electricity cost as you, but can do twice as much compute, you quickly lose business unless the market is so constrained that customers can't afford to be picky. But the moment there's slack in the market you'll see major migrations off of providers that have the same cost but half, or a quarter, of the performance.
So when you see someone talking about GPUs fully depreciating in value in 1-3 years this is what they're talking about. Right now it's not a big deal because there's no slack in the market. But once there is, the bottom will drop out.
numpad0 17 minutes ago [-]
Chips do deteriorate and fail naturally at datacenter scale or in timescales of decades, though not exactly like on financial reports. Leak current increases or electro-migrations occur at junctions or whatever those words mean.
And yeah, it does feel like GPUs will start losing value more slowly going forward, with Moore's Law being dead for a while. It used to be that 3-5 year old GPUs were more useful as space heaters than GPUs, but that's much less the case today.
tardedmeme 15 minutes ago [-]
Gradually, and especially when hot. Modern chips are pretty close to the physical limits of how small they can be made, and that means atomic/chemical effects like electromigration are accounted for and determine the lifetime. Every extra 10 degrees Celsius of temperature doubles the speed of chemical reactions.
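The 10-degree rule of thumb above can be sketched in a few lines. This is only the rough heuristic from the comment, not a real reliability model (actual wear-out depends on the failure mechanism and is usually modeled with Arrhenius-style equations), and the temperatures used are hypothetical.

```python
def relative_reaction_rate(temp_c: float, reference_c: float) -> float:
    """Rule of thumb: the rate doubles for every 10 degrees C above the reference."""
    return 2 ** ((temp_c - reference_c) / 10)


def relative_lifetime(temp_c: float, reference_c: float) -> float:
    """If wear scales with reaction rate, lifetime scales with its inverse."""
    return 1 / relative_reaction_rate(temp_c, reference_c)


# Hypothetical junction temperatures: 65 C as the reference, 85 C under load.
print(relative_reaction_rate(85, 65))  # rate multiplier vs. the reference
print(relative_lifetime(85, 65))       # lifetime fraction vs. the reference
```

Under this heuristic, running 20 degrees hotter means four times the reaction rate and a quarter of the lifetime, which is why cooling shows up so prominently in data center costs.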
whateverboat 44 minutes ago [-]
Today's data center GPUs are essentially overclocked, and so run at the limit of what the chip materials can physically handle, and therefore degrade over time. For example, GH200s operate at 1 kW/superchip but the actual safe power is somewhere around 650W, which would allow them to function for a decade or more. But that leads to around a 15% slowdown, and that is unacceptable in today's competition. So current GPUs are destined to be depreciating assets.
In future, we might have fixed cost GPUs but not today.
foobarian 30 minutes ago [-]
> There are no moving parts, I dont think memory chips or GPU chips deteriorate naturally
I believe they do, but I too would love to know more details because there are several ways this can happen. Electromigration, package failures, VRAM failures, dielectric breakdown... Hopefully there will be studies soon similar to that old Google paper on HDD failures!
threetonesun 27 minutes ago [-]
I assumed the issue was similar to crypto mining, where given finite amounts of space and power it makes sense to always be running the latest and most powerful GPUs instead of keeping older hardware running. There's definitely a secondary market for these GPUs as well.
dgellow 55 minutes ago [-]
GPUs do depreciate indeed, but here the depreciating commodity is the token, not the hardware. You sell cheaper tokens with the same hardware.
manyatoms 19 minutes ago [-]
the hardware itself is still useful, but random failures happen every so often, so if you're trying to run a fixed sized fleet then your fleet shrinks when you can't get spares any more
bigfishrunning 49 minutes ago [-]
Your laptop doesn't have a 100% duty cycle. If you ran it like a data center it would indeed wear out much faster.
sandworm101 38 minutes ago [-]
Yes, even if the hardware is untouched. As technology advances, the power cost per compute cycle goes down. A gpu using old tech costs progressively more to operate compared to the newer models. So its value goes down over time = depreciation.
As for duty cycles, the chips are perfectly happy at 100% operation. Cooling and power components fail, not the chips. But it costs manpower to repair such things and manpower is inconvenient these days. A gpu with any sort of fault just gets dumped.
freediddy 1 hours ago [-]
Most sane US companies will disallow use of cloud-based Chinese AI providers, because everything including code, data, PII, etc is being sent to them.
eikenberry 43 minutes ago [-]
Then don't use the cloud-based Chinese providers, use cloud-based US/EU providers using Chinese models. The interesting Chinese models are all open, making this issue mostly moot.
ceejayoz 1 hours ago [-]
Saner companies ask the same question about models from their own country too.
tmp10423288442 8 minutes ago [-]
There are some objections here saying that some US firms are using Chinese AI providers, but I wonder if any of those are subject to compliance. Large firms that are disproportionately responsible for AI spending are all subject to compliance.
rd 49 minutes ago [-]
I wonder if I could start a US-based company with good data regulation and just serve open-weight models at a competitive price. I feel like the real barrier is just that most companies willing to adopt AI usage enough to make it worth it at this point don't want to be using inferior models.
CobrastanJorji 44 minutes ago [-]
Here's a free startup idea: operate an open-weight model service, and offer "Verified AI Integrity," which signs the input tokens, the seed for the randomness in selecting outputs, and the model ID, proving that the result of the call to AI was completely "organic" and was not interfered with.
Your main audience would be snake oil salesmen trying to prove their AI products are unbiased and not under the thumb of any outside influence. This doesn't address the biases of the model itself, but that's not your business. Your business is selling tokens and security certificates. If you can get the right angel investor, you could maybe have your new standard required for some government applications.
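For what it's worth, the signing part of that idea is not much code. A minimal sketch, assuming a symmetric HMAC key held by the provider (a real service would presumably use asymmetric signatures so third parties can verify without the secret). The key, the field names, and the inclusion of the output tokens are all hypothetical choices for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical secret held by the inference provider.
SERVICE_KEY = b"demo-key-not-for-production"


def _canonical(fields: dict) -> bytes:
    """Stable byte encoding so the same record always hashes identically."""
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode()


def attest(model_id: str, seed: int, input_tokens: list[int],
           output_tokens: list[int]) -> dict:
    """Bundle the call's inputs and output with an HMAC tag over them."""
    record = {
        "model_id": model_id,
        "seed": seed,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
    }
    tag = hmac.new(SERVICE_KEY, _canonical(record), hashlib.sha256).hexdigest()
    return {**record, "tag": tag}


def verify(record: dict) -> bool:
    """Recompute the tag from the fields and compare in constant time."""
    fields = {k: v for k, v in record.items() if k != "tag"}
    expected = hmac.new(SERVICE_KEY, _canonical(fields), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("tag", ""))
```

Changing any field after the fact (the seed, a token, the model ID) makes `verify` return False. As the comment says, this proves nothing about the model's own biases, only that the record was not altered after the provider tagged it.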
tokioyoyo 45 minutes ago [-]
Yes, you can. There are multiple inference providers out there. The problem is, it’s hard to beat the Chinese providers in cost. And you also have to compete with frontier model providers’ subsidized offerings.
mediaman 41 minutes ago [-]
There are plenty of US-based inference providers available, including AWS, that serve Chinese models at competitive prices (vs frontier US models). They also have lots of usage. Not necessarily for coding, but for other enterprise tasks.
amunozo 52 minutes ago [-]
You can run DeepSeek as it's open weights, unlike Claude or GPT.
cheeze 1 hours ago [-]
Deepseek has some models in Bedrock. There is definitely a huge market for a "good enough" model running within the country of the company
Animats 57 minutes ago [-]
Raise them, more likely. NVidia says that GPU hardware prices won't decrease until at least 2030. The world is out of fab capacity.
testdelacc1 2 hours ago [-]
Per token costs will fall, but the harnesses will get more token hungry. Instead of just centering the div it’ll spin up a battery of agents to architect, critique, advise, code, review, refactor and so on.
sevenzero 2 hours ago [-]
I wish I could disable most of these. I already hate all the "oh you're actually right, let me fix that" nonsense. Then it proceeds to burn 50k tokens on the git history instead of copying logic A from a different part of the codebase to logic B, where I want that exact logic without having to write the boilerplate myself...
apsurd 1 hours ago [-]
Makes me think of how my Claude.md files specifies to use the built in framework code-generators (rails). Those generators are deterministically right every time.
I wonder how often the Agent actually follows the guidance. I do see them follow it when I look. But it doesn't seem so every time.
thefunnyman 1 hours ago [-]
This is tricky since it can and will ignore your md directions. When possible I try to lean on tool call hooks or skills that invoke deterministic scripts. As much as you can remove the "choice" the better though still there's a lot of randomness in how reliably it invokes skills ime.
sfn42 1 hours ago [-]
A lot of the time if you're copying code from one place to another what you actually want to do is abstract it so you can reuse it in both places.
The LLM can easily do this type of stuff, just tell it and it'll happily do it. This is exactly what I mean when I tell people they need to work closer with the AI, tell it how to do things. Don't just tell it what to do and get frustrated when it does it differently than you would.
A good way to achieve this without writing huge prompts is tell it to plan the change first. Just give it some vague low-effort directions. It'll usually get most things right, you tell it what you want different and once you're happy you tell it to go ahead.
sevenzero 1 hours ago [-]
Nah the codebase is legacy fucked and I can't be bothered to try and optimize business flows without the fear of other stuff breaking.
Claude 100% of the time even thinks we use Laravel despite the project being some old Lumen codebase, so most of Laravel's features are not available. It also gets the PHP version we are using wrong 100% of the time.
KaiShips 1 hours ago [-]
[flagged]
SecretDreams 2 hours ago [-]
> Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?
I genuinely do not know how prices can get lower from the current major providers in NA without the whole market collapsing. Everyone is spending copious amounts of money to presumably make more money back.
aDyslecticCrow 1 hours ago [-]
An inference-only platform selling good open-weight model inference without the research overhead could capture a lot of the market for smaller-model uses (Haiku, Gemini Flash). Diffusion transformers and clever caching can drop inference costs even lower, and both are improving at a high rate.
The biggest reason large models are unattainable for local applications is the lack of hardware with a large amount of unified/graphics memory (and the cost of the platforms that do have it). Once the memory slog goes back to normal and hardware manufacturers adapt to demand, we may see consumer hardware with large memory capacity, effectively opening the door for slow but usable frontier model inference (assuming improvements in model efficiency and compute capacity).
At that point, inference becomes a race to the bottom. The large labs hope they can attain a leap in capability (which is increasingly looking bleak, with an average catch-up of just a few months) or market dominance through integration (integration in platforms and OS, exclusive deals with companies or governments).
For coding agents, I suspect no player will manage to lock in enough of the market to enforce pricing much higher than the true inference cost, and catering to programmers becomes an unsustainable proposition. We will instead be further hit with a lot of AI integrated into our other tooling costs, such as GitHub, Microsoft suite, G-suite, forcing in AI functions as a value-add into the total cost without giving the option to exclude them (using their market position).
SecretDreams 45 minutes ago [-]
I agree with all of this.
So my question remains the same: How are the players investing 100s of billions in buildout going to hope to make this back? Market capture looks bleak, inference looks like a race to the bottom. End users look like they could be beneficiaries. Where do the big boys go?
cyanydeez 1 hours ago [-]
I'd be amazed if any American business will send data to China.
linkregister 1 hours ago [-]
HuggingFace offers DeepSeek as one of its models— it's pretty simple to spin up instances under your control.
I'm not sure about OpenRouter but I wouldn't be surprised if they offer a US-based provider of DeepSeek.
For reference, Cursor has their first own light fork of Kimi that they use as their baseline coding and review model.
dghlsakjg 46 minutes ago [-]
The majority of Deepseek providers on OpenRouter for v4 pro are in the US. Especially interesting is that they are in the same ballpark for pricing.
eikenberry 35 minutes ago [-]
They are in the same ballpark for deepseek-v4-flash, but deepseek-v4-pro from deepseek is still around 1/2 of the alternatives.
dghlsakjg 29 minutes ago [-]
I'm pretty sure that Deepseek said that pricing was promotional. Be curious to see if it lasts.
V3 pricing from them was right in line with what the commodity providers are charging.
alpinisme 1 hours ago [-]
“Any” is a very high bar. Unless laws prevent it, I don’t see why a substantial minority wouldn’t buy services from where they can get them at a similar quality and much lower price.
dkersten 44 minutes ago [-]
Together.ai provide many open weights models and as far as I’m aware their servers are US based (the company certainly is).
lowbloodsugar 1 hours ago [-]
Any IT cost center will send to the lowest bidder. This isn’t intellectual property: it’s annoying shit that is an unwelcome cost of doing business. China might copy our tedious scripts? Will they make a product out of it? Can I buy it and fire my IT staff? Great!
Not everyone using AI is using it to code core value IP.
mcmcmc 49 minutes ago [-]
[dead]
siliconc0w 16 minutes ago [-]
I use the $100/mo sub but my 30 day API cost is about $1700/mo.
It really depends how you use it, if you're using prompts to generate detailed designs, breaking those into lists of tasks, and then feeding those to multiple agents - it's really easy to burn through many thousands.
If you're being more deliberate and using a few agents at a time interactively, having it review PRs/resolve issues, automated clean-ups and performance optimization, etc it could be more like $1500.
If you're just throwing it one-off questions like a better Stack Overflow, that is well under $100.
I've really gotten into /goal, if you can find something verifiable and leave it overnight - it's kinda like christmas morning to see where it landed.
tuesdaynight 1 hours ago [-]
Why are there so many people that still believe that AI coding is a fad? It's something that started less than two years ago and companies are already paying thousands per seat. I know one that gives you 5k per month. Which other tool went from nothing to this level of acceptance so quickly?
OptionOfT 38 minutes ago [-]
Because companies are betting that this spending will allow them to reduce cost by firing people.
Right now the AI LLM PRs we're seeing are just introducing more work for other people, while these so-called builders are looking good with their new dashboards and functionality they're demoing.
But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.
It's not built up from the ground with experience from x people taken into account. It's materialized from nothing, with no foundational separation, and barely any abstractions.
No one wants to touch it. The PRs are too large, and the 'authors' of the PRs aren't on call with us.
They get all the glory, but do none of the work.
It's kinda like designing a house and then sending it to an architect and engineer saying: make this work.
saulpw 18 minutes ago [-]
> But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.
You can absolutely do this. It's even right most of the time.
datsci_est_2015 12 minutes ago [-]
I believe the “them” the OP was talking about was referring to the people opening the PRs, not the LLMs.
toasty228 7 minutes ago [-]
There is a whole spectrum between "ai coding is a fad" and "unlimited tokens for every employee, we don't even care if it actually ends up being a net positive financially"
lbrito 19 minutes ago [-]
That's just a non sequitur. "companies are already paying thousands per seat" has zero correlation with something being a fad or not. There are much more reasonable rationales explaining why companies are acting the way they are than "because AI coding is not a fad"
agumonkey 47 minutes ago [-]
I would use these exact facts as a sign that it's maybe not what it seems. It's much too big and too fast to feel stable. It might keep at that level, increase even more, or drop down to a saner level of use / allocation.
Aurornis 31 minutes ago [-]
> It might keep at that level, increase even more, or drop down
Bold prediction. :)
I think anyone predicting a drop or near-term flattening is not thinking beyond the online bubbles where these tools are discussed. In a local tech meetup a lot of the normal companies are barely coming online with AI tools at their company, and even then with very low limits.
teeray 36 minutes ago [-]
I can see a corporate future where tokens are haggled over in department budgets just like any other line item. Some projects will get more of them, other projects will get less of them. "Use AI for everything" will become "use AI economically and build things that outlast our budget for it."
johnfn 39 minutes ago [-]
So it might either go up, stay the same, or go down? :)
tokioyoyo 39 minutes ago [-]
“AI coding is a fad” is not just one big camp of similar-minded people. Different groups have to give up on their pre-existing beliefs in order to be ok with AI coding.
Think of people who were very strict with variable names. People who pushed for multiple-levels deep of abstractions for a single API logic that’s not going to be reused. People who believed that coding is craft, rather than just a process to get to the end during work hours. This makes most of these people’s points more-or-less moot.
I was in some of those camps, but I’ve seen coding evolve in the last 15 years. So I understand that these priors need to be updated, as most arguments don’t apply to today’s world.
fragmede 35 minutes ago [-]
What's an int vs a float vs a boolean? What's a function? What's a class? What's a variable? You don't actually need to know the answer to those questions in order to vibe code. That's a lot of priors to update!
malfist 1 minutes ago [-]
> You don't actually need to know the answer to those questions in order to vibe code
No, but you do need to know the answer to respond to that 3AM page about prod being down.
tokioyoyo 21 minutes ago [-]
Just to go on record, as of today, I’m a big believer that a person that knows all that stuff is much more productive with AI-coding than a person who doesn’t.
I have no idea how we can get people motivated to learn these through trial-and-error when AI coding exists though. I remember the days of spending hours on stupid bugs that AI can resolve within a minute. But I recall learning heavily from those experiences. Oh well…
nomel 9 minutes ago [-]
And, you don't have to vibe code. A competent developer can make great use of AI. I think a developer that can develop the system themselves is the most accelerated user.
anthonypasq 1 hours ago [-]
perhaps the personal computer? Companies were spending 3-5k (10-15k inflation adjusted) on every employee for just hardware.
everyone making comparisons to the dotcom bubble seems misguided. this is clearly computing 2.0 imo
thewebguyd 1 hours ago [-]
No disagreement on computing 2.0, but companies spending 3-5k per employee for hardware isn't generally a monthly cost. It's a cost at the time of hire, and then once every 3 to 5 years after that, for a monthly amortized cost of about $50/employee.
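The amortization arithmetic is easy to check. A quick sketch, using only the purchase prices and refresh intervals mentioned in this comment (the function name is just illustrative):

```python
def monthly_amortized(purchase_price: float, refresh_years: float) -> float:
    """Spread a one-off hardware purchase evenly over its refresh cycle."""
    return purchase_price / (refresh_years * 12)


# Cheapest case from the comment: $3k machine kept for 5 years.
low = monthly_amortized(3000, 5)
# Most expensive case from the comment: $5k machine replaced every 3 years.
high = monthly_amortized(5000, 3)

print(f"${low:.2f} to ${high:.2f} per employee per month")
```

The "about $50" figure corresponds to the cheap end of the range. Even the expensive end (roughly $139/month) is still far below a four-figure monthly token bill.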
I have my concerns with current inference pricing in that there's a non-zero possibility for a rug pull in the future for the subscription plans for organizations and individuals that can still use them. For now, it's only companies larger than ~150 users that need to pay per token, but what if that wasn't the case? Not every company can afford over $1k/month/employee to give them access to AI tooling, further making it harder to compete against the behemoths. If we get to a point where an individual can no longer pay $100/month for nearly unlimited usage and instead must pay per token, that's going to be a problem.
Personal computing eventually became an equalizer (until we started centralizing on mainframes again, aka the cloud) because it got cheap. My hope is that inference also gets just as cheap, if not cheaper.
I have high hopes for local AI and open weight models and we will continue the ethos of local, personal computing and not needing to offload everything to OpenAI/Anthropic/Google, etc. to get work done once the hardware and hardware availability catch up.
dghlsakjg 32 minutes ago [-]
Every employee doesn't need $1k in token spend per month, either. That kind of spend makes sense for technical workers in r+d.
Most other workers are served fine by $20-30 worth of tokens on a budget model. You don't need Opus to help support write emails.
dghlsakjg 36 minutes ago [-]
The Dotcom bubble is an interesting comparison.
The general thrust that everything would be online was correct, it was just that the market mistimed and misallocated capital by a decade or more. There was massive spending on infrastructure capacity that we wouldn't end up needing until the 2010s. There were hype driven valuations completely disconnected from business fundamentals just because a company was an 'internet' company. Things were going from cutting edge to obsolete in less than a year. There were breathless promises that this was business 2.0! Of course, none of that sounds remotely like what is going on today...
I'm optimistic about AI, but I also don't think that it is going to change everything as fast as promised.
threetonesun 15 minutes ago [-]
The question you always have to ask is what problems does it directly solve. I personally think most of the current problems in software development and really the world at large are not time-bound problems but alignment issues, and all an LLM can really do there is be some 3rd party oracle that gives you an answer without needing other humans to agree with you.
jghn 1 hours ago [-]
Two things can be true at the same time. It can be true that this is here to stay. It can also be true that companies are grossly overvalued right now and that the market is irrationally exuberant. This would mean we could both have a crash and also see AI coding be the new future.
pmg101 48 minutes ago [-]
I think the right comparison is the invention of the microprocessor. At that time people were grappling with a lot of the same things we are today - would it automate jobs away, would it transform education and the work place, etc.
pixelesque 55 minutes ago [-]
Hardware's not generally a subscription, monthly cost though.
You update it for them every 3/4 years (if they're lucky).
It probably makes a bit more sense to compare it to existing software subscriptions like Office, or the old-school 'per-seat' licenses per user for software.
thewebguyd 29 minutes ago [-]
[dead]
jorl17 45 minutes ago [-]
Don't worry, eventually "we'll all just regret it because it's just bitcoin 2.0".
Meanwhile, I couldn't be more excited for the future of automation and the empowerment of human beings.
Barrin92 30 minutes ago [-]
>Why there are so many people that still believe that AI coding is a fad?
Because there's not a single piece of evidence that this has improved the quality of the delivered software, or for that matter even the speed of features any of these companies produce, in fact if anything the opposite.
The point of software development, the hint is in the name, is to develop software, not consume tokens. If Uber was now full of 10x engineers the stock price of Uber would be up, not down on a yearly basis. Hilariously enough the only company whose stock price is up appears to be Anthropic.
john01dav 19 minutes ago [-]
Why isn't self hosting (even just renting a GPU server, not necessarily on premise) at large companies, or hosting via something like Together AI to run the open weight models, more common? I've tried the open weight models and the premium models like Opus and Gemini Pro, and I find that the latter are a little better, but not nearly to the degree to justify the extreme price difference, since the differences largely don't matter for what I've tried them for, and I expect that many other users likely have similar use cases.
soleveloper 10 minutes ago [-]
If the premium models are just about 10% better - that could justify the price vs. self hosting a ~0.5-1T open weights model.
Remember that utilization of these huge racks will not be 24h/7, and these are usually not GPU intensive shops that would train models on the spare compute. With prices of 100-200k USD and north with ~2 years lifetime, that would be hard to justify financially.
Self hosting could easily amount to ~1000 USD a month amortized across many developers. In rush hours - there will be hard rate limits.
Would that $1500 - $1000 = $500 USD monthly saving justify the 10% decrease in "AI Productivity"? I guess not. In most cases.
To everyone who asks me, I'd say that in the short term, unless there's a really good reason to self host these coding assistant models, the big 2/3 coding assistant providers are the better choice.
No one got fired for licensing Claude Code.
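The break-even reasoning above can be written out as a small sketch. The $1500, ~$1000 and 10% figures are the ones quoted in this comment. The idea of putting a dollar value on a developer's monthly output is my own simplifying assumption, and the function names are illustrative.

```python
def monthly_saving(hosted_cost: float, self_host_cost: float) -> float:
    """Money saved per developer per month by self hosting."""
    return hosted_cost - self_host_cost


def breakeven_output_value(saving: float, productivity_loss: float) -> float:
    """Monthly value of a developer's output above which the lost
    productivity costs more than the saving is worth."""
    return saving / productivity_loss


saving = monthly_saving(1500, 1000)               # figures from the comment
threshold = breakeven_output_value(saving, 0.10)  # 10% loss from the comment

print(f"saving: ${saving:.0f}/month, break-even output value: ${threshold:.0f}/month")
```

If a developer's output is worth more than about $5000 a month to the company, a 10% productivity drop costs more than the $500 saved, which is consistent with the "I guess not, in most cases" conclusion.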
datsci_est_2015 9 minutes ago [-]
There’s probably plenty of money to be made in LLMs as a service - but not enough time has passed for the commodification to occur. I’m with you in that when the dust settles I don’t think any of the frontier model providers will have a moat. Just like during the dotcom boom a catchy URL and a webpage that could accept payments wasn’t a moat, either.
f311a 7 hours ago [-]
How many more months do we need to wait, until big companies realize that flash models work just fine if you:
1) Don't ask LLMs for big changes
2) Review everything and point them in the right direction
Large models still suck at big changes, they produce questionable architecture and you still have to review the code, if your project is serious enough.
The codebase quickly becomes a mess if you don't pay enough attention. It does not matter which model.
So why bother with big models, when flash models are 10x cheaper and much faster to iterate under guidance? Large models can be used for security and bug audits. Flash models work almost the same for changes under 300 LOC when you dictate how you want your code to look.
_jab 47 minutes ago [-]
It's pretty simple; organizations are willing to tolerate paying $1500/month/engineer, which seems to be roughly in line with "normal" consumption for most full-time engineers. If that number grows significantly, then I bet companies will start exploring flash models more, as you propose.
lavezzi 26 minutes ago [-]
They are willing to tolerate it now, which is quite a switch up from the free for all we had a few weeks ago, and if they aren’t able to tie in this new ~$1500p/m cap to demonstrable productivity and revenue increases then that will be kneecapped even faster
mrothroc 57 minutes ago [-]
The easy decision is to just go with the biggest SOTA model you can afford.
But this overlooks the other critical part of getting the most out of these things: the harness. I run an autonomous plan/design/code/build/test pipeline with agents using my own orchestrator. Different models are better at different stages, and I use LLMs to judge the output between them. Not everything needs Opus 4.8.
The harness provides both the scaffolding to get the right things into the model, and the right things out. But it also lets you dictate which model does which work.
It's the pipeline, not the model, that gets you quality at a given token budget.
econ 1 hours ago [-]
I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly? Perhaps, if it can measure complexity, even generate a quote?
Small models are fine for small coding tasks but I don't see why big ones can't be broken down most of the time.
AgentMasterRace 26 minutes ago [-]
Many harnesses do this, I've recently dropped all my big subscriptions for using deepseek. Codewhale (formerly deepseek-tui) will use pro for large tasks and route smaller ones to flash. It's pretty good, but I just use pro and everything as the cost is quite low.
This one does not have routing, but reasonix is insane, absolutely insane for saving money. I've used 1.3 billion tokens at a cost of $4 (99-100% cache hit).
ValentineC 1 hours ago [-]
> I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly?
This sounds like something a harness could do (and might already be doing), with work delegated to subagents running on lower-cost models.
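A minimal sketch of what that kind of delegation could look like, assuming a simple size-based rule. The tier names `flash` and `pro`, the `Task` type, and the 300-line threshold (borrowed from the "changes under 300 LOC" remark upthread) are illustrative assumptions, not any real harness's API.

```python
from dataclasses import dataclass

# Hypothetical tier names; a real harness would map these to provider model IDs.
CHEAP_MODEL = "flash"
LARGE_MODEL = "pro"


@dataclass
class Task:
    description: str
    estimated_loc: int          # rough size of the change, in lines of code
    needs_audit: bool = False   # e.g. security or bug audits


def pick_model(task: Task, loc_threshold: int = 300) -> str:
    """Send audits and large changes to the large model, the rest to the cheap one."""
    if task.needs_audit or task.estimated_loc >= loc_threshold:
        return LARGE_MODEL
    return CHEAP_MODEL


tasks = [
    Task("rename a helper and update call sites", estimated_loc=40),
    Task("rework the payment module", estimated_loc=1200),
    Task("security review of the auth flow", estimated_loc=0, needs_audit=True),
]

# Map each task description to the tier it would be routed to.
routes = {t.description: pick_model(t) for t in tasks}
```

The hard part in practice is estimating task size before doing the work; this sketch just takes the estimate as an input.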
warmwaffles 2 hours ago [-]
> Don't ask LLMs for big changes
> Review everything and point them in the right direction
Sorry upper management doesn't care. That's an engineering problem that you need to solve.
eikenberry 1 hours ago [-]
They were proposing a solution.. To use flash models and use them in a way that best amplifies your work.
AgentMasterRace 30 minutes ago [-]
He was making a joke.
CharlieDigital 7 hours ago [-]
$1500/mo is $18,000/seat/annum.
Maybe Microsoft and Nvidia are on to something.
128 GB machines that can run local LLMs are a bargain even if priced $5-8k. Yes, tok/s is not quite there, but that's probably OK since the bottleneck really isn't the code; it's WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?
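Back-of-envelope version of that comparison, as a rough sketch. It assumes the local machine replaces the entire $1500/mo spend and ignores power, maintenance, and the speed and quality gap, so treat the result as a lower bound on the payback time.

```python
MONTHLY_SPEND_USD = 1500.0

# Annual cost per seat at the monthly cap.
annual_spend = MONTHLY_SPEND_USD * 12


def breakeven_months(hardware_cost_usd: float, monthly_spend_usd: float) -> float:
    """Months of hosted spend needed to equal the one-off hardware price."""
    return hardware_cost_usd / monthly_spend_usd


# The $5k and $8k price points come from the comment above.
low = breakeven_months(5000, MONTHLY_SPEND_USD)
high = breakeven_months(8000, MONTHLY_SPEND_USD)

print(f"annual spend per seat: ${annual_spend:,.0f}")
print(f"break-even: {low:.1f} to {high:.1f} months")
```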
pqtyw 2 hours ago [-]
How is tok/s not a bottleneck? I assume most people still use AI agents interactively rather than leaving them to do their own thing during the night.
I find anything below 50 tps or so entirely unusable...
Regardless, it's apples to oranges anyway. Inference is quite cheap for open-weight models; it's just that Claude and OpenAI can charge very high margins compared to e.g. DeepSeek or various providers on OpenRouter, since open models are a commodity.
brianwawok 2 hours ago [-]
I startup 4 or so projects then go do other things for 4 hours. I don’t have enough energy to steer overnight, but I’m at least “semi afk” for daytime steering. So throughput is king for me, tokens per hour. Not latency or actual tokens per second.
smallerize 1 hours ago [-]
Running locally is even worse for this, because if you're running 4 jobs at once they just run at 1/4 speed. Not literally, you can make up some of the difference with batching, but you have limited resources instead of spreading your requests out on an API provider's nodes.
cyanydeez 56 minutes ago [-]
It's not a bottleneck if you care about the actual code.
pqtyw 43 minutes ago [-]
I would expect the overwhelming majority of output tokens would not be the actual code but used for analysis, reasoning, testing and iteration. If you only use the agent for autocomplete then yes, the calculation is probably different.
cyanydeez 35 minutes ago [-]
Yeah, and understanding that too is important. The idea that you don't need to read code or analysis seems to align with the dependency addiction being shoved in the pipe.
dgellow 57 minutes ago [-]
You’re way better to run your own on premise models. Laptops are depreciating assets, do not benefit from economy of scale, have fixed specs, result in a fragmented fleet where you need to keep models up to date. Without talking about power consumption and cooling issues. I really don’t see why companies would go that direction
bluGill 9 minutes ago [-]
You don't need to run on laptops, desktops plugged into mains power get more power consumption and better cooling. I want my laptop to work, but I can accept when I'm on an airplane at 32k feet I get less abilities.
Buttons840 2 hours ago [-]
I think companies will eventually just buy a local AI server.
Using local hardware is expensive when it's running a complicated software stack that can break in 10,000 different ways.
These eventual local AI servers will just talk some protocol for AI and sit in the corner and nobody will think about them.
I guess they still might need access to various systems, so idk. Eventually I think someone will offer "AI in a box" though, running the latest open model or whatever.
pm90 1 hours ago [-]
Yep, it's already quite easy to do so with tools like opencode/openrouter. I've used some open source models and they seem … ok? I'm not doing foundational math, just refactoring code, understanding existing code etc. I don’t see a future where companies blow 11% of employee compensation on a single tool; the hosted AI server + oss models will 99% win out.
dangus 1 hours ago [-]
I don’t think companies will do that. Why don’t they just buy local on-premise infrastructure even though it’s cheaper than AWS?
“AI in a box” sounds a heck of a lot like “the box” from the Silicon Valley TV show. Or the Google search appliance. Or name any other on-premise thing that is equally dinosauric.
The real finding of this article is that AI tokens are direct competitors with offshoring. $1,500/month buys you a whole employee in India.
And this is before AI companies inevitably increase pricing after the conclusion of the growth phase.
pm90 1 hours ago [-]
> I don’t think companies will do that. Why don’t they just buy local on-premise infrastructure even though it’s cheaper than AWS?
For customer-facing production software, it's worth paying a cloud tax to get the reliability guarantee. For tools that are used by engineers for code development, there is no need for such bulletproof guarantees.
zozbot234 5 hours ago [-]
I agree on the basic point, but running $1500/mo's worth of SOTA local AI is non-trivial already, and that's a figure for a single seat. That's equivalent to generating at least 20 tok/s on a 24/7 basis, in fact probably quite a bit more than that (because open-weight models are vastly cheaper than proprietary ones even when served from reputable Western providers - reaching the same spend would take around 100 tok/s or more, which is well within datacenter hardware territory).
You could probably reach the former figure on a prosumer platform but only for very special workloads. If you spend a lot of time on prefill (which is common for agentic workloads) the outlook is even worse since that's a significant constraint for any on-prem AI.
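To make the spend-to-throughput conversion concrete, here is a rough sketch. The two per-million-token prices are hypothetical placeholders picked for illustration, not quotes from any provider, and a 30-day month is assumed.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds in a 30-day month


def implied_tokens_per_second(monthly_spend_usd: float,
                              usd_per_million_tokens: float) -> float:
    """Sustained 24/7 generation rate that a monthly spend would buy."""
    tokens = monthly_spend_usd / usd_per_million_tokens * 1_000_000
    return tokens / SECONDS_PER_MONTH


# Hypothetical blended prices (assumptions, not real price lists).
proprietary_rate = implied_tokens_per_second(1500, usd_per_million_tokens=25.0)
open_weight_rate = implied_tokens_per_second(1500, usd_per_million_tokens=5.0)

print(f"at $25/M tokens: {proprietary_rate:.1f} tok/s around the clock")
print(f"at $5/M tokens:  {open_weight_rate:.1f} tok/s around the clock")
```

With those placeholder prices the results land near the 20 tok/s and 100 tok/s figures above; cheaper tokens mean the same budget implies a proportionally higher sustained rate.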
ssivark 1 hours ago [-]
Even if companies decided to move away from expensive models from the major labs, it's probably much more economical to pay a cloud provider to host some open weights model which could then be amortized across all (internal) users and do inference at a substantial batch size, rather than giving everyone their own hardware -- which means the company would need to provision for peak usage and inference at batch size of one.
ricardobayes 51 minutes ago [-]
128GB machines can't run anything locally that is even nearly as capable as a frontier model like Claude. We can get an idea from deepseek v4 pro being a 1.6T model, requiring approx. 860GB VRAM to run.
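A rough sketch of where a number like that comes from, counting weights only. The bit widths are assumptions (I don't know how the 860GB figure was derived), and KV cache and activation memory are ignored, so real requirements would be higher.

```python
def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Memory for model weights alone, in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9


PARAMS = 1.6e12        # the 1.6T parameter count quoted above
MACHINE_GB = 128       # the local machine size under discussion

for bits in (16, 8, 4):
    need = weight_memory_gb(PARAMS, bits)
    verdict = "fits" if need <= MACHINE_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {need:,.0f} GB ({verdict} in {MACHINE_GB} GB)")
```

Even at an assumed 4 bits per parameter the weights alone come to 800 GB, which is in the same neighborhood as the quoted figure and several times larger than a 128GB machine.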
dkdcdev 7 hours ago [-]
at their scale they could also just run a large on-premise or rented (basically still cloud, but cheaper) GPU cluster and run through that. fixed costs, even license a SOTA model’s weights if you’d like
embedding-shape 7 hours ago [-]
> even license a SOTA model’s weights if you’d like
Yeah, I bet all labs releasing SOTA models are more than happy to remove the main way they make money and let you run it locally, especially if you're a big spender like Uber who seems very willing to throw money into the sea as an experiment.
throwway120385 7 hours ago [-]
That's going to stop eventually, and I think at that point we're going to see business models more like the major CAD providers.
idiotsecant 7 hours ago [-]
I don't think they'll have a choice, open weights models are not far behind. At some point it's essentially a commodity game
dkdcdev 7 hours ago [-]
they also already do this…
Anthropic and OpenAI license to the public clouds. Google reportedly licenses to Apple. licensing to Fortune 100 companies running on their own infra is an obvious next step
it is a race to the bottom and I’m not sure the labs win that race. we’ll see!
thewebguyd 48 minutes ago [-]
I'm not sure the labs will win either. I wouldn't be surprised to see OpenAI & Anthropic just get acquired, either by Microsoft or Amazon, and their models just become another product offering in their public cloud and some hybrid on-prem offering like Azure Stack HCI or Azure Stack Hub (already basically a "cloud in a black box" that could become "AI in a box").
mrweasel 4 hours ago [-]
The problem isn't really Uber, Microsoft or Nvidia, it's all the smaller non-IT companies that also have developers on staff. They are screwed. $1500 per seat per month is just way too expensive, but they also can't afford to build and maintain their own on-premise solution. If Microsoft can't afford to run CoPilot for their own developers, what chance do any of their customers stand?
If the large, well-funded IT companies of the world believe the current AI cost is too high, then Anthropic, OpenAI and CoPilot have no actual customer base. AI is then relegated to a very profitable niche business, but that can't fund the R&D for the models.
treis 2 hours ago [-]
There's models for every price point. What was SOTA and stupid expensive to run a year ago is a cheap flash model today.
skybrian 2 hours ago [-]
It's an extra 18k a year for developer tools when they're paying how much a year per developer? Having software developers at all isn't cheap.
Also, I don't believe you need to spend $1500 a month on a coding agent if you optimize usage at all.
ecshafer 2 hours ago [-]
$18k a year is a non-starter in most companies. I've seen companies balk at IntelliJ.
mrweasel 1 hours ago [-]
That depends on where you are. $18K is the equivalent of paying around 15% more for your developer.
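For reference, a small sketch of the compensation each percentage implies. The 15% is from the comment above and the 11% is the figure quoted upthread; the function name is just for illustration.

```python
ANNUAL_TOOL_COST_USD = 18_000.0


def implied_compensation(tool_cost_usd: float, share_of_comp: float) -> float:
    """Annual compensation at which the tool cost equals the given share."""
    return tool_cost_usd / share_of_comp


at_15_percent = implied_compensation(ANNUAL_TOOL_COST_USD, 0.15)
at_11_percent = implied_compensation(ANNUAL_TOOL_COST_USD, 0.11)

print(f"15% share implies about ${at_15_percent:,.0f}/year")
print(f"11% share implies about ${at_11_percent:,.0f}/year")
```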
ricardobayes 47 minutes ago [-]
In HCOL locations yes, but in the south of Spain you can get full-time talent for that figure. It's also an entry-level salary in Eastern Europe, with Ukraine and Turkey even being somewhat cheaper.
mvdtnz 2 hours ago [-]
Why are smaller non-IT companies "screwed" because they can't pay out the nose for their developers' AI usage? They're non-IT companies, developers are presumably not on their critical path, or not their bottleneck. Developers can keep on writing code the old way, or doing it with a more reasonable AI spend. I don't see how this "screws" any company.
mrweasel 1 hours ago [-]
That was badly worded on my part; my intent was to indicate that there was no way they can or will pay $1500 per month per seat.
darkwater 7 hours ago [-]
> it's WTF did Uber build with all of that spend?
You can ask the same for the median 330k salary in the US for Uber Engineering...
And being a bit snarky, attending Uber engineers' talks here and there at a few conferences, it looks like they love to (re)invent internal tooling/platforms. That's pretty expensive on its own.
EDIT: I'm not saying that Uber's engineers didn't add value to the company, they absolutely did and handling the scale up they had to handle is not an easy feat. But I do challenge the notion of "what features did they create with that (LLM) spending?" of GP.
SlinkyOnStairs 6 hours ago [-]
> You can ask the same for the median 330k salary in the US for Uber Engineering
People DO.
It's well known that most tech companies are run incompetently. As you say, it's not the engineers' fault.
But most projects and hiring in these companies exist to juice promotion criteria. And, depending on perspective, these companies are either massively overstaffed or massively underproductive.
The comparison to AI spending being wasteful holds up pretty well, these are companies that readily piss away billions in pointless spending.
quantified 41 minutes ago [-]
Sure, but has their rate of value added increased as a result? It's a good question to ask. They added value before LLM coding, and now are more expensive than before thanks to token costs.
FergusArgyll 2 hours ago [-]
This is a very good answer but there's a flip side too.
The idea of "if you add intelligence you make more money" is contradicted by the fact that companies don't just always hire more people. Why doesn't Google just hire everyone?
CharlieDigital 7 hours ago [-]
This is what all "platform engineers" have to do once things are working nicely: you have to keep inventing work.
I don't know; I'm a Ron Popeil "set it and forget it" kind of guy. Make the dumbest, simplest thing that's going to work with some clear path for scaling. Then go do valuable things instead.
darkwater 7 hours ago [-]
But most Platform Engineering teams in smaller companies (and especially non-US) add a layer on top of existing technologies. A layer that usually maps to the specific culture and idiosyncrasies of that company; a bit like the deployment flow which is usually very specifically shaped on how a company is.
But in Uber's case, they tend to reinvent lower level pieces of platform/infra.
throwaw12 7 hours ago [-]
you don't get promotion for supporting existing things, but for "inventing" you can get promoted. also for large migrations
jvanderbot 7 hours ago [-]
Right - the future of LLMs is like ol' windows XP+Dell. Commercialized "things" you run locally offline, co-designed with hardware, with a known productivity suite, and large businesses building the next generation thing and suite with 18mo release cycles (ish).
treis 2 hours ago [-]
I don't see it. Leasing equipment and paying per seat license fees makes a lot of accounting and cash flow sense. Maybe when it gets to the point where you can run SOTA LLMs on consumer hardware. But that seems a solid decade and probably much more away.
Even then it makes more sense to rent the bigger GPU and get your answer faster.
nonethewiser 7 hours ago [-]
XP? I can see the argument for enterprise support, but in that case the latest Windows OS is going to be virtually free and I don't know if MS and Dell etc. would even support an XP machine. Might even be required for hardware. If no enterprise support, wouldn't Linux make a lot more sense?
I get that if it's offline the security downside of XP doesn't matter, and I assume XP is free, but being free doesn't really seem that valuable compared to alternatives (free Linux and a virtually free OS if buying wholesale).
jvanderbot 7 hours ago [-]
"Windows XP+Dell" should have been in quotes. It's similar to the way enterprise productivity software was developed, packaged co-designed with hardware, and sold on an 18mo upgrade cycle assumption. It's not literally windows xp.
nonethewiser 3 hours ago [-]
Oh gotcha. Yeah that's an interesting idea.
gedy 1 hours ago [-]
There's waayyyy too much money betting on that not happening, to the point I feel there'll be regulations popping up for "safety reasons" etc to ensure the big players control this.
thewebguyd 23 minutes ago [-]
3/4 of Microsoft's BUILD conference the past two days were about local AI, foundry local and Windows ML along with a big section in the keynote about running local workloads on their new hardware with Nvidia. Say what you want about Microsoft's reputation, but they are a "big player" and seem to be moving in the direction of local AI first.
ungreased0675 7 hours ago [-]
Your last question is really important. What did they accomplish with all that spend?
I suspect there’s some mass delusion with respect to actual accomplishments as a result of LLM use. Sure, things are moving faster, but does it matter?
devttyeu 7 hours ago [-]
If you believe a 128gb machine that is essentially DGX Spark in a laptop chassis can run models comparable to SOTA you either never ran open models on hard tasks, or you aren't scratching the surface of SOTA closed LLM capability in how you're using them.
f311a 7 hours ago [-]
Can you show me an example of a hard task that can't be achieved using light models? When we don't want the model to work on autopilot without reviewing the code at all. Even SOTA models will produce garbage code, if you don't guide them all the time.
Hard tasks require a lot of guidance and code reviewing, unless you are creating another throw away project where correctness, maintainability and code understanding does not matter.
empath75 1 hours ago [-]
I think probably the correct spend is something closer to 10x that if people can figure agent coordination problems out. It's not even really about capability at this point, it's about keeping track of what agents are doing.
infecto 7 hours ago [-]
I am wondering more and more if this becomes true as these smaller models take off. I might be old-fashioned, but I have yet to crack the workflows some of the hype people spout, like Claude Code's Boris, where he and others talk about running hundreds of agents overnight.
I have still found the sweet spot for me is using LLMs while I am still in the driver's seat.
CharlieDigital 3 hours ago [-]
That's because for some of these folks, the cost of the tokens doesn't have to match the value of the output; the hype from the story is all they need.
Normal people have to produce something of value from that spend. So starting 100 agents and then waking up to something cool but useless just means you spent a few thousand dollars and created nothing of value...
ofjcihen 5 hours ago [-]
Running hundreds of agents overnight is almost certainly 99 percent waste.
sourcecodeplz 7 hours ago [-]
$1.5kpm for SOTA. 128gb you run DSV4 Flash.
pqtyw 2 hours ago [-]
What's the point of running it locally though? Inference for open models is quite cheap already. They could just selfhost, anyway. The experience of running LLMs locally will be excruciatingly bad in comparison at least for the near future.
jcgrillo 7 hours ago [-]
> WTF did Uber build with all of that spend?
WTF did anyone build with all that spend? Despite all the feel-good anecdotes about how productive folks feel using ai coding tools there's a deafening silence when it comes to actual, demonstrated efficacy. How can we be this far entrenched in these workflows and still not know whether they actually do anything useful?
ftkftk 2 hours ago [-]
~70 FTE Engineering team. We are shipping more features, especially features that previously would not have survived the cut to make it on the roadmap. Even though we are shipping more, our total amount of escaped bugs has not increased, so our escape rate has actually lowered. On top of that we are able to triage and fix escaped bugs more quickly now. And then of course there has been an uptick in internal tooling that makes the rest of the company more efficient, and we have been able to address tech debt at a higher rate than before.
I don't think this would have been possible without having solid engineering culture and processes in place before bringing in ai coding tools.
And I don't want to sugarcoat it, this hasn't been easy, requires continued discipline, and took well over a year to get good at. And we still have to continuously learn, experiment and adapt our training, tooling, and processes.
CharlieDigital 26 minutes ago [-]
> We are shipping more features
That's not really the important question; the important question: is it generating revenue.
If you increase your spend -> ship more features -> no correlated increase in revenue, that's just burning money.
If a team of 10 spends 1 extra headcount ($180k/year) and ships features with no corresponding growth in revenue, what does that mean?
There was probably a reason it was on the backlog (because it didn't really have value).
ftkftk 4 minutes ago [-]
> is it generating revenue
Yes! :)
> There was probably a reason it was on the backlog (because it didn't really have value).
There are definitely things in the backlog with low value. We don't work those items, even if we could now. The additional bandwidth we have now goes to valuable features that drive revenue and retention metrics. The reason they were on the backlog were because we just didn't have the bandwidth to execute on them well and they were just somewhat less valuable than the critical path items on the roadmap.
awesan 7 hours ago [-]
I can say at least for me at a small-ish company (~40 FTE) there has been a surge in internal productivity tools. Nothing to improve the end user product directly but a lot of tools to make processes easier and less error prone.
What would previously be janky internal dashboards or excel sheets are now actually nice to use tools. That said of course the maintenance cost of all that has yet to be discovered, and the ROI is questionable.
CharlieDigital 7 hours ago [-]
About the same ~40 FTE team. We're doing the same thing. Smattering of internal tools, but no net gain in external revenue. Who knows which of those tools will have any value or ppl are just doing it because it's cool now to make fancy dashboards.
OK. I guess that's good, too.
jcgrillo 7 hours ago [-]
Yeah this seems to be a pretty widespread story, from what I've heard as well. The thing about those janky dashboards and spreadsheets though is that somebody understood them and built them with intent to solve a particular problem. Despite the rickety appearance, they're trustworthy tools. A polished single page app might look nicer but it's harder to debug than an excel sheet, and much less transparent in its internal workings--especially if nobody actually wrote it...
izacus 2 hours ago [-]
More importantly, it's questionable how much extra revenue improving a design of internal tool brings.
nonethewiser 7 hours ago [-]
The real answer?
Software engineer quality of life.
There can be an increase in productivity without a corresponding increase in total output. The gains could be captured by software engineers doing a days work in an hour then fucking off in a variety of ways.
pqtyw 2 hours ago [-]
> doing a days work in an hour then fucking off in a variety of ways
Until companies start hiring 5x fewer engineers than they did before and, well... we are clearly moving in that direction.
nonethewiser 1 hours ago [-]
Quite possibly. Doubtful it will happen all at once. If you can get 8 hours of work done in 1 they'd need to ramp up demand 8x. Would be interesting to see that happen overnight. Happy Monday. Here, take these 30 tickets.
MengerSponge 1 hours ago [-]
But that's an inefficient use of dev salary. Y'all are gonna get ground to smooth well-compensated paste.
slopinthebag 2 hours ago [-]
Yeah I think this is probably most accurate.
RugnirViking 7 hours ago [-]
IMO it's pretty clear that anyone who is taking the issue at least somewhat seriously knows the amount of value they provide is non-zero. However, the problems are manifold: firstly, toolchains vary wildly, from fancy autocomplete, to engineers chatting with codebases they're unfamiliar with, to people integrating them into devops and infra, to people doing spec driven development, with a thousand philosophies in between. Many people suspect that those above them in the ladder are on the cusp of massive failure due to losing track of the code, and many people higher on the ladder think those below them are overly cautious. I hate to be the guy saying "oh it must be somewhere in the middle", but I will say at the very least I like being able to use it to read docs for me, and to synthesize syntax and simple scripts (give me a join that works across these tables and gives me column x, y and z - give me a python script that parses a file like this example and extracts abc data - given this api spec figure out how I can get this data from this endpoint, go)
As for building actually complex software, the art of that is not in simply chaining together such scripts. It's the art of using architecture and testing to shape uncertainty, and developing requirements (and extrapolating sensibly from incomplete requirements). I don't think LLMs are great at this, but they aren't terrible either. A lot of the more active users in the space are doing stuff where they've realised they need more detailed specs, which like, yeah, we knew this already - better defined problems lead to better software.
jcgrillo 6 hours ago [-]
I agree the most interesting use cases I've heard of are about increasing the rigor of software development practices, but there's definitely a lack of coherence in methodology. I believe that some users and companies are successful in this effort, but the odd (and interesting!) thing is that so far we don't seem to know how to communicate how to do it successfully.
m3kw9 7 hours ago [-]
You can't get an edge using local models, these guys may have competitors that will spend on SOTA models. They won't likely ever consider local machines even for some offloading scenarios, the complexity and costs will be even higher.
CharlieDigital 7 hours ago [-]
Consider rewiring your perspective: getting an edge doesn't really matter; the only thing that matters is will customers pay for this? Is this a useful, valuable problem to solve?
Coding faster doesn't really solve that.
Uber makes more money if people buy more rides, order more food, have some breakthrough in autonomous driving. They can save money if they can optimize some ops or spend somewhere. Is there any evidence that with the spend on AI that they achieved any of this? If they did, I'm sure we'd hear about it in some engineering blog.
analognoise 6 hours ago [-]
18k/yr? None of the LLMs generate anything like that in value!
simonw 6 hours ago [-]
I'm definitely getting that much value out of Claude Code and Copilot.
CharlieDigital 6 hours ago [-]
You're a content creator; you define your revenue stream.
Uber engineers do not define their revenue stream; the product leadership team does.
$1500/mo of AI spend by engineers does not equate to revenue. They need to figure out revenue first before zeroing in on AI spend.
Daishiman 1 hours ago [-]
$18K a year is a fraction of the salary of a junior engineer.
Claude has allowed me to do refactors that would have taken weeks to instead take a couple of days. It has, objectively, increased the velocity of the engineering component of greenfield features by 40% in my org. You can put a number value on that and decide if it gives you favorable ROI.
jg0r3 37 minutes ago [-]
$18k a year is near half of my salary as junior verging on senior developer in the conservation field. Not everyone works in FAANG.
analognoise 32 minutes ago [-]
The point of a refactor is for you to think deeply about the code you are responsible for, so you can make it better (faster, easier to work on, more tests, whatever).
You’ve gotten a result, but without the work that made you valuable, while deskilling yourself.
It’s a lose/lose situation for…I would say anyone employed as an engineer or programmer. I’m not taking responsibility for AI output, the same way I won’t try to fix auto-generated code: because you just regenerate it.
The only person that wins here is the person who can pay you less because they don’t need you, they just need another “types computer guy”.
ofjcihen 5 hours ago [-]
Can you share some examples that you would say justify that price? Not a gotcha, I’m genuinely curious where you’re seeing a return at that level.
simonw 4 hours ago [-]
I've written tens of thousands of lines of tested, working code that I would not have written otherwise, and that code is useful to me.
I effectively get to operate at the rate of a small team of engineers - I know that because I've managed small teams of engineers in the past.
ofjcihen 3 hours ago [-]
> that I would not have written otherwise
I think this is the part I struggle with. The code I write makes me money or is a way of teaching me something, both of which are reasons that I would write the code regardless.
I don’t think I have any projects in mind that I’d be willing to spend half of a car on that I also wouldn’t have written myself.
Obviously just a personal take though. I’m glad you get the usage you want out of it.
simonw 3 hours ago [-]
My "job" is building open source software for data journalism (and anyone else who needs the tools data journalists need, which is pretty much everyone else). I can build more of those tools, and better, in exchange for a fraction of the cost it would take to hire a team to help.
geodel 12 minutes ago [-]
> A $1,500 monthly limit per tool strikes me as a rational policy response to over-spending,...
> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers.
This whole article seems to me like Multi level marketing "businesses" where 'Diamonds' have made their money by promoting MLM in seminars and telling hopefuls at bottom that "Buying AI subscription now is their one shot to be a winner in life"
Perhaps there is something to MLM vs LLM to create a FOMO effect.
newobj 2 hours ago [-]
It's also a useful signal for AI value. Looks like it's a max value add of $18,000 per engineer per year.
Anon1096 1 hours ago [-]
No, that's not what it means at all, even if just doing it purely in math terms. Really it is just a reasonable amount to cap at to stop the long tail of super spenders (tokenmaxxers). You could also call it "the amount of AI spend after which Uber has decided there are diminishing returns for the average engineer".
dandellion 23 minutes ago [-]
I'm sure if a dev can show useful results at 1k they won't have trouble getting permission for a higher cap as well.
csallen 2 hours ago [-]
It's not so simple to determine and generalize how much value AI adds. It's going to be different on a per-company basis and a per-engineer basis. It's also affected by the competitive market place and how many other companies are using AI for their engineers.
For example, what if you're a tiny startup and you're considering whether to hire an extra engineer or do all the coding yourself. I would estimate that AI is worth far more than $18,000 a year in that situation where you might reasonably decide to put off hiring an engineer.
pqtyw 2 hours ago [-]
I find it really doubtful anyone has managed to quantify that in any meaningful way. Seems like mostly an arbitrary number. Also, the article does claim that it's actually several times more than 18k if you are fine with using Codex, Cursor, etc. when your Claude tokens run out.
alasano 2 hours ago [-]
Their initial budget for determining how much value AI adds is $18,000 per engineer.
tfehring 1 hours ago [-]
Not really. There are clearly diminishing marginal returns, so it's likely that the first $2,400/engineer/year adds >>$2,400 of value, even if the 18,001st $/engineer/year adds <$1 of value.
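A toy illustration of that shape, using a logarithmic value curve. The curve and the constants `A` and `B` are made up purely to show how both statements can hold at once; nothing here is measured data.

```python
import math

# Hypothetical constants for a concave value curve (assumptions, not data).
A = 10_000.0
B = 1_000.0


def value(spend: float) -> float:
    """Total annual value from a given annual spend, under the toy curve."""
    return A * math.log(1 + spend / B)


def marginal_value(spend: float) -> float:
    """Value added by the next dollar at this spend level (derivative of value)."""
    return A / (B + spend)


first_chunk = value(2_400)               # value of the first $2,400
last_dollar = marginal_value(18_000)     # value of the dollar after $18,000
breakeven_spend = A - B                  # spend where the next dollar returns exactly $1

print(f"first $2,400 adds ${first_chunk:,.0f}")
print(f"dollar after $18,000 adds ${last_dollar:.2f}")
print(f"marginal return drops below $1 past ${breakeven_spend:,.0f}")
```

Under this toy curve a cap makes sense well below the point where total value stops growing, which is why the cap level says little about the total value added.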
eqvinox 2 hours ago [-]
It's among a wave of fresh "non-insane" takes on AI in the enterprise. Maybe we can reel things in to a sustainable level before a giant bubble bursts.
jdkdksksn 2 hours ago [-]
[dead]
jkwang 7 hours ago [-]
The $1500 number is less interesting than the fact that they hit a ceiling at all. Most engineering teams I've talked to have no idea what their AI spend is per developer because it's buried in a consolidated cloud bill. Having a hard cap forces two useful conversations: what workflows actually justify API calls vs local inference, and whether the output is being measured against any real productivity metric. Without that feedback loop it's just a race to see who can burn tokens fastest.
simonw 7 hours ago [-]
Both the Anthropic and OpenAI "Enterprise" plans include per-developer analytics:
What makes it look like one? All their dead comments read pretty normal to me.
cmiles8 30 minutes ago [-]
And $1500 a month is on the very high end of where most companies will land. When you run the numbers there isn’t a realistic path that connects the dots between likely market size and the claimed valuation of the AI companies. The math simply does not add up.
szatkus 24 minutes ago [-]
That's a lot. On my usual day I burn less than $1 on Opus. I could get beyond $10 only if I have a complex and well-defined problem, which is rare (the second part at least).
etothet 2 hours ago [-]
In my experience, this is far below the cost the average dev will incur per month so this seems very reasonable to me. And, no doubt there are exceptions for heavy users so they can get some extra token usage when they need it.
waffuldrop 2 hours ago [-]
Unless they changed something in the roughly 2 months since I left my job there (edit: besides implementing a cap for Claude Code specifically, since other tools already had caps), I'm pretty sure $1,500 is the very max you can use after maxing out free calls, the initial budget, then 2 extensions individually reviewed by your manager.
higher ups pushed for these last 2 years to be AI focused so I don't think this restriction is a measure of "don't use too much AI" as much as it is a measure of "don't use only 'manual' AI tooling" since we had a dozen more specialized tools in-house running locally or otherwise that didn't count towards the budget
galaxyLogic 2 hours ago [-]
It's probably a good thing that Uber developers are now forced to do some coding on their own. Only use AI where it absolutely helps.
sva_ 2 hours ago [-]
Or be smarter about their usage. $50 on tokens per day can get you a long way.
estomagordo 2 hours ago [-]
Some people also take weekends off.
aerhardt 23 minutes ago [-]
I don't think at $1,500 you're forced to code on your own at all, in the sense of typing code. You're simply forced to not yolo-max twelve parallel agents at all times.
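The per-day figures in this sub-thread follow from simple division. A quick sketch, where the 30-day month and the 21 or 20 working days are assumptions on my part (the thread itself only gives $1,500/month and $50/day):

```python
MONTHLY_CAP = 1_500  # $/month, from the article


def daily_budget(monthly_cap, days):
    """Spread a monthly cap evenly over the given number of days."""
    return monthly_cap / days


per_calendar_day = daily_budget(MONTHLY_CAP, 30)  # every day of the month
per_working_day = daily_budget(MONTHLY_CAP, 21)   # weekends off
per_20_day_month = daily_budget(MONTHLY_CAP, 20)  # a 20-working-day month

print(per_calendar_day, round(per_working_day, 2), per_20_day_month)
```

Taking weekends off moves the daily allowance from $50 to roughly $71, and the "about $75 per day" quoted further down corresponds to a 20-working-day month.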
pmontra 2 hours ago [-]
I wonder what they are doing with $1500 per month. I'm on Claude Pro $20 plan and I'm doing well. That's 3 days per week. On the other 2 days I'm using a customer's Claude Max, I don't know if it's the $100 or the $200 plan, but I'm sharing it with some of its other developers.
SyneRyder 47 minutes ago [-]
I'm on a $100 Claude Max plan, my usage is only about 50% of the plan limits, but in the last 30 days my usage was equivalent to API token spend of $1850. If you save all your Claude Code conversations, the saved files include API costs and you can calculate this yourself.
One of my most expensive sessions cost me over $100 in token spend in a single evening. I'd just found out that the time tracking & invoicing SaaS I use is increasing their monthly pricing by 2.4x - so I assigned Claude Opus 4.8 to recreate the entire SaaS for myself, and load in 13 years of my historical data. I've only completed a full read-only implementation so far, with adding & editing of records still to come, but I do expect Claude will have fully recreated the entire SaaS for me at an API cost less than a single 1 year seat of continued subscription to their service. And since I'm actually on a Max plan, it didn't actually cost me $200 of tokens at all.
coff i would not buy the Bending Spoons IPO coff saaspocalypse
I could ramble on about where the other $1750 of usage goes, but I imagine it's similar for most heavy Claude / AI users. Interactive coding sessions, a daily personalized podcast, some automated overnight agentic "proactive" sessions, a daemon that wakes up if I send Claude an email or voicetext to check something when I'm out. I've also noticed that if Claude's tool-use goes haywire & Claude gets confused or lost, sometimes a single email reply session that would normally be just $1 of API might spiral to $12 of API while it bangs its head against trying to run a program that's in a different folder to the one it's currently in. Sometimes a simple 'pwd' would save you a lot of headache, Claude....
hrpnk 1 hours ago [-]
$1500/mth is token pricing.
Your other plans are fixed price with rate limits where you get more tokens than the dollar equivalent you pay monthly. These plans are economical only if the majority of users spend less on tokens, in $ terms, than the plan costs. This subsidizes the gap vs. power users who spend multiple k$ monthly in API tokens.
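A rough sketch of that cross-subsidy, assuming (and this is a big assumption) that API list price equals the provider's cost of serving. The user pool and the $40 light-user figure are hypothetical; the $100 plan and $1,850 heavy-user figure echo numbers quoted elsewhere in the thread.

```python
import math


def plan_margin(plan_price, api_equivalent_usage):
    """Subscription revenue minus API-equivalent usage across all subscribers."""
    revenue = plan_price * len(api_equivalent_usage)
    return revenue - sum(api_equivalent_usage)


def light_users_needed(plan_price, light_usage, heavy_usage):
    """Light users required to cover one heavy user's overage."""
    overage = heavy_usage - plan_price
    surplus_per_light_user = plan_price - light_usage
    return math.ceil(overage / surplus_per_light_user)


# Hypothetical pool: nine light users and one power user on a $100 plan.
usage = [40] * 9 + [1_850]
margin = plan_margin(100, usage)
needed = light_users_needed(100, 40, 1_850)

print(margin, needed)
```

Under these assumptions it is the average usage, not just the share of light users, that decides whether the plan breaks even: nine out of ten users being under the plan price still leaves the pool $1,210 short.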
pmontra 14 minutes ago [-]
> Your other plans are fixed price with rate limits where you get more tokens than the dollar equivalent you pay monthly.
Or the fixed cost plans reflect the real cost and the people paying API prices give them the profit.
Anyway, none of my customers will let me bill them $1500 more (about $75 per day) because I'm using AI. And what for? I'm not working to move money from the pockets of my customers to the pockets of AI companies.
kingstnap 45 minutes ago [-]
Next to no one would be using less than the subscription price given how expensive Opus API is.
flyinglizard 1 hours ago [-]
Yea, I’m sure the personal plans are subsidized. I have $200 Claude Max at home and straight API pricing at work and equivalent work would easily cost me 5x if not more on the API.
idiliv 1 hours ago [-]
Uber is likely on an enterprise plan - these charge tokens at API cost, which can be much more expensive than the $20 flat rate.
5701652400 45 minutes ago [-]
Eventually tokens will cost the price of energy, and China is miles ahead.
China will be a major token exporter soon. Mark my words.
PessimalDecimal 7 hours ago [-]
These are still at currently subsidized prices. We'll see if they think they're getting $1500/month of value when that buys significantly fewer tokens.
square_usual 7 hours ago [-]
There is no evidence that per-token inference prices (which is what Uber is setting a cap on) are subsidized.
pier25 7 hours ago [-]
AI companies have more expenses than inference.
RugnirViking 6 hours ago [-]
Yes, and there's no evidence that they aren't using (or can't use) profitable inference to subsidise those other expenses. Some companies will keep spending massively to train better models, and some other companies will not, and will offer good API prices. Which will end up being used? That depends on whether the spending turns into better-value models.
pier25 2 hours ago [-]
> there's no evidence that they aren't using (or can't use) profitable inference to subsidise those other expenses
as far as we know there's no evidence that they can produce any profits at all
lelanthran 7 hours ago [-]
Is there any evidence that it's not?
Topfi 7 hours ago [-]
The fact that Anthropic models are offered at the same API pricing by not just themselves but AWS, Azure and Vertex, despite Anthropic taking a major slice on licensing, along with what an open-weight 1T-parameter model like K2.6 costs to run on any third-party provider, makes it unlikely that API inference costs are subsidized by the labs.
pqtyw 2 hours ago [-]
OpenRouter? I.e., even excluding DeepSeek, inference for very large open models is way cheaper. Maybe these providers are not very profitable, but it's highly unlikely that they are losing $4 for every $1 they make, since selling inference is their only product...
thejazzman 7 hours ago [-]
Yes; they ban various uses of their subscriptions but say you can do whatever if you’re paying for the API without limits
pqtyw 2 hours ago [-]
That's just market segmentation and them trying to maximize revenue; it doesn't really say anything about their costs.
simonw 7 hours ago [-]
This story isn't about those subscriptions - enterprise customers like Uber are paying the full API prices.
lelanthran 7 hours ago [-]
That's not evidence. Very likely though, but the only evidence we get one way or another is when they IPO.
pdyc 7 hours ago [-]
afaik, enterprise plans are not subsidized. It's $20/seat + API pricing. Unless you are saying API pricing itself is subsidized.
LurkandComment 7 hours ago [-]
This is market introductory pricing that hasn't factored in cost recovery. Most of it has been run on early investment with the assumption they will recover costs in the long run. The prices are subsidized across the board and they will need to go up significantly to recover them.
swiftcoder 7 hours ago [-]
Assuming this were accurate, then presumably the AI companies would be betting that inference costs come down before the bill is due - I don't see enterprises being willing to absorb another ~10x price increase for tokens (as they've just done going from subscription prices to per-token pricing)
LurkandComment 6 hours ago [-]
For Claude shops this was a huge hit. But let's back this up. There are some companies that haven't even built a break-even model at this price because they are funded by investment. As soon as those investors lose patience the first dominoes will fall. For those who have somewhat of a business model, will it survive a price increase? The bigger question is whether the base model providers have enough runway and a way to keep going while they need to recover costs.
pqtyw 2 hours ago [-]
It's mostly R&D though, not inference. If LLMs effectively become a commodity then they are screwed anyway.
swiftcoder 1 hours ago [-]
Aren’t the Chinese labs quickly turning them into a commodity?
The open-weight models will have a steady race to the bottom on inference costs just by dint of competition between providers. They aren’t at the frontier yet, but they are rapidly eating the flash market.
pqtyw 2 hours ago [-]
Yeah, that's not going to work if you can get e.g. 80% of value by using 10-20x or more cheaper open models. At some point it would just make sense for large companies to rent compute and deploy their version of DeepSeek or whatever (if they don't trust Chinese providers)
logancbrown 7 hours ago [-]
None of what you said is true
rimliu 7 hours ago [-]
And you know this how?
pqtyw 2 hours ago [-]
The inference prices for very large open models would indicate that Anthropic's and OpenAI's margins are quite large.
boringg 7 hours ago [-]
True but they will raise prices slowly so people will optimize their workflow so they aren't just throwing as much inference as fast as possible like the current state. Right now you should do everything you wanted to try out because it is cheap (as long as you don't become dependent ... the risk).
sourcecodeplz 7 hours ago [-]
I understand current Codex $20 sub is worth about $480 GPT5 api credits.
It's not. They recently forced enterprise customers onto API billing instead of the cheap consumer pricing. Now the pricing is brutal.
rasbmn 2 hours ago [-]
Uber is in the business of experimenting with robotaxis and automated food delivery.
They can't say that $0 per employee is the appropriate amount for AI spending. So they capped it, perhaps in order to "send a signal" that is eagerly picked up by the AI boosters.
There is no signal. Uber does not work any better since AI. They still want to promote AI, so they chose the highest number that doesn't bankrupt them so the press and AI promoters pick it up as the new price anchor.
Probably they'll quietly reduce the number more soon.
lazyasciiart 2 hours ago [-]
Is this inside knowledge, or speculation?
hrpnk 1 hours ago [-]
If budgeted at $1,500/month per user, power users still can get 5-10x of that allocation if the user pool is large enough.
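A back-of-the-envelope sketch of that pooling claim. It assumes the budget is shared across the pool rather than enforced as a hard per-user cap (another commenter describes $1,500 as a hard maximum), and the 100-user pool, 10 power users and $300 typical spend are hypothetical numbers.

```python
PER_USER_BUDGET = 1_500  # $/user/month, from the thread


def power_user_allocation(n_users, per_user_budget, n_power, typical_spend):
    """Budget left per power user after everyone else spends `typical_spend`."""
    pool = n_users * per_user_budget
    leftover = pool - (n_users - n_power) * typical_spend
    return leftover / n_power


# Hypothetical pool: 100 users, 10 of them power users, the rest at $300/month.
alloc = power_user_allocation(100, PER_USER_BUDGET, 10, 300)
multiple = alloc / PER_USER_BUDGET

print(alloc, multiple)
```

With these numbers each power user could draw about 8x the nominal allocation, which sits inside the 5-10x range the comment suggests.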
epsteingpt 7 hours ago [-]
Uber engineers reported that loading their workspace and pulling recent commits exhausted that AI limit for Claude Code (4.8 x-high) immediately.
wmf 2 hours ago [-]
I don't think loading up a single context window costs $1,500. Which limit are you talking about?
LurkandComment 7 hours ago [-]
1) This happened because they fundamentally misunderstand how to use AI and how AI is priced
2) Most organizations are throwing everything in for analyses and not limiting the answer they want. You need to be specific about what you analyze and what answers you want
3) People undervalue prompting or templated responses. I will have written, validated and sanity-checked a prompt several times and run it across several models before I say it's ready for use. But when it is, I know what it will give me and that the scope of its research and answer is as close to what I want as it can be. As little excess as I can. This all saves tokens
jwpapi 7 hours ago [-]
If you estimate a 10k/month salary per engineer, the $1,500 cap is 15% of salary, which marks the point where it becomes cheaper for them to hire another engineer. That doesn't mean it's improving productivity by 15%, but if 15% is the point where it stops being better than another human, can we assume 7.5%?
Probably even less, because you would also spend those 1,500 extra per employee if you just saved 10%, so 150 per employee, which is 1.5% of salary.
This is imho one of the best ranges we can assume for now how much would that be on the whole swe market?
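The figures in this comment can be reproduced as follows. This only retraces the commenter's arithmetic under their $10k salary estimate (read here as monthly, since the cap is monthly, which is my assumption); it does not validate the productivity inference itself.

```python
MONTHLY_CAP = 1_500      # $/engineer/month, from the article
MONTHLY_SALARY = 10_000  # $/engineer/month, the commenter's estimate

# Cap as a share of salary: the "15%" figure.
cap_share = MONTHLY_CAP / MONTHLY_SALARY

# Halfway point between 0% and the cap share: the "7.5%" figure.
midpoint = cap_share / 2

# A 10% saving on the capped spend, expressed as a share of salary: "1.5%".
ten_percent_of_cap = 0.10 * MONTHLY_CAP
small_share = ten_percent_of_cap / MONTHLY_SALARY

print(f"{cap_share:.1%} {midpoint:.1%} {small_share:.1%}")
```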
ilia-a 7 hours ago [-]
Seems an odd limit, especially since it is highly dependent on the token provider used. With Opus this is not much and could easily be burnt in a week or less, but with something like DeepSeek the 1500 can literally be an annual budget.
That being said, I do have to wonder why someone as big as, say, Uber doesn't simply roll out an OSS model in the cloud for their team. I'd imagine that would be the cheapest & most flexible option, while also keeping all the data shared with the LLM private.
iceman28 7 hours ago [-]
It’s not just about the model but also setting up the system to create and share compute (GPUs), which is quite complicated on its own. Uber’s primary business focus isn’t infrastructure.
cyanydeez 59 minutes ago [-]
No... the fact that you could buy a reasonably priced Mac or AMD395+: that's AI tool pricing. It loads a big enough model and spits out tokens just fast enough that you can read what it's doing and comprehend it, instead of it being magic.
That's the most useful signal. Pre OpenAI mafia RAM pricing, that comes out to $250/month.
ChrisArchitect 7 hours ago [-]
Related:
Uber’s COO says it’s getting harder to justify money spent on tokenmaxxing
They are also beholden to enterprise pricing and can't use the subsidized consumer max plans.
sremani 6 hours ago [-]
I have strong conviction that companies will now choose tech stack/programming languages based on 'tokenomics'. I am vibe coding using Clojure, a language I can read but cannot write and I never hit the usage limits even when using the latest model on Claude. I have similar experience with F#, which is a bit more verbose than clojure but absolutely beats every OOP language, Python, Typescript etc.
The reason I use F# & Clojure is that they hit the JVM and CLR, two popular enterprise stacks.
In my not so humble opinion Lisp(Clojure) still remains the language of AI.
jedisct1 7 hours ago [-]
A lot of things can be done with local models.
rimliu 7 hours ago [-]
Even more things can be done without any models just as well.
dude250711 7 hours ago [-]
Single developers seeking local models.
dmaso191 36 minutes ago [-]
[flagged]
throwaway613746 1 hours ago [-]
[dead]
Ozzie-D 52 minutes ago [-]
[flagged]
ashahin 7 hours ago [-]
[flagged]
onlyrealcuzzo 7 hours ago [-]
It's interesting to me how ineffective LLMs are at refactoring, but when you think closely about how they work, it makes sense.
They are good at searching for things that have been done 10,000 times before, and slightly changing them. This is the majority of all "new" features.
Almost nothing is "new"...
Refactors are not this. If you can't just write a gsub to do the work, they need to essentially break it up into N problems to solve, each of them pretty slow and expensive. Sure, none of these problems individually are "new" - which is why they can do it. But they can't do it as effectively as you'd think.
jbvlkt 2 hours ago [-]
Exactly my experience. I always refactor first myself then delegate boring tasks to AI. It saves me energy, time and also tokens. If code is not prepared for easy implementation agents always fail.
slopinthebag 2 hours ago [-]
LLM generated comments are against site rules btw.
hanzeweiasa 7 hours ago [-]
Good point about the unit of consumption shifting from prompts to agent loops. That makes pricing even trickier for vertical-specific AI tools.
We see this firsthand building AI Workdeck (open-source AI workspace for legal teams). A single due diligence review might chain 20+ agent calls: OCR -> text extraction -> clause classification -> risk scoring -> evidence chain assembly. The user sees one action, but the backend burns through significant inference.
The interesting thing about vertical tools is the pricing model can be fundamentally different. Horizontal tools charge per seat or per token. But in legal, the value is in the document, not the seat. A lawyer reviewing a 500-page M&A file gets way more value than one reviewing a 2-page NDA.
Self-hosting changes the calculus too. Our users run on their own infra, so the AI cost is whatever their GPU costs. That makes $1,500/month caps less relevant and throughput optimization more important.
0: https://youtu.be/wGZboZcSGDY?is=64GuKyqBh_4aSjTE
Chips do wear out and need to be replaced (entropy do be like that and durability is not a primary concern for chip design) so you'll need to refresh your stock and, even if you don't need cutting edge models, the price of all chips at scale will go up over time. It may feel unintuitive since, when the PS3 was released PS1s were extremely cheap - but if you're struggling to understand this effect from your experiences in the consumer market you're actually looking at the price factor that starts making antiques increase in value since at a certain point they become scarce goods. The market price for an NES is higher today than it was in 2003 because the price had already bottomed out from demand from the general consumer market but the demand remaining (speedrunners and the like) is now fixed or growing while the supply is inevitably shrinking.
They can't run larger modern models. They can't run smaller models as fast as newer servers. So their remaining market is applications where customers are okay with older, smaller models and slower performance.
They have to price the service lower than competitors due to the lower performance. The older GPUs are less efficient so it costs them more to keep them running. They're paid off, but they're taking up valuable power, space, and cooling in a data center.
Eventually there is a tipping point where it's better to replace that space and power budget with something new that has more demand.
The parts are sold off on the open market. There's an equilibrium demand for the parts from other data centers keeping older servers running and from hobby people who are okay with a jet engine sounding toaster of a GPU running in their home.
When you have waitlists for many many months for Blackwell GPUs, keeping the old ones around as long as customers are willing to pay for them is great.
If I as a customer have a use case for a machine learning model I developed a while ago, say an insect identification model that I had an ML researcher/eng develop back in 2019, and it runs fine on a 2018-era T4 GPU (NVidia 2080 era), why mess with it?
Despite no moving parts things broke anyway and, even if it doesn't break, the vendor can make you change the technology just by playing with maintenance cost of the older one, limiting or removing spare parts from the market.
If you build a 100MW data center with GPU compute and three years later a new data center opens with the same cost for GPUs and the same electricity cost as you, but can do twice as much compute, you quickly lose business unless the market is just so constrained customers can't afford to be picky. But the moment there's slack in the market you'll see major migrations off of providers that have the same cost but half, or a quarter, of the performance.
So when you see someone talking about GPUs fully depreciating in value in 1-3 years this is what they're talking about. Right now it's not a big deal because there's no slack in the market. But once there is, the bottom will drop out.
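The economic (as opposed to physical) depreciation described here can be sketched with a toy margin calculation. All numbers below are hypothetical; the only thing taken from the comment is the scenario of equal running costs and a newer fleet doing twice the compute.

```python
def hourly_margin(price_per_unit, units_per_hour, opex_per_hour):
    """Revenue from selling compute minus the cost of keeping it running."""
    return price_per_unit * units_per_hour - opex_per_hour


OPEX = 60.0          # $/hour to run either fleet (assumed equal, per the comment)
OLD_THROUGHPUT = 100  # units of compute per hour on the old fleet (hypothetical)
NEW_THROUGHPUT = 200  # the new fleet does twice as much at the same cost

old_price = 1.00  # $/unit the old fleet charges while supply is constrained
# With slack in the market, the new fleet can halve the price per unit and
# still earn the same revenue per hour, so the old fleet has to match it.
matched_price = old_price * OLD_THROUGHPUT / NEW_THROUGHPUT

margin_constrained = hourly_margin(old_price, OLD_THROUGHPUT, OPEX)
margin_with_slack = hourly_margin(matched_price, OLD_THROUGHPUT, OPEX)

print(margin_constrained, margin_with_slack)
```

In this toy setup the old fleet goes from a positive margin to losing money per hour without a single chip failing, which is the sense in which the hardware's value can drop to zero within a few years.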
And yeah, it does feel like GPUs will start losing value more slowly going forward, with Moore's Law having been dead for a while. It used to be that 3-5 year old GPUs were more useful as space heaters than GPUs, but that's much less the case today.
In future, we might have fixed cost GPUs but not today.
I believe they do, but I too would love to know more details because there are several ways this can happen. Electromigration, package failures, VRAM failures, dielectric breakdown... Hopefully there will be studies soon similar to that old Google paper on HDD failures!
As for duty cycles, the chips are perfectly happy at 100% operation. Cooling and power components fail, not the chips. But it costs manpower to repair such things and manpower is inconvenient these days. A GPU with any sort of fault just gets dumped.
Your main audience would be snake oil salesmen trying to prove their AI products are unbiased and not under the thumb of any outside influence. This doesn't address the biases of the model itself, but that's not your business. Your business is selling tokens and security certificates. If you can get the right angel investor, you could maybe have your new standard required for some government applications.
I wonder how often the Agent actually follows the guidance. I do see them follow it when I look. But it doesn't seem so every time.
The LLM can easily do this type of stuff, just tell it and it'll happily do it. This is exactly what I mean when I tell people they need to work closer with the AI, tell it how to do things. Don't just tell it what to do and get frustrated when it does it differently than you would.
A good way to achieve this without writing huge prompts is tell it to plan the change first. Just give it some vague low-effort directions. It'll usually get most things right, you tell it what you want different and once you're happy you tell it to go ahead.
Claude 100% of the time even thinks we use Laravel despite the project being some old Lumen codebase, so most of Laravel's features are not available. It also gets the PHP version we are using wrong 100% of the time.
I genuinely do not know how prices can get lower from the current major providers in NA without the whole market collapsing. Everyone is spending copious amounts of money to presumably make more money back.
The biggest reason large models are unattainable for local applications is the lack of hardware with large amounts of unified/graphics memory (and the cost of the platforms that do have it). Once the memory slog goes back to normal and hardware manufacturers adapt to demand, we may see consumer hardware with large memory capacity effectively opening the door for slow but usable frontier model inference (assuming improvements in model efficiency and compute capacity).
At that point, inference becomes a race to the bottom. The large labs hope they can attain a leap in capability (which is increasingly looking bleak, with an average catch-up of just a few months) or market dominance through integration (integration in platforms and OS, exclusive deals with companies or governments).
For coding agents, I suspect no player will manage to lock in enough of the market to enforce pricing much higher than the true inference cost, and catering to programmers becomes an unsustainable proposition. We will instead be further hit with a lot of AI integrated into our other tooling costs, such as GitHub, Microsoft suite, G-suite, forcing in AI functions as a value-add into the total cost without giving the option to exclude them (using their market position).
So my question remains the same: How are the players investing 100s of billions in buildout going to hope to make this back? Market capture looks bleak, inference looks like a race to the bottom. End users look like they could be beneficiaries. Where do the big boys go?
I'm not sure about OpenRouter but I wouldn't be surprised if they offer a US-based provider of DeepSeek.
For reference, Cursor has their first own light fork of Kimi that they use as their baseline coding and review model.
V3 pricing from them was right in line with what the commodity providers are charging.
Not everyone using AI is using it to code core value IP.
It really depends how you use it, if you're using prompts to generate detailed designs, breaking those into lists of tasks, and then feeding those to multiple agents - it's really easy to burn through many thousands.
If you're being more deliberate and using a few agents at a time interactively, having it review PRs/resolve issues, automated clean-ups and performance optimization, etc it could be more like $1500.
If you're just throwing it one-off questions like a better Stack Overflow, that is well under $100.
I've really gotten into /goal, if you can find something verifiable and leave it overnight - it's kinda like christmas morning to see where it landed.
Right now the AI LLM PRs we're seeing are just introducing more work for other people, while these so-called builders are looking good with their new dashboards and functionality they're demoing.
But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.
It's not built up from the ground with experience from x people taken into account. It's materialized from nothing, with no foundational separation, and barely any abstractions.
No one wants to touch it. The PRs are too large, and the 'authors' of the PRs aren't on call with us.
They get all the glory, but do none of the work.
It's kinda like designing a house and then sending it to an architect and engineer saying: make this work.
You can absolutely do this. It's even right most of the time.
Bold prediction. :)
I think anyone predicting a drop or near-term flattening is not thinking beyond the online bubbles where these tools are discussed. In a local tech meetup a lot of the normal companies are barely coming online with AI tools at their company, and even then with very low limits.
Think of people who were very strict with variable names. People who pushed for multiple-levels deep of abstractions for a single API logic that’s not going to be reused. People who believed that coding is craft, rather than just a process to get to the end during work hours. This makes most of these people’s points more-or-less moot.
I was in some of those camps, but I’ve seen coding evolve in the last 15 years. So I understand that these priors need to be updated, as most arguments don’t apply to today’s world.
No, but you do need to know the answer to respond to that 3AM page about prod being down.
I have no idea how we can get people motivated to learn these through trial-and-error when AI coding exists though. I remember the days of spending hours on stupid bugs that AI can resolve within a minute. But I recall learning heavily from those experiences. Oh well…
everyone making comparisons to the dotcom bubble seems misguided. this is clearly computing 2.0 imo
I have my concerns with current inference pricing in that there's a non-zero possibility of a rug pull in the future for the subscription plans for organizations and individuals that can still use them. For now, it's only companies larger than ~150 users that need to pay per token, but what if that wasn't the case? Not every company can afford over $1k/month/employee to give them access to AI tooling, further making it harder to compete against the behemoths. If we get to a point where an individual can no longer pay $100/month for nearly unlimited usage and instead must pay per token, that's going to be a problem.
Personal computing eventually became an equalizer (until we started centralizing on mainframes again, aka the cloud) because it got cheap. My hope is that inference also gets just as cheap, if not cheaper.
I have high hopes for local AI and open weight models and we will continue the ethos of local, personal computing and not needing to offload everything to OpenAI/Anthropic/Google, etc. to get work done once the hardware and hardware availability catch up.
Most other workers are served fine by $20-30 worth of tokens on a budget model. You don't need Opus to help support write emails.
The general thrust that everything would be online was correct, it was just that the market mistimed things and misallocated capital by a decade or more. There was massive spending on infrastructure capacity that we wouldn't end up needing until the 2010s. There were hype-driven valuations completely disconnected from business fundamentals just because a company was an 'internet' company. Things were going from cutting edge to obsolete in less than a year. There were breathless promises that this was business 2.0! Of course, none of that sounds remotely like what is going on today...
I'm optimistic about AI, but I also don't think that it is going to change everything as fast as promised.
You update it for them every 3/4 years (if they're lucky).
It probably makes a bit more sense to compare it to existing software subscriptions like Office, or the old-school 'per-seat' licenses per user for software.
Meanwhile, I couldn't be more excited for the future of automation and the empowerment of human beings.
Because there's not a single piece of evidence that this has improved the quality of the delivered software, or for that matter even the speed of features any of these companies produce, in fact if anything the opposite.
The point of software development, the hint is in the name, is to develop software, not consume tokens. If Uber was now full of 10x engineers the stock price of Uber would be up, not down on a yearly basis. Hilariously enough the only company whose stock price is up appears to be Anthropic.
Remember that utilization of these huge racks will not be 24h/7, and these are usually not GPU intensive shops that would train models on the spare compute. With prices of 100-200k USD and north with ~2 years lifetime, that would be hard to justify financially.
Self hosting could easily amount to ~1000 USD a month amortized across many developers. In rush hours - there will be hard rate limits.
Would that $1,500 - $1,000 = $500 monthly saving justify the 10% decrease in "AI Productivity"? I guess not. In most cases.
To everyone who asks me, I'd say that in the short term, unless there's a really good reason to self-host these coding assistant models, the big 2/3 coding assistant providers are the better choice.
No one got fired for licensing Claude Code.
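To make the self-hosting arithmetic in this comment concrete, here is a rough Python sketch. The 100-200k USD rack price, the ~2 year lifetime, and the $1,500 vs. $1,000 figures come from the comment. The number of developers sharing a rack and the dollar value of a developer-month (taken from the 330k median salary quoted elsewhere in the thread) are assumptions, and the function names are made up for illustration.

```python
def amortized_cost_per_dev(hardware_usd, lifetime_months, developers):
    """Spread a one-off hardware purchase over its lifetime and its users."""
    if lifetime_months <= 0 or developers <= 0:
        raise ValueError("lifetime_months and developers must be positive")
    return hardware_usd / lifetime_months / developers


def net_monthly_gain(api_cost, self_host_cost, productivity_loss, dev_month_value):
    """Monthly saving from self-hosting minus the value of lost productivity."""
    saving = api_cost - self_host_cost
    return saving - productivity_loss * dev_month_value


# Assumed: a 120k USD rack with a 2-year lifetime, shared by 5 developers.
self_host = amortized_cost_per_dev(120_000, 24, 5)

# Assumed: a developer-month is worth 330k / 12 USD; 10% loss as in the comment.
gain = net_monthly_gain(1_500, self_host, 0.10, 330_000 / 12)

print(f"self-hosting: ${self_host:,.0f}/dev/month, net gain: ${gain:,.0f}/month")
```

Under these assumptions the $500 saving is much smaller than the value of a 10% productivity loss, which is the comment's point. Change the assumed inputs and the sign can flip.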
1) Don't ask LLMs for big changes
2) Review everything and point them in the right direction
Large models still suck at big changes: they produce questionable architecture, and you still have to review the code if your project is serious enough.
The codebase quickly becomes a mess if you don't pay enough attention. Does not matter which model.
So why bother with big models, when flash models are 10x cheaper and much faster to iterate under guidance? Large models can be used for security and bug audits. Flash models work almost the same for changes under 300 LOC when you dictate how you want your code to look.
But this overlooks the other critical part of getting the most out of these things: the harness. I run an autonomous plan/design/code/build/test pipeline with agents using my own orchestrator. Different models are better at different stages, and I use LLMs to judge the output between them. Not everything needs Opus 4.8.
The harness provides both the scaffolding to get the right things into the model, and the right things out. But it also lets you dictate which model does which work.
It's the pipeline, not the model, that gets you quality at a given token budget.
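A minimal sketch of what such a harness could look like, in Python. This is not the commenter's orchestrator. The stage names follow the comment, but the tier assigned to each stage, the escalate-on-rejection rule, and the `call_model` / `judge` callables are all hypothetical stand-ins for whatever client and LLM judge you actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    model: str  # hypothetical tier label, not a real model ID


# Stage names follow the comment; the tier chosen for each one is an assumption.
PIPELINE = [
    Stage("plan", "large"),
    Stage("design", "large"),
    Stage("code", "flash"),
    Stage("build", "flash"),
    Stage("test", "flash"),
]


def run_pipeline(task, call_model, judge, stages=PIPELINE,
                 max_retries=1, fallback="large"):
    """Run each stage on its assigned model; escalate when the judge rejects.

    call_model(model, stage_name, artifact) -> new artifact (caller-supplied)
    judge(stage_name, artifact) -> bool (caller-supplied, e.g. an LLM judge)
    """
    artifact = task
    log = []  # (stage, model) per call, useful for cost accounting
    for stage in stages:
        model = stage.model
        for _ in range(max_retries + 1):
            candidate = call_model(model, stage.name, artifact)
            log.append((stage.name, model))
            if judge(stage.name, candidate):
                break
            model = fallback  # assumed policy: retry on the bigger model
        else:
            raise RuntimeError(f"stage {stage.name!r} failed after retries")
        artifact = candidate
    return artifact, log
```

The `log` is what lets you see which stages actually needed the expensive model, and so where the token budget goes.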
Small models are fine for small coding tasks but I don't see why big ones can't be broken down most of the time.
This one does not have routing, but reasonix is insane, absolutely insane for saving money. I've used 1.3 billion tokens at a cost of $4. (99-100% cache hit)
This sounds like something a harness could do (and might already be doing), with work delegated to subagents running on lower-cost models.
> Review everything and point them in the right direction
Sorry upper management doesn't care. That's an engineering problem that you need to solve.
Maybe Microsoft and Nvidia are on to something.
128 GB machines that can run local LLMs are a bargain even if priced $5-8k. Yes, tok/s is not quite there, but that's probably OK since the bottleneck really isn't the code; it's WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?
I find anything below 50 tps or so entirely unusable...
Regardless, it's apples to oranges anyway. Inference is quite cheap for open-weight models; it's just that Claude and OpenAI can charge very high margins compared to e.g. DeepSeek or the various providers on OpenRouter, since open models are a commodity.
Using local hardware is expensive when it's running a complicated software stack that can break in 10,000 different ways.
These eventual local AI servers will just talk some protocol for AI and sit in the corner and nobody will think about them.
I guess they still might need access to various systems, so idk. Eventually I think someone will offer "AI in a box" though, running the latest open model or whatever.
“AI in a box” sounds a heck of a lot like “the box” from the Silicon Valley TV show. Or the Google search appliance. Or name any other on-premise thing that is equally dinosauric.
The real finding of this article is that AI tokens are direct competitors with offshoring. $1,500/month buys you a whole employee in India.
And this is before AI companies inevitably increase pricing after the conclusion of the growth phase.
For customer-facing production software, it's worth paying a cloud tax to get the reliability guarantee. For tools that are used by engineers for code development, there is no need for such bulletproof guarantees.
You could probably reach the former figure on a prosumer platform but only for very special workloads. If you spend a lot of time on prefill (which is common for agentic workloads) the outlook is even worse since that's a significant constraint for any on-prem AI.
Yeah, I bet all labs releasing SOTA models are more than happy to remove the main way they make money and let you run it locally, especially if you're a big spender like Uber who seems very willing to throw money into the sea as an experiment.
Anthropic and OpenAI license to the public clouds. Google reportedly licenses to Apple. licensing to Fortune 100 companies running on their own infra is an obvious next step
it is a race to the bottom and I’m not sure the labs win that race. we’ll see!
If the large, well-funded IT companies of the world believe the current AI cost is too high, then Anthropic, OpenAI and Copilot have no actual customer base. AI is then relegated to a very profitable niche business, but that can't fund the R&D for the models.
Also, I don't believe you need to spend $1500 a month on a coding agent if you optimize usage at all.
You can ask the same about the median 330k salary in the US for Uber Engineering... and being a bit snarky, having attended Uber engineers' talks here and there at a few conferences, it looks like they love to (re)invent internal tooling/platforms. That's pretty expensive on its own.
EDIT: I'm not saying that Uber's engineers didn't add value to the company, they absolutely did and handling the scale up they had to handle is not an easy feat. But I do challenge the notion of "what features did they create with that (LLM) spending?" of GP.
People DO.
It's well known that most tech companies are run incompetently. As you say, it's not the engineers' fault.
But most projects and hiring in these companies exist to juice promotion criteria. And, depending on perspective, these companies are either massively overstaffed or massively underproductive.
The comparison to AI spending being wasteful holds up pretty well, these are companies that readily piss away billions in pointless spending.
The idea of "if you add intelligence you make more money" is contradicted by the fact that companies don't just always hire more people. Why doesn't Google just hire everyone?
I don't know; I'm a Ron Popeil "set it and forget it" kind of guy. Make the dumbest, simplest thing that's going to work with some clear path for scaling. Then go do valuable things instead.
But in Uber's case, they tend to reinvent lower level pieces of platform/infra.
Even then it makes more sense to rent the bigger GPU and get your answer faster.
I get that if it's offline the security downside of XP doesn't matter, and I assume XP is free, but being free doesn't really seem that valuable compared to the alternatives (free Linux, and a virtually free OS if buying wholesale).
I suspect there’s some mass delusion with respect to actual accomplishments as a result of LLM use. Sure, things are moving faster, but does it matter?
Hard tasks require a lot of guidance and code reviewing, unless you are creating another throwaway project where correctness, maintainability and code understanding do not matter.
I have still found the sweet spot for me is using LLMs while I stay in the driver's seat.
Normal people have to produce something of value from that spend. So starting 100 agents and then waking up to something cool but useless just means you spent a few thousand dollars and created nothing of value...
WTF did anyone build with all that spend? Despite all the feel-good anecdotes about how productive folks feel using ai coding tools there's a deafening silence when it comes to actual, demonstrated efficacy. How can we be this far entrenched in these workflows and still not know whether they actually do anything useful?
I don't think this would have been possible without having solid engineering culture and processes in place before bringing in ai coding tools.
And I don't want to sugarcoat it, this hasn't been easy, requires continued discipline, and took well over a year to get good at. And we still have to continuously learn, experiment and adapt our training, tooling, and processes.
If you increase your spend -> ship more features -> no correlated increase in revenue, that's just burning money.
If a team of 10 spends 1 extra headcount ($180k/year) and ships features with no corresponding growth in revenue, what does that mean?
There was probably a reason it was on the backlog (because it didn't really have value).
Yes! :)
> There was probably a reason it was on the backlog (because it didn't really have value).
There are definitely things in the backlog with low value. We don't work those items, even if we could now. The additional bandwidth we have now goes to valuable features that drive revenue and retention metrics. The reason they were on the backlog was that we just didn't have the bandwidth to execute on them well, and they were somewhat less valuable than the critical path items on the roadmap.
What would previously have been janky internal dashboards or Excel sheets are now tools that are actually nice to use. That said, of course, the maintenance cost of all that has yet to be discovered, and the ROI is questionable.
OK. I guess that's good, too.
Software engineer quality of life.
There can be an increase in productivity without a corresponding increase in total output. The gains could be captured by software engineers doing a day's work in an hour then fucking off in a variety of ways.
Until companies start hiring 5x fewer engineers than they did before and, well... we are clearly moving in that direction.
As for building actually complex software, the art of that is not in simply chaining together such scripts. It's the art of using architecture and testing to shape uncertainty, and developing requirements (and extrapolating sensibly from incomplete requirements). I don't think LLMs are great at this, but they aren't terrible either. A lot of the more active users in the space are doing stuff where they've realised they need more detailed specs, which, like, yeah, we knew this already: better defined problems lead to better software.
Coding faster doesn't really solve that.
Uber makes more money if people buy more rides, order more food, have some breakthrough in autonomous driving. They can save money if they can optimize some ops or spend somewhere. Is there any evidence that with the spend on AI that they achieved any of this? If they did, I'm sure we'd hear about it in some engineering blog.
Uber engineers do not define their revenue stream; the product leadership team does.
$1500/mo of AI spend by engineers does not equate to revenue. They need to figure out revenue first before zeroing in on AI spend.
Claude has allowed me to do refactors that would have taken weeks to instead take a couple of days. It has, objectively, increased the velocity of the engineering component of greenfield features by 40% in my org. You can put a number value on that and decide if it gives you favorable ROI.
You’ve gotten a result, but without the work that made you valuable, while deskilling yourself.
It’s a lose/lose situation for…I would say anyone employed as an engineer or programmer. I’m not taking responsibility for AI output, the same way I won’t try to fix auto-generated code: because you just regenerate it.
The only person that wins here is the person who can pay you less because they don’t need you, they just need another “types computer guy”.
I effectively get to operate at the rate of a small team of engineers - I know that because I've managed small teams of engineers in the past.
I think this is the part I struggle with. The code I write makes me money or is a way of teaching me something, both of which are reasons that I would write the code regardless.
I don’t think I have any projects in mind that I’d be willing to spend half of a car on that I also wouldn’t have written myself.
Obviously just a personal take though. I’m glad you get the usage you want out of it.
> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers.
This whole article reads to me like a multi-level marketing "business", where the 'Diamonds' have made their money by promoting the MLM in seminars and telling the hopefuls at the bottom that "buying an AI subscription now is your one shot to be a winner in life".
Perhaps there is something to MLM vs LLM to create a FOMO effect.
For example, what if you're a tiny startup and you're considering whether to hire an extra engineer or do all the coding yourself. I would estimate that AI is worth far more than $18,000 a year in that situation where you might reasonably decide to put off hiring an engineer.
Anthropic: https://support.claude.com/en/articles/12883420-view-usage-a...
OpenAI: https://help.openai.com/en/articles/10875114-workspace-analy...
higher ups pushed for these last 2 years to be AI focused so I don't think this restriction is a measure of "don't use too much AI" as much as it is a measure of "don't use only 'manual' AI tooling" since we had a dozen more specialized tools in-house running locally or otherwise that didn't count towards the budget
One of my most expensive sessions cost me over $100 in token spend in a single evening. I'd just found out that the time tracking & invoicing SaaS I use is increasing their monthly pricing by 2.4x - so I assigned Claude Opus 4.8 to recreate the entire SaaS for myself, and load in 13 years of my historical data. I've only completed a full read-only implementation so far, with adding & editing of records still to come, but I do expect Claude will have fully recreated the entire SaaS for me at an API cost less than a single 1 year seat of continued subscription to their service. And since I'm actually on a Max plan, it didn't actually cost me $200 of tokens at all.
coff i would not buy the Bending Spoons IPO coff saaspocalypse
I could ramble on about where the other $1750 of usage goes, but I imagine it's similar for most heavy Claude / AI users. Interactive coding sessions, a daily personalized podcast, some automated overnight agentic "proactive" sessions, a daemon that wakes up if I send Claude an email or voicetext to check something when I'm out. I've also noticed that if Claude's tool-use goes haywire & Claude gets confused or lost, sometimes a single email reply session that would normally be just $1 of API might spiral to $12 of API while it bangs its head against trying to run a program that's in a different folder to the one it's currently in. Sometimes a simple 'pwd' would save you a lot of headache, Claude....
Your other plans are fixed price with rate limits, where you get more tokens than the dollar equivalent you pay monthly. These plans are economical only if the majority of users spend less in tokens, in dollar terms, than the plan costs. That subsidizes the gap for power users who spend multiple thousands of dollars monthly in API tokens.
Or the fixed cost plans reflect the real cost and the people paying API prices give them the profit.
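Both readings above can be put into one small Python sketch. The $100 plan price and the $1,000 power user echo figures quoted in the thread. The rest of the cohort is made up, and `cost_ratio` (the real serving cost as a fraction of the API list price) is an assumption nobody outside the providers knows.

```python
def plan_margin(api_equivalent_usage, plan_price, cost_ratio=1.0):
    """Provider margin on a flat-rate plan for one month.

    api_equivalent_usage: each subscriber's usage priced at API rates (USD).
    cost_ratio: assumed serving cost as a fraction of the API price.
    """
    revenue = plan_price * len(api_equivalent_usage)
    cost = cost_ratio * sum(api_equivalent_usage)
    return revenue - cost


# Hypothetical cohort on a $100 plan: four light users and one power user.
cohort = [20, 30, 40, 50, 1_000]

# First reading: API price equals true cost, so the plan is subsidized.
subsidized = plan_margin(cohort, 100)

# Second reading: API price carries a markup (0.4 is a made-up ratio).
marked_up = plan_margin(cohort, 100, cost_ratio=0.4)

print(subsidized, marked_up)
```

In this made-up cohort most users stay under the plan price, yet the plan still loses money when `cost_ratio` is 1.0, so it is the average usage, not the majority, that decides the margin. With a lower `cost_ratio` the same cohort is profitable.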
Anyway, none of my customers will let me bill them $1500 more (about $75 per day) because I'm using AI. And what for? I'm not working to move money from the pockets of my customers to the pockets of AI companies.
China will be a major token exporter soon. Mark my words.
as far as we know there's no evidence that they can produce any profits at all
The open-weight models will have a steady race to the bottom on inference costs just by dint of competition between providers. They aren’t at the frontier yet, but they are rapidly eating the flash market.
They can't say that $0 per employee is the appropriate amount for AI spending. So they capped it, perhaps in order to "send a signal" that is eagerly picked up by the AI boosters.
There is no signal. Uber does not work any better since AI. They still want to promote AI, so they chose the highest number that doesn't bankrupt them so the press and AI promoters pick it up as the new price anchor.
Probably they'll quietly reduce the number more soon.
Probably even less because you would spend those 1500 extra per employee also if you just save 10% so 150 per employee that’s 1.5% on salary.
This is imho one of the best ranges we can assume for now. How much would that be across the whole SWE market?
That being said, I do have to wonder why someone as big as, say, Uber doesn't simply roll out an OSS model in the cloud for their team. I'd imagine that would be the cheapest and most flexible option, while also keeping all the data shared with the LLM private.
That's the most useful signal. Pre OpenAI mafia RAM pricing, that comes out to $250/month.
Uber’s COO says it’s getting harder to justify money spent on tokenmaxxing
https://news.ycombinator.com/item?id=48268871
Uber torches 2026 AI budget on Claude Code in four months
https://news.ycombinator.com/item?id=47976415
Corporate America Is Starting to Ration AI as Cost Skyrockets
https://news.ycombinator.com/item?id=48335388
The reason I use F# & Clojure is that they target the CLR and the JVM, two popular enterprise stacks.
In my not so humble opinion, Lisp (Clojure) still remains the language of AI.
They are good at searching for things that have been done 10,000 times before, and slightly changing them. This is the majority of all "new" features.
Almost nothing is "new"...
Refactors are not this. If you can't just write a gsub to do the work, they need to essentially break it up into N problems to solve, each of them pretty slow and expensive. Sure, none of these problems individually are "new" - which is why they can do it. But they can't do it as effectively as you'd think.
We see this firsthand building AI Workdeck (open-source AI workspace for legal teams). A single due diligence review might chain 20+ agent calls: OCR -> text extraction -> clause classification -> risk scoring -> evidence chain assembly. The user sees one action, but the backend burns through significant inference.
The interesting thing about vertical tools is the pricing model can be fundamentally different. Horizontal tools charge per seat or per token. But in legal, the value is in the document, not the seat. A lawyer reviewing a 500-page M&A file gets way more value than one reviewing a 2-page NDA.
Self-hosting changes the calculus too. Our users run on their own infra, so the AI cost is whatever their GPU costs. That makes $1,500/month caps less relevant and throughput optimization more important.
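A rough Python sketch of how the call fan-out and per-document pricing described above might be estimated. The stage names, the 500-page file, and the 2-page NDA come from the comment. The assumption that every stage runs once per fixed-size chunk of pages, the chunk size, and the price per call are all hypothetical and not how AI Workdeck is known to work.

```python
from math import ceil

# Stage names as listed in the comment.
STAGES = (
    "ocr",
    "text_extraction",
    "clause_classification",
    "risk_scoring",
    "evidence_chain_assembly",
)


def agent_calls(pages, pages_per_call=25, stages=STAGES):
    """Estimated agent calls for one review.

    Assumes each stage runs once per chunk of `pages_per_call` pages.
    """
    if pages <= 0 or pages_per_call <= 0:
        raise ValueError("pages and pages_per_call must be positive")
    chunks = ceil(pages / pages_per_call)
    return chunks * len(stages)


def per_document_price(pages, price_per_call):
    """Price a review by the work the document causes, not by seat."""
    return agent_calls(pages) * price_per_call


ma_file_calls = agent_calls(500)  # the 500-page M&A file
nda_calls = agent_calls(2)        # the 2-page NDA
print(ma_file_calls, nda_calls)
```

Under these assumptions the large file triggers 20 times as many calls as the NDA, which is one way to see why pricing per document fits better than pricing per seat.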