I got an Nvidia GH200 server for €7.5k on Reddit and converted it to a desktop

365
106
dnhkng
2 days ago
dnhkng.github.io

dnhkng
·
2 days ago
·
[ - ]

This is the story of how I bought enterprise-grade AI hardware designed for liquid-cooled server racks that was converted to air cooling, and then back again, survived multiple near-disasters (including GPUs reporting temperatures of 16 million degrees), and ended up with a desktop that can run 235B parameter models at home. It’s a tale of questionable decisions, creative problem-solving, and what happens when you try to turn datacenter equipment into a daily driver.

amirhirsch
·
2 days ago
·
[ - ]

> # Tell the driver to completely ignore the NVLINK and it should allow the GPUs to initialise independently over PCIe !!!! This took a week of work to find, thanks Reddit!

I needed this info, thanks for putting it up. Can this really be an issue for every data center?

Tinyyy
·
2 days ago
·
[ - ]

Doesn’t this prevent the GPUs from talking to each other over the high speed link?

dnhkng
·
2 days ago
·
[ - ]

I'll find out soon, but without this hack, the GPUs are non-functional.

ipsum2
·
2 days ago
·
[ - ]

I saw the same post on Reddit and was so tempted to purchase it, but I live in the US. Cool to see it wasn't a scam!

GPTshop
·
1 day ago
·
[ - ]

We can get around tariffs, if that is your concern.

ipsum2
·
1 day ago
·
[ - ]

Honestly I wasn't going to drop ~10k USD on an unknown seller that was from another country.

GPTshop
·
1 day ago
·
[ - ]

There is always some risk in business and life itself...

pointbob
·
2 days ago
·
[ - ]

Loved it. You are MacGyver. You should post more stuff on Twitter. Thanks for the story.

dnhkng
·
2 days ago
·
[ - ]

lol, I tried posting stuff on Twitter, but never got any traction. This might be too nerdy for that crowd?

DANmode
·
1 day ago
·
[ - ]

Mastodon and Bluesky would welcome you.

Hackaday would probably welcome you.

dauertewigkeit
·
2 days ago
·
[ - ]

It's a very interesting read, but a lot is not clear.

How does the seller get these desktops directly from NVIDIA?

And if the seller's business is custom made desktop boxes, why didn't he just fit the two H100s into a better desktop box?

dnhkng
·
2 days ago
·
[ - ]

These are on a custom board from Nvidia, so it's not possible to separate them. I think the seller usually gets H100s and puts them into a custom case, with a PCIe adapter to the server GPUs.

This thing is too unwieldy to make into a desktop (you can see how much effort it took), and was in pretty bad condition. I think he just wanted to get rid of it without having to deal with returns. I took a bet on it, and was lucky it paid out.

Ntrails
·
2 days ago
·
[ - ]

> why didn't he just fit the two H100s into a better desktop box?

I expect because they were no longer in the sort of condition to sell as new machines? They were clearly well used and selling "as seen" is the lowest reputational risk associated with offload

wtallis
·
2 days ago
·
[ - ]

There also weren't H100s available to scavenge. GH200 puts the Grace CPU and H100 GPU on a big module with a custom form factor and connectors, so the only viable route for using those GPUs was to keep all the electronics together and build a suitable case and cooling system around them. There wasn't any way to adapt any of this for use in an ordinary EATX case or with a different CPU, because the GPUs weren't PCIe add-in cards.

renewiltord
·
2 days ago
·
[ - ]

At that pricing I honestly thought they fell off a truck. Even well used H100 go for more than that entire system. In the US an RTX A6000 Ada is already close in price.

GPTshop
·
1 day ago
·
[ - ]

We build these desktops from Nvidia servers we buy from reputable manufacturers like Pegatron, Gigabyte, Asrock Rack, and many more.

H100 PCIe and GH200 are two very different things. The advantages of Grace Hopper are much higher connection speeds, more bandwidth and lower power consumption.

baud147258
·
1 day ago
·
[ - ]

When you said you paid cash, you paid all ~7.5k€ in paper money? How do you get that much cash out of your bank?

devilbunny
·
1 day ago
·
[ - ]

Presumably by going there, showing your ID, and withdrawing it? They might make you wait a day to have that much on hand, but not more than that.

leipert
·
1 day ago
·
[ - ]

We are talking Germany here. People buy cars in cash. You don’t even have to necessarily wait a day.

devilbunny
·
1 day ago
·
[ - ]

True, which is why I said “might”. Even in the US. I only have to call ahead if I want smaller bills - $20 and $100 they usually have plenty of unless it’s a tiny branch.

baud147258
·
15 hours ago
·
[ - ]

ok, I guess. I just never handled that much. I would also have thought that the bank might ask questions

devilbunny
·
13 hours ago
·
[ - ]

Cash deposits or withdrawals over $10k in the US will be reported to the Treasury but if you don’t do them often it won’t raise a big flag, and the bank doesn’t care what you do with it. Treasury only cares if they think you are trying to evade taxes.

In Germany, where large items are often purchased with cash, it would be unremarkable if you did it several times a year.

jerome-jh
·
1 day ago
·
[ - ]

Securing soldered components with epoxy? You have to be very confident in your soldering :) You had no hot glue?

dnhkng
·
1 day ago
·
[ - ]

I've had a bit of practice, but I don't have the right gear for this level of soldering. It took maybe an hour to solder in 2 components, after many failed attempts. Persistence beats intelligence?

ProAm
·
2 days ago
·
[ - ]

Which is how you learn to become an expert. I love it

Fire-Dragon-DoL
·
2 days ago
·
[ - ]

Did it behave like a star at 16 million degrees? Lol

Helmut10001
·
2 days ago
·
[ - ]

I recently had a similar experience, although not this size.

Pre-story: For 3 years I wanted to build a rack gaming server, so I can play with my son in our small apartment where we don't have enough space for a gaming computer (wife also doesn't allow it). I have a stable IPsec connection to my parents' house, where I have a powerful PV plant (90kWp) and a rack server, for my freelance job.

Fast forward to 2 months ago, I see a Supermicro SYS-7049GP-TRT for 1400€ on eBay. It looks clean, sold by some IT reuse warehouse. No description, just 3 photos and the case label. I ask the seller whether he knows what's in it and he says he didn't check. The case alone comes new at 3k here in Germany. I buy it.

It arrives. 64GB ECC memory, 2x Xeon Silver, 1x 500GB SSD, 5x GBit LAN cards. Dual 2200 Watt power supply. I remove the air shroud, and: an Nvidia V100S 32GB emerges. I sell the card on eBay for 1600€ and buy 2x Xeon 6254 CPUs (100€ each) to replace the 2x Silver ones that are in it. Last week, I bought two Blackwell RTX 4000 Pro for 1100€ each. Enough for gaming with my son! (and I can have some fun with LLMs and home assistant/smart home..)

The case fits 4x dual-size GPUs, so I could fit 4x RTX 6000 in it (384GB VRAM). At a price of 3k, this would come at 12k (still too much for me.. but let's check back in a couple of years..).

Buying used enterprise gear is fun. I had so many good experiences and this stuff is just rock solid.

systemtest
·
2 days ago
·
[ - ]

Love how a €7.5k 20 kilogram server is placed on a €5 particleboard table. I have owned several LACKs but would never put anything valuable on it. IKEA rates them at 25 kilogram maximum load.

dnhkng
·
2 days ago
·
[ - ]

Oh no, that's not right. 20 kg was in the original server case. With the aluminium frames and glass panel, it's more like 40 kg now... Shit, maybe I should take it off the Lack table...

Ao7bei3s
·
2 days ago
·
[ - ]

LACK tables specifically are well proven to be quite sturdy actually. They happen to be just the right width for servers / network devices, and so people have used them for that purpose for ages. Search for "LACK rack", or see e.g. https://wiki.eth0.nl/index.php/LackRack. 20kg is nothing; I've personally put >100kg on top.

rtkwe
·
2 days ago
·
[ - ]

They're a bit less usable that way now. The legs are basically completely hollow these days, so the screws can't bear much weight. The only option is stacking the items so the weight is borne by whatever surface is below the "rack", at which point you could just as easily call stacking the equipment an air rack (or an iLackaRack maybe /s).

ivanjermakov
·
2 days ago
·
[ - ]

Whole 25% safety margin!

rtkwe
·
2 days ago
·
[ - ]

Well, to be fair, their quoted rating has its own built-in margin. So you're already stacking safety margins.

n3t
·
2 days ago
·
[ - ]

Kind of similar to error bars on error bars → https://xkcd.com/2110/

jauntywundrkind
·
2 days ago
·
[ - ]

What an incredible barn-find type story. Incredible. And you are among very few buyers who could have so lovingly done such an incredible job debugging driver & motherboard issues. Please add a kitsch Serial Experiment Lain themed computing shrine around this incredible work, and all's done.

> 4x Arctic Liquid Freezer III 420 (B-Ware) - €180

Quite aside, but man: I fricking love Arctic. Seeing their fans in the new Corsi-Rosenthal boxes has been awesome. Such good value. I've been using a Liquid Freezer II after nearly buying my last air-cooled heat-sink & seeing the LF-II on sale for <$75. Buy.

Please give us some power consumption figures! I'm so curious how it scales up and down. Do different models take similar or different power? Asking a lot, but it'd be so neat to see a somewhat high res view (>1 sample/s) of power consumption (watts) on these things, such a unique opportunity.

Tenemo
·
2 days ago
·
[ - ]

Huge fan of those AIOs as well! I have LFIII 420mm in my PC and I've successfully built a 10x10cm cloud chamber with another one which is really pushing it as far as it can go.

djoldman
·
2 days ago
·
[ - ]

> Getting the actual GPU working was also painful, so I’ll leave the details here for future adventurers:

> # Data Center/HGX-Series/HGX H100/Linux aarch64/12.8 seem to work! wget https://us.download.nvidia.com/tesla/570.195.03/NVIDIA-Linux...

> ...

Nothing makes you feel more "I've been there" than typing inscrutable arcana to get a GPU working for ML work...

crapple8430
·
2 days ago
·
[ - ]

While this is undoubtedly still an excellent deal, the comparison to the new price of H100 is a bit misleading, since today you can buy a new, legit RTX 6000 Pro for about $7-8k, and get similar performance on the first two of the models tested at least. As a bonus those can fit in a regular workstation or server, and you can buy multiple. This thing is not worth $80k in the same way that any old enterprise equipment is not worth nearly as much as its price when it was new.

dnhkng
·
2 days ago
·
[ - ]

Fair points, but the deal is still great because of the nuances of the RAM/VRAM.

The Blackwells are superior on paper, but there's some "Nvidia Math" involved: when they report performance in press announcements, they don't usually mention the precision. Yes, the Blackwells are more than double the speed of the Hopper H100s, but that's comparing FP8 to FP4 (the H100s can't do native FP4). Yes, that's great for certain workloads, but not the majority.

What's more interesting is the VRAM speed. The 6000 Pro has 96 GB of GPU memory and 1.8 TB/s bandwidth; the H100 has the same amount, but with HBM3 at 4.9 TB/s. That 2.5X increase is very influential in the overall performance of the system.

Lastly, if it works, the NVLink-C2C does 900 GB/s of bandwidth between the cards, so about 5x what a pair of 6000 Pros could do over PCIE5. Big LLMs need well over the 96 GB on a single card, so this becomes the bottleneck.

e.g. Here are benchmarks on the RTX 6000 pro using the GPT-OSS-120B model, where it generates 145 tokens/sec, and I get 195 tokens/sec on the GH200. https://www.reddit.com/r/LocalLLaMA/comments/1mm7azs/openai_...
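A quick sanity check on the figures quoted in this comment, as a minimal sketch. It only uses the numbers stated above (nothing is measured here), and the variable names are made up for illustration.

```python
# Figures exactly as quoted in the comment above; not independently verified.
RTX_6000_PRO_BW_TBS = 1.8   # GPU memory bandwidth, TB/s
GH200_HBM3_BW_TBS = 4.9     # HBM3 bandwidth, TB/s
RTX_6000_PRO_TOK_S = 145    # GPT-OSS-120B generation speed, tokens/sec
GH200_TOK_S = 195           # GPT-OSS-120B generation speed, tokens/sec

# Ratio of raw memory bandwidth vs. ratio of observed token throughput.
bandwidth_ratio = GH200_HBM3_BW_TBS / RTX_6000_PRO_BW_TBS
throughput_ratio = GH200_TOK_S / RTX_6000_PRO_TOK_S

print(f"memory bandwidth ratio: {bandwidth_ratio:.2f}x")
print(f"token throughput ratio: {throughput_ratio:.2f}x")
```

With these inputs the bandwidth ratio works out nearer 2.7x than 2.5x, while the quoted throughput gain is about 1.3x, so bandwidth alone does not predict the benchmark difference.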

crapple8430
·
1 day ago
·
[ - ]

The perf delta is smaller than I thought it'd be given the memory bandwidth difference. I guess it likely comes from the Blackwell having native MXFP4, since GPT-OSS-120b has MXFP4 MoE layers.

The NVLink is definitely a strong point, I missed that detail. For LLM inference specifically it matters fairly little iirc, but for training it might.

GPTshop
·
9 hours ago
·
[ - ]

GH200 has HBM3 memory. You cannot compare this to a RTX Pro 6000...

segmondy
·
2 days ago
·
[ - ]

you do realize he has 2 H100s, you would need to buy 2 RTX 6000 Pro for $15-$16k plus the hardware. The ram that came with that hardware is worth more than $7000 now.

Helmut10001
·
2 days ago
·
[ - ]

I think he is still correct in saying that the gear OP bought is worth much less now and further deteriorating fast. See my comment above here https://news.ycombinator.com/item?id=46227813.

GPUs have such a short lifespan these days that it is really important to compare new vs. used.

segmondy
·
1 day ago
·
[ - ]

Is it? The used data center P40s I bought for $150 2 years ago went back up to $450 a few months ago, I sold one for $400. I just checked and the price is down to $200, so I'm still profitable. I bought MI50s for $90 less than a year ago, they are now going for $200. What deterioration? OP's gear cost far less and is no longer depreciating. It will probably hold this value for the next 4 years.

dnhkng
·
2 days ago
·
[ - ]

This is hard to say for sure.

I had 4x 4090, that I had bought for about $2200 each in early 2023. I sold 3 of them to help pay for the GH200, and got 2K each.

skizm
·
2 days ago
·
[ - ]

Serious question: does this thing actually make games run really great? Or are they so optimized for AI/ML workloads that they either don’t work or run normal video games poorly?

Also:

> I arrived at a farmhouse in a small forest…

Were you not worried you were going to get murdered?

dnhkng
·
2 days ago
·
[ - ]

It was fun when the seller told me to come and look in the back of his dirty white van, because "the servers are in here". This was before I had seen the workshop etc.

aeve890
·
1 day ago
·
[ - ]

The lengths someone will go just to have a graphics card and some ram nowadays smh

fsckboy
·
3 hours ago
·
[ - ]

>Were you not worried you were going to get murdered?

he had left a trail of breadcrumbs. although he was hungry, it seemed a prudent precaution.

jaggirs
·
2 days ago
·
[ - ]

I believe these GPUs don't have direct HDMI/DisplayPort outputs, so at the very least it's tricky to even run a game on them. I guess you need to run the game in a VM or so?

the8472
·
2 days ago
·
[ - ]

Copying between GPUs is a thing, that's how integrated/discrete GPU switching works. So if the drivers provide full vulkan support then rendering on the nvidia and copying to another GPU with outputs could work. And it's an ARM CPU, so to run most games you need emulation (Wine+FEX), but Valve has been polishing that for their steamframe... so maybe?

People have gotten games to run on a DGX Spark, which is somewhat similar (GB10 instead of GH200)

dnhkng
·
2 days ago
·
[ - ]

Correct! I added an Nvidia T400 to the rig recently, as it gives me 4x Display ports, and a whole extra 2GB VRAM!

throawayonthe
·
1 day ago
·
[ - ]

https://looking-glass.io/ could be interesting

nicman23
·
2 days ago
·
[ - ]

you can just force a edid in xorg and run sunshine (streaming)

wtcactus
·
1 day ago
·
[ - ]

Unfortunately sunshine introduces a lot of input lag on NVIDIA.

In AMD I’ve read it works great, but for NVIDIA chips, in mouse heavy games, it becomes unusable for me.

nicman23
·
1 day ago
·
[ - ]

really? that is not the case for me and i use it extensively both for work and games - i have a vdi solution.

wtcactus
·
1 day ago
·
[ - ]

Last time I've tried it was about 9 months ago and that was really an issue.

But I also think that for people that didn't try a "snappier" alternative, it was possible not to realize it's there.

Try and make a comparison with Parsec or even Steam's own streaming. You will notice a big difference if the issue still exists.

nicman23
·
1 day ago
·
[ - ]

i did a test with just spamming date in a terminal and having a high fps video captured from my phone, it was usually under a frame (granted 60 fps so 1/60 sec)

wtcactus
·
1 day ago
·
[ - ]

Ah, no, that's not what I mean. It's the input devices. Mainly the mouse pointer.

I now remember there was a way to go around it (a bit cumbersome and ugly) which was to render the mouse pointer only locally. That means no mouse cursor changes for tooltips/resizing/different pointers in games, etc. But at least it gets rid of the lag.

nicman23
·
22 hours ago
·
[ - ]

oh but the forwarding of inputs should be irrelevant to gpus.. maybe this is because the vdis run windows and it is a xorg issue?

zamadatix
·
2 days ago
·
[ - ]

I think the point of negative returns for gaming is going above the RTX PRO 6000 Blackwell + AMD 9800X3D CPU + latency optimized RAM + any decent NVMe drive. Seems to net ~1.1x more performance than a normal 5090 in the same setup (and both can be overclocked about equally). Aside from what the GPU is optimized for, the CPU in these servers being ARM based ends up adding more overhead for games (and breaks DRM) which still assume x86 on Windows/Linux.

Havoc
·
2 days ago
·
[ - ]

>Serious question: does this thing actually make games run really great?

LTT tried it in one of their videos...forgot which card but one of the serious nvidia AI cards.

...it runs like shit for gaming workloads. It does the job but comfortably beaten by a mid tier consumer card for 1/10th the price

Their AI track datacenter cards are definitely not same thing different badge glued on

mrandish
·
2 days ago
·
[ - ]

> does this thing actually make games run really great

It's an interesting question, and since OP indicates he previously had a 4090, he's qualified to reply and hopefully will. However, I suspect the GH200 won't turn out to run games much faster than a 5090 because A) Games aren't designed to exploit the increased capabilities of this hardware, and B) The GH200 drivers wouldn't be tuned for game performance. One of the biggest differences of datacenter AI GPUs is the sheer memory size, and there's little reason for a game to assume there's more than 16GB of video memory available.

More broadly, this is a question that, for the past couple decades, I'd have been very interested in. For a lot of years, looking at today's most esoteric, expensive state-of-the-art was the best way to predict what tomorrow's consumer desktop might be capable of. However, these days I'm surprised to find myself no longer fascinated by this. Having been riveted by the constant march of real-time computer graphics from the 90s to 2020 (including attending many Siggraph conferences in the 90s and 00s), I think we're now nearing the end of truly significant progress in consumer gaming graphics.

I do realize that's a controversial statement, and sure there will always be a way to throw more polys, bigger textures and heavier algorithms at any game, but... each increasing increment just doesn't matter as much as it once did. For typical desktop and couch consumer gaming, the upgrade from 20fps to 60fps was a lot more meaningful to most people than 120fps to 360fps. With synthetic frame and pixel generation, increasing resolution beyond native 4K matters less. (Note: head-mounted AR/VR might be one of the few places 'moar pixels' really matters in the future). Sure, it can look a bit sharper, a bit more varied and the shadows can have more perfect ray-traced fall-off, but at this point piling on even more of those technically impressive feats of CGI doesn't make the game more fun to play, whether on a 75" TV at 8 feet or a 34-inch monitor at two feet. As an old-school computer graphics guy, it's incredible to see real-time path tracing adding subtle colors to shadows from light reflections bouncing off colored walls. It's living in the sci-fi future we dreamed of at Siggraph '92. But as a gamer looking for some fun tonight, honestly... the improved visuals don't contribute much to the overall gameplay between a 3070, 4070 and 5070.

Scene_Cast2
·
2 days ago
·
[ - ]

I'd guess that the datacenter "GPUs" lack all the fixed-function graphics hardware (texture samplers, etc) that's still there in modern consumer GPUs.

jsheard
·
2 days ago
·
[ - ]

They do still have texture units since sampling 2D and 3D grids is a useful primitive for all sorts of compute, but some other stuff is stripped back. They don't have raytracing or video encoding units for example.

fuzzythinker
·
2 days ago
·
[ - ]

> Your mileage may vary. Literally: I had to drive two hours to pick this thing up.

Good one

volf_
·
2 days ago
·
[ - ]

That's awesome.

These are the best kinds of posts

BizarroLand
·
2 days ago
·
[ - ]

Yep. Just enough to inspire jealousy while also saying it's possible

rurban
·
14 hours ago
·
[ - ]

I also have to limit my H100 GPUs to 300W max. They are specced at 310W, not 900W. But with 4*310W there are outages every 2nd week.

You really need a special server cabinet and HVAC for these kinds of beasts. But you need them for training, right?

Beijinger
·
2 days ago
·
[ - ]

I would appreciate it if someone could name some shops where you can buy used enterprise grade equipment.

Most of them are in California? Anything in NY/NJ

bombcar
·
2 days ago
·
[ - ]

Look on eBay, find sellers with multiple listings, track them down.

There should be some all over the country.

kinow
·
1 day ago
·
[ - ]

That was enjoyable. I miss the days when I would buy old pieces, or find some in old dumpsters in Sao Paulo and try to use old video cards and memory modules to create little franksteins (a lot cheaper than this, but still fun).

I found it interesting to learn there are businesses around converting used servers into desktops. Sounds like a good initiative to avoid some e-waste (assuming the desktops are easy to maintain).

m4r1k
·
2 days ago
·
[ - ]

Wow! As others have said, deal of the century!! As a side note, a few years back, I used to scrape eBay for Intel QS Xeon and quite a few times managed to snag incredible deals, but this is beyond anything anyone has ever achieved!

fnands
·
1 day ago
·
[ - ]

Man, you're living some crazy CyberPunk fever-dream.

Nice find, and I admire your courage for even attempting this!

mrose11
·
2 days ago
·
[ - ]

This is freaking cool. Nice job!

ycwatcher
·
2 days ago
·
[ - ]

Great work on the rebuild! The photos are helpful, but if by any chance you happened to film the process, I'd love to see it on YouTube.

dnhkng
·
2 days ago
·
[ - ]

No, it was done over the course of weeks, and I'm not motivated enough to do the production work required for good quality videos.

tigranbs
·
2 days ago
·
[ - ]

Ah, that's the best way to spend ~10K

Frannky
·
2 days ago
·
[ - ]

Wow! Kudos for thinking it was possible and making it happen. I was wondering how long it would be before big local models were possible under 10k—pretty impressive. Qwen3-235B can do mundane chat, coding, and agentic tasks pretty well.

jauntywundrkind
·
2 days ago
·
[ - ]

I feel like it's going to be a long long time before we get a repeat of something like this. And David did such an incredible job on this. Custom designed frame, designed his own water-block! Wildly great effort here.

We'll see how it goes, but what _is_ happening is ram replacement. Nvidia 5090's with 96GB are somewhat a thing now. $4K. YMMV, caveat emptor. https://www.alibaba.com/product-detail/Newest-RTX-5090-96gb-...

MLgulabio
·
2 days ago
·
[ - ]

Argh i was so so hoping that this is a 'thing' and I can just do that too.

Lets continue to hope

albertgoeswoof
·
2 days ago
·
[ - ]

What inference performance are you getting on this with llama?

How long would it take to recoup the cost if you made the model available for others to run inference at the same price as the big players?

kingstnap
·
2 days ago
·
[ - ]

He has GLM 4.5 Running at ~100 Tokens per second.

Assumptions:

Batch 4x and get 400 tokens per second and push his power consumption to 900W instead of the underutilized 300W.

Electricity around €0.2/kWhr.

Tokens valued at €1/1M out.

Assume ~70% utilization.

Result:

You get ~1M tokens per hour which is a net profit of ~€0.8/hr. Which is a payoff time of a bit over a year or so given the €9K investment.

Honestly though there is a lot of handwaving here. The most significant unknown is getting high utilization with aggressive batching and 24/7 load.

Also the demand for privacy can make the utility of the tokens much higher than typical API prices for open source models.

In a sort of orthogonal way renting 2 H100s costs around $6 per hour which makes the payback time a bit over a couple months.
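The back-of-envelope numbers above can be reproduced with a short script. This is only a sketch of the arithmetic under the stated assumptions (400 tokens/s batched, 70% utilization, €1 per 1M tokens, 900W, €0.2/kWh, €9K invested). The function name is made up, and electricity is charged at full power around the clock, which is a simplification.

```python
def payback(
    capex_eur: float,
    tokens_per_sec: float,
    utilization: float,
    eur_per_million_tokens: float,
    power_kw: float,
    eur_per_kwh: float,
) -> tuple[float, float, float]:
    """Return (million tokens per hour, net EUR per hour, days to recoup capex)."""
    mtok_per_hour = tokens_per_sec * 3600 * utilization / 1e6
    revenue_per_hour = mtok_per_hour * eur_per_million_tokens
    # Simplification: the box draws full power every hour, busy or not.
    electricity_per_hour = power_kw * eur_per_kwh
    net_per_hour = revenue_per_hour - electricity_per_hour
    days = capex_eur / net_per_hour / 24
    return mtok_per_hour, net_per_hour, days


mtok, net, days = payback(9000, 400, 0.70, 1.0, 0.9, 0.2)
print(f"{mtok:.2f}M tokens/h, net EUR {net:.2f}/h, payback in {days:.0f} days")

# Rental comparison from the last paragraph: 2x H100 at ~6 per hour.
# Like the comment, this ignores the EUR/USD difference.
rental_days = 9000 / 6 / 24
print(f"equivalent rental spend reached after {rental_days:.1f} days")
```

The selling scenario lands at roughly 450 days and the rental comparison at roughly two months, matching the estimates in the comment. The result is most sensitive to the utilization and price-per-token inputs.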

segmondy
·
2 days ago
·
[ - ]

This is about more. I can run 600B+ models at home. Today I was having a discussion with my wife and we asked ChatGPT a quick question, it refused because it can't generate the result based on race. I tried to prompt it to and it absolutely refused. I used my local model and got the answer I was looking for from the latest Mistral-Large3-675B. What's the cost of that?

nicman23
·
2 days ago
·
[ - ]

about the cost of your hardware lol

PhilippGille
·
2 days ago
·
[ - ]

> He has GLM 4.5 Running at ~100 Tokens per second.

GLM 4.5 Air, to be precise. It's a smaller 106B model, not the full 355B one.

Worth mentioning when discussing token throughput.

dnhkng
·
2 days ago
·
[ - ]

I'm downloading DeepSeek-V3.2-Speciale now at FP8 (reportedly Gold-medal performance in the 2025 International Mathematical Olympiad and International Olympiad in Informatics).

It will fit in system RAM, and as it's a mixture-of-experts model and the experts are not too large, I can at least run it. Tokens/second will be slower, but as system memory bandwidth is somewhere around 500-600 GB/s, it should feel OK.
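A rough upper bound on decode speed when the weights stream from system RAM, as a sketch only. It assumes generation is purely memory-bandwidth bound, reads the bandwidth figure above as roughly 500 to 600 GB/s, and assumes about 37B active parameters per token at 1 byte each (FP8). The 37B figure is an assumption taken from the DeepSeek-V3 family, not something stated in this thread.

```python
def max_decode_tokens_per_sec(
    bandwidth_gb_s: float,
    active_params_billion: float,
    bytes_per_param: float,
) -> float:
    """Upper bound: every active weight is read from memory once per token."""
    gb_read_per_token = active_params_billion * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


# Assumed inputs (see the paragraph above): 37B active params, FP8 = 1 byte each.
low = max_decode_tokens_per_sec(500, 37, 1)
high = max_decode_tokens_per_sec(600, 37, 1)
print(f"bandwidth-bound ceiling: {low:.1f} to {high:.1f} tokens/sec")
```

Real throughput would sit below this ceiling, since it ignores KV-cache reads, compute time and any layers that stay in VRAM.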

Gracana
·
1 day ago
·
[ - ]

Check out "--n-cpu-moe" in llama.cpp if you're not familiar. That allows you to force the expert weights of a certain number of layers to be kept in system memory while everything else (including context cache and the parts of the model that every token touches) is kept in VRAM. You can do something like "-c128k -ngl 99 --n-cpu-moe <tuned_amt>" where you find a number that allows you to maximize VRAM usage without OOMing.
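For anyone scripting the tuning loop, here is a minimal sketch that only assembles the command line. The model path and the starting value of 20 are hypothetical placeholders, and the context size is written out as 131072 tokens on the assumption that this is what "128k" means. Nothing is launched.

```python
import shlex


def llama_server_cmd(
    model_path: str,
    n_cpu_moe: int,
    ctx_size: int = 131072,
    n_gpu_layers: int = 99,
) -> list[str]:
    """Build a llama-server argument list; n_cpu_moe is the value to tune."""
    if n_cpu_moe < 0:
        raise ValueError("n_cpu_moe must be non-negative")
    return [
        "llama-server",
        "-m", model_path,
        "-c", str(ctx_size),
        "-ngl", str(n_gpu_layers),
        "--n-cpu-moe", str(n_cpu_moe),
    ]


# Hypothetical values: lower n_cpu_moe step by step until VRAM is nearly full.
cmd = llama_server_cmd("model.gguf", n_cpu_moe=20)
print(shlex.join(cmd))
```

Starting high and stepping the value down until the process runs out of memory, then backing off by one, is one way to find the tuned amount.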

Deathmax
·
1 day ago
·
[ - ]

The author was running a quantised version of GLM 4.5 _Air_, not the full fat version. API pricing for that is closer to $0.2/$1.1 at the top end from z.ai themselves, half the price from Novita/SiliconFlow.

dnhkng
·
2 days ago
·
[ - ]

Running LLM's directly might not be effective.

I think there are probably law firms/doctors' offices that would gladly pay ~3-4K euro a month to have this thing delivered and run truly "on-prem" to work with documents they can't risk leaking (patent filings, patient records etc).

For a company with 20-30 people, the legal and privacy protection is worth the small premium over using cloud providers.

Just a hunch though! This would have it paid-off in 3-4 months?

rcarmo
·
1 day ago
·
[ - ]

Pretty amazing, although the power consumption and volume put it past the envelope of what I would be willing to run at home…

DANmode
·
1 day ago
·
[ - ]

It has an “off” mode :)

hollow-moe
·
2 days ago
·
[ - ]

For that price ? The bubble already popped for sure !

arein3
·
2 days ago
·
[ - ]

It's practically free

20after4
·
2 days ago
·
[ - ]

Deal of the century.

zkmon
·
1 day ago
·
[ - ]

So, what do you plan to do with it?

pointbob
·
2 days ago
·
[ - ]

Actually the most incredible part of the story is that a computer geek of this 9th circle of geekdom level has a wife.

danr4
·
2 days ago
·
[ - ]

one of the coolest things i've seen recently. kudos!

ionwake
·
2 days ago
·
[ - ]

inspiring! is there an ip i can connect to test the inference speed?

Philpax
·
2 days ago
·
[ - ]

You lucky dog. Have fun!

KellyCriterion
·
1 day ago
·
[ - ]

but .. you know .. can it run Crysis? :-D

SCNR

pointbob
·
2 days ago
·
[ - ]

Can you bitcoin mine?

ChrisArchitect
·
2 days ago
·
[ - ]

Maybe the title could be I bought an Nvidia server..... to avoid confusion that it's something to do with Grace Hopper the person, and her servers ...or mainframes?

dnhkng
·
2 days ago
·
[ - ]

Makes sense. I'm so used to the naming I forgot it's not common knowledge. I hope the new title is clearer.

walrus01
·
2 days ago
·
[ - ]

Grace Hopper is the Nvidia product code name for the chip, much like how Intel cpus were named after rivers, etc

https://www.google.com/search?client=firefox-b-m&q=grace%20h...