A big part of the reason I built this was data privacy: I don't want to hand over my private data to any company to further train their closed-weight models. And given the recent drop in output quality on different platforms (ChatGPT, Claude, etc.), I don't regret spending the money on this setup.
I was also able to do a lot of cool things with this server: leveraging tensor parallelism and batch inference, generating synthetic data, and experimenting with finetuning models on my private data. I am currently building a model from scratch, mainly as a learning project, but I'm also finding some interesting things along the way, and if I manage to iron out the kinks, I might release it and write a tutorial from my notes.
So I finally had the time this weekend to get my blog up and running, and I am planning to follow up this post with a series on my learnings and findings. I am also open to topics and ideas to experiment with on this server and write about, so feel free to reach out if you have ideas you want to test but don't have the hardware; I am more than willing to run them on your behalf and share the findings.
Please let me know if you have any questions, my PMs are open, and you can also reach me on any of the socials I have posted on my website.
I wrote a blog post on reducing the power limits of Nvidia GPUs. Definitely try it out. https://shelbyjenkins.github.io/blog/power-limit-nvidia-linu...
I would say that power limiting is a potential workaround, and it should work perfectly fine for inference, but when it comes to training you will want to squeeze out every ounce of performance. So it depends on your goal.
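If you'd rather script the cap than set it by hand with nvidia-smi, here is a minimal sketch using the pynvml Python bindings (my own sketch, not taken from that post; it assumes the nvidia-ml-py package is installed, root privileges, and the 250 W target is just an illustration):

    import pynvml

    TARGET_WATTS = 250  # illustrative cap; 3090s default to roughly 350 W

    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            # Allowed range is reported in milliwatts; clamp the target to it.
            lo, hi = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
            target_mw = max(lo, min(hi, TARGET_WATTS * 1000))
            pynvml.nvmlDeviceSetPowerManagementLimit(handle, target_mw)
            print(f"GPU {i}: limit set to {target_mw / 1000:.0f} W "
                  f"(allowed {lo / 1000:.0f}-{hi / 1000:.0f} W)")
    finally:
        pynvml.nvmlShutdown()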
What CPU/mobo/storage are you running with those two 3090s for a 700 W PSU to work? I'd say if at any point you're pushing more than 500 W out of that PSU, you're getting close to the 80% safety limit (560 W on a 700 W unit). I would have used at least an 850 W supply just to be safe with two 3090s plus the rest of the hardware.
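To put rough numbers on that (a back-of-the-envelope sketch; the component wattages are my assumptions, not measurements from this build):

    # Rough PSU headroom check for a dual-3090 box.
    components_w = {
        "2x RTX 3090 at stock ~350 W each": 700,
        "CPU under load": 150,
        "motherboard, RAM, storage, fans": 80,
    }
    psu_w = 700
    total = sum(components_w.values())
    budget = psu_w * 0.8  # the 80% rule: keep sustained draw under 80% of rating
    print(f"estimated peak draw: {total} W, 80% of a {psu_w} W PSU: {budget:.0f} W")
    print("within budget" if total <= budget else "over budget, consider 850 W or more")

With stock power limits that lands well over the 560 W budget, which is why I'd either power-limit the cards or step up to a bigger PSU.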
It is not expensive, nor is it highly technical. It's not like we're factoring in latency and crosstalk...
Read a quick howto, cruise into Home Depot and grab some legos off the shelf. Far easier to figure out than executing "hello world" without domain expertise.
Residential electrical is dangerous. Maybe you electrocute yourself. Maybe you cause a fire 5 years down the line. Maybe you cause a fire for the next owner because you didn't know to protect the wire with a metal plate so they drill into it.
Having said that, two 4090s will run you around $5,000, not counting any of the surrounding system. At that price point, hiring an electrician would not be that big of an expense, relatively speaking.
Also, if you are at the point where you need to add a circuit for power, you might need to seriously consider cooling, which could turn into another side quest.
Re: cooling; I have an AC vent directed at the setup, plus I planned the intake/exhaust airflow as optimally as I could to maximize cooling. I have installed like 20 more fans since taking these pictures :D
If you screw it up and need to file a claim, insurance can’t deny the claim based solely on the fact that you performed the work yourself, even if you’re not a certified electrician/plumber/whatever.
What you don't want to do is have an unlicensed friend work on your home, and vice versa. There are no legal protections, and the insurance companies absolutely will go after you/your friend for damages.
Edit: sorry this applies to owned property, not if you’re renting
https://www.montgomerycountymd.gov/DPS/Process/combuild/home...
Granted, if you actually do unlicensed work in your house, no one will know. But it is still illegal.
What is common here, in the handy crowd at least, is to do your own electrical, plumbing, and gas work and leave it open and accessible for a licensed professional to check and sign off on.
You're still paying for an hour or two of their time and a surcharge for "taking on the responsibility" but it's often not an issue if the work is clean, to current code, and sanity tests correct (correct wiring, correct angles on plumbing, pressure testing on gas pipes).
_This_ is the claim that is extraordinary. I'm not saying that the government would bust down my door for doing work on my own home, but rather that the insurance company would then view that work as uninsured.
The entire business model of insurance agencies is to find new, creative, and unexpected ways to deny claims. That is how they make their money. To claim that they would accept liability for a property that's had uninspected work done by an unlicensed, untrained, unregistered individual is just that - extraordinary.
There should be an easy/reliable way to channel "waste heat" from something like this to your hot water system.
Actually, 4 or 5 kW continuous is a lot more than most domestic hot water services need. So in my usual manner of overcomplicating simple ideas, now I want to use the waste heat to run a boiler driving a steam engine, perhaps to directly mechanically run your air conditioning or heat pump compressor.
These power requirements aren't insurmountable. They do get pricey though, and I wish my computing rig would use something around 0.1 kW under load...
Using the heat from PCs would be nice. I guess most just use them as electrical heaters right now.
Stuffed the homelab next to the air intake of the water heater, now when I need hot water my water heater sucks the heat out of the air and puts it into the water.
It's obviously not 100% efficient, but at least it recaptures some of the waste heat and decreases my electrical bill somewhat.
The instinct to not touch something that you don't yet deeply understand is very much an engineer's instinct. Any engineer worthy of the title has often spent weeks carefully designing a system to take care of the hundreds of edge cases that weren't apparent at a quick glance. Once you've done that once (much less dozens of times), you have a healthy respect for the complexity that usually lurks below the surface, and you're loath to confidently insert yourself into an unfamiliar domain that has a whole engineering discipline dedicated to it. You understand that those engineers are employed full time for a reason.
The attitude you describe is one that's useful in a lot of cases and may even be correct for this particular application (though I'm personally leery of it), but if confidently injecting yourself into territory you don't know well is what being an "engineer" means to you, that's a sad commentary on the state of software engineering today.
Which is used as "what the heck," but its direct kanji translation is "one body."
Some old serial ports ran at 12 V with a high max current. The DIY things you attached there were prone to killing your mainboard.
Voltage/current is either 0 or 1. Anything higher kills software developers instantly.
That being said, it's still very easy not to kill yourself with 120/230V: just shut down the power before touching anything.
If you know nothing about basic electric work or principles, sure - spend the $500 to have an electrician add a 30 or 50A 220V outlet near your electric service panel. Totally reasonable to do as it is indeed dangerous to touch things you don’t understand.
It’s far less complex and less dangerous than adding an EV charge point to your garage which seems to be quite common for this crowd. This is the same (less, since you typically have a lot more flexibility on where to locate the outlet and likely don’t need to pull through walls) complexity as adding a drop for an electric stove.
Where the “home electric hackers” typically tend to get in trouble is doing stuff like adding their own generator connection points and not properly doing interlocks and all that fun stuff.
If you can replace your own light switches and wall receptacles you are just one step away from adding an additional branch circuit. Lots of good learning material out there on the subject these days as well!
As a hobby, I restore pinball machines. A modern one is extremely careful about how it uses power, limiting wall current to a small, normally-sealed section of the machine. And even so, it automatically disables the lower-voltage internals the moment you open the coin door. A 1960s machine, by contrast, may not have a ground at all. It may have an unpolarized plug, and it will run wall current all over the place, including the coin door, one of the flippers, and a mess of relays.
In the pinball community, you'll find two basic attitudes toward this. One is people treating electrical safety about as seriously as the people who design the modern machines. The other is people who think anybody who worries about a little wall current is a pussy who doesn't have the balls to work on anything and should just man up and not worry about a little 120 V jolt.
The truth is that most people here are not engineers of any sort. We're software developers. We're used to working in situations where safety and rigor basically don't matter, where you have to just cowboy ahead and try shit. And that's fine, because control-z is right there. I've met people who bring that attitude to household electrical work, and they're fucking dangerous. I know one guy, quite a smart one, who did a lot of his own electrical work based on manliness and arrogance, and once the inspector caught up with him, the inspector immediately pulled his meter and wouldn't let him connect to the grid again until a real electrician had straightened it all out.
It's true that this stuff is not that hard to learn if you study it. But an architect friend likes to say that the building code is written in blood, meaning that much of it is there because confident dumbasses managed to kill enough people that they had to add a new rule. If people are prepared to learn the rules and appreciate why they're there, I'm all for it. But if they do it coming from a place of proving that they're not "so afraid of residential power", that's a terrible way to approach it.
FYI, I can handle electrical system design and sheet metal enclosure design/fabrication for these rigs, but my software knowledge is limited when it comes to ML. If anyone's interested, I'd love to collaborate on a joint venture to produce these rigs commercially.
As I mentioned in my reply to OP, it's very doable as long as you do your research. The only thing I didn't do myself was the installation, because I wasn't comfortable with it, but I pretty much spelled everything out for the contractor, and the way I would have gone about the installation was exactly how he did it.
Hit me up on Twitter or email; we can chat about ideas for this venture.
I'm curious, how do you use e.g. a washing machine or an electric kettle, if 2kW is enough to flip your breaker? You should simply know your wiring limits. Breaker/wiring at my home won't even notice this.
Boiling 1 liter takes like 2 mins. Most Americans don’t have kettles because they don’t drink tea.
You're correct that the dryer is on a larger circuit, though.
You think that this is "just fine" because you've never experienced the glory that is a 3kW kettle!
1.5kW must be absolute agony
I've never seen a dedicated circuit for dryers in Australia, and I've lived in probably a dozen different properties. Ovens, aircon, hot water, bathroom heat lamps often have dedicated circuits, though.
If that weren't the limit, though, the fact that the machine is already a space heater with 2 liquid-cooled 4090s would be.
Back when I worked at a high-availability data center, all of our servers had dual PSUs, plugged into separate circuits.
The transformer in the PSUs should electrically isolate the mains voltage from the low-voltage side, so you aren't going to cause a short across the two circuits.
The only risk I see is a cascade failure, where the increased load on the second circuit causes its breaker to trip.
In the real world you would plug them into a PDU such as: https://www.apc.com/us/en/product/AP9571A/rack-pdu-basic-1u-...
Each GPU will take around 700W and then you have the rest of the system to power, so depending on CPU/RAM/storage...
And then you need to cool it!
Cooling is its own story... Man, figuring out this process was one hell of a journey.
Hell, most kettles use 3 kW. Though for a big server I'd get a dedicated circuit wired in, the same way power showers are done (~7-12 kW).
Which is all to say it's possible in a residential setting, just probably expensive.
16 A x 120 V = 1920 W; it would probably trip after several minutes.
16 A x 230 V = 3680 W; it wouldn't trip.
So, as mentioned in the article, I actually installed two 30 A, 240 V breakers dedicated entirely to this setup (and the next one, in case I decide to expand to 16x GPUs over 2 nodes lol). Each breaker can comfortably supply around 6000 W. I also installed a specific kind of power outlet that can handle that kind of current, and I have done some extreme research into PDUs. I plan on covering all of that in this series (part 3 according to my current tentative plans), so stay tuned, and maybe bookmark the website / add the RSS feed to your digest / follow me on any of the socials if this is something you want to nail down without spending a month on research like me :'D
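For anyone sizing their own circuit, a quick sketch of the math (generic numbers, not my exact panel layout):

    # Branch-circuit budget under the usual 80% continuous-load rule.
    def continuous_capacity_w(breaker_amps, volts):
        """Watts you can draw continuously on a circuit per the 80% rule."""
        return breaker_amps * volts * 0.8

    for amps, volts in [(15, 120), (20, 120), (30, 240)]:
        print(f"{amps} A @ {volts} V: {amps * volts} W peak, "
              f"{continuous_capacity_w(amps, volts):.0f} W continuous")

A 30 A / 240 V circuit works out to 7200 W peak and ~5760 W continuous, which roughly matches the ~6000 W figure above.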
What is your cost of electricity per kilowatt hour and what is the cost of this setup per month?
I did not know how expensive it is in the USA, especially California.
Maybe a bit of a stupid question, but what do you actually do with the models you run/build, apart from tinkering? I'd assume most tinkering can also be done on smaller systems? Is it in order to build a model that is actually 'useful'/competitive?
But the problem is that even 7B models are too slow on my PC.
Hosted models are lightning fast. I considered the possibility of buying hardware but decided against it.
Honestly though I'd be curious to see a cost analysis of Apple vs. Nvidia for commercial batched inference. The Nvidia system can obviously spit out more tokens/s but for the same price you could have multiple Mac Studios running the same model (and users would be dispatched to one of them).
Also most rooms have multiple circuits. Just connect half the GPUs to each outlet. I already do this with my desktop PC because it has 2 PSUs.
Also most homes in the US have 30A*240V = 7200W dryer/stove outlets in the kitchen, laundry room, garage, etc. That's where you want to keep loud computers.
Using two circuits is viable.
The NEC 80% rule is only relevant in data centers where someone enforces it. The time-current curve for any 20 A breaker shows that it will take infinite time to trip when you draw 20 A. This is correct behavior because 12 AWG wire can handle 20 A indefinitely.
I guess the outlet doesn't matter a whole lot, especially if you take multiple cables to the wall.
But I still don't want to be doing unenforced electrical violations.
Consider what happens when you plug two 10A devices into a power strip. In the US there are no true 15A rated outlets. All outlets must handle 20A because the electrical code allows any outlet to be on a 20A breaker.
I don't know about any argument that depends on such an action being safe. There's plenty of power strips I wouldn't trust with 20 amps.
This is exactly like when the AMD fanboys got a burr up their ass about the "$50k Mac Pro" with 2 TB of memory… when you could do the same thing with a Threadripper with 256 GB of memory for $5k, and it's just as fast in Cinebench!
https://old.reddit.com/r/Amd/comments/f1a0qp/15000_mac_pro_v...
your gaming scores on your 3090 with 24 GB are just as irrelevant to this 200 GB workload as the Threadripper is to those Mac Pro workloads lol
"Wow great post! I enjoy your valuable contributions. Can you tell me more about graphics cards and how they compare to other different types of computers? I am interested and eager to learn! :)"
The answer here is that the Nvidia system has much better performance. I've been so focused on "can I even run the model" that I didn't think about the actual performance of the system.
If Apple Silicon was in any way a more scalable, better-supported or more ubiquitous solution, then OpenAI and the rest of the research community would use their hardware instead of Nvidia's. Given Apple's very public denouncement of OpenCL and the consequences of them refusing to sign Nvidia drivers, Apple's falling-behind in AI is like the #1 topic in the tech sector right now. Apple Silicon for AI training is a waste of time and a headache that is beyond the capacity of professional and productive teams. Apple Silicon for AI inference is too slow to compete against the datacenter incumbents fielded by Nvidia and even AMD. Until Apple changes things and takes the datacenter market seriously (and not just advertise that they are), this status quo will remain the same. Datacenters don't want to pay the Apple premium just so they can be treated like a traitorous sideshow.
I encourage you to update your beliefs about other people. I’m a very technical person, but I work in robotics closer to the hardware level - I design motor controllers and Linux motherboards and write firmware and platform level robotics stacks, but I’ve never done any work that required running inference in a professional capacity. I’ve played with machine learning, even collecting and hand labeling my own dataset and training a semantic segmentation network. But I’ve only ever had my little desktop with one Nvidia card to run it all. Back in the day, performance of CNNs was very important and I might have looked at benchmarks, but since the dawn of LLMs, my ability to run networks has been limited entirely by RAM constraints, not other factors like tokens per second. So when I heard that MacBooks have shared memory and can run large models with it, I started to notice that could be a (relatively) accessible way to run larger models. I can’t even remotely afford a $6k Mac any more than I could afford a $12k Nvidia cluster machine, so I never really got to the practical considerations of whether there would be any serious performance concerns. It has been idle thinking like “hmm I wonder how well that would work”.
So I asked the question. I said roughly “hey can someone explain why OP didn’t go with this cheaper solution”. The very simple answer is that it would be much slower and the performance per dollar would be 10x worse. Great! Question answered. All this rude incredulousness coming from people who cannot fathom that another person might not know the answer is really odd to me. I simply never even thought to check benchmarks because it was never a real consideration for me to buy a system.
Also the “#1 topic in the tech sector right now” funny in my circles people are talking about unions, AI compute exacerbating climate change, and AI being used to disenfranchise and make more precarious the tech working class. We all live in bubbles.
Again, I'm not accusing you of bad-faith. I'm just saying that asking such a bald-faced and easily-Googled question is indistinguishable from flamebait. There is so much signalling that should suggest to you that Apple hardware is far from optimized for AI workloads. You can look at it from the software angle, where Apple has no accessible GPGPU primitives. You can look at it from a hardware perspective, where Apple cannot beat the performance-per-watt of desktop or datacenter Nvidia hardware. You can look at it from a practical perspective, where literally nobody is using Apple Silicon for cost-effective inference or training. Every single scrap of salient evidence suggests that Apple just doesn't care about AI and the industry cannot be bothered to do Apple's dirty work for them. Hell, even a passing familiarity with the existence of Xserve should say everything you need to know about Apple competing in markets they can't manipulate.
> funny in my circles people are talking about unions, AI compute exacerbating climate change, and AI being used to disenfranchise and make more precarious the tech working class.
Sounds like your circles aren't focused on technology, but popular culture and Twitter topics. Unionization, the "cost" of cloud and fictional AI-dominated futures were barely cutting-edge in the 90s, let alone today.
The Geekbench GPU compute benchmarks are nearly worthless in any context, and most certainly are useless for evaluating suitability for running LLMs, or anything involving multiple GPUs.
Buying a PC for local AI? These are the specs that matter
https://www.theregister.com/2024/08/25/ai_pc_buying_guide/
It died without interest here: https://news.ycombinator.com/item?id=41347785 likely time of day, I'm mostly active in HN off peak hours.
As for actual trains, they can be surprisingly affordable (to live in):
https://atrservices.com.au/product/sa-red-hen-416/ https://en.wikipedia.org/wiki/South_Australian_Railways_Redh...
and freight flatcars make great farm bridges (single or sectioned) with concrete pylons at either end; once the axles are shot they can go pretty damn cheap, and the beds are good enough to roll a car or small truck over.
addendum: in case you miss fresh reply to old comment: https://news.ycombinator.com/item?id=41484529
Something that is actually interesting and attempting to bring something to the table: check out tinygrad/tinycorp.
I wonder if this will happen. It's already really hard to buy big HDDs for my NAS because nobody buys external drives anymore. So the pricing has gone up a lot for the prosumer.
I expect something similar to happen with AI. The big cloud parties are all big leaders on LLMs, and their goal is to keep us beholden to their cloud services. Cheap home hardware with serious capability is not something they're interested in. They want to keep it out of our reach so we can pay them rent and they can mine our data.
That said, I really don't think that the way forward for hobbyists is maxing VRAM. Small models are becoming much more capable, accelerators are a possibility, and there may not be a need for a person to run a 70-billion-parameter model in memory at all when there are MoEs like Mixtral and small capable models like Phi.
I buy refurb/used enterprise drives for that reason, generally around $12 per TB for the recent larger drives. And around $6 per TB for smaller drives. You just need an SAS interface but that's not difficult or expensive.
i.e., 25 TB for $320, or 12 TB for $80.
IME 20 TB drives are easy to find.
I don't think the clouds have access to bigger drives or anything.
Similarly, we can buy 8x A100s, they're just fundamentally expensive whether you're a business or not.
There doesn't seem to be any "wall" up like there used to be with proprietary hardware.
For me these prices are prohibitive. Just like the A100s are (though those are even more so of course).
The problem is the common consumer relying on the cloud so these kind of products become niches and lose volume. Also, the cloud providers don't pay what we do for a GPU or HDD. They buy them by the ten thousands and get deep discounts. That's why the RRPs which we do pay are highly inflated.
Homelab vendor in Austin, TX with periodic sales, limited volume: https://shop.digitalspaceport.com
And they do have better sales.
Of course the vendor can't make a profit with such discounts so they inflate the RRP. But we do end up paying that.
I have seen very large enterprise customers get an 80% discount on hardware; it's mind-boggling that the vendor isn't going bankrupt.
Yes exactly. When I see what we pay for stuff at work...
Obviously the vendors don't have 80+% margins. So what do they do? Inflate the RRP to compensate. So they can give a huge discount that sounds good on paper.
But this makes it unviable to buy for consumers that do have to pay RRP.
> the heir to rear projection — a dynamic, real-time, photo-real background played back on a massive LED video wall and ceiling, which not only provided the pixel-accurate representation of exotic background content, but was also rendered with correct camera positional data.. “We take objects that the art department have created and we employ photogrammetry on each item to get them into the game engine”
Do you have a rough estimate of how much this cost? I'm curious since I just built my own 2x 3090 rig and I wondered about going EPYC for the potential to have more cards (stuck with AM5 for cheapness though).
All in all I spent about $3500 for everything. I'm guessing this is closer to $12-15k? CPU is around $800 on eBay.
I've heard that NVLink helps with training, but not so much with inference.
It also costs a lot to power. In the summer, 2x more than you expect, because unless it's outside, you need to cool 1000+ watts of extra heat with your AC. All that together and RunPod starts to look very tempting!
Getting a circuit put in is also much more difficult in a shared building...
RunPod has 3090s for $0.43 per hour ($0.22 spot). If your power costs $0.30 per kWh, and you need to spend _another_ $0.30 per kWh on cooling (say you live in an apartment in the Bay Area and it's summer), that's ~48 days of local running to equal the cost of 30 days on RunPod. So you are still saving some money, though much less than you might think, and possibly spending more than spot instances!
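If you want to plug in your own numbers, a tiny sketch of that breakeven math (the 0.45 kW wall draw per card is my assumption; the rates are the ones above):

    # Local 3090 vs. renting one: when does running at home pay off?
    rent_per_hr     = 0.43   # on-demand $/hr
    elec_per_kwh    = 0.30   # $/kWh for electricity
    cooling_per_kwh = 0.30   # extra $/kWh to pump the heat back out with AC
    draw_kw         = 0.45   # assumed wall draw per card incl. host share

    local_per_hr = draw_kw * (elec_per_kwh + cooling_per_kwh)
    days = 30 * rent_per_hr / local_per_hr
    print(f"local: ${local_per_hr:.2f}/hr vs rented: ${rent_per_hr:.2f}/hr")
    print(f"~{days:.0f} days of local running cost the same as 30 rented days")

That ignores the purchase price of the card itself, which pushes the real breakeven out further.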
I have a setup with 3 RTX 3090 GPUs and the PCIe risers are a huge source of pain and system crashes.
I've had my eye on these for a bit https://c-payne.com/
The worst thing is dust. The cards would accumulate so much that every week I had to blow them off with an air compressor.
Electricity cost was around $4 a day (24 x $0.20~). If online GPU renting is more expensive, maybe the initial cost could be justifiable.
Except without the sketchy x1 PCIe lanes. That's the part that makes nice LLM setups hard.
It works fine for crypto, but LLM performance is far more sensitive to bandwidth. You lose a ton of performance if you've got PCIe in the loop, never mind one-lane PCIe. That's why NVLink is (was) a thing: trying to cut that out entirely.
This might be the right time to ask: So, on the one hand, this is what it takes to pack 192gb of Nvidia flavored vram into a home server.
I'm curious: is there any hope of doing any interesting work on a MacBook Pro, which currently can be max-specced at 128 GB of unified memory (for the low, low price of $4.7k)?
I know there's no hope of running CUDA on the MacBook, and I'm clearly out of my depth here. But the possibly naive daydream of tossing a massive LLM into a backpack is alluring...
My assumption was that going beyond 2 cards incurs significant bandwidth penalty when going from NVLink between 2x3090s to PCIe for communicating between the other 3090s.
What kind of T/s speeds are you getting with this type of 8x3090 setup?
Presumably then even crazier 16x4090 would be an option for someone with enough PCIe slots/risers/extenders.
I hope this guy posts updates.
Are you intending to use the capacity all for yourself or rent it out to others?
As a side note, I'd love to find a chart/data on the cost-performance ratio of open source models, and possibly then a $/Elo value (where $ is the cost to build and operate the machine and Elo is a proxy for the average performance of the model).
I haven't had enough time to find a way to split inference, which is what I'm most interested in. Yours is also much better with the 1600 W supply; I have a hodgepodge.
I'm a believer! Can't wait to hear more about this.
I'm excited to see your benchmarks :)
Is a blockchain needed to sell unused GPU capacity?
Eventually there could be some tipping point where networks are fast enough and there are enough hosting participants it could be like a worldwide/free computing platform - not just for AI for anything.
IRL all you need is a simple platform to pay and schedule jobs on other’s GPUs.
Folding at home can track user contributions and issue micro/payments as they see fit. Crucially, this does not need an immutable chain of truth to do.
Instead, if we added a blockchain, then we would require 2 sets of participants - those who run the useful simulations for science, and those who run the useless calculations for the blockchain. A complete waste of resources.
Maybe you run a private platform too like git/GitHub if there are real world payments and user accounts, but I wonder why couldn't that technology be used? Does "blockchain" just have an irreparably bad name at this point?
- Fitting models in memory
- Inference / Training speed
8 x RTX 3090s will absolutely CRUSH a single Mac Studio in raw performance.
Also, modern GPUs are surprisingly good at throttling their power usage when not actively in use, just like CPUs. So while you need 3kW+ worth of PSU for an 8x3090 setup, it’s not going to be using anywhere near 3kW of power on average, unless you’re literally using the LLM 24x7.
Edit: I guess to directly answer your question, I don’t see why you couldn’t run a 70b model at full quality on either a M2 192GB machine or on an 8x 3090 setup.
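For the "fitting models in memory" part, the weights-only math is simple (a lower bound; KV cache and activations add more on top):

    # Approximate weight footprint of a 70B-parameter model at common quantizations.
    PARAMS = 70e9
    for name, bits in [("fp16", 16), ("q8", 8), ("q6", 6), ("q4", 4)]:
        gib = PARAMS * bits / 8 / 2**30
        print(f"{name}: ~{gib:.0f} GiB of weights")

So a q4 70B fits in ~33 GiB, while full fp16 needs ~130 GiB, which is why 192 GB of VRAM (or unified memory) is the interesting threshold.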
Running llama3.1 70B is brutal on this thing. Responses take minutes. Someone running the same model on 32GB of GPU memory seems to have far better results from what I've read.
I'm currently using reflection:70b_q4, which does a very good job in my opinion. It generates the response at 5.5 tokens/s, which is just about my reading speed.
edit: I usually don't run the larger (q6) versions because of the speed. I'd guess a 405B model would just be awfully slow.