I didn't write a single line of code.
Of course no-code doesn't mean no-engineering. This project took a lot more manual labor than I'd hoped!
I wrote a deep dive on the workflow and some thoughts about the future of AI coding and creativity:
It's genuinely astonishing how much clearer this is than a traditional satellite map -- how it has just the right amount of complexity. I'm looking at areas I've spent a lot of time in, and getting an even better conceptual understanding of the physical layout than I've ever been able to get from satellite (technically airplane) images. This hits the perfect "sweet spot" of detail with clear "cartoon" coloring.
I see a lot of criticism here that this isn't "pixel art", so maybe there's some better term to use. I don't know what to call this precise style -- it's almost pixel art without the pixels? -- but I love it. Serious congratulations.
But I didn't want to call it a "SimCity" map, though that's really the vibe/inspiration I wanted to capture, because that implies a lot of other things, so I used the term "pixel art" even though I figured it might get a lot of (valid) pushback...
In general, labels and genres are really hard - "techno" to a deep head in Berlin is very different than "techno" to my cousins. This issue has always been fraught, because context and meaning and technique are all tangled up in these labels which are so important to some but so easily ignored by others. And those questions are even harder in the age of AI where the machine just gobbles up everything.
But regardless, it was a fun project! And to me at least it's better to just make cool ambitious things in good faith and understand that art by definition is meaningful and therefore makes people feel things from love to anger to disgust to fascination.
Feels like something is missing... maybe just a pixelation effect over the actual result? Seems like a lot of the images also lack continuity (something they go over in the article)
Overall, such a cool project that blends art and AI well.
It's very cool and I don't mind the use of AI at all but I think calling it pixel art is just very misleading. It's closer to a filter but not quite that either.
It kind of looks like a Google Sketchup render that someone then went and used the Photoshop Clone and Patch tools on in arbitrary ways.
Doesn’t really look anything like pixel art at all. Because it isn’t.
Otherwise every digital image could be classified as pixel art.
This person shares lots of authentic-looking AI-generated pixel art. Their examples should help give the buildings a more realistic pixel art look.
Edit: example showing small houses (found by searching their feed for buildings): https://x.com/RealAstropulse/status/2004195065443549691
At some point I couldn't fiddle with the style / pipeline anymore and just had to roll with "looks ok to me" - the whole point of the project wasn't to faithfully adhere to a style but to push the limits of new technology and learn a lot along the way
> I spent a decade as an electronic musician, spending literally thousands of hours dragging little boxes around on a screen. So much of creative work is defined by this kind of tedious grind. ... This isn't creative. It's just a slog. Every creative field - animation, video, software - is full of these tedious tasks. Of course, there’s a case to be made that the very act of doing this manual work is what refines your instincts - but I think it’s more of a “Just So” story than anything else. In the end, the quality of art is defined by the quality of your decisions - how much work you put into something is just a proxy for how much you care and how much you have to say.
Great insights here, thanks for sharing. That opening question really clicked for me.
I agree that "push button get image" AI generation is at best a bit cheap, at worst deeply boring. Art is resistance in a medium - but at what point is that resistance just masochism?
Georges Perec took this idea to the max when he wrote an entire novel without the letter "E" - in French! And then someone had the audacity to translate it to English (e excluded)! Would I ever want to do that? Hell no, but I'm very glad to live in a world where someone else is crazy enough to.
I've spent my 10,000 hours making "real" art and don't really feel the need to justify myself - but to all of the young people out there who are afraid to play with something new because some grumps on hacker news might get upset:
It doesn't matter what you make or how you make it. What matters is why you make it and what you have to say.
Also, does someone have an intuition for how the "masking" process worked here to generate seamless tiles? I sort of grok it but not totally.
Reference image from the article: https://cannoneyed.com/img/projects/isometric-nyc/training_d...
You have to zoom in, but here the inputs on the left are mixed pixel art / photo textures. The outputs on the right are seamless pixel art.
Later on he talks about 2x2 squares of four tiles each as input and having trouble automating input selection to avoid seams. So with his 512x512 tiles, he's actually sending in 1024x1024 inputs. You can avoid seams if every new tile can "see" all its already-generated neighbors.
You get a seam when you generate a new tile next to an old tile that isn't part of the input to the infill algorithm. The new tile can't see that boundary, so the style probably won't match.
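A minimal sketch of that ordering constraint (the raster order and 2x2 window here are illustrative, not the author's actual pipeline): if tiles are generated left-to-right, top-to-bottom, each new tile's 2x2 context window always contains its already-finished west, north, and north-west neighbors.

```python
# Illustrative sketch (not the author's code): raster-order generation
# with a 2x2-tile context window means every new tile can "see" its
# already-generated neighbors, so no shared edge is generated blind.

def generation_plan(cols, rows):
    """For each tile (c, r) in raster order, list the already-generated
    tiles inside its 2x2 context window (columns c-1..c, rows r-1..r,
    with the new tile sitting at the window's bottom-right)."""
    done = set()
    plan = []
    for r in range(rows):
        for c in range(cols):
            window = {(c + dc, r + dr) for dc in (-1, 0) for dr in (-1, 0)}
            plan.append(((c, r), sorted(window & done)))
            done.add((c, r))
    return plan

plan = dict(generation_plan(3, 2))
print(plan[(1, 1)])  # → [(0, 0), (0, 1), (1, 0)]: west, north, north-west
```

The east and south edges of each tile get covered later, when the neighbor on that side is generated with this tile inside its own window - so every internal boundary eventually appears in some model input, which is what keeps the style continuous.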
More interestingly, not even the biggest, smartest image models can tell whether a seam exists or not (likely due to the way they represent image tokens internally).
The issue is that the overall style was not consistent from tile to tile, so you'd see some drift, particularly in the color - and you can see it in quite a few places on the map because of this.
Would you mind sharing a ballpark estimate?
Firefox, Ubuntu latest.
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://isometric-nyc-tiles.cannoneyed.com/dzi/tiles_metadat.... (Reason: CORS header ‘Access-Control-Allow-Origin’ missing). Status code: 429.
Edit: I see now, the error is due to the Cloudflare worker being rate limited :/ I read the writeup though, pretty cool, especially the insight about tool -> lib -> application
- Chromium: Failed to load tiles: Failed to fetch
- Zen: Failed to load tiles: NetworkError when attempting to fetch resource.
I know you'll get flak for the agentic coding, but I think it's really awesome you were able to realize an idea that otherwise would've remained relegated to "you know what'd be cool.." territory. Also, just because the activation energy to execute a project like this is lower doesn't mean the creative ceiling isn't just as high as before.
Maybe, though a guy did physically carve/sculpt the majority of NYC: https://mymodernmet.com/miniature-model-new-york-minninycity...
That being said I have three kids (one a newborn) - there's no possible way I could have done this in the before times!
Granted, it was a team effort, but that's a lot more laborious than a pixel-art view.
New York City is being recreated at 1:1 scale inside Minecraft
We have a blog post on a similar workflow here: https://www.oxen.ai/blog/how-we-cut-inference-costs-from-46k...
On the inference cost and speed: we're actively working on that and have a pretty massive upgrade there coming soon.
It would be neat if you could click and drag to select an area to inpaint. Let's see everyone's new Penn Station designs!
Would guess it'd have to be BYOK but it works pretty well:
https://i.imgur.com/EmbzThl.jpeg
Much better than trying to inpaint directly on Google Earth data
I especially appreciated the deep dive on the workflow and challenges. It's the best generally accessible explication I've yet seen of the pros and cons of vibe coding an ambitious personal project with current tooling. It gives a high-level sense of "what it's generally like" with enough detail and examples to be grounded in reality while avoiding slipping into the weeds.
Gemini 3.5 Pro reverse engineered it - if you use the code at the following gist, you can jump to any specific lat lng :-)
https://gist.github.com/gregsadetsky/c4c1a87277063430c26922b...
also, check out https://cannoneyed.com/isometric-nyc/?debug=true ..!
---
code below (copy & paste into your devtools, change the lat lng on the last line):
const calib = {
  p1: { pixel: { x: 52548, y: 64928 }, geo: { lat: 40.75145020893891, lng: -73.9596826628078 } },
  p2: { pixel: { x: 40262, y: 51982 }, geo: { lat: 40.685498640229675, lng: -73.98074283976926 } },
  p3: { pixel: { x: 45916, y: 67519 }, geo: { lat: 40.757903901085726, lng: -73.98557060196454 } },
};

// Solve pixel = A * lat + B * lng + C (separately for x and y) from the
// three calibration points via Cramer's rule.
function getAffineTransform() {
  const { p1, p2, p3 } = calib;
  const det =
    p1.geo.lat * (p2.geo.lng - p3.geo.lng) -
    p2.geo.lat * (p1.geo.lng - p3.geo.lng) +
    p3.geo.lat * (p1.geo.lng - p2.geo.lng);
  if (det === 0) {
    console.error("Points are collinear, cannot solve.");
    return null;
  }
  const Ax = (p1.pixel.x * (p2.geo.lng - p3.geo.lng) - p2.pixel.x * (p1.geo.lng - p3.geo.lng) + p3.pixel.x * (p1.geo.lng - p2.geo.lng)) / det;
  const Bx = (p1.geo.lat * (p2.pixel.x - p3.pixel.x) - p2.geo.lat * (p1.pixel.x - p3.pixel.x) + p3.geo.lat * (p1.pixel.x - p2.pixel.x)) / det;
  const Cx = (p1.geo.lat * (p2.geo.lng * p3.pixel.x - p3.geo.lng * p2.pixel.x) - p2.geo.lat * (p1.geo.lng * p3.pixel.x - p3.geo.lng * p1.pixel.x) + p3.geo.lat * (p1.geo.lng * p2.pixel.x - p2.geo.lng * p1.pixel.x)) / det;
  const Ay = (p1.pixel.y * (p2.geo.lng - p3.geo.lng) - p2.pixel.y * (p1.geo.lng - p3.geo.lng) + p3.pixel.y * (p1.geo.lng - p2.geo.lng)) / det;
  const By = (p1.geo.lat * (p2.pixel.y - p3.pixel.y) - p2.geo.lat * (p1.pixel.y - p3.pixel.y) + p3.geo.lat * (p1.pixel.y - p2.pixel.y)) / det;
  const Cy = (p1.geo.lat * (p2.geo.lng * p3.pixel.y - p3.geo.lng * p2.pixel.y) - p2.geo.lat * (p1.geo.lng * p3.pixel.y - p3.geo.lng * p1.pixel.y) + p3.geo.lat * (p1.geo.lng * p2.pixel.y - p2.geo.lng * p1.pixel.y)) / det;
  return { Ax, Bx, Cx, Ay, By, Cy };
}

function jumpToLatLng(lat, lng) {
  const t = getAffineTransform();
  if (!t) return;
  const x = Math.round(t.Ax * lat + t.Bx * lng + t.Cx);
  const y = Math.round(t.Ay * lat + t.By * lng + t.Cy);
  console.log(`Jumping to Geo: ${lat}, ${lng}`);
  console.log(`Calculated Pixel: ${x}, ${y}`);
  localStorage.setItem("isometric-nyc-view-state", JSON.stringify({ target: [x, y, 0], zoom: 13.95 }));
  window.location.reload();
}
jumpToLatLng(40.757903901085726,-73.98557060196454);

100 people built this in 1964: https://queensmuseum.org/exhibition/panorama-of-the-city-of-...
One person built this in the 21st century: https://gothamist.com/arts-entertainment/truckers-viral-scal...
AI certainly let you do it much faster, but it’s wrong to write off doing something like this by hand as impossible when it has actually been done before. And the models built by hand are the product of genuine human creativity and ingenuity; this is a pixelated satellite image. It’s still a very cool site to play around with, but the framing is terrible.
I wonder if for almost any bulk inference / generation task, it will generally be dramatically cheaper to (use fancy expensive model to generate examples, perhaps interactively with refinements) -> (fine tune smaller open-source model) -> (run bulk task).
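A back-of-envelope cost model makes the intuition concrete. All prices and counts below are made-up placeholders, not numbers from the project or the linked blog post:

```python
# Hypothetical cost comparison for the distill-then-bulk pattern:
# generate a small curated set with an expensive "teacher" model,
# fine-tune a cheap "student" model, then run the bulk job on the student.

def distill_cost(n_examples, n_bulk, teacher_price, student_price, finetune_fee):
    """Teacher generates the training examples, a flat fine-tuning fee
    is paid once, and the student handles all bulk generations."""
    return n_examples * teacher_price + finetune_fee + n_bulk * student_price

def direct_cost(n_bulk, teacher_price):
    """Run the entire bulk job on the teacher directly."""
    return n_bulk * teacher_price

# Placeholder numbers: 2,000 examples at $0.10 each, a $500 fine-tune,
# then 500,000 tiles at $0.002 each on the student.
print(distill_cost(2_000, 500_000, 0.10, 0.002, 500))  # → 1700.0
print(direct_cost(500_000, 0.10))                      # → 50000.0
```

The crossover point depends on the fine-tuning fee and how many curated examples you need, but once the bulk term dominates - as it does at map-scale tile counts - the fine-tune route wins by a wide margin.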
Interestingly enough, the model could NOT learn how to reliably generate trees or water no matter how much data and/or strategies I threw at it...
This to me is the big failure mode of fine-tuning - it's practically impossible to understand what will work well and what won't and why
- the way they represent image tokens isn't conducive to this kind of task
- text-to-image space is actually quite finicky, it's basically impossible to describe to the model what trees ought to look like and have them "get it"
- there's no reliable way to few-shot prompt these models for image tasks yet (!!)
Oh man...
Upvote for the cool thing I haven’t seen before but cancelled out by this sentiment. Oof.
That's not to say they're not very important issues! They are, and I think it's reasonable to have strong opinions here because they cut to the core of how people exist in the world. I was a musician for my entire 20s - trust me that I deeply understand the precarity of art in the age of the internet, and I can deeply sympathize with people dealing with precarity in the age of AI.
But I also think it's worth being excited about the birth of a fundamentally new way of interacting with computers, and for me, at this phase in my life, that's what I want to write and think about.
You get your votes back from me.
Edit: this submission has a few links that could be what I had in mind but most of them no longer work: https://news.ycombinator.com/item?id=2282466
I am especially impressed with the “i didn’t write a single line of code” part, because I was expecting it to be janky or slow on mobile, but it feels blazing fast just zooming around different areas.
And it is very up to date too, as I found a building across the street from me that got finished only last year being present.
I found a nitpicky error though: in downtown Brooklyn, where Cadman Plaza Park is, your website makes it look like there is a large rectangular body of water (e.g., a pool or a fountain). In reality, there is no water at all, it is just a concrete slab area.
I don't think there are enough artists in the world to achieve this in a reasonable amount of time (1-5 years) and you're probably looking at a $10M cost?
Part of me wonders if you put a kickstarter together if you could raise the funds to have it hand drawn but no way the very artists you hire wouldn't be tempted to use AI themselves.
One thing I would suggest is to also post-process the pixel art with something like this tool to have it be even sharper. The details fall off as you get closer, but running this over larger patch areas may really drive the pixel art feel.
> If you can push a button and get content, then that content is a commodity. Its value is next to zero.
> Counterintuitively, that’s my biggest reason to be optimistic about AI and creativity. When hard parts become easy, the differentiator becomes love.
Love that. I've been struggling to succinctly put that feeling into words, bravo.
I expect artists will experiment with the new tools and produce incredibly creative works with them, far beyond the quality I can produce by typing in "a pelican riding a bicycle".
Very cool work and great write up.
Cool project!
I actually have a nice little water shader that renders waves on the water tiles via a "depth mask", but my fine-tunes for generating the shader mask weren't reliable enough and I'd spent far too much time on the project to justify going deeper. Maybe I'll try again when the next generation of smarter, cheaper models get released.
Reminds me of https://queensmuseum.org/exhibition/panorama-of-the-city-of-...
SF/Mountain View etc don't even have one! you get a little piece of the NYC brand just for you!
All told I probably put in less than 20 hours of actual software engineering work, though, which consisted entirely of writing specs and iterating with various coding agents.
Since the output is so cool and generally interesting, there might be an opportunity for those forking this to do other cities to deploy a web app to crowd source identifying broken tiles and maybe classifying the error or even providing manual hinting for the next run. It takes a village to make a (sim) city! :-)
You probably need to adjust how caching is handled with this.
I too have been giving Cloudflare $5 for a while now :D
Makes me feel insane that we're passing this off as art now.
To me, the appeal of pixel art is that each pixel looks deliberately placed, with clever artistic tricks to circumvent the limitations of the medium. For instance, look at the piano keys here [1]. They deliberately lack the actual groupings of real piano keys (since that wouldn't be feasible to render at this scale), but are asymmetrically spaced in their own way to convey the essence of a keyboard. It's the same sort of cleverness that goes into designing LEGO sets.
None of these clever tricks are apparent in the AI-generated NYC.
On another note, a big appeal of pixel art for me is the sheer amount of manual labor that went into it. Even if AI were capable of rendering pixel art indistinguishable from [0] or [1], I'm not sure I'd be impressed. It would be like watching a humanoid robot compete in the Olympics. Sure, a Boston Dynamics bot from a couple years in the future will probably outrun Usain Bolt and outgymnast Simone Biles, but we watch Bolt and Biles compete because their performance represents a profound confluence of human effort and talent. Likewise, we are extremely impressed by watching human weightlifters throw 200kg over their heads but don't give a second thought to forklifts lifting 2000kg or 20000kg.
OP touches on this in his blog post [2]:
I spent a decade as an electronic musician, spending literally thousands of hours dragging little boxes around on a screen. So much of creative work is defined by this kind of tedious grind. [...] This isn't creative. It's just a slog. Every creative field - animation, video, software - is full of these tedious tasks. In the end, the quality of art is defined by the quality of your decisions - how much work you put into something is just a proxy for how much you care and how much you have to say.
I would argue that in some cases (e.g. pixel art), the slog is what makes the art both aesthetically appealing (the deliberately placed nature of each pixel is what defines the aesthetic) and awe-inspiring (the slog represents an immense amount of sustained focus).

[0] https://platform.theverge.com/wp-content/uploads/sites/2/cho...
[1] https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2Fu...
But I didn't want to call it a "SimCity" map, though that's really the vibe/inspiration I wanted to capture, because that implies other things, so I used the term "pixel art" even though I knew it'd get a lot of (valid) pushback...
As with all things art, labels are really difficult and the context / meaning / technique is at once completely tied to genre but also completely irrelevant. Think about the label "techno" - the label is deeply meaningful and subtle to some and almost meaningless to others
It's as if NYC was built in Transport Tycoon Deluxe.
I'll be honest, I've been pretty skeptical about AI and agentic coding for real-life problems and projects. But this one seems like the final straw that'll change my mind.
Thanks for making it, I really enjoy the result (and the educational value of the making-of post)!
At first I thought this was someone working thousands of hours putting this together, and I thought: I wonder if this could be done with AI…
Personally I'm extremely excited about all of the creative domains that this technology unlocks, and also extremely saddened/worried about all of the crafts it makes obsolete (or financially non-viable)...
[1] https://files.catbox.moe/1uphaw.png
This is a fairly cool and novel application of generative AI[2], but it did not generate pixel art and it's still wildly incoherent slop when you examine it closely. This mostly works because it uses scale to obfuscate the flaws; users are expected to be zoomed out and not looking at the details. But the details are what makes art work. You could not sell a game or an animation like this. This is not replacing anybody.
[2] It's also wholly unrepresentative of general use-cases. 99.99999999% of generative AI usage does not involve a ton of manual engineering labour fine-tuning a model and doing the things you did to get this set up. Even with all of that effort, what you've produced here is never replacing a commercially viable pixel artist. The rest of the world slapping a prompt into an online generator is even further away from doing that.
If you don’t see these tools as a way for ALL of us to more-intimately reach more of our intended audiences,
whether as a musician, marketer, small business, whatever,
then I don’t know if you were really passionate or excited about what you were doing in the first place.