Either kernel hackers unexpectedly love frontend, or more likely the people that write the code don't overlap much with the people that star Github projects!
They are similar because they are popular, not because there is a semantic relationship.
It's the same problem I faced with the map of reddit (https://anvaka.github.io/map-of-reddit/ ) - all popular subreddits are just "similar" to each other.
Still works great for smaller, non-celebrity projects :D
Wonder how you’d implement that in a heat map. Just call each pixel a document and see where it takes you?
A tf*idf matrix could be applied to the star-feature matrix too. Document = GitHub repo. Term = name of user who starred it.
Thus, users who overstar are simply less important for computing similarities.
This would mitigate the phenomenon of massively popular GitHub repos being clustered together because of folks who blithely star the most well-known stuff.
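To make the tf*idf idea concrete, here is a minimal sketch, not what the map actually does. It assumes a toy `stars` dict (repo -> set of usernames, all made up), binary term frequency, and idf = log(N / df). A user who stars every repo gets a weight of zero and stops contributing to similarity.

```python
import math
from collections import defaultdict

def tfidf_vectors(stars):
    """Map each repo to {user: idf weight}; tf is binary (starred or not)."""
    n_repos = len(stars)
    # Document frequency: how many repos each user has starred.
    df = defaultdict(int)
    for users in stars.values():
        for user in users:
            df[user] += 1
    # Users who star everything get log(N / N) = 0.
    return {
        repo: {user: math.log(n_repos / df[user]) for user in users}
        for repo, users in stars.items()
    }

def cosine(a, b):
    """Cosine similarity between two sparse {user: weight} vectors."""
    dot = sum(w * b[user] for user, w in a.items() if user in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical data: "stargazer" stars everything.
stars = {
    "linux": {"alice", "bob", "stargazer"},
    "git": {"alice", "bob", "erin", "stargazer"},
    "react": {"carol", "dave", "stargazer"},
}
vectors = tfidf_vectors(stars)
linux_git = cosine(vectors["linux"], vectors["git"])
linux_react = cosine(vectors["linux"], vectors["react"])
```

With this toy data, linux and react share only the user who stars everything, so their similarity drops to zero, while linux and git stay related through alice and bob.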
Also, interesting how both Bevy and Veloren are in Rustland. Probably, the stars come more from the Rust community than the game dev community. Which I guess makes sense: the Rust ecosystem is still relatively small and feels like a lot of people doing X but in Rust.
Aiming to redo it some time in early 2025!
https://anvaka.github.io/map-of-github/#12/24.78947/18.85186
The fact that julialang/julia ended up near tensorflow and opencv, and actual Julia packages ended up elsewhere, probably reflects a difference between aspirational users and real users: a lot of people who starred the Julia project itself were numeric Python users who were looking for a new Python, but then mostly stuck to Python itself, so their other stars are in the numeric Python land. Those who starred the JuliaLand packages are the actual Julia users who aptly enough ended up near Moleculandia and AstroSpace and Quantumia.
That explanation sounds very plausible.
I like diagrams where the axes mean something. Lines, shape, boxes/groups, distance, X vs Y, colour, thickness, texture, background, foreground. I also like simple. So often it’s lines to be fancy with no meaning. This one is just a pic, with some grouping, and it has personality. Yay?
(Still love lines, just not everywhere always.)
Hm... unless maybe we do some sort of quantum clustering, which could be a fun project to explore!
It's a bit hazy now, but I remember trying the hdbscan algorithm (hierarchical clustering), and on a graph the size of GitHub I just couldn't fit it in memory.
I did end up using something similar to hierarchical clustering (mix of louvain/leiden/my own), and that's what we see in the final map.
Basically what others are guessing, lines represent the highest similarity scores based on "stargazers", which also forms the entire map. To anyone confused, the lines only appear once you click into a specific country.
And then some clustering algorithm makes sense of this giant graph by laying out sets of nodes that have a lot of edges to each other close to each other.
The closeness is just layout; the edges are the data structure that determines closeness.
Remember that feeling when deploying algorithms, especially when those affect people (which hopefully is not the case with this nice project).
A mechanism to explain how specific results came about is as much part of the project as the more technical machine learning choices involved.
A while back ngraph blew my mind. I built a taxonomy biz off ngraph:
That link you've shared - doesn't open for some reason
I found the (awesome) video where he presents ngraph: https://youtu.be/vZ6Yhlxv7Os
Edit: not loading? surge.sh has been less reliable lately, will get to finish that project some day and will publish elsewhere.
Does this mean that HTMX is mostly used by Django devs?
Ditto if you develop Django, you star Python libraries, not Django downstream plugins.
That might be the case, but the libraries seem to be more reusable!
2023
It was somewhat amusing that MicroPython isn't in MicroPythonia but Arduinoria...and CircuitPython is in PicoPythonia. :)
The only problem I see is that projects don't fit so nicely in the division between languages (Pythonia, Javaland, Clojuria, etc) and applications (Gamedonia, AILandia, etc). There's a lot of intersection between them.
But the visualization is super-cool nonetheless. :)
And as usual, important libraries don't get as much attention as flashy little leaf projects.
I tried something similar a few weeks ago, using the embedding vectors of the GitHub project descriptions.
Using what inputs? The repo seems to have only the frontend code.
I tried quite a few various similarity metrics, and Jaccard was giving me the best results. This is all very subjective, of course.
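For anyone unfamiliar with Jaccard: it is just the size of the intersection over the size of the union of the two stargazer sets. A rough sketch below, with made-up usernames and a hypothetical `top_neighbors` helper; the real pipeline surely differs in the details.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def top_neighbors(stars, repo, k=2):
    """Return the k repos most similar to `repo`, best first (hypothetical helper)."""
    scored = [
        (jaccard(stars[repo], users), other)
        for other, users in stars.items()
        if other != repo
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]

# Hypothetical stargazer sets.
stars = {
    "repo_a": {"alice", "bob", "carol"},
    "repo_b": {"bob", "carol", "dave"},
    "repo_c": {"erin"},
}
score_ab = jaccard(stars["repo_a"], stars["repo_b"])  # 2 shared / 4 total
neighbors = top_neighbors(stars, "repo_a", k=1)
```

Here repo_a and repo_b share two of four distinct stargazers, so they score 0.5, and repo_b is the nearest neighbor of repo_a.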
I love this sort of concept map and I am typically disappointed by the execution.
Definitely some unique naming choices there lol
Add to that the support for type annotations that can go all the way from fully untyped and dynamic, to runtime-enforced primitive constraints and object types, and you'll end up with a very good choice for web applications that evolve quickly.