A Gentle Introduction to Graph Neural Networks (2021)

362
33
misonic
6 months ago
distill.pub

cherryteastain
·
6 months ago
·
[ - ]

There are a lot of papers using GNNs for physics simulations (e.g. computational fluid dynamics) because the unstructured meshes used to discretize the problem domain for such applications map very neatly to a graph structure.

In practice, every such mesh/graph is used once to solve a particular problem. Hence it makes little sense to train a GNN for a specific graph. However, that's exactly what most papers did because no one found a way to make a GNN that can adjust well to a different mesh/graph and different simulation parameters. I wonder if there's a breakthrough waiting just around the corner to make such a generalization possible.

magicalhippo
·
6 months ago
·
[ - ]

Naive question:

Words in sentences kinda forms graphs, referencing other words or are leafs being referenced, both inside sentences and between sentences.

Given the success of the attention mechanism in modern LLMs, how well would they do if you trained a LLM to process an actual graph?

I guess you'd need some alternate tokenizer for optimal performance.

cherryteastain
·
6 months ago
·
[ - ]

For physics sims, I'd say it's useless.

Imagine you discretize a cube into 1000 gridpoints in each direction, that's 1000^3 = 1 billion nodes/"tokens". Plus you typically time-march some sort of equation so you need the solutions previous 3-5 timesteps as well so that's 3-5 billion tokens. If you are gonna do that in the first place, you may as well just use the traditional solver. Traditional solvers usually set up and solve a matrix equation like Ax=b with an iterative method like multigrid which is O(n) as opposed to transformer's O(n^2). It'll give you a much more accurate answer much quicker than it'll take a transformer to do attention on a sequence of length 3 billion.

The entire point of using GNNs/CNNs in this field is that people rely on their ability to make inference using local information. That means the value at each gridpoint/node can be inferred from neighbouring nodes only, which is O(n) like multigrid. Idea in most papers is that the GNN can do this faster than multigrid. Results so far are mixed, however [1].

[1] https://arxiv.org/abs/2407.07218

magicalhippo
·
6 months ago
·
[ - ]

Ah yes, for dense problems like that I wouldn't expect it to work well. The example graphs in the submission were mostly quite sparse, hence why I thought of LLMs. But perhaps that was just for illustrative purposes.

disattention
·
6 months ago
·
[ - ]

This is actually a good insight. It turns out that transformers are indeed a form of graph network, precisely because of the attention mechanism. Graph attention networks are actually a very popular GNN architecture. Generally, the issue with using an LLM style architecture for generic graphs is modeling the sparsity, but is possible by using the graph adjacency matrix to mask the attention matrix. There are a number of papers and articles which address this connection, and plenty of research into mechanisms for sparsifying attention in transformers.

There are also graph tokenizers for using more standard transformers on graphs for doing things like classification, generation, and community detection.

algo_trader
·
6 months ago
·
[ - ]

Any canonical papers on GNN for code graphs?

6gvONxR4sf7o
·
6 months ago
·
[ - ]

That's the kind of thing that I could imagine could dramatically speed up certain tasks, but not enable particularly new abilities. A ton of the challenge is converting a sequence into a graph. So if you need a huge clever model to turn a sequence into a graph, then your potential gain downstream is either a) in making certain computationally hard queries easier, or b) answering tons and tons of followup queries dramatically faster (enough to make the initial graphification overhead okay).

For (a), any imperfections in the graphification make the problem super hard and researchy.

whimsicalism
·
6 months ago
·
[ - ]

yep you're now pretty much at state-of-the-art

zmgsabst
·
6 months ago
·
[ - ]

A general graph solver has to be a general intelligence, since it would be able to successfully model category theory.

openrisk
·
6 months ago
·
[ - ]

Very high quality work, its a pity that distill.pub did not find a sustainable way forward [1].

On GNN's, the lack of datasets [2] might be a reason they are not as talked about. This is something that has affected also the semantic web domain.

[1] https://distill.pub/2021/distill-hiatus/

[2] https://huggingface.co/datasets?task_categories=task_categor...

wenc
·
6 months ago
·
[ - ]

My personal hack is that before jumping into papers in unfamiliar areas like these, I go on YouTube and search for “<topic> explained”.

The power of YouTube is that for any hot area, there are a bunch of people incentivized to make a short video that maximally engages. The quality can be quite high even at higher levels of math abstractions. The visuals are really helpful to get a feel for the abstractions (3 blue 1 brown proved this years ago).

There are some excellent videos of GNNS that in less than 10 mins gave me a launching point into the literature.

siver_john
·
6 months ago
·
[ - ]

Honestly, seeing this on the front page had my hopes up they had found out a way forward but alas.

samsartor
·
6 months ago
·
[ - ]

GNNs have been a bit of a disappointment to me. I've tried to apply them a couple times to my research but it has never worked out.

For a long time GNNs were pitched as a generalization of CNNs. But CNNs are more powerful because the "adjacency weights" (so to speak) are more meaningful: they learn relative positional relationships. GNNs usually resort to pooling, like described here. And you can output an image with a CNN. Good luck getting a GNN to output a graph. Topology still has to be decided up front, sometimes even during training. And the nail in the coffin is performance. It is incredible how slow GNNs are compared to CNNs.

These days I feel like attention has kinda eclipsed GNNs for a lot of those reasons. You can make GNNs that use attention instead of pooling, but there isn't much point. The graph is usually only traversed in order to create the mask matrix (ie attend between nth neighbors) and otherwise you are using a regular old transformer. Often you don't even need the graph adjacencies because some kind of distance metric is already available.

I'm sure GNNs are extremely useful to someone somewhere but my experience has been a hammer looking for a nail.

igorkraw
·
6 months ago
·
[ - ]

GNNs are useful at least in one case, when your data a set of atoms that define your datum through their interactions, specifically a set that is that is high cardinality (so you can't YOLO it with attention) with some notion of neighbourhood (i.e. geometry) within your set (defined by the interactions) which if maintained makes the data permutation equivariant, BUT you can't find a meaningful way to represent that geometry implicitly (for example because it changes between samples) => you YOLO it by just passing the neighourhood/interaction structure in as an input.

In almost every other case, you can exploit additional structure to be more efficient (can you define an order? sequence model. is it euclidean/riemanian? CNN or manifold aware models. no need to have global state? pointcloud networks. you have an explicit hierarchy? Unet version of your underlying modality. etc)

The reason why I find GNNs cool is that 1) they encode the very notion of _relations_ and 2) they have a very nice relationship to completely general discretized differential equations, which as a complex systems/dynamical systems guy is cool (but if you can specialize, there's again easier ways)

lmeyerov
·
6 months ago
·
[ - ]

Are you doing some regular like vision?

For the reasons you're saying, I don't think it's an accident that GNNs are popular mostly in domains like recommendations that feel graph-y for their domain model so getting to a useful topology isn't as big a leap.

A frustration for me has been more that many of these graph-y domains are about behavioral machine/people data like logs that contain a large amount of categorical dimensions. The graph part can help, but it is just as import to key on the categorical dimensions, and doing well there often end up outside of the model - random forest etc. It's easier to start with those, and then is a lot of work for the GNN part (though we & others have been trying to simplify) for "a bit more lift".

Of course, if this is your core business and this means many millions of dollars, it can be justified... but still, it's hard for most production teams. In practice, we often just do something with pygraphistry users like xgboost + umap and move on. Even getting an RGCN to perform well takes work..

energy123
·
6 months ago
·
[ - ]

Google's GraphCast is a GNN: https://deepmind.google/discover/blog/graphcast-ai-model-for...

eperfa
·
6 months ago
·
[ - ]

Google DeepMind's GenCast is based on diffusion: https://deepmind.google/discover/blog/gencast-predicts-weath...

(Partially) Google Research's/DeepMind's NeuralGCM is based on hybrid models using ODEs and learnt physics: https://www.nature.com/articles/s41586-024-07744-y

Microsoft Research's Aurora on vision transformers: https://www.microsoft.com/en-us/research/blog/introducing-au...

Huawei's Pangu Weather is also a 3D transformer I believe https://www.nature.com/articles/s41586-024-07744-y

I just wanted to highlight that there are multiple approaches in use for the same problem / in the same domain, and GNN does not seem to be the most widely used one.

stephantul
·
6 months ago
·
[ - ]

Same! I’ve seen many proposals to use a GNN for some problem for which we used a “flat” model, e.g., taking into account HTML structure when predicting labels for pages. Even when it seemingly made a lot of sense to use them, it didn’t work.

biomcgary
·
6 months ago
·
[ - ]

I've followed GNNs in biology and applied them in a couple domains, but have been disappointed by the results so far. I'm a bit surprised because I have successfully used other graph-based approaches in biology.

helltone
·
6 months ago
·
[ - ]

It seems GNNs operate on a fixed topology. What if I want to approximate some transformation of the topology of the graph? For example learning how to layout a graph, or converting program abstract syntax trees to data flow graphs.

igorkraw
·
6 months ago
·
[ - ]

The whole point of GNNs is that they generalize to arbitrary topologies by explicitly conditioning the idea of "neighbours" on the graph specifying the topology. Graph layout has been tried here https://github.com/limbo018/DREAMPlace to great fanfare although there is recent drama about it https://www.semanticscholar.org/paper/The-False-Dawn%3A-Reev... . Graph transformations are a thing as well https://arxiv.org/abs/2012.01470 but it's a tricky problem because you implicitly need to solve the graph matching problem

Xmd5a
·
6 months ago
·
[ - ]

Graph layout is extremely interesting for infographics, since a handmade graph will almost always beat what tools such as graphviz can come up with (and I'm not even mentioning algorithms that go beyond Sugiyama's for which there is only a paper).

Any progress on this front ?

FjordWarden
·
6 months ago
·
[ - ]

Maybe homology can help, it is a sort of calculus for discrete structures where you count how many N dimensional hole there are over time. Dunno about NN but that is what they can do with fMRI.

·
6 months ago
·
[ - ]

memhole
·
6 months ago
·
[ - ]

Would love to see distill come back

ziofill
·
6 months ago
·
[ - ]

It's very sad distill.pub doesn't accept new submissions... :/

esafak
·
6 months ago
·
[ - ]

(2021)

Executor
·
6 months ago
·
[ - ]

What interactive visualization software is that - D3.js?

hi_hi
·
6 months ago
·
[ - ]

I feel very dumb. There is an example on that page with 4 nodes (a,b,c,d) and it shows a total of 24 possible combinations.

What is the generalised formula for calculating this, given the number of nodes but also edges need to be considered.

It doesn't appear to be explained in the article. I think it may be a factorial?

TrueDuality
·
6 months ago
·
[ - ]

Combinatorial can be quickly calculated primarily using factorials. 4 possible options, each where you're picking all 4 exactly once is 4!. The reasoning is pretty intuitive, when you start selecting there are four options, when you go to pick the next one there are 3 left in the pool, then 2, and finally 1. This turns into 4 * 3 * 2 * 1 = 24.

This site seems to have a pretty good overview of them if you'd like to become more familiar: https://www.geeksforgeeks.org/mathematics-combinatorics-basi...

throwameme
·
6 months ago
·
[ - ]

i think one could use a binomial coefficient to do this or nested binomials

like (n choose 4)

maybe multiply the binomial by 2 because each edge can be present or absence in vertices

eachro
·
6 months ago
·
[ - ]

Is there consensus about whether gnn architectures are better than transformer based ones at this point? I am aware that transformers can be viewed as a gnn too.

Legend2440
·
6 months ago
·
[ - ]

GNNs let you build some of the structure of your problem into the network architecture. This can be useful if your problem has a well-understood structure, like physical simulations.

However in many cases we do not know the structure of our problem (that's why we want to use ML in the first place) and in these cases GNNs do not beat transformers.

leah_sun
·
6 months ago
·
[ - ]

good share

·
6 months ago
·
[ - ]