In practice, every such mesh/graph is used once to solve a particular problem. Hence it makes little sense to train a GNN for a specific graph. However, that's exactly what most papers did because no one found a way to make a GNN that can adjust well to a different mesh/graph and different simulation parameters. I wonder if there's a breakthrough waiting just around the corner to make such a generalization possible.
Words in sentences kinda form graphs: they reference other words or are leaves being referenced, both inside sentences and between sentences.
Given the success of the attention mechanism in modern LLMs, how well would they do if you trained an LLM to process an actual graph?
I guess you'd need some alternate tokenizer for optimal performance.
Imagine you discretize a cube into 1000 gridpoints in each direction: that's 1000^3 = 1 billion nodes/"tokens". Plus you typically time-march some sort of equation, so you need the solutions from the previous 3-5 timesteps as well, which makes 3-5 billion tokens. If you are gonna do that in the first place, you may as well just use the traditional solver. Traditional solvers usually set up and solve a matrix equation like Ax=b with an iterative method like multigrid, which is O(n) as opposed to a transformer's O(n^2). It'll give you a much more accurate answer in far less time than a transformer would need to do attention on a sequence of length 3 billion.
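To put rough numbers on that, here is a back-of-envelope sketch. The function names are made up for illustration, constant factors are ignored, and "linear work" just stands in for any O(n) method such as multigrid, so only the orders of magnitude mean anything.

```python
def grid_tokens(points_per_side: int, timesteps: int) -> int:
    """Node 'tokens' for a cubic grid when several timesteps are kept."""
    return points_per_side ** 3 * timesteps


def attention_pairs(n: int) -> int:
    """Full self-attention compares every token with every token: n^2."""
    return n * n


def linear_work(n: int) -> int:
    """Stand-in for an O(n) method: each node is touched a constant number of times."""
    return n


n = grid_tokens(1000, 3)  # 1000^3 gridpoints, 3 stored timesteps
ratio = attention_pairs(n) // linear_work(n)
print(f"tokens: {n:.1e}")
print(f"attention pairs: {attention_pairs(n):.1e}")
print(f"quadratic / linear: {ratio:.1e}")
```

With 3 stored timesteps the gap between the quadratic and linear counts is itself a factor of about 3 billion, before any constants are considered.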
The entire point of using GNNs/CNNs in this field is that people rely on their ability to make inferences using local information. That means the value at each gridpoint/node can be inferred from neighbouring nodes only, which is O(n) like multigrid. The idea in most papers is that the GNN can do this faster than multigrid. Results so far are mixed, however [1].
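A toy sketch of why a purely local update scales linearly. This is not a GNN, just neighbour averaging on a 1D chain, and `local_update` and `chain` are hypothetical helpers. It counts how many neighbour values get read in one pass.

```python
def local_update(values, neighbours):
    """One local pass: each node becomes the mean of itself and its neighbours.

    Returns the new values and the number of neighbour reads performed.
    """
    reads = 0
    out = []
    for i, nbrs in enumerate(neighbours):
        acc = values[i]
        for j in nbrs:
            acc += values[j]
            reads += 1
        out.append(acc / (len(nbrs) + 1))
    return out, reads


def chain(n):
    """Adjacency list of a 1D chain: node i touches i-1 and i+1 where they exist."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]


for n in (10, 100, 1000):
    _, reads = local_update([0.0] * n, chain(n))
    print(n, reads)  # reads grows as 2n - 2, i.e. linearly in n
```

On a mesh with bounded node degree the same argument holds: one pass costs roughly (number of nodes) + (number of edges).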
There are also graph tokenizers for using more standard transformers on graphs for doing things like classification, generation, and community detection.
For (a), any imperfections in the graphification make the problem super hard and researchy.
On GNNs, the lack of datasets [2] might be a reason they are not as talked about. This is something that has also affected the semantic web domain.
[1] https://distill.pub/2021/distill-hiatus/
[2] https://huggingface.co/datasets?task_categories=task_categor...
The power of YouTube is that for any hot area, there are a bunch of people incentivized to make a short video that maximally engages. The quality can be quite high even at higher levels of mathematical abstraction. The visuals are really helpful for getting a feel for the abstractions (3Blue1Brown proved this years ago).
There are some excellent videos on GNNs that in less than 10 mins gave me a launching point into the literature.
For a long time GNNs were pitched as a generalization of CNNs. But CNNs are more powerful because the "adjacency weights" (so to speak) are more meaningful: they learn relative positional relationships. GNNs usually resort to pooling, as described here. And you can output an image with a CNN. Good luck getting a GNN to output a graph. Topology still has to be decided up front, sometimes even during training. And the nail in the coffin is performance. It is incredible how slow GNNs are compared to CNNs.
These days I feel like attention has kinda eclipsed GNNs for a lot of those reasons. You can make GNNs that use attention instead of pooling, but there isn't much point. The graph is usually only traversed in order to create the mask matrix (i.e. attend between nth neighbors) and otherwise you are using a regular old transformer. Often you don't even need the graph adjacencies because some kind of distance metric is already available.
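For what "traverse the graph only to build the mask" could look like, here is a minimal sketch. It assumes an adjacency-list graph and a plain boolean mask. `k_hop_mask` is a made-up helper, not any library's API, and how the mask gets fed into an attention layer is left out.

```python
from collections import deque


def k_hop_mask(adjacency, k):
    """mask[i][j] is True when node j is within k hops of node i (i included)."""
    n = len(adjacency)
    mask = [[False] * n for _ in range(n)]
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search, cut off at depth k
            node = queue.popleft()
            if dist[node] == k:
                continue
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        for node in dist:
            mask[source][node] = True
    return mask


# Path graph 0 - 1 - 2 - 3, attending up to 2nd neighbours.
adjacency = [[1], [0, 2], [1, 3], [2]]
mask = k_hop_mask(adjacency, 2)
for row in mask:
    print(["x" if allowed else "." for allowed in row])
```

Once the mask exists, the graph is no longer needed. The rest is ordinary masked attention, which is the point being made above.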
I'm sure GNNs are extremely useful to someone somewhere but my experience has been a hammer looking for a nail.
In almost every other case, you can exploit additional structure to be more efficient (can you define an order? sequence model. is it Euclidean/Riemannian? CNN or manifold-aware models. no need to have global state? pointcloud networks. you have an explicit hierarchy? U-Net version of your underlying modality. etc.)
The reason why I find GNNs cool is that 1) they encode the very notion of _relations_ and 2) they have a very nice relationship to completely general discretized differential equations, which as a complex systems/dynamical systems guy is cool (but if you can specialize, there are again easier ways).
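One way to see point 2) is the sketch below. It is an illustration with hand-written (not learned) messages, and it picks the graph heat equation du_i/dt = sum over neighbours j of (u_j - u_i) as an assumed example. An explicit Euler step of that equation has the same shape as one round of message passing: compute a message per edge, sum per node, update.

```python
def diffusion_step(u, edges, dt):
    """Explicit Euler step of du_i/dt = sum_j (u_j - u_i) over neighbours j."""
    messages = [0.0] * len(u)
    for i, j in edges:  # undirected edge: send a message in both directions
        messages[i] += u[j] - u[i]
        messages[j] += u[i] - u[j]
    # Node update: old state plus the aggregated messages scaled by the timestep.
    return [ui + dt * mi for ui, mi in zip(u, messages)]


edges = [(0, 1), (1, 2), (2, 3)]  # path graph 0 - 1 - 2 - 3
u = [1.0, 0.0, 0.0, 0.0]          # all the "heat" starts on node 0
u1 = diffusion_step(u, edges, 0.1)
print(u1, sum(u1))
```

Swap the fixed message `u[j] - u[i]` for a learned function of both node states and you get, roughly, a message-passing layer. The total is conserved here because every edge's two messages cancel.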
For the reasons you're saying, I don't think it's an accident that GNNs are popular mostly in domains like recommendations that feel graph-y for their domain model so getting to a useful topology isn't as big a leap.
A frustration for me has been more that many of these graph-y domains are about behavioral machine/people data like logs that contain a large number of categorical dimensions. The graph part can help, but it is just as important to key on the categorical dimensions, and doing well there often ends up outside of the model: random forest etc. It's easier to start with those, and then it is a lot of work for the GNN part (though we & others have been trying to simplify) for "a bit more lift".
Of course, if this is your core business and this means many millions of dollars, it can be justified... but still, it's hard for most production teams. In practice, we often just do something with pygraphistry users like xgboost + umap and move on. Even getting an RGCN to perform well takes work.
(Partially) Google Research's/DeepMind's NeuralGCM is based on hybrid models using ODEs and learnt physics: https://www.nature.com/articles/s41586-024-07744-y
Microsoft Research's Aurora on vision transformers: https://www.microsoft.com/en-us/research/blog/introducing-au...
Huawei's Pangu-Weather is also a 3D transformer, I believe: https://www.nature.com/articles/s41586-024-07744-y
I just wanted to highlight that there are multiple approaches in use for the same problem / in the same domain, and GNN does not seem to be the most widely used one.
Any progress on this front?
What is the generalised formula for calculating this, given that both the number of nodes and the edges need to be considered?
It doesn't appear to be explained in the article. I think it may be a factorial?
like (n choose 4)
maybe multiply the binomial by 2 because each edge can be present or absent between vertices
However, in many cases we do not know the structure of our problem (that's why we want to use ML in the first place), and in these cases GNNs do not beat transformers.