Representing Python notebooks as dataflow graphs

62
10
akshayka
3 days ago
marimo.io

tastyminerals2
·
19 minutes ago

Personally, I've had a good experience with marimo so far. Reactive execution, variable deduplication, and the enforced separation of business logic from UI logic are all good. It retrains people to write slightly better-structured Python code, which is a win in my eyes.
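For readers unfamiliar with the idea, reactive execution over a dataflow graph can be sketched roughly as follows. This is a toy illustration using only the standard library, not marimo's actual implementation: the helper names (`defs_and_refs`, `build_graph`, `downstream`) are hypothetical, scoping is ignored, and the "one defining cell per variable" rule is modelled simply by raising an error.

```python
import ast
import builtins


def defs_and_refs(source):
    """Return (names defined, names referenced) for one cell's source.

    Simplification: scoping is ignored, so names assigned inside a
    function body are treated as if they were defined by the cell.
    """
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                refs.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defs.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defs.add((alias.asname or alias.name).split(".")[0])
    # Names defined in the same cell and builtins are not dependencies.
    return defs, refs - defs - set(dir(builtins))


def build_graph(cells):
    """Map each cell index to the set of cell indices that read its names."""
    parsed = [defs_and_refs(src) for src in cells]
    owner = {}
    for i, (defs, _) in enumerate(parsed):
        for name in defs:
            if name in owner:
                # Assumed rule for this sketch: one defining cell per name.
                raise ValueError(f"{name!r} is defined in cells {owner[name]} and {i}")
            owner[name] = i
    children = {i: set() for i in range(len(cells))}
    for j, (_, refs) in enumerate(parsed):
        for name in refs:
            i = owner.get(name)
            if i is not None and i != j:
                children[i].add(j)
    return children


def downstream(children, changed):
    """Return every cell that must re-run after cell `changed` is edited."""
    seen, stack = set(), [changed]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


cells = ["x = 1", "y = x + 1", "print(y)", "z = 10"]
graph = build_graph(cells)
print(downstream(graph, 0))  # cells that depend, directly or not, on cell 0
```

Editing the first cell marks the second and third as stale, while the unrelated fourth cell is left alone. A real tool would also need a topological order for re-running the stale cells, which this sketch omits.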

getnormality
·
1 hour ago

> You have to be very disciplined to make a Jupyter notebook that is actually reproducible

This seems not necessarily very hard to me? All you have to do is keep yourself honest by actually trying to reproduce the results of the notebook when you're done:

1. Copy the notebook

2. Run from first cell in the copy

3. Check that the results are the same

4. If not the same, debug and repeat
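Step 3 above can be sketched as a small comparison script. This is a minimal illustration under stated assumptions: it treats an `.ipynb` file as plain JSON in the nbformat v4 layout, compares only plain-text outputs, and the function names (`cell_outputs`, `differing_cells`, `load_notebook`) are made up for this example.

```python
import json
from pathlib import Path


def _as_text(value):
    """nbformat stores multi-line strings as either a str or a list of str."""
    return "".join(value) if isinstance(value, list) else str(value)


def cell_outputs(notebook):
    """Return one list of plain-text outputs per code cell, in cell order."""
    results = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        texts = []
        for out in cell.get("outputs", []):
            kind = out.get("output_type")
            if kind == "stream":
                texts.append(_as_text(out.get("text", "")))
            elif kind in ("execute_result", "display_data"):
                texts.append(_as_text(out.get("data", {}).get("text/plain", "")))
            elif kind == "error":
                texts.append(f"{out.get('ename', '')}: {out.get('evalue', '')}")
        results.append(texts)
    return results


def differing_cells(original, rerun):
    """Return indices (among code cells) whose outputs differ between runs."""
    a, b = cell_outputs(original), cell_outputs(rerun)
    longest = max(len(a), len(b))
    return [i for i in range(longest)
            if i >= len(a) or i >= len(b) or a[i] != b[i]]


def load_notebook(path):
    """Read an .ipynb file from disk as a plain dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

One way to produce the re-run copy for steps 1 and 2 is `jupyter nbconvert --to notebook --execute`, after which `differing_cells(load_notebook(a), load_notebook(b))` lists the cells to debug. Outputs that legitimately vary between runs, such as timestamps or object addresses, would show up as differences and need to be judged by hand.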

What makes it hard is when the feedback loop is slow because the data is big. But not all data is big!

Another thing that might make it hard is if your execution is so chaotic that debugging is impossible because what you did and what you think you did bear no resemblance. But personally I wouldn't define rising above that state as incredible discipline. For people who suffer from that issue, I think the best help would be a command history similar to that provided by RStudio.

All that said, Marimo seems great, and I agree notebooks are dangerous if their results are trusted as much as those of fully explicit processing pipelines.

tastyminerals2
·
26 minutes ago

It may not be very hard for you, but the reproducibility numbers tell a different story. Back in the day, when we searched public repos for ML model implementations and found ipynb files in them, we skipped the repo without delving into the details. Within the company, data engineers' research notebooks were never allowed inside a repo. Experiment, yes, but rewrite it in plain Python and push.

getnormality
·
13 minutes ago

A lot of people don't put away shopping carts, but the conclusion from that isn't that putting shopping carts away requires very high discipline. (Maybe if what is meant by "very high" is "not so low that everyone will do it", which is perhaps the point)

nylonstrung
·
1 hour ago

Marimo seems really solid if you like tools like Streamlit or Observable

riedel
·
5 hours ago

Even with dataflow extensions (such as ipyflow [0]), I still struggle with the execution model of notebooks in general. I still often see people defining functions and classes in notebooks to somehow handle prototyping loops.

I would love to see DAGs like the SSA form used in compilers, with support for loop operators as well. IMHO the notebook interface would also need to adjust for that (cell indentation?). That said, the strength of notebooks shows more in document authoring, as with Quarto, which IMHO mostly conflicts with more complex control flow.

[0] https://github.com/ipyflow/ipyflow

PeterStuer
·
2 hours ago

Would you not need "volatile" markup for anything that touches a system external to Python?

probablypower
·
5 hours ago

This is well written and an interesting read, but embedding notebooks into your data pipelines smells horrible.

iamwil
·
41 minutes ago

Why does it smell horrible?

nojito
·
3 hours ago

A Marimo notebook is just a .py file.