This seems not necessarily very hard to me? All you have to do is keep yourself honest by actually trying to reproduce the results of the notebook when you're done:
1. Copy the notebook
2. Run from first cell in the copy
3. Check that the results are the same
4. If not the same, debug and repeat
What makes it hard is when the feedback loop is slow because the data is big. But not all data is big!
Another thing that might make it hard is if your execution is so chaotic that debugging is impossible because what you did and what you think you did bear no resemblance. But personally I wouldn't define rising above that state as incredible discipline. For people who suffer from that issue, I think the best help would be a command history similar to that provided by RStudio.
All that said, Marimo seems great and I agree notebooks are dangerous if their results are trusted equally as fully explicit processing pipelines.
I would love to see DAGs like in SSA form of compilers, that also supports loop operators. However, IMHO also the notebook interface needs to adjust for that (cell indentation ?). However, the strength of notebooks rather shows in document authoring like quarto, which IMHO mostly contradicts more complex controll flow.