Jupyter Notebooks as E2E Tests

(rakhim.exotext.com)

81 points | by freetonik 7 months ago

39 comments

  • tpoacher 7 months ago

    I really don't get (jupyter) notebooks. It always feels like an over-engineered solution to a simple problem. The only advantage they have over 'proper' code generating a 'proper' report is being able to modify code directly on the report (i.e. as opposed to having to change the code in a separate file instead). Which sounds good and all, but this is rarely what I see notebooks actually used for. For the most part, people use them as an IDE, and a notebook is a terrible choice of IDE compared to even something as simple as editing in nano. And the disadvantages of using one generally completely outweigh the benefit of live-editing in the first place, and encourage all sorts of bad programming habits / hacky code structure.

    Whenever I've required a report which intermingles code, text, and outputs, a simple bash-script processing my (literate programming) codebase has always done a far more impressive job of generating a useful report. And, without having to sacrifice proper packaging / code-organisation semantics just for the sake of a report.

    I find it a big shame that the iodide / pyodide notebooks didn't take off in the same way, at least. Those seemed tightly integrated to html in a way that was very elegant.

    (they're not completely gone though, it was nice to see this example on the front page earlier: https://news.ycombinator.com/item?id=42425489 )

    • cbondurant 7 months ago

      The main place that I find notebooks useful is for when I want to hack away at some decently sized dataset and poke around at its details.

      A lot of transformations can be very time-consuming to run, and the ability to cache (for lack of a better word) the computation without having to write to disk, or to use the Python REPL (which is very obnoxious for anything that extends past a single line of code), really speeds the process up.

      An example would be: pull a large JSON from a server, then break out a new code block for each manipulation of that JSON. This lets you prototype against the object you pulled from the server without having to worry about, among other things:

      - how slow reading it from the server is

      - writing it to disk and then reading it back, to work around that slowness

      - the latency of parsing the file on each run of your script

      ...the list goes on.

      These aren't things you want to worry about when your current questions are "what does the structure of this look like" and "what are some of the basic statistical properties of this data". Notebooks are like the Python REPL with the benefit of a proper multi-line text input.
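      That workflow might look like this (a minimal sketch; the fetch function and the JSON payload are stand-ins for the real, slow server call):

```python
import json
import time

# "cell 1": the slow part -- run once; the parsed object stays in kernel memory
def fetch_big_json():
    time.sleep(0.1)  # stand-in for a slow network read
    return json.loads('{"users": [{"id": 1, "score": 3.5}, {"id": 2, "score": 4.1}]}')

data = fetch_big_json()

# "cell 2": cheap, repeatable exploration against the cached object --
# re-run this cell as often as you like without refetching or reparsing
scores = [u["score"] for u in data["users"]]
print(len(scores), sum(scores) / len(scores))
```

      Every later cell reuses `data` from kernel memory, which is exactly the caching-without-writing-to-disk point above.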

      I've never even heard of people using notebooks for report generation, and honestly I'd agree that sounds like a complete nightmare.

    • bwanab 7 months ago

      As a software developer, I completely agree.

      As a person who's often tasked with building reports for presentation, and more often for passing on to analysts to expand on, I find that notebooks give a much more accessible workflow than the other options. Excel is way too limiting; most of the downstream analysts aren't software devs and don't have, know, or want to know an IDE. Notebooks give a way to combine graphs, text, and code (show your work) in a concise form.

    • d0mine 7 months ago

      Do you get a REPL? Do you get a debugger? Notebooks sit between a REPL and "proper code" on this spectrum. They can be used even when nobody else sees the results. They are useful for experimentation, to figure out the things you don't know you don't know, as an extension of your working memory. They are less temporary and disposable than a REPL but less reusable than proper code. Another example is Org Babel in Emacs.

      Jupyter notebooks in the article are used for "learning-oriented tutorial" or "goal-oriented howto" docs. You see the explanation, the code, and the results, and it is easy to try it yourself. There are single-click solutions that get you a running, editable copy.

      Misleading docs may be worse than no docs, so using notebooks as executable documentation is a plus. Notebooks are not the best format for e2e tests in general, though; tests may be too complex for that. Software (proper code) is better at handling complexity in the general case.

    • sbrother 7 months ago

      IMO notebooks are hugely overused and don't have any place in the development of software.

      But they're fantastic for what they were designed for -- which is quite literally "notebooks". AFAIK the idea was first popularized by Mathematica, and I still reach for that when I have some highly iterative, undefined math/data problem to sketch on. IMO the real issue is that Python is used both for this purpose and for software development, which leads to people using notebooks inappropriately.

    • parpfish 7 months ago

      i mostly agree, but one way that i've found to make notebooks good/usable is this:

      - move all of the code out of the notebook and into a nearby python module where it can be linted/mypy-d/version-controlled/code-reviewed easily

      - tell the notebook to import that module and make a small number of function calls to get whatever data you need and make plots. at this point the notebook is really nothing more than a REPL with inlined plots/graphics
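      A minimal sketch of that split (the module and function names here are hypothetical):

```python
# analysis.py -- all real logic lives here, where it can be linted,
# type-checked, version-controlled, and code-reviewed like any other module
import statistics

def load_scores():
    # stand-in for the real data-loading call (DB query, API fetch, ...)
    return [3.1, 2.8, 3.4, 2.9]

def summarize(scores):
    return {"n": len(scores), "mean": statistics.mean(scores)}

# --- the entire notebook cell body is then just: ---
# from analysis import load_scores, summarize
summary = summarize(load_scores())
print(summary)
```

      The notebook ends up as a thin driver: one import, a couple of calls, and inline plots, while everything worth reviewing lives in the module.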

      • youainti 7 months ago

        This is what I've found works the best as well.

    • dr_kiszonka 7 months ago

      1) It sounds like you are maybe not in the target group, which largely comprises data scientists and the like.

      2) Reports are one of many reasons people use notebooks.

      3) You can work with notebooks in proper IDEs like PyCharm or VS Code or Emacs.

      Having said that, I do agree with you that notebooks enable poor programming habits. (I have seen quite a few notebooks with > 10k lines of code, which is insanity.) Among other things, diffing notebooks is typically a big pain too. It is just that for many folks, notebooks' pros outweigh their cons.

    • perrygeo 7 months ago

      Notebooks are not about producing a report. I've worked with hundreds of notebooks and very few of them were intended as polished documents.

      Notebooks are literally for notes, for exploring new ideas - not creating production artifacts from existing ideas. You get a stateful kernel that can incrementally build up state instead of re-executing. And you get visual artifacts and user interfaces inline. The value proposition is faster iteration and immediate feedback, not report writing.

    • cyanydeez 7 months ago

      they work for me because of the self-documenting interface for reporting. Then there's the built-in project workspace that makes it clear which files are doing what. I can run them on a command line as scripts, and build libraries, or even use other notebooks as libraries.

      This workflow works because my primary job is not developing software. It's developing solutions to multimodal industrial and science reporting problems, and as needs change, having a complicated process captured as a series of code snippets is wonderful.

      Sure, plain text is cleaner, but I treat much of my work as "living documents", and notebooks support that wonderfully.

    • SubjectToChange 7 months ago

      Mathematica notebooks give a glimpse of what Jupyter could be. But I have doubts about whether Python could ever have something equivalent.

    • 7 months ago
      [deleted]
  • miohtama 7 months ago

    Notebooks are a wonderful tool, especially now that Visual Studio Code has a superb editor for them. Before this, using web-based Jupyter was a bit of a pain. I use PyCharm for .py files, but it is still a bit broken for notebooks, so I always go to VS Code for notebooks.

    If someone needs it, here is some sample code to run notebooks programmatically and tune the output and formatting:

    https://github.com/tradingstrategy-ai/trade-executor/blob/ma...

    • qsort 7 months ago

      What's broken about them in PyCharm? I've been using PyCharm as my primary IDE for a while now and haven't ever experienced major problems; this includes working on projects where other people were using VSC.

  • CJefferson 7 months ago

    I've used this in a few systems, using (in my case) nbconvert.

    As I write more code, I increasingly find that the most important thing about tests early on is that they are easy to write and maintain. To help with that, I find one of the best 'quick test suites' is "run program, save output, run 'git diff' to see if anything changed".

    This has several advantages. If you have lots of small programs it's trivial to parallelise. It's easy to see what outputs have changed. It's very easy to write weird one-off special tests that need to do something a bit unusual.

    Yes, eventually you will probably want some nicer test framework, but even then I often keep this framework around, as there will still often be a few tests that don't fit nicely in whatever fancy testing library I'm trying to use (for example, checking a program's front end produces correct error messages when given invalid input).
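    That "run, save, git diff" loop can be sketched in a few lines of Python (the function and file names are illustrative; the real thing can just as well be a short shell script):

```python
import pathlib
import subprocess

def check_golden(cmd, golden_path):
    """Run cmd and compare its stdout against a committed snapshot file.

    On the first run, or on any mismatch, the snapshot is rewritten, so a
    plain `git diff` afterwards shows exactly which outputs changed.
    """
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    golden = pathlib.Path(golden_path)
    if golden.exists() and golden.read_text() == out:
        return True
    golden.write_text(out)
    return False
```

    Each small program gets one snapshot file; since the programs are independent, parallelising is trivial, and a one-off weird test is just another program with another snapshot.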

  • abdullahkhalids 7 months ago

    You can just use nbclient/nbconvert as a library instead of via the command line. Execute the notebook, and the cell outputs are directly available as Python objects [1]. Error handling is also built in.

    This should make integration with pytest etc. much simpler.

    [1] https://nbconvert.readthedocs.io/en/latest/execute_api.html#...
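    A minimal sketch of that (assumes nbformat and nbclient are installed; the in-memory notebook here stands in for a real .ipynb file):

```python
import nbformat
from nbclient import NotebookClient

# normally you'd load a real file: nb = nbformat.read("guide.ipynb", as_version=4)
nb = nbformat.v4.new_notebook()
nb.cells.append(nbformat.v4.new_code_cell("x = 2 + 2\nprint(x)"))

# execute() raises CellExecutionError on the first failing cell --
# that is the built-in error handling mentioned above
NotebookClient(nb, timeout=60).execute()

# cell outputs are now plain Python objects hanging off the notebook
print(nb.cells[0].outputs[0]["text"])
```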

  • wodenokoto 7 months ago

    It looks like the final solution will have to run notebooks twice - once to check for errors and once more to render to documentation.

    The concept of running code examples inside documentation as a part of tests is well known, and extending it to end-to-end tests / user guides is a good idea.

    Next step might be to add hidden code cells with asserts, to check that the code not only runs, but creates the expected output.
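    A sketch of such a hidden cell (the `result` dict is a stand-in for whatever the guide actually computes; "remove-cell" is, e.g., the MyST-NB tag convention for dropping a cell from rendered docs):

```python
# visible tutorial cells would compute something; a stand-in here:
result = {"rows": 128, "status": "ok"}

# --- hidden trailing cell (tagged "remove-cell" so the doc build drops it) ---
assert result["status"] == "ok", "pipeline did not finish cleanly"
assert result["rows"] > 0, "tutorial produced no data"
```

    Execution then fails not just when the code errors, but when it silently produces the wrong output.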

    • freetonik 7 months ago

      The documentation is rendered without running the notebook in my case, because we store notebooks with all the outputs, so the doc builder (Sphinx) just converts them to HTML as is.

      • wodenokoto 7 months ago

          To me, the reasonable path would be to have the CI build the notebook, check it for errors, and then render it to the docs.

        As per the article, they have to run the notebook locally, commit all the outputs, then the CI checks if the notebook can run, and then renders the _committed_ output to the docs. This means that the verified code and output can be out of sync, partly defeating the purpose of the CI.

    • foundart 7 months ago

      Having the user guides be notebooks is a very interesting idea and the step from there to e2e test as the author describes seems like a very logical one.

      • foundart 7 months ago

        The short article is definitely worth a read, as is the Netflix post it links to

  • batmansmk 7 months ago

    Maintaining e2e tests is a pain. Maintaining a notebook is a pain. It seems it was a given somebody would make this match made in heaven!

    • vvladymyrov 7 months ago

      I can suggest an improvement: combining long-running e2e tests with notebooks is an even better match.

  • zurfer 7 months ago

    for anybody who wants to schedule or automatically run jupyter notebooks I recommend also looking into papermill: https://papermill.readthedocs.io/en/latest/

    • rmholt 7 months ago

      Agreed! I use it and it's a breath of fresh air. I have to use Jupyter because of colleagues, but I really prefer Python scripts, and this lets me kind of run a Jupyter notebook as if it were a script, even with CLI flags.

  • remram 7 months ago

    Ever since Marimo launched, I haven't looked back at Jupyter: https://marimo.io/ https://news.ycombinator.com/item?id=38971966

    On top of their reactive goodness, Marimo notebooks are .py files, which makes it very suitable for this kind of (ab)use.

  • freetonik 7 months ago

    I’m confused. I’ve submitted this link almost two days ago, but now it says “3 hours ago”, as if the timestamp was modified.

    • abdullahkhalids 7 months ago

      This is intended HN behavior. Someone else submitted it again today and due to the closeness of times (within a few days), a new item is not created but the old item is boosted.

      • freetonik 7 months ago

        Ah, thanks for clarifying, did not know this. I thought when someone submits an existing link it either has no effect (and the submitter is redirected to the old post), or a new post is created if enough time had passed since the original submission.

        • bobnamob 7 months ago

          There’s also a chance it was placed on the second chance queue and reposted “automatically”

  • taeric 7 months ago

    I find the gigantic swing of a lot of software developers from the pure separation of data and styling into notebooks rather amusing.

    In particular, there is very little that a "notebook" style environment can get you that you couldn't have gotten as output from any previous testing regime. Styling test results used to be a thing people spent a fair amount of time on so that test reports were pretty and informative. Reports on aggregate results could show trends in how they have been executing in the past.

    Now, I grant that this article is subtly different. In particular, the notebooks are an artifact that they are testing anyway. So, having more reliance on that technology may itself be a good end goal. I still have a hard time shaking that notebooks are being treated as a hammer looking for nails by so many teams.

  • pplonski86 7 months ago

    Mixing code, markdown, and execution results in one output file gives Jupyter notebooks a superpower. You can really have anything in the ipynb file, stored as JSON. I wish there were two types of ipynb files: one for just code and markdown (for example, ipynbc), and one for keeping code+markdown+results.

    BTW, some time ago I wrote an article about surprising things you can build with Jupyter notebooks: https://mljar.com/blog/how-to-use-jupyter-notebook/ You will find in the list: blog, book, website, web app, dashboards, REST API, even full packages :)

    • armanboyaci 7 months ago

      > I wish there was two types of ipynb files, one for file with just code and markdown (for example ipynbc), and one for keeping code+markdown+results.

      I believe you can achieve that if you use jupytext library, right?

  • udioron 7 months ago

    Very cool! One can write a pytest plugin that executes notebooks from a folder with custom pytest fixtures support.
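    Even without a full plugin, a single parametrized test file gets most of the way there (a sketch; the docs/ folder and the timeout are assumptions, and it needs pytest, nbformat, and nbclient installed):

```python
import pathlib

import nbformat
import pytest
from nbclient import NotebookClient

# collect every notebook under docs/ as one test case each
NOTEBOOKS = sorted(pathlib.Path("docs").glob("**/*.ipynb"))

@pytest.mark.parametrize("path", NOTEBOOKS, ids=str)
def test_notebook_executes(path):
    nb = nbformat.read(path, as_version=4)
    # a CellExecutionError in any cell fails just this notebook's test case
    NotebookClient(nb, timeout=600).execute()
```

    A real plugin could go further and inject pytest fixtures into the kernel before execution, as the comment suggests.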

  • mscolnick 7 months ago

    this sort of hackery to work with jupyter notebooks is exactly why we are building marimo (https://github.com/marimo-team/marimo). this should not be the standard

  • arminiusreturns 7 months ago

    I prefer emacs org-mode babel.