Optimizing Ruby's JSON, Part 1

(byroot.github.io)

173 points | by todsacerdoti 18 hours ago

35 comments

  • izietto 6 hours ago

    First, if the author is going to read this, let me thank you for your work. As a Rails developer, I find the premises very relatable.

    Again, as a Rails developer, a pain point is different naming conventions regarding Ruby hash keys versus JS/JSON object keys. JavaScript/JSON typically uses camelCase, while Ruby uses snake_case. This forces me to perform tedious and often disliked transformations between these conventions in my Rails projects, requiring remapping for every JSON object. This process is both annoying and potentially performance-intensive. What alternative approaches exist, and are there ways to improve the performance of these transformations?

    • rajaravivarma_r 6 hours ago

      I don't have a solution for the performance problem, but for the camelCase to snake_case conversion I can see a few options.

      1. If you are using axios or another fetch-based library, you can use an interceptor that converts the camelCase JavaScript objects to snake_case for requests and does the reverse for responses.

      2. If you want to control that on the app side, you can use a helper method in ApplicationController, say `json_params`, that returns the JSON object with snake_case keys. Similarly, wrap `render json: json_object` in a helper method like `render_camel_case_json_response` and use that in all the controllers (a sketch of both follows this list). You can write a custom RuboCop cop to keep this behaviour consistent.

      3. Handle the case transformation in a Rack middleware. That way you don't have to rely on every developer remembering to use those helper methods.
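
      A rough sketch of option 2, assuming ActiveSupport is available for `deep_transform_keys`, `underscore` and `camelize` (the helper names are just the ones suggested above, nothing standard):

          # Hypothetical controller helpers; both assume a Hash payload
          # (top-level arrays would need their own handling).
          class ApplicationController < ActionController::Base
            private

            # Incoming camelCase params, exposed to the app as snake_case.
            def json_params
              @json_params ||= params.to_unsafe_h.deep_transform_keys do |key|
                key.to_s.underscore
              end
            end

            # Outgoing snake_case hashes, rendered with camelCase keys.
            def render_camel_case_json_response(object, **options)
              camelized = object.as_json.deep_transform_keys do |key|
                key.to_s.camelize(:lower)
              end
              render json: camelized, **options
            end
          end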

      • thiago_fm 5 hours ago

        I believe his point is that this transformation could perhaps be done in C and therefore perform better; it could be a flag on the JSON conversion.

        I like the idea; maybe it even already exists?

        • byroot 4 hours ago

          It could indeed be done relatively efficiently in C, but it would be yet another option, imposing yet another conditional, and as I mention in the post (and will keep hammering on in the follow-ups), conditionals are something you want to avoid for performance.

          IMO that's the sort of conversion that is better handled by the "presentation" layer (as in ActiveModel::Serializers et al.).

          In these gems you usually define something like:

              class UserSerializer < AMS::Serializer
                attributes :first_name, :email
              end
          
          It wouldn't be hard for these libraries to apply a transformation to the attribute names at almost zero cost.
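
          For illustration, here's a standalone sketch of that idea (not AMS's actual implementation; it only assumes ActiveSupport's `String#camelize`), where the JSON key for each attribute is computed once at class-definition time:

              # Hypothetical serializer that camelizes attribute names once, when
              # the class is defined, so rendering pays no per-request cost.
              class CamelCaseSerializer
                class << self
                  attr_reader :attribute_map

                  def attributes(*names)
                    # Ruby attribute name => JSON key, computed a single time.
                    @attribute_map = names.to_h { |name| [name, name.to_s.camelize(:lower)] }
                  end
                end

                def initialize(object)
                  @object = object
                end

                def as_json
                  self.class.attribute_map.to_h { |attr, key| [key, @object.public_send(attr)] }
                end
              end

              class UserSerializer < CamelCaseSerializer
                attributes :first_name, :email
              end

              # UserSerializer.new(user).as_json
              # => { "firstName" => "...", "email" => "..." }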

    • ysavir 2 hours ago

      > Again, as a Rails developer, a pain point is different naming conventions regarding Ruby hash keys versus JS/JSON object keys. JavaScript/JSON typically uses camelCase, while Ruby uses snake_case.

      Most APIs I've come across use snake_case for their keys in JSON requests and responses; I rarely see camelCase JSON keys. So I'm happy to just write snake_case keys, keep my backend simple, and let the API consumer handle any transformations.

      I use the same approach another comment points out, using Axios transformers to convert back and forth as necessary.

    • SkyPuncher 3 hours ago

      I love Rails, but if I could go back in time and tell them to avoid one thing it would be their strict adherence to naming conventions.

      I've spent more time in my career debugging the magic than I would have by simply defining explicit references.

      • caseyohara 2 hours ago

        > if I could go back in time and tell them to avoid one thing it would be their strict adherence to naming conventions

        Monkey paw curls. Rails probably wouldn't have reached popularity were it not for the strict adherence to naming conventions. The Rails value prop is productivity and the ethos of "convention over configuration" is what makes that possible.

        "Convention over configuration" was coined by DHH himself. https://en.wikipedia.org/wiki/Convention_over_configuration Without it, you don't have Rails.

      • jkmcf 2 hours ago

        I'm ok with most of the naming conventions, but the pluralization is one I loathe. The necessity of custom inflections should have been a strong smell, IMO.

        • regularfry an hour ago

          Semantically it makes sense. It's not Rails' fault the English language is a terrible serialisation format.

          • dmurray an hour ago

            But it can be Rails' fault for choosing to default to a terrible serialisation format.

            • ysavir 40 minutes ago

              What would you propose as an alternative?

    • BurningFrog 28 minutes ago

      Sounds like something you could fix once and for all with a bit of metaprogramming converting between the two casing conventions?

    • xcskier56 4 hours ago

      We use a gem called olive_branch. Yes, it's going to give you a performance hit, but it keeps you sane, which is very worthwhile.

      • izietto 4 hours ago

        This one? https://github.com/vigetlabs/olive_branch Looks interesting, unfortunately its latest update is from 3 years ago

        • sensanaty 4 hours ago

          If you look at the source [1], you'll see that what it's doing is very simple (the file I linked is basically the whole library; everything else is gem-specific plumbing and tests). You could even skip the gem and implement it yourself (a rough sketch follows the link below); it's not a big dependency at all, so no need for constant maintenance in this case :p

          [1] https://github.com/vigetlabs/olive_branch/blob/main/lib/oliv...
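
          To give a feel for it, here's a rough sketch of the inbound half of that idea (illustrative only, not the gem's actual code; assumes ActiveSupport for `deep_transform_keys` and `underscore`):

              # Illustrative olive_branch-style Rack middleware: rewrites camelCase
              # keys in incoming JSON bodies to snake_case before the app sees them.
              require "json"
              require "stringio"

              class UnderscoreJsonKeys
                def initialize(app)
                  @app = app
                end

                def call(env)
                  if env["CONTENT_TYPE"].to_s.include?("application/json")
                    body = env["rack.input"].read
                    unless body.empty?
                      # Assumes a JSON object at the top level; arrays need extra handling.
                      rewritten = JSON.parse(body)
                                      .deep_transform_keys { |key| key.underscore }
                                      .to_json
                      env["rack.input"] = StringIO.new(rewritten)
                      env["CONTENT_LENGTH"] = rewritten.bytesize.to_s
                    end
                  end
                  @app.call(env)
                end
              end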

        • werdnapk 4 hours ago

          I don't understand the issue with it not being updated for 3 years. Perhaps it's stable and requires no updates?

          If the author says it's no longer maintained, then that's something different.

    • revskill 4 hours ago

      Me too. And even the Crystal language has the same issue.

  • meisel an hour ago

    Very fun read! I'm curious though, when it comes to non-Ruby-specific optimizations, like the lookup table for escape characters, why not instead leverage an existing library like simdjson that's already doing this sort of thing?

    • byroot an hour ago

      I somewhat answered that in https://news.ycombinator.com/item?id=42450085

      In short, since `ruby/json` ships with Ruby, it has to be compatible with Ruby's constraints, which today means plain C99 and no C++. There would also probably be a licensing issue with simdjson (Apache 2), but I'm not sure.

      Overall there's a bunch of really nice C++ libraries I'd love to use, like dragonbox, but I just can't.

      Another thing is that, last time I checked, simdjson only provided a parser; the ruby/json gem does both parsing and encoding, so it would only help with half the problem space.

  • hahahacorn 8 hours ago

    Great read & great work from the author. Is there any reason to use Oj going forward?

    • byroot 8 hours ago

      Author here.

      Oj has an extremely large API that I have no intention of emulating in the default json gem: things such as "SAJ" (SAX-style parsing), various escaping schemes, etc.

      My goal is only to make it unnecessary for the 95% or so use case, so yes, Oj will remain useful to some people for a bunch of use cases.

      • onli 4 hours ago

        SAX-style parsing is a godsend when dealing with large files, whether JSON or XML. It's indeed what made me switch to a different JSON library in a Ruby project of mine (I'd have to look it up, but probably to Oj).
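
        For anyone curious what that looks like with Oj, here's a rough sketch based on its documented `Oj::Saj` handler interface (callback signatures from memory; double-check the gem's docs before relying on them):

            require "oj"

            # Streams a large JSON document and counts values stored under a given
            # key, without ever building the full object tree in memory.
            class KeyCounter < Oj::Saj
              attr_reader :count

              def initialize(wanted_key)
                super()
                @wanted_key = wanted_key
                @count = 0
              end

              # Called for every scalar; key is the enclosing hash key (nil in arrays).
              def add_value(value, key)
                @count += 1 if key == @wanted_key
              end

              # hash_start/hash_end/array_start/array_end can be omitted when you
              # don't care about structural events.
            end

            handler = KeyCounter.new("email")
            File.open("huge.json") { |io| Oj.saj_parse(handler, io) }
            puts handler.count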

  • mfkp 7 hours ago

    Love the write-up on this topic, very easy to follow and makes me want to benchmark and optimize some of my ruby code now. Thanks for putting in the effort and also writing the post, byroot!

  • thiago_fm 8 hours ago

    I love byroot's work. I'm always surprised not only by the kind of contributions he makes but also by the sheer volume of what he does; insane productivity.

    I wish he would write more often. I've tried to get into ruby-core type work more than once, but I never found something that matched my skills well enough to contribute positively, and after a few weeks with no results the motivation would wear off; it's really difficult to build up the kind of context he shares in this article, for example.

    If more Ruby C people wrote more often, I bet there'd be more people with the skills needed to improve Ruby further.

    The C profiler advice was great. Maybe I could just grab a Ruby gem with C code and start playing with optimizations again :-)

    • benoittgt 6 hours ago

      There is also this great series by Peter Zhu: https://blog.peterzhu.ca/ruby-c-ext/ Even though it's about C extensions, it helps with understanding some of the concepts.

      But I agree with you.

      • thiago_fm 5 hours ago

        That's awesome, I personally wasn't aware of that series from Peter Zhu. Thanks!

    • wkjagt 5 hours ago

      > insane productivity

      He's insanely productive, but also insanely smart. I used to work in the same office as him at Shopify, and he's the kind of person whose level just seems unattainable.

      • richardlblair 2 hours ago

        Fully agree. He's also really patient and kind. He's the type of person who's really really smart and doesn't make you feel really really dumb. He takes the time to thoroughly explain things, without being condescending. I always enjoyed when our paths crossed because I knew I was going to learn something.

  • rfl890 3 hours ago

    IIRC the branch predictor hint is useless on modern CPUs

  • bornelsewhere 7 hours ago

    Does ruby json use intrinsics? Could it?

    Also, how does this play with the various JITs?

    • byroot 6 hours ago

      Not too sure what you mean by intrinsics.

      The `json` gem is implemented in C, so it's a black box for YJIT (the reference implementation's JIT).

      The TruffleRuby JIT used to interpret C extensions with Sulong so it could JIT across the language barrier, but AFAIK they recently stopped doing that because of various compatibility issues.

      Also, on TruffleRuby the JSON parser is implemented in C, but the encoder is in pure Ruby [0].

      [0] https://github.com/ruby/json/blob/e1f6456499d497f33f69ae4c1a...

      • bornelsewhere 6 hours ago

        Thanks!

        Sorry about misuse of "intrinsics". There is a simdjson library that uses SIMD instructions for speed. Would such an approach be feasible in the ruby json library?

        • byroot 5 hours ago

          Ah I see.

          TL;DR: it's possible, but a lot of work, and not that huge of a gain in the context of a Ruby JSON parser.

          `ruby/json` doesn't use explicit SIMD instructions; some routines are written in a way that somewhat expects compilers to be able to auto-vectorize them, but that's never a given.

          In theory using SIMD would be possible, as proven by simdjson, but it's very unlikely we'll do it, for multiple reasons.

          First, for portability we have to stick with raw C99, no C++ allowed, so that rules out using simdjson outright.

          In theory, we could implement the same sort of logic ourselves, with support for the various levels of SIMD different processors offer and runtime dispatch between them, but that would be terribly tedious. It's not a reasonable amount of complexity for the amount of time I and other people are willing to spend on the library.

          Then there's the fact that it wouldn't make as big a difference as you'd think. I do happen to have written some bindings for simdjson in https://github.com/Shopify/heap-profiler, because I had a use case for parsing gigabytes of JSON, and it helps quite a bit there.

          I'll hopefully touch on that in a future blog post, but the actual JSON parsing part is entirely dwarfed by the work needed to build the resulting tree of Ruby objects.

          • thiago_fm 4 hours ago

            Curious about the next post.

            My naive/clueless mind always wonders whether it would make sense to introduce a new class of much simpler Ruby objects that would yield both lower memory consumption and GC optimizations for cases like these.

            Without a different object model it's hard to imagine optimizations that could greatly improve Ruby execution speed in CRuby, or make the GC much faster (a huge issue for big applications), but maybe that's because I don't know much :-)