First, if the author is going to read this, let me thank you for your work. As a Rails developer, I find the premises very relatable.
Again, as a Rails developer, a pain point is different naming conventions regarding Ruby hash keys versus JS/JSON object keys. JavaScript/JSON typically uses camelCase, while Ruby uses snake_case. This forces me to perform tedious and often disliked transformations between these conventions in my Rails projects, requiring remapping for every JSON object. This process is both annoying and potentially performance-intensive. What alternative approaches exist, and are there ways to improve the performance of these transformations?
I don't have a solution for the performance problem. But for the camelCase to snake_case conversion, I can see potential solutions.
1. If you are using axios or another fetch-based library, you can use an interceptor that converts camelCase JavaScript object keys to snake_case on requests, and back again on responses.
2. If you want to control that on the app side, you can add a helper method to ApplicationController, say `json_params`, that returns the JSON object with snake_case keys. Similarly, wrap `render json: json_object` in a helper method like `render_camel_case_json_response` and use it in all the controllers. You can write a custom RuboCop cop to keep this behaviour consistent.
3. Handle the case transformation in a Rack middleware. This way you don't have to force developers to use those helper methods.
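Whichever of the three options you pick, the core is the same recursive key rewrite. Here is a minimal plain-Ruby sketch of it, assuming no ActiveSupport; the `KeyCase` module and its method names are made up for illustration and don't come from any gem:

```ruby
# Hypothetical helper module; the names are illustrative, not from any gem.
module KeyCase
  module_function

  # "firstName" -> "first_name"
  def snake(key)
    key.to_s.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  end

  # "first_name" -> "firstName"
  def camel(key)
    key.to_s.gsub(/_([a-z\d])/) { $1.upcase }
  end

  # Recursively rewrite every hash key in a parsed JSON structure.
  def deep_transform(value, &block)
    case value
    when Hash
      value.each_with_object({}) do |(k, v), out|
        out[block.call(k)] = deep_transform(v, &block)
      end
    when Array
      value.map { |v| deep_transform(v, &block) }
    else
      value
    end
  end
end

incoming = { "firstName" => "Ada", "contactInfo" => [{ "emailAddress" => "ada@example.com" }] }
snake_cased = KeyCase.deep_transform(incoming) { |k| KeyCase.snake(k) }
camel_cased = KeyCase.deep_transform(snake_cased) { |k| KeyCase.camel(k) }
```

Note that this allocates a new hash and new key strings for every object on every request, which is exactly the per-object cost the original question worries about.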
I believe his point is that this transformation could be done in C and therefore perform better; it could be a flag on the JSON conversion.
I find the idea good; maybe it even already exists?
It could indeed be done relatively efficiently in C, but it would be yet another option, imposing yet another conditional, and as I mention in the post (and will keep hammering on in the follow-ups), conditionals are something you want to avoid for performance.
IMO that's the sort of conversion that would be better handled by the "presentation" layer (as in ActiveModel::Serializers et al.).
In these gems you usually define something like:
  class UserSerializer < AMS::Serializer
    attributes :first_name, :email
  end
It wouldn't be hard for these libraries to apply a transformation on the attribute name at almost zero cost.
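To make the "almost zero cost" point concrete, here is a rough sketch of how a serializer library could do the key conversion once, when the class is defined, rather than for every object. `TinySerializer` is a hypothetical stand-in written for this example; it is not the ActiveModel::Serializers API:

```ruby
require "json"

# Hypothetical mini-serializer, NOT the ActiveModel::Serializers API.
class TinySerializer
  # Each subclass records [reader_method, output_key] pairs.
  def self.attribute_map
    @attribute_map ||= []
  end

  # The snake_case -> camelCase conversion happens here, once,
  # at class-definition time.
  def self.attributes(*names)
    names.each do |name|
      json_key = name.to_s.gsub(/_([a-z\d])/) { $1.upcase }.freeze
      attribute_map << [name, json_key]
    end
  end

  def initialize(object)
    @object = object
  end

  # Serialization only reuses the precomputed keys; no per-object
  # string manipulation is needed.
  def as_json
    self.class.attribute_map.each_with_object({}) do |(reader, json_key), out|
      out[json_key] = @object.public_send(reader)
    end
  end

  def to_json(*args)
    as_json.to_json(*args)
  end
end

class UserSerializer < TinySerializer
  attributes :first_name, :email
end

User = Struct.new(:first_name, :email)
payload = UserSerializer.new(User.new("Ada", "ada@example.com")).to_json
```

Since the output keys are computed up front and frozen, serializing a million users costs no more key handling than serializing one.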
> Again, as a Rails developer, a pain point is different naming conventions regarding Ruby hash keys versus JS/JSON object keys. JavaScript/JSON typically uses camelCase, while Ruby uses snake_case.
Most APIs I've come across use snake_case for their keys in JSON requests and responses. I rarely come across camelCase in JSON keys. So I'm happy to just write snake_case keys and let my backend stay simple and easy, and let the API consumer handle any transformations.
I use the same approach another comment points out, using Axios transformers to convert back and forth as necessary.
I love Rails, but if I could go back in time and tell them to avoid one thing it would be their strict adherence to naming conventions.
I've spent more time in my career debugging the magic than I would have by simply defining explicit references.
> if I could go back in time and tell them to avoid one thing it would be their strict adherence to naming conventions
Monkey paw curls. Rails probably wouldn't have reached popularity were it not for the strict adherence to naming conventions. The Rails value prop is productivity and the ethos of "convention over configuration" is what makes that possible.
"Convention over configuration" was coined by DHH himself. https://en.wikipedia.org/wiki/Convention_over_configuration Without it, you don't have Rails.
I'm ok with most of the naming conventions, but the pluralization is one I loathe. The necessity of custom inflections should have been a strong smell, IMO.
Semantically it makes sense. It's not Rails' fault the English language is a terrible serialisation format.
But it can be Rails' fault for choosing to default to a terrible serialisation format.
What would you propose as an alternative?
Sounds like something you could fix once and for all with some metaprogramming that converts between the two cases?
We use a gem called olive_branch. Yes, it's going to give you a performance hit, but it keeps you sane, which is very worthwhile.
This one? https://github.com/vigetlabs/olive_branch Looks interesting; unfortunately its latest update is from 3 years ago.
If you look at the source [1], you'll see what it's doing is very simple (the file I linked is basically the whole library, everything else is Gem-specific things and tests). You can even skip the gem and implement it yourself, not a big dependency at all, so no need for constant maintenance in this case :p
[1] https://github.com/vigetlabs/olive_branch/blob/main/lib/oliv...
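For a sense of how small "implement it yourself" can be, here is a rough Rack-style sketch that camelizes the keys of JSON responses. To be clear, this is my own simplified illustration and not olive_branch's actual code; `CamelCaseResponse` is a made-up name, and it only handles the response side:

```ruby
require "json"

# Hypothetical middleware; a simplified sketch, NOT olive_branch's implementation.
class CamelCaseResponse
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    content_type = headers["content-type"] || headers["Content-Type"]
    return [status, headers, body] unless content_type.to_s.include?("application/json")

    # Rack bodies respond to #each and yield strings.
    raw = +""
    body.each { |chunk| raw << chunk }
    body.close if body.respond_to?(:close)

    json = JSON.generate(camelize(JSON.parse(raw)))

    # The body length changed, so the old content-length must be replaced.
    headers = headers.reject { |k, _| k.casecmp?("content-length") }
    headers["content-length"] = json.bytesize.to_s
    [status, headers, [json]]
  end

  private

  # Recursively rewrite snake_case keys to camelCase.
  def camelize(value)
    case value
    when Hash  then value.to_h { |k, v| [k.gsub(/_([a-z\d])/) { $1.upcase }, camelize(v)] }
    when Array then value.map { |v| camelize(v) }
    else value
    end
  end
end

# Stand-in for a Rails/Rack app, so the sketch runs without the rack gem.
app = ->(_env) { [200, { "content-type" => "application/json" }, ['{"first_name":"Ada","tags":[{"tag_name":"x"}]}']] }
status, headers, body = CamelCaseResponse.new(app).call({})
```

The obvious cost is that every JSON response gets parsed and generated a second time, which is where the performance hit mentioned above comes from.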
I don't understand the issue with it not being updated for 3 years. Perhaps it's stable and requires no updates?
If the author says it's no longer maintained, then that's something different.
Me too. And even the Crystal language has the same issue.
Part 2 is available: https://byroot.github.io/ruby/json/2024/12/18/optimizing-rub...
Very fun read! I'm curious though, when it comes to non-Ruby-specific optimizations, like the lookup table for escape characters, why not instead leverage an existing library like simdjson that's already doing this sort of thing?
I somewhat answered that in https://news.ycombinator.com/item?id=42450085
In short, since `ruby/json` ships with Ruby, it has to be compatible with its constraints, which today means plain C99 and no C++. There would also probably be a licensing issue with simdjson (Apache 2), but I'm not sure.
Overall there's a bunch of really nice C++ libraries I'd love to use, like dragonbox, but I just can't.
Another thing is that, last time I checked, simdjson only provided a parser. The ruby/json gem does both parsing and encoding, so it would only help with half of the problem space.
Great read & great work from the author, is there any reason to use Oj going forward?
Author here.
Oj has an extremely large API that I have no intent on emulating in the default json gem, things such as "SAJ" (SAX style parsing), various escaping schemes etc.
My goal is only to make it unnecessary for the 95% or so use case, so yes, Oj will remain useful to some people for a bunch of use cases.
SAX-style parsing is a godsend when dealing with large files, regardless of JSON or XML. It's indeed what made me switch to a different JSON library in a Ruby project of mine (I'd have to look it up, but probably to Oj).
Love the write-up on this topic, very easy to follow and makes me want to benchmark and optimize some of my ruby code now. Thanks for putting in the effort and also writing the post, byroot!
I love byroot's work. I'm always surprised not only by the kind of contributions he makes but by the sheer volume of them. Insane productivity.
I wish he would write more often. I've tried to get into ruby-core type of work more than once, but never found something that matched my skills well enough for me to contribute positively, and after a few weeks without results the motivation would wear off. It's really difficult to acquire the kind of context he shared in the article, for example.
If more Ruby C people would write more often, I bet there'd be more people with the skills that are needed to improve Ruby further.
The C profiler advice was great. Maybe I could just get a Ruby gem with C code and start playing again on optimizations :-)
There is this great series by Peter Zhu too: https://blog.peterzhu.ca/ruby-c-ext/ Even if it's about C extensions, it helps with understanding some concepts.
But I agree with you.
That's awesome, I personally wasn't aware of that series from Peter Zhu. Thanks!
> insane productivity
He's insanely productive, but also insanely smart. I used to work in the same office as him at Shopify, and he's the kind of person whose level just seems unattainable.
Fully agree. He's also really patient and kind. He's the type of person who's really really smart and doesn't make you feel really really dumb. He takes the time to thoroughly explain things, without being condescending. I always enjoyed when our paths crossed because I knew I was going to learn something.
IIRC the branch predictor hint is useless on modern CPUs
Does ruby json use intrinsics? Could it?
Also, how does this play with the various JITs?
Not too sure what you mean by intrinsics.
The `json` gem is implemented in C, so it's a black box for YJIT (the reference implementation's JIT).
The TruffleRuby JIT used to interpret C extensions with Sulong so it could JIT across the language barrier, but AFAIK they recently stopped doing that because of various compatibility issues.
Also on TruffleRuby the JSON parser is implemented in C, but the encoder is in pure Ruby [0]
[0] https://github.com/ruby/json/blob/e1f6456499d497f33f69ae4c1a...
Thanks!
Sorry about misuse of "intrinsics". There is a simdjson library that uses SIMD instructions for speed. Would such an approach be feasible in the ruby json library?
Ah I see.
TL;DR: it's possible, but it's a lot of work, and not that huge of a gain in the context of a Ruby JSON parser.
`ruby/json` doesn't use explicit SIMD instructions. Some routines are written in a way that somewhat expects compilers to be able to auto-vectorize, but it's never a given.
In theory using SIMD would be possible, as proven by simdjson, but it's very unlikely we'll do it, for multiple reasons.
First, for portability, we have to stick with raw C99, no C++ allowed, which prevents using simdjson outright.
In theory, we could implement the same sort of logic with support for the various processors that have various levels of SIMD support, with runtime dispatch, but it would be terribly tedious. So it's not a reasonable amount of complexity for the amount of time I and other people are willing to spend on the library.
Then there's the fact that it wouldn't make as big a difference as you'd think. I do happen to have made some bindings for simdjson in https://github.com/Shopify/heap-profiler, because I had a use case for parsing gigabytes of JSON, and it helps quite a bit there.
But, as I'll hopefully touch on in a future blog post, the actual JSON parsing part is entirely dwarfed by the work needed to build the resulting Ruby object tree.
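One rough way to get a feel for the object-building side from plain Ruby is to count how many objects a single `JSON.parse` call allocates. This is only a sketch: the sample document is made up, and `GC.stat(:total_allocated_objects)` should be read as an approximate counter, not an exact one:

```ruby
require "json"

# Made-up sample document: an array of 10_000 small objects.
doc = JSON.generate(
  Array.new(10_000) { |i| { "id" => i, "name" => "user#{i}", "tags" => ["a", "b"] } }
)

# total_allocated_objects is a cumulative counter, so the difference
# approximates how many Ruby objects the parse had to create.
before = GC.stat(:total_allocated_objects)
parsed = JSON.parse(doc)
allocated = GC.stat(:total_allocated_objects) - before

puts "bytes of JSON:     #{doc.bytesize}"
puts "objects allocated: #{allocated}"
```

Every hash, array and string in the result is a heap object that has to be allocated and later traced by the GC, no matter how fast the bytes themselves were scanned.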
Curious about the next post.
My naive/clueless mind always wonders if it wouldn't make sense to make a new class of Ruby objects that are much simpler and would yield both less memory consumption and GC optimizations that could be used for such cases.
Without a different object model it's hard to imagine optimizations that could greatly improve Ruby execution speed for CRuby, or make the GC much faster (huge issue for big applications), but maybe it's because I don't know much :-)