Larry’s Blog

Object Serialization in Ruby


Let’s talk about serializing objects in Ruby today.

Built-In Serialization Mechanisms

Ruby has two object serialization mechanisms built into the language. One is a format we are all very familiar with, YAML (YAML Ain’t Markup Language), which is human readable; the other is a binary format (Marshal).

YAML Serialization

In Ruby, almost any object can be serialized into YAML format, and it’s really easy:

require 'yaml'

class A
  def initialize(string, number)
    @string = string
    @number = number
  end
end

class B
  def initialize(number, a_object)
    @number   = number
    @a_object = a_object
  end
end

class C
  def initialize(b_object, a_object)
    @b_object = b_object
    @a_object = a_object
  end
end

a = A.new("hello world", 5)
b = B.new(7, a)
c = C.new(b, a)

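serialized_object = YAML.dump(c)

puts serialized_object

# Since Psych 4 (Ruby 3.1), YAML.load refuses !ruby/object tags by default;
# YAML.unsafe_load restores arbitrary objects (on older Rubies, YAML.load does).
# Only use it on YAML you trust.
d = YAML.unsafe_load(serialized_object)
require 'pry'; binding.pry  # inspect d in a pry console (needs the pry gem)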

And the serialized_object looks like this:

--- !ruby/object:C
b_object: !ruby/object:B
  number: 7
  a_object: &1 !ruby/object:A
    string: hello world
    number: 5
a_object: *1

And from the pry console, we can inspect d:

[1] pry(main)> d
=> #<C:0x007fb93306ed18
 @a_object=#<A:0x007fb93306e3e0 @number=5, @string="hello world">,
 @b_object=#<B:0x007fb93306ea98 @a_object=#<A:0x007fb93306e3e0 @number=5, @string="hello world">, @number=7>>
[2] pry(main)> d.class
=> C
[3] pry(main)> c
=> #<C:0x007fb933077120
 @a_object=#<A:0x007fb933077170 @number=5, @string="hello world">,
 @b_object=#<B:0x007fb933077148 @a_object=#<A:0x007fb933077170 @number=5, @string="hello world">, @number=7>>

Pretty easy, right?
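One detail in the YAML output above is worth checking: &1 defines an anchor and *1 is an alias to it, so the A object shared by b and c is written only once, and after loading both references should point to a single object again. Here is a minimal sketch to verify that, assuming copies of A, B and C with attr_readers added for inspection (the readers are my addition, not part of the original classes):

```ruby
require 'yaml'

# Copies of A, B and C from above; the attr_readers are added for inspection.
class A
  def initialize(string, number)
    @string = string
    @number = number
  end
end

class B
  attr_reader :a_object

  def initialize(number, a_object)
    @number   = number
    @a_object = a_object
  end
end

class C
  attr_reader :b_object, :a_object

  def initialize(b_object, a_object)
    @b_object = b_object
    @a_object = a_object
  end
end

a = A.new("hello world", 5)
c = C.new(B.new(7, a), a)

yaml = YAML.dump(c)
puts yaml

# Psych 4 (Ruby 3.1+) made YAML.load safe by default; unsafe_load rebuilds
# arbitrary objects. Fall back to YAML.load on older Rubies. Trusted input only!
d = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(yaml) : YAML.load(yaml)

# The &1 anchor / *1 alias pair is restored as one shared object...
puts d.a_object.equal?(d.b_object.a_object)   # => true
# ...but it is a new object, not the original a.
puts d.a_object.equal?(a)                     # => false
```

This matches the pry output above, where both @a_object references in d show the same object address.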

Binary Serialization

The other serialization mechanism built into Ruby is binary serialization using Marshal. The main difference from YAML is that Marshal stores objects in a binary format, which is not human readable.

a = A.new("hello world", 5)
serialized_object = Marshal.dump(a)
e = Marshal.load(serialized_object)
require 'pry'; binding.pry
[1] pry(main)> e
=> #<A:0x007f83890840d0 @number=5, @string="hello world">

The disadvantage of Marshal is obvious: its output is not human-readable. The advantage is speed: Marshal is much faster than YAML.
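To see the speed difference for yourself, here is a rough sketch that times dump/load round trips with both serializers using Ruby’s standard Benchmark module. It redefines class A so it runs on its own; the iteration count is an arbitrary choice, and the timings will vary by machine and Ruby version. It also shows that Marshal output really is binary: its first two bytes are the format version, 4.8.

```ruby
require 'yaml'
require 'benchmark'

# Same shape as class A from earlier in the post.
class A
  def initialize(string, number)
    @string = string
    @number = number
  end
end

# Psych 4 (Ruby 3.1+) made YAML.load safe by default, so use unsafe_load
# when it exists to rebuild arbitrary objects (trusted input only).
def yaml_load_objects(str)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(str) : YAML.load(str)
end

obj = A.new("hello world", 5)
n = 2_000 # arbitrary iteration count

# Time n dump/load round trips with each serializer.
marshal_time = Benchmark.realtime { n.times { Marshal.load(Marshal.dump(obj)) } }
yaml_time    = Benchmark.realtime { n.times { yaml_load_objects(YAML.dump(obj)) } }
puts format("Marshal: %.3fs  YAML: %.3fs", marshal_time, yaml_time)

# Marshal output is a binary string; it starts with the format version bytes.
bytes = Marshal.dump(obj)
p bytes.bytes.first(2)   # => [4, 8]
```

Keep in mind that the Marshal format is specific to Ruby, so it only suits data that Ruby itself will read back.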

JSON

So, what else? In the Ruby community, the most commonly used serialization format is actually JSON. JSON support in Ruby is provided by several libraries, such as json, json_pure (a pure Ruby implementation), Yajl, and Oj.

Then why JSON? We already have YAML, right? We choose JSON not only because it’s human readable, but also because it’s language-independent and JavaScript can consume it directly, which makes it the natural format for transporting data in AJAX calls.

Now let’s look at the differences between these implementations.

json

With the json or json_pure gem, if you are serializing an object that is not a hash, array, or primitive, you have to write more code to make sure the object is serializable. Let’s take class A as an example:

require 'json'

class A
  def initialize(string, number)
    @string = string
    @number = number
  end

  def to_json(*a)
    {
      "json_class" => self.class.name,
      "data"       => {"string" => @string, "number" => @number}
    }.to_json(*a)
  end

  def self.json_create(o)
    new(o["data"]["string"], o["data"]["number"])
  end
end

a = A.new("hello world", 5)
json_string = a.to_json
puts json_string

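# Newer versions of the json gem warn when JSON.load builds custom objects
# implicitly, so enable create_additions explicitly; this calls A.json_create.
f = JSON.load(json_string, nil, create_additions: true)
require 'pry'; binding.pry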

We can see the output like this:

{"json_class":"A","data":{"string":"hello world","number":5}}

And from the pry console:

[1] pry(main)> f
=> #<A:0x007fb3da955cc8 @number=5, @string="hello world">
[2] pry(main)> f.class
=> A

As you can see, we need to implement two methods to make our custom object serialization work:

  • to_json – an instance method that converts the object into a JSON string, recording its class name under the "json_class" key.
  • json_create – a class method that JSON.load calls (when create_additions is enabled) with the parsed hash, so that loading the JSON string rebuilds an instance of our class.
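To make the mechanism behind these two methods concrete, here is a small sketch using only the json gem. It contrasts a hypothetical Plain class with no JSON hooks, which the json gem (in its default, non-strict mode) serializes through its to_s string and so loses its state, with class A, whose "json_class" key only turns back into an A when create_additions is enabled. I call JSON.parse with create_additions: true here instead of JSON.load; both go through the same json_create hook. The attr_readers on A are added for inspection and are not part of the original class.

```ruby
require 'json'

# Hypothetical class with no JSON hooks, for comparison.
class Plain
  def initialize(number)
    @number = number
  end
end

# Copy of class A from above; attr_readers added for inspection.
class A
  attr_reader :string, :number

  def initialize(string, number)
    @string = string
    @number = number
  end

  def to_json(*a)
    {
      "json_class" => self.class.name,
      "data"       => { "string" => @string, "number" => @number }
    }.to_json(*a)
  end

  def self.json_create(o)
    new(o["data"]["string"], o["data"]["number"])
  end
end

# Without a custom to_json, the object is serialized via to_s,
# so its instance variables are lost.
plain_json = Plain.new(1).to_json
puts plain_json                      # e.g. "#<Plain:0x...>"

json_string = A.new("hello world", 5).to_json

# By default JSON.parse returns a plain Hash; "json_class" is just a key.
as_hash = JSON.parse(json_string)
p as_hash.class                      # => Hash

# With create_additions: true, the parser looks up the class named in
# "json_class" and calls A.json_create with the parsed Hash.
rebuilt = JSON.parse(json_string, create_additions: true)
p rebuilt.class                      # => A
p rebuilt.number                     # => 5
```

Because the class is looked up by name from the JSON itself, only enable create_additions for input you trust.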

YAJL

yajl-ruby is a C binding to the excellent YAJL JSON parsing and generation library. According to its author’s benchmarks, it’s faster than json, YAML, and Marshal:

  • ~3.5x faster than JSON.generate
  • ~1.9x faster than JSON.parse
  • ~4.5x faster than YAML.load
  • ~377.5x faster than YAML.dump
  • ~1.5x faster than Marshal.load
  • ~2x faster than Marshal.dump

Unfortunately, yajl-ruby doesn’t support serializing custom objects, and its benchmark results don’t include a comparison with Oj.

Oj

Oj stands for Optimized JSON. Here is a simple example:

require 'oj'

h = {:one => 1, :array => [true, false]}
json_string = Oj.dump h
puts json_string
h2 = Oj.load json_string
puts "Same? #{h == h2}"

Output:

{":one":1,":array":[true,false]}
Same? true

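Note that in Oj’s default object mode, symbol keys are written with a leading colon (":one"), which is how Oj restores them as symbols on load. And remember our custom class A? You don’t have to write the two extra methods the json gem needs; Oj takes care of it for you, and you can see how it does that in the json_string it generates.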

a = A.new("hello world", 42)
json_string = Oj.dump a
puts json_string

a2 = Oj.load json_string
require 'pry'; binding.pry

Output:

{"^o":"A","string":"hello world","number":42}

It uses a special string “^o” to mark your custom object’s class!

Console:

[1] pry(main)> a2
=> #<A:0x007fa55b9f57b0 @number=42, @string="hello world">
[2] pry(main)> a2.class
=> A

Performance

Here is the fun part: let’s write some benchmarks to compare the performance of these mechanisms.

Deserializing basic Ruby data types:

benchmark_strict_mode.rb
require 'oj'
require 'json/ext'
require 'yajl'
require 'benchmark/ips'

json_string = %q({"a":"Alpha","b":true,"c":12345,"d":[true,[false,[-123456789,null],3.9676,["Something else.",false],null]],"e":{"zero":null,"one":1,"two":2,"three":[3],"four":[0,1,2,3,4]},"f":null,"h":{"a":{"b":{"c":{"d":{"e":{"f":{"g":null}}}}}}},"i":[[[[[[[null]]]]]]]})

Benchmark.ips do |x|
  x.report("oj") { Oj.load json_string }
  x.report("json") { JSON.load json_string }
  x.report("yajl") { Yajl::Parser.parse json_string }
end

Result output:

Calculating -------------------------------------
                  oj     7.522k i/100ms
                json     4.506k i/100ms
                yajl     4.173k i/100ms
-------------------------------------------------
                  oj     79.842k (± 5.7%) i/s -    398.666k
                json     46.061k (± 5.1%) i/s -    229.806k
                yajl     43.923k (± 7.8%) i/s -    221.169k

As you can see, Oj has the best performance of the three in this benchmark.

What about custom objects?

benchmark_compact_mode.rb
require 'oj'
require 'json/ext'
require 'benchmark/ips'

module One
  module Two
    module Three
      class Empty; end
    end
  end
end

json_string = %q({"a":"Alpha","b":true,"c":12345,"d":[true,[false,[-123456789,null],3.9676,["Something else.",false],null]],"e":{"zero":null,"one":1,"two":2,"three":[3],"four":[0,1,2,3,4]},"f":null,"g":{"json_class":"One::Two::Three::Empty"},"h":{"a":{"b":{"c":{"d":{"e":{"f":{"g":null}}}}}}},"i":[[[[[[[null]]]]]]]})

Benchmark.ips do |x|
  x.report("oj") { Oj.load json_string }
  x.report("json") { JSON.load json_string }
end

Result output:

Calculating -------------------------------------
                  oj     7.009k i/100ms
                json     3.359k i/100ms
-------------------------------------------------
                  oj     73.791k (± 4.7%) i/s -    371.477k
                json     31.266k (± 8.3%) i/s -    157.873k

Oj still beats the json gem.

The author of the oj gem has also published more detailed performance comparisons, including one against Ruby’s native Marshal, and Oj still wins.

  • Strict Mode Performance: To benchmark the Oj strict parser a Ruby Object was selected that included all the various types that can be expected in a JSON document as well as some nested arrays and hashes to add some variety.
  • Compat Mode Performance: Yajl was not included as it does not support the feature of encoding Objects that respond to the to_json() or to_hash() method and does not support the create_id feature.
  • Object Mode Performance: There are no other JSON parsers that support Object encoding and decoding, but the Ruby Marshal module does encode and decode Ruby Objects into a binary format. (@larry: by "no other JSON parsers", I think the author means that even though the json gem kind of supports it, it needs extra code, as we saw with to_json and json_create.) Another efficient Object encoder is Ox, which encodes Objects into XML. Oj is compared to these modules.

So apparently, if you use MRI, the oj gem might be your best choice.

And this week, I will blog more about JSON serialization in Rails.

See you soon!
