Dealing with JSON with non-homogeneous types in Go

Here at Bitnami we have to interact a lot with the messy world out there, so you don't have to. We do a lot of work behind the scenes to give our users a simple way to run their favourite applications. Today I want to talk about the problem of dealing with JSON files with non-homogeneous schemas in Go.

In particular, we noticed that people coming from dynamic languages like Python, JavaScript or Ruby find Go hard to use, often just because they want to keep doing things the way they are used to. However, learning a new language also means embracing different approaches.

So, here's a tale about one possible way to solve a practical problem where things start to get ugly quickly if you don't let yourself Go.

The empty interface pitfall

Let's consider this simple case of non-homogeneous JSON schema:

    [ { "value": "123" }, { "value": 123 } ]

Developers using dynamic languages are used to dealing with this sort of thing in a straightforward way: just parse the JSON file into a native in-memory structure and then handle it at runtime.

When those developers switch to Go, they soon notice that Go's static typing appears to get in the way. Let's imagine that what you really want is something like:

    type Foo struct {
      Value int
    } 

But obviously, if you do that and the input contains a string, Go's encoding/json package will fail with a friendly message:

    json: cannot unmarshal string into Go struct field Foo.Value of type int

A common approach is to use Go's equivalent of the untyped field, interface{}:

    type Foo struct {
      Value interface{}
    }

But as you can see things get messy pretty fast:

    func demo() (int, error) {
        var f Foo
        if err := json.Unmarshal([]byte(`{ "value": 123 }`), &f); err != nil {
            return 0, err
        }

        var n int
        if i, ok := f.Value.(float64); ok { // yeah, JSON numbers are floats, gotcha!
            n = int(i)
        } else if s, ok := f.Value.(string); ok {
            // s has already been unquoted by the decoder, so parse it as is
            var err error
            n, err = strconv.Atoi(s)
            if err != nil {
                return 0, err
            }
        }

        // do something with `n`
        return n, nil
    }

... so you'll soon factor this out into some kind of accessor method, e.g.:

    func (f *Foo) getValue() (int, error) {
     ....
    }
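As a rough sketch, such an accessor could look like the following. The body of getValue is my own guess at what you would end up writing, and the error message wording is made up for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

type Foo struct {
	Value interface{}
}

// getValue converts the dynamically typed Value into an int.
// This is one possible implementation, not the only one.
func (f *Foo) getValue() (int, error) {
	switch v := f.Value.(type) {
	case float64: // encoding/json decodes JSON numbers into float64
		return int(v), nil
	case string:
		return strconv.Atoi(v)
	default:
		return 0, fmt.Errorf("value has unsupported type %T", v)
	}
}

func main() {
	var foos []Foo
	input := `[ { "value": "123" }, { "value": 123 } ]`
	if err := json.Unmarshal([]byte(input), &foos); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for i := range foos {
		// The error has to be checked at every single call site.
		n, err := foos[i].getValue()
		fmt.Println(n, err)
	}
}
```

Note that parsing succeeds even for values the accessor will later reject: the failure only shows up when somebody calls getValue.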

But now you have to deal with the error every time you access it. Some people will be tempted to either ignore the error or cause a panic, which I think we can all agree is not the best thing that can be done.

Furthermore, when it fails, all you can say is that the field called Value has some bad content. But how would you know which value that is? What if you have a big JSON document with plenty of strings that look like that? Wouldn't it be nice to know the position in the JSON input that breaks our expectations?

Ideally, if the input is bad (according to our application-dependent definition of bad), we want to fail early, right where we have the most context available: during parsing.
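To illustrate how much context is available at parse time, here is a small sketch using the strict Foo with an int field. When encoding/json hits a value of the wrong JSON type it returns a *json.UnmarshalTypeError, which carries the byte offset the decoder had read up to. The helper name parseFoos and the wording of the wrapped error are assumptions made for this example:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Foo is the strict variant: Value must be a JSON number.
type Foo struct {
	Value int
}

// parseFoos (a hypothetical helper) decodes a JSON array of Foo values.
// On a type mismatch it adds the decoder's byte offset to the error.
func parseFoos(input string) ([]Foo, error) {
	var foos []Foo
	err := json.Unmarshal([]byte(input), &foos)

	var typeErr *json.UnmarshalTypeError
	if errors.As(err, &typeErr) {
		// Offset is the number of bytes read when the error occurred.
		return nil, fmt.Errorf("bad %s value ending at byte %d: %w",
			typeErr.Value, typeErr.Offset, err)
	}
	if err != nil {
		return nil, err
	}
	return foos, nil
}

func main() {
	_, err := parseFoos(`[ { "value": "123" }, { "value": 123 } ]`)
	fmt.Println(err)
}
```

An accessor that runs long after parsing has no such offset to report, because the raw input is already gone by then.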

Custom Unmarshalers

Luckily, the Go JSON standard library allows us to run some custom code right during parsing. All you have to do is implement the json.Unmarshaler interface. Consider this example:

    type Foo struct {
        Value FlexInt
    }

    // A FlexInt is an int that can be unmarshalled from a JSON field
    // that has either a number or a string value.
    // E.g. if the JSON field contains the string "42", the
    // FlexInt value will be 42.
    type FlexInt int

    // UnmarshalJSON implements the json.Unmarshaler interface, which
    // allows us to ingest values of any JSON type as an int and run
    // our custom conversion.
    func (fi *FlexInt) UnmarshalJSON(b []byte) error {
        // Anything that is not a JSON string goes through the default int decoding.
        if b[0] != '"' {
            return json.Unmarshal(b, (*int)(fi))
        }
        var s string
        if err := json.Unmarshal(b, &s); err != nil {
            return err
        }
        i, err := strconv.Atoi(s)
        if err != nil {
            return err
        }
        *fi = FlexInt(i)
        return nil
    }

Now, when you use this FlexInt type, you still have to do explicit type conversion to int (because Go doesn't do implicit conversions).

    func demo() (int, error) {
        var f Foo
        if err := json.Unmarshal([]byte(`{ "value": 123 }`), &f); err != nil {
            return 0, err
        }

        n := int(f.Value)
        // do something with `n`
        return n, nil
    }

This code is still type safe: a type conversion only compiles between compatible types, so the compiler won't let you convert just any type to an int (it's not like a "cast" in other languages). Furthermore, once the unmarshaling returns without error, you can rely on the fact that the value has been correctly parsed.
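Putting the pieces together, here is a self-contained sketch that runs the FlexInt approach against the mixed input from the beginning of this post. The helper sumValues is a made-up name used only to have something to do with the parsed values:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// FlexInt is an int that accepts either a JSON number or a JSON string.
type FlexInt int

// UnmarshalJSON implements the json.Unmarshaler interface.
func (fi *FlexInt) UnmarshalJSON(b []byte) error {
	if len(b) == 0 || b[0] != '"' {
		// Not a string: fall back to the default int decoding.
		return json.Unmarshal(b, (*int)(fi))
	}
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return err
	}
	i, err := strconv.Atoi(s)
	if err != nil {
		return err
	}
	*fi = FlexInt(i)
	return nil
}

type Foo struct {
	Value FlexInt
}

// sumValues (hypothetical) parses a JSON array of Foo and adds up the values.
// All validation happens inside json.Unmarshal; the loop needs no error checks.
func sumValues(input string) (int, error) {
	var foos []Foo
	if err := json.Unmarshal([]byte(input), &foos); err != nil {
		return 0, err
	}
	total := 0
	for _, f := range foos {
		total += int(f.Value)
	}
	return total, nil
}

func main() {
	fmt.Println(sumValues(`[ { "value": "123" }, { "value": 123 } ]`))
	fmt.Println(sumValues(`[ { "value": "abc" } ]`)) // rejected while parsing
}
```

The second call fails inside json.Unmarshal, so bad input never reaches the code that consumes the values.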

More

This was just a simplified example. You can see how this becomes more interesting when the datatypes involved get more complex:

    [ { "value": 123 }, { "value": [123] }, { "value": { "port": 123 } } ]

You can also do the same trick with a structure instead of a named type based on int: the structure can capture what the actual type was. For example, you might want to be dynamic but only allow a subset of possible types (e.g. int and string, but not array). I'll leave it as an exercise for the reader.
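As a starting point for that exercise, here is one possible sketch. The names Kind, IntOrString and decodeFoo are invented for this example, and deciding on the first byte of the raw value is just one way to dispatch:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Kind records which JSON type was actually seen (hypothetical type).
type Kind int

const (
	KindInt Kind = iota
	KindString
)

// IntOrString holds either an int or a string and remembers which one.
type IntOrString struct {
	Kind Kind
	Int  int
	Str  string
}

// UnmarshalJSON accepts JSON numbers and strings and rejects everything
// else (arrays, objects, booleans and null) while parsing.
func (v *IntOrString) UnmarshalJSON(b []byte) error {
	if len(b) == 0 {
		return fmt.Errorf("empty JSON value")
	}
	switch c := b[0]; {
	case c == '"':
		v.Kind = KindString
		return json.Unmarshal(b, &v.Str)
	case c == '-' || (c >= '0' && c <= '9'):
		v.Kind = KindInt
		return json.Unmarshal(b, &v.Int) // fails for non-integers like 1.5
	default:
		return fmt.Errorf("expected a number or a string, got %s", b)
	}
}

type Foo struct {
	Value IntOrString
}

// decodeFoo (hypothetical helper) parses a single Foo object.
func decodeFoo(input string) (Foo, error) {
	var f Foo
	err := json.Unmarshal([]byte(input), &f)
	return f, err
}

func main() {
	fmt.Println(decodeFoo(`{ "value": "abc" }`))
	fmt.Println(decodeFoo(`{ "value": [123] }`))
}
```

Unlike FlexInt, this keeps the string as a string, so the caller can branch on Kind when the distinction matters.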

Hopefully you'll find the pattern to be generally useful.