Skip to content
Advertisement

Problem with repeated objects while importing JSON into Google BigQuery

I have been trying to manually upload the JSON into BigQuery, but I get the following error message.

Error while reading data, error message: JSON parsing error in row starting at position 0: Repeated field must be imported as a JSON array. Field: custom_fields.value.

I already converted the file into newline delimited JSON, so that is not the problem. When looking at the custom_field.value from the error I get this:

$ cat convert2working.json | jq .custom_fields[].value
0
"Basics of information securityn"
"2021"

The problem seems to be that the custom_fields.value has different data types.

How can I “homogenize” those data types? or do you have another solution. I would prefer to stay in javascript

Here is a shortened version of my JSON code:

{
    "id": "example",
    "custom_fields": [
        {
            "id": "example",
            "name": "Interval",
            "type": "drop_down",
            "type_config": {
                "default": 0,
                "placeholder": null,
                "options": [
                    {
                        "id": "example",
                        "name": "yearly",
                        "color": null,
                        "orderindex": 0
                    }
                ]
            },
            "date_created": "1611228211303",
            "hide_from_guests": false,
            "value": 0,
            "required": false
        },
        {
            "id": "example",
            "name": "Description",
            "type": "text",
            "type_config": {},
            "date_created": "1611228263444",
            "hide_from_guests": false,
            "value": "Basics of information securityn",
            "required": false
        },
        {
            "id": "example",
            "name": "Year",
            "type": "number",
            "type_config": {},
            "date_created": "1611228275285",
            "hide_from_guests": false,
            "value": "2021",
            "required": false
        }
    ]
}

Advertisement

Answer

You need to normalize your data structure so that BigQuery is able to auto-detect a consistent schema. Because the value property is once a number and once a string, this auto-detection fails.

There are multiple ways to normalize your data. I’m not 100% sure which way will work best for BigQuery, which claims to analyze up to the first 100 rows for schema auto-detection.

The first attempt is to put different types of values into different fields

const fields = data.custom_fields.map(x => {

    const f = {
        id: x.id,
    name: x.name
  };
  
  f[x.type] = x.value;
  
  return f;
});

This will yield:

[{
  id: "example",
  name: "Interval",
  value_drop_down: 0
}, {
  id: "example",
  name: "Description",
  value_text: "Basics of information security↵"
}, {
  id: "example",
  name: "Year",
  value_number: "2021"
}]

I’m not sure if this is a structure where BigQuery can reliably merge the inferred type schema for fields, because it might e.g. only encounter value_number in the first 100 rows, and will therefore not handle value_dropdown.

A more reliable approach (assuming you know all the different values of type) is to transform the records explicitly into the same structure. This also has the advantage of being able to run any specialized transformations on field values (such as conversions, lookups etc.)

const fields2 = data.custom_fields.map(x => ({
    id: x.id,
    name: x.name,
    value_text: x.type === 'text' ? x.value : null,
    value_number: x.type === 'number' ? parseInt(x.value, 10) : null,
    value_dropdown: x.type === 'drop_down' ? x.type_config.options.find(o => o.orderindex === x.value).name : null
  })
);

You might have to make some of the transform logic a bit more robust depending on your data (e.g. if values are optional or can be empty). Using your example data this transformation yields:

[{
  "id": "example",
  "name": "Interval",
  "value_text": null,
  "value_number": null,
  "value_dropdown": "yearly",
  "value_drop_down": 0
}, {
  "id": "example",
  "name": "Description",
  "value_text": "Basics of information securityn",
  "value_number": null,
  "value_dropdown": null
}, {
  "id": "example",
  "name": "Year",
  "value_text": null,
  "value_number": "2021",
  "value_dropdown": null
}]

I’ve created a JSFiddle where you can play around with this code.

Advertisement