
Serialize a list of numpy arrays and read back/deserialize into Javascript

Consider a list of numpy arrays:

import numpy as np

arr = [np.linspace(a1, a2, 11) for a1, a2 in [(1, 10), (20, 30)]]
nparr = np.array(arr)

I would like to serialize this and transmit it to a Javascript REST client. The preferred approach is:

  • Serialize efficiently into a binary format and embed the result as a Base64-encoded field in a JSON object
  • Transmit the JSON object over HTTP
  • Receive the JSON object in a Javascript listener
  • Base64-decode the field and deserialize it into a binary array using an efficient Javascript deserialization library

I have done an initial investigation into Apache Arrow, which has support in both languages.

Note: I tried the following:

  • convert to two dimensional numpy array
  • convert to pyarrow

This is what happened:

import pyarrow as pya
pyarr = pya.array(nparr)


ArrowInvalid                              Traceback (most recent call last)
<ipython-input-11-68eb3e5f578f> in <module>
----> 1 pyarr = pya.array(nparr)

ArrowInvalid: only handle 1-dimensional arrays

So pyarrow seems pretty limited in terms of the data structures it can serialize. I am also looking into the Apache Parquet format, but that seems to require actually writing to disk or a filesystem?

Working code for those two technologies or possibly a different library/approach would be welcome.

Answer

Arrow is capable of serializing a list of arrays of floats. But I think it needs a little help if the list is a multidimensional numpy array:

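import pyarrow as pa

# nparr is the 2-D numpy array from the question; each row becomes one list element
pa.array(
    nparr.tolist(),
    pa.list_(pa.float64())
)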

But given your use case, since all the arrays have the same length, I’d recommend using a Table instead of an Array:

schema = pa.schema(
    [
        pa.field(str(i), pa.float64())
        for i in range(len(nparr))
    ]
)

table = pa.Table.from_arrays(
    nparr,
    schema=schema
)