Consider a list of numpy arrays:
import numpy as np
arr = [np.linspace(a1, a2, 11) for a1, a2 in [(1, 10), (20, 30)]]
nparr = np.array(arr)
I would like to serialize this and transmit it to a JavaScript REST client. The preferred approach is:
- Efficiently serialize into a binary-safe format and embed the result as a Base64-encoded field in a JSON object
- Transmit the JSON object over HTTP
- Receive the JSON object in a JavaScript listener
- Base64-decode the field and deserialize it into a binary array using an efficient JavaScript deserialization library
I have done an initial investigation into Apache Arrow, which has support in both languages.
Note: I tried the following:
- convert to a two-dimensional numpy array
- convert to pyarrow
The following happened:
pyarr = pya.array(nparr)

ArrowInvalid                              Traceback (most recent call last)
<ipython-input-11-68eb3e5f578f> in <module>
----> 1 pyarr = pya.array(nparr)

ArrowInvalid: only handle 1-dimensional arrays
So pyarrow seems pretty limited in terms of the data structures it can serialize. I am also looking into the Apache Parquet format, but that seems to require actually writing to disk or a filesystem?
Working code for those two technologies or possibly a different library/approach would be welcome.
Answer
Arrow is capable of serializing a list of arrays of floats. But I think it needs a little help if the list is a multi-dimensional numpy array:
import pyarrow as pa
pa.array(nparr.tolist(), pa.list_(pa.float64()))
But given your use case, since all the arrays have the same length, I'd recommend using a Table instead of an Array:
schema = pa.schema([pa.field(str(i), pa.float64()) for i in range(len(nparr))])
table = pa.Table.from_arrays(nparr, schema=schema)