
How To Insert Column Of Different Type To Numpy Array?

I would like to append two numpy arrays of type np.datetime64 and int to another. This leads to an error. What do I have to do to correct this? It works without error, if I append

Solution 1:

Probably easiest - work with a Pandas DataFrame instead of an array

Truthfully, while Numpy arrays can be made to work with heterogeneous columns, they may not be what most users actually need in this case. For many use cases, you are better off using a Pandas DataFrame, which stores each column with its own dtype. Here's how to convert your two columns to a DataFrame called df:

import numpy as np
import pandas as pd

a = np.array([['2018-04-01T15:30:00'],
              ['2018-04-01T15:31:00'],
              ['2018-04-01T15:32:00'],
              ['2018-04-01T15:33:00'],
              ['2018-04-01T15:34:00']], dtype='datetime64[s]')
c = np.array([0,1,2,3,4]).reshape(-1,1)


df = pd.DataFrame(dict(date=a.ravel(), val=c.ravel()))
print(df)
# output:
#                      date  val
#     0 2018-04-01 15:30:00    0
#     1 2018-04-01 15:31:00    1
#     2 2018-04-01 15:32:00    2
#     3 2018-04-01 15:33:00    3
#     4 2018-04-01 15:34:00    4

You can then work with each of your columns like so:

print(df['date'])
# output:
#     0   2018-04-01 15:30:00
#     1   2018-04-01 15:31:00
#     2   2018-04-01 15:32:00
#     3   2018-04-01 15:33:00
#     4   2018-04-01 15:34:00
#     Name: date, dtype: datetime64[ns]

DataFrame objects provide a ton of methods that make it pretty easy to analyze this kind of data. See the Pandas docs (or other QAs on this site) for more info about DataFrame objects.
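As a small illustration of the kind of analysis this enables, here is a minimal sketch that rebuilds the same two columns and filters rows by timestamp. The cutoff time and the names `late` and `total` are made up for this example.

```python
import numpy as np
import pandas as pd

# Same data as above, built as flat 1-D columns
dates = np.array(['2018-04-01T15:30:00', '2018-04-01T15:31:00',
                  '2018-04-01T15:32:00', '2018-04-01T15:33:00',
                  '2018-04-01T15:34:00'], dtype='datetime64[s]')
vals = np.arange(5)

df = pd.DataFrame({'date': dates, 'val': vals})

# Boolean filtering on the datetime column (cutoff chosen arbitrarily)
late = df[df['date'] >= pd.Timestamp('2018-04-01T15:32:00')]

# The integer column is still an integer column, so arithmetic just works
total = int(late['val'].sum())
print(late)
print(total)
```

Each column keeps its own dtype, so datetime comparisons and integer arithmetic can be mixed freely without converting anything to object.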

Numpy only solution - structured arrays

Generally, you should avoid arrays of dtype=object if you can. They cause performance issues with many of the basic Numpy operations (such as arithmetic, e.g. arr0 + arr1), and they may behave in ways you don't expect.
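To make the "ways you don't expect" part concrete, here is a minimal sketch of what happens to datetime64 values once they are cast to object. The variable names are illustrative only.

```python
import datetime
import numpy as np

a = np.array(['2018-04-01T15:30:00', '2018-04-01T15:31:00'],
             dtype='datetime64[s]')
obj = a.astype(object)

# The typed array holds numpy datetime64 values ...
typed_diff = a[1] - a[0]      # a numpy.timedelta64
# ... while the object array holds plain Python datetime objects
obj_diff = obj[1] - obj[0]    # a datetime.timedelta

print(type(typed_diff), type(obj_diff))
print(a.dtype, obj.dtype)
```

After the cast, Numpy no longer knows the elements are dates: the array is just a container of Python objects, and every operation is dispatched element by element to those objects.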

A better Numpy only solution is structured arrays. These arrays have a compound dtype, with one part per field (for the sake of this discussion, "field" is equivalent to "column", though you can do more interesting things with fields). Given your a and c arrays, here's how you can create a structured array:

# create the compound dtype
dtype = np.dtype(dict(names=['date', 'val'], formats=[arr.dtype for arr in (a, c)]))

# create an empty structured array
struct = np.empty(a.shape[0], dtype=dtype)

# populate the structured array with the data from your column arrays
struct['date'], struct['val'] = a.T, c.T

print(repr(struct))
# output:
#     array([('2018-04-01T15:30:00', 0), ('2018-04-01T15:31:00', 1),
#            ('2018-04-01T15:32:00', 2), ('2018-04-01T15:33:00', 3),
#            ('2018-04-01T15:34:00', 4)],
#           dtype=[('date', '<M8[s]'), ('val', '<i8')])

You can then access the specific columns by indexing them with their name (just like you could with the DataFrame):

print(struct['date'])
# output:
#     ['2018-04-01T15:30:00' '2018-04-01T15:31:00' '2018-04-01T15:32:00'
#      '2018-04-01T15:33:00' '2018-04-01T15:34:00']

Structured array pitfalls

You can't, for example, add two structured arrays:

# doesn't work
struct0 + struct1

but you can add the fields of two structured arrays:

# works great
struct0['val'] + struct1['val']

In general, the fields behave just like standard Numpy arrays.
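The following minimal sketch shows both cases side by side. `struct0` and `struct1` are not defined in the snippets above, so they are built here with made-up values, and the flag name `whole_array_add_failed` is only for illustration.

```python
import numpy as np

dtype = np.dtype([('date', 'datetime64[s]'), ('val', 'int64')])

# Two small structured arrays with made-up values
struct0 = np.array([('2018-04-01T15:30:00', 0),
                    ('2018-04-01T15:31:00', 1)], dtype=dtype)
struct1 = np.array([('2018-04-01T15:32:00', 10),
                    ('2018-04-01T15:33:00', 20)], dtype=dtype)

# Adding the whole structured arrays is not supported
try:
    struct0 + struct1
    whole_array_add_failed = False
except TypeError:
    whole_array_add_failed = True

# Field-wise operations behave like ordinary typed arrays
val_sum = struct0['val'] + struct1['val']        # integer addition
date_gap = struct1['date'] - struct0['date']     # timedelta64 result

print(whole_array_add_failed)
print(val_sum)
print(date_gap)
```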

Solution 2:

Taking the other users' comments into account, converting the first array to dtype object before appending is at least a workaround:

import numpy as np
a = np.array([['2018-04-01T15:30:00'],
       ['2018-04-01T15:31:00'],
       ['2018-04-01T15:32:00'],
       ['2018-04-01T15:33:00'],
       ['2018-04-01T15:34:00']], dtype='datetime64[s]')
a = a.astype("object")
c = np.array([0,1,2,3,4]).reshape(-1,1)
d = np.append(a,c,axis=1)
d

array([[datetime.datetime(2018, 4, 1, 15, 30), 0],
       [datetime.datetime(2018, 4, 1, 15, 31), 1],
       [datetime.datetime(2018, 4, 1, 15, 32), 2],
       [datetime.datetime(2018, 4, 1, 15, 33), 3],
       [datetime.datetime(2018, 4, 1, 15, 34), 4]], dtype=object)
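If you go this route, the combined array holds generic Python objects, so you will probably want to cast individual columns back to a typed array before doing any numeric or date arithmetic. A minimal sketch, using a shortened version of the data and illustrative names `dates` and `vals`:

```python
import numpy as np

a = np.array([['2018-04-01T15:30:00'],
              ['2018-04-01T15:31:00'],
              ['2018-04-01T15:32:00']], dtype='datetime64[s]')
c = np.arange(3).reshape(-1, 1)

# Workaround from above: cast to object, then append column-wise
d = np.append(a.astype(object), c, axis=1)

# Recover typed 1-D columns from the object array when needed
dates = d[:, 0].astype('datetime64[s]')
vals = d[:, 1].astype(np.int64)

print(d.dtype, dates.dtype, vals.dtype)
```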
