How To Insert Column Of Different Type To Numpy Array?
Solution 1:
Probably easiest - work with a Pandas DataFrame instead of an array
Truthfully, while Numpy arrays can be made to work with heterogeneous columns, they may not be what most users actually need in this case. For many use cases, you may be better off using a Pandas DataFrame. Here's how to convert your two columns to a DataFrame called df:
import numpy as np
import pandas as pd
a = np.array([['2018-04-01T15:30:00'],
              ['2018-04-01T15:31:00'],
              ['2018-04-01T15:32:00'],
              ['2018-04-01T15:33:00'],
              ['2018-04-01T15:34:00']], dtype='datetime64[s]')
c = np.array([0,1,2,3,4]).reshape(-1,1)
df = pd.DataFrame(dict(date=a.ravel(), val=c.ravel()))
print(df)
# output:
#                  date  val
# 0 2018-04-01 15:30:00    0
# 1 2018-04-01 15:31:00    1
# 2 2018-04-01 15:32:00    2
# 3 2018-04-01 15:33:00    3
# 4 2018-04-01 15:34:00    4
You can then work with each of your columns like so:
print(df['date'])
# output:
# 0   2018-04-01 15:30:00
# 1   2018-04-01 15:31:00
# 2   2018-04-01 15:32:00
# 3   2018-04-01 15:33:00
# 4   2018-04-01 15:34:00
# Name: date, dtype: datetime64[ns]
DataFrame objects provide a ton of methods that make it pretty easy to analyze this kind of data. See the Pandas docs (or other QAs on this site) for more info about DataFrame objects.
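As a rough illustration of what "working with the columns" can look like, here is a minimal sketch that rebuilds the same df and then filters on the date column and sums the val column. The variable names late and total are made up for this example, and the cutoff timestamp is arbitrary.

```python
import numpy as np
import pandas as pd

# Rebuild the two column arrays used in the answer above.
a = np.array(['2018-04-01T15:30:00', '2018-04-01T15:31:00',
              '2018-04-01T15:32:00', '2018-04-01T15:33:00',
              '2018-04-01T15:34:00'], dtype='datetime64[s]').reshape(-1, 1)
c = np.arange(5).reshape(-1, 1)
df = pd.DataFrame(dict(date=a.ravel(), val=c.ravel()))

# Each column keeps its own type: one datetime column, one integer column.
date_is_datetime = pd.api.types.is_datetime64_any_dtype(df['date'])
val_is_int = pd.api.types.is_integer_dtype(df['val'])

# Select rows by comparing the datetime column against a timestamp
# (the cutoff is an arbitrary value chosen for this sketch).
late = df[df['date'] >= pd.Timestamp('2018-04-01 15:32:00')]

# Arithmetic on the numeric column is unaffected by the date column.
total = int(late['val'].sum())
```

The point is only that the date and integer columns live side by side without either being coerced to the other's type.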
Numpy only solution - structured arrays
Generally, you should avoid arrays of dtype=object if you can. They cause performance issues with many of the basic Numpy operations (such as arithmetic, e.g. arr0 + arr1), and they may behave in ways you don't expect.
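To make the dtype=object point a bit more concrete, here is a small sketch (the names ints, objs and summed are made up for this example). It only shows that arithmetic on an object array yields another object array, which is why Numpy can't use its typed loops there. It does not measure the slowdown.

```python
import numpy as np

ints = np.array([0, 1, 2, 3, 4])
objs = ints.astype(object)  # same values, but stored as generic Python objects

# Arithmetic on a typed array keeps an integer dtype.
typed_sum = ints + ints

# Arithmetic on the object array still works, but the result stays
# dtype=object, so every element is handled as a generic Python object.
summed = objs + objs
```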
A better Numpy only solution is structured arrays. These arrays have a compound dtype, with one part per field (for the sake of this discussion, "field" is equivalent to "column", though you can do more interesting things with fields). Given your a and c arrays, here's how you can create a structured array:
# create the compound dtype
dtype = np.dtype(dict(names=['date', 'val'], formats=[arr.dtype for arr in (a, c)]))
# create an empty structured array
struct = np.empty(a.shape[0], dtype=dtype)
# populate the structured array with the data from your column arrays
struct['date'], struct['val'] = a.T, c.T
print(struct)
# output:
# array([('2018-04-01T15:30:00', 0), ('2018-04-01T15:31:00', 1),
# ('2018-04-01T15:32:00', 2), ('2018-04-01T15:33:00', 3),
# ('2018-04-01T15:34:00', 4)],
# dtype=[('date', '<M8[s]'), ('val', '<i8')])
You can then access the specific columns by indexing them with their name (just like you could with the DataFrame):
print(struct['date'])
# output:
# ['2018-04-01T15:30:00' '2018-04-01T15:31:00' '2018-04-01T15:32:00'
#  '2018-04-01T15:33:00' '2018-04-01T15:34:00']
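Field access also combines with ordinary boolean indexing. The sketch below rebuilds the same struct (using a list-of-tuples dtype spec, which is just another way to write the compound dtype) and selects whole records based on one field. The names recent and first and the cutoff time are made up for this example.

```python
import numpy as np

a = np.array(['2018-04-01T15:30:00', '2018-04-01T15:31:00',
              '2018-04-01T15:32:00', '2018-04-01T15:33:00',
              '2018-04-01T15:34:00'], dtype='datetime64[s]').reshape(-1, 1)
c = np.arange(5).reshape(-1, 1)

# Compound dtype with one entry per field.
dtype = np.dtype([('date', a.dtype), ('val', c.dtype)])
struct = np.empty(a.shape[0], dtype=dtype)
struct['date'] = a.ravel()
struct['val'] = c.ravel()

# A boolean mask built from one field selects whole records.
recent = struct[struct['date'] >= np.datetime64('2018-04-01T15:33:00')]

# A single record can be indexed by field name as well.
first = struct[0]
```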
Structured array pitfalls
You can't, for example, add two structured arrays:
# doesn't work
struct0 + struct1
but you can add the fields of two structured arrays:
# works great
struct0['val'] + struct1['val']
In general, the fields behave just like standard Numpy arrays.
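For completeness, here is a self-contained version of that pitfall. The struct0 and struct1 arrays here are stand-ins filled with made-up values. The sketch assumes the failed addition surfaces as a TypeError (or a subclass of it), which is what current Numpy versions raise as far as I know.

```python
import numpy as np

dtype = np.dtype([('date', 'datetime64[s]'), ('val', 'int64')])

# Two small structured arrays with made-up values.
struct0 = np.zeros(3, dtype=dtype)
struct1 = np.zeros(3, dtype=dtype)
struct0['val'] = [1, 2, 3]
struct1['val'] = [10, 20, 30]

# Adding the structured arrays as a whole is not supported.
try:
    struct0 + struct1
    whole_array_add_failed = False
except TypeError:
    whole_array_add_failed = True

# Adding a single numeric field works like any other Numpy array.
field_sum = struct0['val'] + struct1['val']
```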
Solution 2:
Taking the other answers into account, converting the first array to dtype object before appending is at least a workaround.
import numpy as np
a = np.array([['2018-04-01T15:30:00'],
              ['2018-04-01T15:31:00'],
              ['2018-04-01T15:32:00'],
              ['2018-04-01T15:33:00'],
              ['2018-04-01T15:34:00']], dtype='datetime64[s]')
a = a.astype("object")
c = np.array([0,1,2,3,4]).reshape(-1,1)
d = np.append(a,c,axis=1)
d
# output:
# array([[datetime.datetime(2018, 4, 1, 15, 30), 0],
#        [datetime.datetime(2018, 4, 1, 15, 31), 1],
#        [datetime.datetime(2018, 4, 1, 15, 32), 2],
#        [datetime.datetime(2018, 4, 1, 15, 33), 3],
#        [datetime.datetime(2018, 4, 1, 15, 34), 4]], dtype=object)
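If you go this route, you will probably want typed columns back at some point, since the combined array is dtype=object. A minimal sketch of one way to do that is to slice each column and cast it with astype. The names dates and vals are made up for this example.

```python
import numpy as np

a = np.array(['2018-04-01T15:30:00', '2018-04-01T15:31:00',
              '2018-04-01T15:32:00', '2018-04-01T15:33:00',
              '2018-04-01T15:34:00'], dtype='datetime64[s]').reshape(-1, 1)
c = np.arange(5).reshape(-1, 1)

# Same workaround as above: cast the dates to object, then append.
d = np.append(a.astype(object), c, axis=1)

# Slice each column out of the object array and cast it back
# to a proper Numpy dtype before doing any real computation.
dates = d[:, 0].astype('datetime64[s]')
vals = d[:, 1].astype(np.int64)
```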