How Do I Load Heterogeneous Data (np.genfromtxt) As A 2d Array?
Solution 1:
In this case it would be easier to read the file with pandas and then convert the DataFrame to a NumPy array.
import pandas as pd
foo = pd.read_csv('table.dat', sep='\t')
type(foo)
<class 'pandas.core.frame.DataFrame'>
bar = foo.to_numpy()  # as_matrix() was removed in pandas 1.0; to_numpy() replaces it
bar
array([[10,  7,  6,  7, 10],
       [ 5, 10,  2,  1,  3],
       [ 7,  6,  5,  3,  6],
       [ 5,  8,  5,  2,  7],
       [ 1,  2,  2, 10,  8],
       [10,  5,  9,  3,  8],
       [ 5,  2,  4,  4,  2]])
bar.shape
(7, 5)
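As a self-contained sketch of the same idea, the snippet below reads tab-separated text from memory instead of from table.dat. The sample contents and the column names a, b, c are made up for illustration and are not the data from the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for table.dat: one header row, then tab-separated integers.
sample = "a\tb\tc\n10\t7\t6\n5\t10\t2\n"

# read_csv treats the first row as column names by default.
frame = pd.read_csv(io.StringIO(sample), sep="\t")

# to_numpy() returns the values as a plain 2D ndarray and drops the column labels.
matrix = frame.to_numpy()

print(matrix.shape)
print(list(frame.columns))
```

The column names stay available on the DataFrame, so nothing from the header is lost by converting.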
Solution 2:
I got this to work with:
import numpy as np
table = np.genfromtxt('table.dat',
                      dtype=None,
                      skip_header=1)
Here's why it works:
- You should use consecutive whitespace as the delimiter (the default), not tabs (unless the snippet you posted has lost its formatting).
- You should let NumPy infer the dtype (dtype=None), rather than using the default float.
- To get the desired output in your question, you simply want to skip the header row rather than have the function create a structured dtype.
Check out the docs: http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.genfromtxt.html for more details.
I agree a Pandas DataFrame may be more appropriate if you are essentially reading in a csv file.
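The genfromtxt points above can be checked without a file on disk. This is a minimal sketch that assumes a small made-up sample (a header row plus two rows of integers) in place of table.dat.

```python
import io

import numpy as np

# Hypothetical stand-in for table.dat: one header row, then tab-separated integers.
sample = "a\tb\tc\n10\t7\t6\n5\t10\t2\n"

# Default delimiter (any run of whitespace, tabs included), inferred dtype,
# and the header line skipped instead of being used for field names.
table = np.genfromtxt(io.StringIO(sample), dtype=None, skip_header=1)

# Same call without dtype=None: the values come back as floats.
as_float = np.genfromtxt(io.StringIO(sample), skip_header=1)

print(table.shape, table.dtype)
print(as_float.dtype)
```

Because every column is inferred as an integer, the result is an ordinary 2D array rather than a structured one.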
Solution 3:
Your data looks homogeneous: all int except for the header. But by saying names=True you force genfromtxt to load it as a structured array. Look at the dtype of the result.
Try skip_header=1 (check the syntax). Omit names (or make it False).
In other words you want to load integers, ignoring the header line.
The tab delimiter appears to be working ok.
I see from a comment that you have discovered the view method of converting a structured array. That gives you both header names and a 2D view.
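A rough sketch of that view approach, again on made-up sample data rather than the file from the question. It assumes every field has the same dtype, which is what makes reinterpreting the records as one 2D block valid. The call to structured_to_unstructured from numpy.lib.recfunctions is shown as an alternative that copies the data instead of reinterpreting it.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical stand-in for table.dat: one header row, then tab-separated integers.
sample = "a\tb\tc\n10\t7\t6\n5\t10\t2\n"

# names=True takes field names from the first line and yields a 1D structured array.
structured = np.genfromtxt(io.StringIO(sample), dtype=None, names=True)
names = structured.dtype.names

# Reinterpret the records using the (shared) field dtype, one row per record.
field_dtype = structured.dtype[0]
as_2d = structured.view(field_dtype).reshape(len(structured), -1)

# Alternative that does not rely on the memory layout of the records.
as_2d_copy = rfn.structured_to_unstructured(structured)

print(names)
print(as_2d.shape)
```

If the columns had mixed dtypes, the view would not be meaningful, and a conversion such as structured_to_unstructured (which casts to a common dtype) would be the safer choice.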