How To Use np.genfromtxt And Fill In Missing Columns?
Solution 1:
Pandas has more robust readers, and you can use DataFrame
methods to handle the missing values.
First, find how many fields the widest line has:
with open('data.txt') as f:
    columns = max(len(line.split()) for line in f)
To read the file:
import pandas
df = pandas.read_table('data.txt',
                       sep=r'\s+',        # any run of whitespace separates fields
                       header=None,
                       usecols=range(columns),
                       engine='python')
To convert to a numpy array:
import numpy
a = numpy.array(df)
This puts NaN in the blank positions. Use .fillna() to substitute
a different value for the blanks:
filled = numpy.array(df.fillna(999))
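The steps above can be combined into one helper. This is a minimal sketch, not the answerer's code: the function name `load_ragged` and the sample text are hypothetical, and it passes `names=` instead of `usecols=` (the approach Solution 4 recommends), because `names` tells pandas how wide every row should be and short rows are then padded with NaN.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample text standing in for 'data.txt': rows of unequal length.
RAW = "1 2 3\n4 5\n6 7 8 9\n"


def load_ragged(text, fill=None):
    """Hypothetical helper: read whitespace-separated rows of unequal
    length into a 2-D float array. Blanks are NaN unless `fill` is given."""
    # Width of the widest row, so every row fits.
    n_cols = max(len(line.split()) for line in text.splitlines())
    df = pd.read_csv(
        io.StringIO(text),
        sep=r"\s+",                 # any run of whitespace separates fields
        header=None,                # the data has no header row
        names=list(range(n_cols)),  # one label per column of the widest row
    )
    if fill is not None:
        df = df.fillna(fill)        # replace NaN blanks with a sentinel value
    return df.to_numpy(dtype=float)


with_nan = load_ragged(RAW)            # blanks stay NaN
with_999 = load_ragged(RAW, fill=999)  # blanks become 999.0
```

For a real file, pass the file's contents (e.g. from `open('data.txt').read()`) instead of `RAW`.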
Solution 2:
Set the filling_values argument to np.nan
(a float, so you won't hit the string-conversion issue), and set the
delimiter to a comma, since by default genfromtxt splits fields on
whitespace:
import numpy as np
trainData = np.genfromtxt('data.txt', usecols=range(0, 5), invalid_raise=False,
                          missing_values="", filling_values=np.nan, delimiter=',')
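A small self-contained check of the genfromtxt call, assuming comma-separated data in which every row still has all of its commas (the sample string is hypothetical). An empty field between two commas is recognized as missing and replaced by NaN.

```python
import io

import numpy as np

# Hypothetical comma-separated sample: some fields are empty, but every row
# keeps all of its commas, so genfromtxt can tell which column is missing.
RAW = "1,2,,4\n5,,7,8\n9,10,11,12\n"

arr = np.genfromtxt(
    io.StringIO(RAW),
    delimiter=",",          # split on commas instead of whitespace
    usecols=range(0, 4),    # keep the first four columns
    missing_values="",      # treat empty fields as missing
    filling_values=np.nan,  # ... and replace them with NaN
    invalid_raise=False,    # skip (with a warning) rows that lack a requested column
)
```

A row with fewer fields, such as `1,2`, is skipped rather than padded when `invalid_raise=False`, which is why this route needs the delimiters to be present in the file.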
Solution 3:
I managed to figure out a solution.
import numpy as np
import pandas
with open('data.txt', 'r') as f:
    df = pandas.DataFrame([line.strip().split() for line in f])
data = np.array(df)
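With this approach the cells are still the strings produced by `split()`, and short rows are padded by the DataFrame constructor. A minimal sketch, assuming you want a float array, converts each column with `pd.to_numeric`, which also turns the padding into NaN (the sample text is hypothetical):

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample text standing in for 'data.txt'.
RAW = "1 2 3\n4 5\n6 7 8 9\n"

# Split each line into string tokens; the DataFrame is as wide as the
# longest row, and shorter rows are padded.
rows = [line.split() for line in io.StringIO(RAW)]
df = pd.DataFrame(rows)

# Convert every column from strings to numbers; the padding becomes NaN.
data = df.apply(pd.to_numeric).to_numpy(dtype=float)
```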
Solution 4:
Using the three long lines copied from the question, this pandas reader works:
In [149]: pd.read_csv(BytesIO(txt), delim_whitespace=True, header=None, error_bad_lines=False, names=list(range(91)))
Out[149]:
      0    1     2     3    4     5    6    7     8    9  ...   81   82  \
0  0.79  0.1  0.91 -0.17  0.1  0.33 -0.9  0.1 -0.19 -0.0  ...  515  163
1  0.79  0.1  0.91 -0.17  0.1  0.33 -0.9  0.1 -0.19 -0.0  ...  515  163
2  0.79  0.1  0.91 -0.17  0.1  0.33 -0.9  0.1 -0.19 -0.0  ...  125   30

    83     84     85    86     87     88     89     90
0  535    NaN    NaN   NaN    NaN    NaN    NaN    NaN
1  509  112.0  535.0   NaN    NaN    NaN    NaN    NaN
2  412  422.0  556.0  55.0  355.0  485.0  112.0  515.0
Then use _.values (in IPython, _ holds the last output)
to get the array.
The key is passing a names list long enough to cover the longest
line. Pandas pads short lines with NaN, whereas genfromtxt
needs explicit delimiters marking every missing field.
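The session above uses `delim_whitespace=True` (deprecated in pandas 2.2 in favor of `sep=r'\s+'`) and `error_bad_lines=False` (removed in pandas 2.0, replaced by `on_bad_lines`). A minimal sketch of the same `names` trick for newer pandas, with a short hypothetical byte string standing in for the question's data, shows that an oversized names list just yields extra all-NaN columns:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical byte string standing in for the question's three long lines.
txt = b"0.79 0.1 535\n0.79 0.1 509 112 535\n0.79 0.1 412 422 556 55\n"

# Any names list at least as long as the longest line works; short lines
# are padded with NaN, and columns beyond the longest line are all NaN.
df = pd.read_csv(io.BytesIO(txt), sep=r"\s+", header=None, names=list(range(8)))
arr = df.to_numpy()  # same result as df.values
```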