
Numpy Genfromtxt Issues In Python3

I'm trying to use genfromtxt with Python 3 to read a simple CSV file containing strings and numbers. For example, something like (hereinafter 'test.csv'):

1,a
2,b
3,c

with Python2,

Solution 1:

The answer to my problem is using the dtype for unicode strings (U2, for example).

Thanks to the answer of E.Kehler, I found the solution. If I use str in place of S8 in the dtype definition, then the output for the 2nd column is empty:

numpy.genfromtxt("test.csv", delimiter=",", dtype='f8,str')

the output is:

array([(1.0, ''), (2.0, ''), (3.0, '')], dtype=[('f0', '<f8'), ('f1', '<U0')])

This suggested to me that the correct dtype to solve my problem is a unicode string with an explicit length:

numpy.genfromtxt("test.csv", delimiter=",", dtype='f8,U2')

that gives the expected output:

array([(1.0, 'a'), (2.0, 'b'), (3.0, 'c')], dtype=[('f0', '<f8'), ('f1', '<U2')])

Useful information can also be found on the NumPy data type documentation page.
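As a minimal sketch of the approach above, the snippet below reads the same three rows from an in-memory buffer instead of a file (the io.StringIO stand-in and the variable names are my own assumptions, not part of the original question) and accesses the columns through the default field names f0 and f1:

```python
import io

import numpy as np

# Stand-in for test.csv so the example runs without a file on disk.
csv_text = "1,a\n2,b\n3,c\n"

# 'f8' reads the first column as 64-bit floats; 'U2' reads the second
# column as unicode strings of at most 2 characters.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype="f8,U2")

numbers = data["f0"]  # float64 column
letters = data["f1"]  # unicode string column, no b'' prefix

print(numbers)
print(letters)
```

Keep in mind that a fixed width such as U2 silently truncates longer values, so pick a width that fits the longest string in the column.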


Solution 2:

In Python 3, writing

dtype="S8"

(or any variation of "S#") in NumPy's genfromtxt yields byte strings, which are displayed with a b'' prefix. To avoid this and get ordinary (unicode) strings, write

dtype=str

instead.
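To make the difference concrete, here is a small sketch that reads the same data twice, once with "S8" and once with str. The in-memory buffer and the variable names are assumptions for illustration only:

```python
import io

import numpy as np

# Stand-in for test.csv so the example runs without a file on disk.
csv_text = "1,a\n2,b\n3,c\n"

# With "S8" every value is stored as a byte string (bytes in Python 3).
as_bytes = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype="S8")

# With str every value is stored as a unicode string (str in Python 3).
as_text = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype=str)

print(as_bytes.dtype, repr(as_bytes[0, 1]))
print(as_text.dtype, repr(as_text[0, 1]))
```

In both cases the result is a plain 2-D array in which the numeric column is stored as strings too, since a single dtype applies to every column.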


Solution 3:

training = np.genfromtxt('twitter_train.csv', delimiter=',', usecols=(0,1), dtype='U')

In my case, the first column contains a sentiment value of either 0 or 1, and the second column is a long string representing a tweet. In this example, dtype='U' kept the b' prefix from being included in the values.

So in your case it would be: data=numpy.genfromtxt("test.csv", delimiter=",", dtype='U')
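One thing to keep in mind with dtype='U' is that every column, including the numeric one, comes back as a string. A rough sketch of converting the numeric column afterwards, assuming the same three-row data read from an in-memory buffer (the buffer and the variable names are only for illustration):

```python
import io

import numpy as np

# Stand-in for test.csv so the example runs without a file on disk.
csv_text = "1,a\n2,b\n3,c\n"

# dtype="U" gives a 2-D array of unicode strings; NumPy picks the width.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype="U")

# Column 0 holds the numbers as text, so convert it explicitly.
ids = data[:, 0].astype(int)
letters = data[:, 1]

print(ids)
print(letters)
```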

