
Extract Array (column Name, Data) From Pandas Dataframe

This is my first question on Stack Overflow. I have a pandas DataFrame like this:

       a  b  c  d
one    0  1  2  3
two    4  5  6  7
three  8  9  0  1
four
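A minimal sketch for rebuilding the sample frame. It uses only the three rows that appear in full in the question; the `four` row and anything after it are cut off, so they are left out rather than guessed.

```python
import pandas as pd

# Only the rows shown in full in the question; later rows are truncated.
df = pd.DataFrame(
    [[0, 1, 2, 3],
     [4, 5, 6, 7],
     [8, 9, 0, 1]],
    index=["one", "two", "three"],
    columns=["a", "b", "c", "d"],
)
print(df)
```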

Solution 1:

Many gensim functions accept numpy arrays, so there may be a better way...

In [11]: is_one = np.where(df == 1)

In [12]: is_one
Out[12]: (array([0, 2, 3, 3, 4, 4]), array([1, 3, 1, 2, 0, 1]))

In [13]: df.index[is_one[0]], df.columns[is_one[1]]
Out[13]:
(Index([u'one', u'three', u'four', u'four', u'five', u'five'], dtype='object'),
 Index([u'b', u'd', u'b', u'c', u'a', u'b'], dtype='object'))
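A self-contained version of the `np.where` lookup, run on just the three complete rows from the question (the later rows are not known, so they are omitted). It zips the row and column labels into pairs so each match reads as (row, column):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 1]],
    index=["one", "two", "three"],
    columns=["a", "b", "c", "d"],
)

# np.where on a 2-D boolean array returns (row positions, column positions)
rows, cols = np.where((df == 1).to_numpy())

# Map the integer positions back to the index and column labels
pairs = list(zip(df.index[rows], df.columns[cols]))
print(pairs)  # [('one', 'b'), ('three', 'd')]
```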

To group the matches by row, you could use iterrows:

import numpy as np
from itertools import repeat

In [21]: [list(zip(df.columns[np.where(row == 1)[0]], repeat(1.0)))
          for label, row in df.iterrows()
          if 1 in row.values]  # skip rows without a 1 (they would give an empty [])
Out[21]:
[[('b', 1.0)],
 [('d', 1.0)],
 [('b', 1.0), ('c', 1.0)],
 [('a', 1.0), ('b', 1.0)]]

In Python 2 the list() call is not needed, since zip already returns a list there.
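A runnable sketch of the iterrows approach on the three complete rows from the question (later rows omitted because they are truncated). Row `two` contains no 1, so the trailing condition drops it:

```python
from itertools import repeat

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 1]],
    index=["one", "two", "three"],
    columns=["a", "b", "c", "d"],
)

# One list of (column, 1.0) tuples per row that contains at least one 1.
# np.where(row == 1)[0] gives the positions of the 1s within the row.
result = [
    list(zip(df.columns[np.where(row == 1)[0]], repeat(1.0)))
    for label, row in df.iterrows()
    if 1 in row.values
]
print(result)  # [[('b', 1.0)], [('d', 1.0)]]
```

Because every weight is the constant 1.0, `repeat(1.0)` simply supplies one copy per matching column; `zip` stops as soon as the column labels run out.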

Solution 2:

Another way would be:

In [1652]: [[(c, 1) for c in x[x].index] for _, x in df.eq(1).iterrows() if x.any()]
Out[1652]: [[('b', 1)], [('d', 1)], [('b', 1), ('c', 1)], [('a', 1), ('b', 1)]]
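A self-contained sketch of this second approach, again limited to the three complete rows from the question. `df.eq(1)` builds a boolean frame once; for each row, `x[x]` keeps only the True entries, and their index holds the matching column names:

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 1]],
    index=["one", "two", "three"],
    columns=["a", "b", "c", "d"],
)

# Boolean-mask each row with itself, then read off the surviving labels;
# x.any() skips rows with no 1 at all.
result = [
    [(c, 1) for c in x[x].index]
    for _, x in df.eq(1).iterrows()
    if x.any()
]
print(result)  # [[('b', 1)], [('d', 1)]]
```

This gives the same grouping as the first answer; the only difference is that the weight is the integer 1 rather than 1.0.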
