How To Take Out The Column Index Name In Dataframe

January 29, 2023

                      Open   High    Low  Close   Volume  Adj Close
Date
1990-01-02 00:00:00  35.25  37.50  35.00  37.25  6555600       8.70
1990-0

Solution 1:

Try using the reset_index method which moves the DataFrame's index into a column (which is what you want, I think).
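
As a minimal sketch of this suggestion (the frame `df` and its second row are made up for illustration), reset_index returns a new frame in which the old index becomes an ordinary column and the rows are numbered from 0:

```python
import pandas as pd

# Hypothetical two-row frame with a named index, standing in for the
# question's data; the values are illustrative only.
df = pd.DataFrame(
    {"Open": [35.25, 36.00], "Close": [37.25, 36.50]},
    index=pd.Index(["1990-01-02", "1990-01-03"], name="Date"),
)

# reset_index returns a new frame; df itself keeps its named index.
flat = df.reset_index()

print(flat.columns.tolist())  # 'Date' is now a regular column
print(flat.index.tolist())    # default integer index
```

Because the result is a new object, it has to be assigned to a name (or back to `df`) for the change to stick.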

Solution 2:

Short answer: there is no column to take out, and it's not clear why this could ever "cause problems". 'Date' is the name of the DataFrame's index, which is distinct from any of the columns. It is printed on its own offset line precisely so that you will not confuse it with a column of the frame. You cannot select the dates with dfrm['Date'], as shown below:

>>> import numpy as np; import pandas; import datetime

>>> dfrm = pandas.DataFrame(np.random.rand(10,3), 
... columns=['A','B','C'], 
... index = pandas.Index(
... [datetime.date(2012,6,elem) for elem in range(1,11)],
... name="Date"))

>>> dfrm
                   A         B         C
Date                                    
2012-06-01  0.283724  0.863012  0.798891
2012-06-02  0.097231  0.277564  0.872306
2012-06-03  0.821461  0.499485  0.126441
2012-06-04  0.887782  0.389486  0.374118
2012-06-05  0.248065  0.032287  0.850939
2012-06-06  0.101917  0.121171  0.577643
2012-06-07  0.225278  0.161301  0.708996
2012-06-08  0.906042  0.828814  0.247564
2012-06-09  0.733363  0.924076  0.393353
2012-06-10  0.273837  0.318013  0.754807

>>> dfrm['Date']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1458, in __getitem__
    return self._get_item_cache(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 294, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 625, in get
    _, block = self._find_block(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 715, in _find_block
    self._check_have(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 722, in _check_have
    raise KeyError('no item named %s' % str(item))
KeyError: 'no item named Date'
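
The name and the values are reachable through the index object instead. A small sketch, using a made-up three-row frame called `frame` (not the `dfrm` above):

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical frame with the same shape of index as the example above.
frame = pd.DataFrame(
    np.arange(6.0).reshape(3, 2),
    columns=["A", "B"],
    index=pd.Index(
        [datetime.date(2012, 6, day) for day in range(1, 4)],
        name="Date",
    ),
)

print(frame.index.name)          # the label printed above the index values
print("Date" in frame.columns)   # the name is not a column
print(frame.index[0])            # the dates live on the index itself

# Column-style access fails because no column has that name.
try:
    frame["Date"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
```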

Longer answer:

You can change your DataFrame by adding the index into its own column if you'd like it to print that way. For example:

>>> dfrm['Date'] = dfrm.index

>>> dfrm
                   A         B         C        Date
Date                                                
2012-06-01  0.283724  0.863012  0.798891  2012-06-01
2012-06-02  0.097231  0.277564  0.872306  2012-06-02
2012-06-03  0.821461  0.499485  0.126441  2012-06-03
2012-06-04  0.887782  0.389486  0.374118  2012-06-04
2012-06-05  0.248065  0.032287  0.850939  2012-06-05
2012-06-06  0.101917  0.121171  0.577643  2012-06-06
2012-06-07  0.225278  0.161301  0.708996  2012-06-07
2012-06-08  0.906042  0.828814  0.247564  2012-06-08
2012-06-09  0.733363  0.924076  0.393353  2012-06-09
2012-06-10  0.273837  0.318013  0.754807  2012-06-10

After this, you could simply change the name of the index so that nothing prints:

>>> dfrm.reindex(pandas.Series(dfrm.index.values, name=''))
                   A         B         C        Date

2012-06-01  0.283724  0.863012  0.798891  2012-06-01
2012-06-02  0.097231  0.277564  0.872306  2012-06-02
2012-06-03  0.821461  0.499485  0.126441  2012-06-03
2012-06-04  0.887782  0.389486  0.374118  2012-06-04
2012-06-05  0.248065  0.032287  0.850939  2012-06-05
2012-06-06  0.101917  0.121171  0.577643  2012-06-06
2012-06-07  0.225278  0.161301  0.708996  2012-06-07
2012-06-08  0.906042  0.828814  0.247564  2012-06-08
2012-06-09  0.733363  0.924076  0.393353  2012-06-09
2012-06-10  0.273837  0.318013  0.754807  2012-06-10
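
A more direct route, assuming a reasonably recent pandas, is to clear the name on the index itself. The sketch below uses a small made-up frame; `rename_axis(None)` returns a new frame, while assigning to `index.name` changes the frame in place:

```python
import pandas as pd

# Hypothetical frame with a named index; the values are illustrative.
frame = pd.DataFrame(
    {"A": [0.1, 0.2], "B": [0.3, 0.4]},
    index=pd.Index(["2012-06-01", "2012-06-02"], name="Date"),
)

# rename_axis(None) gives back a frame whose index has no name.
unnamed = frame.rename_axis(None)
name_after_rename_axis = frame.index.name  # the original still says 'Date'

# Assigning None to index.name removes the name on the frame itself.
frame.index.name = None

print(unnamed)
print(frame)
```

Either way the index values are untouched; only the label above them disappears from the printed output.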

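This seems like overkill. Another option is to switch to a plain integer index. Starting from the original frame (before Date was added as a column), reset_index moves the index into a column and renumbers the rows in one step, returning a new frame: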

>>> dfrm.reset_index()

or if you already moved the index into a column manually, then just

>>> dfrm.index = range(len(dfrm))

>>> dfrm
          A         B         C        Date
0  0.283724  0.863012  0.798891  2012-06-01
1  0.097231  0.277564  0.872306  2012-06-02
2  0.821461  0.499485  0.126441  2012-06-03
3  0.887782  0.389486  0.374118  2012-06-04
4  0.248065  0.032287  0.850939  2012-06-05
5  0.101917  0.121171  0.577643  2012-06-06
6  0.225278  0.161301  0.708996  2012-06-07
7  0.906042  0.828814  0.247564  2012-06-08
8  0.733363  0.924076  0.393353  2012-06-09
9  0.273837  0.318013  0.754807  2012-06-10

Or the following if you care about the order the columns appear:

>>> dfrm.iloc[:, [-1] + list(range(len(dfrm.columns) - 1))]
         Date         A         B         C
0  2012-06-01  0.283724  0.863012  0.798891
1  2012-06-02  0.097231  0.277564  0.872306
2  2012-06-03  0.821461  0.499485  0.126441
3  2012-06-04  0.887782  0.389486  0.374118
4  2012-06-05  0.248065  0.032287  0.850939
5  2012-06-06  0.101917  0.121171  0.577643
6  2012-06-07  0.225278  0.161301  0.708996
7  2012-06-08  0.906042  0.828814  0.247564
8  2012-06-09  0.733363  0.924076  0.393353
9  2012-06-10  0.273837  0.318013  0.754807
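
Selecting by label is an alternative to positional indexing when one column should move to the front. A sketch under the assumption that the column to move is called 'Date' (the frame and its values are made up):

```python
import pandas as pd

# Hypothetical frame in which 'Date' was appended as the last column.
frame = pd.DataFrame(
    {
        "A": [0.1, 0.2],
        "B": [0.3, 0.4],
        "Date": ["2012-06-01", "2012-06-02"],
    }
)

# Put 'Date' first, then every other column in its existing order.
others = [col for col in frame.columns if col != "Date"]
ordered = frame[["Date"] + others]

print(ordered.columns.tolist())
```

This does not depend on where 'Date' currently sits, so it keeps working if more columns are added later.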

Added

Here are a few helpful functions to include in an IPython configuration script (so that they are loaded on startup), or to put in a module you can easily import when working in Python.

###########
# Imports #
###########
import pandas
import datetime
import numpy as np
from dateutil import relativedelta
# pandas.io.data was removed from pandas; the readers now live in the
# separate pandas-datareader package.
from pandas_datareader import data as pdata


############################################
# Functions to retrieve Yahoo finance data #
############################################

# Utility to get generic stock symbol data from Yahoo finance.
# Ends at end_date (default: one day before today) and goes
# back num_dates calendar days from there.
def getStockSymbolData(sym_list, end_date=None, num_dates=30):

    # Resolve the default at call time; datetime.date.today() in the
    # signature would be frozen when the module is first loaded.
    if end_date is None:
        end_date = datetime.date.today() + relativedelta.relativedelta(days=-1)
    dReader = pdata.DataReader
    start_date = end_date + relativedelta.relativedelta(days=-num_dates)
    return dict( (sym, dReader(sym, "yahoo", start=start_date, end=end_date)) for sym in sym_list )
###

# Utility function to get some AAPL data when needed
# for testing.
def getAAPL(end_date=None, num_dates=30):

    return getStockSymbolData(['AAPL'], end_date=end_date, num_dates=num_dates)
###

I also made a class below to hold some data for common stocks:

#####
# Define a 'Stock' class that can hold simple info
# about a security, like SEDOL and CUSIP info. This
# is mainly for debugging things and quickly getting
# info for a single security.
class MyStock():

    def __init__(self, ticker='None', sedol='None', country='None'):
        self.ticker = ticker
        self.sedol=sedol
        self.country = country
    ###


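    def getData(self, end_date=None, num_dates=30):
        # Resolve the default here, at call time, rather than in the signature.
        if end_date is None:
            end_date = datetime.date.today() + relativedelta.relativedelta(days=-1)
        return pandas.DataFrame(getStockSymbolData([self.ticker], end_date=end_date, num_dates=num_dates)[self.ticker])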
    ###


#####
# Make some default stock objects for common stocks.
AAPL = MyStock(ticker='AAPL', sedol='03783310', country='US')
SAP  = MyStock(ticker='SAP',  sedol='484628',   country='DE')
