
TypeError: unsupported operand type(s) for -: 'str' and 'str' in Python 3.x Anaconda

I am trying to count instances per hour in a large dataset. The code below seems to work fine on Python 2.7, but I had to upgrade it to the latest 3.x version of Python with a

Solution 1:

I think you need to change the call to header=0 so that the first row is treated as the header; the column names are then replaced by the list cols.
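To see why the header argument matters, here is a minimal sketch using a made-up two-row sample (the column labels and values in csv_text are hypothetical, not the asker's real file). It assumes the file starts with a header line, which is what this answer suspects:

```python
import io

import pandas as pd

cols = ['UserId', 'UserMAC', 'HotspotID', 'StartTime', 'StopTime']

# Hypothetical sample with a header line; not the asker's real data.
csv_text = (
    "user,mac,hotspot,start,stop\n"
    "1,aa:bb,10,1262304000,1262305800\n"
    "2,cc:dd,11,1262307600,1262311200\n"
)

# header=None: the header line is read as a data row, so the time
# columns contain the text 'start' / 'stop' and are no longer numeric.
bad = pd.read_csv(io.StringIO(csv_text), header=None, names=cols)

# header=0: the header line is consumed and replaced by the names in cols.
good = pd.read_csv(io.StringIO(csv_text), header=0, names=cols)

print(len(bad), pd.api.types.is_numeric_dtype(bad['StartTime']))
print(len(good), pd.api.types.is_numeric_dtype(good['StartTime']))
```

If the non-numeric column in bad is what the real file produces, subtracting StartTime from StopTime would be operating on strings, which matches the error in the title.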

If the problem persists, use to_numeric, because some values in StartTime and StopTime are strings. With errors='coerce', values that cannot be parsed become NaN; these are then replaced by 0, and finally the column is converted to int:

import numpy as np
import pandas as pd

cols = ['UserId', 'UserMAC', 'HotspotID', 'StartTime', 'StopTime']
df = pd.read_csv('canada_mini_unixtime.csv', header=0, names=cols)
# print(df)
df['StartTime'] = pd.to_numeric(df['StartTime'], errors='coerce').fillna(0).astype(int)
df['StopTime'] = pd.to_numeric(df['StopTime'], errors='coerce').fillna(0).astype(int)
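As a small illustration of what the to_numeric chain does, here is a sketch on a made-up Series (the values in raw are hypothetical) that mixes numeric strings, a stray label and a missing value:

```python
import pandas as pd

# Hypothetical raw column: numeric strings, a stray label, a missing value.
raw = pd.Series(['1262304000', 'StartTime', None, '1262307600'], dtype=object)

# errors='coerce' turns anything unparseable into NaN instead of raising.
as_float = pd.to_numeric(raw, errors='coerce')

# Replace NaN with 0, then convert the float column to integers.
clean = as_float.fillna(0).astype(int)

print(int(as_float.isna().sum()))
print(clean.tolist())
```

One thing to keep in mind: a 0 in a Unix-time column means 1970-01-01, so any row filled this way will pull StartTime.min() back to 1970. If that is not wanted, dropping the NaN rows instead of filling them may be the better choice.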

The following lines need no changes:

df['m'] = df.StopTime + df.StartTime
df['d'] = df.StopTime - df.StartTime
start = pd.to_datetime(df.StartTime.min(), unit='s').date()
end = pd.to_datetime(df.StopTime.max(), unit='s').date() + pd.Timedelta(days=1)

freq = '1H'  # 1-hour frequency
idx = pd.date_range(start, end, freq=freq)
r = pd.DataFrame(index=idx)
# pd.datetime is no longer available in recent pandas; pd.Timestamp does the same job here
r['start'] = (r.index - pd.Timestamp(1970, 1, 1)).total_seconds().astype(np.int64)

# 1 hour in seconds, minus one second (so that we will not count it twice)
interval = 60*60 - 1

r['LogCount'] = 0
r['UniqueIDCount'] = 0

ix is deprecated in recent versions of pandas, so use loc instead, with the column names given in []:

for i, row in r.iterrows():
    # intervals overlap test
    # https://en.wikipedia.org/wiki/Interval_tree#Overlap_test
    # i've slightly simplified the calculations of m and d
    # by getting rid of division by 2,
    # because it can be done eliminating common terms
    u = df.loc[np.abs(df.m - 2*row.start - interval) < df.d + interval, 'UserId']
    r.loc[i, ['LogCount', 'UniqueIDCount']] = [len(u), u.nunique()]

r['Date'] = pd.to_datetime(r.start, unit='s').dt.date
r['Day'] = pd.to_datetime(r.start, unit='s').dt.day_name().str[:3]  # weekday_name was removed; day_name() replaces it
r['StartTime'] = pd.to_datetime(r.start, unit='s').dt.time
r['EndTime'] = pd.to_datetime(r.start + interval + 1, unit='s').dt.time

print(r)
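The overlap test inside the loop compares midpoints and half-widths with the division by 2 cancelled out. The sketch below checks that form against the plain "starts before the other ends, and ends after the other starts" test. The function names and the parameters a, b, s and length are made up for illustration; they stand for a session's StartTime and StopTime, the bucket start, and interval:

```python
def overlaps_direct(a, b, s, length):
    """Session [a, b] and bucket [s, s + length] overlap (touching ends excluded)."""
    return a < s + length and s < b


def overlaps_midpoint(a, b, s, length):
    """Same test written with m and d as in the loop above."""
    m = a + b   # twice the session midpoint
    d = b - a   # twice the session half-width
    # |m/2 - (s + length/2)| < d/2 + length/2, multiplied through by 2
    return abs(m - 2 * s - length) < d + length


# Compare both forms on a small grid of integer sessions around one bucket.
bucket_start, bucket_length = 5, 4
mismatches = [
    (a, b)
    for a in range(0, 20)
    for b in range(a, 20)
    if overlaps_direct(a, b, bucket_start, bucket_length)
    != overlaps_midpoint(a, b, bucket_start, bucket_length)
]
print(mismatches)
```

Because the comparison is strict, a session that only touches the edge of a bucket is not counted for that bucket.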

Solution 2:

df['d'] = df.StopTime - df.StartTime is attempting to subtract a string from another string. I don't know what your data looks like, but chances are that you want to parse StopTime and StartTime as dates. Try

df = pd.read_csv(fn, header=None, names=cols, parse_dates=[3,4])

instead of df = pd.read_csv(fn, header=None, names=cols).
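A minimal sketch of the failure and one way around it, on made-up values. It assumes the columns hold Unix timestamps in seconds stored as text, which is a guess based on the file name in the other answer, and it uses pd.to_datetime with unit='s' rather than parse_dates:

```python
import pandas as pd

# Hypothetical data: Unix timestamps (seconds) that were read in as strings.
df = pd.DataFrame(
    {
        'StartTime': ['1262304000', '1262307600'],
        'StopTime': ['1262305800', '1262311200'],
    },
    dtype=object,
)

# Subtracting the string columns reproduces the kind of error in the title.
try:
    df['StopTime'] - df['StartTime']
    raised = False
except TypeError:
    raised = True

# Convert text to numbers first, then interpret them as seconds since the epoch.
start = pd.to_datetime(pd.to_numeric(df['StartTime']), unit='s')
stop = pd.to_datetime(pd.to_numeric(df['StopTime']), unit='s')
duration = stop - start  # a Series of Timedelta values

print(raised)
print(duration.tolist())
```

Whether datetimes or plain integers are more convenient depends on the rest of the script; the code in Solution 1 does its arithmetic on integer seconds.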
