Credit Card Transaction Classification In Python
Solution 1:
I came up with a solution, but it could take a long time for large DataFrames:
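import pandas as pd

def func(x):
    global df_lookup
    # Return the category of the first lookup name contained in the description.
    for i in df_lookup['Name'].values:
        if i in x:
            return df_lookup.loc[df_lookup['Name'] == i, 'Category'].values[0]
    # No match: record the description in df_lookup (DataFrame.append was removed in pandas 2.0).
    new_row = pd.DataFrame([{'Name': x, 'Category': 'Needs Category'}])
    df_lookup = pd.concat([df_lookup, new_row], ignore_index=True)
    return 'Needs Category'

df1['Category'] = df1['Description'].apply(func)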
If you have data for which there is no category in df_lookup, e.g. GOOGLE 5555555, you would get the following outputs.
Output for df1:
                     Description  Amount        Category
0  AMAZON.COM*ajlja09ja AMZN.COM      10          Amazon
1          AMZN Mktp US *ajlkadf      15          Amazon
2           AMZN Prime *an9adjah      20          Amazon
3           Shell Oil 4106541031      20             Gas
4           Shell Oil 4163046510      25             Gas
5                 GOOGLE 5555555      10  Needs Category
Output for df_lookup:
             Name        Category
0          AMAZON          Amazon
1            AMZN          Amazon
2       Shell Oil             Gas
3  GOOGLE 5555555  Needs Category
This code iterates over df_lookup for each row in df1, so it may not be the most efficient method when df_lookup contains many categories.
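One way to avoid the per-row Python loop is to combine all lookup names into a single regular expression and let pandas do the matching. The sketch below is an alternative to the function above, not the method from this answer. It assumes the names should be matched literally (hence `re.escape`), that each name appears only once in df_lookup, and that the leftmost name found in a description decides the category (the function above instead follows the order of df_lookup). The sample frames are rebuilt from the outputs shown above, and unlike the function above, unmatched descriptions are not added to df_lookup.

```python
import re
import pandas as pd

# Sample data rebuilt from the outputs shown above.
df1 = pd.DataFrame({
    'Description': [
        'AMAZON.COM*ajlja09ja AMZN.COM',
        'AMZN Mktp US *ajlkadf',
        'AMZN Prime *an9adjah',
        'Shell Oil 4106541031',
        'Shell Oil 4163046510',
        'GOOGLE 5555555',
    ],
    'Amount': [10, 15, 20, 20, 25, 10],
})
df_lookup = pd.DataFrame({
    'Name': ['AMAZON', 'AMZN', 'Shell Oil'],
    'Category': ['Amazon', 'Amazon', 'Gas'],
})

# Put longer names first so a longer name wins over a shorter one it contains.
names = sorted(df_lookup['Name'], key=len, reverse=True)
pattern = '(' + '|'.join(re.escape(n) for n in names) + ')'

# Extract the leftmost lookup name found in each description (NaN if none).
matched = df1['Description'].str.extract(pattern, expand=False)

# Translate the matched name into its category; label the rest for review.
name_to_category = df_lookup.set_index('Name')['Category']
df1['Category'] = matched.map(name_to_category).fillna('Needs Category')
```

Rows left as 'Needs Category' can then be filtered with `df1[df1['Category'] == 'Needs Category']` and added to df_lookup by hand.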
Solution 2:
You can try the following. It builds a Series that contains, for each row, a set of all the matching categories (empty if nothing matches, or with several values if there are multiple matches). There is an explicit loop, but it runs over the lookup table, which is presumably much smaller than df1, the DataFrame to categorize:
import pandas as pd

result = pd.Series([set()] * len(df1), index=df1.index, name='Categories')
dstr = df1['Description'].str
# For each lookup entry, add its category to every row whose description contains the name.
for k, name in df_lookup.set_index('Category')['Name'].items():
    idx = dstr.contains(name)
    result.loc[idx] = result.loc[idx].apply(lambda s: s | {k})
You could assign this to a new column of df1, or use it in any way you like.
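One thing to keep in mind with the loop above: `Series.str.contains` treats its pattern as a regular expression by default, so a lookup name containing characters such as `.` or `*` may match more than intended. A small sketch of the difference, assuming literal matching is what you want; the second description is a made-up example, not data from the question:

```python
import pandas as pd

# The second entry is a hypothetical description used only to show the difference.
descriptions = pd.Series(['AMAZON.COM*ajlja09ja', 'AMAZONXCOM store'])

# Default (regex=True): '.' matches any character, so both rows match.
as_regex = descriptions.str.contains('AMAZON.COM')

# regex=False: the name is matched literally, so only the first row matches.
as_literal = descriptions.str.contains('AMAZON.COM', regex=False)
```

For the lookup names in this example (AMAZON, AMZN, Shell Oil) both modes give the same result.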
On your example:
>>> df1.assign(categories=result)
                     Description  Amount categories
0  AMAZON.COM*ajlja09ja AMZN.COM      10   {Amazon}
1          AMZN Mktp US *ajlkadf      15   {Amazon}
2           AMZN Prime *an9adjah      20   {Amazon}
3           Shell Oil 4106541031      20      {Gas}
4           Shell Oil 4163046510      25      {Gas}
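If you need a single label per transaction rather than a set, the sets can be collapsed afterwards. This is a minimal sketch, assuming that an empty set should become 'Needs Category' (the label used in Solution 1) and that multiple matches should be flagged rather than resolved silently. The `collapse` helper and the 'Ambiguous' label are hypothetical names introduced here, and the last two rows of the sample Series are made up to exercise those two cases.

```python
import pandas as pd

# Category sets as in the output above, plus two hypothetical rows:
# one with no match and one with two matches.
result = pd.Series(
    [{'Amazon'}, {'Amazon'}, {'Amazon'}, {'Gas'}, {'Gas'}, set(), {'Amazon', 'Gas'}],
    name='Categories',
)

def collapse(categories):
    """Turn a set of categories into a single label (hypothetical helper)."""
    if not categories:
        return 'Needs Category'
    if len(categories) > 1:
        # Sort so the label is deterministic regardless of set ordering.
        return 'Ambiguous: ' + ', '.join(sorted(categories))
    return next(iter(categories))

labels = result.apply(collapse)
```

The resulting `labels` Series shares its index with `result`, so it can be attached with `df1.assign(Category=labels)` when `result` was built from df1.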