
Pandas: For Groups Of Rows Where 2 Or More Particular Columns Values Are Exactly The Same, How To Assign A Unique Integer As A New Column

In a Pandas dataframe, I have groups of rows where the values in two particular columns are exactly the same. How do I add a new column that assigns a unique integer to each such group of rows?

Solution 1:

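Use groupby with sort=False and ngroup. ngroup numbers the groups starting at 0 (hence the +1), and sort=False keeps them in order of first appearance: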

df[3] = df.groupby([1,2], sort=False).ngroup()+1
df

Out[1261]:
          0   1   2  3
0    plane1      az  1
1    plane2      az  1
2    plane3  az      2
3    plane4  az      2
4    plane5  ny      3
5    plane6  ny      3
6    plane7  fl  fl  4
7    plane8  fl  fl  4
8   plane10      de  5
9   plane11      de  5
10  plane12      mo  6
11  plane13      mo  6
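
For anyone who wants to run this end to end, here is a minimal sketch. The frame is rebuilt from the output above, assuming the blank cells are empty strings rather than NaN (the question does not say which).

```python
import pandas as pd

# Assumption: blank cells in the printed frame are empty strings, not NaN.
df = pd.DataFrame({
    0: ["plane1", "plane2", "plane3", "plane4", "plane5", "plane6",
        "plane7", "plane8", "plane10", "plane11", "plane12", "plane13"],
    1: ["", "", "az", "az", "ny", "ny", "fl", "fl", "", "", "", ""],
    2: ["az", "az", "", "", "", "", "fl", "fl", "de", "de", "mo", "mo"],
})

# ngroup() numbers each (col 1, col 2) group from 0;
# sort=False keeps the groups in order of first appearance.
df[3] = df.groupby([1, 2], sort=False).ngroup() + 1
print(df)
```

If the blanks were NaN instead, groupby would drop those rows from the grouping by default, so passing dropna=False to groupby would be worth trying in that case.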

Solution 2:

In your case, convert the two columns of each row to a tuple, then factorize:

df[3]=pd.factorize(df[[1,2]].apply(tuple,1))[0]+1
df
          0   1   2  3
0    plane1      az  1
1    plane2      az  1
2    plane3  az      2
3    plane4  az      2
4    plane5  ny      3
5    plane6  ny      3
6    plane7  fl  fl  4
7    plane8  fl  fl  4
8   plane10      de  5
9   plane11      de  5
10  plane12      mo  6
11  plane13      mo  6

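Or concatenate the two columns into one string per row and factorize that (the empty strings are replaced with a space first, so that ('', 'az') and ('az', '') do not collapse into the same string):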

pd.factorize(df[[1,2]].replace('',' ').sum(1))[0]+1
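
A small sketch to check the two factorize variants side by side. It assumes the same frame as in the question with blanks stored as empty strings, and it joins the columns with `+` instead of `.sum(1)`, which is a swapped-in equivalent and not the exact call above.

```python
import pandas as pd

# Assumption: blank cells are empty strings; column 0 is left out for brevity.
df = pd.DataFrame({
    1: ["", "", "az", "az", "ny", "ny", "fl", "fl", "", "", "", ""],
    2: ["az", "az", "", "", "", "", "fl", "fl", "de", "de", "mo", "mo"],
})

# Variant 1: one tuple per row, then factorize the tuples.
keys = df[[1, 2]].apply(tuple, axis=1)
codes_tuple = pd.factorize(keys)[0] + 1

# Variant 2: one concatenated string per row, then factorize the strings.
# The space placeholder keeps ('', 'az') and ('az', '') distinct.
joined = df[1].replace("", " ") + df[2].replace("", " ")
codes_joined = pd.factorize(joined)[0] + 1

print(codes_tuple.tolist())
print(codes_joined.tolist())
```

The tuple variant is the safer of the two in general: plain concatenation can merge distinct pairs such as ('a', 'bc') and ('ab', 'c') unless a separator that never occurs in the data is put between the parts.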

Or use the category dtype with cat.codes (note that the dtype name must be quoted, and that the codes start at 0):

df[[1,2]].apply(tuple,1).astype('category').cat.codes

And if you only need a value that is the same for identical rows, rather than consecutive integers, you can hash each tuple:

df[[1,2]].apply(tuple,1).apply(hash)
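
A minimal sketch of the hash idea, again assuming the blanks are empty strings. The `hashes` and `ids` names are only for illustration.

```python
import pandas as pd

# Assumption: blank cells are empty strings; column 0 is left out for brevity.
df = pd.DataFrame({
    1: ["", "", "az", "az", "ny", "ny", "fl", "fl", "", "", "", ""],
    2: ["az", "az", "", "", "", "", "fl", "fl", "de", "de", "mo", "mo"],
})

# Equal tuples hash to the same value within one Python session.
keys = df[[1, 2]].apply(tuple, axis=1)
hashes = keys.apply(hash)

# The raw hashes are large arbitrary integers; factorize them
# if small consecutive ids are wanted after all.
ids = pd.factorize(hashes)[0] + 1
print(hashes.nunique(), ids.tolist())
```

Keep in mind that Python randomizes string hashes per process unless PYTHONHASHSEED is set, so these values should not be stored and compared across runs.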
