
How To Calculate Dictionaries Of Lists Using Pandas Dataframe?

I have two strings in Python3.x, which are defined to be the same length: string1 = 'WGWFTSJKPGP' string2 = 'DORKSRQKYJG' I am also given an integer which is meant to represent th

Solution 1:

Use -

import pandas as pd

def dict_op(x):
    # Build the {index: index + start} dict for a single row.
    string1 = x['column1']
    string2 = x['column2']
    start_pos = x['start']
    x['val'] = {i: i + start_pos for i, _ in enumerate(string1)}
    return x

def zip_dict(x):
    # x is a list of dicts; each DataFrame column holds the values for one key.
    b = pd.DataFrame(x)
    return {i: b.loc[:, i].tolist() for i in b.columns}

op = df.apply(dict_op, axis=1).groupby('column1')['val'].apply(list).apply(zip_dict)
print(op)

Output

column1
LJNVTJOY      {0: [31, 52, 84], 1: [32, 53, 85], 2: [33, 54,...
MXRBMVQDHF    {0: [79], 1: [80], 2: [81], 3: [82], 4: [83], ...
WHLAOECVQR    {0: [18], 1: [19], 2: [20], 3: [21], 4: [22], ...
Name: val, dtype: object

Explanation

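dict_op reuses your code to create the dict for every row. The groupby followed by .apply(list) then collects the dicts that share a column1 value into a list of dicts.

zip_dict() then builds the output dict from that interim list: it loads the list of dicts into a temporary DataFrame, so each key becomes a column, and reads every column back as a list.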

The last piece that I haven't included is the part where if the length of the list is 1 then you can include the first element only, taking the output from {0: [79], 1: [80], 2: [81], 3: [82], 4: [83], ... to {0: 79, 1: 80, 2: 81, 3: 82, 4: 83, ...
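That missing last step could be sketched roughly like this. Note that collapse_singletons is a hypothetical helper name, not something from the answer above, and it assumes every value is a list like the ones zip_dict produces.

```python
def collapse_singletons(d):
    """Replace each one-element list value with its only element.

    Hypothetical helper; assumes every value in d is a list.
    """
    return {k: v[0] if len(v) == 1 else v for k, v in d.items()}


single = {0: [79], 1: [80], 2: [81]}
multi = {0: [31, 52, 84], 1: [32, 53, 85]}

print(collapse_singletons(single))  # {0: 79, 1: 80, 2: 81}
print(collapse_singletons(multi))   # {0: [31, 52, 84], 1: [32, 53, 85]}
```

If it behaves as expected on your data, it can be chained onto the result with op.apply(collapse_singletons).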

Solution 2:

First, use groupby to aggregate the "start" column into a list for each column1 value:

df2 = df.groupby("column1")["start"].apply(list).reset_index()
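To see what this step produces, here is a small sketch. The sample rows are an assumption: the column1 and start values are taken from the outputs shown in this thread, but the row order is made up.

```python
import pandas as pd

# Assumed sample data: values come from the outputs shown in this thread,
# the row order is invented for illustration.
df = pd.DataFrame({
    "column1": ["LJNVTJOY", "MXRBMVQDHF", "LJNVTJOY", "WHLAOECVQR", "LJNVTJOY"],
    "start": [31, 79, 52, 18, 84],
})

# One row per distinct column1 value, with all of its start positions in a list.
df2 = df.groupby("column1")["start"].apply(list).reset_index()
print(df2)
```

groupby sorts the group keys by default and keeps the original row order inside each group, so 'LJNVTJOY' ends up with [31, 52, 84].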

Now you can write a function that creates the new dictionary column:

def create_dict(row):
    new_dict = {}
    for i, j in enumerate(row["column1"]):
        if len(row["start"]) == 1:
            new_dict[i] = row["start"][0]+i
        else:
            for k in row["start"]:
                if i in new_dict:
                    new_dict[i].append(k + i)
                else:
                    new_dict[i] = [k + i]
    return new_dict

Finally, apply this function to every row of df2:

df2["new_column"] = df2.apply(create_dict, axis = 1)
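A quick way to sanity-check the function without a DataFrame is to call it on plain dicts, since it only indexes the row by key. The block below restates create_dict in a condensed form so that it runs on its own; the condensed version is my own rewording and assumes every group has at least one start value, which is always the case after a groupby.

```python
def create_dict(row):
    # Condensed restatement of the function above (assumes a non-empty start list).
    starts = row["start"]
    positions = range(len(row["column1"]))
    if len(starts) == 1:
        # A single start position: store plain integers.
        return {i: starts[0] + i for i in positions}
    # Several start positions: store one list per index.
    return {i: [s + i for s in starts] for i in positions}


many = create_dict({"column1": "LJNVTJOY", "start": [31, 52, 84]})
one = create_dict({"column1": "MXRBMVQDHF", "start": [79]})

print(many[0], many[7])  # [31, 52, 84] [38, 59, 91]
print(one[0], one[9])    # 79 88
```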

Solution 3:

Here's a slightly different approach using a lambda and two zips.

df2 = df.groupby('column1')['start'].agg([('s', list)]).reset_index()
df2['l'] = df2.column1.str.len()  # take lengths from df2 so they line up with its rows

df2.apply(lambda x: dict(zip(range(x['l'] + 1), zip(*[range(s, s + x['l'] + 1) for s in x['s']]))), axis = 1)

The truncated output of that can be seen here (note that it returns tuples rather than lists):

0    {0: (31, 52, 84), 1: (32, 53, 85), 2: (33, 54,...
1    {0: (79,), 1: (80,), 2: (81,), 3: (82,), 4: (8...
2    {0: (18,), 1: (19,), 2: (20,), 3: (21,), 4: (2...

First, to cut down on the length of the apply step, create a DataFrame with the column1 values and the associated starting positions. In addition, add a column with the length of column1 (assuming that the equal length assertion holds).

After that, it's a matter of combining the range of column1 letter indices (0 through len(column1)), which serves as the keys, with the same range offset by the start value(s).

Things get a little dicey with the second zip because [range(s, s + x['l'] + 1) for s in x['s']] returns something that looks like this (for 'LJNVTJOY'):

[[31, 32, 33, 34, 35, 36, 37, 38, 39],
 [52, 53, 54, 55, 56, 57, 58, 59, 60],
 [84, 85, 86, 87, 88, 89, 90, 91, 92]]

What we really want is to group the elements that are aligned vertically, so we use the 'splat' or 'unpacking' operator (*) to feed these lists into zip as separate arguments. Once we've combined those lists, we have a list of keys and a list (of tuples) of values, which can be zipped into a dict.
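The two zips can be tried out in plain Python, without pandas. This is only a sketch of the same idea for the 'LJNVTJOY' row; the variable names are made up for the illustration.

```python
starts = [31, 52, 84]      # start positions for 'LJNVTJOY'
length = len("LJNVTJOY")   # 8

# One range per start position, same bounds as in the lambda.
ranges = [range(s, s + length + 1) for s in starts]

# zip(*ranges) transposes: the first tuple holds the first element of every range, etc.
columns = list(zip(*ranges))

# Pair each key with its tuple of values.
result = dict(zip(range(length + 1), columns))

print(result[0])  # (31, 52, 84)
print(result[8])  # (39, 60, 92)
```

Note that because of the + 1, the keys run from 0 to 8 for an 8-letter string, which is one more key than the enumerate-based solutions produce. Drop the + 1 in both ranges if you only want keys 0 through 7.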
