Sample Maximum Possible Data Points From Distribution To New Distribution
Context Assume there is a distribution of three nominal classes over each calendar week from an elicitation, e.g. like this: | Week | Class | Count | Distribution | Desired Distrib
Solution 1:
You can calculate the largest total count that each week can support, then multiply it by the desired distribution. The idea is:
- Divide `Count` by `Desired Distribution` to get the total each class could support on its own.
- Take the minimum of these possible totals within each week with `groupby`. The class that runs out first limits the week's total.
- Multiply that per-week total by `Desired Distribution` and round down to get the sample numbers.
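The same steps can be written as a formula. The symbols are introduced here only for illustration and are not part of the original answer: $c_{w,k}$ is the available count of class $k$ in week $w$, $p_k$ is the desired share of class $k$, $T_w$ is the largest total the week can support, and $n_{w,k}$ is the resulting sample count.

```latex
T_w = \min_k \frac{c_{w,k}}{p_k},
\qquad
n_{w,k} = \left\lfloor T_w \, p_k \right\rfloor
```

For week 1 of the sample data, $T_1 = \min(954/0.55,\; 554/0.29,\; 1145/0.16) \approx \min(1734.5,\; 1910.3,\; 7156.3) = 1734.5$, so class A is the limiting class and all of its data points are kept.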
In code:
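df['new_count'] = (
    df['Count'].div(df['Desired Distribution'])   # possible total per class
    .groupby(df['Week']).transform('min')          # limiting total per week
    .mul(df['Desired Distribution'])               # scale by the desired share
    // 1                                           # round down to whole data points
).astype(int)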
Output:
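| | Week | Class | Count | Distribution | Desired Distribution | new_count |
|---|---|---|---|---|---|---|
| 0 | 1 | A | 954 | 0.36 | 0.55 | 954 |
| 1 | 1 | B | 554 | 0.21 | 0.29 | 503 |
| 2 | 1 | C | 1145 | 0.43 | 0.16 | 277 |
| 3 | 2 | A | 454 | 0.21 | 0.55 | 454 |
| 4 | 2 | B | 944 | 0.44 | 0.29 | 239 |
| 5 | 2 | C | 748 | 0.35 | 0.16 | 132 |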
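To try this out end to end, here is a minimal self-contained sketch that rebuilds the sample data from the table above and splits the one-liner into named steps. The names `possible_total`, `week_total` and `realized_share` are illustrative and not part of the original answer. The final lines are an added sanity check, not something the answer requires.

```python
import pandas as pd

# Sample data taken from the output table above.
df = pd.DataFrame({
    "Week": [1, 1, 1, 2, 2, 2],
    "Class": ["A", "B", "C", "A", "B", "C"],
    "Count": [954, 554, 1145, 454, 944, 748],
    "Desired Distribution": [0.55, 0.29, 0.16, 0.55, 0.29, 0.16],
})

# Step 1: total sample size each class could support on its own.
possible_total = df["Count"] / df["Desired Distribution"]

# Step 2: the scarcest class limits the week, so take the per-week minimum.
week_total = possible_total.groupby(df["Week"]).transform("min")

# Step 3: scale by the desired share and round down to whole data points.
df["new_count"] = (week_total * df["Desired Distribution"] // 1).astype(int)

# Sanity check (illustrative): share of each class in the new sample.
new_week_sum = df.groupby("Week")["new_count"].transform("sum")
df["realized_share"] = df["new_count"] / new_week_sum

print(df)
# True if no class is sampled beyond what is available.
print((df["new_count"] <= df["Count"]).all())
```

Because of the rounding down, the realized shares only approximate the desired distribution; with counts in the hundreds the deviation stays small.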