Pipelining Transformer Stages In IBEX, Column Access Problems In Scikit-Learn And Pandas
I'm trying to build a scikit-learn based pipeline that operates on a pandas DataFrame. At each stage, only a subset of the features should be transformed; the rest should pass through unchanged.
Solution 1:
ibex (which I co-wrote) makes extensive use of pandas multilevel indexes.
Suppose we start with
import pandas as pd
df = pd.DataFrame({'source': [2, 44], 'class': [0, 1], 'x': [0, 5], 'y': [0, 6], 'z': [0, 8], 'w': 10})
>>> df
   class  source   w  x  y  z
0      0       2  10  0  0  0
1      1      44  10  5  6  8
Then the beginning of your pipeline gives
>>> (trans(LabelEncoder(), in_cols=['class']) + trans(None, ['source', 'x', 'y', 'z'])).fit_transform(df)
  functiontransformer_0 functiontransformer_1
                      0                source  x  y  z
0                     0                     2  0  0  0
1                     1                    44  5  6  8
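This is by design: the result has two-level columns, where the top level names the step that produced each column. Only columns that were passed to some step appear in the output, which is why w is missing here.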
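To make the two-level column layout easier to work with, here is a minimal plain-pandas sketch. It does not use ibex at all: it only imitates the shape of the output above with pd.concat and its keys argument, reusing the labels functiontransformer_0 and functiontransformer_1 as stand-ins. The variable names (encoded, passed, block, col, flat) are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'source': [2, 44], 'class': [0, 1],
                   'x': [0, 5], 'y': [0, 6], 'z': [0, 8], 'w': 10})

# Stand-in for the two steps: one block per step, no real transformation.
encoded = df[['class']]
passed = df[['source', 'x', 'y', 'z']]

# `keys` puts one label per block on the top level of the column index,
# giving the same kind of two-level columns shown in the output above.
out = pd.concat([encoded, passed], axis=1,
                keys=['functiontransformer_0', 'functiontransformer_1'])

# Select everything a given step produced by its top-level label.
block = out['functiontransformer_1']

# Select a single column with a (top level, bottom level) tuple.
col = out[('functiontransformer_1', 'x')]

# Drop the top level to get back to ordinary flat column names.
flat = out.copy()
flat.columns = flat.columns.droplevel(0)
```

Selecting by the top-level label returns an ordinary DataFrame for that block, so later code can keep working with flat column names if that is more convenient.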
You can achieve what you want by writing the pipeline as:
p = (
    trans(LabelEncoder(), in_cols="class")
    + trans(StandardScaler(), in_cols=["x", "y", "z"])
    + trans(None, in_cols="source")
)
>>> p.fit_transform(df)
  functiontransformer_0 functiontransformer_1           functiontransformer_2
                      0                     x    y    z                source
0                     0                  -1.0 -1.0 -1.0                     2
1                     1                   1.0  1.0  1.0                    44
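For comparison, the same column routing can be sketched with scikit-learn alone, using ColumnTransformer. This is a different technique from the ibex pipeline above, not a description of how ibex works. Two assumptions are worth flagging: OrdinalEncoder is swapped in for LabelEncoder, because LabelEncoder is meant for target labels and does not fit the ColumnTransformer interface, and the output column names are written out by hand since the result is a plain array.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

df = pd.DataFrame({'source': [2, 44], 'class': [0, 1],
                   'x': [0, 5], 'y': [0, 6], 'z': [0, 8], 'w': 10})

# Each entry is (name, transformer, columns); "passthrough" copies columns as-is.
ct = ColumnTransformer(
    transformers=[
        ("encode", OrdinalEncoder(), ["class"]),      # stand-in for LabelEncoder
        ("scale", StandardScaler(), ["x", "y", "z"]),
        ("keep", "passthrough", ["source"]),
    ],
    remainder="drop",  # columns not listed (here w) are dropped
)

arr = ct.fit_transform(df)

# Output columns follow the order of the transformers list.
flat = pd.DataFrame(arr, columns=["class", "x", "y", "z", "source"],
                    index=df.index)
```

The result here is a single flat frame of floats rather than a frame with two-level columns, so the information about which step produced each column is not kept.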