
Python Parse Dataframe Element

I have a pandas dataframe column (Data Type) which I want to split into three columns. target_table_df = LoadS_A [['Attribute Name', 'Data Type',

Solution 1:

Use target_table_df['Data Type'].str.extract(pattern)

You'll need to assign pattern to be a regular expression that captures each of the components you're looking for.

pattern = r'([^\(]+)(\(([^,]*),(.*)\))?'

([^\(]+) says to grab as many characters that are not an open parenthesis as you can, stopping at the first open parenthesis.

\(([^,]*), says to grab the first set of non-comma characters after an open parenthesis and stop at the comma.

,(.*)\) says to grab the rest of the characters between the comma and the close parenthesis.

(\(([^,]*),(.*)\))? says the whole parenthesized part is optional: grab it if it is there, and carry on if it is not.
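To see what each group captures, here is a small sketch that runs the same pattern through Python's re module. The sample strings are assumptions modeled on the output further down, not the asker's actual data.

```python
import re

pattern = r'([^\(]+)(\(([^,]*),(.*)\))?'

# Hypothetical sample values, modeled on the output shown in this answer
samples = ['decimal(18,4)', 'number(11,0)', 'date']

# Map each value to its tuple of four capture groups
groups = {value: re.match(pattern, value).groups() for value in samples}

for value, captured in groups.items():
    print(value, '->', captured)
```

For 'decimal(18,4)' the four groups are 'decimal', '(18,4)', '18' and '4'. For 'date' the optional part does not match, so the last three groups are None.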

Solution

Everything together looks like this:

pattern = r'([^\(]+)(\(([^,]*),(.*)\))?'
s = target_table_df['Data Type']  # the column to split
df = s.str.extract(pattern, expand=True).iloc[:, [0, 2, 3]]

# Formatting to get it how you wanted
df.columns = ['Data Type', 'Precision', 'Scale']
df.index.name = None
print(df)

I put a .iloc[:, [0, 2, 3]] at the end because the pattern I used grabs the whole parenthesis in column 1 and I wanted to skip it. Leave it off and see.
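If you would rather not drop a column afterwards, one possible variation (not part of the original answer) is to make the outer parenthesis a non-capturing group with (?:...), so that str.extract returns only three columns. The sample Series and the names pattern_nc and parts below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample column, standing in for target_table_df['Data Type']
s = pd.Series(['decimal(18,4)', 'number(11,0)', 'date'], name='Data Type')

# (?: ... ) groups without capturing, so only three groups are extracted
pattern_nc = r'([^\(]+)(?:\(([^,]*),(.*)\))?'

parts = s.str.extract(pattern_nc, expand=True)
parts.columns = ['Data Type', 'Precision', 'Scale']
print(parts)
```

Rows without a parenthesized part, such as 'date', end up with missing values in Precision and Scale, the same as with the .iloc version.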

  Data Type Precision Scale
0   decimal        18     4
1    number        11     0
2      date       NaN   NaN
3   decimal        18     4
4   decimal        18     4
5    number        11     0
