Python: How To Add A Secondary X Axis For A Single Trace?

July 25, 2024

I have a DataFrame (see 'Test Data' section below) and I would like to add a secondary x axis (at the top). But this axis has to be from 0 to 38.24(ms). This is the sum of all valu

Solution 1:

Further to your positive comment regarding plotly, here is an example of how to achieve a multi-xaxis for your dataset.

The code is a lot simpler than it looks. The code appears 'lengthy' due to the way I've formatted the dicts for easier reading.

The key elements are:

Adding a cumulative sum of the time column (time_c) for use on xaxis2.
Adding a hidden trace which aligns to xaxis, and your time data which aligns to xaxis2. Without the hidden trace, either the two axes do not both appear, or they appear but are not aligned, because only one trace is being plotted.
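
As a quick check of the first point, the cumulative sum of the five sample times should end at roughly 38.24, the upper limit requested for the top axis. The sketch below uses only the standard library to show the arithmetic (the answer's own code uses pandas `cumsum()`); the variable names are illustrative.

```python
from itertools import accumulate

# Per-inference times (ms) from the sample dataset.
times = [21.9235, 4.17876, 4.02168, 3.81504, 4.2972]

# Running total, rounded to 2 decimals like the time_c column.
time_c = [round(t, 2) for t in accumulate(times)]

print(time_c)
```

The last element of `time_c` is the right-hand end of the secondary axis.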

(Updated) Sample Code:

The following code has been updated to address the issue OP was having with a larger (70k row) dataset.

The key change is an update to the layout['xaxis'] and layout['xaxis2'] dicts to contain 'type': 'category', 'nticks' and defined 'range' keys.

import pandas as pd
from plotly.offline import plot

# Create the dataset.
raw_data = {'time': [21.9235, 4.17876, 4.02168, 3.81504, 4.2972],
            'tpu': [33.3, 33.3, 33.3, 33.3, 33.3],
            'cpu': [32, 32, 32, 32, 32],
            'memused': [435.92, 435.90, 436.02, 436.02, 436.19]}

df = pd.DataFrame(raw_data)
df['time_c'] = df['time'].cumsum().round(2)

# Plotting code.
data = []
layout = {'margin': {'t': 105},
          'title': {'text': 'Example Showing use of Secondary X-Axis', 
                    'y': 0.97}}

# Create a (hidden) trace for the xaxis.
data.append({'x': df.index,
             'y': df['memused'],
             'showlegend': False,
             'mode': 'markers', 
             'marker': {'size': 0.001}})
# Create the visible trace for xaxis2.
data.append({'x': df['time_c'],
             'y': df['memused'],
             'xaxis': 'x2',
             'name': 'Inference'})

# Configure graph layout.
nticks = int(df.shape[0] // (df.shape[0] * 0.05))
layout['xaxis'] = {'title': 'Number of Inferences',
                   'nticks': nticks,
                   'range': [df.index.min(), df.index.max()],
                   'tickangle': 45,
                   'type': 'category'}
layout['xaxis2'] = {'title': 'Time(ms)', 
                    'nticks': nticks,
                    'overlaying': 'x1', 
                    'range': [df['time_c'].min(), df['time_c'].max()],
                    'side': 'top', 
                    'tickangle': 45,
                    'type': 'category'}
layout['yaxis'] = {'title': 'Memory Used (MB)'}

fig = {'data': data, 'layout': layout}
plot(fig, filename='/path/to/graph.html')

Example Graph (original dataset):

I've intentionally left out any additional appearance configuration for code simplicity. However, referring to the top level plotly docs, the graphs are highly configurable.

Example Graph (new dataset):

This graph uses the (larger, 70k row) synthesised dataset from the other answer.

Solution 2:

Although generally discouraged, I'll post another answer to address the new dataset, as the previous answer works, given the original dataset.

This example diverges from the original request of a secondary x-axis for two reasons:

Due to the size of the (new) dataset, plotting a 'hidden' layer of data is not optimal.
For a secondary x-axis to display properly, a second trace must be plotted and, given the previous reason, this is no longer an option.

Therefore, a different approach has been taken - that of combined labeling of the x-axis. Rather than plotting two axes, the single x-axis features both required labels.
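
To make the combined-label idea concrete, here is a minimal standard-library sketch of the label format, "(run): cumulative time", applied to the five sample times from the original question. The names `times`, `time_c` and `labels` are illustrative only; the answer's own code builds the same kind of string per DataFrame row.

```python
from itertools import accumulate

# Per-inference times (ms) from the original sample dataset.
times = [21.9235, 4.17876, 4.02168, 3.81504, 4.2972]

# Cumulative time, rounded to 1 decimal as in the larger example.
time_c = [round(t, 1) for t in accumulate(times)]

# One label per point: the run number in parentheses, then the cumulative time.
labels = [f"({run}): {total}" for run, total in enumerate(time_c)]

print(labels)
```

Each label then serves as a category value on the single x-axis, so both pieces of information appear on one tick.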

Example Graph:

Note: This is (obviously) synthesised data, in order to achieve the number of rows (70k) in the updated question.

Sample Code:

import numpy as np
import pandas as pd
from plotly.offline import plot

# Synthesised dataset. (This code can be ignored.)
np.random.seed(0)
a = np.random.exponential(size=70000)*4
t = pd.Series(a).rolling(window=2000, min_periods=50).mean().to_numpy()
r = np.arange(70000).astype(str)
m = t*100

df = pd.DataFrame({'run': r, 
                   'time': t,
                   'memused': m}).dropna()

# Add cumulative time column.
df['time_c'] = df['time'].cumsum().round(1)


# --- Graphing code starts here ---

def create_labels(x):
    """Function to create xaxis labels."""
    return f"({x['run']}): {x['time_c']}"

# Create xaxis labels.
df['xaxis'] = df.apply(create_labels, axis=1)

# Create the graph.
data = []
layout = {'title': 'Combined X-Axis Labeling'}
data.append({'x': df['xaxis'], 
             'y': df['memused']})

layout['xaxis'] = {'title': '(Inference): Cumulative Time (ms)', 
                   'type': 'category', 
                   'nticks': df.shape[0] // 3500,
                   'tickangle': 45}
layout['yaxis'] = {'title': 'Memory Used (MB)'}


fig = {'data': data, 'layout': layout}
plot(fig, filename='/path/to/graph.html')
