Water Hot Plot: Fine-Scale Water Quality Visualization with Plotnine

2024 Plotnine Competition Submission

Author

Cameron Roberts

Introduction

I am pleased to submit this Quarto document for the 2024 Plotnine competition. This visualization demonstrates the capabilities of Plotnine in creating compelling and informative data representations.

Visualization Approach

I’ve adapted a successful visualization concept I had originally developed in R using ggplot2, translating and enhancing it with Plotnine. While I initially explored using a ridgeline plot for data concentration visualization, I discovered that a heatmap implementation in Plotnine yielded surprisingly effective results, surpassing my expectations from earlier attempts.

Data Source and Description

The visualization is based on fine-scale water quality monitoring data from the Queensland Government Department of Environment, Science and Innovation (DESI). Specifically, it utilizes data from the Lower Herbert catchment near Ingham, Queensland, Australia.

Key data characteristics:

  • Source: Water Quality and Investigations team, DESI
  • Coverage: 17 in-situ nitrate (as N) sensors
  • Sensor read frequency: 15-minute intervals
  • Date Range: 2020 - 2024
  • Aggregation: Monthly, across all monitoring sites

For more information on fine-scale water quality monitoring in Queensland, please refer to the Queensland Governments Publications.

Data Import and Preparation

Data comes from DESI’s Water Quality and Investigations monitoring network for the lower Herbert catchment near Ingham, Queensland. The network includes 17 in-situ nitrate sensors reading every 15 minutes since 2020. Due to remote deployment challenges, sensor uptime varies. Data from 2020-2024 across all sites was extracted and provides sufficient coverage to attain monthly data summaries for each monitoring site.

Code
import pandas as pd
from datetime import datetime
from plotnine import *

#import concentration data
df = pd.read_csv('data/catchment_NNO3_all.csv')
df.head()
GSnum name ts Value Quality
0 1160116 Catherina Creek at Catherina Creek Road 2020-12-27T00:00:00Z 0.142 1
1 1160116 Catherina Creek at Catherina Creek Road 2020-12-27T00:15:00Z 0.153 1
2 1160116 Catherina Creek at Catherina Creek Road 2020-12-27T00:30:00Z 0.133 1
3 1160116 Catherina Creek at Catherina Creek Road 2020-12-27T00:45:00Z 0.129 1
4 1160116 Catherina Creek at Catherina Creek Road 2020-12-27T01:00:00Z 0.130 1

Site-specific metadata provides context for the site locations and the site ‘types’ to help understand drivers for the installations.

  • Site types are categorized as:
    • Reference: Situated in the upper catchment above contaminant sources, intended to provide a natural baseline.
    • Impact: Located directly downstream of land-use types known to contribute to nutrient to the waterways (ie. agriculture/urban land use)
    • End-of-system: Located at the most seaward practical monitoring point along the river or creek. The intent is to capture the maximum extent of upstream land use while avoiding the complexities of monitoring in the estuary.
Code
#import site information
sitelist = pd.read_csv('data/sitelist.csv')
sitelist.head()
Basin Catchment GSnum Site name Site code Latitude (GDA2020) Longitude (GDA2020) Site type Stream order Stream habit
0 Herbert Herbert River 1160115 Broadwater Creek at Day Use BCD -18.41633 145.94393 Reference 4 Natural
1 Herbert Catherina Creek 1160116 Catherina Creek at Catherina Creek Road CCC -18.59907 146.23627 End of System 3 Natural
2 Herbert Herbert River 1160117 Elphinstone Creek at Copley Road ECC -18.46506 145.96146 Impact 4 Natural
3 Herbert Francis Creek 1160118 Francis Creek at Weir FCW -18.76673 146.13407 Impact 5 Ephemeral
4 Herbert Herbert River 1160119 Herbert River at John Row Bridge HRJ -18.62831 146.16486 End of System 7 Tidal

Data joining and Preprocessing

In order to work with these data in Plotnine we must join the data frames to include the nessecary metadata for the visualisations we are wanting. The result is one dataframe that will allow for a simpler integration with the graphics.

Code
# Subset to include only 'GSnum' and 'Value' columns
subset_df = df[['GSnum', 'Value']]

# Group by 'GSnum' and calculate the median for the 'Value' and 'Quality' columns
median_summary = subset_df.groupby('GSnum').median().reset_index()

# Rename the 'Value' column to 'median'
median_summary.rename(columns={'Value': 'median'}, inplace=True)

# Perform a left join of the original DataFrame with the median summary DataFrame
merged_df = pd.merge(df, median_summary, on='GSnum', how='left')
merged_df = pd.merge(merged_df, sitelist[['GSnum', 'Site type', 'Site code']], on='GSnum', how='left')

# Concatenate columns with separator
merged_df['combined_col'] = merged_df['Site type'] + ' // ' + merged_df['Site code']

Date Parsing and Monthly Aggregation

The data needs to be aggregated by month. A seperate column has been created to allow for ‘prettier’ month names for the plot.

Code
# Parse the 'Date' column from the timestamp
merged_df['ts'] = pd.to_datetime(merged_df['ts'])

# Group by month and calculate median of 'Value' column
monthly_median = merged_df.groupby([merged_df['ts'].dt.month, 'name', 'combined_col'])['Value'].max()

# Convert the Series to a DataFrame
df = monthly_median.reset_index(name='value')

# Create a list of month names
month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec']

# Map the month numbers to month names
df['month_name'] = df['ts'].apply(lambda x: month_names[x-1])
df
ts name combined_col value month_name
0 1 Broadwater Creek at Day Use Reference // BCD 0.067 Jan
1 1 Catherina Creek at Catherina Creek Road End of System // CCC 3.072 Jan
2 1 Elphinstone Creek at Copley Road Impact // ECC 1.273 Jan
3 1 Francis Creek at Weir Impact // FCW 3.712 Jan
4 1 Herbert River at John Row Bridge End of System // HRJ 0.384 Jan
... ... ... ... ... ...
199 12 Stone River at Running Creek Reference // SRR 0.283 Dec
200 12 Stone River at Venables Crossing Impact // SRV 1.130 Dec
201 12 Trebonne Creek at Bruce Highway Impact // TBH 5.603 Dec
202 12 Waterview Creek at Jourama Road Reference // WCJ 0.282 Dec
203 12 Waterview Creek at Mammarellas Road End of System // WCM 1.481 Dec

204 rows × 5 columns

Visualization

Code
(
    ggplot(df, aes(x='ts', y='combined_col', fill='value')) +
    geom_tile(aes(width=0.95, height=0.95)) +
    theme_dark(base_size=11, base_family=None) +
    geom_text(aes(label="value.round(2).astype(str)"), size=6, color='#DDDEDF', show_legend=False) +
    scale_fill_cmap(cmap_name="inferno", name="Nitrate-N Concentration (mg/L)") +
    #scale_fill_gradient(cmap_name="viridis", low="#3BC4A4", high="#CC334E") +  # Adjust low and high colors as needed
    scale_x_continuous(breaks=range(1, 13), labels=month_names, name="") +
    scale_y_discrete(name="Site type and ID") +  # Change y-axis label
    theme(axis_text_x=element_text(angle=30, hjust=1.5),
          legend_position='top',  # Move legend to the top
          legend_direction='horizontal') +  # Horizontal orientation of legend
    labs(title="Seasonal and Spatial Variations in Maximum Monthly \nNitrate-N Concentration (2020-2024)",
         subtitle="A Heatmap Analysis of Queensland's Lower Herbert Catchment")
)

This graph displays the maximum monthly Nitrate-N concentration (mg/L) for various sites over 4 years of monitoring (2020 - 2024). Here are some key observations:

  • The Impact and End of System sites display higher nitrate concentrations.
  • Reference sites generally maintain lower concentrations throughout the year.
  • Highest concentrations are seen between November - January (coinciding with the wet season onset in the region)
  • Some sites show very high concentrations (above 4 mg/L) in certain months.
    • The highest recorded concentration appears to be 6.95 mg/L at the End of System site CCC in December.
  • Some sites show clear seasonal patterns, while others have more sporadic high concentration months.
    • Periods of elevated concentration outside of the wet season may indicate specific impacts from fertiliser re-application and unseasonal or unexpected rainfall driven runoff events.