Dashboards - Example #3

Using data from https://datahub.io/core/global-temp

The data run from 1880 to 2016, so they are unfortunately not up to date. A further example will tackle this issue.

A new issue, found in April 2024, is that the JSON file (https://datahub.io/core/global-temp/datapackage.json) no longer gives a resolvable path to the data; the path now has to be manually prefixed with https://r2.datahub.io/clt98lqg6000el708ja5zbtz0/master/raw/. That URL does not look stable, but we will see.
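A minimal sketch of the prefixing workaround, assuming the resource path in the descriptor is relative. The helper name `resolve_resource_url` and the sample path `data/monthly.csv` are hypothetical and used only for illustration; the prefix is the one observed in April 2024 and may change.

```python
from urllib.parse import urljoin, urlparse

# Prefix observed in April 2024; it does not look stable, so treat it as an assumption.
RAW_PREFIX = "https://r2.datahub.io/clt98lqg6000el708ja5zbtz0/master/raw/"


def resolve_resource_url(path, prefix=RAW_PREFIX):
    """Return an absolute URL for a resource path taken from the descriptor."""
    # If the descriptor already gives a full URL, leave it alone.
    if urlparse(path).scheme in ("http", "https"):
        return path
    # Otherwise treat the path as relative to the raw-data prefix.
    return urljoin(prefix, path.lstrip("/"))


# 'data/monthly.csv' is a hypothetical relative path, not taken from the descriptor.
print(resolve_resource_url("data/monthly.csv"))
```

Checking for an existing scheme first means the same call keeps working if the descriptor ever goes back to giving resolvable paths.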

Data are included from the GISS Surface Temperature (GISTEMP) analysis and the global component of Climate at a Glance (GCAG). Anomalies are in degrees Celsius.

GISTEMP: Combined Land-Surface Air and Sea-Surface Water Temperature Anomalies [i.e. deviations from the corresponding 1951-1980 means].

GCAG: Global temperature anomaly data come from the Global Historical Climatology Network-Monthly (GHCN-M) data set and International Comprehensive Ocean-Atmosphere Data Set (ICOADS), which have data from 1880.

These two data-sets are blended into a single product to produce the combined global land and ocean temperature anomalies.
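To make the term "anomaly" concrete, here is a simplified sketch of a deviation from a 1951-1980 mean. The temperatures are made up for illustration, and the real analyses are more involved than a single global mean, so this only shows the arithmetic.

```python
import pandas as pd

# Made-up absolute temperatures in degrees Celsius, for illustration only.
temps = pd.Series(
    [14.0, 14.2, 13.8, 14.6],
    index=[1951, 1965, 1980, 2016],
    name="temp_c",
)

# Baseline: the mean over the 1951-1980 reference period (label slice, inclusive).
baseline = temps.loc[1951:1980].mean()

# Anomaly: each value minus the baseline mean.
anomalies = temps - baseline
print(anomalies.round(2))
```

A positive anomaly means the value is warmer than the reference-period mean, and a negative one means it is cooler.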

Data Extraction

Resource
name : annual
--------------------------------------------- 

Resource
name : monthly
       Source     Date    Mean
0        GCAG  2016-12  0.7895
1     GISTEMP  2016-12  0.8100
2        GCAG  2016-11  0.7504
3     GISTEMP  2016-11  0.9300
4        GCAG  2016-10  0.7292
...       ...      ...     ...
3283  GISTEMP  1880-03 -0.1800
3284     GCAG  1880-02 -0.1229
3285  GISTEMP  1880-02 -0.2100
3286     GCAG  1880-01  0.0009
3287  GISTEMP  1880-01 -0.3000

[3288 rows x 3 columns]
Converting to date type...
       Source       Date    Mean
0        GCAG 2016-12-01  0.7895
1     GISTEMP 2016-12-01  0.8100
2        GCAG 2016-11-01  0.7504
3     GISTEMP 2016-11-01  0.9300
4        GCAG 2016-10-01  0.7292
...       ...        ...     ...
3283  GISTEMP 1880-03-01 -0.1800
3284     GCAG 1880-02-01 -0.1229
3285  GISTEMP 1880-02-01 -0.2100
3286     GCAG 1880-01-01  0.0009
3287  GISTEMP 1880-01-01 -0.3000

[3288 rows x 3 columns]
Source            object
Date      datetime64[ns]
Mean             float64
dtype: object
---------------------------------------------

Global Temperature Time Series from the two data-sets; both clearly deliver the same message. The Source key can be toggled to show one or the other data-set. The Zoom feature is problematic, however: it requires two clicks, e.g. a click on “+” and then a second click on Line Plot before the plot changes. This is perhaps an HTML/CSS generation issue or a web-browser compatibility issue; it is saved to be addressed later.

Global Temperature Time Series from the two data-sets

The same data as in the Line Plot, now represented in a different way in an attempt to present a more striking view of climate change. To some extent it works, but it is confusing for the casual observer, because the color scale is produced by a histogram function applied to Mean (itself already an average of multiple observations). Each 6-month date bin holds 6 monthly temperature anomalies per data-set; these are binned by value, with the resolution set by nbinsy (20), and the maximum is taken within each bin. That is why several cells can be filled for a single date range. Another issue is that every cell in the (x, y) matrix has a value, even one that represents 0 observations, in which case the max of Mean is shown as 0 (the mouse-over pop-ups reveal this).

This plot is both a hit (yes, it’s striking) and a miss at the same time. Overall, it’s a flop, because a visualization should be straightforward to understand. In any case, the issue of retrieving up-to-date data needs solving first, and the visualization will be revisited at that stage in a further example.
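The binning described above can be mimicked with pandas on a tiny synthetic frame. This is a sketch of the idea, not Plotly's internal implementation; the values, the 4 value bins (instead of 20), and the column names `half_year` and `mean_bin` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Small synthetic frame standing in for one data-set (values are made up).
df_demo = pd.DataFrame({
    "Date": pd.date_range("2000-01-01", periods=12, freq="MS"),
    "Mean": np.linspace(-0.5, 0.6, 12),
})

# x bins: 6-month windows, labelled e.g. "2000-H1".
half = (df_demo["Date"].dt.month - 1) // 6 + 1
df_demo["half_year"] = df_demo["Date"].dt.year.astype(str) + "-H" + half.astype(str)

# y bins: equal-width intervals over Mean (4 here, nbinsy=20 in the dashboard).
df_demo["mean_bin"] = pd.cut(df_demo["Mean"], bins=4, labels=False)

# Max of Mean per (value bin, date bin) cell; empty cells stay NaN, not 0.
grid = df_demo.groupby(["mean_bin", "half_year"])["Mean"].max().unstack("half_year")
print(grid)
```

Each date column ends up with more than one filled cell, which matches the "multiple observations per date range" effect. Keeping empty cells as NaN rather than 0 would avoid confusing them with a genuine anomaly of 0.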

Plotted with 6 month resolution

---
title: "Example #3"
format: dashboard
---

```{r}
# use of reticulate allows variables created in python
# to be accessed by R
library(reticulate)
```

```{python}
# pandas for the data frame, plotly for the charts,
# datapackage to read the Data Package descriptor
import pandas as pd
import plotly.express as px
import datapackage

```

# Global Temperature Data

## Row {height="30%"}

Using data from <https://datahub.io/core/global-temp>

The data run from 1880 to 2016, so they are unfortunately not up to date. A further example will tackle this issue.

A new issue, found in April 2024, is that the JSON file (https://datahub.io/core/global-temp/datapackage.json) no longer gives a
resolvable path to the data; the path now has to be manually prefixed with
https://r2.datahub.io/clt98lqg6000el708ja5zbtz0/master/raw/. That URL does not look stable, but we will see.

Data are included from the GISS Surface Temperature (GISTEMP) analysis and the global component of Climate at a Glance (GCAG).
Anomalies are in degrees Celsius.

GISTEMP: Combined Land-Surface Air and Sea-Surface Water Temperature Anomalies \[i.e. deviations from the corresponding 1951-1980
means\].

GCAG: Global temperature anomaly data come from the Global Historical Climatology Network-Monthly (GHCN-M) data set and
International Comprehensive Ocean-Atmosphere Data Set (ICOADS), which have data from 1880.

These two data-sets are blended into a single product to produce the combined global land and ocean temperature anomalies.

## Row {height="70%"}

```{python}
#| title: Data Extraction 


data_url = 'https://datahub.io/core/global-temp/datapackage.json'
new_data_url_prefix = 'https://r2.datahub.io/clt98lqg6000el708ja5zbtz0/master/raw/' 

# load the Data Package descriptor
package = datapackage.Package(data_url)

# list every resource; only the tabular "monthly" one is read into a data frame
resources = package.resources
for resource in resources:
    print("Resource")
    desc = resource.descriptor
    # print only the identifying fields of the descriptor
    for key in desc.keys():
        if key == 'description' or key == 'name':
            print(key, ":", desc[key])
    if resource.tabular and resource.name == "monthly":
        # the descriptor path is no longer resolvable on its own, so prefix it
        prefixed_data_url = new_data_url_prefix + resource.descriptor['path']
        df = pd.read_csv(prefixed_data_url)
        print(df)
        # Date arrives as 'YYYY-MM' text; parse it to datetime64
        print("Converting to date type...")
        df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m')
        print(df)
        print(df.dtypes)
    print('---------------------------------------------', "\n")
```

# Line Plot

## Row {height="10%"}

Global Temperature Time Series from the two data-sets; both clearly deliver the same message. The Source key can be toggled to
show one or the other data-set. The Zoom feature is problematic, however: it requires two clicks, e.g. a click on "+" and then a
second click on Line Plot before the plot changes. This is perhaps an HTML/CSS generation issue or a web-browser compatibility
issue; it is saved to be addressed later.

## Row {height="90%"}

```{python}
#| title: Global Temperature Time Series from the two data-sets

fig = px.line(
  df, x="Date", y="Mean", 
  color="Source", line_group="Source", 
  title='Mean Temp Difference to Reference Range', 
  markers=True)

# Set x-axis date range
start_date = '1860-01-01'
end_date = '2030-01-01'
fig.update_xaxes(range=[start_date, end_date])

```

# HeatMaps

The same data as in the Line Plot, now represented in a different way in an attempt to present a more striking view of climate
change. To some extent it works, but it is confusing for the casual observer, because the color scale is produced by a histogram
function applied to Mean (itself already an average of multiple observations). Each 6-month date bin holds 6 monthly temperature
anomalies per data-set; these are binned by value, with the resolution set by nbinsy (20), and the maximum is taken within each
bin. That is why several cells can be filled for a single date range. Another issue is that every cell in the (x, y) matrix has a
value, even one that represents 0 observations, in which case the max of Mean is shown as 0 (the mouse-over pop-ups reveal this).

This plot is both a hit (yes, it's striking) and a miss at the same time. Overall, it's a flop, because a visualization should be
straightforward to understand. In any case, the issue of retrieving up-to-date data needs solving first, and the visualization
will be revisited at that stage in a further example.

## Row

```{python}
#| title: Plotted with 6 month resolution

# rows per data-set = total rows / number of sources (3288 / 2 here)
# one date bin per 6 months of monthly observations
n_rows_per_data_set = len(df) // df["Source"].nunique()
nbinsx = n_rows_per_data_set // 6
nbinsy = 20
fig = px.density_heatmap(
  df, x="Date", y="Mean",
  nbinsx=nbinsx, nbinsy=nbinsy,
  z="Mean",
  histfunc="max",
  color_continuous_scale=["blue", "lightgray", "red"],
  color_continuous_midpoint=0,
  title="number of date bins: " + str(nbinsx) + ", number of Mean bins: " + str(nbinsy),
  facet_col="Source"
)
fig.show()
```

# Code

## Row

```{python}
from pathlib import Path
from textwrap import wrap

# wrap just the long lines over a specified number of characters
def wrap_long_lines(input_text, max_line_chars):
    lines = input_text.split('\n')
    wrapped_lines = []
    for line in lines:
        if len(line) > max_line_chars:
            wrapped_lines.extend(wrap(line, width=max_line_chars))
        else:
            wrapped_lines.append(line)
    wrapped_text = '\n'.join(wrapped_lines)
    return wrapped_text

txt = Path('example_3.qmd').read_text()
# it's a wrap!
# those long comments are now readable in the Code tab
wrapped_txt = wrap_long_lines(txt, 130)
print(wrapped_txt)

```