--- title: "Using the rIACI Package for the Iberian Actuarial Climate Index Calculations" author: "Nan Zhou" date: "r Sys.Date()" output: rmarkdown::html_vignette: toc: true toc_depth: 4 keep_md: true bibliography: references.bib vignette: > %\VignetteIndexEntry{Introduction to rIACI} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r, include = FALSE} if (interactive()) { knitr::opts_chunk$set( collapse = TRUE, comment = "#>", root.dir = getwd() ) } else { knitr::opts_chunk$set( collapse = TRUE, comment = "#>", root.dir = tempdir() ) } options(rmarkdown.html_vignette.check_title = FALSE) ``` # Introduction In our paper [@Zhou2023], we follow the work of North American actuaries, [**Actuaries Climate Index (ACI)**], and have created an index to show how climate changes in the Iberian Peninsula, the Iberian Actuarial Climate Index. The rIACI package is designed for climatologists and researchers working with climate data, particularly those interested in calculating climate indices such as the Iberian Actuarial Climate Index (IACI). This package provides tools to: - Download ERA5-Land data from the ECMWF Climate Data Store. - Process and merge NetCDF files. Export data to CSV format for individual grid points. - Calculate various standardized climate indices. - Aggregate indices to compute the IACI. This vignette will guide you through the steps to use the rIACI package effectively. ## Accessing Example Data Files The package includes example data files stored in the inst/extdata folder. To ensure that these files can be correctly accessed regardless of where the package is installed, you should use the system.file() function. For example: ```{r, eval=FALSE} # Get the full path of an example NetCDF file from inst/extdata/testdata example_nc <- system.file("extdata", "testdata", "1960_1.nc", package = "rIACI") cat("Example NetCDF file path:", example_nc, "\n") # Get the full path of an example CSV file from inst/extdata/testcsv example_csv <- system.file("extdata", "testcsv", "36.2_-5.6.csv", package = "rIACI") cat("Example CSV file path:", example_csv, "\n") ``` ## Prerequisites Before using the **rIACI** package, ensure you have the following: - An ECMWF user ID and API key to access the Climate Data Store. - Python installed on your system, along with the necessary Python packages (`xarray`, `pandas`, `numpy`, etc.). - The `ecmwfr` package installed in R for downloading data. - The `reticulate` package in R for interfacing with Python scripts. # Installation {#installation} You can install the **rIACI** package from GitHub using the `devtools` package: ```{r, eval=FALSE} # Install devtools if you haven't already install.packages("devtools") # Install rIACI from GitHub devtools::install_github("https://github.com/Nan-Z-byte/rIACI") ``` # Workflow Overview The general workflow using **rIACI** involves the following steps: 1. **Download ERA5-Land data** using the `download_data()` function. 2. **Process the downloaded data** with `process_data()`, `export_data_to_csv()`, and `csv_to_netcdf()`. 3. **Create a climate input object** using `climate_input()`. 4. **Calculate various climate indices** such as TX90p, TX10p, TN90p, TN10p, Rx5day, CDD, W90p, and Sea. 5. **Integrate sea level data** using `sea_input()`. 6. **Generate the IACI output** with `iaci_output()` or `output_all()`. # Downloading ERA5-Land Data {#downloading-era5-land-data} The `download_data()` function allows you to download ERA5-Land data from the ECMWF Climate Data Store for specified variables, years, months, and geographical areas.. ```{r, eval=FALSE} download_data(start_year, end_year, start_month = 1, end_month = 12, variables = c("10m_u_component_of_wind", "10m_v_component_of_wind", "2m_temperature", "total_precipitation"), dataset = "reanalysis-era5-land", area = c(North, West, South, East), output_dir = "cds_data", user_id, user_key, max_retries = 3, retry_delay = 5, timeout = 7200) ``` ## Parameters - **start_year** (`Integer`): Starting year for data download. - **end_year** (`Integer`): Ending year for data download. - **start_month** (`Integer`, default `1`): Starting month. - **end_month** (`Integer`, default `12`): Ending month. - **variables** (`Character vector`): Variables to download. Default includes common variables: - `"10m_u_component_of_wind"` - `"10m_v_component_of_wind"` - `"2m_temperature"` - `"total_precipitation"` - **dataset** (`Character`, default `"reanalysis-era5-land"`): Dataset short name. - **area** (`Numeric vector`): Geographical area as `c(North, West, South, East)`. Default is global. - **output_dir** (`Character`, default `"cds_data"`): Directory to save downloaded data. - **user_id** (`Character`): Your ECMWF user ID. - **user_key** (`Character`): Your ECMWF API key. - **max_retries** (`Integer`, default `3`): Maximum retry attempts for download failures. - **retry_delay** (`Numeric`, default `5`): Delay between retries in seconds. - **timeout** (`Numeric`, default `7200`): Timeout duration for each request in seconds. ## Example ```{r, eval=FALSE} # Set your ECMWF user ID and key user_id <- "your_user_id" user_key <- "your_api_key" # Define the geographical area (North, West, South, East) # Example: Iberian Peninsula roughly bounded by 44N, -10W, 35N, 5E area_iberia <- c(44, -10, 35, 5) # Download data form the year 1960 to 2023 download_data( start_year = 1960, end_year = 2023, area = area_iberia, user_id = user_id, user_key = user_key ) ``` # Processing Climate Data {#processing-climate-data} After downloading the data, you may need to process it before analysis. The package provides functions to handle NetCDF files and convert them to CSV format for easier manipulation. ## Process Data Function ### `process_data()` Processes NetCDF files in the input directory and saves merged and processed data to the output directory. #### Usage ```{r, eval=FALSE} process_data(input_dir, output_dir) ``` #### Parameters - **input_dir** (`Character`): Directory containing input NetCDF files. - **output_dir** (`Character`): Directory to save output files. #### Example ```{r, eval=FALSE} input_directory <- "cds_data" output_directory <- "processed_data" process_data(input_dir = input_directory, output_dir = output_directory) ``` ## Export Data to CSV Function ### `export_data_to_csv()` Exports data from a NetCDF file to CSV files, one for each latitude and longitude point. #### Usage ```{r, eval=FALSE} export_data_to_csv(nc_file, output_dir) ``` #### Parameters - **nc_file** (`Character`): Path to the NetCDF file. - **output_dir** (`Character`): Output directory to save CSV files. #### Example ```{r, eval=FALSE} netcdf_file <- "processed_data/2020_01.nc" csv_output_directory <- "csv_output" export_data_to_csv(nc_file = netcdf_file, output_dir = csv_output_directory) ``` ## CSV to NetCDF Function ### `csv_to_netcdf()` Merges CSV files in a specified directory into a single NetCDF file. #### Usage ```{r, eval=FALSE} csv_to_netcdf(csv_dir, output_file) ``` #### Parameters - **csv_dir** (`Character`): Directory containing CSV files. Filenames should follow the `'lat_lon.csv'` format. - **output_file** (`Character`): Path to the output NetCDF file. #### Example ```{r, eval=FALSE} csv_directory <- "csv_output" output_netcdf_file <- "final_data/merged_data.nc" csv_to_netcdf(csv_dir = csv_directory, output_file = output_netcdf_file) ``` # Creating a Climate Input Object {#creating-a-climate-input-object} Before calculating climate indices, create a climate input object that organizes your climate data. ## `climate_input()` Creates a climate input object containing processed climate data and relevant statistics. ### Usage ```{r, eval=FALSE} climate_input(tmax = NULL, tmin = NULL, prec = NULL, wind = NULL, dates = NULL,base.range = c(1961, 1990), n = 5, quantiles = NULL, temp.qtiles = c(0.10, 0.90), wind.qtile = 0.90, max.missing.days = c(annual = 15, monthly = 3), min.base.data.fraction_present = 0.1) ``` ### Parameters - **tmax** (`Numeric vector`): Maximum temperature data. - **tmin** (`Numeric vector`): Minimum temperature data. - **prec** (`Numeric vector`): Precipitation data. - **wind** (`Numeric vector`): Wind speed data. - **dates** (`Date vector`): Dates corresponding to the data. - **base.range** (`Numeric vector`, default `c(1961, 1990)`): Base range years for calculations. - **n** (`Integer`, default `5`): Window size for running averages. - **quantiles** (`List`, optional): Pre-calculated quantiles. - **temp.qtiles** (`Numeric vector`, default `c(0.10, 0.90)`): Temperature quantiles to calculate. - **wind.qtile** (`Numeric`, default `0.90`): Wind quantile to calculate. - **max.missing.days** (`Named numeric vector`, default `c(annual = 15, monthly = 3)`): Maximum allowed missing days. - **min.base.data.fraction_present** (`Numeric`, default `0.1`): Minimum fraction of data required in base range. ### Example ```{r, eval=FALSE} # Assume you have a CSV file with climate data climate_data <- read.csv("processed_data/climate_data.csv") # Create climate input object ci <- climate_input( tmax = climate_data$TMAX, tmin = climate_data$TMIN, prec = climate_data$PRCP, wind = climate_data$WIND, dates = as.Date(climate_data$DATE, format = "%Y-%m-%d") ) ``` # Calculating Climate Indices {#calculating-climate-indices} The **rIACI** package provides functions to calculate various climate indices, both standardized and non-standardized. ## Available Indices - **TX90p**: Percentage of days when maximum temperature is above the 90th percentile. - **TX10p**: Percentage of days when maximum temperature is below the 10th percentile. - **TN90p**: Percentage of days when minimum temperature is above the 90th percentile. - **TN10p**: Percentage of days when minimum temperature is below the 10th percentile. - **Rx5day**: Maximum consecutive 5-day precipitation amount. - **CDD**: Maximum length of consecutive dry days. - **W90p**: Percentage of days when wind speed is above the 90th percentile. - **Sea**: Sea level data integration. - **T90p**, **T10p**: Combined temperature indices. - **IACI**: Iberian Actuarial Climate Index ## Example: Calculating TX90p ```{r, eval=FALSE} # Calculate monthly TX90p index tx90p_values <- tx90p(ci, freq = "monthly") # View the results head(tx90p_values) ``` ## Example: Calculating standardized T90p ```{r, eval=FALSE} # Calculate standardized T90p index on a monthly basis t90p_std_values <- t90p_std(ci, freq = "monthly") # View the results head(tx90p_std_values) ``` ## Example: Calculating monthly TX90p ```{r, eval=FALSE} # Calculate seasonal TN10p index tx90p_seasonal <- monthly_to_seasonal(tx90p_values) # View the results head(tx90p_seasonal) ``` # Integrating Sea Level Data {#integrating-sea-level-data} To incorporate sea level data into the IACI, use the `sea_input()` function. ## `sea_input()` Creates a data frame for sea level data input. ### Usage ```{r, eval=FALSE} sea_input(Date = levels(ci$date_factors$monthly), Value = NA) ``` ### Parameters - **Date** (`Character vector`): Dates in "YYYY-MM" format. - **Value** (`Numeric vector`, default `NA`): Sea level values. ### Example ```{r, eval=FALSE} # Create sea level data sea_dates <- c("2020-01", "2020-02", "2020-03") sea_values <- c(1.2, 1.3, 1.4) sea_data <- sea_input(Date = sea_dates, Value = sea_values) ``` # Generating IACI Output {#generating-iaci-output} The final step is to compute the IACI by integrating all standardized indices. ## `iaci_output()` Integrates various standardized indices to compute the IACI. ### Usage ```{r, eval=FALSE} iaci_output(ci, si, freq = c("monthly", "seasonal")) ``` ### Parameters - **ci** (`List`): Climate input object created by `climate_input()`. - **si** (`Data frame`): Sea level input data created by `sea_input()`. - **freq** (`Character`, default `c("monthly", "seasonal")`): Frequency of calculation. ### Example ```{r, eval=FALSE} # Generate IACI iaci <- iaci_output(ci, sea_data, freq = "monthly") # View the IACI head(iaci) ``` ## `output_all()` Processes all CSV files in the input directory and outputs the IACI results to the output directory. ### Usage ```{r, eval=FALSE} output_all(si, input_dir, output_dir, freq = c("monthly", "seasonal"), base.range = c(1961, 1990), time.span = c(1961, 2022)) ``` ### Parameters - **si** (`Data frame`): Sea level input data. - **input_dir** (`Character`): Directory containing input CSV files. - **output_dir** (`Character`): Directory to save output files. - **freq** (`Character`, default `c("monthly", "seasonal")`): Frequency of calculation. - **base.range** (`Numeric vector`, default `c(1961, 1990)`): Base range years. - **time.span** (`Numeric vector`, default `c(1961, 2022)`): Time span for output data. ### Example ```{r, eval=FALSE} # Define input and output directories input_dir <- "csv_output" output_dir <- "iaci_results" # Run the output_all function with monthly frequency output_all( si = sea_std_values, input_dir = input_dir, output_dir = output_dir, freq = "monthly", base.range = c(1961, 1990), time.span = c(1961, 2022) ) ``` # Complete Workflow Example {#complete-workflow-example} Below is a comprehensive example demonstrating the complete workflow from downloading data to generating the IACI. ```{r, eval=FALSE} # Load the package library(rIACI) # Step 1: Download ERA5-Land data user_id <- "your_user_id" user_key <- "your_api_key" area_iberia <- c(44, -10, 35, 5) # Approximate bounds of Iberian Peninsula download_data( start_year = 2020, end_year = 2020, variables = c("2m_temperature", "total_precipitation", "10m_u_component_of_wind", "10m_v_component_of_wind"), area = area_iberia, user_id = user_id, user_key = user_key ) # Step 2: Process downloaded data input_directory <- "cds_data" output_directory <- "processed_data" process_data(input_dir = input_directory, output_dir = output_directory) # Step 3: Export processed NetCDF to CSV netcdf_file <- "processed_data/2020_01.nc" csv_output_directory <- "csv_output" export_data_to_csv(nc_file = netcdf_file, output_dir = csv_output_directory) # Step 4: Create climate input object climate_data <- read.csv("processed_data/climate_data.csv") ci <- climate_input( tmax = climate_data$TMAX, tmin = climate_data$TMIN, prec = climate_data$PRCP, wind = climate_data$WIND, dates = as.Date(climate_data$DATE, format = "%Y-%m-%d") ) # Step 5: Integrate sea level data sea_dates <- c("2020-01", "2020-02", "2020-03") sea_values <- c(1.2, 1.3, 1.4) sea_data <- sea_input(Date = sea_dates, Value = sea_values) sea_std_values <- sea_std(sea_data, freq = "monthly") # Step 6: Generate IACI iaci <- iaci_output(ci, sea_std_values, freq = "monthly") print(head(iaci)) # Step 7: Output all results output_all( si = sea_std_values, input_dir = csv_output_directory, output_dir = "iaci_results", freq = "monthly", base.range = c(1961, 1990), time.span = c(1961, 2022) ) # Step 8: Merge CSVs into NetCDF (optional) merged_netcdf <- "iaci.nc" csv_to_netcdf(csv_dir = iaci_results_directory, output_file = merged_netcdf) ``` # Additional Resources {#additional-resources} - **ECMWF Climate Data Store**: <https://cds.climate.copernicus.eu/> - **NetCDF Format**: <https://www.unidata.ucar.edu/software/netcdf/> - **Rcpp Package**: <https://cran.r-project.org/package=Rcpp> - **Reticulate Package**: <https://cran.r-project.org/package=reticulate> - **dplyr Package**: <https://cran.r-project.org/package=dplyr> - **tidyr Package**: <https://cran.r-project.org/package=tidyr> For further assistance, please refer to the package documentation or contact the package maintainer. # Conclusion The **rIACI** package offers a comprehensive suite of tools for climate data analysis, enabling users to compute the Iberian Actuarial Climate Index effectively. By following this guide, you can seamlessly download, process, and analyze climate data to gain valuable insights into climate variability and extremes in the Iberian Peninsula. # Acknowledgements This package benefited fundamentally from the collective expertise and encouragement of **Jose Luis Vilar-Zanon**, **Jose Garrido**, and **Antonio Jose Heras Martinez**. # Bibliography