This code illustrates how to map the different regions, so that you can work out which ones you need to download for your project. It assumes you have read the tutorial at least through the fourth section, on data structures and basic plotting.

An important note: In this section we read in the connectivity data, process it, and save it to a directory. As before, it is important that you run the code in a predictable way from a known location. In RStudio, the easiest way to do this is to go to the "Session" menu, select the "Set Working Directory" item, and choose "To Source File Location." Your data files will then be saved into the directory this file is in, and the code will find model_depth_and_distance.nc in the connectivityData directory within it. You can also explicitly call setwd('/directory/with/data') to specify where the data will be kept.

The functions that retrieve the connectivity data are contained in the file connectivityUtilities.R, so we must source that file to load them into R. More details on this process can be found in 03_getData_Subset_and_Combine.

source('connectivityUtilities.R')
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.1     ✔ stringr 1.4.1
## ✔ readr   2.1.3     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## Linking to GEOS 3.10.2, GDAL 3.4.2, PROJ 8.2.1; sf_use_s2() is TRUE
## 
## 
## Attaching package: 'foreach'
## 
## 
## The following objects are masked from 'package:purrr':
## 
##     accumulate, when
## 
## 
## Loading required package: parallel
## [1] "file connectivityData/EZfateFiles/model_depth_and_distance.nc exists, no download"

As described in 03_getData_Subset_and_Combine, we can use the getConnectivityData() function to get a connectivity matrix for a particular region and PLD. Here we choose a specific year, but the choice does not matter because we are only interested in the starting locations. We save the matrix into the variable E.

  regionName<-'theAmericas'
  depth<-1
  year<-2007
  verticalBehavior<-'starts'
  month<-5
  minPLD<-18; maxPLD<-minPLD
  E<-getConnectivityData(regionName,depth,year,verticalBehavior,month,minPLD,maxPLD)
## [1] "file connectivityData/theAmericas/1m/starts/year_2007_month05_minPLD18_maxPLD18.RDS exists, no download"

To save disk space and to reduce the amount of data transferred across the network, the latitude and longitude data are not included in the connectivity data. Instead, all locations are stored as indices into the circulation model grid. However, manipulating and plotting the connectivity requires the latitude and longitude data, so we must add them using the addLatLon() function in the connectivityUtilities.R file.

  E<-addLatLon(E)
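To make the idea of storing grid indices instead of coordinates concrete, here is a minimal sketch of an index-to-coordinate lookup. It is not the actual addLatLon() implementation: the toy grid, the variable names lonGrid, latGrid and Etoy, and the index column names nxFrom and nyFrom are all assumptions made for illustration.

```r
# Hypothetical sketch only: NOT the real addLatLon() code.
# A toy 3 x 2 "model grid"; the real grid comes from the circulation model.
lonGrid <- matrix(c(-70, -69, -68,
                    -70, -69, -68), nrow = 3, ncol = 2)
latGrid <- matrix(c(40, 40, 40,
                    41, 41, 41), nrow = 3, ncol = 2)

# Toy connectivity rows that store grid indices (column names are assumed)
Etoy <- data.frame(nxFrom = c(1, 3), nyFrom = c(1, 2))

# Index the grids with a two-column matrix of (row, column) pairs
idx <- cbind(Etoy$nxFrom, Etoy$nyFrom)
Etoy$lonFrom <- lonGrid[idx]
Etoy$latFrom <- latGrid[idx]
print(Etoy)
```

Looking coordinates up only when they are needed keeps the stored files small, at the cost of one extra step before plotting.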

Now we can plot the starting locations (lonFrom and latFrom) in the matrix. First, we load the plotting libraries and initialize them.

    library("rnaturalearth")
    library("rnaturalearthdata")
    world <- ne_countries(scale = "medium", returnclass = "sf") #get coastline data
    class(world)
## [1] "sf"         "data.frame"

and then we plot the resulting data:

    p<-ggplot(data = world) + geom_sf() +
      coord_sf(xlim= c(-180, 180), ylim = c(-85, 85), expand = FALSE)+
      geom_point(data = E, aes(x = lonFrom, y = latFrom), size = 1, shape = 23, fill = "green",color='green')
    print(p) #this makes the figure appear

OK, now let's put this into a loop so we can plot all the regions at once. This code gathers the data and adds it to a single data.frame().

#now loop over regions, get the data, add latitude and longitude, and combine the points into one data.frame
    regionList<-list('theAmericas','EuropeAfricaMiddleEast','AsiaPacific','Antarctica')
    nColor<-0
    Eall<-data.frame()
    for (regionName in regionList) {
      nColor<-nColor+1
      print(paste('working on',regionName))
      E<-getConnectivityData(regionName,depth,year,verticalBehavior,month,minPLD,maxPLD) #the rest of the parameters are defined in the first code block. 
      E<-addLatLon(E) 
      E<-E[c('lonFrom','latFrom')] #save lots of memory by throwing away what we don't need
      E['region']<-regionName
      Eall<-rbind(Eall,E)
    }
## [1] "working on theAmericas"
## [1] "file connectivityData/theAmericas/1m/starts/year_2007_month05_minPLD18_maxPLD18.RDS exists, no download"
## [1] "working on EuropeAfricaMiddleEast"
## [1] "file connectivityData/EuropeAfricaMiddleEast/1m/starts/year_2007_month05_minPLD18_maxPLD18.RDS exists, no download"
## [1] "working on AsiaPacific"
## [1] "file connectivityData/AsiaPacific/1m/starts/year_2007_month05_minPLD18_maxPLD18.RDS exists, no download"
## [1] "working on Antarctica"
## [1] "file connectivityData/Antarctica/1m/starts/year_2007_month05_minPLD18_maxPLD18.RDS exists, no download"
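Growing a data.frame with rbind() inside a loop works well for four regions. As an alternative, the same gathering step could be sketched with lapply() and a single rbind() at the end. In this sketch getRegionPoints() is a hypothetical stand-in that returns fake points; in real use it would wrap the getConnectivityData() and addLatLon() calls shown above.

```r
# Hypothetical stand-in for getConnectivityData() followed by addLatLon();
# it returns two fake starting points per region so the sketch runs on its own.
getRegionPoints <- function(regionName) {
  data.frame(lonFrom = c(0, 1), latFrom = c(10, 11), region = regionName)
}

regionNames <- c('theAmericas', 'EuropeAfricaMiddleEast', 'AsiaPacific', 'Antarctica')

# Build one data.frame per region, then combine them in a single step
perRegion <- lapply(regionNames, getRegionPoints)
EallToy <- do.call(rbind, perRegion)

# Count the points gathered for each region
print(table(EallToy$region))
```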

Now we build the map and plot the starting points from all regions, colored by region:

#first, make the map
    p<-ggplot(data = world) + geom_sf() +
      coord_sf(xlim= c(-180, 180), ylim = c(-85, 85), expand = FALSE)

#now add the points, colored by region, and display the plot
  p<-p+geom_point(data = Eall, aes(x = lonFrom, y = latFrom,color=region), size = 1, shape = 23)
  print(p)
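Besides reading the regions off the map, you could also check numerically which regions have starting points inside your study area. The sketch below assumes a data.frame shaped like Eall (columns lonFrom, latFrom and region). The toy data and the bounding box in lonRange and latRange are made up for illustration.

```r
# Toy stand-in for Eall; the real one is built in the loop above
EallToy <- data.frame(
  lonFrom = c(-70, -65, 5, 120, -60),
  latFrom = c(40, 42, 55, 10, -70),
  region  = c('theAmericas', 'theAmericas', 'EuropeAfricaMiddleEast',
              'AsiaPacific', 'Antarctica')
)

# Hypothetical study area, given as a longitude/latitude bounding box
lonRange <- c(-75, -60)
latRange <- c(35, 45)

# Flag the starting points that fall inside the box
inBox <- EallToy$lonFrom >= lonRange[1] & EallToy$lonFrom <= lonRange[2] &
         EallToy$latFrom >= latRange[1] & EallToy$latFrom <= latRange[2]

# The regions you would need to download for this study area
regionsNeeded <- unique(as.character(EallToy$region[inBox]))
print(regionsNeeded)
```

If the box straddles a boundary between regions, regionsNeeded will contain more than one name, and each of those regions would need to be retrieved.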