Lets load the data saved in 03_GetData_Subset_and_Combine.Rmd and examine how connectivity is represented in the data.frame.

E<-readRDS('mayJuneEastCoast_connectivity.RDS')

You should find a data.frame E of many observations and 10 variables. Each row contains four columns which contain a single number: nxFrom and nyFrom which define the grid location of a single particle release location, and lonFrom and latFrom, which give the longitude and latitude of that release location. The next five columns contain lists of numbers which define where particles that were released at (nxFrom,nyFrom) and (lonFrom,latFrom) ended up. (nxTo,nyTo) are the grid cells the particles get to, and (lonTo,latTo) are the latitudes they get to. Now, many particles can get to the same grid-cell. The number of particles that get to each (nxTo,nyTo) and (lonTo,latTo) location are stored in numTo. The final column is numLaunched, which is the total number of particles released from a given (nxTo,nyTo) and (lonTo,latTo) starting location. The sum of all the numTo in a single row should always be equal to or less than numLaunched. The sum of numTo will be less than numLaunched if, when computing the subset of the connectivity data, particles that settle outside of the starting points are discarded (trimTo=FALSE in subsetConnectivity_*()).

Before starting to plot the connectivity data, it is useful to load the rnaturalfeatures data that provides coastline data, along with the simple features library sf which provides functions to define geometric features and the ggplot2 plotting library.

    library(sf)
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
    library(ggplot2)
    library("rnaturalearth")
    library("rnaturalearthdata")
## 
## Attaching package: 'rnaturalearthdata'
## The following object is masked from 'package:rnaturalearth':
## 
##     countries110
    world <- ne_countries(scale = "medium", returnclass = "sf") #get coastline data
    class(world)
## [1] "sf"         "data.frame"

Plot starting locations

To plot all of the starting locations, plot all the (lonTo,latTo) points. In the plot below, we zoom in on the South Atlantic Bight by changing the limits in coord_sf to better see the individual points.

    p<-ggplot(data = world) + geom_sf() +
      coord_sf(xlim= c(-82, -70), ylim = c(24, 36), expand = FALSE)+
      geom_point(data = E, aes(x = lonFrom, y = latFrom), size = 1, shape = 23, fill = "green",color='black')
    print(p) #this makes the figure appear

Plot where particles starting in a polygon go

Now lets look at where all the particles released in a small region go in the 18 day drifting duration of this data. First, define a polygon that encompasses the region where the particles are released:

  #define limits of a box that defines the release locations
  lonLimits=c(-82.0,-80) #longitude of corners
  latLimits=c(28.0,30.0) #latitude of corners
  limitPoly=st_polygon(list(cbind(lonLimits[c(1,2,2,1,1)],latLimits[c(1,1,2,2,1)])))
  limitPoly<-st_sfc(limitPoly,crs=4326) #make into a surface, 4326 is WGS84

Now further subset E to only include particle starting location. Set trimTo to FALSE so we also include particles that leave the starting locations. Because subsetConnectivity_byPolygon() comes from the connectivityUtilities.R, we must source connectivityUtilities.R to load the functions in it.

  #subset connectivity data
  source('connectivityUtilities.R')
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.1     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## 
## Attaching package: 'foreach'
## 
## 
## The following objects are masked from 'package:purrr':
## 
##     accumulate, when
## 
## 
## Loading required package: parallel
## [1] "file connectivityData/EZfateFiles/model_depth_and_distance.nc exists, no download"
  EstartPoints<-subsetConnectivity_byPolygon(E,limitPoly,trimTo=FALSE)

Now combine each row in EstartPoints and make a data.frame with each location particles get to, and the total number of particles that get there.

    allTo<-data.frame(lonTo=c(EstartPoints$lonTo,recursive=TRUE),
                      latTo=c(EstartPoints$latTo,recursive=TRUE),
                      numTo=c(EstartPoints$numTo,recursive=TRUE))

and them add together the total number of points that get to the same (lonTo,latTo) points

    whereAllWent<-allTo %>% group_by(lonTo,latTo) %>% summarise(numTo=sum(numTo))

numTo in whereAllWent is the number of points that got to each destination location. It is often better to know the fraction of points launched that go to each point. sum(EstartPoints$numLaunched) gives the total number of particles launched, and so we can use it to calculate the fraction of particles that go to each point:

    whereAllWent$fracTo<-whereAllWent$numTo/sum(EstartPoints$numLaunched)

and now we can plot the starting points from (EstartPoints$lonFrom,EstartPoints$latFrom) in bright green and the log10(density) of where particles released from those points ended up with log10(whereAllWent$fracTo). :

    p<-ggplot(data = world) + geom_sf() +
      coord_sf(xlim= c(-82, -55), ylim = c(24, 42), expand = FALSE)
    #p<-p+geom_point(data = EstartPoints, aes(x = lonFrom, y = latFrom), size = 1, shape = 23, fill = "green",color='black')
    p<-p+geom_point(data = whereAllWent, aes(x = lonTo, y = latTo, fill=log10(fracTo),colour=log10(fracTo)), size = 1, shape = 23)
    p<-p+geom_point(data = EstartPoints, aes(x = lonFrom, y = latFrom), size = 0.5, shape = 23, fill = "green",color='green')
    print(p) #this makes the figure appear

Plot where particles starting at a point go

Sometimes you just want to know where particles from a particular point go. This is also straightforward. First define a latitude and longitude where you want the particles to be released from, and then calculate the closest release point in the EZfate grid.

#specify a point you are interested in
lonP<--70.600 ; latP<-42.995 #Isle of Shoals in NH and Maine

#compute squared distance between lonP and latP and starting points approximately, but
#accurately enough for this work. Answer in units of degrees lat SQUARED. This works because 
#the length of a degree of longitude scales as cos(latitude)
dist<-(E$latFrom-latP)^2+((E$lonFrom-lonP)*cos(E$latFrom*2*pi/360.0))**2

#what row of E is the point of the minimum distance?
nMin<-which.min(dist)

Now lets extract the data. Note the code to extract where the particles go, which looks like c(E$lonTo[nMin],recursive=TRUE). This awkward code is necessary to extract a vector from a dataFrame. The following code also makes the data.frames to point.

#obtain latitude and longitude where particles released from (lonP,latP) have gone
plotLon<-c(E$lonTo[nMin],recursive=TRUE)
plotLat<-c(E$latTo[nMin],recursive=TRUE)
numTo<-c(E$numTo[nMin],recursive=TRUE)

#make data frames to plot
startP=data.frame(lonP=lonP,latP=latP)
endP=data.frame(plotLon=plotLon,plotLat=plotLat,numTo=numTo)

Now plot where the particles started from and where they go.

p<-ggplot(data = world) + geom_sf() +
  coord_sf(xlim= c(-72, -64), ylim = c(41, 44.2), expand = FALSE)+
  geom_point(data = endP, aes(x = plotLon, y = plotLat,color=numTo), size = 1)+
  geom_point(data=startP, aes(x = lonP, y = latP), size = 1, shape = 23, fill = "red",color='red')+
  labs(title='Red is starting point, shaded blue points are ending locations')
print(p) #this makes the figure appear

What next?

Often we will want to model the dispersal of an organism over multiple generations, or something similar. Let us, for now, imagine modeling a semelparous organism, and imagine it starts at a specific point, and each generation it will release a certain number of larvae. Where will those larvae end up, and how many will make it?

In the next section connectivity data will be used to quantitatively examine the spread of an organism from an initial location over multiple generations.