Title: | Fast Rolling Functions for Seismology using 'Rcpp' |
---|---|
Description: | Fast versions of seismic analysis functions that 'roll' over a vector of values. See the 'RcppRoll' package for alternative versions of basic statistical functions such as rolling mean, median, etc. |
Authors: | Jonathan Callahan [aut], Rob Casey [aut], Mary Templeton [aut], Gillian Sharer [aut, cre] |
Maintainer: | Gillian Sharer <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1.5 |
Built: | 2025-03-07 03:48:26 UTC |
Source: | https://github.com/cran/seismicRoll |
This package implements fast versions of 'roll'-ing functions primarily for use in
seismology. It is intended for use with the IRISSeismic and IRISMustangMetrics packages
originally developed for the IRIS DMC, and now EarthScope Consortium (https://www.earthscope.org).
One advantage of the seismicRoll package is that all returned values are of
the same dimension as the incoming data with NA
s where the rolling function
could not be calculated.
Currently exported functions include:
findOutliers
– outlier detection wrapper
roll_hampel
– outlier detection
roll_mean
– rolling mean
roll_median
– rolling median (for outlier replacement)
roll_sd
– rolling standard deviation
roll_stalta
– first break picker
roll_range
– rolling difference of max/min values
History
version 1.1.5
fixed issue with package documentation, to meet CRAN requirements
updated documentation for IRIS and UNAVCO merger to Earthscope, fixed urls
version 1.1.4
bug fix for roll_range when NA values present
version 1.1.3
minor changes to how the code compiles
add roll_range function
version 1.1.2 – bug fix
findOutliers()
exits if the roll_hampel()
return vector consists entirely of NA
values.
version 1.1.0 – findOutliers update and bug fix
findOutliers()
function default argument values changed. Now thresholdMin=10
,
selectivity=NA
, and fixedThreshold=TRUE
.
Bug fix in roll_hampel()
handling NA
values.
version 1.0.3 – findOutliers update
Added fixedThreshold
argument to findOutliers()
function.
version 1.0.0 – initial release
A wrapper for the roll_hampel() function that counts outliers using either a user specified threshold value or a threshold value based on the statistics of the incoming data.
findOutliers( x, n = 41, thresholdMin = 10, selectivity = NA, increment = 1, fixedThreshold = TRUE )
findOutliers( x, n = 41, thresholdMin = 10, selectivity = NA, increment = 1, fixedThreshold = TRUE )
x |
an R numeric vector |
n |
integer window size |
thresholdMin |
initial value for outlier detection |
selectivity |
value between [0-1] used in determining outliers, or |
increment |
integer shift to use when sliding the window to the next location |
fixedThreshold |
logical specifying whether outlier detection uses |
The thresholdMin
level is similar to a sigma value for normally distributed data.
Hampel filter values above 6 indicate a data value that is extremely unlikely
to be part of a normal distribution (~ 1/500 million) and therefore very likely to be an outlier. By
choosing a relatively large value for thresholdMin
we make it less likely that we
will generate false positives. False positives can include high frequency environmental noise.
With the default setting of fixedThreshold=TRUE
any value above the threshold is considered an outlier
and the selectivity
is ignored.
The selectivity
is a value between 0 and 1 and is used to generate an appropriate
threshold for outlier detection based on the statistics of the incoming data. A lower value
for selectivity
will result in more outliers while a value closer to 1.0 will result in
fewer. If fixedThreshold=TRUE
, selectivity
may have a value of NA
.
When the user specifies fixedThreshold=FALSE
, the thresholdMin
and selectivity
parameters work like squelch and volume on a CB radio: thresholdMin
sets a noise threshold
below which you don't want anything returned while selectivity
adjusts the number of points defined as outliers
by setting a new threshold defined by the maximum value of roll_hampel
multiplied by selectivity
.
n
, the windowSize, is a parameter that is passed to roll_hampel()
.
The default value of increment=1
should not be changed. Outliers are defined
as individual points that stand apart from their neighbors. Applying the Hampel filter to
every other point by using increment
> 1 will invariably miss some of the outliers.
A vector of indices associated with outliers in the incoming data x
.
# Noisy sinusoid with outliers a <- jitter(sin(0.1*seq(1e4)),amount=0.2) indices <- sample(seq(1e4),20) a[indices] <- a[indices]*10 # Outlier detection should identify many of these altered indices sort(indices) findOutliers(a)
# Noisy sinusoid with outliers a <- jitter(sin(0.1*seq(1e4)),amount=0.2) indices <- sample(seq(1e4),20) a[indices] <- a[indices]*10 # Outlier detection should identify many of these altered indices sort(indices) findOutliers(a)
Fast, center-aligned hampel filter using C++/Rcpp.
The Hampel filter is a robust outlier detector using Median Absolute Deviation (MAD).
Additional performance gains can be achieved by skipping increment
values between calculations.
roll_hampel(x, n, increment = 1)
roll_hampel(x, n, increment = 1)
x |
an R numeric vector |
n |
integer window size |
increment |
integer shift to use when sliding the window to the next location |
Unlike the version in the pracma package, this version does not return the corrected timeseries. Instead, it returns a vector of values that can be tested against different threshold values. Higher values in the return are associated with a higher likelihood that the associated point is an outlier when compared with its neighbors. Outliers can be picked out by comparing the return values against some threshold as seen in the example.
Also unlike the pracma version, n
is interpreted as the full window length
and will be increased by one if necessary to have a window representing an odd number of indices.
A vector of values of the same length as x
.
A pure R version of the filter is found in the pracma package.
a <- sin(0.1*seq(100)) a[20] <- 50 b <- roll_hampel(a,10) threshold <- 6 which(b > threshold) ## Not run: require(microbenchmark) require(pracma) microbenchmark(hampel(a,10), roll_hampel(a,10), times=10) # Unit: microseconds # expr min lq median uq max neval # hampel(a, 10) 7610.688 7784.374 8037.4035 9453.928 16176.535 10 # roll_hampel(a, 10) 36.530 37.443 58.7165 65.418 90.403 10 ## End(Not run)
a <- sin(0.1*seq(100)) a[20] <- 50 b <- roll_hampel(a,10) threshold <- 6 which(b > threshold) ## Not run: require(microbenchmark) require(pracma) microbenchmark(hampel(a,10), roll_hampel(a,10), times=10) # Unit: microseconds # expr min lq median uq max neval # hampel(a, 10) 7610.688 7784.374 8037.4035 9453.928 16176.535 10 # roll_hampel(a, 10) 36.530 37.443 58.7165 65.418 90.403 10 ## End(Not run)
Fast rolling means with aligment using C++/Rcpp.
Additional performance gains can be achieved by skipping increment
values between calculations.
roll_mean(x, n = 7, increment = 1, align = "center")
roll_mean(x, n = 7, increment = 1, align = "center")
x |
an R numeric vector |
n |
integer window size |
increment |
integer shift to use when sliding the window to the next location |
align |
window alignment, one of |
The window size n
is interpreted as the full window length.
Setting increment
to a value greater than one will result in NA
s for all skipped-over indices.
The align
parameter determines the alignment of the current index within the window. Thus:
align="left" [*------]
will cause the returned vector to have n-1 NA
values at the right end.
align="center" [---*---]
will cause the returned vector to have (n-1)/2 NA
values at either end.
align="right" [------*]
will cause the returned vector to have n-1 NA
values at the left end.
A vector of rolling mean values of the same length as x
.
For align="center"
, the window size is increased by one if necessary to guarantee an odd window size.
x <- rep(1:5,each=10) plot(x,cex=0.8,pch=17,main="Test of roll_mean alignment with a 5-point window") points(roll_mean(x,5,1,'left'),cex=1.5,col='blue',type='b') points(roll_mean(x,5,1,'center'),cex=1.5,col='forestgreen',type='b') points(roll_mean(x,5,1,'right'),cex=1.5,col='red',type='b') legend("topleft", legend=c('data','align=left','align=center','align=right'), col=c('black','blue','forestgreen','red'), pch=c(17,1,1,1))
x <- rep(1:5,each=10) plot(x,cex=0.8,pch=17,main="Test of roll_mean alignment with a 5-point window") points(roll_mean(x,5,1,'left'),cex=1.5,col='blue',type='b') points(roll_mean(x,5,1,'center'),cex=1.5,col='forestgreen',type='b') points(roll_mean(x,5,1,'right'),cex=1.5,col='red',type='b') legend("topleft", legend=c('data','align=left','align=center','align=right'), col=c('black','blue','forestgreen','red'), pch=c(17,1,1,1))
Fast, center-aligned rolling medians using C++/Rcpp.
Additional performance gains can be achieved by skipping increment
values between calculations.
The roll_median
function can be used to replace outliers detected by the roll_hampel
function. See example below.
roll_median(x, n = 7, increment = 1)
roll_median(x, n = 7, increment = 1)
x |
an R numeric vector |
n |
integer window size |
increment |
integer shift to use when sliding the window to the next location |
The window size n
is interpreted as the full window length.
Values within n/2
of the beginning or end of x
are set to NA
.
Setting increment
to a value greater than one will result in NA
s for all skipped-over indices.
A vector of rolling median values of the same length as x
.
a <- jitter(sin(0.1*seq(1e4)),amount=0.2) indices <- sample(seq(1e4),20) a[indices] <- a[indices]*10 # Outlier detection b <- roll_hampel(a,10) threshold <- 6 outliers <- which(b > threshold) # Outlier replacement with median values a_fixed <- a a_fixed[outliers] <- roll_median(a,10)[outliers] # Set up two plots layout(matrix(seq(2))) plot(a,type='l', col='gray60', main="Outliers detected") points(outliers,a[outliers], col='red', lwd=2) plot(a_fixed,type='l', col='gray60', main="Outliers replaced with rolling median") points(outliers,a_fixed[outliers], col='red', lwd=2)
a <- jitter(sin(0.1*seq(1e4)),amount=0.2) indices <- sample(seq(1e4),20) a[indices] <- a[indices]*10 # Outlier detection b <- roll_hampel(a,10) threshold <- 6 outliers <- which(b > threshold) # Outlier replacement with median values a_fixed <- a a_fixed[outliers] <- roll_median(a,10)[outliers] # Set up two plots layout(matrix(seq(2))) plot(a,type='l', col='gray60', main="Outliers detected") points(outliers,a[outliers], col='red', lwd=2) plot(a_fixed,type='l', col='gray60', main="Outliers replaced with rolling median") points(outliers,a_fixed[outliers], col='red', lwd=2)
Fast rolling range with aligment using C++/Rcpp.
Additional performance gains can be achieved by skipping increment
values between calculations.
roll_range(x, n = 7, increment = 1, align = "center")
roll_range(x, n = 7, increment = 1, align = "center")
x |
an R numeric vector |
n |
integer window size |
increment |
integer shift to use when sliding the window to the next location |
align |
window alignment, one of |
The window size n
is interpreted as the full window length.
Setting increment
to a value greater than one will result in NA
s for all skipped-over indices.
The align
parameter determines the alignment of the current index within the window. Thus:
align="left" [*------]
will cause the returned vector to have n-1 NA
values at the right end.
align="center" [---*---]
will cause the returned vector to have (n-1)/2 NA
values at either end.
align="right" [------*]
will cause the returned vector to have n-1 NA
values at the left end.
A vector of rolling values that difference the maximum and minimum values, of the same length as x
.
For align="center"
, the window size is increased by one if necessary to guarantee an odd window size.
Fast rolling standard deviations with aligment using C++/Rcpp.
Additional performance gains can be achieved by skipping increment
values between calculations.
roll_sd(x, n = 7, increment = 1, align = "center")
roll_sd(x, n = 7, increment = 1, align = "center")
x |
an R numeric vector |
n |
integer window size |
increment |
integer shift to use when sliding the window to the next location |
align |
window alignment, one of |
The window size n
is interpreted as the full window length.
Setting increment
to a value greater than one will result in NA
s for all skipped-over indices.
The align
parameter determines the alignment of the current index within the window. Thus:
align="left" [*------]
will cause the returned vector to have n-1 NA
values at the right end.
align="center" [---*---]
will cause the returned vector to have (n-1)/2 NA
values at either end.
align="right" [------*]
will cause the returned vector to have n-1 NA
values at the left end.
A vector of rolling standard deviation values of the same length as x
.
For align="center"
, the window size is increased by one if necessary to guarantee an odd window size.
Fast rolling STA/LTA using C++/Rcpp. Additional performance gains can be achieved by
skipping increment
values between calculations.
The STA/LTA ratio method is used for automatic detection of seismic signal arrival times.
roll_stalta(x, n_sta, n_lta, increment = 1)
roll_stalta(x, n_sta, n_lta, increment = 1)
x |
an R numeric vector |
n_sta |
integer STA window size |
n_lta |
integer LTA window size |
increment |
integer shift to use when sliding the window to the next location |
The roll_stalta
function described here does no preprocessing of the incoming
data and merely calculates the ratio of the average value in the STA window to the average value
in the LTA window. Windows are aligned so that the index is at the left edge of the STA window and
at the right edge of the LTA window.
[---------- LTA --------*]........
.......................[*- STA --]
For proper use of this algorithm seismic data should be preprocessed as in the example below with:
demean, detrend and taper the raw signal
square the processed signal to get power
With increment=1
, this function is equivalent to, eg:
sta <- roll_mean(x,3,1,"left")
lta <- roll_mean(x,30,1,"right")
r <- sta/lta
For increments greater than one, the rolling means above will not align properly,
hence the need for a dedicated roll_stalta
function.
Values within n_lta-1
of the beginning or
n_sta-1
of the end of x
are set to NA
.
Setting increment
to a value greater than one will result in NA
s for
all skipped-over indices.
A vector of values of the same length as x
with each point containing the STA/LTA
ratio at that point.
# Contrived example x <- rep(c(1,5,3,2,1),each=20) p <- roll_stalta(x,3,6) plot(x, pch=17, cex=0.8, ylim=c(0,max(x)), main="Test of roll_stalta on artificial data") points(p,cex=1.5,col='red',type='b') legend('topleft', legend=c('data','STA/LTA'), pch=c(17,1), col=c('black','red')) # Real example requiring the 'seismic' package ## Not run: require(seismic) # Create a new IrisClient iris <- new("IrisClient") # Seismic data with a large quake starttime <- as.POSIXct("2010-02-27 06:30:00", tz="GMT") endtime <- as.POSIXct("2010-02-27 07:00:00", tz="GMT") st <- getDataselect(iris,"IU","ANMO","00","BHZ",starttime,endtime) tr <- st@traces[[1]] # Preprocess the data x <- DDT(tr)@data # Calculate the first break 'picker' n_sta <- 3 * tr@stats@sampling_rate n_lta <- 10 * n_sta p <- roll_stalta(x^2,n_sta,n_lta) first_break <- which(p == max(p,na.rm=TRUE)) plot(x,type='l', main='Test of STA/LTA first break picker on raw seismic data') abline(v=first_break,col='red') ## End(Not run)
# Contrived example x <- rep(c(1,5,3,2,1),each=20) p <- roll_stalta(x,3,6) plot(x, pch=17, cex=0.8, ylim=c(0,max(x)), main="Test of roll_stalta on artificial data") points(p,cex=1.5,col='red',type='b') legend('topleft', legend=c('data','STA/LTA'), pch=c(17,1), col=c('black','red')) # Real example requiring the 'seismic' package ## Not run: require(seismic) # Create a new IrisClient iris <- new("IrisClient") # Seismic data with a large quake starttime <- as.POSIXct("2010-02-27 06:30:00", tz="GMT") endtime <- as.POSIXct("2010-02-27 07:00:00", tz="GMT") st <- getDataselect(iris,"IU","ANMO","00","BHZ",starttime,endtime) tr <- st@traces[[1]] # Preprocess the data x <- DDT(tr)@data # Calculate the first break 'picker' n_sta <- 3 * tr@stats@sampling_rate n_lta <- 10 * n_sta p <- roll_stalta(x^2,n_sta,n_lta) first_break <- which(p == max(p,na.rm=TRUE)) plot(x,type='l', main='Test of STA/LTA first break picker on raw seismic data') abline(v=first_break,col='red') ## End(Not run)