Data Illustration for the 2021 QSR Data Challenge Competition

We provide illustrative code that demonstrates how the data for the 2021 QSR Data Challenge Competition can be accessed in R.

The data for each layer are provided in the HDF5 format, and can be accessed at the following link:

Link for data

To access the variables in the dataset in R, we use the rhdf5 package, which is part of the Bioconductor suite of R packages.

Installing and Loading the rhdf5 Package

In [1]:
install.packages("BiocManager")
BiocManager::install("rhdf5")
library(rhdf5)
Bioconductor version 3.10 (BiocManager 1.30.15), R 3.6.3 (2020-02-29)

Warning message:
"package(s) not installed when version(s) same as current; use `force = TRUE` to
  re-install: 'rhdf5'"
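
The installation step only needs to run when a package is missing. The following is a minimal sketch of a conditional install, not part of the original notebook. The helper name is_installed is hypothetical, the CRAN mirror URL is the one reported in the log above, and the update = FALSE and ask = FALSE arguments are chosen here to avoid the prompt about updating other packages.

```r
# Hypothetical helper: TRUE when the package can be loaded from the library.
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)

# Install BiocManager from CRAN only if it is missing.
if (!is_installed("BiocManager")) {
  install.packages("BiocManager", repos = "https://cran.r-project.org")
}

# Install rhdf5 from Bioconductor only if it is missing,
# without updating or asking about other packages.
if (!is_installed("rhdf5")) {
  BiocManager::install("rhdf5", update = FALSE, ask = FALSE)
}

library(rhdf5)
```

Running this cell a second time should skip both installation calls and only load the package.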

Accessing the Variables

Once the packages are installed and loaded, we read the data into R with the h5read function. Note that h5read needs two arguments: file, the path to the HDF5 file, and name, the name of the dataset stored inside that file. For the illustrative data, the name is "OpenData".
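
When the dataset name inside an HDF5 file is not known in advance, the h5ls function from rhdf5 lists the file's contents. The sketch below illustrates this on a small temporary file rather than on the competition data; the toy matrix and the temporary file are assumptions made so the example is self-contained. It also shows the index argument of h5read, which reads only part of a dataset.

```r
library(rhdf5)

# Build a small stand-in file (hypothetical data, 2 rows x 9 columns).
tmp <- tempfile(fileext = ".hdf5")
h5createFile(tmp)
toy <- matrix(as.numeric(1:18), nrow = 2, ncol = 9)
h5write(toy, file = tmp, name = "OpenData")

# List the objects stored in the file; the 'name' column holds dataset names.
contents <- h5ls(tmp)
print(contents)

# Read only the first row (all columns) instead of the whole dataset.
first_row <- h5read(file = tmp, name = "OpenData", index = list(1, NULL))

# Release any HDF5 handles that are still open.
h5closeAll()
```

For the competition files, the same h5ls call on the downloaded file should show the "OpenData" entry together with its dimensions.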

In [2]:
sample_dataset = h5read(name="OpenData", file="DATA_PORTION_layer352.hdf5")
In [3]:
head(sample_dataset)
A matrix: 6 × 9 of type dbl
-22.71916  -80.47493  4800  1500  100     0  166  1  9
-22.72852  -80.47380  4800  1500  100  1455  166  1  9
-22.73900  -80.47343  4800  1500  100  3114  479  1  9
-22.75135  -80.47305  4800  1500  100  4024  780  1  9
-22.75135  -80.47305  4800  1500  100  4299  942  1  9
-22.77904  -80.47193  4800  1500  100  4429  989  1  9

In this data matrix, the columns correspond to (from left to right):

  • X,
  • Y,
  • NominalPower,
  • NominalSpeed,
  • NominalSpotDiameter,
  • LaserPowerCurrent,
  • SignalInGaAs,
  • IDbulkLayer, and
  • IDoocLayer.
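
Because the matrix is returned without column names, it can be convenient to attach the names listed above so that variables are accessed by name rather than by position. This is a minimal sketch on a toy matrix; the helper label_columns and the zero-filled stand-in are hypothetical, and the column order is taken from the list above.

```r
# Column names in the order given in the list above.
col_names <- c("X", "Y", "NominalPower", "NominalSpeed",
               "NominalSpotDiameter", "LaserPowerCurrent",
               "SignalInGaAs", "IDbulkLayer", "IDoocLayer")

# Hypothetical helper: check the width, attach names, return a data frame.
label_columns <- function(m) {
  stopifnot(ncol(m) == length(col_names))
  colnames(m) <- col_names
  as.data.frame(m)
}

# Toy stand-in with the same number of columns as the real matrix.
toy <- matrix(0, nrow = 3, ncol = length(col_names))
labelled <- label_columns(toy)

# Variables can now be selected by name.
signal <- labelled$SignalInGaAs
```

With the real data, the same helper would be called as label_columns(sample_dataset).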

We provide a 3D scatterplot of the X, Y, and SignalInGaAs variables.

In [4]:
install.packages("scatterplot3d")
library(scatterplot3d)

scatterplot3d(sample_dataset[,1],sample_dataset[,2],sample_dataset[,7],
             xlab="X", ylab="Y", zlab="SignalInGaAs",
             pch=".")
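
As an alternative view, the same three variables can be drawn in two dimensions, with the signal mapped to colour. The sketch below uses base R graphics only and synthetic values in place of columns 1, 2, and 7 of the data matrix; the value ranges, the number of colour steps, and the heat.colors palette are assumptions chosen for illustration.

```r
# Synthetic stand-in for X, Y and SignalInGaAs (hypothetical values).
set.seed(1)
n <- 500
x <- runif(n, -25, -20)
y <- runif(n, -82, -78)
signal <- runif(n, 0, 1000)

# Bin the signal into 10 equal-width intervals and pick one colour per bin.
palette_steps <- 10
bins <- cut(signal, breaks = palette_steps, labels = FALSE)
cols <- heat.colors(palette_steps)[bins]

# Draw to a null PDF device so the sketch also runs without a display;
# remove pdf(NULL) and dev.off() to plot on screen.
pdf(NULL)
plot(x, y, col = cols, pch = ".", xlab = "X", ylab = "Y",
     main = "SignalInGaAs mapped to colour")
dev.off()
```

With the real data, x, y, and signal would be replaced by sample_dataset[,1], sample_dataset[,2], and sample_dataset[,7].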