This tutorial demonstrates how to use the
SEAGLE package when the user inputs a matrix \({\bf G}\). We’ll begin by loading the
SEAGLE package.
As an example, we’ll generate some synthetic data for usage in this tutorial. Let’s consider a dataset with \(n=5000\) individuals and \(L=100\) loci, where the first \(40\) are causal.
The makeSimData function generates a covariate matrix
\(\widetilde{\bf X} \in \mathbb{R}^{n \times
3}\), where the first column is the all ones vector for the
intercept and the second and third columns are \({\bf X} \sim \text{N}(0,1)\) and \({\bf E} \sim \text{N}(0,1)\),
respectively. The last two columns are scaled to have \(0\) mean and unit variance.
The makeSimData function additionally generates the
genetic marker matrix \({\bf G}\) with
synthetic haplotype data from the COSI software. Detailed procedures for
generating \({\bf G}\) can be found in
the accompanying journal manuscript. Finally, the
makeSimData function also generates a continuous phenotype
\({\bf y}\) according to the following
fixed effects model \[
{\bf y} = \tilde{\bf X} \boldsymbol{\gamma}_{\widetilde{\bf X}} + {\bf
G}\boldsymbol{\gamma}_{G} + \text{diag}(E){\bf
G}\boldsymbol{\gamma}_{GE} + {\bf e}.
\] Here, \(\boldsymbol{\gamma}_{\tilde{\bf X}}\) is
the all ones vector of length \(P=3\),
\(\boldsymbol{\gamma}_{G} \in
\mathbb{R}^{L}\), \(\boldsymbol{\gamma}_{GE}\in
\mathbb{R}^{L}\), and \({\bf e} \sim
\text{N}({\bf 0}, \sigma\, {\bf I}_{n})\). The entries of \(\boldsymbol{\gamma}_{G}\) and \(\boldsymbol{\gamma}_{GE}\) pertaining to
causal loci are set to be \(\gamma_{G}\) = gammaG and
\(\gamma_{GE}\) = gammaGE,
respectively. The remaining entries of \(\boldsymbol{\gamma}_{G}\) and \(\boldsymbol{\gamma}_{GE}\) pertaining to
non-causal loci are set to \(0\).
Now that we have our data, we can prepare it for use in the SEAGLE
algorithm. We will input our \({\bf
y}\), \({\bf X}\), \({\bf E}\), and \({\bf G}\) into the prep.SEAGLE
function. The intercept = 1 parameter indicates that the
first column of \({\bf X}\) is the all
ones vector for the intercept.
This preparation procedure formats the input data for the
SEAGLE function by checking the dimensions of the input
data. It also pre-computes a QR decomposition for \(\widetilde{\bf X} = \begin{pmatrix} {\bf 1}_{n}
& {\bf X} & {\bf E} \end{pmatrix}\), where \({\bf 1}_{n}\) denotes the all ones vector
of length \(n\).
Finally, we’ll input the prepared data into the SEAGLE
function to compute the score-like test statistic \(T\) and its corresponding p-value. The
init.tau and init.sigma parameters are the
initial values for \(\tau\) and \(\sigma\) employed in the REML EM
algorithm.
res <- SEAGLE(objSEAGLE, init.tau=0.5, init.sigma=0.5)
res$T
#> [1] 246.1886
res$pv
#> [1] 0.8441451The score-like test statistic \(T\)
for the G\(\times\)E effect and its
corresponding p-value can be found in res$T and
res$pv, respectively.