The package can installed via Bioconductor, as shown below:
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("rSWeeP")
For a brief introduction to using rSWeeP, visit the Quick Start Tutorial
For the original article on the technique see De Pierri et al. (2020).
Start loading the library.
library(rSWeeP)
The rSWeeP package contains the following functions:
There are 4 main types of input accepted by the SWeeP, SWeePlite and extractHDV functions:
Included in the package are amino acid sequences obtained from NCBI. These are mitochondrial proteomes (translated CDSs) from some mammals and bacterial proteomes. These sets will be used to present the functions. Their metadata is also available for visualizing the results.
To make the use of the masks clearer, the table below explains the correspondences. The third and fourth columns correspond to the appropriate formats to insert into the rSWeeP functions.
data type | k-mer | mask | binary format | short format |
---|---|---|---|---|
protein and small proteomes | 4-mer | [1111] | c(1,1,1,1) | c(4) |
protein and small proteomes | 4-mer | [11011] | c(1,1,0,1,1) | c(2,1,2) |
bacterial proteomes | 5-mer | [11111] | c(1,1,1,1,1) | c(5) |
bacterial proteomes | 5-mer | [111011] | c(1,1,1,0,1,1) | c(3,1,2) |
nucleotides | 10-mer | [1111100000011111] | c(1,1,1,1,1,0,0,0,0,0,1,1,1,1,1) | c(5,5,5) |
nucleotides | 12-mer | [111111000111111] | c(1,1,1,1,1,1,0,0,0,1,1,1,1,1,1) | c(6,3,6) |
We recommend masks between 4 and 5-mer for amino acid analysis, for example: [212], [4], [3,1,2]. The 4-mer is suitable for analyzing proteins and small proteomes such as mitochondrial, plastidial and viral proteomes. The 5-mer is suitable for analyzing bacterial proteomes.
For nucleotides, larger masks are required, such as between 6 and 12-mer, for example: [555], [636].
psz
)For proteins and small proteomes, an output vector of approximately 1,000 lengths is recommended. For genomes, longer vectors of around 5,000 length are recommended.
norm
)There are three options for normalization: norm='none'
,
norm='log'
and norm='logNeg'
. The
none
option and simple log
(applied to HDV
matrix before projection) are enough to solve most problems. The
logNeg
option is ideal for analyzing genes and short
sequences.
via log. For counting mode, norm converts 0 to -1 and applies log() to non-zeros. For the binary case, it only converts 0 to -
bin
)The HDV matrix can be binary (presence-absence of k-mer patterns) or
count (number of k-mer patterns found). The counting option
(bin=FALSE
) tends to be more informative. For smaller,
simpler problems, the binary option (bin=TRUE
) is
sufficient.
The output of the SWeeP
and SWeePlite
functions presents a list format output with two main items:
output$proj
a numeric
matrix with
psz
columns and one line per sequence, each row
corresponding to a compact vectoroutput$info
aditional information of the process. This
object is subdivided in:
integer
corresponding to
psz
(the output vector length)boolean
containing if binary (TRUE) or counting
(FALSE)vector
containing the mask usedcharacter
containing the type of the
sequence (amino acid: ‘AA’, ou nucleotide: ‘NT’)boolean
corresponding to the
concatenation of sequencescharacter
corresponding to the version of
the packagecharacter
containing the normalization
usedcharacter
containing the list of
extensions considereddouble
containing the elapsed time in
secondslist
of headers for each analyzed
sequenceAccording to previous studies carried out with SWeeP, we present the parameters indicated for each type of data.
psz \(\approx\) 1500, seqtype= ‘AA’, bin = FALSE, mask = c(2,1,2), norm=‘none’
psz \(\approx\) 7500, seqtype= ‘AA’, bin = FALSE, mask = c(5), norm=‘logNeg’
psz \(\geq\) 600, seqtype= ‘AA’, bin = FALSE, mask = c(2,1,2), norm=‘none’
psz \(\geq\) 600, seqtype= ‘AA’, bin = FALSE, mask = c(2,1,2), norm=‘logNeg’
psz \(\geq\) 1000, seqtype= ‘NT’, bin = FALSE, mask = c(6,3,6), norm=‘none’
Raittz et al. (2021)
More information on parameters can be found in the studies of De Pierri et al. (2020), Raittz et al. (2021) The list of references can be found at the end of the document.
SWeeP
The original SWeeP function requires a projection matrix to be provided, the orthBase. This can be generated by supplying the mask to be used, the data type (‘AA’ for amino acid and ‘NT’ for nucleotide), and the desired projection size, as shown below:
mask = c(2,1,2)
psz = 50
base160k = orthBase(mask=mask,col=psz,seqtype='AA')
dim(base160k$mat)
## [1] 160000 50
The orthBase matrix can be generated only once and then stored, avoiding additional processing. For the same set of parameters (mask, projection size and sequence type) the matrix generated is always the same, which guarantees the comparability of vectors from different analyses.
Below, we present the use of SWeeP for the different types of inputs.
One of the possible inputs to the SWeeP function is the BString format, which can be loaded into RAM via the Biostrings package, as shown below.
# aastring data included in the package for example purposes
path = paste (system.file("examples/aaMitochondrial/",package = "rSWeeP"),'/', sep = '')
input = Biostrings::readBStringSet(paste(path,'11_Pan_paniscus.faa',sep=''))
sw = SWeeP(input,base160k,seqtype='AA')
## Starting projection. Please wait.
## Projecting ...
In the case where only one sequence (BString) is supplied, the concatenate=TRUE parameter allows the supplied sequences to be joined into a single sequence. This option is useful, for example, when I want to analyze a complete genome/proteome rather than the individual genes/proteins of a MULTIFASTA. In other words, using concatenate=TRUE you get a single vector representing the organism, and using concatenate=FALSE you get N vectors, each representing a gene/protein.
path = paste (system.file("examples/aaMitochondrial/",package = "rSWeeP"),'/', sep = '')
input = Biostrings::readBStringSet(paste(path,'11_Pan_paniscus.faa',sep=''))
sw = SWeeP(input,base160k,seqtype='AA',concatenate=T,bin=T,mask=mask)
## Note that the 'concatenate' option will concatenate all the samples in your input.
## Starting projection. Please wait.
## Projecting ...
sw$info
## $ProjectionSize
## [1] 50
##
## $mask
## [1] 1 1 0 1 1
##
## $SequenceType
## [1] "AA"
##
## $bin
## [1] "binary (TRUE)"
##
## $concatenate
## [1] TRUE
##
## $version
## [1] "SWeeP v2.8"
##
## $norm
## [1] "none"
##
## $timeElapsed
## [1] 0.131
Another possible input for the SWeeP function is the address of a directory with FASTAS files, each corresponding to a vector to be generated (gene or genome, protein or proteome). All the FASTAS are then loaded into memory and processed.
path = paste (system.file("examples/aaMitochondrial/",package = "rSWeeP"),'/', sep = '')
sw = SWeeP(path,base160k,seqtype='AA',bin=T,mask=mask)
## Starting projection. Please wait.
## Projecting ...
If each FASTA contains a very long sequence or there are many sequences, and memory is limited, there is the option of using the parameter lowRAMmode=TRUE in which each sequence is processed individually, saving RAM.
path = paste (system.file("examples/aaMitochondrial/",package = "rSWeeP"),'/', sep = '')
sw = SWeeP(path,base160k,seqtype='AA',bin=T,mask=mask,lowRAMmode=T)
## starting sequence 1 of 14Projecting ...
## - complete
## starting sequence 2 of 14Projecting ...
## - complete
## starting sequence 3 of 14Projecting ...
## - complete
## starting sequence 4 of 14Projecting ...
## - complete
## starting sequence 5 of 14Projecting ...
## - complete
## starting sequence 6 of 14Projecting ...
## - complete
## starting sequence 7 of 14Projecting ...
## - complete
## starting sequence 8 of 14Projecting ...
## - complete
## starting sequence 9 of 14Projecting ...
## - complete
## starting sequence 10 of 14Projecting ...
## - complete
## starting sequence 11 of 14Projecting ...
## - complete
## starting sequence 12 of 14Projecting ...
## - complete
## starting sequence 13 of 14Projecting ...
## - complete
## starting sequence 14 of 14Projecting ...
## - complete
To visualize the vectorized data, we provide the metadata:
pathmetadata <- system.file(package = "rSWeeP" , "examples" , "metadata_mitochondrial.csv")
mt = read.csv(pathmetadata,header=TRUE)
vunq = c('Hominidae','Psittacidae','Muridae','Bovidae','Apocynaceae')
cl = mt$id
ncl = length(vunq)
pca_output <- prcomp (sw$proj , scale = FALSE)
par(mfrow=c(1,2))
plot(pca_output$x[,1] , pca_output$x [,2] , xlab = 'PC-1' , ylab = 'PC-2' , pch =20 , col = cl)
legend("bottomright" , vunq , col = as.character(c(1:ncl)) , pch =20)
plot(pca_output$x[,3] , pca_output$x [,4] , xlab = 'PC-3' , ylab = 'PC-4' , pch =20 , col = cl)
We can also obtain the phylogenetic relationship between the vectorized organisms using the Neighbour-Joining (NJ) algorithm.
library(ape)
mdist = dist(sw$proj,method='euclidean')
tr = nj(mdist)
tr = root(tr,outgroup='14_Rhazya_stricta')
plot(tr)
In addition to biological sequences, SWeeP is also able to perform dimension reduction of RNAseq data from a supplied expression matrix. This matrix can be in dgCMatrix format, the Seurat package standard, or matrix format. Note that, by default, expression matrices allocate the samples (cells) in the columns and the transcribed genes (features) in the rows. In this case it is necessary to use transpose=TRUE or RNAseqdata=TRUE as parameters.
As an example of data we can use the data provided by the Seurat package itself, pbmc3k, as available at link.
library(dplyr)
library(Seurat)
library(patchwork)
# Load the PBMC dataset
pbmc.data <- Read10X(data.dir = "~/BD/harmony/filtered_matrices_mex/hg19/")
# Initialize the Seurat object with the raw (non-normalized data).
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')
pbmc
## An object of class Seurat
## 16971 features across 3388 samples within 1 assay
## Active assay: RNA (16971 features, 0 variable features)
## 1 layer present: counts
And now we can use pbmc.data as input for SWeeP. But first, we must scale the projection matrix according to the dimensions of the matrix to be projected, the pbmc
psz=10
base160k = orthBase(dim(pbmc.data)[1],psz)
sw = SWeeP(input=pbmc.data,orthbase=base160k,transpose=T,verbose=F)
## Starting projection. Please wait.
## Projecting ...
If the data is not transposed (each line corresponds to a sample), use FALSE for both parameters. A warning message will be displayed, just ignore it.
psz=10
base160k = orthBase(dim(pbmc.data)[2],psz) # note that if I don't transpose, the columns change
sw = SWeeP(input=pbmc.data,orthbase=base160k,transpose=F,norm='log',verbose=F)
## Caution!
## If your input is of the RNAseq type, probably each column contains a sample, and each row a gene.
## In this case, use the 'transpose=TRUE' option.
##
## Starting projection. Please wait.
## Projecting ...
Other types of matrices can be projected (dimension reduction). The only requirement for the analysis of this data to be valid is that the input matrix has structured features, i.e. each sample is represented by the same set of features. There is also the option of transposing the matrix if necessary.
psz=10
nfeat = 100
nsamples = 5
input = matrix(runif( (nsamples*nfeat) ,0,10), nrow=nsamples, ncol = nfeat)
base160k = orthBase(nfeat,psz)
sw = SWeeP(input=input,orthbase=base160k,transpose=F)
## Starting projection. Please wait.
## Projecting ...
SWeePlite
The lite version of the SWeeP tool does not require an orthBase matrix to be generated or supplied, as this version generates it internally. This version also allows the use of larger masks, without compromising the use of the computer’s RAM if it is limited.
Just like the SWeeP function, SWeePlite supports different types of input. Here are some examples of the supported formats. Consider the following parameters:
mask = c(2,1,2)
psz = 500
Firstly, the biological sequences in BString
path = paste (system.file("examples/aaMitochondrial/",package = "rSWeeP"),'/', sep = '')
input = Biostrings::readBStringSet(paste(path,'11_Pan_paniscus.faa',sep=''))
sw = SWeePlite(input,seqtype='AA',mask=mask,psz=psz)
## Starting projection. Please wait.
## starting sequence 1 of 13 - complete
## starting sequence 2 of 13 - complete
## starting sequence 3 of 13 - complete
## starting sequence 4 of 13 - complete
## starting sequence 5 of 13 - complete
## starting sequence 6 of 13 - complete
## starting sequence 7 of 13 - complete
## starting sequence 8 of 13 - complete
## starting sequence 9 of 13 - complete
## starting sequence 10 of 13 - complete
## starting sequence 11 of 13 - complete
## starting sequence 12 of 13 - complete
## starting sequence 13 of 13 - complete
There is also the concatenation option.
path = paste (system.file("examples/aaMitochondrial/",package = "rSWeeP"),'/', sep = '')
input = Biostrings::readBStringSet(paste(path,'11_Pan_paniscus.faa',sep=''))
sw = SWeePlite(input,seqtype='AA',mask=mask,psz=psz,concatenate=T)
## Starting projection. Please wait.
## starting sequence 1 of 1 - complete
Address type input for directory with FASTAS files, each file corresponding to a vector to be generated. All the FASTAS are then loaded into memory and processed.
path = paste (system.file("examples/aaMitochondrial/",package = "rSWeeP"),'/', sep = '')
sw = SWeePlite(path,seqtype='AA',bin=T,mask=mask,psz=500,lowRAMmode=F)
## Starting projection. Please wait.
## starting sequence 1 of 14 - complete
## starting sequence 2 of 14 - complete
## starting sequence 3 of 14 - complete
## starting sequence 4 of 14 - complete
## starting sequence 5 of 14 - complete
## starting sequence 6 of 14 - complete
## starting sequence 7 of 14 - complete
## starting sequence 8 of 14 - complete
## starting sequence 9 of 14 - complete
## starting sequence 10 of 14 - complete
## starting sequence 11 of 14 - complete
## starting sequence 12 of 14 - complete
## starting sequence 13 of 14 - complete
## starting sequence 14 of 14 - complete
And for large volumes of data, the lowRAMmode=TRUE option for processing each FASTA individually.
path = paste (system.file("examples/aaMitochondrial/",package = "rSWeeP"),'/', sep = '')
sw = SWeePlite(path,seqtype='AA',bin=T,mask=mask,psz=500,lowRAMmode=T)
## Starting projection. Please wait.
## starting sequence 1 of 14 - complete
## starting sequence 2 of 14 - complete
## starting sequence 3 of 14 - complete
## starting sequence 4 of 14 - complete
## starting sequence 5 of 14 - complete
## starting sequence 6 of 14 - complete
## starting sequence 7 of 14 - complete
## starting sequence 8 of 14 - complete
## starting sequence 9 of 14 - complete
## starting sequence 10 of 14 - complete
## starting sequence 11 of 14 - complete
## starting sequence 12 of 14 - complete
## starting sequence 13 of 14 - complete
## starting sequence 14 of 14 - complete
To visualize the projected data, we provide the metadata and plot the first two components via PCA:
pathmetadata <- system.file(package = "rSWeeP" , "examples" , "metadata_mitochondrial.csv")
mt = read.csv(pathmetadata,header=TRUE)
vunq = c('Hominidae','Psittacidae','Muridae','Bovidae')
cl = mt$id
ncl = length(vunq)
pca_output <- prcomp (sw$proj , scale = FALSE)
par(mfrow=c(1,2))
plot(pca_output$x[,1] , pca_output$x [,2] , xlab = 'PC-1' , ylab = 'PC-2' , pch =20 , col = cl)
legend("bottomright" , vunq , col = as.character(c(1:ncl)) , pch =20)
plot(pca_output$x[,3] , pca_output$x [,4] , xlab = 'PC-3' , ylab = 'PC-4' , pch =20 , col = cl)
For dimension reduction of RNAseq data from a supplied expression matrix format dgCMatrix (Seurat package) or matrix format. Use the parameter transpose=TRUE or RNAseqdata=TRUE if samples (cells) are in columns and the transcribed genes (features) in the rows.
As an example of data we can use the data provided by the Seurat package itself, pbmc3k, as available at link.
library(dplyr)
library(Seurat)
library(patchwork)
# Load the PBMC dataset
pbmc.data <- Read10X(data.dir = "~/BD/harmony/filtered_matrices_mex/hg19/")
# Initialize the Seurat object with the raw (non-normalized data).
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')
pbmc
## An object of class Seurat
## 16971 features across 3388 samples within 1 assay
## Active assay: RNA (16971 features, 0 variable features)
## 1 layer present: counts
This step can take several minutes. Although slower, it is ideal for large data sets with limited memory usage.
psz=10
sw = SWeePlite(input=pbmc.data,transpose=T,psz=psz,verbose=F)
Finally, a generic structured matrix is allowed. The format of the
matrix must correspond to a sample in each row and a feature in each
column. But if the matrix is transposed, use
transpose=TRUE
.
psz=10
nfeat = 100
nsamples = 5
input = matrix(runif( (nsamples*nfeat) ,0,10), nrow=nsamples, ncol = nfeat)
sw = SWeePlite(input=input,psz=10,transpose=F)
## starting sequence 1 of 5 - complete
## starting sequence 2 of 5 - complete
## starting sequence 3 of 5 - complete
## starting sequence 4 of 5 - complete
## starting sequence 5 of 5 - complete
extractHDV
The extractHDV
returns a vector for each sequence, but
not the compact one (LDV). The HDV corresponds to the high-dimensional
vector representing the k-mer pattern count of the given sequence. The
HDV is best described in Figure from section
2
path = paste (system.file("examples/aaMitochondrial/",package = "rSWeeP"),'/', sep = '')
mask = c(2,1,2)
output = extractHDV(input=path,mask=mask,seqtype='AA',bin=FALSE,extension=c('.faa','.fas','.fasta'))
## starting sequence 1 of 14 - complete
## starting sequence 2 of 14 - complete
## starting sequence 3 of 14 - complete
## starting sequence 4 of 14 - complete
## starting sequence 5 of 14 - complete
## starting sequence 6 of 14 - complete
## starting sequence 7 of 14 - complete
## starting sequence 8 of 14 - complete
## starting sequence 9 of 14 - complete
## starting sequence 10 of 14 - complete
## starting sequence 11 of 14 - complete
## starting sequence 12 of 14 - complete
## starting sequence 13 of 14 - complete
## starting sequence 14 of 14 - complete
The dimension of the HDV matrix for a 4-mer mask applied to amino acids is
dim(output$HDV)
## [1] 14 160000
The information provided by the output
output$info
## $headers
## [1] "01_Pan_troglodytes" "02_Capra_aegagrus"
## [3] "03_Homo_sapiens" "04_Bos_taurus"
## [5] "05_Ara_ararauna" "06_Mus_musculus"
## [7] "07_Brotogeris_cyanoptera" "08_Homo_sapiens_neanderthalensis"
## [9] "09_Gazella_gazella" "10_Rattus_norvegicus"
## [11] "11_Pan_paniscus" "12_Psittacara_rubritorquis"
## [13] "13_Apodemus_sylvaticus" "14_Rhazya_stricta"
##
## $mask
## [1] 1 1 0 1 1
##
## $SequenceType
## [1] "AA"
##
## $extension
## [1] ".faa" ".fas" ".fasta"
##
## $concatenate
## [1] FALSE
##
## $bin
## [1] "counting (FALSE)"
##
## $version
## [1] "SWeeP v2.8"
##
## $saturation
## [1] 0.01241875 0.01256875 0.01240625 0.01260000 0.01223125 0.01255000
## [7] 0.01198125 0.01236250 0.01250000 0.01237500 0.01239375 0.01207500
## [13] 0.01238125 0.03448125
##
## $timeElapsed
## [1] 0.293
The output presents:
output$HDV
: a matrix
containing the High
Dimensional Vectors of the given FASTAS. The dimension of the vector
corresponds to \(20^k\) for amino acids
and \(4^k\) for nucleotides. \(k\) corresponds to the k-mer of the mask
used.output$info
aditional information of the process. This
object is subdivided in:
character
containing the list of
samplesinteger
containing the mask usedcharacter
containing the type of the
sequence (amino acid: AA, ou nucleotide: NT)character
containing the list of
extensions consideredcharacter
containing if binary or countingvector
containing the filled (non-zero)
percentage of the HDV for each sampledouble
containing the elapsed time in
secondsSaturation is important information about the mask. When, for all sequences, the saturation is very high (up to 95%), the mask needs to have a higher k-mer.
Other important parameters for optimizing the process of
SWeeP
and SweePlite
functions:
extension
provide the file extension if necessary,
for example extension = c('.faa','.fasta')
. To consider all
the files in the folder, you can simply ignore the parameter or provide
extension=''
so that all the files in the folder are
considered. [for SweeP and SWeePlite]
lowRAMmode = TRUE
reduces memory consumption during
processing by loading one sequence at a time from the file. Ideal when
you want to analyze a large number of sequences (several FASTAS) or when
the files are very large (long genomic FASTAS). [for SweeP and
SWeePlite]
nk = 15000
by default. The higher the
nk
, the greater the memory consumption but the shorter the
processing time. A vectorization with a 5-mer mask and nk=15000 can run
on a machine with 8Gb of memory, with the use of 2 cores. [only for
SWeePlite]
ncores=2
by default. Number of cores to be used by
the process. [for SweeP and SWeePlite]
More information on how to use each function is available in the
package manual or at ?rSWeeP
.
sessionInfo()
## R version 4.4.0 (2024-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: LMDE 5 (elsie)
##
## Matrix products: default
## BLAS: /usr/local/lib/R/lib/libRblas.so
## LAPACK: /usr/local/lib/R/lib/libRlapack.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=pt_BR.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=pt_BR.UTF-8 LC_COLLATE=pt_BR.UTF-8
## [5] LC_MONETARY=pt_BR.UTF-8 LC_MESSAGES=pt_BR.UTF-8
## [7] LC_PAPER=pt_BR.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/Sao_Paulo
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] patchwork_1.2.0 Seurat_5.0.3 SeuratObject_5.0.1
## [4] sp_2.1-4 dplyr_1.1.4 ape_5.8
## [7] rSWeeP_2.9 Biostrings_2.72.0 GenomeInfoDb_1.40.0
## [10] XVector_0.44.0 IRanges_2.38.0 S4Vectors_0.42.0
## [13] BiocGenerics_0.50.0 doParallel_1.0.17 iterators_1.0.14
## [16] foreach_1.5.2 knitr_1.46
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.16.0 jsonlite_1.8.8
## [4] magrittr_2.0.3 spatstat.utils_3.0-4 rmarkdown_2.26
## [7] fs_1.6.4 zlibbioc_1.50.0 vctrs_0.6.5
## [10] ROCR_1.0-11 spatstat.explore_3.2-7 memoise_2.0.1
## [13] htmltools_0.5.8.1 usethis_2.2.3 sctransform_0.4.1
## [16] sass_0.4.9 parallelly_1.37.1 KernSmooth_2.23-22
## [19] bslib_0.7.0 htmlwidgets_1.6.4 desc_1.4.3
## [22] ica_1.0-3 plyr_1.8.9 plotly_4.10.4
## [25] zoo_1.8-12 cachem_1.0.8 igraph_2.0.3
## [28] mime_0.12 lifecycle_1.0.4 pkgconfig_2.0.3
## [31] Matrix_1.7-0 R6_2.5.1 fastmap_1.2.0
## [34] GenomeInfoDbData_1.2.12 fitdistrplus_1.1-11 future_1.33.2
## [37] shiny_1.8.1.1 digest_0.6.35 colorspace_2.1-0
## [40] tensor_1.5 rprojroot_2.0.4 RSpectra_0.16-1
## [43] irlba_2.3.5.1 pkgload_1.3.4 progressr_0.14.0
## [46] spatstat.sparse_3.0-3 fansi_1.0.6 polyclip_1.10-6
## [49] abind_1.4-5 httr_1.4.7 compiler_4.4.0
## [52] remotes_2.5.0 withr_3.0.0 fastDummies_1.7.3
## [55] pkgbuild_1.4.4 highr_0.10 R.utils_2.12.3
## [58] MASS_7.3-60.2 sessioninfo_1.2.2 tools_4.4.0
## [61] lmtest_0.9-40 httpuv_1.6.15 future.apply_1.11.2
## [64] goftest_1.2-3 R.oo_1.26.0 glue_1.7.0
## [67] nlme_3.1-164 promises_1.3.0 grid_4.4.0
## [70] Rtsne_0.17 reshape2_1.4.4 cluster_2.1.6
## [73] generics_0.1.3 spatstat.data_3.0-4 gtable_0.3.5
## [76] R.methodsS3_1.8.2 tidyr_1.3.1 data.table_1.15.4
## [79] utf8_1.2.4 spatstat.geom_3.2-9 RcppAnnoy_0.0.22
## [82] ggrepel_0.9.5 RANN_2.6.1 pillar_1.9.0
## [85] stringr_1.5.1 spam_2.10-0 RcppHNSW_0.6.0
## [88] later_1.3.2 splines_4.4.0 lattice_0.22-6
## [91] deldir_2.0-4 survival_3.6-4 tidyselect_1.2.1
## [94] miniUI_0.1.1.1 pbapply_1.7-2 gridExtra_2.3
## [97] scattermore_1.2 xfun_0.43 devtools_2.4.5
## [100] matrixStats_1.3.0 stringi_1.8.3 UCSC.utils_1.0.0
## [103] lazyeval_0.2.2 yaml_2.3.8 evaluate_0.23
## [106] codetools_0.2-20 tibble_3.2.1 cli_3.6.3
## [109] uwot_0.2.2 xtable_1.8-4 reticulate_1.38.0
## [112] munsell_0.5.1 jquerylib_0.1.4 Rcpp_1.0.12
## [115] spatstat.random_3.2-3 globals_0.16.3 png_0.1-8
## [118] ellipsis_0.3.2 ggplot2_3.5.1 dotCall64_1.1-1
## [121] profvis_0.3.8 urlchecker_1.0.1 listenv_0.9.1
## [124] viridisLite_0.4.2 scales_1.3.0 ggridges_0.5.6
## [127] leiden_0.4.3.1 purrr_1.0.2 crayon_1.5.2
## [130] rlang_1.1.4 cowplot_1.1.3
Proteomes refer to the complete set of CDSs translated into proteins from the complete genome.↩︎