| Title: | Collection of Data Structures |
|---|---|
| Description: | A collection of functions to generate a large variety of structures in high dimensions. These data structures are useful for testing, validating, and improving algorithms used in dimensionality reduction, clustering, machine learning, and visualization. |
| Authors: | Jayani P. Gamage [aut, cre] (ORCID: <https://orcid.org/0000-0002-6265-6481>), Dianne Cook [aut] (ORCID: <https://orcid.org/0000-0002-3813-7155>), Paul Harrison [aut] (ORCID: <https://orcid.org/0000-0002-3980-268X>), Michael Lydeamore [aut] (ORCID: <https://orcid.org/0000-0001-6515-827X>), Thiyanga S. Talagala [aut] (ORCID: <https://orcid.org/0000-0002-0656-9789>) |
| Maintainer: | Jayani P. Gamage <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.6 |
| Built: | 2026-05-13 05:03:27 UTC |
| Source: | https://github.com/jayanilakshika/cardinalr |
This function generates background noise data with specified parameters such as the number of samples, number of dimensions, mean, and standard deviation.
gen_bkgnoise(n = 500, p = 4, m = rep(0, p), s = rep(2, p))
n |
A numeric value (default: 500) representing the sample size. |
p |
A numeric value (default: 4) representing the number of dimensions. |
m |
A numeric vector (default: c(0, 0, 0, 0)) representing the mean along each dimension. |
s |
A numeric vector (default: c(2, 2, 2, 2)) representing the standard deviation along each dimension. |
A dataset containing the generated background noise.
set.seed(20240412)
gen_bkgnoise(n = 500, p = 4, m = c(0, 0, 0, 0), s = c(2, 2, 2, 2))  # background noise with explicit mean and standard deviation
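The parameters above can be read as one independent Gaussian draw per dimension. The following base R sketch illustrates that reading; `sketch_bkgnoise()` is a hypothetical helper written for this illustration, and the assumption that each column is drawn independently with `rnorm()` is not confirmed by the package source.

```r
# Hypothetical helper, not the package implementation.
# Assumption: each dimension is an independent normal draw
# with its own mean m[j] and standard deviation s[j].
sketch_bkgnoise <- function(n = 500, p = 4, m = rep(0, p), s = rep(2, p)) {
  stopifnot(length(m) == p, length(s) == p)
  cols <- lapply(seq_len(p), function(j) stats::rnorm(n, mean = m[j], sd = s[j]))
  names(cols) <- paste0("x", seq_len(p))
  as.data.frame(cols)
}

set.seed(1)
noise <- sketch_bkgnoise(n = 1000, p = 3, m = c(0, 5, -2), s = c(1, 2, 0.5))
dim(noise)
```

Because `m` and `s` default to expressions in `p`, changing `p` alone keeps the three arguments consistent.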
This function generates a dataset representing a structure with a circle.
gen_circle(n = 500, p = 4)
n |
An integer value (default: 500) representing the sample size. |
p |
An integer value (default: 4) representing the number of dimensions. |
A dataset containing a circle.
set.seed(20240412)
circle <- gen_circle(n = 500, p = 4)
This function generates a dataset representing a structure with small spheres within a big sphere.
gen_clusteredspheres(n_vec = c(1000, 100), k_small = 3, r_vec = c(15, 3), sep = 10/sqrt(3))
n_vec |
A numeric vector (default: c(1000, 100)) representing the sample sizes of the big and small spheres respectively. |
k_small |
A numeric value (default: 3) representing the number of small spheres. |
r_vec |
A numeric vector (default: c(15, 3)) representing the radius of the big and small spheres respectively. |
sep |
A numeric value (default: 10 / sqrt(3)) representing how far the small spheres are placed from each other. |
A dataset containing small spheres within a big sphere.
set.seed(20240412)
clusteredspheres <- gen_clusteredspheres(n_vec = c(1000, 100), k_small = 3, r_vec = c(15, 3), sep = 10 / sqrt(3))
This function generates locations for any number of clusters in any number of dimensions.
gen_clustloc(p = 4, k = 3)
p |
A numeric value (default: 4) representing the number of dimensions. |
k |
A numeric value (default: 3) representing the number of clusters. |
A matrix of the locations.
set.seed(20240412)
gen_clustloc(p = 4, k = 3)
This function generates a dataset representing a cone with the option of a sharp or blunted apex.
gen_cone(n = 500, p = 4, h = 5, ratio = 0.5)
n |
An integer value (default: 500) representing the sample size. |
p |
An integer value (default: 4) representing the number of dimensions. |
h |
A numeric value (default: 5) representing the height of the cone. |
ratio |
A numeric value (default: 0.5) representing the radius tip to radius base ratio of the cone. Should be less than 1. |
A tibble containing the cone with the option of a sharp or blunted apex.
set.seed(20240412)
cone <- gen_cone(n = 500, p = 4, h = 5, ratio = 0.5)
This function generates a dataset representing a conical spiral structure.
gen_conicspiral(n = 500, spins = 1)
n |
A numeric value (default: 500) representing the sample size. |
spins |
A numeric value (default: 1) representing the number of loops of the spiral. |
A dataset containing a conical spiral structure.
set.seed(20240412)
conicspiral <- gen_conicspiral(n = 500, spins = 1)
This function generates a dataset representing a structure with a crescent pattern.
gen_crescent(n = 500)
n |
An integer value (default: 500) representing the sample size. |
A tibble containing a crescent structure.
set.seed(20240412)
crescent <- gen_crescent(n = 500)
This function generates a dataset representing a structure with a cubic pattern.
gen_cubic(n = 500, range = c(-1, 2))
n |
A numeric value (default: 500) representing the sample size. |
range |
A numeric vector (default: c(-1, 2)) representing the range along the x1 axis. |
A dataset containing a cubic structure.
set.seed(20240412)
cubic <- gen_cubic(n = 500)
This function generates a dataset representing a structure with non-linear shaped branches.
gen_curvybranches(n = 400, k = 4)
n |
A numeric value (default: 400) representing the sample size. |
k |
A numeric value (default: 4) representing the number of branches. |
A dataset containing non-linear shaped branches.
set.seed(20240412)
curvybranches <- gen_curvybranches(n = 400, k = 4)
This function generates a dataset representing a structure with a curvy cell cycle.
gen_curvycycle(n = 500, p = 4)
n |
An integer value (default: 500) representing the sample size. |
p |
A numeric value (default: 4) representing the number of dimensions. |
A dataset containing a curvy cell cycle.
set.seed(20240412)
curvycycle <- gen_curvycycle(n = 500, p = 4)
This function generates a dataset representing a structure with a curvy cylinder.
gen_curvycylinder(n = 500, h = 10)
n |
An integer value (default: 500) representing the sample size. |
h |
A numeric value (default: 10) representing the height of the cylinder. |
A dataset containing a curvy cylinder.
set.seed(20240412)
curvycylinder <- gen_curvycylinder(n = 500, h = 10)
This function generates a dataset representing a structure with exponential shaped branches.
gen_expbranches(n = 400, k = 4)
n |
An integer value (default: 400) representing the sample size. |
k |
An integer value (default: 4) representing the number of branches. |
A tibble containing exponential shaped branches.
set.seed(20240412)
expbranches <- gen_expbranches(n = 400, k = 4)
This function generates a dataset representing a multivariate Gaussian cloud.
gen_gaussian(n = 500, p = 4, m = rep(0, p), s = diag(p) * 0.01)
n |
An integer value (default: 500) representing the sample size. |
p |
An integer value (default: 4) representing the number of dimensions. |
m |
A numeric vector (default: c(0, 0, 0, 0)) representing the mean along each dimension. |
s |
A numeric matrix (default: diag(4) * 0.01) representing the variance along each dimension. |
A tibble containing a multivariate Gaussian cloud dataset.
set.seed(20240412)
gaussian <- gen_gaussian(n = 500, p = 4, m = rep(0, 4), s = diag(4))
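As an illustration of how a mean vector `m` and a variance matrix `s` can define a Gaussian cloud, here is a minimal base R sketch that transforms standard normal draws with a Cholesky factor. `sketch_gaussian()` is a hypothetical helper, and the Cholesky approach is an assumption chosen for this example; the package may sample differently.

```r
# Hypothetical helper, not the package implementation.
# Assumption: s is a symmetric positive-definite p x p matrix.
sketch_gaussian <- function(n = 500, p = 4, m = rep(0, p), s = diag(p) * 0.01) {
  stopifnot(length(m) == p, all(dim(s) == c(p, p)))
  z <- matrix(stats::rnorm(n * p), nrow = n, ncol = p)  # standard normal draws
  x <- z %*% chol(s)            # chol(s) is upper triangular R with t(R) %*% R == s
  x <- sweep(x, 2, m, "+")      # shift every column by its mean
  colnames(x) <- paste0("x", seq_len(p))
  as.data.frame(x)
}

set.seed(1)
s <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)
cloud <- sketch_gaussian(n = 5000, p = 2, m = c(10, -10), s = s)
round(cov(cloud), 2)
```

With the default `s = diag(p) * 0.01` the cloud is small and spherical; off-diagonal entries, as in the usage above, tilt and stretch it.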
This function generates a grid dataset with specified grid points along each axis.
gen_gridcube(n = 500, p = 4)
n |
An integer value (default: 500) representing the sample size. |
p |
An integer value (default: 4) representing the number of dimensions. |
A tibble containing the cube with grid points.
set.seed(20240412)
gridcube <- gen_gridcube(n = 500, p = 4)
This function generates a dataset representing a structure with a gridded sphere.
gen_gridedsphere(n = 500, p = 4)
n |
A numeric value (default: 500) representing the sample size. |
p |
A numeric value (default: 4) representing the number of dimensions. |
A dataset containing a gridded sphere.
set.seed(20240412)
gridedsphere <- gen_gridedsphere(n = 500, p = 4)
This function generates a dataset representing a structure with a helical hyper spiral.
gen_helicalspiral(n = 500)
n |
A numeric value (default: 500) representing the sample size. |
A dataset containing a helical hyper spiral.
set.seed(20240412)
helicalspiral <- gen_helicalspiral(n = 500)
This function generates a dataset representing a structure with a hemisphere.
gen_hemisphere(n = 500)
n |
A numeric value (default: 500) representing the sample size. |
A dataset containing a hemisphere.
set.seed(20240412)
hemisphere <- gen_hemisphere(n = 500)
This function removes the points that fall within a spherical hole in the middle of the data.
gen_hole(df, anchor = NULL, r = 0.5)
df |
A tibble of coordinates. |
anchor |
A numeric vector giving the center of the hole. |
r |
A numeric value for the hole radius. |
A tibble with the hole removed.
set.seed(20240412)
df <- gen_scurve(n = 1000)
gen_hole(df, r = 0.5)
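The hole removal described above amounts to a distance filter around the anchor. The sketch below shows one way to write it in base R; `sketch_hole()` is a hypothetical helper, and using the column means as the anchor when `anchor = NULL` is an assumption made for this example rather than documented behaviour.

```r
# Hypothetical helper, not the package implementation.
sketch_hole <- function(df, anchor = NULL, r = 0.5) {
  mat <- as.matrix(df)
  # Assumption: with no anchor supplied, use the centre of the data.
  if (is.null(anchor)) anchor <- colMeans(mat)
  stopifnot(length(anchor) == ncol(mat))
  # Euclidean distance of every row from the anchor
  d <- sqrt(rowSums(sweep(mat, 2, anchor)^2))
  df[d > r, , drop = FALSE]  # keep only points outside the hole
}

set.seed(1)
pts <- data.frame(
  x1 = runif(2000, -1, 1),
  x2 = runif(2000, -1, 1),
  x3 = runif(2000, -1, 1)
)
holed <- sketch_hole(pts, anchor = c(0, 0, 0), r = 0.5)
nrow(pts) - nrow(holed)  # number of points removed
```

Because points are removed, the result has fewer rows than the input; request a larger sample up front if a fixed final size matters.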
This function generates a dataset representing a structure with a sphere with points on the surface.
gen_hollowsphere(n = 500, p = 4)
n |
A numeric value (default: 500) representing the sample size. |
p |
A numeric value (default: 4) representing the number of dimensions. |
A dataset containing a hollow sphere.
set.seed(20240412)
hollowsphere <- gen_hollowsphere(n = 500)
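A common way to place points on the surface of a sphere in p dimensions is to normalise standard normal vectors to a fixed length. The sketch below uses that technique; `sketch_hollowsphere()` and its radius argument `r` are hypothetical, and the package may construct its hollow sphere differently.

```r
# Hypothetical helper, not the package implementation.
# Technique: normalising Gaussian vectors gives points that are
# uniformly distributed on the surface of the unit sphere.
sketch_hollowsphere <- function(n = 500, p = 4, r = 1) {
  z <- matrix(stats::rnorm(n * p), nrow = n, ncol = p)
  z <- z / sqrt(rowSums(z^2))  # row i is divided by its own norm
  r * z
}

set.seed(1)
sphere <- sketch_hollowsphere(n = 500, p = 4)
range(sqrt(rowSums(sphere^2)))  # every point lies at distance r from the origin
```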
This function generates a dataset representing a structure with linear shaped branches.
gen_linearbranches(n = 400, k = 4)
n |
A numeric value (default: 400) representing the sample size. |
k |
A numeric value (default: 4) representing the number of branches. |
A dataset containing linear shaped branches.
set.seed(20240412)
linearbranches <- gen_linearbranches(n = 400, k = 4)
This function generates a dataset consisting of long linear data.
gen_longlinear(n = 500, p = 4)
n |
An integer value (default: 500) representing the sample size. |
p |
An integer value (default: 4) representing the number of dimensions. |
A tibble containing the long linear data.
set.seed(20240412)
longlinear <- gen_longlinear(n = 500, p = 4)
This function generates a dataset representing a Möbius strip structure.
gen_mobius(n = 500)
n |
An integer value (default: 500) representing the sample size. |
A tibble containing a Möbius strip structure.
set.seed(20240412)
mobius <- gen_mobius(n = 500)
This function generates a dataset with multiple clusters in high-dimensional space. Each cluster can have a different shape, scale, rotation, and centroid, allowing the construction of complex synthetic datasets.
gen_multicluster(
  n = c(200, 300, 500),
  k = 3,
  loc = matrix(c(0, 0, 0, 0, 5, 9, 0, 0, 3, 4, 10, 7), nrow = 3, byrow = TRUE),
  scale = c(3, 1, 2),
  shape = c("gaussian", "bluntedcorn", "unifcube"),
  rotation = NULL,
  add_bkg = FALSE,
  ...
)
n |
An integer vector (default: c(200, 300, 500)) representing the sample sizes for each cluster. Must have length k. |
k |
An integer value (default: 3) representing the number of clusters. |
loc |
A numeric matrix giving the centroids of the clusters. The number of rows must equal k. |
scale |
A numeric vector (default: c(3, 1, 2)) giving the scaling factors for each cluster. Must have length k. |
shape |
A character vector (default: c("gaussian", "cone", "unifcube"))
specifying the generator function to use for each cluster. Must have length |
rotation |
A list of rotation matrices (one per cluster), or NULL (the default). |
add_bkg |
Logical (default: FALSE). If |
... |
Additional arguments passed to the cluster generator functions. |
A tibble containing all generated clusters, with columns x1, x2, ...
for dimensions and a cluster label.
set.seed(20240412)
rot1 <- gen_rotation(p = 4, planes_angles = list(list(plane = c(1, 2), angle = 60), list(plane = c(3, 4), angle = 90)))  # example rotation matrices for 4-D space
rot2 <- gen_rotation(p = 4, planes_angles = list(list(plane = c(1, 3), angle = 30)))
rot3 <- gen_rotation(p = 4, planes_angles = list(list(plane = c(2, 4), angle = 45)))
clust_data <- gen_multicluster(
  n = c(200, 300, 500),
  k = 3,
  loc = matrix(c(
    0, 0, 0, 0,
    5, 9, 0, 0,
    3, 4, 10, 7
  ), nrow = 3, byrow = TRUE),
  scale = c(3, 1, 2),
  shape = c("gaussian", "cone", "unifcube"),
  rotation = list(rot1, rot2, rot3),
  add_bkg = FALSE
)
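To make the roles of `scale`, `rotation`, and `loc` concrete, the sketch below applies the three transformations to a single cluster in base R. `sketch_place_cluster()` is a hypothetical helper, and the order used here (scale, then rotate, then shift to the centroid) is an assumption for illustration; the package may compose these steps differently.

```r
# Hypothetical helper, not the package implementation.
# Assumed order: scale about the origin, rotate, then translate to loc.
sketch_place_cluster <- function(x, scale = 1, rotation = NULL, loc = rep(0, ncol(x))) {
  x <- as.matrix(x) * scale
  if (!is.null(rotation)) x <- x %*% t(rotation)  # rotate each row vector
  sweep(x, 2, loc, "+")                           # move the cluster to its centroid
}

set.seed(1)
base <- matrix(rnorm(200 * 2), ncol = 2)
quarter_turn <- matrix(c(0, 1, -1, 0), nrow = 2)  # 90-degree rotation in the plane
placed <- sketch_place_cluster(base, scale = 3, rotation = quarter_turn, loc = c(5, 9))
colMeans(placed)
```

Rotation and translation preserve distances, so pairwise distances in `placed` are exactly `scale` times those in `base`.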
This function generates random noise dimensions to be added to the coordinates of a data structure.
gen_noisedims(n = 500, p = 4, m = rep(0, p), s = rep(2, p))
n |
A numeric value (default: 500) representing the sample size. |
p |
A numeric value (default: 4) representing the number of dimensions. |
m |
A numeric vector (default: c(0, 0, 0, 0)) representing the mean along each dimension. |
s |
A numeric vector (default: c(2, 2, 2, 2)) representing the standard deviation along each dimension. |
A dataset containing the generated random noise dimensions.
set.seed(20240412)
gen_noisedims(n = 500, p = 4, m = c(0, 0, 0, 0), s = c(2, 2, 2, 2))
This function generates a dataset representing a nonlinear hyperbola structure.
gen_nonlinear(n = 500, hc = 1, non_fac = 0.5)
n |
A numeric value (default: 500) representing the sample size. |
hc |
A numeric value (default: 1) representing the hyperbolic component, which defines the steepness and vertical scaling of the hyperbola. Larger values make the curve more pronounced (sharper dips and rises near 0), while smaller values make it flatter. |
non_fac |
A numeric value (default: 0.5) representing the nonlinear factor, which sets the strength of the sinusoidal effect. When this is 0, the curve is purely hyperbolic; as it increases, the wave-like fluctuations become more prominent. |
A dataset containing a nonlinear hyperbola structure.
set.seed(20240412)
nonlinear <- gen_nonlinear(n = 500, hc = 1, non_fac = 0.5)
This function takes a target integer 'n' and the number of dimensions 'p', and returns a vector 'n_vec' of length 'p' containing positive integers. The goal is to have the product of the elements in 'n_vec' be as close as possible to 'n', especially when 'n' is not a perfect p-th power.
gen_nproduct(n = 500, p = 4)
n |
The target positive integer value for the product of the output vector. |
p |
The number of dimensions (the length of the output vector). Must be a positive integer. |
A sorted vector of positive integers of length 'p'. The product of the elements in this vector will be approximately equal to 'n'. If 'n' is a perfect p-th power, the elements will be equal.
gen_nproduct(500, 6)  # Example with n=500, p=6
gen_nproduct(700, 4)  # Example with n=700, p=4
gen_nproduct(625, 4)  # Example with n=625 (perfect power)
gen_nproduct(30, 3)   # Example with n=30, p=3
gen_nproduct(7, 2)    # Example where exact product might be hard
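One simple way to approximate such a factorisation is a greedy search: start every element at the floor of the p-th root of n, then increase elements one at a time while the product moves closer to n. The sketch below implements that idea; `sketch_nproduct()` is a hypothetical helper and the greedy strategy is an assumption, so its results need not match `gen_nproduct()`.

```r
# Hypothetical helper, not the package implementation.
sketch_nproduct <- function(n = 500, p = 4) {
  stopifnot(n >= 1, p >= 1)
  # Small guard against floating-point error when n is a perfect p-th power
  base <- max(1, floor(n^(1 / p) + 1e-9))
  vec <- rep(base, p)
  for (i in seq_len(p)) {
    candidate <- vec
    candidate[i] <- candidate[i] + 1
    # Accept the increment only if the product gets strictly closer to n
    if (abs(prod(candidate) - n) < abs(prod(vec) - n)) vec <- candidate
  }
  sort(vec)
}

sketch_nproduct(625, 4)  # 5 5 5 5, product 625
sketch_nproduct(500, 4)  # 4 5 5 5, product 500
sketch_nproduct(7, 2)    # 2 3, product 6 (7 is prime, so no exact match)
```

The greedy pass is not guaranteed to find the closest possible product for every `n` and `p`; it only illustrates why the result is described as approximate.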
This function takes a target integer 'n' and the number of clusters 'k', and returns a vector 'n_vec' of length 'k' containing positive integers. The goal is to have the sum of the elements in 'n_vec' be as close as possible to 'n', especially when 'n' is not an exact multiple of 'k'.
gen_nsum(n = 500, k = 4)
n |
The target positive integer value for the summation of the output vector. |
k |
The number of clusters (the length of the output vector). Must be a positive integer. |
A sorted vector of positive integers of length 'k'. The sum of the elements in this vector will be approximately equal to 'n'. If 'n' is perfectly divisible by 'k', the elements will be equal.
gen_nsum(500, 6)  # Example with n=500, k=6
gen_nsum(700, 4)  # Example with n=700, k=4
gen_nsum(625, 5)  # Example with n=625 (perfect division)
gen_nsum(30, 3)   # Example with n=30, k=3
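A near-equal split of this kind can be written with integer division: give every cluster the quotient and hand the remainder out one unit at a time. The sketch below does this in base R; `sketch_nsum()` is a hypothetical helper, and unlike the description above it always sums exactly to 'n', which may differ from what `gen_nsum()` returns.

```r
# Hypothetical helper, not the package implementation.
sketch_nsum <- function(n = 500, k = 4) {
  stopifnot(k >= 1, n >= k)
  base <- n %/% k   # size every cluster receives
  extra <- n %% k   # number of clusters that receive one more
  sort(rep(base, k) + rep(c(1, 0), times = c(extra, k - extra)))
}

sketch_nsum(500, 6)  # 83 83 83 83 84 84
sketch_nsum(625, 5)  # 125 125 125 125 125
```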
This function generates a dataset representing a structure with curvy shaped branches originating from one point.
gen_orgcurvybranches(n = 400, p = 4, k = 4, allow_share = TRUE)
n |
A numeric value (default: 400) representing the sample size. |
p |
A numeric value (default: 4) representing the number of dimensions. |
k |
A numeric value (default: 4) representing the number of branches. |
allow_share |
A logical value (default: TRUE). If TRUE, multiple branches may share the same 2D subspace. If FALSE, branches are sampled without replacement from all possible 2D subspaces until exhausted. |
A dataset containing curvy shaped branches originating from one point.
set.seed(20240412)
orgcurvybranches <- gen_orgcurvybranches(n = 400, k = 4)
This function generates a dataset representing a structure with linear shaped branches originating from one point.
gen_orglinearbranches(n = 400, p = 4, k = 4, allow_share = TRUE)
n |
A numeric value (default: 400) representing the sample size. |
p |
A numeric value (default: 4) representing the number of dimensions. |
k |
A numeric value (default: 4) representing the number of branches. |
allow_share |
A logical value (default: TRUE). If TRUE, multiple branches may share the same 2D subspace. If FALSE, branches are sampled without replacement from all possible 2D subspaces until exhausted. |
A dataset containing linear shaped branches originating from one point.
set.seed(20240412)
orglinearbranches <- gen_orglinearbranches(n = 400, p = 4, k = 4)
This function generates a p-D triangular pyramid with triangular pyramid shaped holes.
gen_pyrfrac(n = 500, p = 4)
n |
An integer value (default: 500) representing the sample size. |
p |
An integer value (default: 4) representing the number of dimensions. |
A dataset containing a triangular pyramid with triangular pyramid shaped holes.
set.seed(20240412)
pyrfrac <- gen_pyrfrac(n = 500, p = 3)
This function generates a dataset representing a rectangular based pyramid.
gen_pyrrect(n = 500, p = 4, h = 5, l_vec = c(3, 2), rt = 0.5)
n |
An integer value (default: 500) representing the sample size. |
p |
An integer value (default: 4) representing the number of dimensions. |
h |
A numeric value (default: 5) representing the height of the pyramid. |
l_vec |
A numeric vector (default: c(3, 2)) representing the base lengths of the pyramid along the x and y axes. |
rt |
A numeric value (default: 0.5) representing the tip radius of the pyramid. |
A tibble containing the rectangular-based pyramid.
set.seed(20240412)
pyrrect <- gen_pyrrect(n = 500, p = 4, h = 5, l_vec = c(3, 2), rt = 0.5)
This function generates a dataset representing a star based pyramid.
gen_pyrstar(n = 500, p = 4, h = 5, rb = 3)
n |
An integer value (default: 500) representing the sample size. |
p |
An integer value (default: 4) representing the number of dimensions. |
h |
A numeric value (default: 5) representing the height of the pyramid. |
rb |
A numeric value (default: 3) representing the base radius of the pyramid. |
A dataset containing the star based pyramid.
set.seed(20240412)
pyrstar <- gen_pyrstar(n = 500, p = 4, h = 5, rb = 3)
This function generates a dataset representing a triangular based pyramid.
gen_pyrtri(n = 500, p = 4, h = 5, l = 3, rt = 0.5)
n |
An integer value (default: 500) representing the sample size. |
p |
An integer value (default: 4) representing the number of dimensions. |
h |
A numeric value (default: 5) representing the height of the pyramid. |
l |
A numeric value (default: 3) representing the base length of the pyramid. |
rt |
A numeric value (default: 0.5) representing the tip radius of the pyramid. |
A dataset containing the triangular based pyramid.
set.seed(20240412)
pyrtri <- gen_pyrtri(n = 500, p = 4, h = 5, l = 3, rt = 0.5)
This function generates a dataset representing a structure with a quadratic pattern.
gen_quadratic(n = 500, range = c(-1, 1))
n |
An integer value (default: 500) representing the sample size. |
range |
A numeric vector (default: c(-1, 1)) representing the range along the x1 axis. |
A tibble containing a quadratic structure.
set.seed(20240412)
quadratic <- gen_quadratic(n = 500)
This function generates a rotation matrix.
gen_rotation(p = 4, planes_angles)
p |
A numeric value (default: 4) representing the number of dimensions. |
planes_angles |
A list of lists, each containing a plane (a pair of axis indices) and the corresponding rotation angle in that plane. |
A matrix containing the rotations.
set.seed(20240412)
rotations_4d <- list(
  list(plane = c(1, 2), angle = 60), # Rotation in the (1, 2) plane
  list(plane = c(3, 4), angle = 90)  # Rotation in the (3, 4) plane
)
gen_rotation(p = 4, planes_angles = rotations_4d)
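A rotation in p dimensions can be built by multiplying together elementary rotations, each acting in one coordinate plane. The sketch below composes such plane rotations in base R; `sketch_rotation()` is a hypothetical helper, and both the interpretation of `angle` as degrees and the order in which the plane rotations are multiplied are assumptions made for this example.

```r
# Hypothetical helper, not the package implementation.
sketch_rotation <- function(p, planes_angles) {
  rot <- diag(p)
  for (pa in planes_angles) {
    i <- pa$plane[1]
    j <- pa$plane[2]
    theta <- pa$angle * pi / 180  # assumption: angles are given in degrees
    g <- diag(p)                  # identity outside the (i, j) plane
    g[i, i] <- cos(theta)
    g[j, j] <- cos(theta)
    g[i, j] <- -sin(theta)
    g[j, i] <- sin(theta)
    rot <- g %*% rot              # apply this plane rotation after the previous ones
  }
  rot
}

rot <- sketch_rotation(4, list(
  list(plane = c(1, 2), angle = 60),
  list(plane = c(3, 4), angle = 90)
))
round(crossprod(rot), 10)  # identity matrix: the result is orthogonal
```

Rotations in disjoint planes, such as (1, 2) and (3, 4) here, commute, so their order only matters when planes share an axis.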
This function generates S-curve data.
gen_scurve(n = 500)
n |
An integer value (default: 500) representing the sample size. |
A dataset containing the generated S-curve.
Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., ... & Varoquaux, G. (2013). API design for machine learning software: experiences from the scikit-learn project. arXiv preprint arXiv:1309.0238.
set.seed(20240412)
scurve <- gen_scurve(n = 500)
This function generates S-curve data with a hole by removing the samples that lie close to a specified anchor point.
gen_scurvehole(n = 500, r_hole = 0.5)
n |
A numeric value (default: 500) representing the sample size. |
r_hole |
A numeric value (default: 0.5) representing the radius of the hole. |
A dataset containing the generated S-curve with a hole.
Wang, Y., Huang, H., Rudin, C., & Shaposhnik, Y. (2021). Understanding how dimension reduction tools work: an empirical approach to deciphering t-SNE, UMAP, TriMAP, and PaCMAP for data visualization. J Mach. Learn. Res, 22, 1-73.
the PaCMAP homepage.
set.seed(20240412)
scurvehole <- gen_scurvehole(n = 1000)
This function generates a dataset representing a structure with a spherical spiral.
gen_sphericalspiral(n = 500, spins = 1)
n |
A numeric value (default: 500) representing the sample size. |
spins |
A numeric value (default: 1) representing the number of loops of the spiral. |
A dataset containing a spherical spiral.
set.seed(20240412)
sphericalspiral <- gen_sphericalspiral(n = 500, spins = 1)
This function generates Swiss roll data.
gen_swissroll(n = 500, w = c(-1, 1))
n |
An integer value (default: 500) representing the sample size. |
w |
A numeric vector (default: c(-1, 1)) representing the vertical variation. |
A tibble containing the generated Swiss roll data.
Agrafiotis, D. K., & Xu, H. (2002). A self-organizing principle for learning nonlinear manifolds. Proceedings of the National Academy of Sciences, 99(25), 15869-15872.
Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323-2326.
set.seed(20240412)
swissroll <- gen_swissroll(n = 500)
This function generates coordinates for a 3-D trefoil knot by applying a stereographic projection from 4-D space.
gen_trefoil3d(n = 500, steps = 5)
n |
An integer value (default: 500) representing the sample size. |
steps |
A numeric value (default: 5) representing the number of steps for the theta parameter. |
A dataset containing a 3-D trefoil knot.
set.seed(20240412)
trefoil3d <- gen_trefoil3d(n = 500, steps = 5)
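To illustrate the stereographic projection mentioned above, the sketch below places a (2, 3) torus knot, which is a trefoil, on the unit sphere in 4-D and projects it to 3-D from the pole (0, 0, 0, 1). `sketch_trefoil()` is a hypothetical helper that draws only the knot curve; this parametrisation is an assumption for the example, and the package's use of `steps` is not reproduced.

```r
# Hypothetical helper, not the package implementation.
sketch_trefoil <- function(n = 500) {
  theta <- seq(0, 2 * pi, length.out = n)
  # (2, 3) torus knot on the unit 3-sphere: every row has norm 1
  knot4d <- cbind(cos(2 * theta), sin(2 * theta),
                  cos(3 * theta), sin(3 * theta)) / sqrt(2)
  # Stereographic projection from (0, 0, 0, 1):
  # (x1, x2, x3, x4) -> (x1, x2, x3) / (1 - x4)
  knot3d <- knot4d[, 1:3] / (1 - knot4d[, 4])
  colnames(knot3d) <- paste0("x", 1:3)
  list(knot4d = knot4d, knot3d = knot3d)
}

knot <- sketch_trefoil(n = 500)
head(knot$knot3d, 3)
```

The fourth coordinate never exceeds 1 / sqrt(2), so the denominator stays away from zero and the projected curve is finite everywhere.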
This function generates coordinates for a 4-D trefoil knot. The number of points is determined by the length of the theta and phi sequences.
gen_trefoil4d(n = 500, steps = 5)
n |
An integer value (default: 500) representing the sample size. |
steps |
A numeric value (default: 5) representing the number of steps for the theta parameter. |
A tibble containing a 4-D trefoil knot.
set.seed(20240412)
trefoil4d <- gen_trefoil4d(n = 500, steps = 5)
This function generates a grid dataset with specified uniform points along each axis.
gen_unifcube(n = 500, p = 4)
n |
A numeric value (default: 500) representing the sample size. |
p |
A numeric value (default: 4) representing the number of dimensions. |
A dataset containing the cube with uniform points.
set.seed(20240412)
unifcube <- gen_unifcube(n = 500, p = 4)
This function generates a dataset representing a cube with a hole.
gen_unifcubehole(n = 5000, p = 4, r_hole = 0.5)
n |
A numeric value (default: 5000) representing the sample size. |
p |
A numeric value (default: 4) representing the number of dimensions. |
r_hole |
A numeric value (default: 0.5) representing the radius of the hole. |
A dataset containing the cube with a hole.
set.seed(20240412)
cubehole <- gen_unifcubehole(n = 1000, p = 4)
This function generates a dataset representing a structure with a uniform sphere.
gen_unifsphere(n = 500, r = 1)
n |
A numeric value (default: 500) representing the sample size. |
r |
A numeric value (default: 1) representing the radius of the sphere. |
A dataset containing a uniform sphere.
set.seed(20240412)
unifsphere <- gen_unifsphere(n = 500)
This function generates random noise dimensions by adding wavy patterns.
gen_wavydims1(n = 500, p = 4, theta = seq(pi/6, 12 * pi/6, length.out = 500))
n |
A numeric value (default: 500) representing the sample size. |
p |
A numeric value (default: 4) representing the number of dimensions. |
theta |
A numeric vector representing the nonlinearity along each dimension. |
A dataset containing the generated random noise dimensions.
set.seed(20240412)
gen_wavydims1(n = 500, p = 4, theta = seq(pi / 6, 12 * pi / 6, length.out = 500))
This function generates random noise dimensions by adding wavy patterns.
gen_wavydims2(n = 500, p = 4, x1_vec)
n |
A numeric value (default: 500) representing the sample size. |
p |
A numeric value (default: 4) representing the number of dimensions. |
x1_vec |
A numeric vector representing the first dimension of the data structure. |
A dataset containing the generated random noise dimensions.
set.seed(20240412)
theta <- seq(0, 2 * pi, length.out = 500)
x1 <- sin(pi) * cos(theta)
gen_wavydims2(n = 500, p = 4, x1_vec = x1)
This function generates random noise dimensions by adding wavy patterns.
gen_wavydims3(n = 500, p = 4, data)
n |
A numeric value (default: 500) representing the sample size. |
p |
A numeric value (default: 4) representing the number of dimensions. |
data |
A matrix representing the first three dimensions of the data structure. |
A dataset containing the generated random noise dimensions.
set.seed(20240412)
df <- gen_scurve(n = 500) |> as.matrix()
gen_wavydims3(n = 500, p = 3, data = df)
This function generates interlocked circular clusters in a p-dimensional space. Unlike make_klink_circles(), the circles are arranged in a **chain-like structure**, where each circle interlocks only with its immediate neighbor, resembling links in a chain.
make_chain_circles(n = c(200, 100), p = 4, k = 2, offset = 0.5, angle = 90)
n |
An integer vector of length k giving the number of points in each circle. Default is c(200, 100). |
p |
Integer, the dimensionality of the embedding space. Must be
at least 3. Default is 4. |
k |
Integer, the number of circles to generate. Default is 2. |
offset |
Numeric, the positional shift applied to each circle along
its linking axis to ensure interlocking instead of overlap. Default is 0.5. |
angle |
Numeric, the rotation angle (in degrees) used when placing
each subsequent circle into its respective plane. Default is 90. |
A data frame (or tibble, depending on gen_multicluster())
containing the generated points and cluster assignments.
# Generate two chain-linked circles in 4-D
twochain_circles <- make_chain_circles()
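To make the idea of two chain-linked circles concrete, here is a minimal base R sketch. It is not the package's construction: the choice of planes, the role given to `offset`, and the noise level in `x4` are assumptions made only for illustration.

```r
set.seed(20240412)
n <- c(200, 100)   # points per circle, mirroring the default above
offset <- 0.5      # shift of the second circle along x1 (assumed role)

t1 <- runif(n[1], 0, 2 * pi)
t2 <- runif(n[2], 0, 2 * pi)

# Circle 1 lies in the (x1, x2) plane, centered at the origin.
c1 <- data.frame(x1 = cos(t1), x2 = sin(t1), x3 = 0, cluster = 1)
# Circle 2 lies in the (x1, x3) plane, shifted along x1 so that it
# passes through the disc bounded by circle 1.
c2 <- data.frame(x1 = offset + cos(t2), x2 = 0, x3 = sin(t2), cluster = 2)

chain <- rbind(c1, c2)
chain$x4 <- rnorm(nrow(chain), sd = 0.05)  # small noise dimension (assumed)
table(chain$cluster)
```

In this sketch the second circle crosses the plane of the first at x1 = offset - 1 (inside the unit circle) and at x1 = offset + 1 (outside it), which is what links the two.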
This function generates interlocked circular clusters in a
p-dimensional space. Unlike make_klink_curvycycle(), the curvycycle clusters
are arranged in a **chain-like structure**, where each curvycycle interlocks
only with its immediate neighbor, resembling links in a chain.
make_chain_curvycycle(n = c(200, 100), p = 4, k = 2, offset = 0.5, angle = 90)
n |
An integer vector of length k giving the number of points in each curvycycle. Default is c(200, 100). |
p |
Integer, the dimensionality of the embedding space. Must be
at least 3. Default is 4. |
k |
Integer, the number of curvycycle clusters to generate. Default is 2. |
offset |
Numeric, the positional shift applied to each curvycycle along
its linking axis to ensure interlocking instead of overlap. Default is 0.5. |
angle |
Numeric, the rotation angle (in degrees) used when placing
each subsequent curvycycle into its respective plane. Default is 90. |
A data frame (or tibble, depending on gen_multicluster())
containing the generated points and cluster assignments.
# Generate two chain-linked curvycycle clusters in 4-D
twochain_curvycycle <- make_chain_curvycycle()
This function generates synthetic high-dimensional data containing two clusters: one quadratic-shaped cluster and one Gaussian-shaped cluster. The clusters are positioned apart in feature space with different scaling factors.
make_curvygau(n = c(200, 100), p = 4)
n |
A numeric vector of length 2, specifying the number of observations in each cluster. All values must be positive. |
p |
Integer. Number of dimensions. Must be at least 3. |
A tibble containing sum(n) rows and p + 1 columns,
with generated features (x1, x2, ..., xp) and a
cluster label.
# Generate 2 clusters in 4D: one quadratic, one Gaussian
curvygau <- make_curvygau()
This function generates a dataset consisting of multiple circular clusters
together with a single Gaussian cluster in a p-dimensional space.
The circles are placed concentrically at the origin with varying scales,
while the Gaussian cluster serves as an additional background or center cluster.
make_gaucircles(n = c(200, 100, 100), p = 4, num_circles = 2, scale_circles = c(1, 2))
n |
An integer vector of length num_circles + 1 giving the number of points in each cluster. Default is c(200, 100, 100). |
p |
Integer, the dimensionality of the embedding space. Must be at least 3.
Default is 4. |
num_circles |
Integer, the number of circular clusters to generate.
Default is 2. |
scale_circles |
Numeric vector of length num_circles giving the scale of each circle. Default is c(1, 2). |
A data frame (or tibble, depending on gen_multicluster())
containing the generated dataset with cluster assignments.
# Two circles (radii 1 and 2) plus one Gaussian cluster in 4-D
gaucircles <- make_gaucircles()
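The layout described above (concentric circles plus a Gaussian cluster at the center) can be sketched in base R as follows. This is only an illustration in two dimensions: the helper `make_circle`, the order of the entries in `n`, and the Gaussian standard deviation of 0.2 are assumptions, not the package's code.

```r
set.seed(20240412)
n <- c(200, 100, 100)   # two circles, then one Gaussian cluster (assumed order)
radii <- c(1, 2)        # plays the role of scale_circles

# Hypothetical helper: n_pts points on a circle of radius r around the origin.
make_circle <- function(n_pts, r, label) {
  theta <- runif(n_pts, 0, 2 * pi)
  data.frame(x1 = r * cos(theta), x2 = r * sin(theta), cluster = label)
}

circles <- do.call(rbind, Map(make_circle, n[1:2], radii, 1:2))
gau <- data.frame(x1 = rnorm(n[3], sd = 0.2),
                  x2 = rnorm(n[3], sd = 0.2),
                  cluster = 3)
gaucircles_sketch <- rbind(circles, gau)

# Mean distance from the origin for each circle
tapply(sqrt(circles$x1^2 + circles$x2^2), circles$cluster, mean)
```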
This function generates a dataset consisting of multiple circular clusters
together with a single Gaussian cluster in a p-dimensional space.
The curvycycle clusters are placed concentrically at the origin with varying scales,
while the Gaussian cluster serves as an additional background or center cluster.
make_gaucurvycycle(n = c(200, 100, 100), p = 4, num_curvycycle = 2, scale_curvycycle = c(1, 2))
n |
An integer vector of length num_curvycycle + 1 giving the number of points in each cluster. Default is c(200, 100, 100). |
p |
Integer, the dimensionality of the embedding space. Must be at least 3.
Default is 4. |
num_curvycycle |
Integer, the number of circular clusters to generate.
Default is 2. |
scale_curvycycle |
Numeric vector of length num_curvycycle giving the scale of each curvycycle. Default is c(1, 2). |
A data frame (or tibble, depending on gen_multicluster())
containing the generated dataset with cluster assignments.
# Two curvycycle clusters (radii 1 and 2) plus one Gaussian cluster in 4-D
gaucurvycycle <- make_gaucurvycycle()
This function generates interlocked circular clusters in a
p-dimensional space. The circles are constructed using
gen_multicluster(), with each circle positioned in a different
coordinate plane and slightly offset so that they interlock with a
central circle (hub-like structure).
make_klink_circles(n = c(200, 100), p = 4, k = 2, offset = 0.5)
n |
An integer vector of length k giving the number of points in each circle. Default is c(200, 100). |
p |
Integer, the dimensionality of the embedding space. Must be
at least 3. Default is 4. |
k |
Integer, the number of circles to generate. Default is 2. |
offset |
Numeric, the amount of positional shift applied to each
circle along the second coordinate axis to prevent complete overlap.
Default is 0.5. |
A data frame (or tibble, depending on gen_multicluster())
containing the generated points and cluster assignments.
# Generate two interlocked circles in 4-D
twolink_circles <- make_klink_circles()
This function generates interlocked circular clusters in a
p-dimensional space. The curvycycle clusters are constructed using
gen_multicluster(), with each curvycycle positioned in a different
coordinate plane and slightly offset so that they interlock with a
central curvycycle (hub-like structure).
make_klink_curvycycle(n = c(200, 100), p = 4, k = 2, offset = 0.5)
n |
An integer vector of length k giving the number of points in each curvycycle. Default is c(200, 100). |
p |
Integer, the dimensionality of the embedding space. Must be
at least 3. Default is 4. |
k |
Integer, the number of curvycycle clusters to generate. Default is 2. |
offset |
Numeric, the amount of positional shift applied to each
curvycycle along the second coordinate axis to prevent complete overlap.
Default is 0.5. |
A data frame (or tibble, depending on gen_multicluster())
containing the generated points and cluster assignments.
# Generate two interlocked curvycycle clusters in 4-D
twolink_curvycycle <- make_klink_curvycycle()
This function generates a dataset consisting of a Mobius cluster and a Gaussian cluster.
make_mobiusgau(n = c(200, 100), p = 4)
n |
An integer vector (default: c(200, 100)) representing the sample sizes. |
p |
An integer value (default: 4) representing the number of dimensions. |
A tibble containing the Mobius cluster and the Gaussian cluster.
mobgau <- make_mobiusgau(n = c(200, 100), p = 4)
This function generates a dataset consisting of multiple Gaussian clusters.
make_multigau(n = c(300, 200, 500), p = 4, k = 3, loc = NULL, scale = NULL)
n |
A numeric vector (default: c(300, 200, 500)) representing the sample sizes. |
p |
A numeric value (default: 4) representing the number of dimensions. |
k |
A numeric value (default: 3) representing the number of clusters. |
loc |
A numeric matrix (default: NULL) representing the locations/centroids of clusters. |
scale |
A numeric vector (default: NULL) representing the scaling factors of clusters. |
A data set containing the Gaussian clusters.
loc_matrix <- matrix(c(0, 0, 0, 0,
                       5, 9, 0, 0,
                       3, 4, 10, 7), nrow = 3, byrow = TRUE)
multigau <- make_multigau(n = c(300, 200, 500), p = 4, k = 3,
                          loc = loc_matrix, scale = c(0.2, 1.5, 0.5))
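For readers who want to see how cluster locations and scaling factors might combine, the base R sketch below draws each cluster from an isotropic normal distribution. Treating `scale` as a per-cluster standard deviation is an assumption; the package may apply the scaling differently.

```r
set.seed(20240412)
n <- c(300, 200, 500)
loc <- matrix(c(0, 0, 0, 0,
                5, 9, 0, 0,
                3, 4, 10, 7), nrow = 3, byrow = TRUE)
scale <- c(0.2, 1.5, 0.5)   # assumed to act as a standard deviation
p <- ncol(loc)

clusters <- lapply(seq_along(n), function(i) {
  # Normal draws with standard deviation scale[i], moved to centroid loc[i, ]
  pts <- matrix(rnorm(n[i] * p, sd = scale[i]), ncol = p)
  pts <- sweep(pts, 2, loc[i, ], "+")
  colnames(pts) <- paste0("x", seq_len(p))
  data.frame(pts, cluster = i)
})
multigau_sketch <- do.call(rbind, clusters)

# Per-cluster means should sit close to the rows of loc
cluster_means <- aggregate(. ~ cluster, data = multigau_sketch, FUN = mean)
round(cluster_means, 1)
```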
This function generates a dataset consisting of one grid-like cluster (a structured cube grid) in 2D, with optional Gaussian noise dimensions added to extend the dataset into higher dimensions.
make_onegrid(n = 500)
n |
Integer, the number of points in the grid cluster. Must be positive.
Default is 500. |
A tibble containing the generated dataset with columns:
x1, x2, x3, x4 — coordinates of the data points.
cluster — cluster assignment (always 1 for the grid).
# Default: 500 points, 4D space (grid in 2D + 2 noise dimensions)
onegrid <- make_onegrid()
This function generates synthetic high-dimensional data consisting of
k clusters of a specified shape (e.g., crescents), arranged in
parallel along alternating dimensions. The first cluster is shifted
along the first dimension, the second along the third dimension,
the third along the first dimension again, and so on.
make_shape_para(n = c(500, 300), k = 2, shift = 1, shape = "crescent")
n |
A numeric vector of length k specifying the number of observations in each cluster. Default is c(500, 300). |
k |
Integer. Number of clusters to generate. Must be greater than 1. |
shift |
Numeric. The distance between cluster centers along the alternating dimensions (default is 1). |
shape |
Character string. Shape of the clusters to generate (e.g., "crescent", "gridcube", etc.). Must be a single value. |
A tibble containing sum(n) rows and 5 columns, with
the generated features (x1, x2, x3, x4) and a cluster label.
# Generate 2 crescent-shaped clusters in 4D
twocrescent <- make_shape_para(n = c(500, 300), k = 2, shape = "crescent")
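The alternating-dimension pattern described above (first cluster along x1, second along x3, third along x1 again) can be written down as a small offset matrix. The sketch below uses base R only; the magnitude `i * shift` for the i-th cluster is an assumption chosen for illustration and may not match how the package spaces the clusters.

```r
k <- 4       # number of clusters in this illustration
shift <- 1   # distance between cluster centers
p <- 4       # number of dimensions

# Odd-numbered clusters move along x1, even-numbered clusters along x3.
shift_dim <- ifelse(seq_len(k) %% 2 == 1, 1, 3)

offsets <- matrix(0, nrow = k, ncol = p,
                  dimnames = list(NULL, paste0("x", seq_len(p))))
for (i in seq_len(k)) {
  offsets[i, shift_dim[i]] <- i * shift   # assumed magnitude
}
offsets
```

Each row of `offsets` has exactly one nonzero entry, so every cluster is displaced along a single axis.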
This function generates a dataset consisting of two overlapping grid-like clusters in a 2D space, with optional noise dimensions added to reach higher-dimensional spaces. The overlap is controlled by scaling factors for the grids.
make_twogrid_overlap(n = c(500, 500))
n |
A numeric vector of length 2 specifying the number of points in each grid cluster. |
A tibble with n[1] + n[2] rows and 5 columns:
x1, x2, x3, x4: coordinates of the generated points.
cluster: cluster membership label.
# Generate two overlapping grid clusters in 4-D
df <- make_twogrid_overlap()
This function generates two grid-shaped clusters in a 2D space, where one grid is shifted relative to the other. Optionally, additional noise dimensions can be added to embed the structure in a higher-dimensional space.
make_twogrid_shift(n = c(500, 500))
n |
A numeric vector of length 2 specifying the number of points in each cluster.
Default is c(500, 500). |
A tibble with n[1] + n[2] rows and 5 columns:
x1, x2, x3, x4: Numeric coordinates of the points.
cluster: Cluster membership label (factor with 2 levels).
# Generate 2 shifted grid clusters in 4-D
twogrid_shift <- make_twogrid_shift()
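As a rough picture of what two shifted grids with added noise dimensions look like, the base R sketch below builds one regular grid, copies it with a shift, and appends two Gaussian noise columns. The grid size (20 x 20), the shift of 0.5, and the noise standard deviation are all assumptions for illustration and are not taken from the package.

```r
set.seed(20240412)
side <- 20   # 20 x 20 = 400 points per grid (assumed)
grid1 <- expand.grid(x1 = seq(0, 1, length.out = side),
                     x2 = seq(0, 1, length.out = side))
grid2 <- grid1 + 0.5   # same grid, shifted along both axes (assumed shift)

both <- rbind(cbind(grid1, cluster = 1), cbind(grid2, cluster = 2))

# Two Gaussian noise dimensions embed the 2D structure in 4D
both$x3 <- rnorm(nrow(both), sd = 0.05)
both$x4 <- rnorm(nrow(both), sd = 0.05)
both <- both[, c("x1", "x2", "x3", "x4", "cluster")]
dim(both)
```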
The 'mobiusgau' dataset contains a 3-dimensional Mobius and Gaussian cluster with an added noise dimension. Each data point is represented by four dimensions (x1 to x4).
data(mobiusgau)
A data frame with 1000 rows and 4 columns:
x1, x2, x3, x4: High-dimensional coordinates
This dataset is generated for illustrative purposes.
# Load the mobiusgau dataset
data(mobiusgau)
# Display the first few rows of the dataset
head(mobiusgau)
The 'mobiusgau_tsne1' dataset contains the tSNE (t-distributed Stochastic Neighbor Embedding) embeddings of the four-dimensional mobiusgau data. Each data point is represented by two tSNE coordinates (emb1 and emb2).
data(mobiusgau_tsne1)
A data frame with 1000 rows and 4 columns:
emb1: Numeric, first tSNE 2D embedding.
emb2: Numeric, second tSNE 2D embedding.
This dataset is generated for illustrative purposes.
# Load the mobiusgau_tsne1 dataset
data(mobiusgau_tsne1)
# Display the first few rows of the dataset
head(mobiusgau_tsne1)
The 'mobiusgau_tsne2' dataset contains the tSNE (t-distributed Stochastic Neighbor Embedding) embeddings of the four-dimensional mobiusgau data. Each data point is represented by two tSNE coordinates (emb1 and emb2).
data(mobiusgau_tsne2)
A data frame with 1000 rows and 4 columns:
emb1: Numeric, first tSNE 2D embedding.
emb2: Numeric, second tSNE 2D embedding.
This dataset is generated for illustrative purposes.
# Load the mobiusgau_tsne2 dataset
data(mobiusgau_tsne2)
# Display the first few rows of the dataset
head(mobiusgau_tsne2)
The 'mobiusgau_tsne3' dataset contains the tSNE (t-distributed Stochastic Neighbor Embedding) embeddings of the four-dimensional mobiusgau data. Each data point is represented by two tSNE coordinates (emb1 and emb2).
data(mobiusgau_tsne3)
A data frame with 1000 rows and 4 columns:
emb1: Numeric, first tSNE 2D embedding.
emb2: Numeric, second tSNE 2D embedding.
This dataset is generated for illustrative purposes.
# Load the mobiusgau_tsne3 dataset
data(mobiusgau_tsne3)
# Display the first few rows of the dataset
head(mobiusgau_tsne3)
The 'mobiusgau_umap1' dataset contains the UMAP (Uniform Manifold Approximation and Projection) embeddings of the four-dimensional mobiusgau data. Each data point is represented by two UMAP coordinates (emb1 and emb2).
data(mobiusgau_umap1)
A data frame with 1000 rows and 4 columns:
emb1: Numeric, first UMAP 2D embedding.
emb2: Numeric, second UMAP 2D embedding.
This dataset is generated for illustrative purposes.
# Load the mobiusgau_umap1 dataset
data(mobiusgau_umap1)
# Display the first few rows of the dataset
head(mobiusgau_umap1)
The 'mobiusgau_umap2' dataset contains the UMAP (Uniform Manifold Approximation and Projection) embeddings of the four-dimensional mobiusgau data. Each data point is represented by two UMAP coordinates (emb1 and emb2).
data(mobiusgau_umap2)
A data frame with 1000 rows and 4 columns:
emb1: Numeric, first UMAP 2D embedding.
emb2: Numeric, second UMAP 2D embedding.
This dataset is generated for illustrative purposes.
# Load the mobiusgau_umap2 dataset
data(mobiusgau_umap2)
# Display the first few rows of the dataset
head(mobiusgau_umap2)
The 'mobiusgau_umap3' dataset contains the UMAP (Uniform Manifold Approximation and Projection) embeddings of the four-dimensional mobiusgau data. Each data point is represented by two UMAP coordinates (emb1 and emb2).
data(mobiusgau_umap3)
A data frame with 1000 rows and 4 columns:
emb1: Numeric, first UMAP 2D embedding.
emb2: Numeric, second UMAP 2D embedding.
This dataset is generated for illustrative purposes.
# Load the mobiusgau_umap3 dataset
data(mobiusgau_umap3)
# Display the first few rows of the dataset
head(mobiusgau_umap3)
This function normalizes the data by the largest absolute value found in the dataset.
normalize_data(data)
data |
A tibble containing the data to be normalized. |
The normalized data.
set.seed(20240412)
data1 <- gen_gaussian(n = 500, p = 4)
normalize_data(data = data1)
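As a rough illustration of the scaling described above, the following base R sketch divides every value by the largest absolute value in the data. The helper name `normalize_by_max_abs` is hypothetical and is not part of the package; the package's own implementation may differ, for example in how it treats non-numeric columns.

```r
# Hypothetical helper, not the package implementation:
# scale all values by the largest absolute value in the data.
normalize_by_max_abs <- function(data) {
  mat <- as.matrix(data)      # assumes all columns are numeric
  max_abs <- max(abs(mat))    # largest absolute value overall
  as.data.frame(mat / max_abs)
}

set.seed(20240412)
toy <- data.frame(x1 = rnorm(5), x2 = rnorm(5, sd = 3))
scaled <- normalize_by_max_abs(toy)
max(abs(as.matrix(scaled)))   # largest absolute value after scaling
```

After this kind of scaling every value lies in [-1, 1], and the relative geometry of the points is unchanged because all coordinates are divided by the same constant.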
This function randomly shuffles the rows of a given data frame.
randomize_rows(data)
data |
A data frame to be randomized. |
A data frame with rows randomly shuffled.
randomize_rows(mobiusgau)
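Row shuffling of this kind is commonly done by indexing with a random permutation of the row numbers. The sketch below shows that idea in base R; `shuffle_rows` is a hypothetical name used for illustration and is not the package function.

```r
# Hypothetical helper: shuffle rows by sampling a permutation of row indices.
shuffle_rows <- function(data) {
  data[sample(nrow(data)), , drop = FALSE]
}

set.seed(20240412)
toy <- data.frame(id = 1:6, x1 = rnorm(6))
shuffled <- shuffle_rows(toy)
sort(shuffled$id)   # the same rows are present, only their order changes
```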
This function relocates clusters in a dataset by centering each cluster and shifting it based on a given transformation matrix.
relocate_clusters(data, vert_mat)
data |
A tibble or data frame containing clustered data. It must have a 'cluster' column indicating cluster membership. |
vert_mat |
A matrix specifying the translation vectors for each cluster. The number of rows must match the number of clusters. |
A tibble containing the relocated clusters with randomized row order.
set.seed(20240412)
df <- tibble::tibble(
  x1 = rnorm(12),
  x2 = rnorm(12),
  x3 = rnorm(12),
  x4 = rnorm(12),
  cluster = rep(1:3, each = 4)
)
# Create a 3x4 matrix to define new cluster centers
vert_mat <- matrix(c(
  5, 0, 0, 0, # Shift cluster 1
  0, 5, 0, 0, # Shift cluster 2
  0, 0, 5, 0  # Shift cluster 3
), nrow = 3, byrow = TRUE)
# Apply relocation
relocated_df <- relocate_clusters(df, vert_mat)
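The center-then-shift step can be sketched in base R as below. `relocate_sketch` is a hypothetical stand-in written for illustration; it assumes that row i of `vert_mat` belongs to the i-th cluster label in sorted order, which may not match the package's convention.

```r
# Hypothetical helper: center each cluster at the origin, then translate it
# by the matching row of vert_mat (row i is used for the i-th sorted label).
relocate_sketch <- function(data, vert_mat) {
  coords <- setdiff(names(data), "cluster")
  labels <- sort(unique(data$cluster))
  for (i in seq_along(labels)) {
    rows <- data$cluster == labels[i]
    block <- as.matrix(data[rows, coords])
    centered <- sweep(block, 2, colMeans(block))   # subtract the cluster mean
    data[rows, coords] <- sweep(centered, 2, vert_mat[i, ], "+")
  }
  data[sample(nrow(data)), ]                       # randomize row order
}

set.seed(20240412)
toy <- data.frame(x1 = rnorm(12), x2 = rnorm(12),
                  cluster = rep(1:3, each = 4))
shifts <- matrix(c( 5, 0,
                    0, 5,
                   -5, 0), nrow = 3, byrow = TRUE)
moved <- relocate_sketch(toy, shifts)

# After relocation, each cluster mean equals its row of shifts
cluster_means <- aggregate(cbind(x1, x2) ~ cluster, data = moved, FUN = mean)
cluster_means
```

Because each cluster is centered before it is shifted, the rows of the translation matrix act as the new cluster centers rather than as increments to the old ones.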
The 'three_clust_01' dataset contains three distinct clusters in a 4-D space.
data(three_clust_01)
A data frame with 1500 rows and 5 columns:
High-dimensional coordinates
This dataset is generated for example purposes.
# Load the three_clust_01 dataset
data(three_clust_01)
# Display the first few rows of the dataset
head(three_clust_01)
The 'three_clust_02' dataset contains three distinct clusters in a 4-D space.
data(three_clust_02)
A data frame with 1500 rows and 5 columns:
High-dimensional coordinates
This dataset is generated for example purposes.
# Load the three_clust_02 dataset
data(three_clust_02)
# Display the first few rows of the dataset
head(three_clust_02)
The 'three_clust_03' dataset contains three distinct clusters in a 4-D space.
data(three_clust_03)
A data frame with 1500 rows and 5 columns:
High-dimensional coordinates
This dataset is generated for example purposes.
# Load the three_clust_03 dataset
data(three_clust_03)
# Display the first few rows of the dataset
head(three_clust_03)
The 'three_clust_04' dataset contains three distinct clusters in a 4-D space.
data(three_clust_04)
A data frame with 1500 rows and 5 columns:
High-dimensional coordinates
This dataset is generated for example purposes.
# Load the three_clust_04 dataset
data(three_clust_04)
# Display the first few rows of the dataset
head(three_clust_04)
The 'three_clust_05' dataset contains three distinct clusters in a 4-D space.
data(three_clust_05)
A data frame with 1500 rows and 5 columns:
High-dimensional coordinates
This dataset is generated for example purposes.
# Load the three_clust_05 dataset
data(three_clust_05)
# Display the first few rows of the dataset
head(three_clust_05)