These are helper functions included in the package.
The gen_bkgnoise() function allows users to generate
multivariate Gaussian noise to serve as background data in
high-dimensional spaces.
# Example: Generate 4D background noise
bkg_data <- gen_bkgnoise(n = 500, p = 4,
                         m = c(0, 0, 0, 0), s = c(2, 2, 2, 2))
head(bkg_data)
#> # A tibble: 6 × 4
#> x1 x2 x3 x4
#> <dbl> <dbl> <dbl> <dbl>
#> 1 -1.39 1.87 2.06 -1.93
#> 2 0.639 2.57 2.46 -1.26
#> 3 2.57 -1.72 1.93 -0.340
#> 4 5.00 1.98 -0.239 -1.53
#> 5 -0.224 -0.630 0.849 -3.66
#> 6 -2.76 -0.923 1.64 -0.0119
The generated data has independent dimensions with specified means
(m) and standard deviations (s).
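To illustrate what independent dimensions with their own means and standard deviations amount to, the following base R sketch draws each column separately with rnorm(). The name bkgnoise_sketch() is hypothetical, and this is an assumption about the behaviour described above, not the package's implementation.

```r
# Hypothetical stand-in for illustration only; not the package's code.
bkgnoise_sketch <- function(n, p, m, s) {
  # Draw each dimension independently with its own mean and sd
  cols <- lapply(seq_len(p), function(j) rnorm(n, mean = m[j], sd = s[j]))
  names(cols) <- paste0("x", seq_len(p))
  as.data.frame(cols)
}

set.seed(123)
sketch_data <- bkgnoise_sketch(n = 500, p = 4, m = rep(0, 4), s = rep(2, 4))
round(sapply(sketch_data, sd), 2)  # each value should be close to 2
```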
randomize_rows() shuffles the rows of the input data into a
random order.
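Row shuffling of this kind can be sketched in base R with sample(). This is a minimal equivalent written for illustration, with a made-up data frame; it is not taken from the package and does not call randomize_rows() itself.

```r
set.seed(42)
toy <- data.frame(id = 1:6, x = rnorm(6))

# Permute the row indices, then reorder the rows accordingly
shuffled <- toy[sample(nrow(toy)), , drop = FALSE]
rownames(shuffled) <- NULL  # drop the old row labels

shuffled
```

The same rows are present before and after; only their order changes.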
relocate_clusters() allows users to translate clusters
in any dimension(s). It centers each cluster (subtracting its mean)
and then adds a translation vector taken from the corresponding row
of a user-provided matrix (vert_mat).
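The center-then-translate step can be sketched in base R as follows. The function relocate_sketch() and the objects toy_df and shift_mat are hypothetical names used only here. The sketch assumes that clusters are labelled 1, 2, ..., k and that row i of the matrix is the translation for cluster i. Unlike the package output shown in this article, it leaves the row order unchanged.

```r
# Hypothetical illustration of "center each cluster, then translate it".
relocate_sketch <- function(data, vert_mat) {
  coord_cols <- setdiff(names(data), "cluster")
  for (cl in sort(unique(data$cluster))) {
    rows <- data$cluster == cl
    block <- as.matrix(data[rows, coord_cols])
    centered <- sweep(block, 2, colMeans(block))  # subtract the cluster mean
    # Add the translation vector for this cluster (assumed: row cl)
    data[rows, coord_cols] <- sweep(centered, 2, vert_mat[cl, ], "+")
  }
  data
}

set.seed(1)
toy_df <- data.frame(
  x1 = rnorm(12), x2 = rnorm(12), x3 = rnorm(12), x4 = rnorm(12),
  cluster = rep(1:3, each = 4)
)
shift_mat <- matrix(c(
  5, 0, 0, 0,
  0, 5, 0, 0,
  0, 0, 5, 0
), nrow = 3, byrow = TRUE)

moved <- relocate_sketch(toy_df, shift_mat)
# After relocation, each cluster mean equals its row of shift_mat
aggregate(. ~ cluster, data = moved, FUN = mean)
```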
df <- tibble::tibble(
  x1 = rnorm(12),
  x2 = rnorm(12),
  x3 = rnorm(12),
  x4 = rnorm(12),
  cluster = rep(1:3, each = 4)
)
vert_mat <- matrix(c(
  5, 0, 0, 0,
  0, 5, 0, 0,
  0, 0, 5, 0
), nrow = 3, byrow = TRUE)
relocated_df <- relocate_clusters(df, vert_mat)
head(relocated_df)
#> # A tibble: 6 × 5
#> x1 x2 x3 x4 cluster
#> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 0.789 0.637 5.84 0.721 3
#> 2 5.61 -0.770 0.269 -1.34 1
#> 3 0.834 5.09 -0.621 -1.16 2
#> 4 -0.592 -0.00646 4.12 -0.196 3
#> 5 0.717 1.27 5.80 0.427 3
#> 6 -0.967 5.25 1.54 0.274 2
The gen_rotation() function creates a rotation matrix in
high-dimensional space for given planes and angles (in degrees).
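One common way to build such a matrix is to multiply together one plane (Givens) rotation per requested plane. The base R sketch below follows that approach; rotation_sketch() is a hypothetical name, and the order in which the plane rotations are multiplied is an assumption (it only matters when planes share an axis).

```r
# Hypothetical illustration: compose plane rotations into one p x p matrix.
rotation_sketch <- function(p, planes_angles) {
  rot <- diag(p)
  for (pa in planes_angles) {
    i <- pa$plane[1]
    j <- pa$plane[2]
    theta <- pa$angle * pi / 180  # degrees to radians
    g <- diag(p)                  # identity outside the (i, j) plane
    g[i, i] <- cos(theta)
    g[j, j] <- cos(theta)
    g[i, j] <- -sin(theta)
    g[j, i] <- sin(theta)
    rot <- rot %*% g
  }
  rot
}

rot_sketch <- rotation_sketch(
  p = 4,
  planes_angles = list(
    list(plane = c(1, 2), angle = 60),
    list(plane = c(3, 4), angle = 90)
  )
)

# A rotation matrix is orthogonal: t(R) %*% R is the identity
all.equal(crossprod(rot_sketch), diag(4))
```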
rotations_4d <- list(
  list(plane = c(1, 2), angle = 60),
  list(plane = c(3, 4), angle = 90)
)
rot_mat <- gen_rotation(p = 4, planes_angles = rotations_4d)
rot_mat
#> [,1] [,2] [,3] [,4]
#> [1,] 0.5000000 -0.8660254 0.000000e+00 0.000000e+00
#> [2,] 0.8660254 0.5000000 0.000000e+00 0.000000e+00
#> [3,] 0.0000000 0.0000000 6.123234e-17 -1.000000e+00
#> [4,] 0.0000000 0.0000000 1.000000e+00 6.123234e-17
When combining clusters or transforming data geometrically,
magnitudes can differ drastically. The normalize_data()
function rescales the entire dataset to fit within [-1, 1] by dividing
every value by the dataset's maximum absolute value.
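The rescaling itself is a single division, as the base R sketch below shows. normalize_sketch() and the toy data are hypothetical and only illustrate the idea of one global scale factor shared by all columns.

```r
# Hypothetical illustration: divide everything by the global max |value|.
normalize_sketch <- function(data) {
  mat <- as.matrix(data)
  as.data.frame(mat / max(abs(mat)))
}

set.seed(1)
raw <- data.frame(x1 = rnorm(5, sd = 2), x2 = rnorm(5, sd = 10))
scaled <- normalize_sketch(raw)

max(abs(as.matrix(scaled)))  # the largest magnitude is now 1
```

Because a single factor is used for every column, the relative scales of the dimensions are preserved.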
norm_data <- normalize_data(bkg_data)
head(norm_data)
#> x1 x2 x3 x4
#> 1 -0.19824950 0.26597594 0.29315698 -0.275567334
#> 2 0.09111172 0.36718998 0.35115760 -0.179721292
#> 3 0.36664451 -0.24519698 0.27529407 -0.048461095
#> 4 0.71300432 0.28227563 -0.03402328 -0.217801905
#> 5 -0.03189621 -0.08982008 0.12103091 -0.522438279
#> 6 -0.39364143 -0.13163195 0.23376883 -0.001695661
To place clusters in different positions, gen_clustloc()
generates k points in a simplex-like arrangement, so that each
cluster center is as close to equidistant from the others as
possible. In the example below, each column of the result is one
cluster center.
centers <- gen_clustloc(p = 4, k = 5)
head(centers)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.26983778 -0.4294511 -0.17898562 0.8220179 -0.4834190
#> [2,] 0.03526906 0.6685061 -0.26238277 -0.9888410 0.5474486
#> [3,] -0.88656467 1.2648195 -0.20908822 -0.9657047 0.7965381
#> [4,] 2.20580849 -1.0348574 0.02853651 -1.4893315 0.2898438
Two helper functions, gen_nproduct() and
gen_nsum(), generate vectors of positive integers whose
product or sum, respectively, matches a user-specified target
(approximately for the product, exactly for the sum).
The function gen_nsum(n, k) divides a total sum
n into k positive integers. It first assigns
an equal base value to each element and then randomly distributes any
remainder, ensuring the elements sum exactly to n.
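The base-plus-remainder idea can be sketched in base R as follows. nsum_sketch() is a hypothetical name, and handing out the remainder one unit at a time to randomly chosen elements is an assumption about how "randomly distributes" might be realised. The sketch also assumes n >= k so that every element is positive.

```r
# Hypothetical illustration: split n into k positive integers.
nsum_sketch <- function(n, k) {
  parts <- rep(n %/% k, k)           # equal base value for every element
  remainder <- n %% k                # what is left after the equal split
  lucky <- sample.int(k, remainder)  # elements that receive one extra unit
  parts[lucky] <- parts[lucky] + 1
  parts
}

set.seed(7)
sizes <- nsum_sketch(n = 503, k = 4)
sizes
sum(sizes)  # equals 503
```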
The function gen_nproduct(n, p) aims to produce
p positive integers whose product is approximately
n. It starts with all elements equal to the rounded \(p^{th}\) root of n and
iteratively adjusts elements up or down in a randomized manner until the
product is within a small tolerance of n. This accommodates
the fact that exact integer solutions for a given product are often
impossible.
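A minimal base R sketch of this adjust-until-close loop is given below. nproduct_sketch(), the relative tolerance tol, and the iteration cap max_iter are all hypothetical; the package's actual stopping rule and adjustment scheme may differ.

```r
# Hypothetical illustration: p positive integers with product near n.
nproduct_sketch <- function(n, p, tol = 0.1, max_iter = 1000) {
  parts <- rep(max(1, round(n^(1 / p))), p)  # start at the rounded p-th root
  for (iter in seq_len(max_iter)) {
    prod_now <- prod(parts)
    if (abs(prod_now - n) / n <= tol) break  # close enough to the target
    idx <- sample.int(p, 1)                  # pick a random element
    if (prod_now < n) {
      parts[idx] <- parts[idx] + 1           # product too small: step up
    } else if (parts[idx] > 1) {
      parts[idx] <- parts[idx] - 1           # product too large: step down
    }
  }
  parts
}

set.seed(11)
dims <- nproduct_sketch(n = 500, p = 4)
dims
prod(dims)
```

For n = 500 and p = 4 the start value is 5 in every position (product 625), and a single step down in one position gives a product of exactly 500. For other targets the loop may stop at a nearby product, or at the iteration cap without reaching the tolerance.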