Reorder row or column clusters in order of increasing (or decreasing) cluster effects.
The label-switching problem in model-based clustering is that results with the clusters in a different order are mathematically equivalent to each other, and the EM algorithm does not distinguish between them. For example, two row clustering runs on the same data with different starting points may group the observations identically, but the group of observations labelled as cluster 1 in the first result is labelled as cluster 3 in the second result. Similarly, for column clustering, a group of variables can be labelled cluster 4 in the first result and cluster 1 in the second result.
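As a minimal base R sketch of label switching (the label vectors below are made up for illustration and are not output from clustord), two results can describe exactly the same grouping while using different cluster numbers:

```r
# Two hypothetical clusterings of the same six observations.
# The groups are identical; only the cluster labels differ.
labels_run1 <- c(1, 1, 2, 2, 3, 3)
labels_run2 <- c(3, 3, 1, 1, 2, 2)

# Cross-tabulate the two labellings. If every row and every column has
# exactly one non-zero cell, the partitions match up to relabelling.
crosstab <- table(run1 = labels_run1, run2 = labels_run2)
print(crosstab)

same_partition <- all(rowSums(crosstab > 0) == 1) &&
  all(colSums(crosstab > 0) == 1)
print(same_partition)
```

Comparing the label vectors element by element would report a mismatch here, which is why the clusters need a consistent ordering before results are compared.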
Usage
# S3 method for class 'clustord'
reorder(x, type, decreasing = FALSE, ...)
Arguments
- x
Object of class clustord, the output from a clustord run.
- type
Whether to reorder the row cluster effects ("row"), column cluster effects ("column"), or both ("both").
- decreasing
(default FALSE) One- or two-element vector indicating which direction to sort in. If it has one element, all clusters being reordered are sorted in the same direction. The default is increasing order (i.e. decreasing = FALSE, as for the base function sort()). A two-element vector is only permitted when type = "both"; the first direction is used for the row clusters and the second direction is used for the column clusters.
- ...
optional: extra arguments.
Value
An object of class clustord, with all the relevant elements
reordered in order of cluster effects. See clustord for more info
about the contents of clustord objects.
The clustord object will gain an extra field, reordered = TRUE.
Elements of the clustord object that may be reordered (which ones are
reordered depends on whether row clusters, column clusters, or both are
being reordered):
- out_parlist (the final list of estimated parameter values)
- row_cluster_proportions and/or column_cluster_proportions
- row_cluster_probs and/or column_cluster_probs
- out_parvec
- row_cluster_members and row_clusters and/or
column_cluster_members and column_clusters
- EMstatus$params_for_best_lli
- EMstatus$params_every_iteration, if using option
control_EM$keep_all_params
- start.par
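The general idea of reordering several elements consistently can be sketched in base R. This is an illustration under assumed values, not the package's internal code: the vectors below are hypothetical stand-ins for the row cluster effects, row_cluster_proportions and row_clusters, and the same permutation is applied to each of them.

```r
# Hypothetical values, not output from clustord.
rowc <- c(0.2, 0.3, -0.5)         # row cluster effects
proportions <- c(0.5, 0.3, 0.2)   # row cluster proportions
row_clusters <- c(1, 3, 3, 2, 1)  # cluster label assigned to each row

# perm[k] is the old index of the cluster that becomes new cluster k.
perm <- order(rowc, decreasing = FALSE)

# Per-cluster quantities are permuted directly.
rowc_new <- rowc[perm]
proportions_new <- proportions[perm]

# Membership labels are relabelled: old label j becomes the position
# of j within perm.
row_clusters_new <- match(row_clusters, perm)

print(rowc_new)
print(proportions_new)
print(row_clusters_new)
```

After relabelling, each row is still attached to the same cluster effect as before, which is the property that has to hold for every reordered element.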
Details
It is often useful to reorder the clusters so that they appear in order of cluster effect, because this makes any display of the features of those clusters easier to read.
Moreover, if you perform multiple replicate runs of clustord with the
same settings and want to be able to summarise the results, e.g. by providing
the mean estimated parameter values, then you will need to reorder the
cluster results so that in all of the replicate runs the first cluster is the
one with the most negative cluster effect, etc.
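A small base R sketch shows why alignment matters before averaging. The effect vectors are invented for illustration (they are not clustord output), and plain sort() stands in for the reordering step:

```r
# Hypothetical row cluster effects from three replicate runs. The same
# three clusters appear in each run, but under different labels.
replicate_rowc <- list(
  c(0.30, -0.50, 0.20),
  c(-0.48, 0.22, 0.26),
  c(0.18, 0.34, -0.52)
)

# Averaging by label position mixes different clusters together.
naive_mean <- Reduce(`+`, replicate_rowc) / length(replicate_rowc)

# Putting each run into increasing order first lines the clusters up,
# so position 1 is always the most negative effect, and so on.
aligned <- lapply(replicate_rowc, sort)
aligned_mean <- Reduce(`+`, aligned) / length(aligned)

print(naive_mean)
print(aligned_mean)
```

The naive mean is close to zero in every position because the labels disagree between runs, whereas the aligned mean recovers three distinct cluster effects.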
Note that if you order the cluster effects in increasing order, the first one will not necessarily be the smallest in magnitude. If using the default constraint that the cluster effects must sum to zero, the first cluster effect in increasing order will be the most negative and the last will be the most positive.
If you use the argument constraint_sum_zero = FALSE, which uses the
first-element-is-zero constraint for cluster effects, and you sort the
clusters in increasing order (i.e. with the default decreasing = FALSE),
then after reordering the first effect will be 0 and the second will be the
smallest non-zero effect. However, if you use the argument
constraint_sum_zero = FALSE and sort with decreasing = TRUE, then the
first element will still be zero, because the model is fitted with that
first element always set to zero, so it is special and reordering will not
stop it being the first element.
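The two cases can be illustrated with a base R sketch. This is one reading of the behaviour described above, not the package's implementation: the effect values are invented, and reorder_first_zero() is a hypothetical helper that keeps the first element fixed and sorts the rest.

```r
# Sum-to-zero constraint: hypothetical effects that sum to zero.
effects_sum_zero <- c(0.1, -0.4, 0.3)
increasing_sum_zero <- sort(effects_sum_zero)
print(increasing_sum_zero)  # most negative first, most positive last

# First-element-is-zero constraint: hypothetical effects.
effects_first_zero <- c(0, 0.6, -0.2)

# Hypothetical helper (not a clustord function): the zero element stays
# in first position and only the remaining effects are sorted.
reorder_first_zero <- function(effects, decreasing = FALSE) {
  c(effects[1], sort(effects[-1], decreasing = decreasing))
}

increasing_first_zero <- reorder_first_zero(effects_first_zero)
decreasing_first_zero <- reorder_first_zero(effects_first_zero,
                                            decreasing = TRUE)
print(increasing_first_zero)
print(decreasing_first_zero)
```

In both directions the zero stays in first position, so with this constraint the sort direction only affects the second and later clusters.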
Note that this function CANNOT be used if you have used interaction terms
without the main cluster effects e.g. if you included ROWCLUST:x1 in
the formula for clustering but did not include ROWCLUST as another
term (and similarly for COLCLUST).
Examples
set.seed(1)
long_df <- data.frame(Y=factor(sample(1:3,5*20,replace=TRUE)),
ROW=rep(1:10,times=10),COL=rep(1:10,each=10))
results_original <- clustord(Y ~ ROWCLUST + COLCLUST, model="OSM",
RG=3, CG=2, long_df=long_df,
control_EM=list(maxiter=2))
#> EM algorithm has not converged. Please try again, or with a different random seed, or with more starting points.
results_original$out_parlist
#> $mu
#> mu_1 mu_2 mu_3
#> 0.00000000 0.13787191 -0.06928604
#>
#> $phi
#> phi_1 phi_2 phi_3
#> 0.0000000 0.3974401 1.0000000
#>
#> $rowc
#> rowc_1 rowc_2 rowc_3
#> 0.02539696 0.02802425 -0.05342121
#>
#> $colc
#> colc_1 colc_2
#> 0.3173207 -0.3173207
#>
# $mu
# mu_1 mu_2 mu_3
# 0.0000000 0.2053150 0.4107883
#
# $phi
# phi_1 phi_2 phi_3
# 0.0000000 0.6915777 1.0000000
#
# $rowc
# rowc_1 rowc_2 rowc_3
# 0.07756500 0.09247161 -0.17003661
#
# $colc
# colc_1 colc_2
# 0.07130783 -0.07130783
## Run reorder type "row" to reorder based on row cluster effects,
## in increasing order by default
results.reorder <- reorder(results_original, type="row")
results.reorder$out_parlist
#> $mu
#> mu_1 mu_2 mu_3
#> 0.00000000 0.13787191 -0.06928604
#>
#> $phi
#> phi_1 phi_2 phi_3
#> 0.0000000 0.3974401 1.0000000
#>
#> $rowc
#> rowc_3 rowc_1 rowc_2
#> -0.05342121 0.02539696 0.02802425
#>
#> $colc
#> colc_1 colc_2
#> 0.3173207 -0.3173207
#>
## Run reorder type "column" to reorder based on column cluster effects,
## in decreasing order
results.reorder <- reorder(results_original, type="column", decreasing=TRUE)
results.reorder$out_parlist
#> $mu
#> mu_1 mu_2 mu_3
#> 0.00000000 0.13787191 -0.06928604
#>
#> $phi
#> phi_1 phi_2 phi_3
#> 0.0000000 0.3974401 1.0000000
#>
#> $rowc
#> rowc_1 rowc_2 rowc_3
#> 0.02539696 0.02802425 -0.05342121
#>
#> $colc
#> colc_1 colc_2
#> 0.3173207 -0.3173207
#>
## Run reorder type "both" to reorder based on row and column cluster effects,
## with row effects in increasing order and column effects in decreasing
## order
results.reorder <- reorder(results_original, type="both", decreasing=c(FALSE,TRUE))
results.reorder$out_parlist
#> $mu
#> mu_1 mu_2 mu_3
#> 0.00000000 0.13787191 -0.06928604
#>
#> $phi
#> phi_1 phi_2 phi_3
#> 0.0000000 0.3974401 1.0000000
#>
#> $rowc
#> rowc_3 rowc_1 rowc_2
#> -0.05342121 0.02539696 0.02802425
#>
#> $colc
#> colc_1 colc_2
#> 0.3173207 -0.3173207
#>