Comparing K-Means Clustering and Youden’s J Statistic for Determining Y-Balance Test Cut-off Values for Classifying Chronic Ankle Instability in Logistics Workers with a History of Ankle Lateral Sprains

Hwang, Ui-jae; Kim, Jun-hee; Gwak, Gyeong-tae

doi:10.29273/jmst.2024.8.2.74

J Musculoskelet Sci Technol 2024; 8(2):74-83

pISSN: 2635-8573, eISSN: 2635-8581


Research Report


Ui-jae Hwang¹,*, Jun-hee Kim¹, Gyeong-tae Gwak¹


¹Department of Physical Therapy, College of Health Science, Laboratory of KEMA AI Research (KAIR), Yonsei University, Wonju, South Korea

*Corresponding author: Ui-Jae Hwang, Department of Physical Therapy, College of Health Science, Laboratory of KEMA AI Research (KAIR), Yonsei University, Wonju, South Korea; smartkema@yonsei.ac.kr

© Copyright 2024, Academy of KEMA. This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received: Aug 14, 2024; Revised: Sep 03, 2024; Accepted: Sep 03, 2024

Published Online: Dec 31, 2024

ABSTRACT

Background

Chronic ankle instability (CAI) is a common condition among logistics workers (LWs) that can significantly impact workplace productivity. Accurate classification of CAI using the Y-Balance Test (YBT) is crucial for effective management and timely return to work.

Purpose

To compare the effectiveness of K-means clustering and Youden’s J statistic in determining YBT cut-off values for classifying CAI in LWs with a history of ankle sprains.

Study design

Retrospective cohort study

Methods

Data from 121 LWs with a history of ankle sprains were analyzed. YBT measures included anterior, posterolateral, and posteromedial reach distances, and composite scores. Cut-off values were determined using Youden’s J statistic and two K-means clustering approaches (Mean and Top 2). Performance metrics including area under the curve (AUC), sensitivity, specificity, and odds ratios were calculated for each method.

Results

The YBT posteromedial direction distance, using Youden’s method, demonstrated the highest discriminative ability for CAI classification (AUC: 0.62, OR: 5.41, 95% CI: 2.09–14.04). The K-means Top 2 method consistently provided higher cut-offs with improved specificity across all YBT measures, notably achieving 87% specificity for the YBT composite score (cut-off: 96.98%).

Conclusions

While both methods effectively identify CAI risk, the K-means clustering approach, particularly the Top 2 method, offers higher cut-offs with improved specificity. This suggests potential benefits in occupational health settings where stringent screening criteria are necessary for early identification and management of CAI risk in LWs.

Keywords: Chronic ankle instability; K-means clustering; Occupational health; Y-balance test; Youden’s J statistic

Key Points

Question: How do K-means clustering and Youden’s J statistic compare in determining YBT cut-off values for classifying CAI in LWs? Can unsupervised machine learning techniques provide more nuanced, population-specific cut-off values for dynamic balance tests?

Findings: The YBT posteromedial direction distance using Youden’s method showed the highest discriminative ability for CAI classification. The K-means Top 2 method provided higher cut-offs with improved specificity across all YBT measures. Both methods identified CAI risk, but with different sensitivity-specificity trade-offs.

Meaning: K-means clustering, especially the Top 2 method, offers higher cut-offs with improved specificity, which may be beneficial for stringent occupational health screening. This study demonstrates the potential of machine learning in developing population-specific cut-off values for dynamic balance tests.

INTRODUCTION

Chronic ankle instability (CAI) is a multifaceted condition that frequently emerges as a consequence of a lateral ankle sprain. This disorder is defined by ongoing symptoms that persist for over a year after the initial injury, including discomfort, inflammation, reduced self-reported functionality, and recurring episodes or sensations of ankle instability, often accompanied by repeated ankle sprains.^1-3 The occupational demands placed on logistics workers (LWs) make them particularly susceptible to ankle sprains. Their job typically requires traversing an average of 8 kilometers per shift while handling packages of diverse dimensions, weights, and configurations.⁴ In the spectrum of work-related musculoskeletal disorders, ankle sprains rank as the second most prevalent issue.⁵ Moreover, among LWs, the ankle is the most injured body part, accounting for 23% of all injuries.⁶ The risk of ankle injuries in this population is exacerbated by the unpredictable and varied outdoor environments in which deliveries are made, often under uncontrolled conditions.⁷

CAI is a common condition characterized by recurrent ankle sprains and persistent symptoms following an initial ankle sprain.⁸ Individuals with CAI often exhibit impaired postural control and dynamic balance, which are critical components of functional stability and injury prevention.^9-12 The relationship between CAI and deficits in postural control and dynamic balance is well documented, with studies showing that individuals with CAI perform worse than healthy controls on various balance tasks.^13,14 These impairments can significantly affect daily activities and increase the risk of future injury, particularly in occupations that require prolonged standing or frequent movement, such as logistics work.

The Y-Balance Test (YBT) has emerged as a valuable tool for assessing dynamic balance and identifying individuals at risk for lower extremity injuries, including those with CAI.^15,16 The YBT is a modification of the Star Excursion Balance Test and requires participants to maintain single-leg stance while reaching as far as possible with the contralateral leg in three directions: anterior, posteromedial, and posterolateral.¹⁷ The test provides quantitative data on reach distances and allows for the calculation of a composite score, offering clinicians a standardized method to evaluate dynamic balance and functional symmetry. Several studies have investigated the use of cut-off values for the YBT to identify individuals with CAI or those at risk of lower extremity injuries. Plisky et al. (2006) proposed that high school basketball players with an anterior reach asymmetry greater than 4 cm were at 2.5 times greater risk of lower extremity injury.¹⁵ Butler et al. (2013) found that collegiate football players with a composite YBT score less than 89.6% of limb length were at increased risk for non-contact lower extremity injuries.¹⁸ However, the applicability of these cut-off values to different populations, such as LWs, remains uncertain, and there is a need for population-specific cut-off values to improve the clinical utility of the YBT.

In recent years, machine learning techniques have been increasingly applied in sports and rehabilitation sciences to identify patterns and classify data. K-means clustering, an unsupervised learning algorithm, has been used in sports and ergonomics studies to group individuals by performance characteristics, musculoskeletal pain, or injury risk factors.^19-21 While traditional methods such as Youden’s J statistic have been widely used to determine cut-off values, applying K-means clustering for this purpose offers a novel approach.^22-24 By identifying the optimal number of clusters and using the midpoint between the top two clusters as the cut-off value, K-means clustering may provide a more data-driven and nuanced approach to establishing YBT cut-off values.

Ankle sprains result in substantial workplace disruption, with an average of 20 lost workdays per incident.⁵ The situation is further compounded by recurrent sprains associated with CAI, which can lead to even greater productivity losses. Given these significant impacts, there is a pressing need for accurate classification of LWs with and without CAI.²⁵ This classification is crucial not only for effective condition management but also for facilitating timely return to work. However, the demanding nature of the logistics service environment necessitates the use of simple, quick, and reliable assessment tools that can be easily administered in the field. Therefore, research exploring the application of easily implementable tools like the YBT for accurate CAI classification in workplace settings is essential. Such research could provide valuable insights for occupational health professionals and employers in their efforts to mitigate the impact of ankle injuries and CAI on LWs and overall workplace productivity. The purpose of this study is to compare two methods for determining YBT cut-off values in LWs with a history of ankle sprains: (1) a novel approach using K-means clustering, and (2) the traditional Youden’s J statistic method. By exploring the potential of K-means clustering in determining YBT cut-off values, we aim to enhance the accuracy of CAI identification and improve the clinical utility of the YBT for LWs with a history of ankle sprains. This research contributes to the growing body of literature on the application of machine learning techniques in clinical assessment and may provide valuable insights into the development of population-specific cut-off values for dynamic balance tests.

METHODS

Participants

The study used data from musculoskeletal screening tests conducted at a healthcare center for a logistics company between August 2021 and March 2022. These tests were originally performed to prevent industrial accidents among LWs. Because the analysis was retrospective and used pre-existing company data, the Institutional Review Board waived the requirement for informed consent (approval number: 1041849-202301-BM-016-01). From a pool of 289 LWs, 121 individuals with a history of at least one ankle sprain were identified as potential participants. CAI was defined according to specific criteria: a history of lateral ankle sprain causing pain and impaired physical function, followed by at least one episode of perceived instability or ‘giving way’.²⁶ The Ankle Instability Instrument was used to confirm recurrent instability, with a minimum score of 5 points required for classification as CAI.²⁶ LWs with less than 6 months of work experience in logistics service were excluded. Individuals were also excluded if they had undergone lower-extremity surgery within the past 6 months, had been diagnosed with ankle osteoarthritis, or had a history of ankle surgery involving intra-articular fixation.
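The participant-selection rules above can be expressed as two small predicates. This is a hypothetical sketch. The `Worker` record and its field names are illustrative assumptions, because the source does not describe its data schema. The sketch also treats the 5-point Ankle Instability Instrument threshold as part of the CAI definition, which is how the group means in Table 1 read.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical screening record for one logistics worker."""
    months_in_logistics: float
    recent_lower_extremity_surgery: bool   # within the past 6 months
    ankle_osteoarthritis: bool
    intra_articular_ankle_fixation: bool   # history of ankle surgery with fixation
    painful_lateral_sprain: bool           # sprain with pain and impaired function
    giving_way_episodes: int
    aii_score: int                         # Ankle Instability Instrument score


def is_eligible(w: Worker) -> bool:
    """Apply the exclusion criteria (sprain history assumed checked upstream)."""
    return (
        w.months_in_logistics >= 6
        and not w.recent_lower_extremity_surgery
        and not w.ankle_osteoarthritis
        and not w.intra_articular_ankle_fixation
    )


def has_cai(w: Worker, min_aii: int = 5) -> bool:
    """CAI: painful lateral sprain, >= 1 giving-way episode, AII score >= min_aii."""
    return (
        w.painful_lateral_sprain
        and w.giving_way_episodes >= 1
        and w.aii_score >= min_aii
    )


worker = Worker(24, False, False, False, True, 2, 6)
print(is_eligible(worker), has_cai(worker))
```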

Y-balance test

The examiner used the Y-Balance Test Kit.^16,17 The device consists of a central plastic platform with three attached tubes oriented in the anterior, posteromedial, and posterolateral directions. Each tube is marked with a measuring scale in 0.5-cm increments. Participants stood barefoot on the affected leg at the center of the YBT platform, with their hands placed on the iliac wings. They were asked to push the pointer as far as possible with the contralateral (non-stance) lower limb in three directions: anterior, posteromedial, and posterolateral. All YBT attempts were performed in the same order: first anterior, then posterolateral, and finally posteromedial. Participants performed six familiarization trials followed by two recorded trials, maintaining an upright posture with hands on their chest throughout the reaches. The YBT composite score was calculated by summing the maximum reach distances in the three directions, dividing by three times the limb length, and multiplying by 100 to express it as a percentage of limb length.
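The composite-score calculation described above can be sketched in Python, the language used for the study's analysis. The function name, its arguments, and the example values are illustrative assumptions. Limb length is taken as an input because its measurement landmarks are not stated in this section.

```python
def ybt_composite(anterior_trials, posteromedial_trials, posterolateral_trials,
                  limb_length_cm):
    """YBT composite score as a percentage of limb length.

    Each *_trials argument is a sequence of recorded reach distances (cm) in
    one direction; the maximum in each direction is used.
    """
    if limb_length_cm <= 0:
        raise ValueError("limb length must be positive")
    best_total = max(anterior_trials) + max(posteromedial_trials) + max(posterolateral_trials)
    return 100.0 * best_total / (3.0 * limb_length_cm)


# Hypothetical worker: two recorded trials per direction, 85 cm limb length.
score = ybt_composite([68.0, 70.5], [88.0, 90.0], [72.0, 73.5], 85.0)
print(round(score, 2))  # 91.76
```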

Data analysis and statistical methods

The data analysis and statistical procedures were conducted in Python (version 3.8.5) using the scikit-learn (version 0.24.2), scipy (version 1.6.2), and Orange (version 3.28.0) libraries. Our analysis focused on four key YBT variables: anterior direction distance, posterolateral direction distance, posteromedial direction distance, and the composite score. This methodology provides a framework for determining and evaluating cut-off values (Youden’s J statistic and K-means clustering) for the four YBT measures for CAI classification. It focuses specifically on establishing stringent criteria for industrial accident prevention and safe return to work among LWs with a history of ankle sprains. By combining a traditional statistical approach with a machine learning technique tailored to our specific needs, we aim to offer new insights into the classification of CAI using YBT measures, ultimately contributing to improved workplace safety protocols.

1) Data preprocessing

Prior to analysis, the dataset underwent rigorous preprocessing. Missing values in the four YBT variables were handled with mean imputation using scikit-learn’s SimpleImputer class, which replaces each missing value with the mean of the respective variable and so preserves that variable’s mean. These variables were then standardized with z-score normalization (μ=0, σ=1) using StandardScaler. The distribution of each variable was inspected with boxplots. Outliers were then removed using the local outlier factor algorithm (contamination=10%; neighbors=20; metric=Euclidean) because outliers can reduce the accuracy of the learning model.
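The following is a minimal scikit-learn sketch of the preprocessing steps above, under stated assumptions. The column order and the synthetic data are illustrative. The local outlier factor is fitted on the standardized values because the text standardizes first and then removes outliers. This is not the authors' code.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler


def preprocess(X, contamination=0.10, n_neighbors=20):
    """Mean-impute, z-score, then drop local-outlier-factor outliers.

    X: array (n_workers, 4) with columns composite score, anterior,
    posteromedial and posterolateral reach; np.nan marks a missing value.
    Returns the standardized inlier rows, the fitted scaler and the inlier mask.
    """
    X_imputed = SimpleImputer(strategy="mean").fit_transform(X)
    scaler = StandardScaler()
    X_std = scaler.fit_transform(X_imputed)  # each column: mean 0, SD 1
    lof = LocalOutlierFactor(
        n_neighbors=n_neighbors, contamination=contamination, metric="euclidean"
    )
    inlier_mask = lof.fit_predict(X_std) == 1  # fit_predict: 1 = inlier, -1 = outlier
    return X_std[inlier_mask], scaler, inlier_mask


# Illustrative synthetic data (not the study data): 121 workers, 4 YBT variables.
rng = np.random.default_rng(42)
X = rng.normal(loc=[87.0, 69.0, 86.0, 70.0], scale=[14.0, 9.0, 14.0, 14.0], size=(121, 4))
X[5, 2] = np.nan  # one missing posteromedial value
X_clean, scaler, inlier_mask = preprocess(X)
print(X_clean.shape)
```

Because the scaler is fitted before outliers are removed, the retained rows are no longer exactly mean 0 and SD 1. Refitting the scaler on the inliers would be an equally defensible choice.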

2) Determination of cut-off values

To determine cut-off values for each of the four YBT variables, we employed two distinct methods: Youden’s J Statistic and K-means Clustering. The Youden’s J Statistic is a well-established method in clinical research that balances sensitivity and specificity. For each YBT variable independently, we determined the optimal cut-off point by maximizing the Youden’s J statistic (J=Sensitivity+Specificity–1). This involved computing the receiver operating characteristic (ROC) curve for each YBT variable, calculating the Youden’s J statistic for each point on the ROC curve, and identifying the threshold that maximizes the J statistic for each variable. We also computed the area under the ROC curve (AUC) to assess the discriminative ability of each YBT measure. In addition to this traditional approach, we introduced a novel method based on K-means clustering, specifically designed to establish more stringent cut-off values. K-means clustering is an unsupervised machine learning technique that aims to partition n observations into k clusters, where each observation belongs to the cluster with the nearest mean (centroid). In our context, this allows us to identify natural groupings in the YBT data that may correspond to different levels of ankle stability.
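The Youden procedure for one YBT variable can be sketched with scikit-learn's `roc_curve`. Two assumptions are made here and labeled in the code. First, CAI is coded 1. Second, shorter reach indicates CAI, so the reach distance is negated before the ROC curve is built (scikit-learn treats higher scores as more likely positive).

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def youden_cutoff(y_cai, reach):
    """Cut-off maximizing Youden's J for one YBT variable.

    y_cai: 1 = CAI, 0 = no CAI (assumed coding). Lower reach is assumed to
    indicate CAI, so the reach is negated to act as the ROC score.
    Returns (cut-off on the original scale, J, AUC); reach <= cut-off -> CAI.
    """
    y = np.asarray(y_cai, dtype=int)
    score = -np.asarray(reach, dtype=float)
    fpr, tpr, thresholds = roc_curve(y, score)
    j = tpr - fpr  # sensitivity - (1 - specificity) = Se + Sp - 1
    best = int(np.argmax(j))
    cutoff = -thresholds[best]  # undo the negation
    auc = roc_auc_score(y, score)
    return float(cutoff), float(j[best]), float(auc)


# Tiny illustrative example: the three CAI ankles have the shortest reaches.
cutoff, j, auc = youden_cutoff([1, 1, 1, 0, 0, 0], [60, 65, 70, 75, 80, 85])
print(cutoff, j, auc)  # perfectly separated: cut-off 70, J = 1, AUC = 1
```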

For the K-means Clustering method, we first determined the optimal number of clusters (k) using the silhouette score. The silhouette score measures how similar an observation is to its own cluster compared with other clusters and ranges from –1 to 1; a higher score indicates better-defined clusters. We computed this score for k ranging from 2 to 10 to find the optimal number of clusters for our data. Once the optimal k was determined, we applied K-means clustering to the standardized data of all four YBT variables collectively. This approach allows us to consider the interrelationships among the YBT measures, potentially capturing more complex patterns than analyzing each variable independently. After clustering, we computed the centroids (mean values) of each cluster and inverse-transformed them to the original scale. For each YBT variable, the cut-off value was calculated as the mean of all cluster centroids for that specific variable.
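The k-selection and K-means (Mean) cut-off steps might be sketched as follows. The synthetic data, the `random_state` and `n_init` settings, and the helper names are assumptions for illustration. Only the overall procedure follows the text:

- silhouette-based choice of k over 2 to 10,
- joint clustering of the four standardized variables,
- inverse-transformed centroids,
- the per-variable mean of the centroids.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def choose_k(X_std, k_values=range(2, 11), seed=0):
    """Return the k with the highest silhouette score, plus all scores."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_std)
        scores[k] = silhouette_score(X_std, labels)
    return max(scores, key=scores.get), scores


def kmeans_mean_cutoffs(X_std, scaler, k, seed=0):
    """Cluster the standardized variables jointly; cut-off = mean of centroids."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_std)
    centroids = scaler.inverse_transform(km.cluster_centers_)  # back to % and cm
    return centroids.mean(axis=0), centroids, km.labels_


# Synthetic, well-separated groups (not the study data); columns:
# composite %, anterior cm, posteromedial cm, posterolateral cm.
rng = np.random.default_rng(1)
centers = np.array([[65.0, 58.0, 66.0, 50.0],
                    [87.0, 69.0, 87.0, 72.0],
                    [107.0, 83.0, 106.0, 90.0]])
X = np.vstack([rng.normal(c, 3.0, size=(30, 4)) for c in centers])
scaler = StandardScaler().fit(X)
X_std = scaler.transform(X)
best_k, sil = choose_k(X_std)
mean_cutoffs, centroids, labels = kmeans_mean_cutoffs(X_std, scaler, best_k)
print(best_k, np.round(mean_cutoffs, 2))
```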

The K-means Clustering (Top 2) method builds upon the previous approach but focuses on the upper range of the data distribution. Using the same optimal k value and cluster assignments from the K-means (Mean) method, we sorted the cluster centroids in ascending order for each YBT variable independently. The cut-off value for each variable was then calculated as the mean of the top two cluster centroids for that specific variable. This method aims to identify a threshold that separates the higher-performing individuals, which may be more relevant for distinguishing those with CAI.
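The Top 2 rule needs only the inverse-transformed centroids. In this sketch (function names hypothetical), each column is sorted independently, as described. With k=3, the cut-off is therefore the midpoint between the intermediate and the highest centroid for that variable. The Mean rule is included for comparison.

```python
import numpy as np


def top2_cutoffs(centroids):
    """Per variable, average the two largest centroid values.

    centroids: array (k, n_variables) on the original scale, k >= 2.
    Columns are sorted independently, so the top two values of one variable
    may come from different clusters than those of another variable.
    """
    c = np.sort(np.asarray(centroids, dtype=float), axis=0)  # ascending per column
    return c[-2:].mean(axis=0)


def mean_cutoffs(centroids):
    """The K-means (Mean) rule, for comparison: average of all centroids."""
    return np.asarray(centroids, dtype=float).mean(axis=0)


# Hypothetical centroids for k = 3 (low, intermediate, high), two variables.
example = [[60.0, 55.0],
           [80.0, 70.0],
           [100.0, 90.0]]
print(top2_cutoffs(example))  # midpoint of the intermediate and high centroids
print(mean_cutoffs(example))
```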

3) Performance metrics and statistical analysis

For each method and each of the four YBT variables, we calculated a comprehensive set of performance metrics. Sensitivity and specificity were computed from the confusion matrix of predicted and actual CAI status for each YBT variable and method. We used Fisher’s exact test to calculate the odds ratio for each YBT variable and method, as a measure of association between the dichotomized YBT scores and CAI status. The 95% confidence interval (CI) of the odds ratio was estimated from the standard error of the log odds ratio. Because this interval is constructed on the log scale and then exponentiated, it is asymmetric around the odds ratio. It therefore reflects the skewed sampling distribution of the odds ratio in small samples better than a normal approximation on the original scale. The area under the curve (AUC) was calculated for both methods. For Youden’s J statistic method, the AUC was computed directly from the ROC curve of each YBT variable. For the K-means method, which produces binary predictions based on the derived cut-off values, the AUC was calculated from these predictions. To facilitate comparison, we systematically evaluated the performance of each method (Youden’s J statistic and K-means) across all four YBT variables. We generated a results table reporting the cut-off value, AUC, sensitivity, specificity, odds ratio, and 95% CI of the odds ratio for each method and YBT variable. This tabular format allows direct comparison and identification of the most effective cut-off determination method for each YBT measure, with particular attention to the stricter criteria established by the K-means method.
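The per-method metrics can be sketched for one dichotomized YBT variable. The text does not state which side of the cut-off counts as a positive test. This sketch assumes that a reach at or below the cut-off predicts CAI; reversing that convention would change the sensitivity and specificity. The interval uses Woolf's formula for the standard error of the log odds ratio. The AUC line reflects the fact that an ROC curve built from a single binary prediction has area (Se + Sp)/2.

```python
import math

import numpy as np
from scipy.stats import fisher_exact


def cutoff_metrics(y_cai, reach, cutoff):
    """Dichotomize one YBT variable at `cutoff` and compare with CAI status.

    Assumption: reach <= cutoff is a positive (CAI-predicted) test.
    """
    y = np.asarray(y_cai, dtype=int)
    pred = (np.asarray(reach, dtype=float) <= cutoff).astype(int)
    tp = int(np.sum((pred == 1) & (y == 1)))
    fn = int(np.sum((pred == 0) & (y == 1)))
    fp = int(np.sum((pred == 1) & (y == 0)))
    tn = int(np.sum((pred == 0) & (y == 0)))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Fisher's exact test; its odds ratio is the sample value (tp*tn)/(fn*fp).
    odds_ratio, p_value = fisher_exact([[tp, fn], [fp, tn]])
    # 95% CI from the SE of log(OR) (Woolf); undefined when any cell is zero.
    if min(tp, fn, fp, tn) > 0:
        se = math.sqrt(1 / tp + 1 / fn + 1 / fp + 1 / tn)
        log_or = math.log(odds_ratio)
        ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    else:
        ci = (float("nan"), float("nan"))
    # AUC of a single binary prediction reduces to (Se + Sp) / 2.
    auc = (sensitivity + specificity) / 2
    return {"sensitivity": sensitivity, "specificity": specificity,
            "odds_ratio": odds_ratio, "ci95": ci, "p_value": p_value, "auc": auc}


# Illustrative data: 10 CAI and 10 non-CAI ankles, cut-off 80 cm.
y = [1] * 10 + [0] * 10
reach = [70, 72, 74, 76, 78, 79, 85, 86, 87, 88,   # CAI: 6 at or below 80
         75, 77, 82, 83, 84, 85, 86, 87, 88, 89]   # no CAI: 2 at or below 80
m = cutoff_metrics(y, reach, 80)
print(m)  # sensitivity 0.6, specificity 0.8, odds ratio 6.0
```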

RESULTS

Participants characteristics

Table 1 shows the means and standard deviations of all variables. The Kolmogorov-Smirnov test indicated that all independent variables were normally distributed (p>0.05). We evaluated 121 LWs with a history of ankle sprains, with and without CAI, and analyzed their data using machine learning (ML) models. There were no significant differences between LWs with and without CAI in age, work duration, YBT composite score, or YBT anterior direction distance. BMI, Ankle Instability Instrument score, and RCSP were significantly higher in LWs with CAI than in those without CAI. The YBT posteromedial and posterolateral direction distances were significantly lower in LWs with CAI than in those without CAI. All participants in the present study were male. Among LWs with CAI, the involved side was the right in 28.6% (n=16), the left in 26.8% (n=15), and both sides in 44.6% (n=25; the side with the greater Ankle Instability Instrument score: right=8, left=17). Among LWs without CAI, the involved side was the right in 47.7% (n=31), the left in 29.2% (n=19), and both sides in 23.1% (n=15; the side with the greater history of lateral ankle sprain: right=5, left=10).

Table 1. Participants characteristics

Variables	Without CAI (N=65)	With CAI (N=56)	p
Age (yr)	35.68±8.10	37.60±5.90	0.135
Weight (kg)	74.85±12.10	78.01±15.30	0.215
Height (cm)	169.57±5.60	175.08±6.00	0.067
BMI (kg/m²)	24.30±3.61	25.82±4.16	0.035
Work duration (day)	388.92±224.70	401.84±159.90	0.714
Ankle instability instrument	2.80±1.20	6.20±1.30	<0.001
YBT composite score (%)	89.09±13.95	83.98±13.72	0.051
YBT anterior direction distance (cm)	70.69±9.44	67.78±8.57	0.088
YBT posteromedial direction distance (cm)	89.06±14.83	83.35±13.29	0.032
YBT posterolateral direction distance (cm)	73.06±14.36	67.68±12.98	0.038

BMI, body mass index; CAI, chronic ankle instability; YBT, Y-balance test.
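The normality check and the group comparison in Table 1 could be reproduced along these lines. The Kolmogorov-Smirnov test is named in the text, but the between-group test is not. An independent-samples t-test is assumed here, and the data are synthetic.

```python
import numpy as np
from scipy import stats


def ks_normality_p(values):
    """Kolmogorov-Smirnov test of z-scored values against N(0, 1)."""
    v = np.asarray(values, dtype=float)
    z = (v - v.mean()) / v.std(ddof=1)
    return stats.kstest(z, "norm").pvalue


def compare_groups(without_cai, with_cai):
    """Per-group KS p-values and an (assumed) independent t-test p-value."""
    return {
        "ks_p_without": ks_normality_p(without_cai),
        "ks_p_with": ks_normality_p(with_cai),
        "t_test_p": stats.ttest_ind(without_cai, with_cai).pvalue,
    }


# Synthetic posteromedial distances (cm) using Table 1's group sizes, means and SDs.
rng = np.random.default_rng(7)
without_cai = rng.normal(89.06, 14.83, size=65)
with_cai = rng.normal(83.35, 13.29, size=56)
results = compare_groups(without_cai, with_cai)
print(results)
```

Standardizing with the sample mean and SD before a plain Kolmogorov-Smirnov test makes the test conservative. The Lilliefors variant corrects for the estimated parameters.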


K-means clustering

Across k values from 2 to 10, the silhouette score was highest for three clusters (silhouette score=0.534), so the optimal number of clusters for K-means clustering was 3. Table 2 presents the three clusters identified by the K-means algorithm from the YBT anterior, posterolateral, and posteromedial direction distances and the composite score. Among 112 LWs with a history of ankle sprains, Cluster 1 (C1) comprised the majority of participants (N=68) and showed intermediate YBT scores. Cluster 2 (C2, N=20) showed the lowest performance across all YBT measures, whereas Cluster 3 (C3, N=22) showed the highest. The three clusters differed significantly in YBT composite score (F=291.143, p<0.001), YBT anterior direction distance (F=89.268, p<0.001), YBT posteromedial direction distance (F=217.666, p<0.001), and YBT posterolateral direction distance (F=162.864, p<0.001).

Table 2. Comparisons of YBT variables between clusters

Variables	Cluster 1 (N=68)	Cluster 2 (N=20)	Cluster 3 (N=22)	p
YBT composite score (%)	87.37±5.11	64.58±5.81	106.89±6.26	<0.0001
YBT anterior direction distance (cm)	69.41±4.13	57.63±5.53	82.59±8.15	<0.0001
YBT posteromedial direction distance (cm)	86.77±5.95	65.84±8.55	105.97±6.50	<0.0001
YBT posterolateral direction distance (cm)	71.68±6.46	49.76±7.81	89.53±7.50	<0.0001

YBT, Y-balance test.
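The F statistics reported for the cluster comparison are consistent with a one-way ANOVA across the three clusters. The test is not named in the text, so the use of `f_oneway` here is an assumption. The helper `cluster_anova` is hypothetical and is applied to synthetic composite scores loosely matched to Table 2.

```python
import numpy as np
from scipy.stats import f_oneway


def cluster_anova(values, labels):
    """One-way ANOVA of one YBT variable across cluster labels."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    groups = [values[labels == g] for g in np.unique(labels)]
    return f_oneway(*groups)


# Synthetic composite scores (%) with Table 2's cluster sizes, means and SDs.
rng = np.random.default_rng(3)
values = np.concatenate([rng.normal(87.37, 5.11, 68),
                         rng.normal(64.58, 5.81, 20),
                         rng.normal(106.89, 6.26, 22)])
labels = np.repeat([1, 2, 3], [68, 20, 22])
result = cluster_anova(values, labels)
print(result.statistic, result.pvalue)
```

Because the clusters were formed from these same YBT variables, large F values are expected by construction. The test therefore describes how well separated the clusters are rather than providing independent confirmation.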


Cut-off values and performance metrics for three methods

The study compared three methods for determining cut-off values in YBT measures to classify CAI in LWs with a history of ankle sprains: Youden’s J statistic, K-means clustering using the mean of all centroids (K-means Mean), and K-means clustering using the mean of the top two centroids (K-means Top 2). The results for each YBT measure are as follows (Table 3 and Figure 1):

Table 3. Cut-off values and performance metrics for Youden, K-means Mean and Top 2

Variables	Methods	Cut-off	AUC	Sensitivity	Specificity	Odds ratio	(95% CI)
YBT composite score (%)	Youden	85.99	0.61	0.66	0.57	2.62	(1.21–5.68)
	K-means Mean	86.14	0.61	0.64	0.57	2.43	(1.13–5.23)
	K-means Top 2	96.98	0.59	0.30	0.87	2.93	(1.10–7.78)
YBT anterior direction distance (cm)	Youden	73.37	0.57	0.38	0.80	2.35	(1.00–5.52)
	K-means Mean	69.70	0.53	0.54	0.52	1.24	(0.59–2.63)
	K-means Top 2	75.98	0.55	0.27	0.83	1.83	(0.72–4.63)
YBT posterolateral direction distance (cm)	Youden	71.26	0.59	0.57	0.63	2.27	(1.05–4.87)
	K-means Mean	70.02	0.54	0.61	0.46	1.33	(0.62–2.84)
	K-means Top 2	80.56	0.58	0.34	0.81	2.26	(0.94–5.46)
YBT posteromedial direction distance (cm)	Youden	92.86	0.62	0.45	0.87	5.41	(2.09–14.04)
	K-means Mean	85.93	0.58	0.57	0.59	1.94	(0.91–4.14)
	K-means Top 2	96.31	0.58	0.29	0.87	2.69	(1.00–7.18)

YBT, Y-balance test.


Figure 1. Cut-off values for the YBT anterior, posterolateral, and posteromedial direction distances determined with Youden’s J statistic and the two K-means clustering methods.


For the YBT composite score, Youden’s method yielded a cut-off value of 85.99%, with an AUC of 0.61, sensitivity of 0.66, specificity of 0.57, and an odds ratio of 2.62 (95% CI: 1.21–5.68). The K-means Mean method produced similar results with a cut-off of 86.14%, AUC of 0.61, sensitivity of 0.64, and specificity of 0.57 (OR: 2.43, 95% CI: 1.13–5.23). The K-means Top 2 method resulted in a higher cut-off (96.98%) with lower sensitivity (0.30) but higher specificity (0.87) and a comparable odds ratio (2.93, 95% CI: 1.10–7.78).
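The K-means AUCs were computed from binary predictions, as described in the Methods. Each one therefore reduces to the mean of sensitivity (Se) and specificity (Sp), which can be checked against the composite-score rows of Table 3. This is a worked consistency check, not part of the original analysis:

```latex
\begin{aligned}
\mathrm{AUC}_{\text{binary}} &= \frac{\mathrm{Se} + \mathrm{Sp}}{2} \\
\text{K-means Top 2: } \frac{0.30 + 0.87}{2} &= 0.585 \approx 0.59 \\
\text{K-means Mean: } \frac{0.64 + 0.57}{2} &= 0.605 \approx 0.61
\end{aligned}
```

The same identity does not apply to the Youden rows, whose AUC comes from the full ROC curve of the continuous measure.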

For the YBT anterior direction distance (Figure 2), Youden’s method established a cut-off of 73.37 cm, with an AUC of 0.57, sensitivity of 0.38, and specificity of 0.80 (OR: 2.35, 95% CI: 1.00–5.52). The K-means methods performed worse. The K-means Mean method yielded a lower cut-off (69.70 cm). The K-means Top 2 method yielded a higher cut-off (75.98 cm) with lower sensitivity (0.27) and higher specificity (0.83).

Figure 2. Scatter plot of YBT composite score versus anterior direction distance. (A) Calculation of the YBT anterior direction distance cut-off values with the two K-means clustering methods; (B) distribution of LWs with a history of ankle sprains, with and without CAI.


For the YBT posterolateral direction distance (Figure 3), Youden’s method determined a cut-off of 71.26 cm, achieving an AUC of 0.59, sensitivity of 0.57, and specificity of 0.63 (OR: 2.27, 95% CI: 1.05–4.87). The K-means methods showed patterns similar to those for the anterior direction. The K-means Top 2 method provided a higher cut-off (80.56 cm) with lower sensitivity (0.34) and higher specificity (0.81).

Figure 3. Scatter plot of YBT composite score versus posterolateral direction distance. (A) Calculation of the YBT posterolateral direction distance cut-off values with the two K-means clustering methods; (B) distribution of LWs with a history of ankle sprains, with and without CAI.


For the YBT posteromedial direction distance (Figure 4), Youden’s method established a cut-off of 92.86 cm. This yielded the highest AUC (0.62) and odds ratio (5.41, 95% CI: 2.09–14.04) among all measures and methods, with a sensitivity of 0.45 and a high specificity of 0.87. The K-means Mean method resulted in a lower cut-off (85.93 cm) with more balanced sensitivity (0.57) and specificity (0.59). The K-means Top 2 method yielded a higher cut-off (96.31 cm) with lower sensitivity (0.29) but the same high specificity (0.87).
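For a dichotomized test, the odds ratio can be recovered from sensitivity (Se) and specificity (Sp). A CI built from the standard error of the log odds ratio is also symmetric on the log scale. Applying both facts to the posteromedial Youden row of Table 3 gives a consistency check. Small differences reflect rounding of Se and Sp to two decimals:

```latex
\begin{aligned}
\mathrm{OR} &= \frac{TP \cdot TN}{FP \cdot FN}
  = \frac{\mathrm{Se}}{1-\mathrm{Se}} \cdot \frac{\mathrm{Sp}}{1-\mathrm{Sp}}
  = \frac{0.45}{0.55} \cdot \frac{0.87}{0.13}
  \approx 0.818 \times 6.692 \approx 5.48 \\
\ln 5.41 - \ln 2.09 &\approx 1.688 - 0.737 = 0.951 \\
\ln 14.04 - \ln 5.41 &\approx 2.642 - 1.688 = 0.954 \\
\mathrm{SE}(\ln \mathrm{OR}) &\approx 0.95 / 1.96 \approx 0.49
\end{aligned}
```

The two log-scale half-widths agree, as expected for an interval constructed from the standard error of the log odds ratio.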

Figure 4. Scatter plot of YBT composite score versus posteromedial direction distance. (A) Calculation of the YBT posteromedial direction distance cut-off values with the two K-means clustering methods; (B) distribution of LWs with a history of ankle sprains, with and without CAI.


Overall, these results suggest that the YBT posteromedial direction distance, particularly with Youden’s method for cut-off determination, may be the most effective measure for classifying CAI in LWs with a history of ankle sprains. The K-means Top 2 method consistently provided higher cut-offs with higher specificity across all YBT measures. This may be valuable for identifying individuals at higher risk of CAI in occupational settings that require more stringent screening criteria.

DISCUSSION

This study aimed to compare the effectiveness of K-means clustering and Youden’s J statistic in determining YBT cut-off values for classifying CAI in LWs with a history of ankle sprains. Our findings suggest that while both methods can identify those at risk of CAI, they offer different trade-offs between sensitivity and specificity. This research contributes to the growing body of literature on the application of machine learning techniques in clinical assessment and provides valuable insights into the development of population-specific cut-off values for dynamic balance tests.

Comparing our results with previous studies, we found some notable differences in cut-off values. For instance, Butler et al.¹⁸ reported a cut-off value of 89.6% for the YBT composite score for classifying non-contact lower extremity injuries in collegiate American football players. In contrast, our study found a lower cut-off of 85.99% using Youden’s J statistic for classifying CAI in LWs. This lower cut-off may be attributable to differences in the study population, as LWs likely have different physical demands and characteristics from collegiate athletes. Similarly, Plisky et al.¹⁵ suggested a cut-off of 94% to identify high school basketball players at risk of lower extremity injury. The variation in these cut-off values highlights the importance of developing population-specific thresholds, as the demands of different occupations and sports may significantly influence the optimal cut-off for identifying injury risk. Notably, our study extends previous research by employing K-means clustering. This allowed us to consider all three directions of the YBT (anterior, posteromedial, and posterolateral) simultaneously, along with the composite score. Depending on the method used, this approach yielded composite score cut-off values ranging from 85.99% to 96.98%. Including all YBT directions in determining cut-offs provides a more holistic assessment of dynamic balance and may allow more accurate risk classification. This multi-directional approach aligns with the recommendations of Gonell et al.,²⁷ who emphasized the need for comprehensive, sport-specific YBT cut-off values in their study of soccer players for predicting soft tissue injuries.

The discrepancy in AUC values between the K-means clustering and Youden’s J statistic methods may be attributable to fundamental differences in approach and data utilization. Youden’s J statistic is a traditional approach that balances sensitivity and specificity^22,24 and calculates cut-off values for each YBT variable independently. In contrast, K-means clustering is an unsupervised machine learning technique that identifies natural groupings in the data.^20,28 It considers all four YBT variables (anterior, posteromedial, posterolateral, and composite scores) simultaneously to create comprehensive classifications of dynamic balance. This holistic approach allows K-means to capture more complex patterns and interactions among the YBT variables, potentially leading to more nuanced risk classifications. In addition, the AUC for Youden’s method was computed from the full ROC curve of each continuous YBT measure, whereas the AUC for the K-means methods was computed from binary predictions at a single cut-off. The two sets of AUC values are therefore not directly comparable. The K-means approach, particularly the Top 2 method, consistently provided higher cut-offs with improved specificity across all YBT measures. This difference in performance may be due to K-means’ ability to account for the interrelationships between YBT variables, potentially identifying high-risk individuals more accurately at the cost of lower sensitivity. This finding is consistent with recent studies that have explored machine learning techniques for injury prediction in sports medicine and ergonomics, which often benefit from considering multiple variables simultaneously.^20,21,29-31

In an industrial setting, where the prevention of CAI is crucial for maintaining worker productivity and reducing workplace injuries, the use of higher cut-off values as provided by the K-means Top 2 method may be particularly valuable. This approach is especially relevant given the substantial workplace disruption caused by ankle sprains, with an average of 20 lost workdays per incident.⁵ The situation is further exacerbated by recurrent sprains associated with CAI, which can lead to even greater productivity losses. While this approach may result in more false positives, it allows for the identification of a larger proportion of at-risk individuals. This conservative approach could be beneficial in occupational health settings where the cost of missing a potential injury is high, and implementing preventive measures for a larger group is feasible and cost-effective in the long run. The YBT, as a simple, quick, and reliable assessment tool, addresses the need for easily implementable measures in the demanding logistics service environment. Its application for accurate CAI classification in workplace settings is essential, as it can be readily administered in the field. By using the K-means clustering method with the YBT, we provide a more comprehensive approach to risk assessment for thorough screening and preventive strategies.^10,16 This research offers valuable insights for occupational health professionals and employers in their efforts to mitigate the impact of ankle injuries and CAI on LWs and overall workplace productivity. By facilitating early identification of at-risk workers, this approach can contribute to more effective condition management and timely return to work, ultimately reducing the economic burden associated with ankle injuries in the logistics industry.

However, this study has several limitations that should be addressed in future research. First, our sample size (n = 121) was relatively small, which may limit the generalizability of our findings. Second, we examined only LWs, so the results may not apply to other occupations or athletic populations; future studies should investigate larger and more diverse populations to validate these findings.³² Third, we did not consider other potential risk factors for CAI, such as previous injury history or biomechanical factors, which could influence the accuracy of our cut-off values. Longitudinal studies that track individuals over time would provide more robust evidence for the predictive validity of these cut-off values in sporting populations.³³ Lastly, while we compared two methods for determining cut-off values, future research could explore other advanced statistical and machine learning techniques to further improve the accuracy of CAI risk classification.

CONCLUSIONS

The present study compared K-means clustering and Youden’s J statistic in determining YBT cut-off values for classifying CAI in LWs with a history of ankle sprains. Our findings revealed that the YBT posteromedial reach distance, with a cut-off determined by Youden’s method, demonstrated the highest discriminative ability for CAI classification, with an AUC of 0.62 and an odds ratio of 5.41 (95% CI: 2.09–14.04). The K-means Top 2 method consistently provided higher cut-offs with improved specificity across all YBT measures, notably achieving 87% specificity for the YBT composite score (cut-off: 96.98%). These results highlight the potential of machine learning techniques for developing more nuanced, population-specific cut-off values for dynamic balance tests, which could be particularly valuable in occupational health settings where stringent screening criteria are necessary.
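An odds ratio and 95% CI such as the one reported above are commonly computed from a 2x2 table with the Woolf (log) method, sketched below in Python. This is an illustration under assumptions: the 2x2 counts are hypothetical (not the study's data), and whether the authors used this particular CI method is not stated in this section. The helper `implied_log_se` is a hypothetical name for a simple back-calculation from reported limits.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table.

    a: test-positive with CAI      b: test-positive without CAI
    c: test-negative with CAI      d: test-negative without CAI
    Assumes all cells are non-zero (no continuity correction applied).
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


def implied_log_se(lo, hi, z=1.96):
    """SE of log(OR) implied by a CI that is symmetric on the log scale."""
    return (math.log(hi) - math.log(lo)) / (2 * z)


# Limits reported in the Conclusions: OR 5.41 (95% CI 2.09-14.04).
rep_lo, rep_hi = 2.09, 14.04
print("geometric midpoint of limits:", round(math.sqrt(rep_lo * rep_hi), 2))
print("implied SE of log(OR):", round(implied_log_se(rep_lo, rep_hi), 3))

# Hypothetical 2x2 counts (NOT the study's data), for illustration only.
print(odds_ratio_ci(20, 10, 15, 40))
```

For the reported interval, the geometric midpoint of the limits is about 5.42 and the implied standard error of log(OR) is about 0.49, so the limits are symmetric around 5.41 on the log scale up to rounding, which is what a log-scale method produces.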

Conflict of Interest Disclosures:

The authors declare that they have no competing financial interests or personal relationships that may have influenced the work reported in this study.

Funding/Support:

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Acknowledgment:

We would like to thank all LWs in our study for their active participation and cooperation.

Ethics Approval:

The present study was approved by the Institutional Review Board of Yonsei University Mirae Campus (approval no. 1041849-202301-BM-016-01), and the requirement for informed consent was waived.

Author contributions

Conceptualization: UJ Hwang.

Data acquisition: JH Kim, GT Gwak.

Design of the work: UJ Hwang.

Data analysis: GT Gwak, JH Kim.

Project administration: UJ Hwang.

Interpretation of data: GT Gwak.

Writing – original draft: UJ Hwang.

Writing – review & editing: UJ Hwang.

REFERENCES

1. Hertel J, Corbett RO. An updated model of chronic ankle instability. J Athl Train. 2019; 54(6):572-588

2. Thompson C, Schabrun S, Romero R, Bialocerkowski A, van Dieen J, Marshall P. Factors contributing to chronic ankle instability: a systematic review and meta-analysis of systematic reviews. Sports Med. 2018; 48:189-205

3. Doherty C, Bleakley C, Delahunt E, Holden S. Treatment and prevention of acute and recurrent ankle sprain: an overview of systematic reviews with meta-analysis. Br J Sports Med. 2017; 51(2):113-125

4. Martinez-Sykora A, McLeod F, Lamas-Fernandez C, Bektaş T, Cherrett T, Allen J. Optimised solutions to the last-mile delivery problem in London using a combination of walking and driving. Ann Oper Res. 2020; 295:645-693

5. González-Iñigo S, Munuera-Martínez PV, Lafuente-Sotillos G, Castillo-López JM, Ramos-Ortega J, Domínguez-Maldonado G. Ankle sprain as a work-related accident: status of proprioception after 2 weeks. PeerJ. 2017; 5:e4163

6. Bentley TA. Slip, trip and fall accidents occurring during the delivery of mail. Ergonomics. 1998; 41(12):1859-1872

7. Bentley TA, Haslam R. Identification of risk factors and countermeasures for slip, trip and fall accidents during the delivery of mail. Appl Ergon. 2001; 32(2):127-134

8. Hertel J. Functional anatomy, pathomechanics, and pathophysiology of lateral ankle instability. J Athl Train. 2002; 37(4):364

9. Hiller CE, Kilbreath SL, Refshauge KM. Chronic ankle instability: evolution of the model. J Athl Train. 2011; 46(2):133-141

10. Gribble PA, Hertel J, Plisky P. Using the Star Excursion Balance Test to assess dynamic postural-control deficits and outcomes in lower extremity injury: a literature and systematic review. J Athl Train. 2012; 47(3):339-357

11. Ko J, Wikstrom E, Li Y, Weber M, Brown CN. Performance differences between the modified star excursion balance test and the Y-balance test in individuals with chronic ankle instability. J Sport Rehabil. 2019; 29(6):748-753

12. Payne S, McCabe M, Pulliam J. The effect of Chronic Ankle Instability (CAI) on Y-balance scores in soccer athletes. J Sports Med Allied Health Sci. 2016; 2:1-9

13. Wikstrom EA, Naik S, Lodha N, Cauraugh JH. Balance capabilities after lateral ankle trauma and intervention: a meta-analysis. Database of Abstracts of Reviews of Effects (DARE): Quality-assessed Reviews [Internet]. Centre for Reviews and Dissemination (UK). 2009

14. Wikstrom EA, Tillman MD, Chmielewski TL, Borsa PA. Measurement and evaluation of dynamic joint stability of the knee and ankle after injury. Sports Med. 2006; 36:393-410

15. Plisky PJ, Rauh MJ, Kaminski TW, Underwood FB. Star excursion balance test as a predictor of lower extremity injury in high school basketball players. J Orthop Sports Phys Ther. 2006; 36(12):911-919

16. Plisky PJ, Gorman PP, Butler RJ, Kiesel KB, Underwood FB, Elkins B. The reliability of an instrumented device for measuring components of the star excursion balance test. N Am J Sports Phys Ther. 2009; 4(2):92

17. Shaffer SW, Teyhen DS, Lorenson CL, et al. Y-balance test: a reliability study involving multiple raters. Mil Med. 2013; 178(11):1264-1270

18. Butler RJ, Lehr ME, Fink ML, Kiesel KB, Plisky PJ. Dynamic balance performance and noncontact lower extremity injury in college football players: an initial study. Sports Health. 2013; 5(5):417-422

19. Bańkosz Z, Winiarski S. Using wearable inertial sensors to estimate kinematic parameters and variability in the table tennis topspin forehand stroke. Appl Bionics Biomech. 2020; 2020:8413948

20. Miao D, Wang W, Lv Y, Liu L, Yao K, Sui X. Research on the classification and control of human factor characteristics of coal mine accidents based on K-Means clustering analysis. Int J Ind Ergon. 2023; 97:103481

21. Andersen LL, Vinstrup J, Sundstrup E, Skovlund SV, Villadsen E, Thorsen SV. Combined ergonomic exposures and development of musculoskeletal pain in the general working population: a prospective cohort study. Scand J Work Environ Health. 2021; 47(4):287

22. Youden WJ. Index for rating diagnostic tests. Cancer. 1950; 3(1):32-35

23. Perkins NJ, Schisterman EF. The Youden Index and the optimal cut‐point corrected for measurement error. Biom J. 2005; 47(4):428-441

24. Schisterman EF, Perkins N. Confidence intervals for the Youden index and corresponding optimal cut-point. Commun Stat Simul Comput. 2007; 36(3):549-563

25. Hwang U-j, Kwon O-y, Kim J-h, Gwak G-t. Classification of chronic ankle instability using machine learning technique based on ankle kinematics during heel rise in delivery workers. Digit Health. 2024; 10:20552076241235116

26. Grindstaff TL, Dolan N, Morton SK. Ankle dorsiflexion range of motion influences Lateral Step Down Test scores in individuals with chronic ankle instability. Phys Ther Sport. 2017; 23:75-81

27. Gonell AC, Romero JAP, Soler LM. Relationship between the Y balance test scores and soft tissue injury incidence in a soccer team. Int J Sports Phys Ther. 2015; 10(7):955

28. Bhargavi M, Gowda SD. A novel validity index with dynamic cut-off for determining true clusters. Pattern Recognit. 2015; 48(11):3673-3687

29. Bahr R, Krosshaug T. Understanding injury mechanisms: a key component of preventing injuries in sport. Br J Sports Med. 2005; 39(6):324-329

30. Chan VC, Ross GB, Clouthier AL, Fischer SL, Graham RB. The role of machine learning in the primary prevention of work-related musculoskeletal disorders: a scoping review. Appl Ergon. 2022; 98:103574

31. Rossi A, Pappalardo L, Cintia P, Iaia FM, Fernández J, Medina D. Effective injury forecasting in soccer with GPS training data and machine learning. PLoS One. 2018; 13(7):e0201264

32. O’Connor S, McCaffrey N, Whyte E, Moran K. Epidemiology of injury in male collegiate Gaelic footballers in one season. Scand J Med Sci Sports. 2017; 27(10):1136-1142

33. Attenborough AS, Hiller CE, Smith RM, Stuelcken M, Greene A, Sinclair PJ. Chronic ankle instability in sporting populations. Sports Med. 2014; 44:1545-1556


Selected as a National Research Foundation of Korea accredited journal