# Introduction

Patient-Reported Outcomes (PROs) have emerged as useful tools for measuring medical conditions and have proven particularly useful in musculoskeletal disease clinics. 1 These well-structured questionnaires are completed by patients to reflect their own perspective. 2,3 Hip pain is a prevalent complaint for which both the patient and the clinician could benefit from using a PRO to monitor the condition and decide on a management approach. [4][5] The Harris Hip Score (HHS) is a widely used tool that combines the clinician's input with patient-reported symptoms to give a better clinical picture of the hip pathology at hand and to evaluate treatment options. 6 The questionnaire itself, however, is in English, so healthcare services in Arabic-speaking countries cannot readily use it; hence the need for a cross-cultural adaptation of the score. This study aims to assess the validity and reliability of the Arabic version of the score.

# II.

# Methods and Materials

# a) Translation

The translation followed Guillemin's guidelines for validation and cross-cultural adaptation 9 after permission was obtained from the original HHS copyright holder. Two bilingual orthopedic surgeons were responsible for the conceptual and literary translation of the original version. Two other versions were produced by independent translation companies with a background in scientific English. All the versions produced were similar. Modifications drawn from all the versions were incorporated into the final version, which was then reviewed by a professional Arabic grammar checker. The back-translation came close to the original score. After the translation committee approved the Arabic version, a pilot test was conducted on ten random patients from the arthroplasty clinic. Both physicians interviewed the patients after they completed the questionnaire to address any issues or need for assistance.
# b) Participants

One hundred ten patients completed the Harris Hip Score questionnaire and agreed to have their data analyzed for research purposes. The average age of the participants was 44.3 years, with a standard deviation of 15.4 years, implying that the majority of the sample was between 30 and 60 years of age. The youngest participant was 16 and the oldest was 76 years of age.

# c) Psychometric Properties and Data Analysis

IBM SPSS Statistics 21 was used for all analyses. To estimate the reliability of the questionnaire, we calculated Cronbach's alpha; since every patient completed the survey on three different occasions, Cronbach's alpha was calculated for each of the three test situations. We also used the ICC (intraclass correlation coefficient) to assess test-retest reliability. Content validity was tested by examining the shape of the data distribution, as well as floor and ceiling effects. The floor effect is the percentage of patients with the lowest possible score (score of 0), and the ceiling effect is the percentage of those with the highest possible score (score of 100). If more than 30% of the respondents show a floor or ceiling effect, the effect is considered relevant. To test the convergent validity of the HHS, we calculated Spearman's correlation coefficient between HHS and WOMAC. Since WOMAC has already been validated in Arabic-speaking countries, a strong correlation would support the convergent validity of the HHS. It is worth noting, however, that a higher WOMAC score indicates greater disability, while patients with lower disability have a higher HHS score. This means that, for the HHS to be validated, we expect a negative correlation between the WOMAC and HHS scores.

# d) Questionnaires

# Harris Hip Score

The HHS usually contains 12 questions covering four domains: pain, function, deformity, and range of motion.
The questions are answered using a Likert scale, with the final score having a maximum of 100 points (best possible outcome) and a minimum of 0 points (extreme symptoms). The 100 points are divided among the subdomains: pain receives 44 points, function 47 points, range of motion 5 points, and deformity 4 points; function is split into activities of daily living (14 points) and gait (33 points). A total HHS of <70 points is considered a poor result, 70 to 80 fair, 80 to 90 good, and 90 to 100 excellent (Nilsdotter and Bremander, 2011). For this study, a modified HHS (with the deformity and range of motion subdomains removed) was used. Hence, the possible range for this instrument is not from 0 to 100 but from 0 to 91, and the ceiling effect was documented for those patients who scored 91 points. All 110 patients completed the HHS on at least two different occasions (T1 and T2), and 109 of them completed it a third time (T3). There were two and a half weeks between each of these three occasions.

# e) Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) 8

The WOMAC consists of 24 Likert-type items, and every patient receives three scores from three different subscales. The pain subscale has five questions (score range 0-20), the stiffness subscale has two questions (score range 0-8), and the physical function subscale has 17 questions (score range 0-68). A score of 0 on a subscale means that the patient has not felt any discomfort in his/her hip; a higher score suggests greater disability. The survey was administered on two different occasions, two weeks apart.

# III.

# Results

# a) WOMAC questionnaire

WOMAC has been validated in Arabic-speaking countries and has since been employed in clinical practice. Nevertheless, we did additional analyses to explore the psychometric characteristics of the WOMAC questionnaire used in this study.
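Cronbach's alpha, the internal-consistency statistic reported for both questionnaires, can be sketched in a few lines. This is a minimal illustration of the standard formula, alpha = k/(k-1) × (1 − Σ item variances / variance of total scores), and not the SPSS procedure the study ran; the function name `cronbach_alpha` and the demo responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical Likert responses (5 respondents x 4 items), for illustration only
demo = [
    [4, 3, 4, 4],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [1, 1, 2, 1],
    [4, 4, 4, 3],
]
alpha = cronbach_alpha(demo)
print(f"Cronbach's alpha: {alpha:.3f}")
```

Values close to 1 indicate that the items move together; perfectly parallel items give exactly 1.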
Test reliability for the first testing situation, calculated using Cronbach's alpha, was 0.98 for the pain subscale, 0.98 for the stiffness subscale, and 0.99 for the physical function subscale. For the second testing, reliability was 0.99, 0.97, and 0.99 (pain, stiffness, and physical function, respectively). This indicates that WOMAC is a reliable instrument. To check content validity, we examined floor and ceiling effects. A floor effect was recorded for 10% of the patients on the pain subscale, 14% on the stiffness subscale, and 12% on the physical function subscale. A ceiling effect was recorded for 3% of the patients on each of the pain, stiffness, and physical function subscales. Since these percentages are far below the 30% threshold considered relevant, this is an argument in favor of the content validity of WOMAC.

# Harris Hip Score

To test the reliability of the instrument, we calculated Cronbach's alpha. For each of the three testing occasions the reliability was very good or excellent: α1 = 0.92, α2 = 0.91, and α3 = 0.90. The intraclass correlation coefficient was good, at 0.76 (95% CI 0.44-0.88). In the first week of testing, we recorded a floor effect for 1% of the patients and a ceiling effect for 2%. Two and a half weeks later, 1% of respondents again showed the floor effect, and no ceiling effect was recorded, consistent with Table 1. On the third testing, 1% showed the floor effect, and again no ceiling effect was documented. We checked whether the data deviated significantly from the normal distribution using the Shapiro-Wilk test. The result showed that they did on all three testing occasions. We assessed the 2-week test-retest reliability of the HHS. Of the 110 patients who completed the questionnaire, 108 responded to the second assessment after the initial evaluation. Test-retest reliability was assessed using the intraclass correlation coefficient (ICC).
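As an illustration of how a test-retest ICC can be obtained from repeated scores, the sketch below computes the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), from ANOVA mean squares. The source does not state which ICC model was selected in SPSS, so this choice is an assumption; the function name `icc_2_1` and the demo scores are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table with one row per patient and one column per occasion.

    Assumed model: two-way random effects, absolute agreement, single measure.
    """
    n = len(scores)        # number of patients
    k = len(scores[0])     # number of testing occasions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for patients, occasions, total, and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical scores for 5 patients at T1, T2, T3 (illustration only)
demo_scores = [
    [72, 68, 60],
    [55, 50, 47],
    [88, 85, 80],
    [40, 42, 35],
    [65, 60, 58],
]
print(f"ICC(2,1): {icc_2_1(demo_scores):.3f}")
```

Because this form penalizes systematic shifts between occasions, a sample whose scores drift downward over time will show a lower ICC than a consistency-type coefficient would.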
The results (Table 2) indicated that the HHS has an acceptable intraclass correlation of 0.755 (95% CI 0.442, 0.876). Given the Cronbach's alpha value of 0.902 (95% CI 0.704-0.955), the internal consistency of the three assessments was shown to be very high. To compare the results of the WOMAC questionnaire with those from the HHS, it was necessary to standardize the WOMAC scores to the range of 0-100. The HHS scores, which were in the range of 0-91, were likewise rescaled to 0-100 to match the WOMAC scores. Figure 1 illustrates the change and the mean level of the different subscales across the assessments, which were conducted two weeks apart. It is visually evident that the mean HHS score decreased, which corresponds to more pain and symptoms. At the same time, the mean WOMAC score shows an upward trend, which also corresponds to more pain and, in general, a worsened condition of the patient. This illustrates a visual agreement between the two questionnaires.

# b) Responsiveness

Fourteen patients (13.1%) reported overall relevant improvement in their condition on the WOMAC questionnaire, while 53 patients (49.5%) reported worsening of their condition, and 40 participants (37.4%) remained stable. On the HHS questionnaire, by contrast, only eight patients (7.3%) remained stable. The majority (86.4%) reported that their condition had deteriorated, and only 6.4% reported relevant improvement after 2 weeks. It is also worth noting that twelve patients (11.2%) showed contradictory results (one patient improved according to HHS and worsened according to WOMAC, while eleven patients showed the opposite). Thirty-three patients (30%) had worsened according to HHS, while according to WOMAC their condition had not changed (Table 3). Effect sizes are often used to give meaning to change over time in terms of 'trivial' (ES < 0.20), 'small' (0.20 ≤ ES < 0.50), 'moderate' (0.50 ≤ ES < 0.80) or 'large' (ES ≥ 0.80) change. Cohen introduced this 'matched pairs' effect size, which was later renamed the standardized response mean (SRM) by Liang et al. 20 According to the responsiveness test, the WOMAC subscales show similar responsiveness (SRM = 0.41) between the first and second measurements. In comparison to WOMAC, HHS showed better responsiveness, with SRM = 0.46. It is important to note, however, that the responsiveness of the two questionnaires is very similar and the differences are not considerable.

# c) Level of Agreement between WOMAC & HHS

One of the best methods to measure the level of agreement between two measurement methods is the Bland-Altman plot. In this method, the difference between WOMAC and HHS is plotted as a function of the mean of WOMAC and HHS. As shown in the graphs, the overall mean difference between WOMAC and HHS suggests that there could be a systematic bias between the two questionnaires (M = -7.49, 95% CI -13.59 to -1.41, p = 0.016). To test this result, linear regression was performed with the difference between WOMAC and HHS as the dependent variable and the mean of WOMAC and HHS as the independent variable. The result of the linear regression also indicates a statistically significant difference between the two measurement methods (β = -0.94, 95% CI -1.801 to -0.081, t = -2.168, p < 0.05). The first and last measurements of both methods were also compared with the help of the Bland-Altman plot, to investigate whether the systematic bias between the two methods changes over time. The results indicate that in the first measurement there is a systematic bias between the two methods (M = -18.9, 95% CI -25.13 to -12.65, p < 0.001); the linear regression also confirms this bias (β = -0.95, 95% CI -1.81 to -0.104, t = -2.235, p = 0.028). This means that HHS increasingly overestimates the worsened conditions in comparison to WOMAC.
However, in the last measurement, the slope of the regression line decreased and became statistically insignificant (β = -0.58, 95% CI -1.38 to 0.23, t = -1.429, p = 0.156).

# IV.

# Discussion

The primary objective of this study was to create a reliable and valid Arabic version of the HHS by translation and adaptation. For this purpose, the Arabic version of the HHS was compared with the results of the WOMAC questionnaire. Preliminary validity and reliability tests revealed a moderate inverse correlation between the WOMAC subscales and HHS, which indicates that they are related in the right direction, since their scores run in opposite directions (0 for WOMAC = no pain / 0 for HHS = extreme pain). However, according to Altman and Bland's views on the correct analysis of data gathered in studies of this type, it is not enough to use the correlation coefficient between the two measurements as a measure of agreement 18. They pointed out that methods can correlate well yet disagree greatly, as would occur if one method read consistently higher than the other. For this reason, the Bland-Altman plot was used to measure the level of agreement between WOMAC and HHS. The Bland-Altman plots indicated that there is a systematic bias between WOMAC and HHS, and the linear regression illustrated that, with an increasing mean score, the Arabic version of the HHS tends to underestimate the results of WOMAC.

According to McGrory et al. 19, differences in scores between hips were highly correlated for HHS and WOMAC total scores, HHS pain and WOMAC pain subscores, and HHS function and WOMAC physical function subscores. However, they found that WOMAC stiffness and HHS range of motion were not significantly correlated. Overall, they concluded that patients with bilateral hip arthroplasty can apply the WOMAC osteoarthritis index questions to individual hips at the same time as effectively as the joint-specific HHS questions.
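The agreement analysis discussed above (a mean difference, its limits of agreement, and a regression of the differences on the pairwise means) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' SPSS workflow: the function names `rescale` and `bland_altman` and the demo scores are hypothetical, both score lists are assumed to be already on a 0-100 scale, and the source does not say whether one scale was inverted before differencing, so no inversion is applied here.

```python
from statistics import mean, stdev


def rescale(raw, raw_max):
    """Map a raw score on 0..raw_max onto 0..100 (e.g. modified HHS, raw_max=91)."""
    return raw / raw_max * 100


def bland_altman(a, b):
    """Bland-Altman summary for paired scores a and b (same patients, same order)."""
    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2 for x, y in zip(a, b)]
    bias = mean(diffs)                     # mean difference (systematic bias)
    sd = stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)  # 95% limits of agreement

    # Ordinary least-squares slope of the differences on the pairwise means;
    # a slope far from zero suggests the bias changes with score level.
    mx = mean(means)
    sxx = sum((m - mx) ** 2 for m in means)
    slope = sum((m - mx) * (d - bias) for m, d in zip(means, diffs)) / sxx
    intercept = bias - slope * mx
    return {"bias": bias, "limits": limits, "slope": slope, "intercept": intercept}


# Hypothetical paired scores on a 0-100 scale (illustration only)
womac = [55.0, 62.5, 48.0, 70.0, 66.0, 51.5]
hhs = [rescale(x, 91) for x in [60, 58, 66, 50, 55, 63]]
result = bland_altman(womac, hhs)
print(result)
```

A confidence interval and p-value for the bias or the slope would need a t-distribution, which is left out to keep the sketch within the standard library.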
The forest plots and effect sizes showed that HHS scores were generally higher than WOMAC scores. In general, the results of both methods point the surgeon in the right direction regarding the overall condition of the patient, especially regarding improvement or deterioration. It is important, however, to be cautious in using HHS when the magnitude of change in the patient's condition is investigated, since HHS may overestimate the level of improvement.

The major outcome of this study is that the Arabic version of the HHS demonstrated high levels of validity and reliability in evaluating patient-reported outcomes of Arabic-speaking patients with a range of hip pathologies. The patients did not encounter any difficulty in completing the questionnaire. An evaluation of the internal consistency showed that Cronbach's α coefficient for the Arabic version of the HHS was within the recommended range of values 10, the implication being that the questionnaire items were nonredundant as well as homogeneous. The Arabic version of the HHS appears to have good test-retest reliability (ICC, 0.755) when compared with data reported in previous literature 11. Hinman et al. reported a test-retest reliability with an ICC of 0.76, which corresponds with ours 12.

The time interval between repeat measurements is a vital issue when determining test-retest reliability. According to the literature, the retest intervals used to estimate HHS test-retest reliability range from 7-14 days to three weeks to a month 11,12. If patients are given short retest intervals, there is a risk of them becoming overfamiliar with the questions, so that their answers depend on how well they recall the answers from the first assessment. Although this possibility is decreased by longer intervals, one may then observe a spontaneous improvement of acute complaints.
Generally, there should be a very short period between repeat administrations of patient-reported outcome measures when the condition being measured is expected to change rapidly. The test was repeated seven days after the initial assessment. Hinman et al. used a retest interval of about 7.5 days for hip patients (7-14 days), which corresponds with our study 12.

Celik et al. 21 sought to translate and culturally adapt the HHS into Turkish and to determine the reliability and validity of the translated version. They translated the HHS into Turkish following the stages recommended by Beaton and tested 80 patients with it. The Turkish version of the HHS showed sufficient internal consistency (Cronbach's alpha, 0.70) and test-retest reliability (ICC = 0.91), compared to the Arabic version, which had a test-retest reliability of 0.755 11. The Turkish study's correlation coefficients between the HHS and the WOMAC and OHS were 0.89 and 0.64, respectively 21. The highest correlations between the HHS and SF-36 were with the physical function scale (r = 0.72), and the lowest correlations were with the mental function scale (r = 0.10). Celik et al. observed no floor or ceiling effects.

The literature has reported several validity tests. Recent studies have sought to investigate the validity of the HHS by determining its relationship with other patient-reported outcome measures, such as the Total Functional Score 13, the WOMAC 11,14, and the Nonarthritic Hip Score 15. Our study provided evidence for construct validity by establishing the relationship between the Arabic version and the WOMAC. The Arabic version of the HHS and the WOMAC had very good construct validity (r = 0.67), which corresponded with previously documented data 12,16. Evidence for discriminant validity and convergent validity was provided. We determined what relationships existed between the HHS and the eight scale scores and 2 summary scores of the SF-36.
As expected, the HHS had a stronger relationship with concurrent measures of physical function than with concurrent measures of mental function. We found the lowest correlation value between the HHS and the mental domains of the SF-36 (r = 0.014). This demonstrates that the SF-36 measures additional aspects of physical health and provides more comprehensive, but less specific, information about a patient's overall health than do condition-specific questionnaires.

# V.

# Conclusion

The primary purpose of this study was to create a reliable and valid Arabic version of the HHS by translation and adaptation. Its reliability, calculated through both Cronbach's alpha and the ICC, was moderate to good. Although the distributions for all subscales deviate from a normal one, no significant ceiling or floor effects were observed. The Arabic version of the HHS is short and easily administered and interpreted, with minimal investment of time required from both the researcher and the clinician. We believe that the Arabic version of the HHS is sufficient to evaluate the state of a hip disease. Its levels of reliability and validity are acceptable, and we believe that it will facilitate the assessment of functional limitations and symptoms experienced by Arabic-speaking individuals with a variety of hip disorders. Further studies are needed to assess the responsiveness and to determine the minimum clinically relevant differences in the Arabic version of the HHS for common hip pathologies.

![Figure 1: The mean score and the absolute difference along with their standard deviations during 3 different assessments for HHS and two assessments for the WOMAC questionnaire. Decrease of the mean score in HHS & increase of mean score in WOMAC = worsened condition](image-2.png "Figure 1 :")

![Figure 2: Forest Plot of Effect Sizes and SRMs for the WOMAC subscales and HHS. Bars represent the 95% confidence intervals](image-3.png "Figure 2 :")

![Figure 3: Bland-Altman Plot to demonstrate the level of agreement between HHS and WOMAC (First, last, and average assessments). The linear regression line is also drawn to better demonstrate the systematic bias between the two methods](image-4.png "Figure 3 :")

Table 1

| Questionnaire | Occasion | N | Min | Max | Mean | SD | Sk | Ku | Floor effect | Ceiling effect |
|---|---|---|---|---|---|---|---|---|---|---|
| HHS | Week 1 | 110 | 0 | 91 | 66.0 | 17.613 | -1.232 | 1.494 | 1% | 2% |
| HHS | Week 2 | 110 | 0 | 87 | 61.1 | 17.841 | -1.024 | .692 | 1% | 0% |
| HHS | Week 3 | 108 | 0 | 85 | 52.6 | 18.563 | -.565 | -.015 | 1% | 0% |

Note: N, sample size; Min, minimum; Max, maximum; SD, standard deviation; Sk, skewness; Ku, kurtosis.

Table 2

| Questionnaire | Subscale | First assessment, Mean | First assessment, SD | Second assessment, Mean | Second assessment, SD | Third assessment, Mean | Third assessment, SD | Change* | ICC (95% CI) | Cronbach's alpha (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|
| WOMAC | Pain | 53.22 | 15.90 | 63.17 | 18.85 | | | 9.95 | 0.581 (0.234-0.760) | 0.735 (0.379-0.864) |
| WOMAC | Stiffness | 53.38 | 16.87 | 63.55 | 18.50 | | | 10.17 | 0.593 (0.230-0.772) | 0.745 (0.375-0.872) |
| WOMAC | Physical Function | 53.31 | 16.39 | 62.91 | 18.60 | | | 9.60 | 0.623 (0.262-0.793) | 0.768 (0.416-0.884) |
| HHS | | 72.55 | 19.35 | 67.12 | 19.61 | 57.81 | 20.40 | -14.74 | 0.755 (0.442-0.876) | 0.902 (0.704-0.955) |

* A minus sign for HHS means that the condition of the patient has worsened over time (lower score = deterioration). A plus sign for WOMAC means that the condition of the patient has worsened over time (higher score = deterioration).

As illustrated in the table below, there are medium to large negative correlations between the Harris Hip Score on one side and all the subscales of the WOMAC questionnaire on the other. This shows that patients with high scores on WOMAC have low scores on HHS. It therefore means that those who experience more severe hip pain have higher scores on WOMAC and lower HHS scores.

Correlations between HHS and the WOMAC subscales (Pain, Stiffness, Physical function). Note: ** Correlation is significant at the 0.01 level (2-tailed).
Table 3

| WOMAC \ Harris Hip Score (HHS) | Stable | Improvement | Deterioration | Total |
|---|---|---|---|---|
| Stable | 3.7% | 2.8% | 30.8% | 37.4% |
| Improvement | 0.0% | 2.8% | 10.3% | 13.1% |
| Deterioration | 3.7% | 0.9% | 44.9% | 49.5% |
| Total | 7.5% | 6.5% | 86.0% | 100.0% |

Table 4

| Questionnaire | Subscale | Effect Size (Cohen's d) | 95% CI* | SRM | 95% CI* |
|---|---|---|---|---|---|
| WOMAC | Pain | 0.571 | 0.387, 0.751 | 0.406 | 0.358, 0.434 |
| WOMAC | Stiffness | 0.574 | 0.395, 0.749 | 0.411 | 0.366, 0.436 |
| WOMAC | Physical Function | 0.547 | 0.378, 0.709 | 0.410 | 0.363, 0.434 |
| HHS | | 0.729 | 0.537, 0.891 | 0.456 | 0.441, 0.467 |

* Bootstrap confidence interval (1000 iterations; random number seed: 978).

© 2020 Global Journals

## Acknowledgment

Not applicable

## Declarations

Ethical approval and consent for publication

Consent to participate and to publish was obtained in written form from all participants.

## Availability of data and material

The data that support the findings of this study are available from [Ministry of Health, Al-Razi Hospital, Kuwait], but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of [Ministry of Health, Al-Razi Hospital, Kuwait].

## Competing interests

The authors declare that they have no competing interests.

## Funding

No funding was received for this study.

## Authors' Contributions

The data collection and the writing were done equally by all four authors.
* Sørensen N, Hammeken L, Thomsen J, Ehlers L. Implementing patient-reported outcomes in clinical decision-making within the knee and hip osteoarthritis: an explorative review. BMC Musculoskeletal Disorders 20(1), 2019.
* Siljander MP, McQuivey KS, Fahs AM, Galasso LA, Serdahely KJ, Karadsheh MS. Current Trends in Patient-Reported Outcome Measures in Total Joint Arthroplasty: A Study of 4 Major Orthopaedic Journals. The Journal of Arthroplasty, 2018.
* Noble PC, Dwyer M, Brekke A. Commonalities, differences, and challenges with patient-derived outcome measurement tools: function/activity scales. Clin Orthop Relat Res 471, 2013.
* Pynsent PB. Choosing an outcome measure. J Bone Joint Surg Br 83(6), 2001.
* Puig-Junoy J, Ruiz Zamora A. Socio-economic costs of osteoarthritis: a systematic review of cost-of-illness studies. Semin Arthritis Rheum 44, 2015.
* Harris WH. Traumatic arthritis of the hip after dislocation and acetabular fractures: treatment by mold arthroplasty. An end-result study using a new method of result evaluation. J Bone Joint Surg Am 51(4), 1969.
* Klassbo M, Larsson E, Mannevik E. Hip disability and osteoarthritis outcome score. An extension of the Western Ontario and McMaster Universities Osteoarthritis Index. Scand J Rheumatol 32, 2003.
* Nilsdotter A, Bremander A. Measures of hip function and symptoms: Harris Hip Score (HHS), Hip Disability and Osteoarthritis Outcome Score (HOOS), Oxford Hip Score (OHS), Lequesne Index of Severity for Osteoarthritis of the Hip (LISOH), and American Academy of Orthopedic Surgeons (AAOS) Hip and Knee Questionnaire. 63, 2011.
* Guillemin F, Bombardier C, Beaton D. Cross-cultural adaptation of health-related quality of life measures: literature review and proposed guidelines. J Clin Epidemiol 46(12), 1993.
* Terwee CB, Bot SD, De Boer MR. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 60(1), 2007.
* Söderman P, Malchau H, Herberts P. Outcome of total hip replacement: a comparison of different measurement methods. Clin Orthop Relat Res 390, 2001.
* Hinman RS, Dobson F, Takla A, O'Donnell J, Bennell KL. Which is the most useful patient-reported outcome in femoroacetabular impingement? Test-retest reliability of six questionnaires. Br J Sports Med 48(6), 2014.
* Wamper KE, Sierevelt IN, Poolman RW, Bhandari M, Haverkamp D. The Harris hip score: Do ceiling effects limit its usefulness in orthopedics? Acta Orthop 81(6), 2010.
* Naal FD, Sieverding M, Impellizzeri FM, Von Knoch F, Mannion AF, Leunig M. Reliability and validity of the cross-culturally adapted German Oxford hip score. Clin Orthop Relat Res 467(4), 2009.
* Christensen CP, Althausen PL, Mittleman MA, Lee JA, McCarthy JC. The nonarthritic hip score: reliable and validated. Clin Orthop Relat Res 406, 2003.
* Shields RK, Enloe LJ, Evans RE, Smith KB, Steckel SD. Reliability, validity, and responsiveness of functional tests in patients with total joint replacement. Phys Ther 75(3), 1995.
* Potter BK, Freedman BA, Andersen RC, Bojescul JA, Kuklo TR, Murphy KP. Correlation of Short Form-36 and disability status with outcomes of arthroscopic acetabular labral debridement. Am J Sports Med 33(6), 2005.
* Altman DG, Bland JM. Measurement in medicine: the analysis of method comparison studies. Statistician 32, 1983.
* McGrory BJ, Harris WH. Can the western Ontario and McMaster Universities (WOMAC) osteoarthritis index be used to evaluate different hip joints in the same patient? J Arthroplasty 11(7), 1996.
* Liang MH, Fossel AH, Larson MG. Medical Care 28(7), 1990.
* Celik D, Can C, Aslan Y, Ceylan HH, Bilsel K, Ozdincler A. Translation, Cross-Cultural Adaptation, and Validation of the Turkish Version of the Harris Hip Score. Hip Int, 2014.