Delphi consensus on the American Society of Anesthesiologists’ physical status classification in an Asian tertiary women’s hospital
Abstract
Background
The American Society of Anesthesiologists (ASA) score is generated based on patients’ clinical status. Accurate ASA classification is essential for the communication of perioperative risks and resource planning. Literature suggests that ASA classification can be automated for consistency and time-efficiency. To develop a rule-based algorithm for automated ASA classification, this study seeks to establish consensus in ASA classification for clinical conditions encountered at a tertiary women’s hospital.
Methods
Using the Delphi technique, 37 anesthesia providers rated their agreement, on a 4-point Likert scale, with ASA scores assigned to 49 items. After Round 1, the group’s collective responses and individual item scores were shared with participants to inform their responses in Round 2. For each item, the percentage agreement (‘agree’ and ‘strongly agree’ responses combined), median (interquartile range/IQR), and SD were calculated. Consensus for each item was defined as a percentage agreement ≥ 70%, IQR ≤ 1.0, and SD < 1.0.
Results
All participants completed the study and none had missing data. The number of items that reached consensus increased from 25 (51.0%) to 37 (75.5%) in the second Delphi round, particularly for items assigned ASA scores of III and IV. Nine items, which pertained to alcohol intake, asthma, thyroid disease, limited exercise tolerance, and stable angina, did not reach consensus even after two Delphi rounds.
Conclusions
Delphi consensus was attained for 37 of the 49 study items (75.5%), facilitating their incorporation into a rule-based clinical support system designed to automate the prediction of ASA classification.
Introduction
Pre-anesthesia assessment is the process of clinical evaluation that precedes the delivery of anesthesia for surgical and non-surgical procedures [1]. Upon completing this assessment, it is standard practice to assign an American Society of Anesthesiologists (ASA) score based on the patient’s clinical status [2]. The ASA classification system is most widely used in the pre-anesthesia assessment for surgical patients and aids in resource planning [3], reimbursement of anesthesia services [4] and prediction of complications [5].
Despite its widespread use, studies suggest poor inter-rater agreement on ASA classifications [6–10]. Interpretations of ASA definitions may be influenced by the patient case-mix [11,12], rater expertise [9,11], and healthcare funding model [6]. However, consistency in ASA classification is vital for accurate risk prediction and resource planning. With the establishment of outpatient pre-anesthesia evaluation clinics, discrepancies between the ASA classification assigned by preprocedural and day-of-surgery anesthesiologists could lead to day-of-surgery cancellations, which are associated with decreased operating room efficiency, low staff morale, increased patient anxiety, and increased costs [13,14].
Traditional models of in-person pre-anesthesia assessments have transitioned to digital formats, administered by health care providers or self-administered by patients [15–23]. Electronic pre-anesthesia assessment platforms often incorporate clinical decision support systems (CDSS) that improve quality of care through the standardization of practice [24,25]. We previously reported the development and validation of a web-based Pre-AnaesThesia Computerized Health (PATCH) assessment application through a mixed-methods approach [22]. The PATCH application allows patients to self-administer a pre-anesthesia health screening questionnaire on a mobile device at the time, place, and pace most convenient to them. Patient responses gathered online generate a comprehensive health report that is as reliable and accurate as that of nurse-led assessment [23]. However, in its current form, the application does not automatically generate the ASA score. Therefore, we aimed to build a CDSS for automated ASA classification for integration into the PATCH application.
As part of the ongoing research to develop a CDSS for automated ASA classification, the present study was undertaken with the aim of establishing Delphi consensus in ASA classification for a spectrum of clinical conditions encountered in our tertiary women’s hospital setting. As is typical of the Delphi technique, experts’ opinions were sought to determine the extent of agreement between them, and discrepancies were resolved through a series of anonymized sequential rounds, interspersed with controlled feedback and an opportunity for respondents to modify their responses [26]. Items that attained consensus could then be incorporated to build decision rules for the program algorithm to automate ASA classification.
Materials and Methods
Study participants
Ethics approval of the study (2017/3002) was provided by the SingHealth Centralized Institutional Review Board of the Singapore Health Services Private Limited. The study was conducted at the Department of Women’s Anesthesia of the KK Women’s and Children’s Hospital, Singapore from 2 January to 28 February 2021. The 830-bed hospital provides tertiary care for women and children. Eligible experts contacted for the Delphi study were anesthesia providers of the department who staffed the outpatient pre-anesthesia evaluation clinics and operating rooms and had a minimum of two years’ experience providing supervised or independent anesthesia care. Purposive sampling was performed to ensure that the participants met eligibility criteria.
Study design
To assess Delphi consensus, two rounds of structured questionnaires were administered. Questionnaire items were formulated by three members of the study team (EL, BLS, and RD), each of whom has more than 20 years of clinical experience. The items covered patient conditions commonly encountered in our clinical setting and included examples adapted from the ASA-approved examples [2]. Conditions that are typically classified as ASA V (e.g., moribund patient) and VI (e.g., brain death) were excluded from the study, as they are not considered controversial in nature. The first version of the questionnaire was evaluated for clarity and relevance by two consultant anesthesiologists not affiliated with the hospital. No changes were deemed necessary after their review.
Round 1
After providing written informed consent, participants accessed a web-based questionnaire (https://form.gov.sg/5fb48cb93a3ec7001128173b) to rate their agreement (on a 4-point Likert scale) to ASA scores assigned to 49 items in the ASA questionnaire framework. The participants had the option to provide free-text comments on individual items. The participants were also asked to provide information on gender and clinical experience.
Participants were instructed to indicate their level of agreement on a 4-point Likert scale (strongly disagree ‘1’, disagree ‘2’, agree ‘3’, and strongly agree ‘4’) to ASA scores assigned to the 49 items. The ‘neutral’ option was removed to move the group towards consensus [27] and produce stable findings in the Delphi [28].
Round 2
Four weeks after completion of the first Delphi round, participants received an individualized questionnaire in Excel format via email denoting their individual scores, the group median, distribution of responses, and the free-text comments collected in Round 1. Participants were then asked to reconsider their responses for Round 2, taking into consideration the group’s collective responses (i.e., median ASA score for each item) and comments obtained in Round 1. The method of providing feedback along with the distribution of responses per item has been previously described in similar Delphi studies [29]. For Round 2, participants were allowed to review their ratings in order to potentially achieve a level of consensus for the group rating. Free-text comments were not elicited for any of the items in Round 2. Fig. 1 summarizes the process of the Delphi technique used for this study.

Fig. 1. Flow diagram illustrating the Delphi method used. ASA: American Society of Anesthesiologists, IQR: interquartile range, SD: standard deviation.
We had aimed to conduct two Delphi rounds, making an a priori decision to proceed with Round 3 if consensus was not achieved by Round 2.
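The Round 2 feedback described above (a participant's own score, the group median, and the distribution of responses per item) can be illustrated with a minimal Python sketch. The helper name `build_feedback` and the ratings are hypothetical and are not taken from the study, which distributed this feedback as individualized Excel questionnaires.

```python
from collections import Counter
from statistics import median

# Likert coding used in the study: 1 = strongly disagree ... 4 = strongly agree.
LIKERT_LABELS = {1: "strongly disagree", 2: "disagree", 3: "agree", 4: "strongly agree"}


def build_feedback(own_rating, group_ratings):
    """Summarise one item for one participant (hypothetical helper)."""
    counts = Counter(group_ratings)
    return {
        "own_rating": own_rating,
        "group_median": median(group_ratings),
        # Number of responses per Likert category, including empty categories.
        "distribution": {label: counts.get(score, 0)
                         for score, label in LIKERT_LABELS.items()},
    }


# Illustrative (made-up) Round 1 ratings for a single item.
round1_ratings = [3, 3, 4, 2, 3, 4, 1, 3]
feedback = build_feedback(own_rating=2, group_ratings=round1_ratings)
print(feedback)
```

A participant who sees that their own rating sits below the group median can then decide whether to keep or revise it in Round 2.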
Defining consensus
For the present study, consensus for each item was determined by a combination of the percentage agreement, interquartile range (IQR), and standard deviation (SD). Although using the percentage level setting based on the majority may be considered subjective [30], adding the IQR and SD increased the rigor regarding consensus since they are a measure of the stability of responses between rounds and level of convergence in the participants’ assessment [31,32].
To measure consensus in this study, the following criteria were used in combination a priori:
1. Percentage agreement ≥ 70%, meaning ≥ 70% of participants must either agree or strongly agree (Likert scale ≥ 3) with an item in Round 2 for it to be included in the ASA score assignment framework. This level of agreement has been described in previous studies using the Delphi technique [33].
2. IQR ≤ 1.0, meaning the middle 50% of responses span no more than one point on the 4-point Likert scale [31].
3. SD < 1.0, which indicates homogeneity in the participants’ responses [32].
Items that did not meet all three criteria in Round 2 were considered not to have reached consensus and were excluded.
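The three criteria can be sketched in Python as follows. This is an illustration only: the study used SPSS, so the quartile rule (`method="inclusive"`) and the sample SD with an n - 1 denominator are assumptions, and the ratings shown are made up rather than study data.

```python
from statistics import quantiles, stdev


def item_consensus(ratings):
    """Apply the three consensus criteria to one item's Likert ratings (1-4)."""
    n = len(ratings)
    # Percentage of 'agree' (3) and 'strongly agree' (4) responses combined.
    pct_agree = 100.0 * sum(1 for r in ratings if r >= 3) / n
    # Assumption: 'inclusive' quartiles; SPSS may apply a different percentile rule.
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: sample standard deviation (n - 1 denominator).
    sd = stdev(ratings)
    reached = pct_agree >= 70.0 and iqr <= 1.0 and sd < 1.0
    return {"pct_agree": pct_agree, "iqr": iqr, "sd": sd, "consensus": reached}


# Made-up ratings from 37 hypothetical participants.
convergent_item = [4] * 10 + [3] * 20 + [2] * 5 + [1] * 2
divided_item = [4] * 8 + [3] * 12 + [2] * 10 + [1] * 7

print(item_consensus(convergent_item))
print(item_consensus(divided_item))
```

In this sketch the first item meets all three thresholds, whereas the second fails on percentage agreement alone, which is enough to deny consensus because the criteria are applied in combination.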
Statistical analysis
Data were analyzed using IBM SPSS Statistics for Windows (IBM Corp., USA) at the conclusion of each round. Demographic data and Likert item responses were analyzed using descriptive statistics. The median (IQR) score was calculated for each item. The categories ‘strongly agree’ and ‘agree’ were combined to compute the percentage agreement of each item. Variability in responses was measured using the SD, where a decrease in the SD between rounds indicated increasing homogeneity of the response. Regardless of whether the level of consensus was obtained in Round 1, all items were re-introduced in Round 2 of the Delphi survey to give every item the same opportunity to gain the highest rating and level of consensus.
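As a rough illustration of how a change in SD between rounds can be read, the Python sketch below compares the SD of one item across two rounds. The function name `sd_change` and the ratings are hypothetical, and the sample SD (n - 1 denominator) is an assumption about the calculation.

```python
from statistics import stdev


def sd_change(round1, round2):
    """Compare response variability for one item across two rounds."""
    sd1, sd2 = stdev(round1), stdev(round2)
    # A lower SD in Round 2 is read as increasing homogeneity of responses.
    return {"sd_round1": sd1, "sd_round2": sd2, "more_homogeneous": sd2 < sd1}


# Made-up ratings from eight hypothetical participants for the same item.
round1 = [1, 2, 3, 3, 4, 4, 2, 3]
round2 = [2, 3, 3, 3, 3, 4, 3, 3]

print(sd_change(round1, round2))
```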
Results
All 37 eligible staff members of the anesthesia department (excluding the three study team members) consented to the study and completed both Delphi rounds with no missing data (100% response rate). Table 1 shows the demographic characteristics of the 37 participants, comprising 15 consultant anesthesiologists, two anesthesia nurse practitioners, 14 residents, and six resident physicians. The majority (75.7%) of participants had ≥ five years of experience in providing anesthesia care.
Tables 2–5 show the Delphi consensus levels of items at the end of two rounds. The number of items that reached consensus increased from 25 (51.0%) in the first round to 37 (75.5%) in the second round. The greatest increase in consensus occurred for the items assigned ASA scores III and IV. Consensus was obtained for 77.3% of items assigned ASA III (Table 4) and 100% of items assigned ASA IV (Table 5). Three items (age > 75 years, disseminated intravascular coagulation, and obstetric hemorrhage with Hb < 6 g/dl) did not achieve consensus in one assigned class but achieved consensus when assigned another ASA class.
After two Delphi rounds, consensus was not achieved for nine items, which pertained to alcohol intake of 1–2 pints twice a week, asthma with monthly attacks managed on home therapy, thyroid disease, exercise tolerance of one flight of stairs, and stable angina. As consensus was attained for at least 75% of the items after Round 2, it was deemed unnecessary to proceed with another consensus round and the study was concluded.
Discussion
Delphi consensus was attained for 37 of the 49 clinical items (75.5%), facilitating their inclusion in a rule-based clinical support system designed to automate the prediction of the ASA classification. We postulate that the moderate level of consensus obtained could reflect the similarity in training background among anesthesia providers at our setting of predominantly obstetric and gynecological cases. The literature also suggests an increased inter-rater agreement in ASA classification when raters share common training backgrounds and experience [11].
However, three clinical items (age > 75 years, disseminated intravascular coagulation [DIC], and obstetric hemorrhage with Hb < 6 g/dl) did not achieve consensus in one allocated ASA class but did in another class.
Aged > 75 years
Age alone is not a criterion for ASA classification, although chronic diseases are more prevalent with advanced age. Advanced age is also a risk factor for increased morbidity and mortality. Technically, ASA classification should be based on the assessment of underlying organ function resulting from deterioration associated with age or disease and not simply by an age cut-off. However, anesthesiologists have been known to apply an ASA score of II to otherwise healthy patients based on an arbitrary age criterion that ranges from 60 to 75 years [34], which was confirmed by participants in this study.
Disseminated intravascular coagulation
DIC is a condition characterized by macro- and microvascular thrombosis and progressive consumption coagulopathy. In pregnancy, it can be triggered by placental abruption, placenta previa, amniotic fluid embolism, intrauterine death, eclampsia, and the hemolysis, elevated liver enzymes, low platelet count syndrome. The mortality rate for DIC is reported to be 20% to 50% [35]. Hence, it is not surprising that the consensus rating of ASA IV was attained for DIC in this study.
Obstetric hemorrhage with Hb < 6 g/dl
Obstetric hemorrhage is a leading cause of maternal mortality, accounting for 27% of all maternal deaths [36]. As our institution is an obstetric tertiary referral center, anesthesia providers have had first-hand experience managing life-threatening obstetric hemorrhages, including placenta accreta spectrum disorders [37]. We postulate that this clinical experience likely influenced the group consensus of an ASA score of IV for acute obstetric hemorrhage complicated by severe anemia.
After both Delphi rounds, the nine items that did not achieve consensus in ASA rating were alcohol consumption of 1–2 pints twice a week, asthma with monthly attacks managed by home therapy, thyroid disease with and without thyroid storm, exercise tolerance of one flight of stairs, and stable angina.
Alcohol intake
Participants could not reach a consensus on whether to assign an ASA score of II or III for alcohol consumption of 1–2 pints twice a week. Based on the latest ASA guidelines, ‘minimal alcohol intake’ is an example of ASA I while ‘social drinking’ is considered ASA II [2]. The ASA definitions do not define differential volumes and alcohol concentrations. However, the U.S. Department of Agriculture defines social drinking as limited to ≤ 2 drinks a day in men and ≤ 1 drink a day in women [38]. Accordingly, the intake of 1–2 pints of alcohol twice a week would be considered minimal and should warrant an ASA I classification. Our results suggest that participants were likely to be up-to-date with current guidelines on alcohol consumption and of the opinion that the consumption of 1–2 pints twice a week warranted an ASA I classification.
Asthma
No consensus was achieved regarding an ASA II classification for a patient with asthma with monthly attacks that could be controlled by home therapy. The ASA definitions have previously been criticized for their subjective nature [6–9], and this is a case in point. ‘Asthma with exacerbation’ is an approved example for ASA III; however, it is vague and does not quantify frequency and severity, thus making it difficult to differentiate between ASA II and III. Therefore, participants likely had mixed opinions on whether to assign an ASA II or III classification, thus accounting for the results obtained.
Thyroid disease
The ASA classification does not provide approved examples of thyroid disease [2]. The item description, which states ‘active thyroid disease with abnormal levels of thyroid hormone,’ is vague and does not provide details regarding the symptomatology or serum thyroid hormone levels. Without the benefit of clinical examination and laboratory thyroid measurements, we postulate that the majority of participants chose to adopt a more conservative approach in assigning ASA III to cases of active thyroid disease in the absence of thyroid storm. This failure to achieve consensus among participants could be explained by the fact that the presence of a thyroid storm is associated with a mortality of 10% [39], and an ASA IV classification would have been the appropriate option in that case.
Exercise tolerance
Exercise tolerance is an important predictor of cardiovascular complications after non-cardiac surgery [40]. In the preoperative setting, exercise tolerance can be estimated from activities of daily living using metabolic equivalents (METs), where 1 MET is the resting oxygen consumption of a 40-year-old, 70 kg man [41,42]. Exercise tolerance for one flight of stairs or ≥ 4 METs [40] is usually used as a discriminator for further preoperative cardiac testing [41]. In the present study, participants agreed that an exercise capacity of one flight of stairs constituted an ASA III classification but could not agree that an exercise capacity of two flights of stairs constituted an ASA II physical status classification.
A few authorities have argued that exercise tolerance may be better utilized as an indicator for further cardiac testing [43]. In one study, exercise tolerance < 4 METs was used to further stratify a broad category of ASA III vascular patients for more accurate risk prediction [44].
Stable angina
Stable angina is characterized by chest pain that is precipitated by exertion but relieved with rest or medication. In the ASA guidelines and approved examples [2], myocardial infarction is listed as an approved example, with onset ≤ 3 months as a discriminator between ASA III and ASA IV classifications. Besides this temporal relationship, stable and unstable angina are not provided as approved examples for ASA classification. Hence, participants likely drew upon their own varied clinical experience for interpretation, resulting in the lack of consensus.
The findings of this study provide a preliminary platform to establish decision ‘rules’ for the automated prediction of ASA classification scores, with the benefit of improved productivity and consistency in classification. A CDSS can either be knowledge-based and implemented as a conditional logic, or non-knowledge-based using artificial intelligence to derive patterns from clinical data sets [45]. CDSSs aid clinical decision making [46] and have been implemented for direct patient care [47] or to improve protocol compliance and quality measures [48]. More recently, CDSSs incorporating the automated prediction of a patient’s ASA classification have been reported [24,49]. In one study, data from a web-based preoperative assessment system were processed using decision logic to provide automated computation of ASA scores [24]. Except for 159 cases (or 1.1%), the computed ASA scores showed close agreement with ASA scores estimated clinically by a heterogeneous group of anesthesia providers. Machine learning approaches have also been developed to predict ASA classification [49]; however, the quality of the algorithm’s output is highly dependent on the quality and size of the data sets. A simple and basic CDSS based on the ‘IF THEN’ rule could be designed using data from the present study. For example, a patient aged > 75 years would automatically be assigned an ASA class of II based on the consensus attained, unless it is superseded by another condition that warrants a higher ASA classification score.
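A minimal Python sketch of such an ‘IF THEN’ rule set is shown below, in which the highest ASA class among the rules that fire supersedes the others. It is an illustration under stated assumptions and not the algorithm implemented in the PATCH application: the three rules reflect consensus outcomes discussed above (age > 75 years as ASA II; DIC and obstetric hemorrhage with Hb < 6 g/dl as ASA IV), while the `Patient` fields, the `classify` function, and the default of ASA I when no rule fires are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    """Hypothetical subset of questionnaire fields used by the rules below."""
    age_years: int
    has_dic: bool = False
    obstetric_hemorrhage: bool = False
    hb_g_dl: Optional[float] = None


# Each rule: (description, IF-condition, THEN-ASA class as an integer).
RULES = [
    ("age > 75 years", lambda p: p.age_years > 75, 2),
    ("disseminated intravascular coagulation", lambda p: p.has_dic, 4),
    ("obstetric hemorrhage with Hb < 6 g/dl",
     lambda p: p.obstetric_hemorrhage and p.hb_g_dl is not None and p.hb_g_dl < 6,
     4),
]

ROMAN = {1: "I", 2: "II", 3: "III", 4: "IV"}


def classify(patient, baseline=1):
    """Return the highest ASA class among the rules that fire.

    Assumption: a patient who triggers no rule defaults to ASA I.
    """
    fired = [asa for _, condition, asa in RULES if condition(patient)]
    return ROMAN[max(fired, default=baseline)]


print(classify(Patient(age_years=80)))                # age rule only
print(classify(Patient(age_years=80, has_dic=True)))  # DIC supersedes the age rule
```

Taking the maximum over all fired rules is one simple way to express that a condition warranting a higher ASA class overrides a lower one.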
This study has a number of strengths and limitations. Although the sample size was only 37, a 100% response rate was obtained for both Delphi rounds. To ensure the robustness of the Delphi, all items in Round 1 were maintained in Round 2 to give every item an equal opportunity of attaining consensus in each round. The re-circulation of items also made it possible to compare the IQR, which indicated whether consensus was present throughout or only developed between rounds. However, the study was conducted at a single institution with its unique case-mix; therefore, external validity of the results is limited. The level of consensus could also vary in another population of anesthesia providers or even in the same population at another time. Additionally, controversial items could have been repeated under other ASA classes to give participants the chance to achieve consensus in these ASA classes. To develop an accurate and robust system for automated ASA classification, consensus should ideally be achieved for all items. This can be achieved by training participants in ASA classification. Future research should also evaluate consensus on a wider range of clinical conditions (including clinical and laboratory data) to improve the internal validity of the system. Consensus could also be evaluated through clinical vignettes oriented to local practice, as this has been shown to improve the internal consistency of ASA classifications [50].
In the present study, Delphi consensus in ASA classification was attained for 37 of the 49 (75.5%) example cases commonly encountered at our tertiary women’s hospital. This facilitated the development of a rule-based CDSS for the automated prediction of ASA classification in a pre-anesthesia health assessment application. Future research should seek consensus in ASA classification on a wider range of clinical conditions and vignettes to improve internal validity.
Acknowledgements
The authors would like to thank Agnes Teo for managing the administrative activities related to the conduct of the study.
Notes
Funding
The study was supported by a hospital grant (KKHHF/2018/04).
Conflicts of Interest
No potential conflict of interest relevant to this article was reported.
Author Contributions
Tarig Osman (Conceptualization; Formal analysis; Writing – original draft)
Eileen Lew (Conceptualization; Data curation; Funding acquisition; Writing – review & editing)
Ban L. Sng (Data curation; Writing – review & editing)
Rajive Dabas (Data curation; Writing – review & editing)
Konstadina Griva (Formal analysis; Writing – review & editing)
Josip Car (Writing – review & editing)