Standardization of Uveitis Nomenclature ("SUN"), Global, 2004-2021 (ICPSR 38665)

Version Date: Feb 14, 2024

Principal Investigator(s):
Douglas Jabs, Johns Hopkins University. Bloomberg School of Public Health

https://doi.org/10.3886/ICPSR38665.v1

Version V1

The uveitides are a collection of >30 diseases characterized by intraocular inflammation. Collectively, they are the 5th or 6th leading cause of blindness in the United States, and the cost of treating them has been estimated to be comparable to the cost of treating diabetic retinopathy. These diseases may be due to an intraocular or systemic infection, associated with a systemic rheumatic or other inflammatory disease, or eye-limited and immune-mediated. They often are grouped by the primary site of inflammation as anterior, intermediate, posterior, or panuveitides, with the primary site of clinically detected inflammation in the anterior chamber, vitreous, retina and/or choroid, or entire eye, respectively. Clinical and translational research in the field of uveitis has been hampered by the lack of gold standards for diagnosis and a lack of consistency in the diagnosis of these diseases. Agreement among uveitis experts on diagnosis has been modest at best, with some pairs of experts having agreement no better than chance alone. Research in other branches of medicine has been greatly facilitated by the development of classification criteria. Classification criteria are a type of diagnostic criteria for research purposes. Classification criteria differ from clinical diagnostic criteria in that, if a trade-off is needed, classification criteria emphasize specificity, i.e. the classification of a group of patients definitely thought to have the disease. The Standardization of Uveitis Nomenclature (SUN) Working Group is an international group of 99 investigators from 64 centers in 22 countries, with expertise in uveitis, informatics, consensus techniques, database management, ophthalmic image interpretation, and machine learning.

The goal of the SUN Working Group's project "Developing Classification Criteria for the Uveitides" was to develop classification criteria for 25 of the most common uveitides. The project proceeded in 4 phases: 1) informatics, 2) case collection, 3) case selection, and 4) machine learning.

The informatics phase resulted in a standardized language to describe the uveitides and a successful mapping of terms and phrases to individual diseases. The informatics phase led to the creation of a menu-driven, hierarchical, data collection tool for the case collection phase. The case collection phase consisted of the SUN Working Group entering retrospective and de-identified data on 5766 cases (total) into a preliminary database. The goal of case collection was 100-250 cases of each of the 25 diseases. Because of the lack of gold standards for diagnosis and the modest agreement among experts on diagnosis, collected cases were reviewed, and only cases with a supermajority (>75%) agreement that they were the disease were selected for the final database. Case selection consisted of committees of 9 uveitis experts reviewing the cases and voting on whether or not they were the disease. This process used formal consensus techniques, including nominal group techniques. Committees were geographically and school-of-thought dispersed. Cases achieving a supermajority agreement that they represented the disease were included in the final database. Cases with a supermajority agreement that they were not the disease were excluded, and cases without a supermajority agreement were tabled. Only 1% of cases were tabled. The final database consisted of 4046 cases (70% of those collected). The consensus diagnosis was used as the accepted diagnosis in the machine learning phase.
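The supermajority rule described above can be sketched in a few lines. This is an illustration only, not the Working Group's software: the function name, the three outcome labels, and the assumption that every committee member casts a plain yes/no vote are hypothetical; only the >75% threshold and the selected/excluded/tabled outcomes come from the description.

```python
def committee_decision(yes_votes: int, no_votes: int, threshold: float = 0.75) -> str:
    """Classify one case from a committee vote (hypothetical helper).

    A case is 'selected' when more than `threshold` of the votes say it is
    the disease, 'excluded' when more than `threshold` say it is not, and
    'tabled' otherwise.
    """
    total = yes_votes + no_votes
    if total == 0:
        raise ValueError("no votes cast")
    if yes_votes / total > threshold:
        return "selected"
    if no_votes / total > threshold:
        return "excluded"
    return "tabled"


# With a committee of 9, a >75% supermajority means at least 7 matching votes.
for yes in (7, 6, 2):
    print(yes, 9 - yes, committee_decision(yes, 9 - yes))
```

Under these assumptions a 7-2 vote selects the case, a 6-3 vote tables it, and a 2-7 vote excludes it.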

Following case selection, the final database was subjected to machine learning to identify the features that distinguished the diseases. For machine learning, the case data were split into a training set and a validation set. Cases were analyzed within anatomic class, with cases from those diseases with protean presentations used in more than one class. Multiple different machine learning approaches were used, including classification and regression trees, random forests, support vector machines, and multinomial logistic regression, all of which tended to have a high degree of agreement on the distinguishing features and relatively similar accuracies. The method chosen for reporting was multinomial logistic regression. Boruta analyses were used to determine a parsimonious set of criteria, and the Quine-McCluskey algorithm was used to create a logical set of Boolean expressions that correctly classified the diseases. Because different tests or clinical features might be able to indicate the disease (e.g. hilar adenopathy in patients with sarcoid can be seen on chest radiography or on chest computed tomography), feature engineering was used during machine learning. The set of Boolean expressions from the machine learning was then translated into English phrases ("final rules") for clinical use. As a back check on the translation, a random set of 10% of cases was subjected to classification by an observer masked as to the consensus diagnosis. The performance of these criteria (>90% accuracy within class for machine learning on the validation set and >95% accuracy of the "final rules" by the masked observer) suggests that they can be used in clinical and translational research.
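As a rough illustration of the kind of feature engineering mentioned above, the sketch below collapses several alternative findings into a single Boolean feature, using the hilar adenopathy example. The field names (`hilar_adenopathy_cxr`, `hilar_adenopathy_ct`) and the helper `engineer_any` are hypothetical and do not reflect the variable names in the deposited datasets.

```python
from typing import Iterable, Mapping


def engineer_any(case: Mapping[str, object], source_fields: Iterable[str]) -> bool:
    """Return True if any of the source fields is present and truthy.

    Missing fields are treated as negative findings (an assumption made
    for this sketch).
    """
    return any(bool(case.get(field)) for field in source_fields)


# Hypothetical case records: adenopathy may be seen on either imaging test.
cases = [
    {"id": 1, "hilar_adenopathy_cxr": True, "hilar_adenopathy_ct": False},
    {"id": 2, "hilar_adenopathy_cxr": False, "hilar_adenopathy_ct": True},
    {"id": 3, "hilar_adenopathy_cxr": False},
]

fields = ("hilar_adenopathy_cxr", "hilar_adenopathy_ct")
engineered = {c["id"]: engineer_any(c, fields) for c in cases}
print(engineered)
```

Combining the two sources this way lets a single criterion ("hilar adenopathy on chest imaging") stand in for either test.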

Following the machine learning phase, a meeting of the SUN Working Group was held in December 2019 to review the work and the proposed criteria. The result of this meeting was an approval of the criteria. Twenty-six manuscripts were prepared: one dealing with the methods used, and 25 disease-specific manuscripts with the criteria for each disease. The individual diseases addressed in this project were: cytomegalovirus anterior uveitis, Fuchs uveitis syndrome, herpes simplex anterior uveitis, juvenile idiopathic arthritis-associated anterior uveitis, spondyloarthritis/HLA-B27-associated anterior uveitis, tubulointerstitial nephritis with uveitis, varicella zoster anterior uveitis, pars planitis, intermediate uveitis non-pars planitis type, multiple sclerosis-associated intermediate uveitis, acute posterior multifocal placoid pigment epitheliopathy, birdshot chorioretinitis, multiple evanescent white dot syndrome, multifocal choroiditis with panuveitis, punctate inner choroiditis, serpiginous choroiditis, Behçet disease uveitis, sympathetic ophthalmia, Vogt-Koyanagi-Harada disease, sarcoidosis-associated uveitis, acute retinal necrosis syndrome, cytomegalovirus retinitis, syphilitic uveitis, toxoplasmic retinitis, and tubercular uveitis. The goal is for these criteria to be used as the underpinning for future clinical and translational research in the field of uveitis.

Jabs, Douglas. Standardization of Uveitis Nomenclature (“SUN”), Global, 2004-2021. Inter-university Consortium for Political and Social Research [distributor], 2024-02-14. https://doi.org/10.3886/ICPSR38665.v1

This data collection may not be used for any purpose other than statistical reporting and analysis. Use of these data to learn the identity of any person or establishment is prohibited. To protect respondent privacy, the data files in this collection are restricted from general dissemination. To obtain these restricted files researchers must agree to the terms and conditions of a Restricted Data Use Agreement.

Inter-university Consortium for Political and Social Research

2004 -- 2021
  1. This release is a Fast Track Release and files are distributed as they were received from the data depositor. The files have been zipped for release. Users should consult the investigator(s) if further information is needed.

  2. The investigator provided the following information:

    The SUN datasets deposited with ICPSR include the following datasets: 1) the cases collected in the "case collection" phase; 2) the cases selected for inclusion in the machine learning phase ("selected cases"); and 3) the datasets with feature engineering used for machine learning. The original Case Collection Dataset included a free text field for additional data considered potentially useful by the individual investigator at the participating site. Datasets for the machine learning include feature engineered variables derived from the original datasets and may have included information in the free text fields. As per ICPSR policies on the protection of patient confidentiality, free text fields have been masked. This masking could potentially limit the ability to replicate the formation of feature-engineered variables, but it is estimated to be relevant in less than 5% of cases.

Cases were collected in an informatics-designed preliminary database. Using formal consensus techniques, a final database was constructed of 4,046 cases achieving supermajority agreement on the diagnosis. Cases were analyzed within uveitic class and were split into a training set and a validation set. Machine learning used multinomial logistic regression with lasso regularization on the training set to determine a parsimonious set of criteria for each disease and to minimize misclassification rates. The resulting criteria were evaluated in the validation set. Accuracy of the rules developed to express the machine learning criteria was evaluated by a masked observer in a 10% random sample of cases.

Cross-sectional

Differential diagnoses of uveitis by anatomic class. Deidentified cases of uveitis were submitted from 76 clinician investigators across 5 continents. Of the 5766 cases collected, 4046 were selected for inclusion in the final database.

Images of uveitis cases

Of the 5766 cases collected, 4046 (70%) were selected for inclusion in the final database.

The SUN Developing Classification Criteria for the Uveitides project proceeded in 4 phases:

  1. Informatics: The informatics phase was conducted from 2009 to 2010 and developed a standardized vocabulary and set of dimensions for describing uveitic cases and diseases.
  2. Case collection: 5,766 cases of 25 of the most common uveitides were collected retrospectively between 2010 and 2016 using the standardized forms developed during the informatics phase. Information was entered into the SUN preliminary database by the 76 contributing investigators. Case information was de-identified, and investigators entered cases retrospectively from existing case records.
  3. Case selection: Case selection occurred during 2016 and 2017. Cases in the preliminary database were reviewed by committees of 9 investigators for inclusion into the final database (case "selection"). Committees were geographically and "school of thought" dispersed. Case selection proceeded in 2 steps: online voting followed by consensus conference calls.
  4. Machine learning: Machine learning was conducted during 2018 and 2019. The final database was randomly separated into a training set (~85% of the cases) and a validation set (~15% of the cases) for each uveitic class.
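The per-class random split in the machine learning phase can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the function `split_by_class`, the `(case_id, uveitic_class)` input layout, the fixed seed, and the example counts are all hypothetical, and the sketch ignores that some diseases were analyzed in more than one class.

```python
import random
from collections import defaultdict


def split_by_class(cases, train_fraction=0.85, seed=0):
    """Randomly split (case_id, uveitic_class) pairs within each class.

    Returns two lists of case ids: (training, validation).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for case_id, uveitic_class in cases:
        by_class[uveitic_class].append(case_id)

    train, validation = [], []
    for uveitic_class in sorted(by_class):
        ids = list(by_class[uveitic_class])
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)  # ~85% go to training
        train.extend(ids[:cut])
        validation.extend(ids[cut:])
    return train, validation


# Hypothetical example: 100 anterior and 40 posterior cases.
example = [(f"A{i}", "anterior") for i in range(100)]
example += [(f"P{i}", "posterior") for i in range(40)]
train_ids, validation_ids = split_by_class(example)
print(len(train_ids), len(validation_ids))
```

Splitting within each class, rather than across the pooled database, keeps the approximate 85/15 proportion for every uveitic class.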

2024-02-14

2024-02-14 ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection:

  • Performed recodes and/or calculated derived variables.
