/**************************************************************************** PROGRAM: DIAB_APR94_MAR15.SAS TITLE: Defining the Diabetes Cohort CREATED BY: Interdisciplinary Chronic Disease Collaboration (ICDC)/ Alberta Kidney Disease Network (AKDN), September 2010 /***************************************************************************** Updated by Zhihai in Mar 2016. ====================================================================== TERMS OF USE: This code was created by the ICDC/AKDN for its own use. Please be aware that the ICDC/AKDN DOES NOT: 1) Provide any guarantees about this code 2) Commit to the availability of any updates to the code 3) Support any implementation outside of the ICDC/AKDN Please use this code for information purposes only. Please do not distribute this code further. ====================================================================== Goal: Identify people in the Alberta Health and Wellness (AHW) administrative data from April 1, 1994 to March 31, 2015 with diabetes. This will form a cohort of patients with incident and prevalent diabetes. ICDC/AKDN Definition for Diabetes DATASETS: 1. Hospitalization Discharge Abstract Database (DAD), Apr. 1994 - Mar. 2015, all ICD diagnostic fields 2. Supplemental Enhanced Service Event (SESE) Database (Physician claims), Apr. 1994 - Mar. 2015, all ICD diagnositic fields Visits with diagnostic prefix "X", "B" or "E" excluded (These represent labs, radiology, ophthalmologist, dental, and chiropractic visits). CASE DEFINITION: One hospitalization (ICD-9-CM or ICD-10-CA) excluding gestational diabetes OR Two physician claims (excluding claims containing a CCPx procedure code with prefix 'X','B',or 'E') on separate dates with an ICD-9-CM code within two years. Diabetes hospitalizations are excluded prior to applying the case definition if those hospitalizations are found within 120 days before or 180 days after a gestational hospital record. The date of diabetes onset is defined as the first date of hospitalization for diabetes (excluding gestational diabetes), or the former of the two physician claims, whichever date is earliest. CODES: Diabetes: 250 (ICD9) AND E10-E14 (ICD10) Gestational: 641-676, V27 (ICD9) AND O1, O21-95, O98, O99, Z37 (ICD10) Records where diabetes event is 120 days before or 180 days after gestational event are excluded. HOSPITALIZATION TYPE: Any diagnosis type, selected from all available fields, for both diabetes hospitalizations and gestational hospitalizations DATES TO BE USED: All available data. The date of diabetes onset is defined as the first date of hospitalization or the former of the two physician claims, whichever date is earliest. WASH-OUT PERIOD: There will be a wash-out period of at least 3 years. At least one diagnosis of diabetes present during the wash-out period excludes a patient as an incident case. If the date of onset diabetes (by case definition) occurs on or after April 1, 1997, diabetes will be classified as an incident case. If the date of onset diabetes (by case definition) occurs between April 1, 1994 and March 31, 1997, there will not be a wash-out period of at least 3 years available. In this case, diabetes will be considered prevalent. USAGE NOTES (for this program): The DIAB_94_13_cohort dataset includes the patients with diabetes identified from the claims (no claims containing a procedure code with prefix X,B, or E) or hospitalizations using any diagnosis type from all available fields. This dataset will form a cohort of patients with incident and prevalent diabetes. Only the first case definition of diabetes is included. Diabetes diagnoses following the first (incident event) are not included. The dataset includes both adults and non-adults at the time of onset diabetes. The dataset includes a DATA_SOURCE variable to identify the data source of the diabetes case. PHN is the unique identifier used to link the administrative data. ******************************************************************************/ options threads=yes nofmterr mlogic helpbrowser=sas; * HELPBROWSER=SAS sets the SAS browser to default in the 64-bit OS --; * Location of the AHW administrative datasets (Hospitalizations and claims) --; libname ahw 'G:\IDENTIFIED\AHW\EXTRACTION_1994_2015\DATA\CleanDataCombined_2013_2015\DataCombined_2013_2015'; * Location to store the DIAB_94_15_cohort dataset --; libname diab 'G:\IDENTIFIED\AHW\EXTRACTION_1994_2015\DATA\DERIVED_DATA\AHW_DIABETES\DATA'; * Location of the demographics dataset created by the ICDC/AKDN. Includes information on gender, and dates of birth, death, and out-migration from Alberta --; libname demog 'G:\IDENTIFIED\AHW\EXTRACTION_1994_2015\DATA\DERIVED_DATA\AHW_DEMOGRAPHICS\DEMOGRAPHICS\DATA'; *********************************; * MACROS USED IN THIS PROGRAM -- ; *********************************; * MACRO for sorting by &srtvar --; %macro srt(inds,outds,srtvar); proc sort data=&inds out=&outds; by &srtvar; run; %mend; * MACRO for first record of &frstvar --; %macro first(inds,outds,srtvar,frstvar); data &outds; set &inds; by &srtvar; if first.&frstvar; run; %mend; * MACRO for defining Fiscal Year from April 1994 - March 2015 --; %macro fy(in_t=,out_t=,date=); data &out_t; set &in_t; if not missing(&date) then do; if &date<='31MAR1994'd then FY='----/1994'; if '01APR1994'd<=&date<='31MAR1995'd then FY='1994/1995'; if '01APR1995'd<=&date<='31MAR1996'd then FY='1995/1996'; if '01APR1996'd<=&date<='31MAR1997'd then FY='1996/1997'; if '01APR1997'd<=&date<='31MAR1998'd then FY='1997/1998'; if '01APR1998'd<=&date<='31MAR1999'd then FY='1998/1999'; if '01APR1999'd<=&date<='31MAR2000'd then FY='1999/2000'; if '01APR2000'd<=&date<='31MAR2001'd then FY='2000/2001'; if '01APR2001'd<=&date<='31MAR2002'd then FY='2001/2002'; if '01APR2002'd<=&date<='31MAR2003'd then FY='2002/2003'; if '01APR2003'd<=&date<='31MAR2004'd then FY='2003/2004'; if '01APR2004'd<=&date<='31MAR2005'd then FY='2004/2005'; if '01APR2005'd<=&date<='31MAR2006'd then FY='2005/2006'; if '01APR2006'd<=&date<='31MAR2007'd then FY='2006/2007'; if '01APR2007'd<=&date<='31MAR2008'd then FY='2007/2008'; if '01APR2008'd<=&date<='31MAR2009'd then FY='2008/2009'; if '01APR2009'd<=&date<='31MAR2010'd then FY='2009/2010'; if '01APR2010'd<=&date<='31MAR2011'd then FY='2010/2011'; if '01APR2011'd<=&date<='31MAR2012'd then FY='2011/2012'; if '01APR2012'd<=&date<='31MAR2013'd then FY='2012/2013'; if '01APR2013'd<=&date<='31MAR2014'd then FY='2013/2014'; if '01APR2014'd<=&date<='31MAR2015'd then FY='2014/2015'; if '01APR2015'd<=&date then FY='2015/----'; end; COUNT=1; run; %mend; * MACRO to delete records in datasets occuring before dob and after death or outmigration --; * This macro uses the demographics dataset and SRT macro, defines the AGE variable, and produces * frequencies of records to be deleted. It also creates the &out_t2 dataset with the records deleted --; %macro demogcheck(in_t=,out_t1=,out_t2=,sourcedate=); data &out_t1; merge &in_t (in=in1) demog.ahw_demographics_2015(in=in2); by phn; if in1; age=(&sourcedate-pers_dob)/365.25; if (pers_dob^=. and pers_dob>&sourcedate) or pers_dob=. then dobdelete=1; if death_date^=. and death_date<&sourcedate then deathdelete=1; if out_migrate_date^=. and out_migrate_date<&sourcedate then outdelete=1; if dobdelete=1 | deathdelete=1 | outdelete=1 then rec_delete=1; run; proc freq data=&out_t1; table dobdelete*deathdelete*outdelete*rec_delete/list missing; run; %srt(&out_t1,&out_t1,phn); data &out_t2; set &out_t1; if rec_delete=1 then delete; run; %mend; * Begin ODS OUTPUT File for summary information --; ods rtf file="G:\IDENTIFIED\AHW\EXTRACTION_1994_2015\DATA\DERIVED_DATA\AHW_DIABETES\OUTPUT\DIAB_cohortsummary_2015.rtf" bodytitle; *********************************************; * 1) Identify Diabetes from Physician Claims ; *********************************************; * 1a) Identify diabetes codes using physician claims with no XBE --; * This code identifies any ICD-9 code beginning with '250 from * all 3 diagnosis fields from the phsyician claims database. It * creates a variable 'CODE' to identify the code in the earliest * diagnosis field --; *note: no start_date variable for 2015 claim data; data claims_noxbe_94_15; set ahw.claims_noxbe_94_15; if missing(start_date) then start_date=end_date; run; data claim_1 (keep= phn start_date diabclaim diabclaim_yr code); LENGTH CODE $7; set claims_noxbe_94_15; CODE=' '; ARRAY dx (*) $ HLTH_DX_ICD9X_CODE_1 HLTH_DX_ICD9X_CODE_2 HLTH_DX_ICD9X_CODE_3; do i=1 to DIM(dx); if substr(dx(i), 1, 3)='250' then do; diabclaim = 1; if code=' ' then CODE=dx(i); end; end; if diabclaim^= 1 then delete; diabclaim_yr=year(start_date); run; * 1b) Dates for claims must be on separate dates --; * First, sort by PHN and START_DATE. Take first of START_DATE --; %srt(claim_1,claim_1,phn start_date); %first(claim_1,claim_2,phn start_date,start_date); * 1c) Delete claims after death date/out-migration and claims before dob --; %demogcheck(in_t=claim_2,out_t1=claim_3,out_t2=claim_4,sourcedate=start_date) title1 'DIABETES CLAIMS PER YEAR'; proc freq data=claim_3; tables diabclaim_yr/list missing; run; title1; * 1d) Delete patients with a single claim (i.e. Criteria requires at least 2 physician claims --; data claim_5; set claim_4; by phn; if first.phn and last.phn then delete; run; * 1e) Identify two diabetes claims in 2 years --; data claim_6; set claim_5; by phn start_date; days_diff=dif(start_date); if first.phn then do; days_diff=.; end; run; data claim_7 (drop=count); set claim_6; count+1; by phn; if first.phn then count=1; ind=count; run; data second_visit; set claim_7; if days_diff^=. & days_diff<=(365.25*2); run; %srt(second_visit,second_visit,phn start_date); %first(second_visit,second_visit2,phn,phn); data first_visit; set second_visit2 (keep=phn ind); ind2=ind-1; drop ind; rename ind2=ind; run; data first_visit2; merge claim_7 (in=in1) first_visit (in=in2); by phn ind; if in1 and in2; run; * 1f) Dataset with both visits where case definition is met --; data diab_claim (keep=phn start_date yr_diab source code age); set first_visit2 second_visit2; yr_diab=year(start_date); SOURCE='CLAIM'; run; * 1g) Take the date of the first visit for onset of diabetes --; * Store all of the diabetes cases identified from the physician claims in the DIAB folder --; %srt(diab_claim,diab_claim,phn start_date); %first(diab_claim,diab.pt_diab_claim,phn,phn); title1 'YEAR WHERE FIRST OF TWO CLAIMS IN 2 YEARS CASE DEFINITION MET'; proc freq data=diab.pt_diab_claim; tables yr_diab /list missing; run; title1; *************************************************; * 2) Identify Diabetes from Hospitalization files ; *************************************************; * 2a) Create datasets with hospitalizations containing relevant ICD9 codes --; * MACRO to extract codes for ICD9 dataset --; * The ICD9 codes from the hospitalizations database are available from 3 datasets covering * the fiscal years 1994-1997, 1997-1999, and 1999-2002. There are 16 fields for ICD9 * diagnosis codes. This code identifies any ICD-9 code beginning with '250' from * all 16 diagnosis fields from the hospitalizations database. It creates a variable 'CODE' * to identify the code in the earliest diagnosis field. This code also identifies cases * of gestational diabetes --; %macro hospicd9(inds,outds,outdiab,outobst); data &outds; set &inds; ARRAY dx (*) $ SEPI_DX_MR_ICD9CM SEPI_DX_OTH_ICD9CM_1-SEPI_DX_OTH_ICD9CM_15; ARRAY hos_ICD9_ (*) $ hosp_icd9_1-hosp_icd9_16; DO I=1 TO DIM(dx); hos_ICD9_(I)=dx(I); END; drop i; keep phn start_date hosp_icd9_1-hosp_icd9_16; if start_date=. then delete; run; data &outdiab (keep=phn diab_hos diab_hosdt code) &outobst (keep=phn obst_hos obst_hosdt); LENGTH CODE $7; set &outds; ARRAY hos_ICD9_ (*) $ hosp_icd9_1-hosp_icd9_16; do i=1 to DIM(hos_ICD9_); * Diabetes hospitalizations --; if substr(hos_icd9_(i), 1, 3)='250' then do; diab_hos = 1; if code=' ' then code=hos_ICD9_(i); diab_hosdt = start_date; end; * Gestational diabetes --; if substr(hos_icd9_(i), 1, 2) in ('65','66') or substr(hos_icd9_(i), 1, 3) in ('641','642','643','644','645','646','647','648','670','671','672','673','674','675','676','V27') then do; obst_hos = 1; obst_hosdt = start_date; end; end; format diab_hosdt yymmdd10. obst_hosdt yymmdd10.; if diab_hos=1 then output &outdiab; * Diabetes hospitalizations --; if obst_hos=1 then output &outobst; * Gestational diabetes --; run; %mend; %hospicd9(ahw.hosp_94_97,hosp_94_97,hosp_94_97_diab,hosp_94_97_obst); %hospicd9(ahw.hosp_97_99,hosp_97_99,hosp_97_99_diab,hosp_97_99_obst); %hospicd9(ahw.hosp_99_02,hosp_99_02,hosp_99_02_diab,hosp_99_02_obst); * 2b) Create dataset with hospitalizations containing relevant ICD10 codes --; * ICD10 codes used since April 2002. This code identifies any ICD-10 code from * all 25 diagnosis fields from the hospitalizations database. It creates a variable 'CODE' * to identify the code in the earliest diagnosis field. This code also identifies cases * of gestational diabetes --; data hospicd10; set ahw.hosp_02_15; ARRAY dx (*) $ DX1-DX25; ARRAY hos_ICD10_ (*) $ hosp_icd10_1-hosp_icd10_25; DO I=1 TO DIM(dx); hos_ICD10_(I)=dx(I); END; drop i; keep phn start_date hosp_ICD10_1-hosp_ICD10_25; run; data hospicd10_diab (keep=phn diab_hos diab_hosdt code) hospicd10_obst (keep=phn obst_hos obst_hosdt); LENGTH CODE $7; set hospicd10; ARRAY hos_ICD10_ (*) $ hosp_ICD10_1-hosp_ICD10_25; do i=1 to DIM(hos_ICD10_); * Diabetes hospitalizations --; if substr(hos_ICD10_(i), 1, 3) in ('E10', 'E11', 'E12','E13','E14') then do; diab_hos = 1; if code=' ' then code=hos_ICD10_(i); diab_hosdt = start_date; end; * Gestational diabetes --; if substr(hos_ICD10_(i), 1, 2) in ('O1','O3','O4','O5','O6','O7','O8') or substr(hos_ICD10_(i), 1, 3) in ('O21','O22','O23','O24','O25','O26','O27','O28','O29','O90','O91','O92','O95','O98','O99','Z37') then do; obst_hos = 1; obst_hosdt = start_date; end; end; if diab_hos=1 then output hospicd10_diab; * Diabetes hospitalizations --; if obst_hos=1 then output hospicd10_obst; * Gestational diabetes --; run; * 2c) Combine all 4 hospitalizations for diabetes data sources and sort by phn/start date --; data hosp_diab; set hosp_94_97_diab hosp_97_99_diab hosp_99_02_diab hospicd10_diab; diabhosp_yr=year(diab_hosdt); run; %srt(hosp_diab,hosp_diab,phn diab_hosdt); %first(hosp_diab,hosp_diab2,phn diab_hosdt, diab_hosdt); * 2d) Delete hospitalizations after death date/out-migration date or before DOB from diabetes dataset --; %demogcheck(in_t=hosp_diab2,out_t1=hosp_diab3,out_t2=hosp_diab4,sourcedate=diab_hosdt) title1 'DIABETES HOSPITALIZATIONS PER YEAR'; proc freq data=hosp_diab4; tables diabhosp_yr /list missing; run; title1; * 2d) Combine all 4 gestational diabetes data sources --; data hosp_obst; set hosp_94_97_obst hosp_97_99_obst hosp_99_02_obst hospicd10_obst; run; %srt(hosp_obst,hosp_obst,phn obst_hosdt); %first(hosp_obst,hosp_obst2,phn obst_hosdt, obst_hosdt); * 2d) Delete hospitalizations after death date/out-migration date or before DOB from gestational and transpose by phn--; %demogcheck(in_t=hosp_obst2,out_t1=hosp_obst3,out_t2=hosp_obst4,sourcedate=obst_hosdt) proc transpose data = hosp_obst4 out = widehosp_obst prefix = obst_hosdt; var obst_hosdt; by phn; run; %srt(widehosp_obst,widehosp_obst,phn); * 2e) Merge hospitalization for diabetes records with gestational subjects --; data hosp_diab5; merge hosp_diab4 (in=in1) widehosp_obst (in=in2); by phn; if in1; run; %first(hosp_diab5,hosp_diab5subj,phn,phn); title1 '# SUBJECTS BEFORE EXCLUSION OF GESTATIONAL EVENTS'; proc means data=hosp_diab5subj n; var diab_hos; run; title1; *2f) Exclude diabetes events if within 120 days before or 180 days after a gestational hospital record --; data diab_hosp (keep=phn start_date source diab_hos yr_diab age code); set hosp_diab5; array obstdt(*) obst_hosdt1-obst_hosdt40; array diff(*) difftest1-difftest40; do dates = 1 to 40; diff[dates]=obstdt[dates]-diab_hosdt; if obstdt[dates]<=diab_hosdt then do; if -180<=diff[dates]<=0 then flag=1; end; if obstdt[dates]>=diab_hosdt then do; if 0<=diff[dates]<=120 then flag=1; end; end; if flag=1 then delete; SOURCE='HOSP'; yr_diab=year(diab_hosdt); rename diab_hosdt=start_date; run; %srt(diab_hosp,diab_hosp,phn start_date); %first(diab_hosp,diab_hospsubj,phn,phn); title1 '# SUBJECTS AFTER EXCLUSION OF GESTATIONAL EVENTS'; proc means data=diab_hospsubj n; var diab_hos; run; title1; * 2g) Identify first hospitalization where case defintion is met --; * Store all of the diabetes cases identified from the physician claims in the DIAB folder --; %first(diab_hosp,diab.pt_diab_hosp,phn,phn); title1 'YEAR WHERE FIRST HOSPITILZATION CASE DEFINITION MET'; proc freq data=diab.pt_diab_hosp; tables yr_diab /list missing; run; title1; * 3) Combine diabetes (by case definition) from physician claims and hospitalizations. Take the first record per patient to identify the onset date of diabetes. Rename variables. Identify prevalent and incident cases. Identify adults. --; data diab; set diab.pt_diab_claim diab.pt_diab_hosp; run; %srt(diab,diab,phn start_date); data diab2 (drop=diab_hos phn rename=(phn2=PHN)); length phn2 $9; set diab; by phn start_date; if first.phn; DIAB=1; if start_date<'01APR1997'd then CASE='PREVALENT'; if start_date>='01APR1997'd or age <3 then CASE='INCIDENT'; if age>=18 then ADULT=1; else ADULT=0; *label start_date='Date of First Diabetes (by defn)'; rename start_date=DIAB_DATE; label start_date=' '; rename source=DATA_SOURCE; phn2=phn; *label phn2='Personal Health Number'; run; * 4) Save DIABETES cohort --; data diab.DIAB_94_15_COHORT(drop=code) DIAB_94_15_COHORT; retain PHN DIAB DIAB_DATE DATA_SOURCE; set diab2(drop=yr_diab adult age); run; **********************************************************************; * DESCRIPTIVE STATISTICS FOR THE DIABETES COHORT, DIAB_94_09_COHORT --; **********************************************************************; * Add Fiscal Year --; %fy(in_t=diab_94_15_cohort,out_t=test,date=diab_date) title1 "PROC CONTENTS: DIAB_94_15_COHORT"; proc contents data=diab.diab_94_15_cohort varnum; run; title1; title1 "Frequency of Subjects who are Adults (>=18 yrs) at time of First Diabetes (by Defn)"; proc freq data=diab2; table ADULT; run; title1; title1 "Frequency of Prevalent and Incident Diabetes (by Defn)"; proc freq data=test; table Case/missing; run; title1; title1 "Frequency of Prevalent and Incident Diabetes (by Defn) by Fiscal Year"; proc freq data=test; table Case*fy/list missing; run; title1; title1 "Frequency: First Diabetes (by Defn) by Fiscal Year"; proc freq data=test; table fy/missing out=testa; run; title1; title1 "Frequency: Prevalent Diabetes (by Defn) by Fiscal Year"; proc freq data=test(where=(case='PREVALENT')); table fy*case/ list missing; run; title1; title1 "Frequency: Incident Diabetes (by Defn) by Fiscal Year"; proc freq data=test(where=(case='INCIDENT')); table fy*case/list missing; run; title1; title1 "Frequency: First Diabetes (by Defn)- Data Source (CLAIM, HOSP)"; proc freq data=test; table data_source/missing; run; title1; title1 "Frequency: First Diabetes (by Defn)- Data Source (CLAIM, HOSP) by Fiscal Year"; proc freq data=test; table fy*data_source/missing; run; title1; title1 "Frequency: First Diabetes (by Defn)- ICD Code"; proc freq data=test; table code/missing; run; title1; title1 "Frequency: First Diabetes (by Defn)- ICD Code by Fiscal Year"; proc freq data=test; table code*fy/missing; run; title1; * Create Graph- First Case Diabetes (by Defn) by Fiscal Year --; title1 "Graph: First Case Diabetes (by Defn) by Fiscal Year"; proc gplot data=testa; plot count*fy; symbol i=join v=circle c=black; label count="First Diabetes (by Defn)"; label fy="Fiscal Year of First Diabetes (by Defn)"; run; quit; ods rtf close; *******************; * END OF PROGRAM --; *******************;