Calculating Kappa with SAS

General Tips

Example 1 - Raw data

Example 2 - Frequency data

Other Resources

Tips

Basic Commands. Use PROC FREQ with the /agree option in the tables statement. For example:
```
   proc freq data = ratings ;
      tables Rater1 * Rater2 /agree;
   run;
```
Statistical significance. To get p-values for kappa and weighted kappa, add this statement to the PROC FREQ step:
```
   test kappa wtkap ;
```
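Putting the two pieces together, a complete call might look like the sketch below. The data set name (ratings) and variable names (Rater1, Rater2) are the same placeholders used in the snippet above, not real data.

```
* kappa and weighted kappa with significance tests ;

proc freq data = ratings ;
   tables Rater1 * Rater2 / agree ;
   test kappa wtkap ;
run;
```

The test statement only has an effect when the agree option is also present on the tables statement.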
Important! Ordered-category data. SAS calculates weighted kappa weights based on unformatted values. If your ratings are numbers, like 1, 2 and 3, this works fine.
But if your ratings are character variables, like Lo, Med, and Hi, SAS will assign numerical weights based on alphabetical order, like: Hi = 1, Lo = 2, Med = 3.
If the alphabetical order differs from the true order of the categories, weighted kappa will be calculated incorrectly. To avoid this, either (1) recode the character values to numbers that reflect the true ordering of categories, or (2) use a format and specify the order=formatted option for PROC FREQ (see Example 2).
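Option (1) is not shown in the examples, so here is a minimal sketch of one way to do it. It assumes a data set named rate with character variables rater1 and rater2 and a frequency variable f, laid out as in Example 2. The new names recoded, r1 and r2 are made up for illustration.

```
* recode character ratings to numbers that reflect the true order ;

data recoded ;
   set rate ;
   select (rater1) ;
      when ('Lo')  r1 = 1 ;
      when ('Med') r1 = 2 ;
      when ('Hi')  r1 = 3 ;
      otherwise    r1 = . ;
   end ;
   select (rater2) ;
      when ('Lo')  r2 = 1 ;
      when ('Med') r2 = 2 ;
      when ('Hi')  r2 = 3 ;
      otherwise    r2 = . ;
   end ;
run;

* the numeric codes now determine the weights ;

proc freq data = recoded ;
   weight f ;
   tables r1 * r2 / agree ;
run;
```

Any rating that is not Lo, Med or Hi is set to missing, so PROC FREQ leaves it out of the table rather than placing it in an arbitrary category.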
Nonsquare tables. SAS only calculates kappa for square tables--ones where both raters use the same categories. If one rater doesn't use all the categories, but the other rater does, kappa will not be calculated.
The fix is to add pseudo-observations that supply the unused categories but carry a negligible weight. SAS then treats the table as square and calculates kappa. See Example 1 and Example 2 below.
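When the data are entered as cell frequencies, more recent SAS releases (believed to be version 9 and later) offer an alternative: the zeros option on the weight statement, which keeps cells with a frequency of 0 in the table. The sketch below is an assumption-laden illustration, not part of the original examples. It reuses the counts from Example 2 with numeric codes 1, 2, 3 standing for Lo, Med, Hi, and the data set name rate0 is made up.

```
* frequency input where rater1 never uses category 1 ;

data rate0 ;
   input rater1 rater2 f ;
   datalines;
1 1 0
1 2 0
1 3 0
2 1 5
2 2 16
2 3 3
3 1 8
3 2 12
3 3 28
;
run;

* zeros keeps the 0-frequency cells, so the table stays square ;

proc freq data = rate0 ;
   weight f / zeros ;
   tables rater1 * rater2 / agree norow nocol ;
run;
```

If your release does not accept the zeros option, the pseudo-observation approach described above still applies.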
Nominal ratings. For nominal (unordered categorical) ratings, disregard the value that SAS reports for weighted kappa (the unweighted kappa value, however, is correct). As described above, SAS calculates weights based on an alphabetical ordering of categories, which has no meaning for nominal data.
The SAS documentation is excellent. See: Proc Freq, Tests and Measures of Agreement.


Example 1

This example shows how:

To input raw rating data;
To use pseudo-observations to force square tables so that SAS will calculate kappa statistics;
To calculate kappa, weighted kappa, their confidence ranges and standard errors, and their statistical significance.

Note: this is just an example. The N is too small to produce a realistic standard error estimate, confidence range, or p-value for kappa and weighted kappa.

The code for the example is as follows:

```
/***** Example 1:  Calculate Kappa from Raw Data *****/

* input ratings by three raters ;

data raw ;
   infile datalines ;
   input rater1 rater2 rater3;
datalines;
1 2 1
1 2 1
1 2 1
1 2 2
1 3 2
2 2 2
2 2 1
2 2 2
2 2 2
2 2 2
2 2 2
2 2 1
2 2 2
3 3 2
3 2 2
3 3 2
3 3 1
3 3 2
3 3 2
;
run;

*------------------------------------------------------------*;
* The above would produce non-square tables because Rater 2  *;
* doesn't use category 1 and Rater 3 doesn't use category 3.  *;
* The next 3 data steps fix this.                            *;
*------------------------------------------------------------*;

* step 1:  give all current observations a weight of 1 ;

data raw ;
   set raw ;
   wgt = 1 ;
run;

* step 2:  make pseudo-records ;

data pseudo ;
   infile datalines ;
   wgt = .0000000001;
   input rater1 rater2 rater3 ;
   datalines;
   1 1 1
   2 2 2
   3 3 3
   ;
run;

* step 3:  concatenate the original data and pseudo-observations ;

data both ;
   set raw pseudo ;
run;

* calculate kappa and weighted kappa between all pairs of raters ;

title "Example 1:  Raw Data";
proc freq data = both ;
   weight wgt ;
   tables rater1 * (rater2 rater3) / norow nocol agree ;
   tables rater2 * rater3          / norow nocol agree ;

*  include significance tests ;

   test kappa wtkap ;
run;
```

The following is part of the output produced by the code above:


```
                    Example 1:  Raw Data

                 Table of rater1 by rater2

        rater1     rater2

        Frequency|
        Percent  |       1|       2|       3|  Total
        ---------+--------+--------+--------+
               1 |  1E-10 |      4 |      1 |      5
                 |   0.00 |  21.05 |   5.26 |  26.32
        ---------+--------+--------+--------+
               2 |      0 |      8 |      0 |      8
                 |   0.00 |  42.11 |   0.00 |  42.11
        ---------+--------+--------+--------+
               3 |      0 |      1 |      5 |      6
                 |   0.00 |   5.26 |  26.32 |  31.58
        ---------+--------+--------+--------+
        Total       1E-10       13        6       19
                     0.00    68.42    31.58   100.00

                  Simple Kappa Coefficient
              --------------------------------
              Kappa                     0.4842
              ASE                       0.1380
              95% Lower Conf Limit      0.2137
              95% Upper Conf Limit      0.7547

                  Test of H0: Kappa = 0

              ASE under H0              0.1484
              Z                         3.2626
              One-sided Pr >  Z         0.0006
              Two-sided Pr > |Z|        0.0011

                   Weighted Kappa Coefficient
              --------------------------------
              Weighted Kappa            0.4701
              ASE                       0.1457
              95% Lower Conf Limit      0.1845
              95% Upper Conf Limit      0.7558

               Test of H0: Weighted Kappa = 0

              ASE under H0              0.1426
              Z                         3.2971
              One-sided Pr >  Z         0.0005
              Two-sided Pr > |Z|        0.0010
```
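To see where the simple kappa value comes from, the calculation can be repeated by hand from the rater1 by rater2 table above. The data step below is only an illustrative check, using the standard formula kappa = (po - pe) / (1 - pe), where po is the observed proportion of agreement and pe is the proportion expected by chance. The negligible pseudo-observation weights are ignored, and the variable names are made up.

```
* check the simple kappa for rater1 by rater2 by hand ;

data _null_ ;
   n  = 19 ;
   * observed agreement, from the diagonal cells of the table ;
   po = (0 + 8 + 5) / n ;
   * chance agreement, from products of matching row and column totals ;
   pe = (5*0 + 8*13 + 6*6) / n**2 ;
   kappa = (po - pe) / (1 - pe) ;
   put po= 6.4 pe= 6.4 kappa= 6.4 ;
run;
```

The log should show kappa=0.4842, which agrees with the value reported by PROC FREQ.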


Example 2

This example shows how:

To input rating data in the form of a crossclassification table;
To use pseudo-frequencies to force a square table;
To create and apply category formats so that SAS calculates weighted kappa correctly.

The SAS code to input the data and make pseudo-frequencies is as follows:

```
/***** Example 2:  Calculate Kappa from Frequency Data *****/

* input crossclassification frequencies (including 0 frequencies) ;

data rate ;
   length rater1 $3 rater2 $3 ;
   infile datalines ;
   input rater1 rater2 f ;
   datalines;
   Lo   Lo   0
   Lo   Med  0
   Lo   Hi   0
   Med  Lo   5
   Med  Med  16
   Med  Hi   3
   Hi   Lo   8
   Hi   Med  12
   Hi   Hi   28
   ;
run;

*----------------------------------------------*;
* If all frequencies of any row or any column  *;
* of the crossclassification table are 0, SAS  *;
* will not calculate kappa.  In this case, add *;
* the next data step.                          *;
*----------------------------------------------*;

* change the 0 frequencies to a negligible non-zero value ;

data rate ;
   set rate ;
   if f = 0 then f = .0000000001 ;
run;
```

For comparison, we first see what SAS reports if we don't apply category formats:


```
*  see what happens by default ;

title "Example 2a:  Frequency Input" ;
title2 "Default: Rows/Columns Ordered by Category Values";
title3 "Correct Kappa but Incorrect Weighted Kappa!";

proc freq data = rate ;
   weight f;
   tables rater1*rater2 / agree norow nocol;
run;
```

Here is the output produced by the commands above:


```
   Example 2a:  Frequency Input
   Default: Rows/Columns Ordered by Category Values
   Correct Kappa but Incorrect Weighted Kappa!

        rater1     rater2

        Frequency|
        Percent  |Hi      |Lo      |Med     |  Total
        ---------+--------+--------+--------+
        Hi       |     28 |      8 |     12 |     48
        ---------+--------+--------+--------+
        Lo       |  1E-10 |  1E-10 |  1E-10 |  3E-10
        ---------+--------+--------+--------+
        Med      |      3 |      5 |     16 |     24
        ---------+--------+--------+--------+
        Total          31       13       28       72

                         Kappa Statistics

   Statistic          Value       ASE     95% Confidence Limits
   ------------------------------------------------------------
   Simple Kappa      0.3333    0.0814       0.1738       0.4929
   Weighted Kappa    0.3944    0.0917       0.2146       0.5741
```

Now let's do things the right way. First we create a format that assigns our categories to numbers. Then we refer to the format in proc freq:

```
* define category intervals using a format ;

proc format ;
   value $rate 'Lo'  = 1
               'Med' = 2
               'Hi'  = 3 ;
run;

* calculate kappa and weighted kappa using formatted values;

title "Example 2b:  Frequency Input" ;
title2 "Order Rows/Columns by Formatted Values" ;
proc freq data = rate order=formatted ;
   format rater1 rater2 $rate. ;
   weight f;
   tables rater1*rater2 / agree norow nocol;
run;
```

Here is the output produced by the above. Note that the value of kappa is the same, but the value of weighted kappa is now correct:


```
   Example 2b:  Frequency Input
   Order Rows/Columns by Formatted Values

   Table of rater1 by rater2

        rater1     rater2

        Frequency|
        Percent  |1       |2       |3       |  Total
        ---------+--------+--------+--------+
        1        |  1E-10 |  1E-10 |  1E-10 |  3E-10
        ---------+--------+--------+--------+
        2        |      5 |     16 |      3 |     24
        ---------+--------+--------+--------+
        3        |      8 |     12 |     28 |     48
        ---------+--------+--------+--------+
        Total          13       28       31       72

                        Kappa Statistics

   Statistic          Value       ASE     95% Confidence Limits
   ------------------------------------------------------------
   Simple Kappa      0.3333    0.0814       0.1738       0.4929
   Weighted Kappa    0.2895    0.0756       0.1414       0.4376
```
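The effect of category order on weighted kappa can be checked by hand. The data step below is a sketch that assumes SAS's default (Cicchetti-Allison) agreement weights, which for three equally spaced categories are 1 on the diagonal, 0.5 for adjacent categories and 0 for categories two steps apart. The cell counts are taken from the table above in the true order Lo, Med, Hi, with the negligible pseudo-frequencies entered as 0. All variable and array names are made up for illustration.

```
* check weighted kappa by hand, categories in true order Lo, Med, Hi ;

data _null_ ;
   * cell counts, rows are rater1 and columns are rater2 ;
   array n[3,3]    _temporary_ (0 0 0   5 16 3   8 12 28) ;
   array rowtot[3] _temporary_ (0 0 0) ;
   array coltot[3] _temporary_ (0 0 0) ;
   total = 0 ;
   do i = 1 to 3 ;
      do j = 1 to 3 ;
         rowtot[i] = rowtot[i] + n[i,j] ;
         coltot[j] = coltot[j] + n[i,j] ;
         total     = total     + n[i,j] ;
      end ;
   end ;
   po = 0 ;
   pe = 0 ;
   do i = 1 to 3 ;
      do j = 1 to 3 ;
         * weight shrinks as the two ratings move further apart ;
         w  = 1 - abs(i - j) / 2 ;
         po = po + w * n[i,j] / total ;
         pe = pe + w * rowtot[i] * coltot[j] / total**2 ;
      end ;
   end ;
   wtkap = (po - pe) / (1 - pe) ;
   put wtkap= 6.4 ;
run;
```

The log should show wtkap=0.2895, matching Example 2b. Entering the cells in alphabetical order instead (Hi, Lo, Med, that is 28 8 12, 0 0 0, 3 5 16) reproduces the incorrect 0.3944 from Example 2a, because Hi and Med are then treated as the two categories furthest apart.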


Other Resources

The magree.sas macro calculates multi-rater kappa.
Another macro, by Liu and Hays, handles nonsquare or irregular tables and permits user-supplied weights for kappa between two raters. Their macro is described here.
Another way to fix the nonsquare table bug is described in this short paper.


Last revised: 20 July 2002