## Calculating Kappa with SAS®

### Tips

• Basic Commands. Use PROC FREQ with the /agree option in the tables statement. For example:
```
proc freq data = ratings ;
   tables Rater1 * Rater2 / agree ;
run;
```
• Statistical significance. To get p-values for kappa and weighted kappa, use the statement:

```
test kappa wtkap ;
```
• Important! Ordered-category data. SAS calculates weighted kappa weights from the unformatted values of the rating variables. If your ratings are numbers, such as 1, 2, and 3, this works fine.

But if your ratings are character values, such as Lo, Med, and Hi, SAS will assign numeric scores based on alphabetical order:

Hi = 1
Lo = 2
Med = 3

If the alphabetical order differs from the true order of the categories, weighted kappa will be calculated incorrectly. To avoid this, either (1) recode the character values to numbers that reflect the true ordering of the categories, or (2) use a format and specify the order=formatted option for Proc freq (see Example 2).

• Nonsquare tables. SAS calculates kappa only for square tables, ones in which both raters use the same set of categories. If one rater doesn't use all the categories but the other rater does, kappa will not be calculated.

This is fixed by adding pseudo-observations, which supply the missing categories but are given a negligibly small weight. SAS then processes the table as square and calculates kappa. See Example 1 and Example 2 below.

• Nominal ratings. For nominal (unordered categorical) ratings, disregard the value that SAS reports for weighted kappa (the unweighted kappa value, however, is correct). As described above, SAS calculates weights based on an alphabetical ordering of categories, which has no meaning for nominal data.

• The SAS documentation is excellent. See: Proc Freq, Tests and Measures of Agreement.
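To see concretely why category order matters for weighted kappa (but not for simple kappa), here is a short sketch in Python rather than SAS. The frequencies are made up for illustration, and the `kappas` helper is a hypothetical name; the weights are the Cicchetti-Allison weights, 1 - |i-j|/(c-1), which SAS uses by default:

```python
# Illustration (Python, not SAS): simple and weighted kappa with
# Cicchetti-Allison weights w_ij = 1 - |i-j|/(c-1), SAS's default.
# The category order determines the weights, which is why an
# alphabetical ordering of character ratings distorts weighted kappa.

def kappas(table):
    """table[i][j] = frequency of (rater1 = category i, rater2 = category j),
    with rows/columns arranged in the intended category order."""
    c = len(table)
    n = sum(sum(row) for row in table)
    row = [sum(table[i]) for i in range(c)]
    col = [sum(table[i][j] for i in range(c)) for j in range(c)]
    w = [[1 - abs(i - j) / (c - 1) for j in range(c)] for i in range(c)]

    po = sum(table[i][i] for i in range(c)) / n
    pe = sum(row[i] * col[i] for i in range(c)) / n**2
    po_w = sum(w[i][j] * table[i][j] for i in range(c) for j in range(c)) / n
    pe_w = sum(w[i][j] * row[i] * col[j] for i in range(c) for j in range(c)) / n**2
    return (po - pe) / (1 - pe), (po_w - pe_w) / (1 - pe_w)

# Same made-up ratings, two orderings.
# Rows/cols in the TRUE order Lo < Med < Hi:
true_order = [[10,  4,  1],
              [ 3, 12,  4],
              [ 0,  5, 11]]
# The same cells shuffled into ALPHABETICAL order Hi < Lo < Med:
alpha_order = [[11,  0,  5],
               [ 1, 10,  4],
               [ 4,  3, 12]]

k1, wk1 = kappas(true_order)
k2, wk2 = kappas(alpha_order)
print(round(k1, 4), round(k2, 4))    # simple kappa: identical either way
print(round(wk1, 4), round(wk2, 4))  # weighted kappa: differs
```

Reordering rows and columns leaves the diagonal (and hence simple kappa) unchanged, but it moves cells between weight bands, so weighted kappa changes.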

### Example 1

This example shows how:

• To input raw rating data;

• To use pseudo-observations to force square tables so that SAS will calculate kappa statistics;

• To calculate kappa, weighted kappa, their standard errors and confidence limits, and their statistical significance.

Note: this is just an example. The N is too small to produce realistic standard error estimates, confidence limits, or p-values for kappa and weighted kappa.

The code for the example is as follows:

```
/***** Example 1:  Calculate Kappa from Raw Data *****/

* input ratings by three raters ;

data raw ;
infile datalines ;
input rater1 rater2 rater3;
datalines;
1 2 1
1 2 1
1 2 1
1 2 2
1 3 2
2 2 2
2 2 1
2 2 2
2 2 2
2 2 2
2 2 2
2 2 1
2 2 2
3 3 2
3 2 2
3 3 2
3 3 1
3 3 2
3 3 2
;
run;

*------------------------------------------------------------*;
* The above would produce non-square tables because Rater 2  *;
* doesn't use category 1 and Rater 3 doesn't use category 3. *;
* The next 3 data steps fix this.                            *;
*------------------------------------------------------------*;

* step 1:  give all current observations a weight of 1 ;

data raw ;
set raw ;
wgt = 1 ;
run;

* step 2:  make pseudo-records ;

data pseudo ;
infile datalines ;
wgt = .0000000001;
input rater1 rater2 rater3 ;
datalines;
1 1 1
2 2 2
3 3 3
;
run;

* step 3:  concatenate the original data and pseudo-observations ;

data both ;
set raw pseudo ;
run;

* calculate kappa and weighted kappa between all pairs of raters ;

title "Example 1:  Raw Data";
proc freq data = both ;
weight wgt ;
tables rater1 * (rater2 rater3) / norow nocol agree ;
tables rater2 * rater3          / norow nocol agree ;

*  include significance tests ;

test kappa wtkap ;
run;
```

The following is part of the output produced by the code above:

```
Example 1:  Raw Data

Table of rater1 by rater2

rater1     rater2

Frequency|
Percent  |       1|       2|       3|  Total
---------+--------+--------+--------+
       1 |  1E-10 |      4 |      1 |      5
         |   0.00 |  21.05 |   5.26 |  26.32
---------+--------+--------+--------+
       2 |      0 |      8 |      0 |      8
         |   0.00 |  42.11 |   0.00 |  42.11
---------+--------+--------+--------+
       3 |      0 |      1 |      5 |      6
         |   0.00 |   5.26 |  26.32 |  31.58
---------+--------+--------+--------+
Total       1E-10       13        6       19
             0.00    68.42    31.58   100.00

Simple Kappa Coefficient
--------------------------------
Kappa                     0.4842
ASE                       0.1380
95% Lower Conf Limit      0.2137
95% Upper Conf Limit      0.7547

Test of H0: Kappa = 0

ASE under H0              0.1484
Z                         3.2626
One-sided Pr >  Z         0.0006
Two-sided Pr > |Z|        0.0011

Weighted Kappa Coefficient
--------------------------------
Weighted Kappa            0.4701
ASE                       0.1457
95% Lower Conf Limit      0.1845
95% Upper Conf Limit      0.7558

Test of H0: Weighted Kappa = 0

ASE under H0              0.1426
Z                         3.2971
One-sided Pr >  Z         0.0005
Two-sided Pr > |Z|        0.0010
```
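As an arithmetic check (in Python rather than SAS), the rater1-by-rater2 kappa values above can be reproduced directly from the printed table; the 1E-10 pseudo-frequencies are small enough to ignore. The sketch assumes SAS's default Cicchetti-Allison weights, 1 - |i-j|/(c-1):

```python
# Reproduce Example 1's rater1-by-rater2 kappa and weighted kappa
# from the printed table (pseudo-frequencies ignored as negligible).

table = [[0, 4, 1],   # rater1 = 1
         [0, 8, 0],   # rater1 = 2
         [0, 1, 5]]   # rater1 = 3
c = 3
n = sum(map(sum, table))                                        # 19
row = [sum(r) for r in table]
col = [sum(table[i][j] for i in range(c)) for j in range(c)]
w = [[1 - abs(i - j) / (c - 1) for j in range(c)] for i in range(c)]

# simple kappa: observed vs chance agreement on the diagonal
po = sum(table[i][i] for i in range(c)) / n
pe = sum(row[i] * col[i] for i in range(c)) / n**2
kappa = (po - pe) / (1 - pe)

# weighted kappa: partial credit for near-diagonal cells
po_w = sum(w[i][j] * table[i][j] for i in range(c) for j in range(c)) / n
pe_w = sum(w[i][j] * row[i] * col[j] for i in range(c) for j in range(c)) / n**2
wkappa = (po_w - pe_w) / (1 - pe_w)

print(round(kappa, 4))    # 0.4842, matching the SAS output
print(round(wkappa, 4))   # 0.4701
```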

### Example 2

This example shows how:

• To input rating data in the form of a crossclassification table;

• To use pseudo-frequencies to force a square table;

• To create and apply category formats so that SAS calculates weighted kappa correctly.

The SAS code to input the data and make pseudo-frequencies is as follows:

```
/***** Example 2:  Calculate Kappa from Frequency Data *****/

* input crossclassification frequencies (including 0 frequencies) ;

data rate ;
length rater1 $3 rater2 $3 ;
infile datalines ;
input rater1 rater2 f ;
datalines;
Lo   Lo   0
Lo   Med  0
Lo   Hi   0
Med  Lo   5
Med  Med  16
Med  Hi   3
Hi   Lo   8
Hi   Med  12
Hi   Hi   28
;
run;

*----------------------------------------------*;
* If all frequencies of any row or any column  *;
* of the crossclassification table are 0, SAS  *;
* will not calculate kappa.  In this case, add *;
* the next data step.                          *;
*----------------------------------------------*;

* change the 0 frequencies to a negligible non-zero value ;

data rate ;
set rate ;
if f = 0 then f = .0000000001 ;
run;

```
For comparison, we first see what SAS reports if we don't apply category formats:
```
*  see what happens by default ;

title "Example 2a:  Frequency Input" ;
title2 "Default: Rows/Columns Ordered by Category Values";
title3 "Correct Kappa but Incorrect Weighted Kappa!";

proc freq data = rate ;
weight f;
tables rater1*rater2 / agree norow nocol;
run;

```

Here is the output produced by the commands above:

```
Example 2a:  Frequency Input
Default: Rows/Columns Ordered by Category Values
Correct Kappa but Incorrect Weighted Kappa!

rater1     rater2

Frequency|
Percent  |Hi      |Lo      |Med     |  Total
---------+--------+--------+--------+
Hi       |     28 |      8 |     12 |     48
---------+--------+--------+--------+
Lo       |  1E-10 |  1E-10 |  1E-10 |  3E-10
---------+--------+--------+--------+
Med      |      3 |      5 |     16 |     24
---------+--------+--------+--------+
Total          31       13       28       72

Kappa Statistics

Statistic          Value       ASE     95% Confidence Limits
------------------------------------------------------------
Simple Kappa      0.3333    0.0814       0.1738       0.4929
Weighted Kappa    0.3944    0.0917       0.2146       0.5741
```

Now let's do things the right way. First we create a format that assigns our categories to numbers. Then we refer to the format in proc freq:

```
* define category intervals using a format ;

proc format ;
value $rate  'Lo'  = 1
             'Med' = 2
             'Hi'  = 3 ;
run;

* calculate kappa and weighted kappa using formatted values ;

title "Example 2b:  Frequency Input" ;
title2 "Order Rows/Columns by Formatted Values" ;
proc freq data = rate order=formatted ;
format rater1 rater2 $rate. ;
weight f;
tables rater1*rater2 / agree norow nocol;
run;

```
Here is the output produced by the above. Note that the value of kappa is the same, but the value of weighted kappa is now correct:
```
Example 2b:  Frequency Input
Order Rows/Columns by Formatted Values

Table of rater1 by rater2

rater1     rater2

Frequency|
Percent  |1       |2       |3       |  Total
---------+--------+--------+--------+
1        |  1E-10 |  1E-10 |  1E-10 |  3E-10
---------+--------+--------+--------+
2        |      5 |     16 |      3 |     24
---------+--------+--------+--------+
3        |      8 |     12 |     28 |     48
---------+--------+--------+--------+
Total          13       28       31       72

Kappa Statistics

Statistic          Value       ASE     95% Confidence Limits
------------------------------------------------------------
Simple Kappa      0.3333    0.0814       0.1738       0.4929
Weighted Kappa    0.2895    0.0756       0.1414       0.4376
```
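The ordering effect can be checked by hand (here in Python rather than SAS). Applying the Cicchetti-Allison weights, 1 - |i-j|/(c-1), that SAS uses by default to the same frequencies under the two row/column orderings reproduces both the incorrect Example 2a value and the correct Example 2b value; the 1E-10 pseudo-frequencies are treated as exact zeros:

```python
# Reproduce Example 2's weighted kappa under both category orderings.

def weighted_kappa(table):
    """Weighted kappa with Cicchetti-Allison weights (SAS's default)."""
    c = len(table)
    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(c)) for j in range(c)]
    w = [[1 - abs(i - j) / (c - 1) for j in range(c)] for i in range(c)]
    po = sum(w[i][j] * table[i][j] for i in range(c) for j in range(c)) / n
    pe = sum(w[i][j] * row[i] * col[j] for i in range(c) for j in range(c)) / n**2
    return (po - pe) / (1 - pe)

# True order Lo < Med < Hi (Example 2b):
true_order = [[ 0,  0,  0],    # rater1 = Lo
              [ 5, 16,  3],    # rater1 = Med
              [ 8, 12, 28]]    # rater1 = Hi

# Alphabetical order Hi < Lo < Med (Example 2a):
alpha_order = [[28,  8, 12],   # rater1 = Hi
               [ 0,  0,  0],   # rater1 = Lo
               [ 3,  5, 16]]   # rater1 = Med

print(round(weighted_kappa(alpha_order), 4))  # 0.3944 (incorrect)
print(round(weighted_kappa(true_order), 4))   # 0.2895 (correct)
```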


(c) 2000-2009 John Uebersax PhD

Last revised: 20 July 2002