How to read SPSS files with duplicated levels in some variables?

While trying to read an SPSS file with the read.spss function from the foreign package, you may get an error similar to:

Error in `levels<-`(`*tmp*`, value = if (nl == nL)
as.character(labels) else paste0(labels, :
factor level [67] is duplicated

This error happens when a categorical variable has repeated value labels. For instance, in round 5 of the Afrobarometer data, the variable holding the names of country regions has a distinct level (an integer code) for each region, but some labels (the region names) are repeated because many countries have regions named “East”, “West”, “South”, “North”, etc. To avoid the error, I call read.spss with the argument use.value.labels set to FALSE and then convert the SPSS categorical variables into R factors with a custom function, Int2Factor:
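The following minimal sketch mimics such a variable with made-up region codes and names. It assumes, as Int2Factor does, that read.spss(..., use.value.labels = FALSE) keeps the value labels in a "value.labels" attribute whose values are the numeric codes and whose names are the labels. It shows that the codes are all distinct while one label is repeated.

```r
# Toy stand-in (made-up data) for a region variable read with
# use.value.labels = FALSE: numeric codes plus a "value.labels"
# attribute whose values are the codes and whose names are the labels.
region <- c(101, 102, 201, 202, 101)
attr(region, "value.labels") <- c("East" = 101, "West" = 102,
                                  "East" = 201, "North" = 202)

vlab <- attr(region, "value.labels")

# The codes themselves are unique...
print(any(duplicated(vlab)))

# ...but two different codes share the label "East", which is the
# situation that triggers the "factor level [..] is duplicated" error.
dup_labels <- unique(names(vlab)[duplicated(names(vlab))])
print(dup_labels)
```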

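# Convert a variable read with use.value.labels = FALSE into a factor,
# using its "value.labels" attribute (values = codes, names = labels).
# If codes or labels are duplicated, report them and return x unchanged.
Int2Factor <- function(x)
{
    if(!is.null(attr(x, "value.labels"))){
        vlab <- attr(x, "value.labels")
        if(any(duplicated(vlab)))
            cat("Duplicated levels:", vlab, "\n")
        else if(any(duplicated(names(vlab))))
            cat("Duplicated labels:",
                names(vlab)[duplicated(names(vlab))], "\n")
        else
            x <- factor(x, levels = as.numeric(vlab),
                        labels = names(vlab))
    }
    x
}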

library(foreign)  # provides read.spss()

a <- read.spss("merged_r5_data_0.sav", use.value.labels = FALSE)
a <- lapply(a, Int2Factor)  # convert each labelled variable
a <- as.data.frame(a, stringsAsFactors = FALSE)
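The error message only reports the position of the duplicated level (e.g. “factor level [67]”), not the variable it belongs to. A small helper can scan the list returned by read.spss(..., use.value.labels = FALSE) before conversion and name the offending variables. The function vars_with_dup_labels and the toy data below are hypothetical, written for illustration under the same assumption about the "value.labels" attribute.

```r
# Hypothetical helper (not part of foreign): return the names of the
# variables whose value labels contain repeated label names.
vars_with_dup_labels <- function(lst) {
  has_dup <- vapply(lst, function(x) {
    vlab <- attr(x, "value.labels")
    !is.null(vlab) && any(duplicated(names(vlab)))
  }, logical(1))
  names(lst)[has_dup]
}

# Made-up data mimicking two SPSS variables.
q1 <- c(1, 2, 1)
attr(q1, "value.labels") <- c("Yes" = 1, "No" = 2)
region <- c(101, 201, 102)
attr(region, "value.labels") <- c("East" = 101, "West" = 102, "East" = 201)

toy <- list(q1 = q1, region = region)
print(vars_with_dup_labels(toy))
```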

I have only noticed this error in R 3.4.0, and it might be fixed soon. Meanwhile, the code above may be useful to you.
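If you would rather keep variables with repeated labels as factors, one possible alternative (not the method used in this post) is to make the repeated labels unique by appending their codes. Int2FactorUnique is a hypothetical name, the data are made up, and the same "value.labels" structure is assumed.

```r
# Hypothetical variant of Int2Factor: append the code to every label
# that is shared by more than one code, so each code keeps its own
# unique factor level.
Int2FactorUnique <- function(x) {
  vlab <- attr(x, "value.labels")
  if (is.null(vlab) || any(duplicated(vlab)))
    return(x)  # no labels, or duplicated codes: leave x untouched
  labs <- names(vlab)
  rep_lab <- labs %in% labs[duplicated(labs)]
  labs[rep_lab] <- paste0(labs[rep_lab], " (", vlab[rep_lab], ")")
  factor(x, levels = as.numeric(vlab), labels = labs)
}

region <- c(101, 201, 102)
attr(region, "value.labels") <- c("East" = 101, "West" = 102, "East" = 201)
f <- Int2FactorUnique(region)
print(levels(f))
print(as.character(f))
```

Each region then keeps its own level, at the cost of labels such as “East (101)” that need the codebook to be interpreted.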


5 thoughts on “How to read SPSS files with duplicated levels in some variables?”

  1. Thanks, this saved me a lot of confusion. I don’t think this “error in 3.4.0” is going to be fixed, however. I read a thread stating that duplicated factor levels had been deprecated since 2009, and they have finally changed it from a warning to an error. Since I receive SPSS sav files from others (I am not an SPSS user), I don’t think I can go into the SPSS sav file and “correct” this duplication (nor can I tell *where* the problem is: in your example, which variable corresponds to “factor level [67]”?). So your function was very helpful… and will be in the future.

  2. Another big vote of thanks. I can see why the R core team made this change, but since I receive SPSS files from others that have duplicated factor levels in them, sorting this out myself was going to be a real pain!
