How to read SPSS files with duplicated levels in some variables?

While trying to read an SPSS file with the read.spss function (from the foreign package), you may get an error similar to:

Error in `levels<-`(`*tmp*`, value = if (nl == nL)
as.character(labels) else paste0(labels, :
factor level [67] is duplicated

This error happens when a categorical variable has repeated labels. For instance, in round 5 of the Afrobarometer data, the variable holding the names of country regions has a different level (an integer code) for each region, but some labels (the region names) are repeated because many countries have regions named “East”, “West”, “South”, “North”, etc. To avoid the error, I call read.spss with the argument use.value.labels set to FALSE and then convert the SPSS categorical variables into R factors with a custom function, Int2Factor. It converts a variable only when its codes and labels are all unique; otherwise it reports the duplicates and leaves the variable unchanged:

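Int2Factor <- function(x)
{
    # Variables read with use.value.labels = FALSE keep their SPSS labels
    # in the "value.labels" attribute (names are labels, values are codes).
    if(!is.null(attr(x, "value.labels"))){
        vlab <- attr(x, "value.labels")
        if(anyDuplicated(vlab) > 0)
            # Repeated codes: report them and leave x unchanged
            cat("Duplicated levels:", vlab[duplicated(vlab)], "\n")
        else if(anyDuplicated(names(vlab)) > 0)
            # Repeated labels: report them and leave x unchanged
            cat("Duplicated labels:",
                names(vlab)[duplicated(names(vlab))], "\n")
        else
            x <- factor(x, levels = as.numeric(vlab),
                        labels = names(vlab))
    }
    x
}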

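library(foreign)  # provides read.spss

# Read the raw codes; read.spss returns a list by default
a <- read.spss("merged_r5_data_0.sav", use.value.labels = FALSE)
# Convert each variable that has unique codes and labels into a factor
a <- lapply(a, Int2Factor)
a <- as.data.frame(a, stringsAsFactors = FALSE)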
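If you want to know in advance which variables have repeated labels, you can inspect the "value.labels" attribute directly. The following is only a sketch: the list toy and its variables are made up to stand in for what read.spss returns with use.value.labels = FALSE, and has_dup_labels and affected are names I chose for illustration.

```r
# Toy list standing in for the output of read.spss(use.value.labels = FALSE).
# Each labelled variable carries a "value.labels" attribute whose names are
# the labels and whose values are the codes.
toy <- list(
    region = structure(c(1, 3, 2), value.labels = c(East = 1, West = 2, East = 3)),
    gender = structure(c(1, 2, 1), value.labels = c(Male = 1, Female = 2)),
    age    = c(34, 51, 27)  # no value labels at all
)

# TRUE for variables whose value labels contain repeats
has_dup_labels <- vapply(toy, function(x) {
    vlab <- attr(x, "value.labels")
    !is.null(vlab) && anyDuplicated(names(vlab)) > 0
}, logical(1))

# Names of the variables that Int2Factor would leave unconverted
affected <- names(toy)[has_dup_labels]
affected
```

With real data, the same vapply call can be applied to the list returned by read.spss before calling Int2Factor.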

I have only noticed this error in R 3.4.0, and it might be fixed soon. Meanwhile, the above code might be useful to you.
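Int2Factor leaves variables with repeated labels as plain codes. If you would rather get a factor for those variables too, one possible variation is to make the repeated labels unique by appending the code to them. This is a sketch under my own assumptions, not part of the original function: the name Int2FactorUnique and the "label (code)" format are hypothetical, and the toy variable only mimics what read.spss returns.

```r
# Hypothetical variant of Int2Factor: instead of skipping variables with
# repeated labels, append the numeric code to every repeated label.
Int2FactorUnique <- function(x)
{
    vlab <- attr(x, "value.labels")
    # No labels, or repeated codes: leave the variable untouched
    if(is.null(vlab) || anyDuplicated(vlab) > 0)
        return(x)
    labs <- names(vlab)
    # Mark every label that occurs more than once (first occurrence included)
    repeated <- labs %in% labs[duplicated(labs)]
    labs[repeated] <- paste0(labs[repeated], " (", vlab[repeated], ")")
    factor(x, levels = as.numeric(vlab), labels = labs)
}

# Toy variable mimicking the output of read.spss(use.value.labels = FALSE)
region <- c(1, 3, 2, 4)
attr(region, "value.labels") <- c(East = 1, West = 2, East = 3, North = 4)

f <- Int2FactorUnique(region)
levels(f)
```

The sketch assumes that no existing label already looks like "East (1)"; if that could happen in your data, a further pass with make.unique on the labels would be one way to guard against it.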
