How to read SPSS files with duplicated levels in some variables?

While trying to read an SPSS file with the read.spss function from the foreign package, you may get an error similar to:

Error in `levels<-`(`*tmp*`, value = if (nl == nL)
as.character(labels) else paste0(labels, :
factor level [67] is duplicated

This error happens when a categorical variable has repeated labels. For instance, in round 5 of the Afrobarometer data, the variable holding the names of country regions has a different level (an integer code) for each region, but some labels (the region names) are repeated because many countries have regions named “East”, “West”, “South”, “North”, etc. To avoid the error, I call read.spss with the argument use.value.labels set to FALSE and then convert the SPSS categorical variables into R factors with a custom function, Int2Factor:

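# Convert a variable read with use.value.labels = FALSE into a factor,
# unless its value labels contain duplicated codes or duplicated labels.
Int2Factor <- function(x)
{
    if(!is.null(attr(x, "value.labels"))){
        vlab <- attr(x, "value.labels")
        if(sum(duplicated(vlab)) > 0)
            cat("Duplicated levels:", vlab, "\n")
        else if(sum(duplicated(names(vlab))) > 0)
            cat("Duplicated labels:",
                names(vlab)[duplicated(names(vlab))], "\n")
        else
            x <- factor(x, levels = as.numeric(vlab),
                        labels = names(vlab))
    }
    x  # variables that were not converted are returned unchanged
}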

library(foreign)  # provides read.spss

a <- read.spss("merged_r5_data_0.sav", use.value.labels = FALSE)
a <- lapply(a, Int2Factor)
a <- as.data.frame(a, stringsAsFactors = FALSE)
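
If a variable with repeated labels should still become a factor, one option is to make the labels unique before calling factor. The following is only a minimal sketch on toy data: the region vector is a made-up stand-in for a variable read with use.value.labels = FALSE, and Int2FactorUnique is a hypothetical helper, not part of the foreign package. It relies on make.unique from base R, which appends ".1", ".2", etc. to repeated strings.

```r
# Toy stand-in for a variable returned by read.spss(use.value.labels = FALSE):
# numeric codes plus a "value.labels" attribute whose names are the labels.
region <- c(1, 2, 3, 2)
attr(region, "value.labels") <- c(East = 1, West = 2, East = 3)

# Hypothetical helper (an assumption, not part of foreign): converts the
# variable to a factor after disambiguating repeated labels.
Int2FactorUnique <- function(x)
{
    vlab <- attr(x, "value.labels")
    if(is.null(vlab))
        return(x)  # no value labels: leave the variable unchanged
    # make.unique() turns c("East", "West", "East") into
    # c("East", "West", "East.1"), so every code keeps its own level
    factor(x, levels = as.numeric(vlab),
           labels = make.unique(names(vlab)))
}

f <- Int2FactorUnique(region)
levels(f)
as.character(f)
```

The suffixes only keep the levels apart; they do not say which country a region belongs to. For real data it would probably be better to build the labels from the country and the region name together.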

I have only noticed this error in R 3.4.0, and it might be fixed soon. Meanwhile, the above code might be useful to you.


The effect of exposure to democratic institutions on tolerance: Brazil compared with other Latin American countries

I presented this paper on April 5, 2017, during the joint conference Citizens and the State: Public Opinion, Democracy, and Development in Brazil. The paper is a work in progress, and the section on Brazil is still in embryonic form. The R scripts to replicate the results are attached to the PDF of the full paper.