DT1 row selection broken after DT2 = DT1[!b%in%x], setkey(DT2,a) #5230

@webbp

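This issue is difficult to describe, so the steps below show it directly: after DT2 is created as a row subset of DT1 and setkey(DT2,a) is called, row selection on DT1 by column b returns the wrong rows. Installing and using the development build of data.table does not fix the bug.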

# Minimal reproducible example

Steps to reproduce. Requires DT1.tsv.gz

R --vanilla

library(data.table)
DT1 = fread('DT1.tsv')
DT2 = DT1[!b%in%c('qm27','qm29')] # to reproduce the bug, column b must contain no occurrences of these values
# using DT2 = copy(DT1[!b%in%c('qm27','qm29')]) instead avoids the bug
indices(DT1) # "b"; caused by the previous row selection and assignment
nrow(DT1[b=='qm105']) # 133705 (correct)
# adding setindex(DT1,NULL) here avoids the bug
# adding setindex(DT1,NULL); setindex(DT1,b) here has no effect; the bug still occurs
setkey(DT2,a)
nrow(DT1[b=='qm105']) # 1 (incorrect)
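
For readers without DT1.tsv.gz, the same sequence can be sketched on synthetic data. This is only an illustration: the table contents, the helper names `make_dt` and `count_both`, and the value "qm101" are hypothetical. Whether the mismatch appears may depend on the data.table version and on the data, so no expected output is claimed for the first variant. The helper compares the count from `[` row selection with a plain vector scan of column b; a difference between the two would indicate the behaviour described above.

```r
library(data.table)

# Hypothetical synthetic stand-in for DT1.tsv (the real file is not reproduced here).
make_dt <- function(n = 1000L) {
  set.seed(1)
  data.table(
    a = sample(n),  # unsorted, so setkey() has to reorder rows
    b = sample(c("qm101", "qm105", "qm110"), n, replace = TRUE)
  )
}

# Count matching rows two ways: through data.table row selection and
# through a plain vector scan of the column.
count_both <- function(DT, value) {
  c(subset = nrow(DT[b == value]), scan = sum(DT$b == value))
}

# Variant 1: the sequence from the report. The filter removes no rows.
DT1 <- make_dt()
expected <- sum(DT1$b == "qm105")
DT2 <- DT1[!b %in% c("qm27", "qm29")]
before <- count_both(DT1, "qm105")
setkey(DT2, a)
after <- count_both(DT1, "qm105")
print(rbind(before, after))  # a subset/scan mismatch in `after` signals the bug

# Variant 2: the copy() workaround mentioned in the report, on a fresh table.
DT1c <- make_dt()
DT2c <- copy(DT1c[!b %in% c("qm27", "qm29")])
setkey(DT2c, a)
after_copy <- count_both(DT1c, "qm105")
print(after_copy)
```

If the first variant shows a mismatch, the other workaround from the report, calling setindex(DT1,NULL) before setkey(DT2,a), can be checked the same way with `count_both`.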

# Output of sessionInfo()

R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.14.3

loaded via a namespace (and not attached):
[1] bit_4.0.4      compiler_4.1.0 bit64_4.0.5

Also reproduced on a different machine, with a different OS, R version, and data.table version:

R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin20.3.0 (64-bit)
Running under: macOS Big Sur 11.5.1

Matrix products: default
BLAS/LAPACK: /opt/local/lib/libopenblas-r1.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.14.0

loaded via a namespace (and not attached):
[1] compiler_4.0.4
