diff --git a/NEWS.md b/NEWS.md index e3820812ee..58e9d6fbee 100644 --- a/NEWS.md +++ b/NEWS.md @@ -95,7 +95,7 @@ 15. `DT[order(col)[1:5], ...]` (i.e. where `i` is a compound expression involving `order()`) is now optimized to use `data.table`'s multithreaded `forder`, [#1921](https://github.com/Rdatatable/data.table/issues/1921). This example is not a fully optimal top-N query since the full ordering is still computed. The improvement is that the call to `order()` is computed faster for any `i` expression using `order`. -16. `as.data.table` now unpacks columns in a `data.frame` which are themselves a `data.frame`. This need arises when parsing JSON, a corollary in [#3369](https://github.com/Rdatatable/data.table/issues/3369#issuecomment-462662752). `data.table` does not allow columns to be objects which themselves have columns (such as `matrix` and `data.frame`), unlike `data.frame` which does. Bug fix 19 in v1.12.2 (see below) added a helpful error (rather than segfault) to detect such invalid `data.table`, and promised that `as.data.table()` would unpack these columns in the next release (i.e. this release) so that the invalid `data.table` is not created in the first place. +16. `as.data.table` now unpacks columns in a `data.frame` which are themselves a `data.frame` or `matrix`. This need arises when parsing JSON, a corollary in [#3369](https://github.com/Rdatatable/data.table/issues/3369#issuecomment-462662752). Bug fix 19 in v1.12.2 (see below) added a helpful error (rather than segfault) to detect such invalid `data.table`, and promised that `as.data.table()` would unpack these columns in the next release (i.e. this release) so that the invalid `data.table` is not created in the first place. Further, `setDT` now warns if it observes such columns and suggests using `as.data.table` instead, [#3760](https://github.com/Rdatatable/data.table/issues/3760). 17. `CJ` has been ported to C and parallelized, thanks to a PR by Michael Chirico, [#3596](https://github.com/Rdatatable/data.table/pull/3596). All types benefit, but, as in many `data.table` operations, factors benefit more than character. @@ -272,17 +272,15 @@ 32. `x[, round(.SD, 1)]` and similar operations on the whole of `.SD` could return a locked result, incorrectly preventing `:=` on the result, [#2245](https://github.com/Rdatatable/data.table/issues/2245). Thanks @grayskripko for raising. -33. `setDT` now detects and halts if passed a `data.frame` containing matrix columns, [#3760](https://github.com/Rdatatable/data.table/issues/3760). Such objects cannot be converted to `data.table` by reference. The message suggests using `as.data.table` instead which will convert each matrix column to a new `data.table` column. +33. Using `get`/`mget` in `j` could cause `.SDcols` to be ignored or reordered, [#1744](https://github.com/Rdatatable/data.table/issues/1744), [#1965](https://github.com/Rdatatable/data.table/issues/1965), and [#2036](https://github.com/Rdatatable/data.table/issues/2036). Thanks @franknarf1, @MichaelChirico, and @TonyBonen, for the reports. -34. Using `get`/`mget` in `j` could cause `.SDcols` to be ignored or reordered, [#1744](https://github.com/Rdatatable/data.table/issues/1744), [#1965](https://github.com/Rdatatable/data.table/issues/1965), and [#2036](https://github.com/Rdatatable/data.table/issues/2036). Thanks @franknarf1, @MichaelChirico, and @TonyBonen, for the reports. +34. `DT[, i-1L, with=FALSE]` would misinterpret the minus sign and return an incorrect result, [#2019](https://github.com/Rdatatable/data.table/issues/2109). Thanks @cguill95 for the report. -35. `DT[, i-1L, with=FALSE]` would misinterpret the minus sign and return an incorrect result, [#2019](https://github.com/Rdatatable/data.table/issues/2109). Thanks @cguill95 for the report. +35. `DT[id==1, DT2[.SD, on="id"]]` (i.e. joining from `.SD` in `j`) could incorrectly fail in some cases due to `.SD` being locked, [#1926](https://github.com/Rdatatable/data.table/issues/1926), and when updating-on-join with factors [#3559](https://github.com/Rdatatable/data.table/issues/3559) [#2099](https://github.com/Rdatatable/data.table/issues/2099). Thanks @franknarf1 and @Henrik-P for the reports and for diligently tracking use cases for almost 3 years! -36. `DT[id==1, DT2[.SD, on="id"]]` (i.e. joining from `.SD` in `j`) could incorrectly fail in some cases due to `.SD` being locked, [#1926](https://github.com/Rdatatable/data.table/issues/1926), and when updating-on-join with factors [#3559](https://github.com/Rdatatable/data.table/issues/3559) [#2099](https://github.com/Rdatatable/data.table/issues/2099). Thanks @franknarf1 and @Henrik-P for the reports and for diligently tracking use cases for almost 3 years! +36. `as.IDate.POSIXct` returned `NA` for UTC times before Dec 1901 and after Jan 2038, [#3780](https://github.com/Rdatatable/data.table/issues/3780). Thanks @gschett for the report. -37. `as.IDate.POSIXct` returned `NA` for UTC times before Dec 1901 and after Jan 2038, [#3780](https://github.com/Rdatatable/data.table/issues/3780). Thanks @gschett for the report. - -38. `rbindlist` now returns correct idcols for lists with different length vectors, [#3785](https://github.com/Rdatatable/data.table/issues/3785), [#3786](https://github.com/Rdatatable/data.table/pull/3786). Thanks to @shrektan for the report and fix. +37. `rbindlist` now returns correct idcols for lists with different length vectors, [#3785](https://github.com/Rdatatable/data.table/issues/3785), [#3786](https://github.com/Rdatatable/data.table/pull/3786). Thanks to @shrektan for the report and fix. #### NOTES diff --git a/R/as.data.table.R b/R/as.data.table.R index d9481dbebf..ba5785d9f5 100644 --- a/R/as.data.table.R +++ b/R/as.data.table.R @@ -213,7 +213,7 @@ as.data.table.data.frame = function(x, keep.rownames=FALSE, key=NULL, ...) { setnames(ans, 'rn', keep.rownames[1L]) return(ans) } - if (any(!sapply(x,is.atomic))) { + if (any(vapply_1i(x, function(xi) length(dim(xi))))) { # not is.atomic because is.atomic(matrix) is true # a data.frame with a column that is data.frame needs to be expanded; test 2013.4 return(as.data.table.list(x, keep.rownames=keep.rownames, ...)) } @@ -234,6 +234,9 @@ as.data.table.data.frame = function(x, keep.rownames=FALSE, key=NULL, ...) { as.data.table.data.table = function(x, ...) { # as.data.table always returns a copy, automatically takes care of #473 + if (any(vapply_1i(x, function(xi) length(dim(xi))))) { # for test 2089.2 + return(as.data.table.list(x, ...)) + } x = copy(x) # #1681 # fix for #1078 and #1128, see .resetclass() for explanation. setattr(x, 'class', .resetclass(x, "data.table")) diff --git a/R/data.table.R b/R/data.table.R index 3f97544372..5549563acc 100644 --- a/R/data.table.R +++ b/R/data.table.R @@ -2629,10 +2629,11 @@ setDT = function(x, keep.rownames=FALSE, key=NULL, check.names=FALSE) { stop("Cannot convert '", cname, "' to data.table by reference because binding is locked. It is very likely that '", cname, "' resides within a package (or an environment) that is locked to prevent modifying its variable bindings. Try copying the object to your current environment, ex: var <- copy(var) and then using setDT again.") } } - # check no matrix-like columns, #3760 - colndim = vapply_1i(x, function(xi) length(dim(xi))) - if (any(idx <- colndim > 1L)) { - stop("Some columns are a multi-column type (such as a matrix column): ", brackify(which(idx)),". These cannot be converted to data.table by reference. Please use as.data.table() instead which will create a new column for each embedded column.") + # check no matrix-like columns, #3760. Other than a single list(matrix) is unambiguous and depended on by some revdeps, #3581 + if (length(x)>1L) { + idx = vapply_1i(x, function(xi) length(dim(xi)))>1L + if (any(idx)) + warning("Some columns are a multi-column type (such as a matrix column): ", brackify(which(idx)),". setDT will retain these columns as-is but subsequent operations like grouping and joining may fail. Please consider as.data.table() instead which will create a new column for each embedded column.") } if (is.data.table(x)) { # fix for #1078 and #1128, see .resetclass() for explanation. @@ -2652,6 +2653,9 @@ setDT = function(x, keep.rownames=FALSE, key=NULL, check.names=FALSE) { x[, (nm[1L]) := rn] setcolorder(x, nm) } + } else if (is.list(x) && length(x)==1L && is.matrix(x[[1L]])) { + # a single list(matrix) is unambiguous and depended on by some revdeps, #3581 + x = as.data.table.matrix(x[[1L]]) } else if (is.null(x) || (is.list(x) && !length(x))) { x = null.data.table() } else if (is.list(x)) { diff --git a/R/print.data.table.R b/R/print.data.table.R index 6f822644fe..ac608133f2 100644 --- a/R/print.data.table.R +++ b/R/print.data.table.R @@ -141,8 +141,8 @@ format.data.table = function (x, ..., justify="none", timezone = FALSE) { x[idx] = paste0(substr(x[idx], 1L, as.integer(trunc.char)), "...") x } - do.call("cbind",lapply(x,function(col,...){ - if (!is.null(dim(col))) stop("Invalid column: it has dimensions. Can't format it. If it's the result of data.table(table()), use as.data.table(table()) instead.") + do.call("cbind",lapply(x,function(col,...) { + if (!is.null(dim(col))) return("") if(timezone) col = format.timezone(col) if (is.list(col)) col = vapply_1c(col, format.item) else col = format(char.trunc(col), justify=justify, ...) # added an else here to fix #5435 diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index 4a15a5b765..45e94a3bc7 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -2949,12 +2949,14 @@ tab2 <- as.data.table(as.data.frame(table(DF$x, DF$y), stringsAsFactors=FALSE)); tab3 <- as.data.table(as.data.frame(table(DF$x, DF$y, DF$z), stringsAsFactors=FALSE)); setattr(tab3, 'names', c("V1", "V2", "V3", "N")) test(1025, as.data.table(table(DF$x)), tab1) test(1026, as.data.table(table(DF$x, DF$y)), tab2) -test(1027, as.data.table(table(DF$x, DF$y, DF$z)), tab3) +test(1027.1, as.data.table(table(DF$x, DF$y, DF$z)), tab3) # catch printing of data.table(table()), #4847 (as.data.table should be used instead) # new, updated 14th Feb, 2015. data.table(table) now redirects to as.data.table -test(1027.1, data.table(table(1:99)), as.data.table(table(1:99))) -# data.table() and rbindlist() in v1.8.11 now catch and removes the dim attribute. For it on to test print catches it : -test(1027.2, {DT<-data.table(table(1:99));setattr(DT[[1]],"dim",99L);print(DT,nrows=100)}, error="Invalid column: it has dimensions. Can't format it. If it's the result of data.table(table()), use as.data.table(table()) instead.") +test(1027.2, data.table(table(1:99)), as.data.table(table(1:99))) +# data.table() and rbindlist() in v1.8.11 caught and removed the dim attribute. +# from 1.12.4 the dim attribute is preserved and the print method outputs "multi-column" for such columns (better than error) +test(1027.3, {DT<-data.table(table(1:99));setattr(DT[[1]],"dim",99L);print(DT,nrows=100)}, + output="") # as.data.table.x where x is integer, numeric, etc... set.seed(45) @@ -15836,10 +15838,24 @@ test(2087.3, setkey(x, a, verbose=TRUE), data.table(a=paste0(1:2), b=paste0(1:2) x = data.table(a=c(0.85, -0.38, 1.19), b=c(0.56, 0.63, -1.30)) test(2088, x[, round(.SD, 1)][, c:=8.88], data.table(a=c(.8, -.4, 1.2), b=c(.6,.6,-1.3), c=8.88)) -# setDT should halt when it sees matrix columns, #3760 +# setDT should warn when it sees matrix columns, #3760 DF = data.frame(a=1:5) DF$m = matrix(6:15, ncol=2L) -test(2089, setDT(DF), error='Some columns are a multi-column type.*Please use as.data.table') +test(2089.1, names(setDT(DF)), c("a","m"), + warning="Some columns are a multi-column type.*[[]2[]].*setDT will retain these columns as-is but.*Please consider as.data.table") +test(2089.2, as.data.table(DF), data.table(a=1:5, m.V1=6:10, m.V2=11:15)) # the DF here is now a data.table, so as.data.table.data.table() is matrix-column-aware +test(2089.3, print(DF), output="") +DF = data.frame(a=1:5) +DF$m = matrix(6:15, ncol=2L, dimnames=list(NULL, c("foo","bar"))) +test(2089.4, colnames(DF), c("a","m")) +test(2089.5, colnames(DF$m), c("foo","bar")) +test(2089.6, as.data.table(DF), data.table(a=1:5, m.foo=6:10, m.bar=11:15)) + +# a single matrix item is unambiguous and different to a matrix embedded alongside other items +M = list(matrix(1:6,ncol=1L)) +test(2089.7, setDT(M), data.table(V1=1:6)) # worked in 1.12.2 and miceFast relies on, #3581 +M = list(matrix(1:6,ncol=2L)) +test(2089.8, setDT(M), data.table(V1=1:3, V2=4:6)) # returned a single column in v1.12.2, now from 1.12.4 returns 2 columns as expected # get/mget in j should restore all columns to .SD, #1744 and #1965 DT = data.table(a=1, b=2, c=3) diff --git a/revdep.R b/revdep.R index d8e132844b..c76a430774 100644 --- a/revdep.R +++ b/revdep.R @@ -193,8 +193,8 @@ log = function(bioc=FALSE, fnam="~/fail.log") { x = c(.fail.cran, if (bioc) .fail.bioc) cat("Writing 00check.log for",length(x),"packages to",fnam,":\n") cat(paste(x,collapse=" "), "\n") + require(BiocManager) # to ensure Bioc version is included in attached packages sessionInfo. It includes the minor version this way; e.g. 1.30.4 cat(capture.output(sessionInfo()), "\n", file=fnam, sep="\n") - cat(capture.output(BiocManager::install(), type="message"), "\n", file=fnam, sep="\n", append=TRUE) for (i in x) { system(paste0("ls | grep '",i,".*tar.gz' >> ",fnam)) system(paste0("grep -H . ./",i,".Rcheck/00check.log >> ",fnam)) diff --git a/src/assign.c b/src/assign.c index acd44ce61d..a169931853 100644 --- a/src/assign.c +++ b/src/assign.c @@ -243,11 +243,13 @@ SEXP alloccolwrapper(SEXP dt, SEXP overAllocArg, SEXP verbose) { SEXP ans = PROTECT(alloccol(dt, length(dt)+overAlloc, LOGICAL(verbose)[0])); for(R_len_t i = 0; i < LENGTH(ans); i++) { - // clear the same excluded by copyMostAttrib(). Primarily for data.table and as.data.table, but added here centrally (see #4890). - + // clear names; also excluded by copyMostAttrib(). Primarily for data.table and as.data.table, but added here centrally (see #4890). setAttrib(VECTOR_ELT(ans, i), R_NamesSymbol, R_NilValue); - setAttrib(VECTOR_ELT(ans, i), R_DimSymbol, R_NilValue); - setAttrib(VECTOR_ELT(ans, i), R_DimNamesSymbol, R_NilValue); + + // But don't clear dim and dimnames. Because as from 1.12.4 we keep the matrix column as-is and ask user to use as.data.table to + // unpack matrix columns when they really need to; test 2089.2 + // setAttrib(VECTOR_ELT(ans, i), R_DimSymbol, R_NilValue); + // setAttrib(VECTOR_ELT(ans, i), R_DimNamesSymbol, R_NilValue); } UNPROTECT(1);