Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 6 additions & 8 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -95,7 +95,7 @@

15. `DT[order(col)[1:5], ...]` (i.e. where `i` is a compound expression involving `order()`) is now optimized to use `data.table`'s multithreaded `forder`, [#1921](https://github.com/Rdatatable/data.table/issues/1921). This example is not a fully optimal top-N query since the full ordering is still computed. The improvement is that the call to `order()` is computed faster for any `i` expression using `order`.

16. `as.data.table` now unpacks columns in a `data.frame` which are themselves a `data.frame`. This need arises when parsing JSON, a corollary in [#3369](https://github.com/Rdatatable/data.table/issues/3369#issuecomment-462662752). `data.table` does not allow columns to be objects which themselves have columns (such as `matrix` and `data.frame`), unlike `data.frame` which does. Bug fix 19 in v1.12.2 (see below) added a helpful error (rather than segfault) to detect such invalid `data.table`, and promised that `as.data.table()` would unpack these columns in the next release (i.e. this release) so that the invalid `data.table` is not created in the first place.
16. `as.data.table` now unpacks columns in a `data.frame` which are themselves a `data.frame` or `matrix`. This need arises when parsing JSON, a corollary in [#3369](https://github.com/Rdatatable/data.table/issues/3369#issuecomment-462662752). Bug fix 19 in v1.12.2 (see below) added a helpful error (rather than segfault) to detect such invalid `data.table`, and promised that `as.data.table()` would unpack these columns in the next release (i.e. this release) so that the invalid `data.table` is not created in the first place. Further, `setDT` now warns if it observes such columns and suggests using `as.data.table` instead, [#3760](https://github.com/Rdatatable/data.table/issues/3760).

17. `CJ` has been ported to C and parallelized, thanks to a PR by Michael Chirico, [#3596](https://github.com/Rdatatable/data.table/pull/3596). All types benefit, but, as in many `data.table` operations, factors benefit more than character.

Expand Down Expand Up @@ -272,17 +272,15 @@

32. `x[, round(.SD, 1)]` and similar operations on the whole of `.SD` could return a locked result, incorrectly preventing `:=` on the result, [#2245](https://github.com/Rdatatable/data.table/issues/2245). Thanks @grayskripko for raising.

33. `setDT` now detects and halts if passed a `data.frame` containing matrix columns, [#3760](https://github.com/Rdatatable/data.table/issues/3760). Such objects cannot be converted to `data.table` by reference. The message suggests using `as.data.table` instead which will convert each matrix column to a new `data.table` column.
33. Using `get`/`mget` in `j` could cause `.SDcols` to be ignored or reordered, [#1744](https://github.com/Rdatatable/data.table/issues/1744), [#1965](https://github.com/Rdatatable/data.table/issues/1965), and [#2036](https://github.com/Rdatatable/data.table/issues/2036). Thanks @franknarf1, @MichaelChirico, and @TonyBonen, for the reports.

34. Using `get`/`mget` in `j` could cause `.SDcols` to be ignored or reordered, [#1744](https://github.com/Rdatatable/data.table/issues/1744), [#1965](https://github.com/Rdatatable/data.table/issues/1965), and [#2036](https://github.com/Rdatatable/data.table/issues/2036). Thanks @franknarf1, @MichaelChirico, and @TonyBonen, for the reports.
34. `DT[, i-1L, with=FALSE]` would misinterpret the minus sign and return an incorrect result, [#2019](https://github.com/Rdatatable/data.table/issues/2109). Thanks @cguill95 for the report.

35. `DT[, i-1L, with=FALSE]` would misinterpret the minus sign and return an incorrect result, [#2019](https://github.com/Rdatatable/data.table/issues/2109). Thanks @cguill95 for the report.
35. `DT[id==1, DT2[.SD, on="id"]]` (i.e. joining from `.SD` in `j`) could incorrectly fail in some cases due to `.SD` being locked, [#1926](https://github.com/Rdatatable/data.table/issues/1926), and when updating-on-join with factors [#3559](https://github.com/Rdatatable/data.table/issues/3559) [#2099](https://github.com/Rdatatable/data.table/issues/2099). Thanks @franknarf1 and @Henrik-P for the reports and for diligently tracking use cases for almost 3 years!

36. `DT[id==1, DT2[.SD, on="id"]]` (i.e. joining from `.SD` in `j`) could incorrectly fail in some cases due to `.SD` being locked, [#1926](https://github.com/Rdatatable/data.table/issues/1926), and when updating-on-join with factors [#3559](https://github.com/Rdatatable/data.table/issues/3559) [#2099](https://github.com/Rdatatable/data.table/issues/2099). Thanks @franknarf1 and @Henrik-P for the reports and for diligently tracking use cases for almost 3 years!
36. `as.IDate.POSIXct` returned `NA` for UTC times before Dec 1901 and after Jan 2038, [#3780](https://github.com/Rdatatable/data.table/issues/3780). Thanks @gschett for the report.

37. `as.IDate.POSIXct` returned `NA` for UTC times before Dec 1901 and after Jan 2038, [#3780](https://github.com/Rdatatable/data.table/issues/3780). Thanks @gschett for the report.

38. `rbindlist` now returns correct idcols for lists with different length vectors, [#3785](https://github.com/Rdatatable/data.table/issues/3785), [#3786](https://github.com/Rdatatable/data.table/pull/3786). Thanks to @shrektan for the report and fix.
37. `rbindlist` now returns correct idcols for lists with different length vectors, [#3785](https://github.com/Rdatatable/data.table/issues/3785), [#3786](https://github.com/Rdatatable/data.table/pull/3786). Thanks to @shrektan for the report and fix.

#### NOTES

Expand Down
5 changes: 4 additions & 1 deletion R/as.data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -213,7 +213,7 @@ as.data.table.data.frame = function(x, keep.rownames=FALSE, key=NULL, ...) {
setnames(ans, 'rn', keep.rownames[1L])
return(ans)
}
if (any(!sapply(x,is.atomic))) {
if (any(vapply_1i(x, function(xi) length(dim(xi))))) { # not is.atomic because is.atomic(matrix) is true
# a data.frame with a column that is data.frame needs to be expanded; test 2013.4
return(as.data.table.list(x, keep.rownames=keep.rownames, ...))
}
Expand All @@ -234,6 +234,9 @@ as.data.table.data.frame = function(x, keep.rownames=FALSE, key=NULL, ...) {

as.data.table.data.table = function(x, ...) {
# as.data.table always returns a copy, automatically takes care of #473
if (any(vapply_1i(x, function(xi) length(dim(xi))))) { # for test 2089.2
return(as.data.table.list(x, ...))
}
x = copy(x) # #1681
# fix for #1078 and #1128, see .resetclass() for explanation.
setattr(x, 'class', .resetclass(x, "data.table"))
Expand Down
12 changes: 8 additions & 4 deletions R/data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -2629,10 +2629,11 @@ setDT = function(x, keep.rownames=FALSE, key=NULL, check.names=FALSE) {
stop("Cannot convert '", cname, "' to data.table by reference because binding is locked. It is very likely that '", cname, "' resides within a package (or an environment) that is locked to prevent modifying its variable bindings. Try copying the object to your current environment, ex: var <- copy(var) and then using setDT again.")
}
}
# check no matrix-like columns, #3760
colndim = vapply_1i(x, function(xi) length(dim(xi)))
if (any(idx <- colndim > 1L)) {
stop("Some columns are a multi-column type (such as a matrix column): ", brackify(which(idx)),". These cannot be converted to data.table by reference. Please use as.data.table() instead which will create a new column for each embedded column.")
# check no matrix-like columns, #3760. Other than a single list(matrix) is unambiguous and depended on by some revdeps, #3581
if (length(x)>1L) {
idx = vapply_1i(x, function(xi) length(dim(xi)))>1L
if (any(idx))
warning("Some columns are a multi-column type (such as a matrix column): ", brackify(which(idx)),". setDT will retain these columns as-is but subsequent operations like grouping and joining may fail. Please consider as.data.table() instead which will create a new column for each embedded column.")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for me it sounds better to raise error, and ask revdep maintainer for update to as.data.table

Copy link
Copy Markdown
Member Author

@mattdowle mattdowle Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but revdeps are just a proxy for the (larger) outside-CRAN usage. The goal is to minimize required-communication with revdep maintainers so that we communicate changes via the messages and warnings wherever possible. In this case, there is no change to behavior but we went from nothing straight to error. The warning is a softer first step. That's my current thinking anyway.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also was a bit hesitant about warning initially, but actually you can get this error from data.table() and as.data.table() since both call as.data.table.list which works by running setDT at the end.

in the case of data.table(1D-array) it would require overhauling how as.data.table.array works or building a new branch in as.data.table.list for the vector-embedded-in-higher-dimensions case

(which maybe we could/should do but perhaps file for another day?)

}
if (is.data.table(x)) {
# fix for #1078 and #1128, see .resetclass() for explanation.
Expand All @@ -2652,6 +2653,9 @@ setDT = function(x, keep.rownames=FALSE, key=NULL, check.names=FALSE) {
x[, (nm[1L]) := rn]
setcolorder(x, nm)
}
} else if (is.list(x) && length(x)==1L && is.matrix(x[[1L]])) {
# a single list(matrix) is unambiguous and depended on by some revdeps, #3581
x = as.data.table.matrix(x[[1L]])
} else if (is.null(x) || (is.list(x) && !length(x))) {
x = null.data.table()
} else if (is.list(x)) {
Expand Down
4 changes: 2 additions & 2 deletions R/print.data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -141,8 +141,8 @@ format.data.table = function (x, ..., justify="none", timezone = FALSE) {
x[idx] = paste0(substr(x[idx], 1L, as.integer(trunc.char)), "...")
x
}
do.call("cbind",lapply(x,function(col,...){
if (!is.null(dim(col))) stop("Invalid column: it has dimensions. Can't format it. If it's the result of data.table(table()), use as.data.table(table()) instead.")
do.call("cbind",lapply(x,function(col,...) {
if (!is.null(dim(col))) return("<multi-column>")
if(timezone) col = format.timezone(col)
if (is.list(col)) col = vapply_1c(col, format.item)
else col = format(char.trunc(col), justify=justify, ...) # added an else here to fix #5435
Expand Down
28 changes: 22 additions & 6 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -2949,12 +2949,14 @@ tab2 <- as.data.table(as.data.frame(table(DF$x, DF$y), stringsAsFactors=FALSE));
tab3 <- as.data.table(as.data.frame(table(DF$x, DF$y, DF$z), stringsAsFactors=FALSE)); setattr(tab3, 'names', c("V1", "V2", "V3", "N"))
test(1025, as.data.table(table(DF$x)), tab1)
test(1026, as.data.table(table(DF$x, DF$y)), tab2)
test(1027, as.data.table(table(DF$x, DF$y, DF$z)), tab3)
test(1027.1, as.data.table(table(DF$x, DF$y, DF$z)), tab3)
# catch printing of data.table(table()), #4847 (as.data.table should be used instead)
# new, updated 14th Feb, 2015. data.table(table) now redirects to as.data.table
test(1027.1, data.table(table(1:99)), as.data.table(table(1:99)))
# data.table() and rbindlist() in v1.8.11 now catch and removes the dim attribute. For it on to test print catches it :
test(1027.2, {DT<-data.table(table(1:99));setattr(DT[[1]],"dim",99L);print(DT,nrows=100)}, error="Invalid column: it has dimensions. Can't format it. If it's the result of data.table(table()), use as.data.table(table()) instead.")
test(1027.2, data.table(table(1:99)), as.data.table(table(1:99)))
# data.table() and rbindlist() in v1.8.11 caught and removed the dim attribute.
# from 1.12.4 the dim attribute is preserved and the print method outputs "multi-column" for such columns (better than error)
test(1027.3, {DT<-data.table(table(1:99));setattr(DT[[1]],"dim",99L);print(DT,nrows=100)},
output="<multi-column>")

# as.data.table.x where x is integer, numeric, etc...
set.seed(45)
Expand Down Expand Up @@ -15836,10 +15838,24 @@ test(2087.3, setkey(x, a, verbose=TRUE), data.table(a=paste0(1:2), b=paste0(1:2)
x = data.table(a=c(0.85, -0.38, 1.19), b=c(0.56, 0.63, -1.30))
test(2088, x[, round(.SD, 1)][, c:=8.88], data.table(a=c(.8, -.4, 1.2), b=c(.6,.6,-1.3), c=8.88))

# setDT should halt when it sees matrix columns, #3760
# setDT should warn when it sees matrix columns, #3760
DF = data.frame(a=1:5)
DF$m = matrix(6:15, ncol=2L)
test(2089, setDT(DF), error='Some columns are a multi-column type.*Please use as.data.table')
test(2089.1, names(setDT(DF)), c("a","m"),
warning="Some columns are a multi-column type.*[[]2[]].*setDT will retain these columns as-is but.*Please consider as.data.table")
test(2089.2, as.data.table(DF), data.table(a=1:5, m.V1=6:10, m.V2=11:15)) # the DF here is now a data.table, so as.data.table.data.table() is matrix-column-aware
test(2089.3, print(DF), output="<multi-column>")
DF = data.frame(a=1:5)
DF$m = matrix(6:15, ncol=2L, dimnames=list(NULL, c("foo","bar")))
test(2089.4, colnames(DF), c("a","m"))
test(2089.5, colnames(DF$m), c("foo","bar"))
test(2089.6, as.data.table(DF), data.table(a=1:5, m.foo=6:10, m.bar=11:15))

# a single matrix item is unambiguous and different to a matrix embedded alongside other items
M = list(matrix(1:6,ncol=1L))
test(2089.7, setDT(M), data.table(V1=1:6)) # worked in 1.12.2 and miceFast relies on, #3581
M = list(matrix(1:6,ncol=2L))
test(2089.8, setDT(M), data.table(V1=1:3, V2=4:6)) # returned a single column in v1.12.2, now from 1.12.4 returns 2 columns as expected

# get/mget in j should restore all columns to .SD, #1744 and #1965
DT = data.table(a=1, b=2, c=3)
Expand Down
2 changes: 1 addition & 1 deletion revdep.R
Original file line number Diff line number Diff line change
Expand Up @@ -193,8 +193,8 @@ log = function(bioc=FALSE, fnam="~/fail.log") {
x = c(.fail.cran, if (bioc) .fail.bioc)
cat("Writing 00check.log for",length(x),"packages to",fnam,":\n")
cat(paste(x,collapse=" "), "\n")
require(BiocManager) # to ensure Bioc version is included in attached packages sessionInfo. It includes the minor version this way; e.g. 1.30.4
cat(capture.output(sessionInfo()), "\n", file=fnam, sep="\n")
cat(capture.output(BiocManager::install(), type="message"), "\n", file=fnam, sep="\n", append=TRUE)
for (i in x) {
system(paste0("ls | grep '",i,".*tar.gz' >> ",fnam))
system(paste0("grep -H . ./",i,".Rcheck/00check.log >> ",fnam))
Expand Down
10 changes: 6 additions & 4 deletions src/assign.c
Original file line number Diff line number Diff line change
Expand Up @@ -243,11 +243,13 @@ SEXP alloccolwrapper(SEXP dt, SEXP overAllocArg, SEXP verbose) {
SEXP ans = PROTECT(alloccol(dt, length(dt)+overAlloc, LOGICAL(verbose)[0]));

for(R_len_t i = 0; i < LENGTH(ans); i++) {
// clear the same excluded by copyMostAttrib(). Primarily for data.table and as.data.table, but added here centrally (see #4890).

// clear names; also excluded by copyMostAttrib(). Primarily for data.table and as.data.table, but added here centrally (see #4890).
setAttrib(VECTOR_ELT(ans, i), R_NamesSymbol, R_NilValue);
setAttrib(VECTOR_ELT(ans, i), R_DimSymbol, R_NilValue);
setAttrib(VECTOR_ELT(ans, i), R_DimNamesSymbol, R_NilValue);

// But don't clear dim and dimnames. Because as from 1.12.4 we keep the matrix column as-is and ask user to use as.data.table to
// unpack matrix columns when they really need to; test 2089.2
// setAttrib(VECTOR_ELT(ans, i), R_DimSymbol, R_NilValue);
// setAttrib(VECTOR_ELT(ans, i), R_DimNamesSymbol, R_NilValue);
}

UNPROTECT(1);
Expand Down