Changelog
vctrs 0.6.4
CRAN release: 2023-10-12
- Fixed a performance issue with vec_c() and ALTREP vectors (in particular, the new ALTREP list vectors in R-devel) (#1884).
- Fixed an issue with complex vector tests related to changes in R-devel (#1883).
- Added a class to the vec_locate_matches() error that is thrown when an overflow would otherwise occur (#1845).
- Fixed an issue with vec_rank() and 0-column data frames (#1863).
vctrs 0.6.3
CRAN release: 2023-06-14
- Fixed an issue where certain ALTREP row names were being materialized when passed to new_data_frame(). We’ve fixed this by removing a safeguard in new_data_frame() that performed a compatibility check when both n and row.names were provided. Because this is a low level function designed for performance, it is up to the caller to ensure these inputs are compatible (tidyverse/dplyr#6596).
- Fixed an issue where vec_set_*() used with data frames could accidentally return an object with the type of the proxy rather than the type of the original inputs (#1837).
- Fixed a rare vec_locate_matches() bug that could occur when using a max/min filter (tidyverse/dplyr#6835).
vctrs 0.6.0
CRAN release: 2023-03-15
- New vec_run_sizes() for computing the size of each run within a vector. It is identical to the times column from vec_unrep(), but is faster if you don’t need the run key (#1210).
- New sizes argument to vec_chop() which allows you to partition a vector using an integer vector describing the size of each expected slice. It is particularly useful in combination with vec_run_sizes() and list_sizes() (#1210, #1598).
- New obj_is_vector(), obj_check_vector(), and vec_check_size() validation helpers. We believe these are a better approach to vector validation than vec_assert() and vec_is(), which have been marked as questioning because the semantics of their ptype arguments are hard to define and can often be replaced by vec_cast() or a type predicate function like rlang::is_logical() (#1784).
- vec_is_list() and vec_check_list() have been renamed to obj_is_list() and obj_check_list(), in line with the new obj_is_vector() helper. The old functions have been silently deprecated, but an official deprecation process will start in the next vctrs release (#1803).
- vec_locate_matches() gains a new relationship argument that holistically handles multiple matches between needles and haystack. In particular, relationship = "many-to-one" replaces multiple = "error" and multiple = "warning", which have been removed from the documentation and silently soft-deprecated. Official deprecation for those options will start in a future release (#1791).
- vec_locate_matches() has changed its default needles_arg and haystack_arg values from "" to "needles" and "haystack", respectively. This generally results in more informative error messages (#1792).
- vec_chop() has gained empty ... between x and the optional indices argument. For backwards compatibility, supplying vec_chop(x, indices) without naming indices still silently works, but will be deprecated in a future release (#1813).
- vec_slice() has gained an error_call argument (#1785).
- The numeric_version type from base R is now better supported in equality, comparison, and order based operations (tidyverse/dplyr#6680).
- R >= 3.5.0 is now explicitly required. This is in line with the tidyverse policy of supporting the 5 most recent versions of R.
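A minimal sketch of how the new sizes argument of vec_chop() pairs with vec_run_sizes(), assuming vctrs >= 0.6.0 is installed; the input values are arbitrary.

```r
library(vctrs)

x <- c(1, 1, 2, 2, 2, 3)

# Size of each run of consecutive identical values
sizes <- vec_run_sizes(x)
print(sizes)

# Partition x into one slice per run, using those sizes
runs <- vec_chop(x, sizes = sizes)
print(runs)
```

The run sizes are the same as the times column of vec_unrep(x), which you would reach for instead when you also need the run keys.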
vctrs 0.5.2
CRAN release: 2023-01-23
- New vec_expand_grid(), which is a lower level helper that is similar to tidyr::expand_grid() (#1325).
- New vec_set_intersect(), vec_set_difference(), vec_set_union(), and vec_set_symmetric_difference() which compute set operations like intersect(), setdiff(), and union(), but the vctrs variants don’t strip attributes and work with data frames (#1755, #1765).
- vec_identify_runs() is now faster when used with data frames (#1684).
- The maximum load factor of the internal dictionary was reduced from 77% to 50%, which improves performance of functions like vec_match(), vec_set_intersect(), and vec_unique() in some cases (#1760).
- Fixed a bug with the internal vec_order_radix() function related to matrix columns (#1753).
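A short sketch of the set operations, assuming a vctrs version that includes them (>= 0.5.2). The vectors and data frames are arbitrary examples; with data frames, each row is treated as one element.

```r
library(vctrs)

x <- c(1, 2, 3, 4)
y <- c(3, 4, 5)

vec_set_intersect(x, y)            # elements found in both inputs
vec_set_difference(x, y)           # elements of x that are not in y
vec_set_union(x, y)                # unique elements of x, then new ones from y
vec_set_symmetric_difference(x, y) # elements found in exactly one input

# Data frames are compared row by row
df1 <- data.frame(a = 1:3, b = c(10, 20, 30))
df2 <- data.frame(a = 2:4, b = c(20, 30, 40))
common <- vec_set_intersect(df1, df2)
print(common)
```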
vctrs 0.5.0
CRAN release: 2022-10-21
- vctrs is now compliant with -Wstrict-prototypes as requested by CRAN (#1729).
- vec_ptype2() now consistently falls back to a bare data frame in case of incompatible data frame subclasses. This is part of a general move towards relaxed coercion rules.
- Common type and cast errors now inherit from "vctrs_error_ptype2" and "vctrs_error_cast" respectively. They are still both subclasses of "vctrs_error_incompatible_type" (which used to be their most specific class and is now a parent class).
- New list_all_size() and list_check_all_size() to quickly determine if a list contains elements of a particular size (#1582).
- list_unchop() has gained empty ... to force optional arguments to be named (#1715).
- vec_rep_each(times = 0) now works correctly with logical vectors that are considered unspecified and with named vectors (#1673).
- list_of() was relaxed to make it easier to combine. It is now coercible with list() (#1161). When incompatible list_of() types are combined, the result is now a bare list().
  Following this change, the role of list_of() is mainly to carry type information for potential optimisations, rather than to guarantee a certain type throughout an analysis.
- validate_list_of() has been removed. It hasn’t proven to be practically useful, and isn’t used by any packages on CRAN (#1697).
- Directed calls to vec_c(), like vec_c(.ptype = <type>), now mention the position of the problematic argument when there are cast errors (#1690).
- list_unchop() no longer drops names in some cases when indices were supplied (#1689).
- "unique_quiet" and "universal_quiet" are newly accepted by vec_as_names(repair =) and vec_names2(repair =). These options exist to help users who call these functions indirectly, via another function which only exposes repair but not quiet. Specifying repair = "unique_quiet" is like specifying repair = "unique", quiet = TRUE. When the "*_quiet" options are used, any setting of quiet is silently overridden (@jennybc, #1629).
  "unique_quiet" and "universal_quiet" are also newly accepted for the name repair argument of several other functions that do not expose a quiet argument: data_frame(), df_list(), vec_c(), list_unchop(), vec_interleave(), vec_rbind(), and vec_cbind() (@jennybc, #1716).
- list_unchop() has gained error_call and error_arg arguments (#1641, #1692).
- vec_c() has gained .error_call and .error_arg arguments (#1641, #1692).
- Improved the performance of list-of common type methods (#1686, #875).
- The list-of method for as_list_of() now places the optional .ptype argument after the ... (#1686).
- vec_rbind() now applies the base::c() fallback recursively within packed df-cols (#1331, #1462, #1640).
- vec_c(), vec_unchop(), and vec_rbind() now proxy and restore recursively (#1107). This prevents vec_restore() from being called with partially filled vectors and improves performance (#1217, #1496).
- New vec_any_missing() for quickly determining if a vector has any missing values (#1672).
- vec_equal_na() has been renamed to vec_detect_missing() to align better with vctrs naming conventions. vec_equal_na() will stick around for a few minor versions, but has been formally soft-deprecated (#1672).
- vec_c(outer = c(inner = 1)) now produces correct error messages (#522).
- If a data frame is returned as the proxy from vec_proxy_equal(), vec_proxy_compare(), or vec_proxy_order(), then the corresponding proxy function is now automatically applied recursively along all of the columns. Additionally, packed data frame columns will be unpacked, and 1 column data frames will be unwrapped. This ensures that the simplest possible types are provided to the native C algorithms, improving both correctness and performance (#1664).
- When used with record vectors, vec_proxy_compare() and vec_proxy_order() now call the correct proxy function while recursing over the fields (#1664).
- The experimental function vec_list_cast() has been removed from the package (#1382).
- Native classes like dates and datetimes now accept dimensions (#1290, #1329).
- vec_compare() now throws a more informative error when attempting to compare complex vectors (#1655).
- vec_rep() and friends gain error_call, x_arg, and times_arg arguments so they can be embedded in frontends (#1303).
- Record vectors now fail as expected when indexed along dimensions greater than 1 (#1295).
- vec_order() and vec_sort() now have ... between the required and optional arguments to make them easier to extend (#1647).
- The S3 vignette was extended to show how to make the polynomial class atomic instead of a list (#1030).
- The experimental n argument of vec_restore() has been removed. It was only used to inform on the size of data frames in case a bare list is restored. It is now expected that bare lists be initialised to a data frame so that the size is carried through row attributes. This makes the generic simpler and fixes some performance issues (#650).
- The anyNA() method for vctrs_vctr (and thus vctrs_list_of) now supports the recursive argument (#1278).
- vec_as_location() and num_as_location() have gained a missing = "remove" option (#1595).
- vec_as_location() no longer matches NA_character_ and "" indices if those invalid names appear in names (#1489).
- vec_unchop() has been renamed to list_unchop() to better indicate that it requires list input. vec_unchop() will stick around for a few minor versions, but has been formally soft-deprecated (#1209).
- Lossy cast errors during scalar subscript validation now have the correct message (#1606).
- Fixed confusing error message with logical [[ subscripts (#1608).
- New vec_rank() to compute various types of sample ranks (#1600).
- num_as_location() now throws the right error when there are out-of-bounds negative values and oob = "extend" and negative = "ignore" are set (#1614, #1630).
- num_as_location() now works correctly when a combination of zero = "error" and negative = "invert" are used (#1612).
- data_frame() and df_list() have gained .error_call arguments (#1610).
- vec_locate_matches() has gained an error_call argument (#1611).
- "select" and "relocate" have been added as valid subscript actions to support tidyselect and dplyr (#1596).
- num_as_location() has a new oob = "remove" argument to remove out-of-bounds locations (#1595).
- vec_rbind() and vec_cbind() now have .error_call arguments (#1597).
- df_list() has gained a new .unpack argument to optionally disable data frame unpacking (#1616).
- vec_check_list(arg = "") now throws the correct error (#1604).
- The difftime to difftime vec_cast() method now standardizes the internal storage type to double, catching potentially corrupt integer storage difftime vectors (#1602).
- vec_as_location2() and vec_as_subscript2() more correctly utilize their call arguments (#1605).
- vec_count(sort = "count") now uses a stable sorting method. This ensures that different keys with the same count are sorted in the order that they originally appeared in (#1588).
- Lossy cast error conditions now show the correct message when conditionMessage() is called on them (#1592).
- Fixed inconsistent reporting of conflicting inputs in vec_ptype_common() (#1570).
- vec_ptype_abbr() and vec_ptype_full() now suffix 1d arrays with [1d].
- vec_ptype_abbr() and vec_ptype_full() methods are no longer inherited (#1549).
- vec_cast() now throws the correct error when attempting to cast a subclassed data frame to a non-data frame type (#1568).
- vec_locate_matches() now uses a more conservative heuristic when taking the joint ordering proxy. This allows it to work correctly with sf’s sfc vectors and the classes from the bignum package (#1558).
- An sfc method for vec_proxy_order() was added to better support the sf package. These vectors are generally treated like list-columns even though they don’t explicitly have a "list" class, and the vec_proxy_order() method now forwards to the list method to reflect that (#1558).
- vec_proxy_compare() now works correctly for raw vectors wrapped in I(). vec_proxy_order() now works correctly for raw and list vectors wrapped in I() (#1557).
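A brief sketch of vec_detect_missing(), vec_any_missing(), and vec_rank(), assuming a vctrs version that includes them (>= 0.5.0). The inputs are arbitrary; note that a data frame row only counts as missing when every column in it is missing.

```r
library(vctrs)

x <- c(10, NA, 20, 10)

vec_detect_missing(x) # logical flag per element
vec_any_missing(x)    # single TRUE/FALSE for the whole vector

# Row-wise for data frames: only fully missing rows are flagged
df <- data.frame(a = c(1, NA, NA), b = c("u", "v", NA))
vec_detect_missing(df)

# Sample ranks: ties = "min" is the default, "dense" leaves no gaps
y <- c(10, 20, 10, 30)
vec_rank(y)
vec_rank(y, ties = "dense")
```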
vctrs 0.4.1
CRAN release: 2022-04-13
- OOB errors with character() indexes use “that don’t exist” instead of “past the end” (#1543).
- Fixed memory protection issues related to common type determination (#1551, tidyverse/tidyr#1348).
vctrs 0.4.0
CRAN release: 2022-03-30
- New experimental vec_locate_sorted_groups() for returning the locations of groups in sorted order. This is equivalent to, but faster than, calling vec_group_loc() and then sorting by the key column of the result.
- New experimental vec_locate_matches() for locating where each observation in one vector matches one or more observations in another vector. It is similar to vec_match(), but returns all matches by default (rather than just the first), and can match on binary conditions other than equality. The algorithm is inspired by data.table’s very fast binary merge procedure.
- The vec_proxy_equal(), vec_proxy_compare(), and vec_proxy_order() methods for vctrs_rcrd are now applied recursively over the fields (#1503).
- Lossy cast errors now inherit from incompatible type errors.
- vec_is_list() now returns TRUE for AsIs lists (#1463).
- vec_assert(), vec_ptype2(), vec_cast(), and vec_as_location() now use caller_arg() to infer a default arg value from the caller.
  This may result in unhelpful arguments being mentioned in error messages. In general, you should consider snapshotting vctrs error messages thrown in your package and supply arg and call arguments if the error context is not adequately reported to your users.
- vec_ptype_common(), vec_cast_common(), vec_size_common(), and vec_recycle_common() gain call and arg arguments for specifying an error context.
- vec_compare() can now compare zero column data frames (#1500).
- new_data_frame() now errors on negative and missing n values (#1477).
- vec_order() now correctly orders zero column data frames (#1499).
- vctrs now depends on cli to help with error message generation.
- New vec_check_list() and list_check_all_vectors() input checkers, and an accompanying list_all_vectors() predicate.
- New vec_interleave() for combining multiple vectors together, interleaving their elements in the process (#1396).
- vec_equal_na(NULL) now returns logical(0) rather than erroring (#1494).
- vec_as_location(missing = "error") now fails with NA and NA_character_ in addition to NA_integer_ (#1420, @krlmlr).
- Starting with rlang 1.0.0, errors are displayed with the contextual function call. Several vctrs operations gain a call argument that makes it possible to report the correct context in error messages. This concerns:
  - vec_cast() and vec_ptype2()
  - vec_default_cast() and vec_default_ptype2()
  - vec_assert()
  - vec_as_names()
  - stop_ constructors like stop_incompatible_type()
  Note that default vec_cast() and vec_ptype2() methods automatically support this if they pass ... to the corresponding vec_default_ functions. If you throw a non-internal error from a non-default method, add a call = caller_env() argument in the method and pass it to rlang::abort().
- If NA_character_ is specified as a name for vctrs_vctr objects, it is now automatically repaired to "" (#780).
- "" is now an allowed name for vctrs_vctr objects and all its subclasses (vctrs_list_of in particular) (#780).
- list_of() is now much faster when many values are provided.
- vec_as_location() evaluates arg only in case of error, for performance (#1150, @krlmlr).
- levels.vctrs_vctr() now returns NULL instead of failing (#1186, @krlmlr).
- vec_assert() produces a more informative error when size is invalid (#1470).
- vec_duplicate_detect() is a bit faster when there are many unique values.
- vec_proxy_order() is described in vignette("s3-vectors") (#1373, @krlmlr).
- vec_chop() now materializes ALTREP vectors before chopping, which is more efficient than creating many small ALTREP pieces (#1450).
- New list_drop_empty() for removing empty elements from a list (#1395).
- list_sizes() now propagates the names of the list onto the result.
- Name repair messages are now signaled by rlang::names_inform_repair(). This means that the messages are now sent to stdout by default rather than to stderr, resulting in prettier messages. Additionally, name repair messages can now be silenced through the global option rlib_name_repair_verbosity, which is useful for testing purposes. See ?names_inform_repair for more information (#1429).
- vctrs_vctr methods for na.omit(), na.exclude(), and na.fail() have been added (#1413).
- vec_init() is now slightly faster (#1423).
- vec_set_names() no longer corrupts vctrs_rcrd types (#1419).
- vec_detect_complete() now computes completeness for vctrs_rcrd types in the same way as data frames, which means that if any field is missing, the entire record is considered incomplete (#1386).
- The na_value argument of vec_order() and vec_sort() now correctly respects missing values in lists (#1401).
- vec_rep() and vec_rep_each() are much faster for times = 0 and times = 1 (@mgirlich, #1392).
- vec_equal_na() and vec_fill_missing() now work with integer64 vectors (#1304).
- The xtfrm() method for vctrs_vctr objects no longer accidentally breaks ties (#1354).
- min(), max() and range() no longer throw an error if na.rm = TRUE is set and all values are NA (@gorcha, #1357). In this case, and where an empty input is given, they return Inf/-Inf, or NA if Inf can’t be cast to the input type.
- vec_group_loc(), used for grouping in dplyr, now correctly handles vectors with billions of elements (up to .Machine$integer.max) (#1133).
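A sketch of three of the helpers introduced in this release, assuming a current vctrs is installed; the inputs are arbitrary. vec_locate_matches() returns a data frame with one row per match, pairing a needles location with a haystack location.

```r
library(vctrs)

# Alternate elements from each input
vec_interleave(1:3, 4:6)

# Every location in the haystack that equals each needle
matches <- vec_locate_matches(c(1, 2), c(2, 1, 2))
print(matches)

# Drop NULL and size-zero elements from a list
list_drop_empty(list(1, NULL, integer(), "a"))
```

Unlike vec_match(), which would report only the first haystack location for the needle 2, vec_locate_matches() returns both of its matches by default.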
vctrs 0.3.7
CRAN release: 2021-03-29
- vec_ptype_abbr() gains arguments to control whether to indicate named vectors with a prefix (prefix_named) and indicate shaped vectors with a suffix (suffix_shape) (#781, @krlmlr).
- vec_ptype() is now an optional performance generic. It is not necessary to implement, but if your class has a static prototype, you might consider implementing a custom vec_ptype() method that returns a constant to improve performance in some cases (such as common type imputation).
- New vec_detect_complete(), inspired by stats::complete.cases(). For most vectors, this is identical to !vec_equal_na(). For data frames and matrices, this detects rows that only contain non-missing values.
- vec_order() can now order complex vectors (#1330).
- Removed dependency on digest in favor of rlang::hash().
- Fixed an issue where vctrs_rcrd objects were not being proxied correctly when used as a data frame column (#1318).
- register_s3() is now licensed with the “unlicense” which makes it very clear that it’s fine to copy and paste into your own package (@maxheld83, #1254).
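A small sketch comparing vec_detect_complete() with the base R function that inspired it, assuming vctrs is installed; the data frame is an arbitrary example in which the second and third rows each contain one missing value.

```r
library(vctrs)

df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

# TRUE only for rows with no missing values
vec_detect_complete(df)

# Base R equivalent for data frames
stats::complete.cases(df)

# For a plain vector this is the negation of missingness
vec_detect_complete(c(1, NA, 3))
```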
vctrs 0.3.6
CRAN release: 2020-12-17
- Fixed an issue with tibble 3.0.0 where removing column names with names(x) <- NULL is now deprecated (#1298).
- Fixed a GCC 11 issue revealed by CRAN checks.
vctrs 0.3.5
CRAN release: 2020-11-17
- New experimental vec_fill_missing() for filling in missing values with the previous or following value. It is similar to tidyr::fill(), but also works with data frames and has an additional max_fill argument to limit the number of sequential missing values to fill.
- New vec_unrep() to compress a vector with repeated values. It is very similar to run length encoding, and works nicely alongside vec_rep_each() as a way to invert the compression.
- vec_cbind() with only empty data frames now preserves the common size of the inputs in the result (#1281).
- vec_c() now correctly returns a named result with named empty inputs (#1263).
- vctrs has been relicensed as MIT (#1259).
- Functions that make comparisons within a single vector, such as vec_unique(), or between two vectors, such as vec_match(), now convert all character input to UTF-8 before making comparisons (#1246).
- New vec_identify_runs() which returns a vector of identifiers for the elements of x that indicate which run of repeated values they fall in (#1081).
- Fixed an encoding translation bug with lists containing data frames which have columns where vec_size() is different from the low level Rf_length() (#1233).
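A sketch of the missing-value filling and run helpers, assuming a current vctrs is installed; the inputs are arbitrary. It also shows vec_rep_each() inverting the compression done by vec_unrep(), as described above.

```r
library(vctrs)

x <- c(1, NA, NA, 4, NA)

vec_fill_missing(x)                   # default direction "down": carry values forward
vec_fill_missing(x, direction = "up") # carry values backward
vec_fill_missing(x, max_fill = 1)     # fill at most 1 consecutive missing value

# Run length style compression and its inverse
y <- c("a", "a", "b", "b", "b", "a")
runs <- vec_unrep(y) # data frame with key and times columns
print(runs)
vec_rep_each(runs$key, runs$times)

# Identifier of the run each element belongs to
vec_identify_runs(y)
```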
vctrs 0.3.3
CRAN release: 2020-08-27
- The table class is now implemented as a wrapper type that delegates its coercion methods. It used to be restricted to integer tables (#1190).
- Named one-dimensional arrays now behave consistently with simple vectors in vec_names() and vec_rbind().
- new_rcrd() now uses df_list() to validate the fields. This makes it more flexible as the fields can now be of any type supported by vctrs, including data frames.
- Thanks to the previous change, the [[ method of records now preserves list fields (#1205).
- vec_data() now preserves data frames. This is consistent with the notion that data frames are a primitive vector type in vctrs. This shouldn’t affect code that uses [[ and length() to manipulate the data. On the other hand, the vctrs primitives like vec_slice() will now operate rowwise when vec_data() returns a data frame.
- outer is now passed unrecycled to name specifications. Instead, the return value is recycled (#1099).
- Name specifications can now return NULL. The names vector will only be allocated if the spec function returns non-NULL during the concatenation. This makes it possible to ignore outer names without having to create an empty names vector when there are no inner names:

      library(vctrs)
      library(rlang) # for is_character()

      zap_outer_spec <- function(outer, inner) if (is_character(inner)) inner

      # `NULL` names rather than a vector of "" names
      names(vec_c(a = 1:2, .name_spec = zap_outer_spec))
      #> NULL

      # Names are allocated when inner names exist
      names(vec_c(a = 1:2, c(b = 3L), .name_spec = zap_outer_spec))
      #> [1] ""  ""  "b"
- Fixed several performance issues in vec_c() and vec_unchop() with named vectors.
- The restriction that S3 lists must have a list-based proxy to be considered lists by vec_is_list() has been removed (#1208).
- New performant data_frame() constructor for creating data frames in a way that follows tidyverse semantics. Among other things, inputs are recycled using tidyverse recycling rules, strings are never converted to factors, list-columns are easier to create, and unnamed data frame input is automatically spliced.
- New df_list() for safely and consistently constructing the data structure underlying a data frame, a named list of equal-length vectors. It is useful in combination with new_data_frame() for creating user-friendly constructors for data frame subclasses that use the tidyverse rules for recycling and determining types.
- Fixed performance issue with vec_order() on classed vectors which affected dplyr::group_by() (tidyverse/dplyr#5423).
- vec_set_names() no longer alters the input in-place (#1194).
- New vec_proxy_order() that provides an ordering proxy for use in vec_order() and vec_sort(). The default method falls through to vec_proxy_compare(). Lists are special cased, and return an integer vector proxy that orders by first appearance.
- List columns in data frames are no longer comparable through vec_compare().
- The experimental relax argument has been removed from vec_proxy_compare().
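A sketch of the tidyverse semantics that data_frame() follows, assuming a current vctrs is installed; the column names and values are arbitrary. It shows size-1 recycling, a character column kept as character, a bare list becoming a list-column, and an unnamed data frame input being spliced into its columns.

```r
library(vctrs)

df <- data_frame(
  id = 1:3,
  group = "a",                 # size 1, recycled to 3 rows
  values = list(1, 2:3, NULL)  # a bare list becomes a list-column
)
print(df)

# An unnamed data frame argument is spliced into its columns
spliced <- data_frame(data.frame(x = 1:2), y = 3)
names(spliced)
```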
vctrs 0.3.2
CRAN release: 2020-07-15
- Fixed a performance issue in bind_rows() with S3 columns (#1122, #1124, #1151, tidyverse/dplyr#5327).
- vec_slice() now checks sizes of data frame columns in case the data structure is corrupt (#552).
- The native routines in vctrs now dispatch and evaluate in the vctrs namespace. This improves the continuity of evaluation in backtraces.
- new_data_frame() is now twice as fast when class is supplied.
- New vec_names2(), vec_names() and vec_set_names() (#1173).
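A minimal sketch of the naming helpers, assuming vctrs is installed; the names are arbitrary. vec_names2() always returns a character vector, using "" where a name is absent, while vec_set_names() returns a renamed copy.

```r
library(vctrs)

x <- vec_set_names(1:3, c("a", "b", "c"))
vec_names(x)

# An unnamed input gives empty strings rather than NULL
vec_names2(1:2)

# The original object keeps its (absent) names
y <- 1:3
z <- vec_set_names(y, c("p", "q", "r"))
names(y)
names(z)
```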
vctrs 0.3.1
CRAN release: 2020-06-05
- vec_slice() no longer restores attributes of foreign objects for which a [ method exists. This fixes an issue with ts objects which were previously incorrectly restored.
- The as.list() method for vctrs_rcrd objects has been removed in favor of directly using the method for vctrs_vctr, which calls vec_chop().
- vec_c() and vec_rbind() now fall back to base::c() if the inputs have a common class hierarchy for which a c() method is implemented but no self-to-self vec_ptype2() method is implemented.
- vec_rbind() now internally calls vec_proxy() and vec_restore() on the data frame common type that is used to create the output (#1109).
- vec_as_location2("0") now works correctly (#1131).
- ?reference-faq-compatibility is a new reference guide on vctrs primitives. It includes an overview of the fallbacks to base R generics implemented in vctrs for compatibility with existing classes.
- The documentation of vctrs functions now includes a Dependencies section to reference which other vctrs operations are called from that function. By following the dependencies links recursively, you will find the vctrs primitives on which an operation relies.
CRAN results
- Fixed type declaration mismatches revealed by LTO build.
- Fixed r-devel issue with new c.factor() method.
vctrs 0.3.0
CRAN release: 2020-05-11
This version features an overhaul of the coercion system to make it more consistent and easier to implement. See the Breaking changes and Type system sections for details.
There are three new documentation topics if you’d like to learn how to implement coercion methods to make your class compatible with tidyverse packages like dplyr:
- https://vctrs.r-lib.org/reference/theory-faq-coercion.html for an overview of the coercion mechanism in vctrs.
- https://vctrs.r-lib.org/reference/howto-faq-coercion.html for a practical guide about implementing methods for vectors.
- https://vctrs.r-lib.org/reference/howto-faq-coercion-data-frame.html for a practical guide about implementing methods for data frames.
Reverse dependencies troubleshooting
The following errors are caused by breaking changes.
- "Can't convert <character> to <list>."
  vec_cast() no longer converts to list. Use vec_chop() or as.list() instead.
- "Can't convert <integer> to <character>."
  vec_cast() no longer converts to character. Use as.character() to deparse objects.
- "names for target but not for current"
  Names of list-columns are now preserved by vec_rbind(). Adjust tests accordingly.
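A sketch of the first two situations above, assuming a current vctrs is installed: catching the incompatible-type error that vec_cast() now raises, then using the suggested replacements. The error handler name comes from the condition class vctrs documents for these errors; the inputs are arbitrary.

```r
library(vctrs)

# vec_cast() refuses conversions that vec_ptype2() does not allow
result <- tryCatch(
  vec_cast(1L, character()),
  vctrs_error_incompatible_type = function(cnd) "incompatible"
)
result

# Suggested replacements
as.character(1L)      # deparse to character explicitly
vec_chop(c("a", "b")) # split into a list of size-1 slices
```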
Breaking changes
- Double-dispatch methods for vec_ptype2() and vec_cast() are no longer inherited (#710). Class implementers must implement one set of methods for each compatible class.
  For example, a tibble subclass no longer inherits from the vec_ptype2() methods between tbl_df and data.frame. This means that you explicitly need to implement vec_ptype2() methods with tbl_df and data.frame.
  This change requires a bit more work from class maintainers but is safer because the coercion hierarchies are generally different from class hierarchies. See the S3 dispatch section of ?vec_ptype2 for more information.
- vec_cast() is now restricted to the same conversions as vec_ptype2() methods (#606, #741). This change is motivated by safety and performance:
  - It is generally sloppy to generically convert arbitrary inputs to one type. Restricted coercions are more predictable and allow your code to fail earlier when there is a type issue.
  - When unrestricted conversions are useful, this is generally towards a known type. For example, glue::glue() needs to convert arbitrary inputs to the known character type. In this case, using double dispatch instead of a single dispatch generic like as.character() is wasteful.
  - To implement the useful semantics of coercible casts (already used in vec_assign()), two double dispatches were needed. Now it can be done with one double dispatch by calling vec_cast() directly.
- stop_incompatible_cast() now throws an error of class vctrs_error_incompatible_type rather than vctrs_error_incompatible_cast. This means that vec_cast() also throws errors of this class, which better aligns it with vec_ptype2() now that they are restricted to the same conversions.
- The y argument of stop_incompatible_cast() has been renamed to to, to better match to_arg.
Type system
- Double-dispatch methods for vec_ptype2() and vec_cast() are now easier to implement. They no longer need any of the boilerplate. Implementing a method for classes foo and bar is now as simple as:

      #' @export
      vec_ptype2.foo.bar <- function(x, y, ...) new_foo()

  vctrs also takes care of implementing the default and unspecified methods. If you have implemented these methods, they are no longer called and can now be removed.
  One consequence of the new dispatch mechanism is that NextMethod() is now completely unsupported. This is for the best as it never worked correctly in a double-dispatch setting. Parent methods must now be called manually.
- vec_ptype2() methods now get zero-size prototypes as inputs. This guarantees that methods do not peek at the data to determine the richer type.
- vec_is_list() no longer allows S3 lists that implement a vec_proxy() method to automatically be considered lists. An S3 list must explicitly inherit from "list" in the base class to be considered a list.
- vec_restore() no longer restores row names if the target is not a data frame. This fixes an issue where POSIXlt objects would carry a row.names attribute after a proxy/restore roundtrip.
- vec_cast() to and from data frames preserves the row names of inputs.
- The internal function vec_names() now returns row names if the input is a data frame. Similarly, vec_set_names() sets row names on data frames. This is part of a general effort at making row names the vector names of data frames in vctrs.
  If necessary, the row names are repaired verbosely but without error to make them unique. This should be a mostly harmless change for users, but it could break unit tests in packages if they make assumptions about the row names.
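A sketch of a full set of simplified double-dispatch methods for a hypothetical vctrs_percent class built on new_vctr(); the class and new_percent() are illustrative only, not part of vctrs. The methods are defined at the top level of a script (in a package each would be exported with #' @export). Percent combined with percent stays percent, while percent combined with a plain double gives a double.

```r
library(vctrs)

# Hypothetical constructor, for illustration only
new_percent <- function(x = double()) {
  stopifnot(is.double(x))
  new_vctr(x, class = "vctrs_percent")
}

# Common types (vec_ptype2.<x class>.<y class>)
vec_ptype2.vctrs_percent.vctrs_percent <- function(x, y, ...) new_percent()
vec_ptype2.vctrs_percent.double <- function(x, y, ...) double()
vec_ptype2.double.vctrs_percent <- function(x, y, ...) double()

# Casts matching those common types (vec_cast.<to class>.<x class>)
vec_cast.vctrs_percent.vctrs_percent <- function(x, to, ...) x
vec_cast.double.vctrs_percent <- function(x, to, ...) vec_data(x)
vec_cast.vctrs_percent.double <- function(x, to, ...) new_percent(x)

both <- vec_c(new_percent(0.5), new_percent(0.25)) # stays a percent vector
mixed <- vec_c(new_percent(0.5), 0.25)             # falls to a plain double
```

Each vec_cast() method mirrors a vec_ptype2() method, since vec_cast() is restricted to the conversions that vec_ptype2() allows.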
Compatibility and fallbacks
- With the double dispatch changes, the coercion methods are no longer inherited from parent classes. This is because the coercion hierarchy is in principle different from the S3 hierarchy. A consequence of this change is that subclasses that don’t implement coercion methods are now in principle incompatible.
  This is particularly problematic with subclasses of data frames for which throwing incompatible errors would be too inconvenient for users. To work around this, we have implemented a fallback to the relevant base data frame class (either data.frame or tbl_df) in coercion methods (#981). This fallback is silent unless you set the vctrs:::warn_on_fallback option to TRUE.
  In the future we may extend this fallback principle to other base types when they are explicitly included in the class vector (such as "list").
- Improved support for foreign classes in the combining operations vec_c(), vec_rbind(), and vec_unchop(). A foreign class is a class that doesn’t implement vec_ptype2(). When all the objects to combine have the same foreign class, one of these fallbacks is invoked:
  - If the class implements a base::c() method, the method is used for the combination. (FIXME: vec_rbind() currently doesn’t use this fallback.)
  - Otherwise if the objects have identical attributes and the same base type, we consider them to be compatible. The vectors are concatenated and the attributes are restored (#776).
  These fallbacks do not make your class completely compatible with vctrs-powered packages, but they should help in many simple cases.
- vec_c() and vec_unchop() now fall back to base::c() for S4 objects if the object doesn’t implement vec_ptype2() but sets an S4 c() method (#919).
Vector operations
vec_rbind()
andvec_c()
with data frame inputs now consistently preserve the names of list-columns, df-columns, and matrix-columns (#689). This can cause some false positives in unit tests, if they are sensitive to internal names (#1007).vec_rbind()
now repairs row names silently to avoid confusing messages when the row names are not informative and were not created on purpose.vec_rbind()
gains option to treat input names as row names. This is disabled by default (#966).New
vec_rep()
andvec_rep_each()
for repeating an entire vector and elements of a vector, respectively. These two functions provide a clearer interface for the functionality ofvec_repeat()
, which is now deprecated.-
vec_cbind()
now callsvec_restore()
on inputs emptied of their columns before computing the common type. This has consequences for data frame classes with special columns that devolve into simpler classes when the columns are subsetted out. These classes are now always simplified byvec_cbind()
.For instance, column-binding a grouped data frame with a data frame now produces a tibble (the simplified class of a grouped data frame).
vec_match() and vec_in() gain parameters for argument tags (#944).
The internal version of vec_assign() now has support for assigning names and inner names. For data frames, the names are assigned recursively.
vec_assign() gains x_arg and value_arg parameters (#918).
vec_group_loc(), which powers dplyr::group_by(), now has more efficient vector access (#911).
vec_ptype() gained an x_arg argument.
New list_sizes() for computing the size of every element in a list. list_sizes() is to vec_size() as lengths() is to length(), except that it only supports lists. Atomic vectors and data frames result in an error.
new_data_frame() infers size from row names when n = NULL (#894).
vec_c() now accepts rlang::zap() as .name_spec input. The returned vector is then always unnamed, and the names do not cause errors when they can’t be combined. They are still used to create more informative messages when the inputs have incompatible types (#232).
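A short sketch of two of the additions above, assuming a current vctrs release (rlang is installed as a vctrs dependency): list_sizes() measures elements with vec_size() rather than length(), and a zapped .name_spec lets vec_c() drop outer names instead of failing on them.

```r
library(vctrs)

x <- list(1:3, data.frame(a = 1:5))

# list_sizes() measures each element with vec_size(): rows for a data frame
sizes <- list_sizes(x)
sizes
#> [1] 3 5

# lengths() measures with length(): columns for a data frame
lengths(x)
#> [1] 3 1

# With a zapped name spec, the outer names are discarded instead of merged,
# so a named input of length > 1 does not need a .name_spec
out <- vec_c(a = 1, b = 2:3, .name_spec = rlang::zap())
out
#> [1] 1 2 3
names(out)
#> NULL
```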
Classes
vctrs now supports the data.table class. The common type of a data frame and a data table is a data table.
new_vctr() now always appends a base "list" class to list .data to be compatible with changes to vec_is_list(). This affects new_list_of(), which now returns an object with a base class of "list".
dplyr methods are now implemented for vec_restore(), vec_ptype2(), and vec_cast(). The user-visible consequence (and breaking change) is that row-binding a grouped data frame and a data frame or tibble now returns a grouped data frame. It would previously return a tibble.
The is.na<-() method for vctrs_vctr now supports numeric and character subscripts to indicate where to insert missing values (#947).
The base classes AsIs and table have vctrs methods (#904, #906).
POSIXlt and POSIXct vectors are handled more consistently (#901).
Ordered factors that do not have identical levels are now incompatible. They are now incompatible with all factors.
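A minimal sketch of the ordered-factor rule above, assuming a current vctrs release. The two factors contain the same levels in a different order, so they are not identical and have no common type; the variable names are illustrative.

```r
library(vctrs)

xy <- factor("x", levels = c("x", "y"), ordered = TRUE)
yx <- factor("x", levels = c("y", "x"), ordered = TRUE)

# Identical levels: the common type is a zero-length ordered factor
common <- vec_ptype2(xy, xy)
levels(common)
#> [1] "x" "y"

# Same level set in a different order: no common type, so combining fails
res <- tryCatch(vec_c(xy, yx), error = function(e) e)
inherits(res, "error")
#> [1] TRUE
```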
Indexing and names
vec_as_subscript() now fails when the subscript is a matrix or an array, consistent with vec_as_location().
Improved error messages in vec_as_location() when the subscript is a matrix or array (#936).
vec_as_location2() properly picks up subscript_arg (tidyverse/tibble#735).
vec_as_names() now has more informative error messages when names are not unique (#882).
vec_as_names() gains a repair_arg argument that, when set, causes repair = "check_unique" to generate an informative hint (#692).
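A minimal sketch of two vec_as_names() repair strategies, assuming a current vctrs release; the input names are illustrative.

```r
library(vctrs)

nms <- c("a", "a", "")

# "unique" appends a position suffix to duplicated and empty names;
# quiet = TRUE suppresses the message that lists the renamed entries
repaired <- vec_as_names(nms, repair = "unique", quiet = TRUE)
repaired
#> [1] "a...1" "a...2" "...3"

# "check_unique" does not repair anything and signals an error instead
res <- tryCatch(vec_as_names(nms, repair = "check_unique"), error = function(e) e)
inherits(res, "error")
#> [1] TRUE
```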
Conditions
stop_incompatible_type() now has an action argument for customizing whether the coercion error came from vec_ptype2() or vec_cast(). stop_incompatible_cast() is now a thin wrapper around stop_incompatible_type(action = "convert").
stop_ functions now take details after the dots. This argument can no longer be passed by position.
Supplying both details and message to the stop_ functions is now an internal error.
x_arg, y_arg, and to_arg are now compulsory arguments in stop_ functions like stop_incompatible_type().
Lossy cast errors are now considered internal. Please don’t test for the class or explicitly handle them.
New argument loss_type for the experimental function maybe_lossy_cast(). It can take the values “precision” or “generality” to indicate in the error message which kind of loss the error is about (double to integer loses precision, character to factor loses generality).
Coercion and recycling errors are now more consistent.
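A minimal sketch of the two kinds of loss named above, assuming a current vctrs release. In keeping with the advice that lossy cast errors are internal, it only checks that an error was raised and does not inspect the error class.

```r
library(vctrs)

# Double to integer succeeds when no precision is lost
vec_cast(c(1, 2), integer())
#> [1] 1 2

# A fractional value would lose precision, so the cast errors
precision <- tryCatch(vec_cast(c(1, 2.5), integer()), error = function(e) e)
inherits(precision, "error")
#> [1] TRUE

# Character to factor errors when a value is not among the target levels
generality <- tryCatch(
  vec_cast("c", factor(levels = c("a", "b"))),
  error = function(e) e
)
inherits(generality, "error")
#> [1] TRUE
```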
CRAN results
Fixed clang-UBSAN error “nan is outside the range of representable values of type ‘int’” (#902).
Fixed compilation of stability vignette following the date conversion changes on R-devel.
vctrs 0.2.4
CRAN release: 2020-03-10
Methods for factors and dates are now implemented in C for efficiency.
new_data_frame() now correctly updates attributes and supports merging of the "names" and "row.names" arguments (#883).
vec_match() gains an na_equal argument (#718).
vec_chop()’s indices argument has been restricted to positive integer vectors. Character and logical subscripts haven’t proven useful, and this aligns vec_chop() with vec_unchop(), for which only positive integer vectors make sense.
New vec_unchop() for combining a list of vectors into a single vector. It is similar to vec_c(), but gives greater control over how the elements are placed in the output through the use of a secondary indices argument.
Breaking change: When .id is supplied, vec_rbind() now creates the identifier column at the start of the data frame rather than at the end.
numeric_version and package_version lists are now treated as vectors (#723).
vec_slice() now properly handles symbols and S3 subscripts.
vec_as_location() and vec_as_subscript() are now fully implemented in C for efficiency.
num_as_location() gains a new argument, zero, for controlling whether to "remove", "ignore", or "error" on zero values (#852).
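A minimal sketch of the three zero options, assuming a current vctrs release in which zero = "remove" is the default.

```r
library(vctrs)

i <- c(0, 1, 3)

# Default: zeros are dropped from the locations
num_as_location(i, n = 3)
#> [1] 1 3

# "ignore" leaves zeros in place
num_as_location(i, n = 3, zero = "ignore")
#> [1] 0 1 3

# "error" rejects them
res <- tryCatch(num_as_location(i, n = 3, zero = "error"), error = function(e) e)
inherits(res, "error")
#> [1] TRUE
```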
vctrs 0.2.3
CRAN release: 2020-02-20
The main feature of this release is considerable performance improvements with factors and dates.
vec_c() now falls back to base::c() if the vector doesn’t implement vec_ptype2() but implements c(). This should improve the compatibility of vctrs-based functions with foreign classes (#801).
new_data_frame() is now faster.
New vec_is_list() for detecting if a vector is a list in the vctrs sense. For instance, objects of class lm are not lists. In general, classes need to explicitly inherit from "list" to be considered as lists by vctrs.
Unspecified vectors of NA can now be assigned into a list (#819).
    x <- list(1, 2)
    vec_slice(x, 1) <- NA
    x
    #> [[1]]
    #> NULL
    #>
    #> [[2]]
    #> [1] 2
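The vec_is_list() entry above hinges on explicit inheritance from "list". The sketch below assumes vctrs 0.6.0 or later, where this predicate is named obj_is_list(); the class name my_list is hypothetical.

```r
library(vctrs)

# A bare list is a list in the vctrs sense
obj_is_list(list(1, "a"))
#> [1] TRUE

# A model fit is stored as a list but does not inherit from "list"
fit <- lm(mpg ~ wt, data = mtcars)
typeof(fit)
#> [1] "list"
obj_is_list(fit)
#> [1] FALSE

# An S3 class opts in by explicitly inheriting from "list"
my_list <- structure(list(1, 2), class = c("my_list", "list"))
obj_is_list(my_list)
#> [1] TRUE
```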
vec_ptype() now errors on scalar inputs (#807).
vec_ptype_finalise() is now recursive over all data frame types, ensuring that unspecified columns are correctly finalised to logical (#800).
vec_ptype() now correctly handles unspecified columns in data frames, and will always return an unspecified column type (#800).
vec_slice() and vec_chop() now work correctly with bit64::integer64() objects when an NA subscript is supplied. By extension, this means that vec_init() now works with these objects as well (#813).
vec_rbind() now binds row names. When named inputs are supplied and names_to is NULL, the names define row names. If names_to is supplied, the names are copied into that column, as before.
vec_cbind() now uses the row names of the first named input.
The c() method for vctrs_vctr now throws an error when recursive or use.names is supplied (#791).
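A minimal sketch of how input names interact with .names_to in vec_rbind(), assuming a current vctrs release, where input names are only used as row names when .names_to = NULL is requested explicitly. The data frames are illustrative.

```r
library(vctrs)

df1 <- data.frame(x = 1)
df2 <- data.frame(x = 2)

# .names_to = NULL: the input names become row names
as_rows <- vec_rbind(a = df1, b = df2, .names_to = NULL)
rownames(as_rows)
#> [1] "a" "b"

# .names_to = "id": the input names are copied into a new first column
as_col <- vec_rbind(a = df1, b = df2, .names_to = "id")
as_col
#>   id x
#> 1  a 1
#> 2  b 2
```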
vctrs 0.2.2
CRAN release: 2020-01-24
New vec_as_subscript() function to cast inputs to the base type of a subscript (logical, numeric, or character).
vec_as_index() has been renamed to vec_as_location(). Use num_as_location() if you need more options to control how numeric subscripts are converted to a vector of locations.
New vec_as_subscript2(), vec_as_location2(), and num_as_location2() variants for validating scalar subscripts and locations (e.g. for indexing with [[).
vec_as_location() now preserves names of its inputs if possible.
vec_ptype2() methods for base classes now prevent inheritance. This makes sense because the subtyping graph created by vec_ptype2() methods is generally not the same as the inheritance relationships defined by S3 classes. For instance, subclasses are often a richer type than their superclasses, and should often be declared as supertypes (e.g. vec_ptype2() should return the subclass).
We introduced this breaking change in a patch release because new_vctr() now adds the base type to the class vector by default, which caused vec_ptype2() to dispatch erroneously to the methods for base types. We’ll finish switching to this approach in vctrs 0.3.0 for the rest of the base S3 classes (dates, data frames, …).
vec_equal_na() now works with complex vectors.
The vctrs_vctr class gains an as.POSIXlt() method (#717).
vec_slice() now supports Altvec vectors (@jimhester, #696).
vec_proxy_equal() is now applied recursively across the columns of data frames (#641).
vec_split() no longer returns the val column as a list_of. It is now returned as a bare list (#660).
Complex numbers are now coercible with integer and double (#564).
zeallot has been moved from Imports to Suggests, meaning that %<-% is no longer re-exported from vctrs.
vec_equal() no longer propagates missing values when comparing list elements. This means that vec_equal(list(NULL), list(NULL)) will continue to return NA because NULL is the missing element for a list, but now vec_equal(list(NA), list(NA)) returns TRUE because the NA values are compared directly without checking for missingness.
Lists of expressions are now supported in vec_equal() and functions that compare elements, such as vec_unique() and vec_match(). This ensures that they work with the result of modeling functions like glm() and mgcv::gam(), which store “family” objects containing expressions (#643).
new_vctr() gains an experimental inherit_base_type argument which determines whether or not the class of the underlying type will be included in the class.
vec_ptype() has relaxed default behaviour for base types; now if two vectors both inherit from (e.g.) “character”, the common type is also “character” (#497).
vec_equal() now correctly treats NULL as the missing value element for lists (#653).
vec_cast() now casts data frames to lists rowwise, i.e. to a list of data frames of size 1. This preserves the invariant of vec_size(vec_cast(x, to)) == vec_size(x) (#639).
Positive and negative 0 are now considered equivalent by all functions that check for equality or uniqueness (#637).
New experimental functions vec_group_rle() for returning run length encoded groups; vec_group_id() for constructing group identifiers from a vector; vec_group_loc() for computing the locations of unique groups in a vector (#514).
New vec_chop() for repeatedly slicing a vector. It efficiently captures the pattern of map(indices, vec_slice, x = x).
Support for multiple character encodings has been added to functions that compare elements within a single vector, such as vec_unique(), and across multiple vectors, such as vec_match(). When multiple encodings are encountered, a translation to UTF-8 is performed before any comparisons are made (#600, #553).
Equality and ordering methods are now implemented for raw and complex vectors (@romainfrancois).
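The grouping helpers and vec_chop() introduced above combine naturally: vec_group_loc() finds where each unique key occurs, and vec_chop() slices another vector at those locations. A minimal sketch, assuming a current vctrs release (indices is passed by name, which works across versions); keys and vals are illustrative.

```r
library(vctrs)

keys <- c("a", "b", "a", "c", "b")
vals <- c(10, 20, 30, 40, 50)

# One integer identifier per element; the "n" attribute holds the group count
ids <- vec_group_id(keys)
as.integer(ids)
#> [1] 1 2 1 3 2
attr(ids, "n")
#> [1] 3

# Unique keys, in order of first appearance, with their locations
groups <- vec_group_loc(keys)
groups$key
#> [1] "a" "b" "c"

# Slice `vals` once per group
chunks <- vec_chop(vals, indices = groups$loc)

# Same result as slicing in a loop, i.e. map(indices, vec_slice, x = x)
identical(chunks, lapply(groups$loc, function(i) vec_slice(vals, i)))
#> [1] TRUE
```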
vctrs 0.2.0
CRAN release: 2019-07-05
With the 0.2.0 release, many vctrs functions have been rewritten with native C code to improve performance. Functions like vec_c() and vec_rbind() should now be fast enough to be used in packages. This is an ongoing effort; for instance, the handling of factors and dates has not been rewritten yet. These classes still slow down vctrs primitives.
The API in 0.2.0 has been updated; please see the list of breaking changes below. vctrs has now graduated from experimental to a maturing package. Please note that API changes are still planned for future releases; for instance, vec_ptype2() and vec_cast() might need to return a sentinel instead of failing with an error when there is no common type or possible cast.
Breaking changes
Lossy casts now throw errors of class vctrs_error_cast_lossy. Previously these were warnings. You can suppress these errors selectively with allow_lossy_cast() to get the partial cast results. To implement your own lossy cast operation, call the new exported function maybe_lossy_cast().
vec_c() now fails when an input is supplied with a name but has internal names or has length > 1:
    vec_c(foo = c(a = 1))
    #> Error: Can't merge the outer name `foo` with a named vector.
    #> Please supply a `.name_spec` specification.
    vec_c(foo = 1:3)
    #> Error: Can't merge the outer name `foo` with a vector of length > 1.
    #> Please supply a `.name_spec` specification.
You can supply a name specification that describes how to combine the external name of the input with its internal names or positions:
    # Name spec as glue string:
    vec_c(foo = c(a = 1), .name_spec = "{outer}_{inner}")

    # Name spec as a function:
    vec_c(foo = c(a = 1), .name_spec = function(outer, inner) paste(outer, inner, sep = "_"))
    vec_c(foo = c(a = 1), .name_spec = ~ paste(.x, .y, sep = "_"))
vec_empty() has been renamed to vec_is_empty().
vec_dim() and vec_dims() are no longer exported.
vec_na() has been renamed to vec_init(), as the primary use case is to initialize an output container.
vec_slice<- is now type stable (#140). It always returns the same type as the LHS. If needed, the RHS is cast to the correct type, but only if both inputs are coercible. See examples in ?vec_slice.
We have renamed the type particle to ptype (for instance, vec_type2() is now vec_ptype2()). Consequently, the previous vec_ptype() was renamed to vec_ptype_show().
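A minimal sketch of the type-stable slice assignment and of vec_init() described above, assuming a current vctrs release. The RHS is cast to the type of the LHS only when the cast is lossless.

```r
library(vctrs)

x <- 1:3

# The double 10 can be cast to integer without loss, so x stays an integer
vec_slice(x, 2) <- 10
x
#> [1]  1 10  3
typeof(x)
#> [1] "integer"

# 2.5 cannot be cast to integer without loss, so the assignment fails
res <- tryCatch({
  vec_slice(x, 2) <- 2.5
  "assigned"
}, error = function(e) "error")
res
#> [1] "error"

# vec_init() builds a container of missing values of the same type
vec_init(x, 2)
#> [1] NA NA
```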
New features
New vec_proxy() generic. This is the main customisation point in vctrs along with vec_restore(). You should only implement it when your type is designed around a non-vector class. In this case, vec_proxy() should return a vector class (an atomic vector, a bare list, or a data frame). The vctrs operations will be applied on the proxy and vec_restore() is called to restore the original representation of your type.
The most common case where you need to implement vec_proxy() is for S3 lists. In vctrs, S3 lists are treated as scalars by default. This way we don’t treat objects like model fits as vectors. To prevent vctrs from treating your S3 list as a scalar, unclass it in the vec_proxy() method. For instance, here is the definition for list_of:
    #' @export
    vec_proxy.vctrs_list_of <- function(x) {
      unclass(x)
    }
If you inherit from vctrs_vctr or vctrs_rcrd you don’t need to implement vec_proxy().
vec_c(), vec_rbind(), and vec_cbind() gain a .name_repair argument (#227, #229).
vec_c(), vec_rbind(), vec_cbind(), and all functions relying on vec_ptype_common() now have more informative error messages when some of the inputs have nested data frames that are not convergent:
    df1 <- tibble(foo = tibble(bar = tibble(x = 1:3, y = letters[1:3])))
    df2 <- tibble(foo = tibble(bar = tibble(x = 1:3, y = 4:6)))
    vec_rbind(df1, df2)
    #> Error: No common type for `..1$foo$bar$y` <character> and `..2$foo$bar$y` <integer>.
vec_cbind() now turns named data frames into packed columns.
    data <- tibble::tibble(x = 1:3, y = letters[1:3])
    data <- vec_cbind(data, packed = data)
    data
    #> # A tibble: 3 x 3
    #>       x y     packed$x $y
    #>   <int> <chr>    <int> <chr>
    #> 1     1 a            1 a
    #> 2     2 b            2 b
    #> 3     3 c            3 c
Packed data frames are nested in a single column. This makes it possible to access it through a single name:
    data$packed
    #> # A tibble: 3 x 2
    #>       x y
    #>   <int> <chr>
    #> 1     1 a
    #> 2     2 b
    #> 3     3 c
We are planning to use this syntax more widely in the tidyverse.
New vec_is() function to check whether a vector conforms to a prototype and/or a size. Unlike vec_assert(), it doesn’t throw errors but returns TRUE or FALSE (#79).
Called without a specific type or size, vec_assert() tests whether an object is a data vector or a scalar. S3 lists are treated as scalars by default. Implement a vec_is_vector() method for your class to override this property (or derive from vctrs_vctr).
New vec_order() and vec_sort() for ordering and sorting generalised vectors.
New .names_to parameter for vec_rbind(). If supplied, this should be the name of a column where the names of the inputs are copied. This is similar to the .id parameter of dplyr::bind_rows().
New vec_seq_along() and vec_init_along() create useful sequences (#189).
vec_slice() now preserves character row names, if present.
New vec_split(x, by) is a generalisation of split() that can divide a vector into groups formed by the unique values of another vector. It returns a two-column data frame containing the unique values of by aligned with the matching x values (#196).
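A minimal sketch of vec_split(), assuming a current vctrs release in which the two returned columns are named key and val (the val column is also mentioned in the 0.2.2 entry above). The input values are illustrative.

```r
library(vctrs)

res <- vec_split(x = c(1, 2, 3, 4), by = c("a", "b", "a", "b"))

# One row per unique value of `by`, in order of first appearance
res$key
#> [1] "a" "b"

# The matching values of `x`, one list element per key
res$val[[1]]
#> [1] 1 3
res$val[[2]]
#> [1] 2 4
```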
Other features and bug fixes
Errors are now classed: "vctrs_error_assert" for failed assertions, and "vctrs_error_incompatible" (with subclasses _type, _cast, and _op) for errors on incompatible types (#184).
Character indexing is now only supported for named objects; an error is raised for unnamed objects (#171).
Predicate generics now consistently return logical vectors when passed a vctrs_vctr class. They used to restore the output to their input type (#251).
list_of() now has an as.character() method. It uses vec_ptype_abbr() to collapse complex objects into their type representation (tidyverse/tidyr#654).
New stop_incompatible_size() to signal a failure due to mismatched sizes.
New validate_list_of() (#193).
vec_arith() is consistent with base R when combining difftime and date, with a warning if casts are lossy (#192).
vec_c() and vec_rbind() now handle data.frame columns properly (@yutannihilation, #182).
vec_cast(x, data.frame()) preserves the number of rows in x.
vec_equal() now handles missing values symmetrically (#204).
vec_equal_na() now returns TRUE for data frames and records when every component is missing, not when any component is missing (#201).
vec_init() checks that its input is a vector.
vec_proxy_compare() gains an experimental relax argument, which allows data frames to be orderable even if all their columns are not (#210).
vec_size() now works with positive short row names. This fixes issues with data frames created with jsonlite (#220).
vec_slice<- now has a vec_assign() alias. Use vec_assign() when you don’t want to modify the original input.
vec_slice() now calls vec_restore() automatically. Unlike the default [ method from base R, attributes are preserved by default.
vec_slice() can correctly slice 0-row data frames (#179).
New vec_repeat() for repeating each element of a vector the same number of times.
vec_type2(x, data.frame()) ensures that the returned object has names that are a length-0 character vector.
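A minimal sketch contrasting vec_assign() with vec_slice<-, assuming a current vctrs release: vec_assign() returns a modified copy, while the replacement form rebinds the variable it is applied to. The variable names are illustrative.

```r
library(vctrs)

x <- c(a = 1, b = 2, c = 3)

# vec_assign() returns a modified copy; x itself still holds 2 at "b"
y <- vec_assign(x, 2, 20)

# vec_slice<- is the replacement-function form and rebinds z
z <- x
vec_slice(z, 2) <- 20

# Both routes produce the same values, and names are kept
identical(y, z)
#> [1] TRUE
```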