This vignette shows you how to create your own S3 vector classes. It focuses on the aspects of making a vector class that every class needs to worry about; you’ll also need to provide methods that actually make the vector useful.

I assume that you’re already familiar with the basic machinery of S3, and the vocabulary I use in Advanced R: constructor, helper, and validator. If not, I recommend reading at least the first two sections of the S3 chapter of *Advanced R*.

`library(vctrs)`

This vignette works through five big topics:

- The basics of creating a new vector class with vctrs.
- The coercion and casting system.
- The record and list-of types.
- Equality and comparison proxies.
- Arithmetic operators.

They’re collectively demonstrated with a number of simple S3 classes:

Percent: a double vector that prints as a percentage. This illustrates the basic mechanics of class creation, coercion, and casting.

Decimal: a double vector that always prints with a fixed number of decimal places. This class has an attribute which needs a little extra care in casts and coercions.

Cached sum: a double vector that caches the total sum in an attribute. The attribute depends on the data, so needs extra care.

Rational: a pair of integer vectors that defines a rational number like

`2 / 3`

. This introduces you to the record style, and to the equality and comparison operators. It also needs special handling for`+`

,`-`

, and friends.Polynomial: a list of integer vectors that define polynomials like

`1 + x - x^3`

. Sorting such vectors correctly requires a custom equality method.Meter: a numeric vector with meter units. This is the simplest possible class with interesting algebraic properties.

Period and frequency: a pair of classes represent a period, or it’s inverse, frequency. This allows us to explore more arithmetic operators.

In this section you’ll learn how to create a new vctrs class by calling `new_vctr()`

. This creates an object with class `vctrs_vctr`

which has a number of methods. These are designed to make your life as easy as possible. For example:

The

`print()`

and`str()`

methods are defined in terms of`format()`

so you get a pleasant, consistent display as soon as you’ve made your`format()`

method.You can immediately put your new vector class in a data frame because

`as.data.frame.vctrs_vctr()`

does the right thing.Subsetting (

`[`

,`[[`

,`$`

),`length<-`

, and`rep()`

methods automatically preserve attributes because they use`vec_restore()`

. A default`vec_restore()`

works for all classes where the attributes are data-independent, and can easily be customised when the attributes do depend on the data.Default subset-assignment methods (

`[<-`

,`[[<-`

,`$<-`

) follow the principle that the new values should be coerced to match the existing vector. This gives predictable behaviour and clear error messages.

In this section, I’ll show you how to make a `percent`

class, i.e. a double vector that is printed as a percentage. We start by defining a low-level constructor that uses `vec_assert()`

to checks types and/or sizes then calls `new_vctr()`

.

`percent`

is built on a double vector of any length and doesn’t have any attributes.

```
new_percent <- function(x = double()) {
vec_assert(x, double())
new_vctr(x, class = "vctrs_percent")
}
x <- new_percent(c(seq(0, 1, length = 4), NA))
x
#> <vctrs_percent[5]>
#> [1] 0.0000000 0.3333333 0.6666667 1.0000000 NA
str(x)
#> vctrs_pr [1:5] 0.0000000, 0.3333333, 0.6666667, 1.0000000, NA
```

Note that we prefix the name of the class with the name of the package. This prevents conflicting definitions between packages. For packages that implement only one class (such as blob), it’s fine to use the package name without prefix as the class name.

We then follow up with a user friendly helper. Here we’ll use `vec_cast()`

to allow it to accept anything coercible to a double:

```
percent <- function(x = double()) {
x <- vec_cast(x, double())
new_percent(x)
}
```

Before you go on, check that user-friendly constructor returns a zero-length vector when called with no arguments. This makes it easy to use as a prototype.

Add a call to `setOldClass()`

for compatibility with the S4 system:

```
#' @importFrom methods setOldClass
methods::setOldClass(c("vctrs_percent", "vctrs_vctr"))
```

For the convenience of your users, consider implementing an `is_percent()`

function:

```
is_percent <- function(x) {
inherits(x, "vctrs_percent")
}
```

`format()`

methodThe first method for every class should almost always be a `format()`

method. This should return a character vector the same length as `x`

. The easiest way to do this is to rely on one of R’s low-level formatting functions like `formatC()`

:

```
format.vctrs_percent <- function(x, ...) {
out <- formatC(signif(vec_data(x) * 100, 3))
out[is.na(x)] <- NA
out[!is.na(x)] <- paste0(out[!is.na(x)], "%")
out
}
```

(Note the use of `vec_data()`

so `format()`

doesn’t get stuck in an infinite loop, and that I take a little care to not convert `NA`

to `"NA"`

; this leads to better printing.)

The format method is also used by data frames, tibbles, and `str()`

:

For optimal display, I recommend also defining an abbreviated type name, which should be 4-5 letters for commonly used vectors. This is used in tibbles and in `str()`

:

```
vec_ptype_abbr.vctrs_percent <- function(x) {
"prcnt"
}
tibble::tibble(x)
#> # A tibble: 5 x 1
#> x
#> <prcnt>
#> 1 0%
#> 2 33.3%
#> 3 66.7%
#> 4 100%
#> 5 NA
str(x)
#> prcnt [1:5] 0%, 33.3%, 66.7%, 100%, <NA>
```

If you need more control over printing in tibbles, implement a method for `pillar::pillar_shaft()`

. See https://tibble.tidyverse.org/articles/extending.html for details.

The next set of methods you are likely to need are those related to coercion and casting. Coercion and casting are two sides of the same coin: changing the prototype of an existing object. When the change happens *implicitly* (e.g in `c()`

) we call it **coercion**; when the change happens *explicitly* (e.g. with `as.integer(x)`

), we call it **casting**.

One of the main goals of vctrs is to put coercion and casting on a robust theoretical footing so it’s possible to make accurate predictions about what (e.g.) `c(x, y)`

should do when `x`

and `y`

have different prototypes. vctrs achieves this goal through two generics:

`vec_type2(x, y)`

defines possible set of coercions. It returns a prototype if`x`

and`y`

can be safely coerced to the same prototype; otherwise it returns an error. The set of automatic coercions is usually quite small because too many tend to make code harder to reason about and silently propagate mistakes.`vec_cast(x, to)`

defines the possible sets of casts. It returns`x`

translated to have prototype`to`

, or throws an error if the conversion isn’t possible. The set of possible casts is a superset of possible coercions because they’re requested explicitly.

Both generics use **double dispatch** which means that the implementation is selected based on the class of two arguments, not just one. S3 does not natively support double dispatch, but we can implement with a trick: doing single dispatch twice. In practice, this means you end up with method names with two classes, like `vec_type2.foo.bar()`

, and you need a little boilerplate to get started. The key idea that makes double dispatch work without any modifications to S3 is that a function (like `vec_type2.foo()`

) can be both an S3 generic and an S3 method.

```
vec_type2.MYCLASS <- function(x, y, ...) UseMethod("vec_type2.MYCLASS", y)
vec_type2.MYCLASS.default <- function(x, y, ..., x_arg = "", y_arg = "") {
stop_incompatible_type(x, y, x_arg = x_arg, y_arg = y_arg)
}
vec_type2.MYCLASS.vctrs_unspecified <- function(x, y, ...) x
vec_cast.MYCLASS <- function(x, to) UseMethod("vec_cast.MYCLASS")
vec_cast.MYCLASS.default <- function(x, to) stop_incompatible_cast(x, to)
vec_cast.MYCLASS.logical <- function(x, to) vec_unspecified_cast(x, to)
```

We’ll discuss what this boilerplate does in the upcoming sections; just remember you’ll always need to copy and paste it when creating a new S3 class.

We’ll make our percent class coercible back and forth with double vectors. I’ll start with the boilerplate for `vec_type2()`

:

```
vec_type2.vctrs_percent <- function(x, y, ...) UseMethod("vec_type2.vctrs_percent", y)
vec_type2.vctrs_percent.default <- function(x, y, ..., x_arg = "", y_arg = "") {
stop_incompatible_type(x, y, x_arg = x_arg, y_arg = y_arg)
}
vec_type2.vctrs_percent.vctrs_unspecified <- function(x, y, ...) x
```

The default method provides a user friendly error message if the coercion doesn’t exist. The `vctrs_unspecified`

is needed to handle `NA`

, which is technically a logical vector, but we want to stand in for a missing value of any type.

```
vec_type2("bogus", percent())
#> No common type for <character> and <vctrs_percent>.
vec_type2(percent(), vec_type(NA))
#> <vctrs_percent[0]>
```

Next, start by saying that a `vctrs_percent`

combined with a `vctrs_percent`

yields a `vctrs_percent`

, which we indicate by returning a prototype generated by the constructor.

Next we define methods that say that combining a `percent`

and double should yield a `double`

. We avoid returning a `percent`

here, because errors in the scale (1 vs. 0.01) are more obvious with raw numbers.

Because double dispatch is a bit of a hack, we need to provide two methods. It’s your responsibility to ensure that each pair return the same result: if they don’t you will get weird and unpredictable behaviour.

```
vec_type2.vctrs_percent.double <- function(x, y, ...) double()
vec_type2.double.vctrs_percent <- function(x, y, ...) double()
```

We can check that we’ve implemented this correctly with `vec_ptype()`

:

```
vec_ptype(percent(), double(), percent())
#> Prototype: <double>
#> 0. ( , <vctrs_percent> ) = <vctrs_percent>
#> 1. ( <vctrs_percent> , <double> ) = <double>
#> 2. ( <double> , <vctrs_percent> ) = <double>
```

Next we implement explicit casting, again starting with the boilerplate:

```
vec_cast.vctrs_percent <- function(x, to) UseMethod("vec_cast.vctrs_percent")
vec_cast.vctrs_percent.default <- function(x, to) stop_incompatible_cast(x, to)
vec_cast.vctrs_percent.logical <- function(x, to) vec_unspecified_cast(x, to)
```

Then providing a method to coerce a percent to a percent:

And then for converting back and forth between doubles. To convert a double to a percent we use the `percent()`

helper (not the constructor; this is unvalidated user input). To convert a `percent`

to a double, we strip the attributes.

```
vec_cast.vctrs_percent.double <- function(x, to) percent(x)
vec_cast.double.vctrs_percent <- function(x, to) vec_data(x)
```

Then we can check this works with `vec_cast()`

:

```
vec_cast(0.5, percent())
#> <vctrs_percent[1]>
#> [1] 50%
vec_cast(percent(0.5), double())
#> [1] 0.5
```

Once you’ve implemented `vec_type2()`

and `vec_cast()`

you get `vec_c()`

, `[<-`

and `[[<-`

implementations for free.

```
vec_c(percent(0.5), 1)
#> [1] 0.5 1.0
vec_c(NA, percent(0.5))
#> <vctrs_percent[2]>
#> [1] <NA> 50%
# but
vec_c(TRUE, percent(0.5))
#> No common type for <logical> and <vctrs_percent>.
x <- percent(c(0.5, 1, 2))
x[1:2] <- 2:1
#> Can't cast <integer> to <vctrs_percent>
x[[3]] <- 0.5
x
#> <vctrs_percent[3]>
#> [1] 50% 100% 50%
```

You’ll also get mostly correct behaviour for `c()`

. The exception is when you use `c()`

with a base R class:

```
# Correct
c(percent(0.5), 1)
#> [1] 0.5 1.0
c(percent(0.5), factor(1))
#> No common type for <vctrs_percent> and <factor<5b58e>>.
# Incorrect
c(factor(1), percent(0.5))
#> [1] 1.0 0.5
```

Unfortunately there’s no way to fix this problem with the current design of `c()`

.

Again, as a convenience, consider providing an `as_percent()`

function that makes use of the casts defined in your `vec_cast.vctrs_percent()`

methods:

```
as_percent <- function(x) {
vec_cast(x, new_percent())
}
```

Now that you’ve seen the basics with a very simple S3 class, we’ll gradually explore more complicated scenarios. This section creates a `decimal`

class that prints with the specified number of decimal places. This is very similar to `percent`

but now the class needs an attribute: the number of decimal places to display (an integer vector of length 1).

We start of as before, defining a low-level constructor, a user-friendly constructor, a `format()`

method, and a `vec_ptype_abbr()`

. Note that additional object attributes are simply passed along to `new_vctr()`

:

```
new_decimal <- function(x = double(), digits = 2L) {
vec_assert(x, ptype = double())
vec_assert(digits, ptype = integer(), size = 1)
new_vctr(x, digits = digits, class = "vctrs_decimal")
}
decimal <- function(x = double(), digits = 2L) {
x <- vec_cast(x, double())
digits <- vec_recycle(vec_cast(digits, integer()), 1L)
new_decimal(x, digits = digits)
}
digits <- function(x) attr(x, "digits")
format.vctrs_decimal <- function(x, ...) {
sprintf(paste0("%-0.", digits(x), "f"), x)
}
vec_ptype_abbr.vctrs_decimal <- function(x) {
paste0("dec")
}
x <- decimal(runif(10), 1L)
x
#> <vctrs_decimal[10]>
#> [1] 0.1 0.8 0.6 0.2 0.0 0.5 0.5 0.3 0.7 0.8
```

Note that I provide a little helper to extract the `digits`

attribute. This makes the code a little easier to read, and should not be exported.

By default, vctrs assumes that attributes are independent of the data, and so are automatically preserved. You’ll see what to do if the attributes are data dependent in the next section.

For the sake of exposition, we’ll assume that `digits`

is an important attribute of the class, and should be included in the full type:

```
vec_ptype_full.vctrs_decimal <- function(x) {
paste0("decimal<", digits(x), ">")
}
x
#> <decimal<1>[10]>
#> [1] 0.1 0.8 0.6 0.2 0.0 0.5 0.5 0.3 0.7 0.8
```

Now consider `vec_cast()`

and `vec_type2()`

. I start with the standard recipes:

```
vec_type2.vctrs_decimal <- function(x, y, ...) UseMethod("vec_type2.vctrs_decimal")
vec_type2.vctrs_decimal.default <- function(x, y, ..., x_arg = "", y_arg = "") {
stop_incompatible_type(x, y, x_arg = x_arg, y_arg = y_arg)
}
vec_type2.vctrs_decimal.vctrs_unspecified <- function(x, y, ...) x
vec_cast.vctrs_decimal <- function(x, to) UseMethod("vec_cast.vctrs_decimal")
vec_cast.vctrs_decimal.default <- function(x, to) stop_incompatible_cast(x, to)
vec_cast.vctrs_decimal.logical <- function(x, to) vec_unspecified_cast(x, to)
```

Casting and coercing from one decimal to another requires a little thought as the values of the `digits`

attribute might be different, and we need some way to reconcile them. Here I’ve chosen to chose the maximum of the two; other reasonable options are to take the value from the left-hand side or throw an error.

```
vec_type2.vctrs_decimal.vctrs_decimal <- function(x, y, ...) {
new_decimal(digits = max(digits(x), digits(y)))
}
vec_cast.vctrs_decimal.vctrs_decimal <- function(x, to, ...) {
new_decimal(vec_data(x), digits = digits(to))
}
vec_c(decimal(1/100, digits = 3), decimal(2/100, digits = 2))
#> <decimal<3>[2]>
#> [1] 0.010 0.020
```

Finally, I can implement coercion to and from other types, like doubles. When automatically coercing, I choose the richer type (i.e. the decimal).

```
vec_type2.vctrs_decimal.double <- function(x, y, ...) x
vec_type2.double.vctrs_decimal <- function(x, y, ...) y
vec_cast.vctrs_decimal.double <- function(x, to) new_decimal(x, digits = digits(to))
vec_cast.double.vctrs_decimal <- function(x, to) vec_data(x)
vec_c(decimal(1, digits = 1), pi)
#> <decimal<1>[2]>
#> [1] 1.0 3.1
vec_c(pi, decimal(1, digits = 1))
#> <decimal<1>[2]>
#> [1] 3.1 1.0
```

If type `x`

has greater resolution than `y`

, there will be some inputs that lose precision. These should generate errors using `stop_lossy_cast()`

. You can see that in action when casting from doubles to integers; only some doubles can become integers without losing resolution.

```
vec_cast(c(1, 2, 10), to = integer())
#> [1] 1 2 10
vec_cast(c(1.5, 2, 10.5), to = integer())
#> Lossy cast from <double> to <integer>.
```

The next level up in complexity is an object that has data-dependent attributes. To explore this idea we’ll create a vector that caches the sum of its values. As usual, we start with low-level and user-friendly constructors:

```
new_cached_sum <- function(x = double(), sum = 0L) {
vec_assert(x, ptype = double())
vec_assert(sum, ptype = double(), size = 1L)
new_vctr(x, sum = sum, class = "vctrs_cached_sum")
}
cached_sum <- function(x) {
x <- vec_cast(x, double())
new_cached_sum(x, sum(x))
}
```

For this class, we can use the default `format()`

method, and instead, we’ll customise the `obj_print_footer()`

method. This is a good place to display user facing attributes.

```
obj_print_footer.vctrs_cached_sum <- function(x, ...) {
cat("# Sum: ", format(attr(x, "sum"), digits = 3), "\n", sep = "")
}
x <- cached_sum(runif(10))
x
#> <vctrs_cached_sum[10]>
#> [1] 0.87460066 0.17494063 0.03424133 0.32038573 0.40232824 0.19566983
#> [7] 0.40353812 0.06366146 0.38870131 0.97554784
#> # Sum: 3.83
```

We’ll also override `sum()`

and `mean()`

to use the attribute. This is easiest to do with `vec_math()`

, which you’ll learn about later.

```
vec_math.vctrs_cached_sum <- function(fun, x, ...) {
cat("Using cache\n")
switch(fun,
sum = attr(x, "sum"),
mean = attr(x, "sum") / length(x),
vec_math_base(fun, x, ...)
)
}
sum(x)
#> Using cache
#> [1] 3.833615
```

As mentioned above, vctrs assumes that attributes are independent of the data. This means that when we take advantage of the default methods, they’ll work, but return the incorrect result:

To fix this, you need to provide a `vec_restore()`

method. Note that this method dispatches on the `to`

argument.

```
vec_restore.vctrs_cached_sum <- function(x, to, ..., i = NULL) {
new_cached_sum(x, sum(x))
}
x[1]
#> <vctrs_cached_sum[1]>
#> [1] 0.8746007
#> # Sum: 0.875
```

This works because most of the vctrs methods dispatch to the underlying base function, by first stripping off extra attributes with `vec_data()`

and then reapplying them again with `vec_restore()`

. The default `vec_restore()`

method copies over all attributes, which is not appropriate when the attributes depend on the data.

Note that `vec_restore.class`

is subtly different from `vec_cast.class.class()`

. `vec_restore()`

is used when restoring attributes that have been lost; `vec_cast()`

is used for coercions. This is easier to understand with a concrete example. Imagine factors were implemented with `new_vectr()`

. `vec_restore.factor()`

would restore attributes back to an integer vector, but you would not want to allow manually casting an integer to a factor with `vec_cast()`

.

Record-style objects use a list of equal-length vectors to represent individual components of the object. The best example of this is `POSIXlt`

, which underneath the hood is a list of 11 fields like year, month, and day. Record-style classes override `length()`

and subsetting methods to conceal this implementation detail.

```
x <- as.POSIXlt(ISOdatetime(2020, 1, 1, 0, 0, 1:3))
x
#> [1] "2020-01-01 00:00:01 UTC" "2020-01-01 00:00:02 UTC"
#> [3] "2020-01-01 00:00:03 UTC"
length(x)
#> [1] 3
length(unclass(x))
#> [1] 9
x[[1]] # the first date time
#> [1] "2020-01-01 00:00:01 UTC"
unclass(x)[[1]] # the first component, the number of seconds
#> [1] 1 2 3
```

vctrs makes it easy to create new record-style classes using `new_rcrd()`

, which has a wide selection of default methods.

A fraction, or rational number can be represented by a pair of integer vectors representing the numerator (the number on top) and the denominator (the number on bottom), where the length of each vector must be the same. To represent such a data structure we turn to a new base data type: the record (or rcrd for short).

As usual we start with low-level and user-friendly constructors. The low-level constructor calls `new_rcrd()`

which needs a named list of equal-length vectors.

```
new_rational <- function(n = integer(), d = integer()) {
vec_assert(n, ptype = integer())
vec_assert(d, ptype = integer())
new_rcrd(list(n = n, d = d), class = "vctrs_rational")
}
```

Our user friendly constructor casts `n`

and `d`

to integers and recycles them to the same length.

```
rational <- function(n, d) {
c(n, d) %<-% vec_cast_common(n, d, .to = integer())
c(n, d) %<-% vec_recycle_common(n, d)
new_rational(n, d)
}
x <- rational(1, 1:10)
```

Behind the scenes, `x`

is a named list with two elements. But those details are hidden so that it behaves like a vector:

To access the underlying fields we need to use `field()`

and `fields()`

:

```
fields(x)
#> [1] "n" "d"
field(x, "n")
#> [1] 1 1 1 1 1 1 1 1 1 1
```

This allows us to create a format method:

```
format.vctrs_rational <- function(x, ...) {
n <- field(x, "n")
d <- field(x, "d")
out <- paste0(n, "/", d)
out[is.na(n) | is.na(d)] <- NA
out
}
vec_ptype_abbr.vctrs_rational <- function(x) "rtnl"
vec_ptype_full.vctrs_rational <- function(x) "rational"
x
#> <rational[10]>
#> [1] 1/1 1/2 1/3 1/4 1/5 1/6 1/7 1/8 1/9 1/10
```

vctrs uses the `format()`

method in `str()`

, hiding the underlying implementation details from the user:

For `rational`

, `vec_type2()`

and `vec_cast()`

follow the same pattern as `percent()`

. I allow coercion from integer and to doubles.

```
vec_type2.vctrs_rational <- function(x, y, ...) UseMethod("vec_type2.vctrs_rational", y)
vec_type2.vctrs_rational.default <- function(x, y, ..., x_arg = "", y_arg = "") {
stop_incompatible_type(x, y, x_arg = x_arg, y_arg = y_arg)
}
vec_type2.vctrs_rational.vctrs_unspecified <- function(x, y, ...) x
vec_type2.vctrs_rational.vctrs_rational <- function(x, y, ...) new_rational()
vec_type2.vctrs_rational.integer <- function(x, y, ...) new_rational()
vec_type2.integer.vctrs_rational <- function(x, y, ...) new_rational()
vec_cast.vctrs_rational <- function(x, to) UseMethod("vec_cast.vctrs_rational")
vec_cast.vctrs_rational.default <- function(x, to) stop_incompatible_cast(x, to)
vec_cast.vctrs_rational.logical <- function(x, to) vec_unspecified_cast(x, to)
vec_cast.vctrs_rational.vctrs_rational <- function(x, to) x
vec_cast.double.vctrs_rational <- function(x, to) field(x, "n") / field(x, "d")
vec_cast.vctrs_rational.integer <- function(x, to) rational(x, 1)
vec_c(rational(1, 2), 1L, NA)
#> <rational[3]>
#> [1] 1/2 1/1 <NA>
```

The previous implementation of `decimal`

was built on top of doubles. This is a bad idea because decimal vectors are typically used when you care about precise values (i.e. dollars and cents in a bank account), and double values suffer from floating point problems.

A better implementation of a decimal class would be to use pair of integers, one for the value to the left of the decimal point, and the other for the value to the right (divided by a `scale`

). The code is a very quick sketch of how you might start creating such a class:

```
new_decimal2 <- function(l, r, scale = 2L) {
vec_assert(l, ptype = integer())
vec_assert(r, ptype = integer())
vec_assert(scale, ptype = integer(), size = 1L)
new_rcrd(list(l = l, r = r), scale = scale, class = "vctrs_decimal2")
}
decimal2 <- function(l, r, scale = 2L) {
l <- vec_cast(l, integer())
r <- vec_cast(r, integer())
c(l, r) %<-% vec_recycle_common(l, r)
scale <- vec_cast(scale, integer())
# should check that r < 10^scale
new_decimal2(l = l, r = r, scale = scale)
}
format.vctrs_decimal2 <- function(x, ...) {
val <- field(x, "l") + field(x, "r") / 10^attr(x, "scale")
sprintf(paste0("%.0", attr(x, "scale"), "f"), val)
}
decimal2(10, c(0, 5, 99))
#> <vctrs_decimal2[3]>
#> [1] 10.00 10.05 10.99
```

vctrs provides two “proxy” generics that lets you control how your class determines equality and ordering:

`vec_proxy_equal()`

specifies how to test the elements of your vector for equality. This proxy underpins`==`

and`!=`

, but also`unique()`

,`anyDuplicated()`

, and`is.na()`

.`vec_proxy_compare()`

specifies how to compare the elements of your vector. This proxy is used in`<`

,`<=`

,`>=`

,`>`

,`min()`

,`max()`

,`median()`

,`quantile()`

, and`xtfrm()`

(used in`order()`

and`sort()`

) methods.

It’s a good idea to define methods for these generics because you get a lot of behaviour for relatively little work.

These proxy functions should always return a simple object (typically either a bare vector or a data frame), that possesses the same properties as your class. This permits efficient implementation of the vctrs internals because it allows dispatch to happen once in R, and then efficient computations can be written in C.

Let’s explore these ideas by with the rational class we started on above. By default, `vec_proxy_equal()`

converts a record to a data frame, and the default comparison works column by column:

```
x <- rational(c(1, 2, 1, 2), c(1, 1, 2, 2))
x
#> <rational[4]>
#> [1] 1/1 2/1 1/2 2/2
vec_proxy_equal(x)
#> n d
#> 1 1 1
#> 2 2 1
#> 3 1 2
#> 4 2 2
x == rational(1, 1)
#> [1] TRUE FALSE FALSE FALSE
```

This makes sense as a default, but isn’t correct here because `rational(1, 1)`

represents the same number as `rational(2, 2)`

so they should be equal. We can fix that by implementation `vec_proxy_equal()`

method that by divides `n`

and `d`

by their greatest common divisor:

```
# Thanks to Matthew Lundberg: https://stackoverflow.com/a/21504113/16632
gcd <- function(x, y) {
r <- x %% y
ifelse(r, gcd(y, r), y)
}
vec_proxy_equal.vctrs_rational <- function(x) {
n <- field(x, "n")
d <- field(x, "d")
gcd <- gcd(n, d)
data.frame(n = n / gcd, d = d / gcd)
}
vec_proxy_equal(x)
#> n d
#> 1 1 1
#> 2 2 1
#> 3 1 2
#> 4 1 1
x == rational(1, 1)
#> [1] TRUE FALSE FALSE TRUE
```

`vec_proxy_equal()`

is also used by `unique()`

:

We need fix `sort()`

similarly, since it currently sorts by `n`

, then by `d`

:

The easiest fix is to convert the fraction to a decimal and then sort that:

```
vec_proxy_compare.vctrs_rational <- function(x) {
field(x, "n") / field(x, "d")
}
sort(x)
#> <rational[4]>
#> [1] 1/2 1/1 2/2 2/1
```

(We could have used the same approach in `vec_proxy_equal()`

, but when working with floating point numbers it’s not necessarily true that `x == y`

implies that `d * x == d * y`

.)

A related problem occurs if we build our vector on top of a list. The following code defines a polynomial class that represents polynomials (like `1 + 3x - 2x^2`

) using a list of integer vectors (like `c(1, 3, -2)`

). Note the use of `new_list_of()`

in the constructor.

```
new_poly <- function(x) {
new_list_of(x, ptype = integer(), class = "vctrs_poly")
}
poly <- function(...) {
x <- list(...)
x <- lapply(x, vec_cast, integer())
new_poly(x)
}
vec_ptype_full.vctrs_poly <- function(x) "polynomial"
vec_ptype_abbr.vctrs_poly <- function(x) "poly"
format.vctrs_poly <- function(x, ...) {
format_one <- function(x) {
if (length(x) == 0) {
return("")
} else if (length(x) == 1) {
format(x)
} else {
suffix <- c(paste0("\u22C5x^", seq(length(x) - 1, 1)), "")
out <- paste0(x, suffix)
out <- out[x != 0L]
paste0(out, collapse = " + ")
}
}
vapply(x, format_one, character(1))
}
obj_print_data.vctrs_poly <- function(x) {
if (length(x) == 0)
return()
print(format(x), quote = FALSE)
}
p <- poly(1, c(1, 0, 1), c(1, 0, 0, 0, 2))
p
#> <polynomial[3]>
#> [1] 1 1⋅x^2 + 1 1⋅x^4 + 2
```

The resulting objects will inherit from the `vctrs_list_of`

class, which provides tailored methods for `$`

, `[[`

, the corresponding assignment operators, and other methods.

```
class(p)
#> [1] "vctrs_poly" "vctrs_list_of" "vctrs_vctr"
p[2]
#> <polynomial[1]>
#> [1] 1⋅x^2 + 1
p[[2]]
#> [1] 1 0 1
```

Equality works out of the box because we can tell if two integer vectors are equal:

```
p == poly(c(1, 0, 1))
#> [1] FALSE TRUE FALSE
```

But we can’t order them, because lists are not comparable:

So we need to define a `vec_proxy_compare()`

method:

```
vec_proxy_compare.vctrs_poly <- function(x) {
x_raw <- vec_data(x)
# First figure out the maximum length
n <- max(vapply(x_raw, length, integer(1)))
# Then expand all vectors to this length by filling in with zeros
full <- lapply(x_raw, function(x) c(rep(0L, n - length(x)), x))
# Then turn into a data frame
as.data.frame(do.call(rbind, full))
}
sort(poly(3, 2, 1))
#> <polynomial[3]>
#> [1] 1 2 3
sort(poly(1, c(1, 0, 0), c(1, 0)))
#> <polynomial[3]>
#> [1] 1 1⋅x^1 1⋅x^2
```

vctrs also provides two mathematical generics that allow you to define a broad swath of mathematical behaviour at once:

`vec_math(fun, x, ...)`

specifies the behaviour of mathematical functions like`abs()`

,`sum()`

, and`mean()`

. (See`?vec_math()`

for the complete list.)`vec_arith(op, x, y)`

specifies the behaviour of the arithmetic operations like`+`

,`-`

, and`%%`

. (See`?vec_arith()`

for the complete list.)

Both generics define the behaviour for multiple functions because `sum.vctrs_vctr(x)`

calls `vec_math.vctrs_vctr("sum", x)`

, and `x + y`

calls `vec_math.x_class.y_class("+", x, y)`

. They’re accompanied by `vec_math_base()`

and `vec_arith_base()`

which make it easy to call the underlying base R functions.

`vec_arith()`

uses double dispatch, and needs the following standard boilerplate:

```
vec_arith.MYCLASS <- function(op, x, y) {
UseMethod("vec_arith.MYCLASS", y)
}
vec_arith.MYCLASS.default <- function(op, x, y) {
stop_incompatible_op(op, x, y)
}
```

I showed an example of `vec_math()`

to define `sum()`

and `mean()`

methods `cached_sum`

. Now let’s talk about exactly how it works. Most `vec_math()`

functions will have a similar form. You use a switch statement to handle the methods that you care about, and fall back to `vec_math_base()`

for those that you don’t care about.

```
vec_math.vctrs_cached_sum <- function(fun, x, ...) {
switch(fun,
sum = attr(x, "sum"),
mean = attr(x, "sum") / length(x),
vec_math_base(fun, x, ...)
)
}
```

To explore the infix arithmetic operators exposed by `vec_arith()`

I’ll create a new class that represents a measurement in `meter`

s:

```
new_meter <- function(x) {
stopifnot(is.double(x))
new_vctr(x, class = "vctrs_meter")
}
format.vctrs_meter <- function(x, ...) {
paste0(format(vec_data(x)), " m")
}
meter <- function(x) {
x <- vec_cast(x, double())
new_meter(x)
}
x <- meter(1:10)
x
#> <vctrs_meter[10]>
#> [1] 1 m 2 m 3 m 4 m 5 m 6 m 7 m 8 m 9 m 10 m
```

Because `meter`

is built on top of a double vector, basic mathematic operations work:

But we can’t do arithmetic:

```
x + 1
#> <vctrs_meter> + <double> is not permitted
meter(10) + meter(1)
#> <vctrs_meter> + <vctrs_meter> is not permitted
meter(10) * 3
#> <vctrs_meter> * <double> is not permitted
```

To allow these infix functions to work, we’ll need to provide `vec_arith()`

generic. But before we do that, let’s think about what combinations of inputs we should support:

It makes sense to add and subtract meters: that yields another meter. We can divide a meter by another meter (yielding a unitless number), but we can’t multiply meters (because that would yield an area.)

For a combination of meter and number multiplication and division by a number are acceptable. Addition and subtraction don’t make much sense as we, strictly speaking, are dealing with objects of different nature.

`vec_arith()`

is another function that uses double dispatch, so as usual we start with a template.

```
vec_arith.vctrs_meter <- function(op, x, y) {
UseMethod("vec_arith.vctrs_meter", y)
}
vec_arith.vctrs_meter.default <- function(op, x, y) {
stop_incompatible_op(op, x, y)
}
```

Then write the method for two meter objects. We use a switch statement to cover the cases we care about, and `stop_incompatible_op()`

to throw an informative error message for everything else.

```
vec_arith.vctrs_meter.vctrs_meter <- function(op, x, y) {
switch(
op,
"+" = ,
"-" = new_meter(vec_arith_base(op, x, y)),
"/" = vec_arith_base(op, x, y),
stop_incompatible_op(op, x, y)
)
}
meter(10) + meter(1)
#> <vctrs_meter[1]>
#> [1] 11 m
meter(10) - meter(1)
#> <vctrs_meter[1]>
#> [1] 9 m
meter(10) / meter(1)
#> [1] 10
meter(10) * meter(1)
#> <vctrs_meter> * <vctrs_meter> is not permitted
```

Next we write the pair of methods for arithmetic with a meter and a number. These are almost identical, but while `meter(10) / 2`

makes sense, `2 / meter(10)`

does not (as addition and subtraction).

```
vec_arith.vctrs_meter.numeric <- function(op, x, y) {
switch(
op,
"/" = ,
"*" = new_meter(vec_arith_base(op, x, y)),
stop_incompatible_op(op, x, y)
)
}
vec_arith.numeric.vctrs_meter <- function(op, x, y) {
switch(
op,
"*" = new_meter(vec_arith_base(op, x, y)),
stop_incompatible_op(op, x, y)
)
}
meter(2) * 10
#> <vctrs_meter[1]>
#> [1] 20 m
10 * meter(2)
#> <vctrs_meter[1]>
#> [1] 20 m
meter(20) / 10
#> <vctrs_meter[1]>
#> [1] 2 m
10 / meter(20)
#> <double> / <vctrs_meter> is not permitted
meter(20) + 10
#> <vctrs_meter> + <double> is not permitted
```

For completeness, we also need `vec_arith.vctrs_meter.MISSING`

for the unary `+`

and `-`

operators:

`NAMESPACE`

declarationsDefining S3 methods interactively is fine for iteration and exploration, but if your vector lives in a package, you also need to register the S3 methods by listing them in the `NAMESPACE`

file. The namespace declarations are a little tricky because (e.g.) `vec_cast.vctrs_percent()`

is both a generic function (which must be exported with `export()`

) and an S3 method (which must be registered with `S3method()`

).

This problem wasn’t considered in the design of roxygen2, so you have to be quite explicit:

```
#' @method vec_cast vctrs_percent
#' @export
#' @export vec_cast.vctrs_percent
vec_cast.vctrs_percent <- function(x, to) {
}
```

You also need to register the individual double-dispatch methods. Again, this is harder than it should be because roxygen’s heuristics aren’t quite right. That means you need to describe the `@method`

explicitly:

Hopefully future versions of roxygen will make these exports less painful.