Title: | Fast Onehot Encoding for Data.frames |
---|---|
Description: | Quickly create numeric matrices for machine learning algorithms that require them. It converts factor columns into onehot vectors. |
Authors: | Eric E. Graves [aut, cre] |
Maintainer: | Eric E. Graves <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.3 |
Built: | 2025-02-24 03:47:06 UTC |
Source: | https://github.com/gravesee/onehot |
Make column names for a onehot object
make_names(info, sep)
x |
a |
data(iris)
encoder <- onehot(iris)
make_names(encoder$Species)
Onehot Encode a data.frame
onehot(data, sentinel = -999, max_levels = 10, add_NA_factors = TRUE)
data |
data.frame to convert factors into onehot encoded columns |
sentinel |
Numeric value with which to replace NAs. Applies to numeric columns only. |
max_levels |
maximum number of levels to onehot encode per factor variable. Factors with levels exceeding this number will be skipped. |
add_NA_factors |
if TRUE, adds NA indicator column for factors. |
By default, with addNA=FALSE, no NAs are returned for non-factor columns. Indicator columns are created for factor levels, and NA values in factor columns are ignored. The exception is when NA is an explicit factor level.
stringsAsFactors=TRUE will convert character columns to factors first. Otherwise, character columns are ignored. Only factor, numeric, integer, and logical vectors are valid for onehot; other classes are skipped entirely.
addNA=TRUE will create indicator columns for every field. This adds ncols columns to the output matrix, so a sparse matrix may be better in such cases.
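As a rough illustration of the ideas described for onehot() (sentinel replacement for numeric NAs, one indicator column per factor level, and an optional NA indicator for a factor), the following base R sketch builds such a matrix by hand. It is not the package's implementation. The data frame `df` and the column names produced here are made up for illustration, and the names the package itself generates may differ.

```r
# Conceptual sketch in base R; this is NOT the onehot package's implementation.
# Toy data (hypothetical): one numeric and one factor column, each with an NA.
df <- data.frame(
  size  = c(1.5, NA, 3.0),
  color = factor(c("red", NA, "blue"))
)

sentinel <- -999  # same default value as in the usage of onehot()

# Numeric column: replace NA with the sentinel value.
size <- ifelse(is.na(df$size), sentinel, df$size)

# Factor column: one 0/1 indicator column per level.
lv  <- levels(df$color)  # "blue" "red"
ind <- sapply(lv, function(l) as.integer(!is.na(df$color) & df$color == l))
colnames(ind) <- paste("color", lv, sep = "_")

# Optional NA indicator column for the factor (illustrative name).
color_NA <- as.integer(is.na(df$color))

# Combine everything into a single numeric matrix.
x <- cbind(size = size, ind, color_NA = color_NA)
x
```

The second row shows both effects at once: the missing numeric value becomes the sentinel, and the missing factor value sets only the NA indicator column.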
A onehot object describing how to transform the data.
data(iris)
encoder <- onehot(iris)

## add NA indicator columns
encoder <- onehot(iris, add_NA_factors=TRUE)

## limit which factors are onehot encoded
encoder <- onehot(iris, max_levels=5)

## Impute numeric NA values with sentinel value
encoder <- onehot(iris, sentinel=-1)
Predict onehot objects
## S3 method for class 'onehot'
predict(object, data, sparse = FALSE, sep = "_", ...)
object |
an object of class |
data |
a data.frame to onehot encode using |
sparse |
if TRUE, returns a |
... |
further arguments passed to or from other methods |
A matrix with factor variables onehot encoded.
data(iris)
encoder <- onehot(iris)
x <- predict(encoder, iris)
x_sparse <- predict(encoder, iris, sparse=TRUE)
Print information about a onehot object
## S3 method for class 'onehot'
print(x, ...)
x |
onehot object to print |
... |
other arguments passed to or from other functions |
Generate SAS code for onehot object
sas(x, sep = "_", ...)
x |
a |
sep |
a character vector used to separate the name of a factor from the value. |
Returns a character vector of SAS code that can be written to a file using writeLines.
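A minimal sketch of that workflow, assuming the onehot package is installed and that `sas()` is called with its default `sep`. The temporary output path is a hypothetical choice for illustration; any writable file path would do.

```r
# Assumes the onehot package is installed; the block is skipped otherwise.
if (requireNamespace("onehot", quietly = TRUE)) {
  library(onehot)

  data(iris)
  encoder <- onehot(iris)

  # Character vector of SAS code describing the encoding.
  code <- sas(encoder)

  # Hypothetical output location: a temporary file with a .sas extension.
  out_file <- tempfile(fileext = ".sas")
  writeLines(code, out_file)
}
```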
Summarize onehot object
## S3 method for class 'onehot'
summary(object, ...)
object |
a onehot object |
... |
other arguments passed to or from other functions |
## Create some dummy data with different column types
x <- data.frame(HairEyeColor)
x$Hair <- as.character(x$Hair)

## Create a onehot object
encoder <- onehot(x)

## Return a list with summary information
summary(encoder)