Package 'onehot'

Title: Fast Onehot Encoding for Data.frames
Description: Quickly create numeric matrices for machine learning algorithms that require them. It converts factor columns into onehot vectors.
Authors: Eric E. Graves [aut, cre]
Maintainer: Eric E. Graves <[email protected]>
License: MIT + file LICENSE
Version: 0.1.3
Built: 2025-02-24 03:47:06 UTC
Source: https://github.com/gravesee/onehot

Make column names for a onehot object

Description

Make column names for a onehot object

Usage

make_names(info, sep)

Arguments

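info

column information taken from a onehot object, for example encoder$Species

sep

a character vector used to separate the name of a factor from the value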

Examples

data(iris)
encoder <- onehot(iris)
make_names(encoder$Species, sep = "_")

Onehot Encode a data.frame

Description

Onehot Encode a data.frame

Usage

onehot(data, sentinel = -999, max_levels = 10, add_NA_factors = TRUE)

Arguments

data

a data.frame whose factor columns are to be converted into onehot encoded columns

sentinel

Numeric value with which to replace NAs. Applies to numeric columns only.

max_levels

maximum number of levels to onehot encode per factor variable. Factors with levels exceeding this number will be skipped.

add_NA_factors

if TRUE, adds NA indicator column for factors.

Details

By default, with addNA=FALSE, no NAs are returned for non-factor columns. Indicator columns are created for factor levels and NA factors are ignored. The exception is when NA is an explicit factor level.

stringsAsFactors=TRUE will convert character columns to factors first. Otherwise character columns are ignored. Only factor, numeric, integer, and logical vectors are valid for onehot. Other classes will be skipped entirely.

addNA=TRUE will create indicator columns for every field. This will add ncols columns to the output matrix. A sparse matrix may be better in such cases.

Value

a onehot object describing how to transform the data

Examples

data(iris)
encoder <- onehot(iris)

## add NA indicator columns
encoder <- onehot(iris, add_NA_factors=TRUE)

## limit which factors are onehot encoded
encoder <- onehot(iris, max_levels=5)

## Impute numeric NA values with sentinel value
encoder <- onehot(iris, sentinel=-1)
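
The argument descriptions above can be tried together on a small data frame. The following is a minimal sketch, not output from the package authors: the data frame df and its column names are hypothetical, and the comments only restate what the Arguments section says should happen.

```r
library(onehot)

## Hypothetical toy data: a numeric column with one NA, a two-level
## factor, and a factor that declares 26 levels.
df <- data.frame(
  height = c(1.5, NA, 1.8, 1.6),
  group  = factor(c("a", "b", "a", "b")),
  id     = factor(letters[1:4], levels = letters)
)

## Per the Arguments section, numeric NAs should be replaced by the
## sentinel, and 'id' (26 levels > max_levels) should be skipped.
encoder <- onehot(df, sentinel = -1, max_levels = 10)
m <- predict(encoder, df)

## Inspect which columns were produced and what the values look like
colnames(m)
m
```

Inspecting colnames(m) is a quick way to check which factors were encoded and whether NA indicator columns were added.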

Predict onehot objects

Description

Predict onehot objects

Usage

## S3 method for class 'onehot'
predict(object, data, sparse = FALSE, sep = "_", ...)

Arguments

object

an object of class onehot

data

a data.frame to onehot encode using object

sparse

if TRUE, returns a sparse matrix of class dgCMatrix

sep

a character vector used to separate the name of a factor from the value

...

further arguments passed to or from other methods

Value

a matrix with the factor variables onehot encoded

Examples

data(iris)
encoder <- onehot(iris)
x <- predict(encoder, iris)
x_sparse <- predict(encoder, iris, sparse=TRUE)
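
Because the encoder is created separately from the call to predict, one plausible workflow is to build it on one data frame and apply it to another with the same columns. The sketch below assumes that usage; the train/test split and the variable names are illustrative only.

```r
library(onehot)
data(iris)

## Illustrative split: 100 rows to build the encoder, 50 held out
set.seed(1)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

## Build the encoder once, then apply it to both data frames
encoder <- onehot(train)
x_train <- predict(encoder, train)
x_test  <- predict(encoder, test)

## Both matrices should share the same column layout
identical(colnames(x_train), colnames(x_test))

## The sep argument from the Usage section changes the name separator
x_dot <- predict(encoder, test, sep = ".")
colnames(x_dot)
```

Reusing one encoder keeps the column order consistent between the matrices passed to model fitting and to scoring.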

Print information about a onehot object

Description

Print information about a onehot object

Usage

## S3 method for class 'onehot'
print(x, ...)

Arguments

x

onehot object to print

...

other arguments passed to or from other functions
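
This page has no Examples section, so here is a minimal sketch of how the print method is reached. It relies only on standard S3 dispatch in R; what is printed depends on the package and is not reproduced here.

```r
library(onehot)
data(iris)

encoder <- onehot(iris)

## Explicit call to the generic, which dispatches to the onehot method
print(encoder)

## Typing the object name at top level auto-prints via the same method
encoder
```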


Generate SAS code for onehot object

Description

Generate SAS code for onehot object

Usage

sas(x, sep = "_", ...)

Arguments

x

a onehot object

sep

a character vector used to separate the name of a factor from the value.

Value

Returns a character vector of SAS code that can be written to a file using writeLines
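
Examples

A minimal sketch of the workflow described under Value, assuming the encoder is built from iris as on the other help pages. The temporary file path is illustrative, and the generated SAS code is not shown.

```r
library(onehot)
data(iris)

encoder <- onehot(iris)

## Generate the SAS code as a character vector
code <- sas(encoder)

## Write it to a file; tempfile() is used here only for illustration
out <- tempfile(fileext = ".sas")
writeLines(code, out)

## Read the first few lines back to inspect the result
head(readLines(out))
```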


Summarize onehot object

Description

Summarize onehot object

Usage

## S3 method for class 'onehot'
summary(object, ...)

Arguments

object

a onehot object

...

other arguments passed to or from other functions

Examples

## Create some dummy data with different column types
x <- data.frame(HairEyeColor)
x$Hair <- as.character(x$Hair)

## Create a onehot object
encoder <- onehot(x)

## Return a list with summary information
summary(encoder)