Usage of Factor in R

Hello everybody,

Today I want to write a post about using factors in R.

So, first of all, what is a factor?

I was surprised to see that "factor" is a very overloaded term with many meanings in many areas. So, what can factors be used for in R?

First of all, they can be used for storing categorical (nominal) values. For example, a factor is convenient for storing gender, blood group, car type, etc.

So, let's say you wrote something like this: 

cars.classes <- c("Coupe", "Cabriolet", "Sport", "Hatchback")

For now cars.classes is not a factor, just a character vector.

If you execute this code in RStudio, and then run the following line:

cars.classes

you'll see four character strings:

[1] "Coupe"     "Cabriolet" "Sport"     "Hatchback"
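If you want to double-check that this really is a plain character vector and not a factor yet, a small sketch using base R's class(), is.character() and is.factor() could look like this:

```r
cars.classes <- c("Coupe", "Cabriolet", "Sport", "Hatchback")

class(cars.classes)         # "character"
is.character(cars.classes)  # TRUE
is.factor(cars.classes)     # FALSE
```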

A character vector can be fine for analysis, but if you need performance, you should think about a way to represent your character strings as numbers. In C# and C++ this is similar to representing categories as enums.

So, from a C# perspective, here is a simple way to represent the strings as something like an enum:

cars.classes <- factor(c("Coupe", "Cabriolet", "Sport", "Hatchback"))
cars.classes

and here is what will be printed:

[1] Coupe     Cabriolet Sport     Hatchback

Levels: Cabriolet Coupe Hatchback Sport

Note that, compared with the first output, the quotation marks (") are gone. What does that mean? It means that R no longer stores Coupe, Cabriolet, Sport, and Hatchback as plain strings: internally each value is an integer code that points to one of the levels. If you, like me, want to know what happens behind the curtain, you can check it with the as.numeric function, which shows those codes.

If you type in the console:

as.numeric(cars.classes) 

you will see the following:

> as.numeric(cars.classes)

[1] 2 1 4 3

So R stores the classes as numbers, but why such strange numbers: 2, 1, 4, 3? The reason is that by default the levels are sorted alphabetically (Cabriolet, Coupe, Hatchback, Sport), and each value gets the position of its level in that sorted list.
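To see how the codes and the levels fit together, here is a minimal sketch using the same vector. It relies only on base R's levels() and as.integer(); indexing the levels by the codes should give back the original labels:

```r
cars.classes <- factor(c("Coupe", "Cabriolet", "Sport", "Hatchback"))

levels(cars.classes)      # "Cabriolet" "Coupe" "Hatchback" "Sport"
as.integer(cars.classes)  # 2 1 4 3

# Each code is a position in the levels vector,
# so indexing the levels by the codes recovers the labels.
levels(cars.classes)[as.integer(cars.classes)]
# "Coupe" "Cabriolet" "Sport" "Hatchback"
```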

Now the million-dollar question: how do I number them the way I want?

Here is the recipe:

cars.classes <- factor(c("Coupe", "Cabriolet", "Sport", "Hatchback"), levels=c("Coupe", "Cabriolet", "Sport", "Hatchback"))

and printing cars.classes will give you the following output:

> cars.classes

[1] Coupe     Cabriolet Sport     Hatchback

Levels: Coupe Cabriolet Sport Hatchback

Here is another way to check how it was created:

str(cars.classes)

gives the following output:

Factor w/ 4 levels "Coupe","Cabriolet",..: 1 2 3 4

which means: a factor with four levels (Coupe, Cabriolet, and so on), whose values are stored as the integer codes 1, 2, 3, 4.
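The level order matters for more than the codes. As a rough illustration, the sketch below uses a small made-up vector (cars, default.order and custom.order are hypothetical names, not from the example above) to show that sort() and table() follow the level order rather than the alphabet:

```r
# Hypothetical sample data, with one class repeated
cars <- c("Sport", "Coupe", "Hatchback", "Coupe", "Cabriolet")

default.order <- factor(cars)
custom.order  <- factor(cars, levels = c("Coupe", "Cabriolet", "Sport", "Hatchback"))

as.integer(default.order)  # 4 2 3 2 1
as.integer(custom.order)   # 3 1 4 1 2

# Counts are listed in level order: Coupe, Cabriolet, Sport, Hatchback
table(custom.order)

# Sorting a factor sorts by code, that is, by level order
as.character(sort(custom.order))
# "Coupe" "Coupe" "Cabriolet" "Sport" "Hatchback"
```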
