R - Pitfalls in converting a factor to a numeric value

After some data manipulation or after reading in a new file, a numerical attribute such as an ID may end up stored as a factor. Consider the following example, in which I define a vector of numbers (e.g. IDs) and convert it to a factor:

id <- factor(seq(10000, 20000, 1))
> str(id)
 Factor w/ 10001 levels "10000","10001",..: 1 2 3 4 5 6 7 8 9 10 ...

In one of my scripts, I wanted to convert the factor back to a numerical value. To do so, I used the following function without checking the result:

as.numeric(id)
> str(as.numeric(id))
 num [1:10001] 1 2 3 4 5 6 7 8 9 10 ...

As you can see, a numerical vector is returned, but it runs from 1 to 10001 instead of from 10000 to 20000, as I had expected. In retrospect this is logical: a factor stores its values as integer codes that index into its levels, and as.numeric() returns those codes regardless of whether the level labels look like numbers or characters. In practice, this can cause a lot of confusion, especially if you try to join different datasets by an ID that was converted the wrong way. So, if you want to convert a factor back to its numerical values, use the following lines instead:

as.numeric(as.character(id))
> str(as.numeric(as.character(id)))
 num [1:10001] 10000 10001 10002 10003 10004 ...
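
To make the difference between codes and labels easier to see, here is a minimal sketch with a handful of made-up IDs that are deliberately unsorted and non-contiguous (the names ids, f, via_levels and via_character are only for illustration). It also shows the variant as.numeric(levels(f))[f], which the R documentation for factor describes as slightly more efficient than going through as.character(), since only the levels are converted rather than every element.

```r
# Hypothetical IDs, chosen only for illustration
ids <- c(20002, 10001, 15003, 10001)
f <- factor(ids)

levels(f)      # the labels, stored as character strings in sorted order
as.integer(f)  # the internal codes (positions in levels(f)): 3 1 2 1

# The pitfall: as.numeric() on the factor returns the codes, not the IDs
wrong <- as.numeric(f)

# Variant 1: convert the labels of every element
via_character <- as.numeric(as.character(f))

# Variant 2: convert only the levels, then index them by the factor's codes
via_levels <- as.numeric(levels(f))[f]

identical(via_character, ids)  # TRUE
identical(via_levels, ids)     # TRUE
identical(wrong, ids)          # FALSE
```

Both variants give back the original numbers. If some labels are not valid numbers, as.numeric() turns them into NA with a warning, so it is worth checking the result before using it as a join key.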