R – import and export data from the outside

Being able to work with data, import them, export them and other similar operations, is a very important operation to start working and performing statistics and data analysis. Here are some fundamental concepts and operations that allow you to convert external data in different formats into data frames, or in any case into types of data that can be easily used on R.

R - import and export data from the outside

Workspace

The R platform works on a workspace directory which is technically called the workspace directory. In order to know the current path directory, the getwd() command is used.

> getwd()
[1] "D:/R-4.0.3/bin"

To set a different directory as a workspace, use setwd() instead.

> setwd("D:/")
> getwd()
[1] "D:/"

To know the files contained within the working directory you can use the command dir().

> dir()
[1] "models.csv"                "R-4.0.3"
[3] "System Volume Information"

Import data from Excel tables

In many professional activities, a lot of data is processed and stored in Excel tables. A very useful operation is to import data into tables directly into the R platform in a simple way.

The excel sheet from which we want to import the data opens. We select the table to import and copy it (CTRL + C).

R - ottenere dati dall'esterno - tabella excel

Now let’s open a session on R and enter the following command:

> tab <- read.table("clipboard", header=T, sep="\t")
> tab
   ID       Model    Class
1 123   Optical X Economic
2 154 Optical X 2 Economic
3 234   Wellness  Business
4 235  Wellness B Business
5 433        Kart   Luxury
6 455      Utopia   Luxury

Now the table data is in the form of a very common dataframe.

As you can see, the read.table() function allows us to read the data present in the temporary memory, the clipboard. If the table has headers to import, as in our case, it sets TRUE to the header key passed as a parameter. Furthermore, being an Excel sheet, the separator character “\ t” which corresponds to the TAB is selected.

To get the names of the headers we use the names() function.

> names(tab)
[1] "ID"    "Model" "Class"

Reading CSV files

Another type of file that contains tabular data is that contained in CSV files. Again you can use the read.table() function.

> models <- read.table(file="models.csv", header=T, sep=";")
> models
   ID       Model    Class
1 123   Optical X Economic
2 154 Optical X 2 Economic
3 234   Wellness  Business
4 235  Wellness B Business
5 433        Kart   Luxury
6 455      Utopia   Luxury

Import data directly from the web

But today, most of the data is on the web. There are some sites, which we could define as data sources, that every data scientist should know as sources of useful data.

One of these is data.gov, a site that provides a lot of public data from the US. For example, on the page recently loaded, data on Covid 19 are available.

R - dati da web import csv file

For example we select the link of the CSV containing the data on covid 19. Make a copy of it and then pass it as a parameter of the read.csv() function.

> covid19 <- read.csv("https://healthdata.gov/sites/default/files/reported_hos$

The system will take some time to download all the data from the link and convert it to a dataframe. Once you get the > prompt again, the platform will be ready to accept commands again. The loaded table is a huge amount of data. For example if we want to know the size of the dataframe, we use the dim() function.

> dim(covid19)
[1] 87369    93

As we can see, the loaded dataframe has over 80,000 rows and 93 columns. Once we know the dataframe and understand which columns and rows are we are interested in, we can extract a new dataframe from the total one that contains only the data that interest us.

df <- covid19[1:100,c(1:10,68)]

This is just one of the many possibilities to make a selection. In this case we select the first 100 rows (1: 100) and leave only the first 10 columns with the addition of the 68 column.

Export the data frames as CSV

With R it is also possible to perform the reverse operation. That is, once the data has been processed in a dataframe within the R platform, it can be exported as a CSV file.

write.table(models, "models2.csv")

A new CSV file containing the tabular data of the dataframe will be generated in the workspace directory.

R - models 2 on csv

Leave a Reply