Chapter 3 Download Data

You can download the following file types:

  • raw: 434 MHz tag detections
  • blu: 2.4 GHz tag detections
  • gps: Sensor Station lat/lon with timestamps
  • node_health: data on node battery, temperature, tag detections, etc.
  • telemetry: ???
  • sensorgnome: 166 MHz tag detections (Motus will need to translate the data into something usable)
  • log: Sensor Station log files

Trying to load all the data files into RStudio’s memory will lead to issues, namely maxing out on memory usage. Instead, you should create an SQL relational database. Using a database uses much less memory when combining or cleaning data frames compared to loading data directly into RStudio, which leaves more memory for analyses.

If you are new to relational databases, you should use DuckDB, a simple database management systems (DMBS) that is easy to install and use in RStudio.

# activate renv environment
renv::activate()

The celltracktech package includes all the packages you need. Download it from github using renv.

!NOTE The DuckDB install will take ~ 30 min. Please allocate time accordingly.

# install the celltracktech package using renv
library(renv)

renv::install('cellular-tracking-technologies/celltracktech')

If you want to download just your data files (in .csv.gz format), you can run the following script:

# load the celltracktech library
library(celltracktech)

# load env file into environment
load_dot_env(file='.env')

# Settings ----------------------------------------------------------------
my_token <- Sys.getenv('API_KEY') # load env variable into my_token
myproject <- "Meadows V2" # this is your project name on your CTT account, here we are using the CTT project 'Meadows V2'

# Create your data directory if it does not exist
outpath <- "./data/" # where your downloaded files are to go

# Create project name folder
create_outpath(paste0(outpath, myproject, '/'))

# Download just the data files onto your computer
get_my_data(my_token = my_token,
            outpath = outpath, 
            db_name = NULL,
            myproject = myproject,
            begin = as.Date("2023-08-01"),
            end = as.Date("2023-12-31"),
            filetypes=c("raw", "blu", "gps", "node_health", "sensorgnome", "telemetry", 'log')
)

3.1 Create a DuckDB database

Below is a sample script to create a DuckDB database

# Connect to Database using DuckDB -----------------------------------------------------
con <- DBI::dbConnect(duckdb::duckdb(), 
                      dbdir = "./data/Meadows V2/meadows.duckdb", 
                      read_only = FALSE)

3.2 Download data from the CTT API

You are now connected to your DuckDB database, but so far nothing is in the database. The script below will download your data from the CTT servers. If you do not want to use a database, you can still use the get_my_data() function to download the .csv.gz files.

NOTE! Your Sensor Station must be set to upload data to our servers. If your station does not upload data, skip this block and go to section 2.2.1.

get_my_data(my_token = my_token,
            outpath = outpath, 
            db_name = con, 
            myproject = myproject,
            begin = as.Date("2023-08-01"),
            end = as.Date("2023-12-31"),
            filetypes=c("raw", "blu", "gps", "node_health", "sensorgnome", "telemetry", 'log')
)

You may get this error message:

Error in post(endpoint = endpoint, payload = payload) : 
  Gateway Timeout (HTTP 504).

It is fine, just run the get_my_data() function again and it should work properly.

3.2.1 Create Database from Files on your Computer

If you already have your Sensor Station files on your computer (i.e. your sensor station is not connected to the internet), you can use the code block below to create a database and add those files to it.

celltracktech::create_database(my_token = my_token,
                               outpath = outpath,
                               myproject = myproject,
                               db_name = con)

3.3 Updating the database

Upload the compressed data (i.e. the ‘.csv.gz’ files) into your DuckDB database:

update_db(con, outpath, myproject)

3.4 Disconnecting from the database

The benefit of using a relational database is that it does not need to be loaded into your computer memory the entire time you are analyzing your data. You can connect to it, filter/clean the data you want, and then disconnect once you are done, freeing up computer memory for more intensive data analysis tasks.

To disconnect from the database, run the following command:

DBI::dbDisconnect(con)

3.5 Uploading Node Data from the SD Card

Note! This step is optional! If you are not uploading data from the Node SD cards, you can skip to chapter 3!

3.5.1 Create Nodes directory

To upload Node data from the SD cards, you will need to create a ‘nodes’ folder in the CTT Project Name folder. For example, we will create a folder in the ‘Meadows V2’ folder in the ‘./data/meadows/’ directory:

create_outpath('./data/Meadows V2/nodes/')

3.5.2 Create individual directories for each Node

Then, create a folder for each Node. In the example below, we are creating a folder for Node 3B8845:

create_outpath('./data/Meadows V2/nodes/3B8845/')

If you are only uploading data from the node SD cards frequently, you should create a parent folder for the date you removed the data from the SD card. The nodes save files incrementally (i.e. 434_mhz_beep_0, 434_mhz_beep1, etc.). If you remove the files, it will start back at 0 again. When you put the files with the same name in the same folder, one of them will be overwritten.

create_outpath('./data/Meadows V2/nodes/20250423/3B8845)')

3.5.3 Upload Node data to your DuckDB database

Remember, we need to re-connect to the DuckDB database to upload the Node data:

con <- DBI::dbConnect(duckdb::duckdb(), 
                      dbdir = "./data/Meadows V2/meadows.duckdb", 
                      read_only = FALSE)

# Import node data into your database
import_node_data(d = con,
                 outpath = outpath,
                 myproject = myproject,
                 station_id = '6CA25D375881')

# disconnect from the database
DBI::dbDisconnect(con)

If you get Error in files_loc[1, ] : incorrect number of dimensions you did not create the nodes folder in the correct location.

The import_node_data will import the node csv files into their own ‘node’ tables

  • node_raw
  • node_blu
  • node_gps
  • node_health_from_node

and insert the csv files into the regular “raw”, “blu”, and “node_health” tables.

You have finished uploading data to your database!

3.6 Create Postgres Database

If you or your lab are familiar with PostgreSQL, you can connect to your database with this:

# connect to Postgres database
con <- DBI::dbConnect(
  RPostgres::Postgres(),
  dbname="meadows"
)

Creating and editing a Postgres database is vastly different from a DuckDB database. You can find more information on Postgres and R here.

You should be able to do all the above as long as you use RPostgres::Postgres() in your database connection.