A walkthrough with hardware sizing recommendations
Original post by William Hoang
TimescaleDB is an open-source scalable SQL database built for time-series data, optimized for fast ingest and complex queries. It speaks “full SQL” and is correspondingly easy to use like a traditional relational database, yet scales in ways previously reserved for NoSQL databases. (More info in our Docs.)
In this post, we will walk you through how to set up TimescaleDB on Amazon Web Services (AWS).
You will learn how to:
- Get up and running on AWS
- Set up your EC2 instance
- Install TimescaleDB
- Configure PostgreSQL
- Use TimescaleDB
Let’s get started!
Getting up and running with AWS
Head to https://aws.amazon.com/ and create your AWS account. Enter the required credentials and access your AWS console. You should see a similar page as the one below:
On the top left, select the ‘Services’ tab and a panel of AWS services will be provided. Next select ‘EC2’:
Click on ‘Launch Instance’ to view the Amazon Machine Image (AMI):
We will select the ‘Ubuntu Server 16.04 LTS’ image or later as it aligns with one of TimescaleDB’s Linux distributions at the time of writing. (See a list of Timescale Linux distributions here.)
Next we will set up our instance type.
Note: In a production environment we’d recommend running a R4.4xlarge. We recommend this machine size because the inserts are a high-disk-throughput operation and so the expected throughput of the machine is important. (For more information, please see the expected throughput for each type, here.)
In addition, in production we’d recommend 2 EBS volumes:
- 1 for the WAL: GP2, 350 GB (Even though the WAL only needs ~10GB, we recommend a larger volume size for the higher IOPS)
- 1 for the Data: GP2, 5 TB
But for the purpose of this tutorial, we will select the ‘t2.micro’ instance and click ‘Review and Launch’ to jump ahead to the final step.
Now we’re on the last and final step, where we review the AMI details before launching the instance. For this tutorial we are using Ubuntu Server 16.04 LTS and have the type set to t2.micro.
Since we are not doing any performance testing, it is best to select the minimal configuration machine settings, with the security set to default.
Upon a final check, click on ‘Launch’:
Next, you will be brought to the key pair page where we will encrypt and decrypt login information for the remote instance.
In this tutorial we will create a new key named ‘timescale.’ You may name your key accordingly or use an existing key if you have one. You can also choose ‘Proceed without a key pair,’ but we recommend that you secure the instance with a key.
Next, click ‘Download Key Pair’ and save it in a secure location on your local computer.
Then click ‘Launch Instance’ to start running the instance remotely:
To determine if the instance is running, you will see a similar page like the one below where it shows the instance being initialized.
Click on the instance ID to see the details:
The instance details page will be displayed as shown below. Next we need to connect to our instance locally in order to install TimescaleDB remotely.
Then you will locate the command for SSH and copy the entire line into your clipboard.
For the example we have:
ssh -i “timescale.pem” ubuntu@ec2–54–69–235–152.us-west-2.compute.amazonaws.com
We will now go to our local file system and locate where we saved our .pem key pair file. (I am on a Mac and will open the Terminal Application and then locate the directory that contains the file.)
We will change the permission on the ‘timescale.pem’ file in order to access our instance next. Within your terminal at the location of the .pem file, execute:
sudo chmod 400 timescale.pem
If prompted, enter your local computer’s password. Next we will connect to our instance. (For more information on Amazon’s key pairs see here.)
Execute the SSH command that you copied in your AWS instance. For example: ssh -i “timescale.pem” ubuntu@ec2–54–69–235–152.us-west-2.compute.amazonaws.com
You should be logged into the remote Ubuntu instance you just setup.
We will now prepare our machine to install TimescaleDB!
Setting up your EC2 instance
For this section we will reference and walk through the TimescaleDB installation guide. Now that we are in our instance, we will add TimescaleDB’s Personal Package Archive via
sudo add-apt-repository ppa:timescale/timescaledb-ppa
and then enter your password, if prompted:
Next we will update our apt-get:
sudo apt-get update
With the packages dependencies installed, we can set up TimescaleDB in our instance by executing:
sudo apt install -y timescaledb-postgresql-9.6
(If you have PostgreSQL 10, install
With TimescaleDB installed, we can configure PostgreSQL to work in our example at the end of this tutorial.
First, let’s check the version of PostgreSQL that we have installed:
For TimescaleDB and our current tutorial, you should have version:
psql (PostgreSQL) 9.6.3
Next, go into the PostgreSQL configuration files by accessing the folder:
Change the permission settings for the postgresql.conf file:
sudo chmod 644 postgresql.conf
Now we have the ability to write to file. Using a text editor (e.g. ‘nano’) let‘s make some modifications:
Under the ‘CONNECTIONS AND AUTHENTICATION’ heading and within the ‘Connection Settings’ area (see below), uncomment the ‘listen_addresses’ attribute and add in the value ‘*’ to set all IP addresses to listen on:
The default value is ‘localhost’ so what you should now have is:
listen_addresses = ‘*’ # what IP address(es) to listen on;
Within the ‘RESOURCE USAGE’ section of the postgresql.conf file,
Find the ‘Kernel Resource Usage’ area (see below). Uncomment the ‘shared_preload_libraries’ attribute and add in ‘timescaledb’ as its value.
What you should have is:
shared_preload_libraries = ‘timescaledb’
Save the postgresql.conf file and exit out of your text editor.
Now we need to restart the instance of PostgreSQL by running:
sudo service postgresql restart
Then execute the following command:
sudo su - postgres
We can now use TimescaleDB.
Now that you have TimescaleDB installed and configured, let‘s walk through an example of running TimescaleDB on our remote AWS instance.
Note: The Postgres server needs to be running for the next step to work.
Within your instance, type in:
psql -U postgres
You should then be in the PostgreSQL instance:
Now we will walk through the steps of working with TimescaleDB as outlined in the ‘Getting Started: Setup’ page in the docs.
- Create a database named ‘tutorial’
CREATE DATABASE tutorial;
2. Connect to the database
Now you may connect to the database each time by executing:
psql -U postgres -h localhost -d tutorial
3. Initialize extension
CREATE EXTENSION timescaledb;
4. Create a hypertable
First, create a vanilla PostgreSQL table:
CREATE TABLE conditions ( time TIMESTAMPTZ NOT NULL, location TEXT NOT NULL, temperature DOUBLE PRECISION NULL, humidity DOUBLE PRECISION NULL );
Then, referencing the Timescale APIs, convert the table via the create_hypertable() function. This transforms our table into a hypertable.
SELECT create_hypertable('conditions', 'time');
5. Insert data into the hypertable
INSERT INTO conditions(time, location, temperature, humidity) VALUES (NOW(), 'office', 70.0, 50.0);
6. Query data from the hypertable
SELECT * FROM conditions ORDER BY time DESC LIMIT 100;
Voila! You have set up TimescaleDB on AWS.
For some lengthier tutorials please check out Tutorials, which includes a walkthrough on using PostGIS with TimescaleDB, as well as this blog post: Analyzing Ethereum, Bitcoin, and 1200+ other Cryptocurrencies.
And if you’d like to learn more about TimescaleDB and support us, please check out our GitHub, star us, and let us know how we can help out.