# Python

# Installation

# Install c_client

If you do not want to build from source, you can simply install the client via pip from here (opens new window)

The GridDB c_client (a preqrequisite to using the Python Client) can be found here: https://github.com/griddb/c_client (opens new window). The RPM is available here (opens new window). To get started simply wget the latest RPM and install.

$ wget \
https://github.com/griddb/c_client/releases/download/v4.2.0/griddb_c_client-4.2.0-1.linux.x86_64.rpm

then we need to actually install the rpm

$ sudo rpm -ivh griddb_c_client-4.2.0-1.linux.x86_64.rpm

and now the c_client is installed and ready in your /usr/ directory. That was easy!

# Install Python Client

Installing the Python Client is slightly more involved but still a very easy process. First, let's download the file from GitHub

$ wget \
https://github.com/griddb/python_client/archive/0.8.1.tar.gz

Next, let's unzip

$ tar xvzf 0.8.1.tar.gz

and let's install the prereqs

$ wget https://prdownloads.sourceforge.net/swig/swig-3.0.12.tar.gz
tar xvfz swig-3.0.12.tar.gz
cd swig-3.0.12
./configure
make 
sudo make install

And then we may need to install pcre as well

$ sudo yum install pcre2-devel.x86_64

Now of course we actually make our Python Client

$ cd ../python_client
make

If by chance you encounter the following error when attempting to make your Python Client

/usr/bin/ld: cannot find -lgridstore

do not worry: it is an easy fix. The issue lies with needing your Makefile to point to your c_client. This means the only thing we need to do is add the c_client/bin location in the LDFLAGS option

SWIG = swig -DSWIGWORDSIZE64
CXX = g++
ARCH = $(shell arch)
LDFLAGS = -L/home/israel/c_client/bin -lpthread -lrt -lgridstore #added /home/israel/c_client_bin right here
CPPFLAGS = -fPIC -std=c++0x -g -O2
INCLUDES = -Iinclude -Isrc
INCLUDES_PYTHON = $(INCLUDES)   \
                                -I/usr/include/python3.6m
PROGRAM = _griddb_python.so
EXTRA = griddb_python.py griddb_python.pyc
SOURCES =         src/TimeSeriesProperties.cpp \
                  src/ContainerInfo.cpp                 \
                  src/AggregationResult.cpp     \
                  src/Container.cpp                     \
                  src/Store.cpp                 \
                  src/StoreFactory.cpp  \
                  src/PartitionController.cpp   \
                  src/Query.cpp                         \
                  src/QueryAnalysisEntry.cpp                    \
                  src/RowKeyPredicate.cpp       \
                  src/RowSet.cpp                        \
                  src/TimestampUtils.cpp                        \
all: $(PROGRAM)
... snip ...

With the fix in place, make should work as intended. Next up: setting our environment variables. We just need to point to the proper locations:

$ export LIBRARY_PATH=$LIBRARY_PATH:[insert path to c_client]
$ export  PYTHONPATH=$PYTHONPATH:[insert path to python_client]
$ export LIBRARY_PATH=$LD_LIBRARY_PATH:[insert path to c_client/bin]

Now we should be able to use both c and python with our GridDB Cluster.

# Simulating an IoT Dataset

For this section, we will walk through a Python script which has the end goal of showcasing how to use Python with GridDB; it will also have the added benefit of teaching developers how to make a mock Internet of Things (IoT) dataset.

For this generated-dataset, we will be using TIMESERIES containers as they are most-often used when making an IoT dataset.

# Using Python

To use GridDB with Python, you will need to download the GridDB c_client (opens new window) from GitHub. Once that's done, you will also need the Python Client (opens new window). Alternatively, you can simply install via pip from here (opens new window)

# Getting our GridDB Connection and Schema

First let's grab our GridDB connection settings and gridstore obj. Here we will be using default values for all parameters. If you followed along with the official GridDB documentation, this should also work for your database directly.

Before that, let's also import our GridDB Connector and set some variables.

import griddb_python
griddb = griddb_python
factory = griddb.StoreFactory.get_instance()
store = factory.get_store(
    host="239.0.0.1",
    port=31999,
    cluster_name="defaultCluster",
    username="admin",
    password="admin"
)

Once the store variable has accurate connection settings for the currently running GridDB server, that variable can now directly run Gridstore functions.

Next let's look at the schema. For IoT data, we generally have many sensors outputting just one or two data points that may be of interest to the project. For this case, we will simulate just a sensor temperature and an arbitrary 'data' point, of which both will be floats.

    for i in range(numSensors):
        conInfo = griddb.ContainerInfo("sensor_" + str(i),
					[["timestamp", griddb.Type.TIMESTAMP],
		            ["data", griddb.Type.FLOAT],
                    ["temperature", griddb.Type.FLOAT]],
		            griddb.ContainerType.TIME_SERIES, True)
        col = store.put_container(conInfo)

In the above, we are looping through the user-selected variable numSensors to put that amount of sensors into the fake dataset. If using the script unchanged, it will insert 5 different sensors.

# Simulating Data

First, to start: this is how to actually use this script. When you want to simulate a dataset, you can simply run the entire script like so:

$ python3 generate_data.py 24 5

The first number is number of hours to simulate, while the second one is increments (in minutes). So this script will generate a dataset into your GridDB server over the timespan of now through 24 hours from now, with data emitting every 5 minutes from N sensors.

# Generating Random Data Points

The first thing to do when running the script is to set your parameters. In this case, we simply edit the numSensors var to be set to the number of sensors they would like to 'emit' data per incremented time span. As a default, it is set to 5: numSensors = 5.

From there, we simply convert the user's command line arguments into ints to work with our script

numSensors = 5
hours = int(argv[1])
minutes = int(argv[2])

Next we convert our user-set parameters to a uniform unit (milliseconds). From there, we take our values and figure out how many total emits our generated data will create (the arrLen variable)

    duration = hours * 3600000
    increment = minutes * 60000
    arrLen = ( int(duration) / int(increment) ) * numSensors

From there it's simply a matter of using some for loops to create different numbers/floats for each desired timestamp and to store that in an object which is eventually returned by the function.

containerEntry = {}
    collectionListRows = []
    for i in range(int(arrLen)):
        for j in range(numSensors):
            addedTime = i * minutes
			
            incTime = now + timedelta(minutes=addedTime)
            randData =  random.uniform(0,10000)
            randTemp = random.uniform(0,100)
            print("Data being inserted: " + str(j) + " " + str(incTime) + " " + str(randData) + "  " + str(randTemp))
            collectionListRows.append([incTime, randData, randTemp])
            containerEntry.update({"sensor_" + str(j): collectionListRows})
    store.multi_put(containerEntry)

This portion of the script will create random scripts from the python random library. It will also add time every loop iteration from now to simulate a real IoT dataset.

The last line of this portion is the GridDB multiPut. You can learn more about that here (opens new window).

# Querying

To run some basic queries, let's use the GridDB Shell (opens new window), which came to GridDB with version 4.6. To start, let's drop into the shell and do a basic check to make sure our data is there:

gs> load default.gsh
gs> connect $defaultCluster
The connection attempt was successful(NoSQL).
The connection attempt was successful(NewSQL).
gs[public]> sql select * from testing_0;
31,149 results. (26 ms)
gs[public]> get 10
timestamp,data,temperature
2021-07-28T21:01:00.282Z,72.40626,101.7
2021-07-28T21:07:43.152Z,33.06072,47.99
2021-07-28T21:09:52.166Z,8.167625,26.82
2021-07-28T21:49:33.682Z,80.57002,3.53
2021-07-28T21:54:38.294Z,28.507494,63.16
2021-07-28T21:55:24.377Z,18.945627,51.04
2021-07-28T21:56:09.467Z,13.514397,99.64
2021-07-28T21:58:40.906Z,88.73552,101.73
2021-07-28T21:58:55.059Z,43.575638,28.93
2021-07-28T21:59:42.647Z,38.931076,40.61
The 10 results had been acquired.

Great. Looks like our first sensor had 31k rows inserted.

Next let's try a time query. We can query all results from a range of 6 hours ago until now:

gs[public]> tql testing_0 select * where timestamp > TIMESTAMPADD(HOUR, NOW(), -6);
20,810 results. (3 ms)
gs[public]> get 3
timestamp,data,temperature
2021-08-04T16:28:34.566Z,7.682323,70.7
2021-08-04T16:29:28.943Z,22.910458,33.14
2021-08-04T16:30:34.566Z,89.90703,101.03
The 3 results had been acquired.

# Conclusion

With that, you can now generate as much IoT data as needed for your proof of concepts.

The complete source code can be downloaded from the GridDB.net GitHub (opens new window).