# Introduction

# Purpose of this documentation

This manual is intended for designers and developers who design and build systems using GridDB Community Edition.

The contents of this manual are as follows.

  • Structure of GridDB
    • Describes the cluster operating structure in GridDB.
  • The data model of GridDB
    • Describes the data model of GridDB.
  • Functions provided by GridDB
    • Describes the data management functions provided by GridDB.
  • Parameters
    • Describes the parameters that control GridDB operations.

# ⚠️ Note

  • GridDB is a database that manages groups of data (known as rows), each made up of a key and multiple values. It can be configured as an in-memory database that holds all data in memory, or as a hybrid that combines disk (including SSD) and memory.
  • An OS user (gsadm) is created when GridDB is installed using the package.

# Terminology

The following table lists the terms used in GridDB.

| Term | Description |
|---|---|
| Node | An individual server process that performs data management in GridDB. |
| Cluster | A single node or a set of nodes that perform data management together in an integrated manner. |
| Master node | The node that performs cluster management processing. |
| Follower node | Any node in the cluster other than the master node. |
| Number of nodes constituting a cluster | The number of nodes that make up a GridDB cluster. When GridDB is started for the first time, this number is used as the threshold for the cluster to become valid: the cluster service starts once that many nodes have joined the cluster. |
| Number of nodes already participating in a cluster | Among the nodes constituting the GridDB cluster, the number currently in operation that have been incorporated into the cluster. |
| Block | The unit of data for persisting data to disk (hereinafter referred to as a checkpoint), and the smallest physical data management unit in GridDB. Data from multiple containers is placed in a block. The block size is set in the cluster definition file before the initial startup of GridDB. |
| Partition | The data management unit in which containers are placed. It is the smallest unit of data placement across the cluster, and the unit of data movement and replication for balancing load between nodes (rebalancing) and for managing data replicas in case of failure. |
| Partition group | A group of multiple partitions, equivalent to a data file in the file system when the data is persisted to disk. One checkpoint file corresponds to one partition group. Partition groups are created according to the concurrency setting (/dataStore/concurrency) in the node definition file. |
| Row | One row of data registered in a container or table. Multiple rows are registered in a container or table. A row consists of column values corresponding to the schema definition of the container (table). |
| Container (Table) | A structure that manages a set of rows. It is called a container when operated through the NoSQL I/F and a table when operated through the NewSQL I/F. Both names refer to the same object. There are two types of container: collection and timeseries container. |
| Collection (table) | A type of container (table) that manages rows having a general key. |
| Timeseries container (timeseries table) | A type of container (table) that manages rows having a timeseries key. It provides special functions for handling timeseries data. |
| Database file | A group of files, consisting of the transaction log file and the checkpoint file, that is persisted to an HDD or SSD. The transaction log file is updated every time the GridDB database is updated or a transaction occurs, whereas the checkpoint file is written at a specified time interval. |
| Checkpoint file | A file written to disk for each partition group. Updates held in memory are reflected in this file at the cycle set in the node definition file (/checkpoint/checkpointInterval). |
| Transaction log file | A file in which the update information of transactions is saved sequentially as a log. |
| LSN (Log Sequence Number) | The sequence number of an update log entry, assigned to each partition when it is updated in a transaction. The master node of a cluster maintains the maximum LSN (MAXLSN) of all the partitions held by each node. |
| Replica | Replication is the process of creating an exact copy of the original data. One or more replicas are created and stored on multiple nodes, so that copies of a partition exist across the nodes. There are two forms of replica: master and backup. The master is the original data, whereas the backup is used as a reference in case of failure. |
| Owner node | A node that can update the containers in a partition. It records the master copy among the replicated containers. |
| Backup node | A node that records the backup copy among the replicated containers. |
| Definition file | There are two types of parameter file: gs_cluster.json (hereinafter the cluster definition file), used when composing a cluster, and gs_node.json (hereinafter the node definition file), used to set the operations and resources of a node in the cluster. The definition files also include a user definition file for GridDB administrator users. |
| Event log file | A file in which the event logs of the GridDB server are saved, including messages such as errors and warnings. |
| OS user (gsadm) | An OS user with the right to execute operating functions in GridDB. An OS user named gsadm is created during GridDB installation. |
| Administrator user | A GridDB user prepared for performing operations in GridDB. |
| General user | A user used in the application system. |
| User definition file | The file in which administrator users are registered. During initial installation, two administrators, system and admin, are registered. |
| Cluster database | General term for all the databases that can be accessed in a GridDB cluster system. |
| Database | A logical data management unit created in a cluster database. A public database is created in the cluster database by default. Data can be separated per user by creating a new database and giving a general user the right to use it. |
| Failover | A mechanism by which, when a failure occurs in a cluster in operation, a backup node automatically takes over and processing continues. |
| Client failover | A mechanism by which, when a failure occurs in a cluster in operation, the client API automatically reconnects to a backup node and retries, so that processing continues. |
| Table partitioning | A function for accessing a huge table quickly. The data of a table with a large number of registered rows is distributed across multiple nodes, so that processors on multiple nodes can execute concurrently and the memory of multiple nodes is used effectively. |
| Data partition | General name for the units of data storage created by table partitioning. Multiple data partitions are created for a table. Data partitions are distributed to the nodes like normal containers. The number of data partitions and the range of data stored in each depend on the type of table partitioning (hash, interval, or interval-hash). |
| Data Affinity | A function that raises the memory hit rate by placing highly correlated data within a container in the same block, thereby localizing data access. |
| Placement of container/table based on node affinity | A function that reduces network load during data access by placing highly correlated containers on the same node. |
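
The terminology above refers to node definition file parameters by slash-separated paths, such as /dataStore/concurrency and /checkpoint/checkpointInterval. The following is a minimal Python sketch of how such a path can be read, assuming each path segment corresponds to a nested JSON object key in gs_node.json. The values shown are illustrative only, and `lookup` is a hypothetical helper, not part of GridDB.

```python
import json

# Illustrative fragment of a node definition file (gs_node.json).
# The two parameter paths come from the terminology above;
# the values are made up for this example.
NODE_DEFINITION = """
{
  "dataStore": {"concurrency": 4},
  "checkpoint": {"checkpointInterval": "60s"}
}
"""


def lookup(config, path):
    """Resolve a slash-separated parameter path against nested dicts."""
    node = config
    for key in path.strip("/").split("/"):
        node = node[key]  # raises KeyError if the parameter is absent
    return node


config = json.loads(NODE_DEFINITION)
concurrency = lookup(config, "/dataStore/concurrency")
interval = lookup(config, "/checkpoint/checkpointInterval")
print(concurrency, interval)
```

Under this reading, the value found at /dataStore/concurrency is the one that determines how many partition groups, and therefore checkpoint files, a node creates.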
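
The relationship between the number of nodes constituting a cluster and the number of nodes already participating in it can be illustrated with a toy model. This is a sketch of the threshold rule described above for the first startup, not GridDB's implementation; the class `ClusterState` and the node names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """Toy model of the first-startup threshold (not GridDB code)."""

    configured_nodes: int  # number of nodes constituting the cluster
    joined: set = field(default_factory=set)

    def join(self, node_id: str) -> None:
        # Joining twice with the same id has no effect.
        self.joined.add(node_id)

    @property
    def participating(self) -> int:
        # Number of nodes already participating in the cluster.
        return len(self.joined)

    @property
    def service_started(self) -> bool:
        # The cluster service starts once the configured number
        # of nodes has joined.
        return self.participating >= self.configured_nodes


cluster = ClusterState(configured_nodes=3)
cluster.join("node1")
cluster.join("node2")
started_after_two = cluster.service_started  # still below the threshold

cluster.join("node3")
started_after_three = cluster.service_started
print(started_after_two, started_after_three)
```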