CursusDB is an open-source distributed in-memory yet persisted document oriented database system with real time capabilities.

Cursus Database System

CursusDB is a fast, open-source, in-memory document-oriented database offering security, persistence, distribution, availability and an SQL-like query language (CDQL).

Table of contents

📙📙 https://cursusdb.com/documentation

The idea behind CursusDB was to create something exceedingly scalable whilst never really slowing down. Say you have 1 billion documents stored within 1 collection spread across 100 nodes: the cluster will query 1 billion documents in the time it takes to query 10 million, as the cluster initiates a non-insert action on all nodes simultaneously. This is the power of parallel search. The Cursus (cluster) system searches multiple sections of, say, the users collection simultaneously. A cluster can query thousands of nodes at the same time. Think of main nodes as shards of one or many collections.

Each collection locks on insert, update and delete, but because of CursusDB's distributed design it's like a concurrent switchboard that allows for large amounts of concurrent transactions. A cluster or many clusters take actions, and these actions are relayed as requests to one or many nodes simultaneously. Consistency and reliability were among the main goals when designing CursusDB. If you have many clusters set up behind a TCP load balancer, transactions just don't miss if the system is configured correctly. One more bit! Say you have multiple updates to one document: the node will work through them in the order they were received.

The database system is well-designed, heavily tested and very stable. It was designed and developed for my own needs with other projects/companies I have going on. Over the period of design and development it's very much turned into something special. With that I hope you all enjoy CursusDB!

~ Alex Gaetano Padula

Features

  • Secured cluster and node(s) communication with shared key and BASIC AUTH type implementation and/or TLS
  • Secured node replication sync with TLS, if configured using the tls-replication option within .curodeconfig
  • In-memory data during runtime
  • Parallel search: sections of a collection are searched simultaneously across multiple nodes or replicas
  • Auto generated $id key for all documents unique across all nodes
  • Database Users with basic (R, RW) permissions
  • Cluster node data replication and synchronization specifically for reads
  • JSON object insert
  • Unstructured collections
  • Cluster and client authentication using BASIC AUTH type implementation
  • Node(s) (insert, update, delete) relay to observers in real time
  • Node observer automatic reconnect if connection lost
  • SQL like query language (CDQL - Cursus Document Query Language)
  • Low-latency
  • Highly available
  • Unique k:v across all nodes by adding an exclamation mark to the end of the key name, e.g. email!
  • Secure by default with shared key and users
  • Highly configurable
  • Lightweight core code under 6000 lines of code in total
  • File logging and automatic log truncation based on log-max-lines config
  • Automatic reconnect of any lost node or node replica
  • Automatic node backups if automatic-backup within .curodeconfig is set to true
  • Automatic node backup clean up if automatic-backup-cleanup within .curodeconfig is set to true.
  • Automatic node recovery if data is corrupt if automatic-backup configured
  • Node data (.cdat) and node backups (/backups/.cdat.{unixtime}) are created on shutdown or backup by taking what's in memory, serializing it, encrypting it (chacha20poly1305) and compressing it (DEFLATE), block by block

Unlike MySQL, let's say, where you can have multiple databases, CursusDB has no separate databases. A cluster is your database, and it spreads data across many nodes.
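The parallel search idea can be illustrated with a small Go sketch. This is not CursusDB's implementation: the Doc type, searchShard and parallelSearch are hypothetical names, and each "node" is simulated by an in-memory slice instead of a network connection. It only shows the fan-out pattern, where every shard is scanned at the same time and the per-shard results are collected afterwards.

```go
package main

import (
	"fmt"
	"sync"
)

// Doc is a simplified document: a flat map of keys to values (hypothetical type).
type Doc map[string]any

// searchShard scans one node's section of a collection and returns the matches.
func searchShard(shard []Doc, match func(Doc) bool) []Doc {
	var out []Doc
	for _, d := range shard {
		if match(d) {
			out = append(out, d)
		}
	}
	return out
}

// parallelSearch runs searchShard on every shard at the same time and
// returns the per-shard results in shard order.
func parallelSearch(shards [][]Doc, match func(Doc) bool) [][]Doc {
	results := make([][]Doc, len(shards))
	var wg sync.WaitGroup
	for i, shard := range shards {
		wg.Add(1)
		go func(i int, shard []Doc) {
			defer wg.Done()
			results[i] = searchShard(shard, match) // each goroutine writes only its own slot
		}(i, shard)
	}
	wg.Wait()
	return results
}

func main() {
	// Three simulated nodes, each holding a section of the users collection.
	shards := [][]Doc{
		{{"name": "Alex", "age": 28}, {"name": "John", "age": 40}},
		{{"name": "Sarah", "age": 32}},
		{{"name": "Alex", "age": 52}},
	}
	isAlex := func(d Doc) bool { return d["name"] == "Alex" }
	for i, r := range parallelSearch(shards, isAlex) {
		fmt.Printf("shard %d: %d match(es)\n", i, len(r))
	}
}
```

Because every shard is scanned concurrently, the wall-clock time in this sketch is governed by the largest shard rather than by the total number of documents.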


A node keeps track of queries/txns and, if something bad happens, can re-trigger what hasn't been processed. A node syncs to a .qqueue file every 70 milliseconds; this interval is fixed and cannot be changed. .qqueue files are encrypted.

Docker

https://hub.docker.com/repositories/cursusdb (SOON)

Native Clients

Native Observers

.. more coming

Prebuilt Binaries

You can find the latest stable release prebuilt binaries at https://cursusdb.com/downloads

Cluster & Node Building & Initial Setup

Getting started with CursusDB is extremely easy! First you must build a cluster and a node binary. Make sure you have Go installed (minimum version 1.21.3), then clone the source and run the following:

git clone git@github.com:cursusdb/cursusdb.git
cd cursusdb
cd cluster
go build .
cd ..
cd node
go build .

Now you should have a curode and a cursus binary.

img.png

img_1.png

Now with both binaries built, we first start cursus to set up a database user, a .cursusconfig and a shared key, which will be used for your node as well. This key is used to authenticate your cluster and nodes, and also to encrypt your data at rest with ChaCha!

img_2.png

So now that we have our credentials set up, we have to set up our first node!

We can run a node on the same instance as a cluster for this example. After completion of cluster setup through the initial run you'll get a .cursusconfig which has a few configurations.

nodes: []
host: 0.0.0.0
tls-node: false
tls-cert: ""
tls-key: ""
tls: false
port: 7681
key: QyjlGfs+AMjvqJd/ovUUA1mBZ3yEq72y8xBQw94a96k=
users:
    - YWxleA==:7V8VGHNwVTVC7EktlWS8V3kS/xkLvRg/oODmOeIukDY=
node-reader-size: 2097152
log-max-lines: 1000
join-responses: false
logging: false
timezone: Local
log-query: false
node-read-deadline: 2

  • nodes - database cluster nodes, i.e. an IP/FQDN + port combination (cluster1.example.com:7682)
  • tls-node - whether the cluster will connect to nodes via tls
  • tls-cert - path to your tls cert for cluster
  • tls-key - path to your tls key for cluster
  • tls - enable or disable tls
  • port - cluster port
  • key - encoded shared key
  • users - array of database users serialized, and encoded.
  • node-reader-size - the max size of a response from a node
  • join-responses - join all node responses and limit based on provided n
  • logging - start logging to file
  • timezone - default is Local; other allowed values look like America/Toronto
  • log-query - Logs client ip and their query to logs and std out if enabled
  • node-read-deadline - Amount of time in seconds to wait for a node to respond

Under nodes, let's add a local node which we will start shortly.

nodes:
- host: 0.0.0.0
  port: 7682

Now with your .cursusconfig set up, let's start our node for the first time.

img_92.png

You'll see that I've added the same key as I did for the cluster and the node is now started!

Let's start our cluster now.

img_4.png

Look at that! We are all set to start inserting data. Let's insert a user document into a users collection with a unique email key value using curush (the CursusDB Shell).

img_5.png

We can use curush with the --host flag, which will use the default cluster port, 7681. If we want to specify a different port we can use the --port flag. If your cluster is using TLS, make sure to also enable TLS in curush using the flag --tls=true.

curush will ask for a database user username and password to connect to cluster. Once authorized you can start running queries!

insert into users({"name": "Alex", "lastName": "Padula", "age": 28, "email!": "[email protected]"});

img_7.png

On inserts every document will get a unique $id key which is unique across all nodes.

img_8.png

If we try to insert the same document again we will get an error stating that the document already exists. This is because we set email with an !

img_9.png

Node Replicating

.cursusconfig

nodes:
- host: 0.0.0.0 # node host i.e an IP or FQDN
  port: 7682 # node port 
  replicas:
  - host: 0.0.0.0 # replica host i.e an IP or FQDN
    port: 7683 # replica port. The reason we have 7683 here is because a replica is a completely separate node from the main.
..

The cluster makes connections on start up to your node and node replicas hence configuring the .cursusconfig the way it is. The cluster keeps those connections alive for fast reactivity. The cluster will automatically reconnect to any lost node.

Node at 0.0.0.0:7682 has a configured replica at 0.0.0.0:7683

On the node's end you need to configure a replica so the node you're configuring knows to replicate the data over.

.curodeconfig

replicas:
  - host: 0.0.0.0
    port: 7683
tls-cert: ""
tls-key: ""
..

The node syncs its data to its configured replicas. The default sync time is 10 minutes and can be configured with the replication-sync-time yaml config. If the original node shuts down or is not available, a replica will be used for reads; if that replica is not available, another available replica will be used (a node can configure multiple replicas).
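The read fallback described above can be sketched in Go as follows. This is only an illustration under assumptions: the Node type, the readTarget function and the available callback are hypothetical and do not come from the CursusDB source, and trying replicas in their configured order is a choice made for the sketch.

```go
package main

import "fmt"

// Node is a hypothetical main node address plus its configured replicas.
type Node struct {
	Addr     string
	Replicas []string
}

// readTarget returns the address a read should go to: the main node if it is
// available, otherwise the first available replica in configured order.
// ok is false when neither the node nor any of its replicas is available.
func readTarget(n Node, available func(addr string) bool) (addr string, ok bool) {
	if available(n.Addr) {
		return n.Addr, true
	}
	for _, r := range n.Replicas {
		if available(r) {
			return r, true
		}
	}
	return "", false
}

func main() {
	n := Node{Addr: "0.0.0.0:7682", Replicas: []string{"0.0.0.0:7683", "0.0.0.0:7684"}}
	// Simulate the main node and its first replica being down.
	down := map[string]bool{"0.0.0.0:7682": true, "0.0.0.0:7683": true}
	addr, ok := readTarget(n, func(a string) bool { return !down[a] })
	fmt.Println(addr, ok) // prints "0.0.0.0:7684 true"
}
```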

replicating-cluster-nodes.png

Query Language

Case-sensitive. Keep it lowercase, as in the examples.

Ping the cluster

Using curush or native client

> ping;
> pong;

Inserts

insert into users({"name": "Alex", "last": "Lee", "age": 28});
insert into users({"name": "John", "last": "Josh", "age": 28, "tags": ["tag1", "tag2"]});
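When inserting from application code, the statement can be assembled from a Go map. The helper below is a minimal sketch: buildInsert is a hypothetical name, not part of any CursusDB client, and it assumes the cluster accepts compact JSON (without spaces) inside the parentheses. Sending the resulting string is left to curush or a native client.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildInsert renders a CDQL insert statement for a flat document.
// It does not validate the collection name or check reserved keys.
func buildInsert(collection string, doc map[string]any) (string, error) {
	body, err := json.Marshal(doc) // encoding/json emits map keys in sorted order
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("insert into %s(%s);", collection, body), nil
}

func main() {
	q, err := buildInsert("users", map[string]any{"name": "Alex", "last": "Lee", "age": 28})
	if err != nil {
		panic(err)
	}
	fmt.Println(q) // prints: insert into users({"age":28,"last":"Lee","name":"Alex"});
}
```

Note that json.Marshal escapes the characters <, > and & inside string values by default, so values containing them will appear as \u003c, \u003e and \u0026 in the statement.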

Selects

select {LIMIT} from {COLLECTION} where {CONDITIONS} {ORDERING}
select * from users;
select 0,2 from users;
select 1 from users where name == 'Alex' || name == 'John';
select * from users where name == 'Alex' && age == 28;
select * from users where tags == "tag1";
select * from users where name == 'Alex' && age == 28 && tags == 'tag1';

NOTE

You can use == OR =

For example

select 1 from users where name == 'Alex' || name == 'John';

OR

select 1 from users where name = 'Alex' || name = 'John';
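The {LIMIT} portion of a statement takes three shapes in the examples: * for everything, n for the first n documents, and s,n for skipping s documents and taking n. The Go sketch below mimics that reading on an in-memory slice. parseLimit and applyLimit are hypothetical helpers written for illustration; they are not CursusDB's parser and the error text is made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLimit interprets a {LIMIT} clause.
// "*" sets all to true, "n" means take the first n, "s,n" means skip s then take n.
func parseLimit(clause string) (skip, take int, all bool, err error) {
	clause = strings.TrimSpace(clause)
	if clause == "*" {
		return 0, 0, true, nil
	}
	parts := strings.Split(clause, ",")
	if len(parts) > 2 {
		return 0, 0, false, fmt.Errorf("invalid limit %q", clause)
	}
	nums := make([]int, len(parts))
	for i, p := range parts {
		n, convErr := strconv.Atoi(strings.TrimSpace(p))
		if convErr != nil || n < 0 {
			return 0, 0, false, fmt.Errorf("invalid limit %q", clause)
		}
		nums[i] = n
	}
	if len(nums) == 1 {
		return 0, nums[0], false, nil
	}
	return nums[0], nums[1], false, nil
}

// applyLimit returns the window of docs selected by skip and take.
func applyLimit[T any](docs []T, skip, take int, all bool) []T {
	if all {
		return docs
	}
	if skip > len(docs) {
		skip = len(docs)
	}
	end := skip + take
	if end > len(docs) {
		end = len(docs)
	}
	return docs[skip:end]
}

func main() {
	skip, take, all, err := parseLimit("0,2")
	if err != nil {
		panic(err)
	}
	fmt.Println(applyLimit([]string{"a", "b", "c"}, skip, take, all)) // prints [a b]
}
```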

Updates

update {LIMIT} in {COLLECTION} where {CONDITIONS} {SETS} {ORDERING}
update 1 in users where age >= 28 set name = 'Josie' order by createdAt desc;
update * in users where age > 24 && name == 'Alex' set name = 'Josie' set age = 52;
update n, n..
etc.

Deletes

delete {LIMIT} from {COLLECTION} where {CONDITIONS} {ORDERING}
delete * from users where age >= 28 || age < 32;
delete 0,5 from users where age > 28 && name == 'Alex';
etc.

Pattern Matching

LIKE

Starts with 'A'

select * from users where firstName like 'A%lex Padula'

Ends with 'la'

select * from users where firstName like 'Alex Padu%la'

Contains Pad

select * from users where firstName like 'Alex %Pad%ula'

NOT LIKE

Starts with 'A'

select * from users where firstName not like 'A%lex Padula'

Ends with 'la'

select * from users where firstName not like 'Alex Padu%la'

Contains Pad

select * from users where firstName not like 'Alex %Pad%ula'

Sorting

select * from users order by createdOn desc;
select * from users order by firstName asc;

Counting

Example

select count from users where $id == "099ade86-93a8-4703-abdd-d1ccc1078b1d";

Response not joined

[{"127.0.0.1:7682": [{"count":1}]}]

Response joined if each node has 1 match and there are 5 nodes

{"count":5} 
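A client that receives the non-joined form can add the counts up itself. The Go sketch below does this with encoding/json. joinCounts is a hypothetical helper, and the multi-node input shape (one object per node inside the outer array) is an assumption extrapolated from the single-node response shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// joinCounts sums the per-node counts of a non-joined count response.
func joinCounts(raw []byte) (int, error) {
	// Outer array -> object keyed by node address -> array of {"count": n}.
	var perNode []map[string][]struct {
		Count int `json:"count"`
	}
	if err := json.Unmarshal(raw, &perNode); err != nil {
		return 0, err
	}
	total := 0
	for _, node := range perNode {
		for _, results := range node {
			for _, r := range results {
				total += r.Count
			}
		}
	}
	return total, nil
}

func main() {
	raw := []byte(`[{"127.0.0.1:7682": [{"count":1}]}, {"127.0.0.1:7683": [{"count":1}]}]`)
	total, err := joinCounts(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("{\"count\":%d}\n", total) // prints {"count":2}
}
```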

Deleting a key within documents in a collection

It's very simple to alter a collection's documents. Say you want to remove the y key from documents like the ones below:

[{"$id":"fcb773f6-2d77-45fe-a860-9dd94f5e7c07","x":5,"y":7},{"$id":"a567925e-dbb1-405e-b4ac-12522b33d07e","x":2,"y":4},{"$id":"4fa938f6-6813-4db9-9955-f5e3c81a9c0b","x":55,"y":9}]

It's simple using curush or a native client:

curush>delete key y in example;
[{"127.0.0.1:7682": {"message":"Document key removed from collection successfully.","statusCode":4021,"altered":3}}]

Uniqueness

Using key! will make sure the value is unique across all nodes!

insert into users({"email!": "[email protected]" ...});

Operators

  • >
  • >=
  • <
  • <=
  • ==
  • =
  • !=

Conditional Symbols

  • &&
  • ||

Actions

  • select
  • update
  • delete

List collections

curush>collections;
[{"127.0.0.1:7682": {"collections":["losers","winners","users"]}}]

Deleting collections?

When you remove every document from a collection the collection is removed i.e

delete * from losers;
..."1 Document(s) deleted successfully.","statusCode":2000}}]
curush>collections;
[{"127.0.0.1:7682": {"collections":["winners","users"]}}]

Database Users

CursusDB has 2 permissions: R (read) and RW (read/write). RW can select, insert, delete, update and add new users, whereas users with just R can only read.

new user USERNAME, PASSWORD, P

Using a client like curush the CursusDB Shell Program.

curush> new user someusername, somepassword, RW;

Listing Database Users

Getting all database users. User with RW permission required.

users;

The command returns a JSON array of database users.

["alex","daniel"]

Removing Database Users

delete user USERNAME;

Status codes

A CursusDB status code is a numerical value assigned to a specific message. The numerical values are used as shorthand for the actual message. They are grouped by:

  • Other: signals, shutdowns
  • Authentication / Authorization: cluster and node auth
  • Node / Cluster
  • Document & CDQL: document and query language

Other

  • -1 Received signal (with signal). -1 is just for the system; it doesn't mean error in CursusDB's case.

Authentication / Authorization

  • 0 Authentication successful.
  • 1 Unable to read authentication header.
  • 2 Invalid authentication value.
  • 3 No user exists
  • 4 User not authorized
  • 5 Failed node sync auth

Node / Cluster

  • 100 - Node is at peak allocation
  • 101 - Invalid permission
  • 102 - User does not exist
  • 103 - Database user already exists
  • 104 - No node was available for insert
  • 105 - Node unavailable
  • 106 - Node ready for sync
  • 107 - Node replica synced successfully
  • 108 - Could not decode serialized sync data into hashmap
  • 109 - No previous data to read. Creating new .cdat file
  • 110 - Could not open log file (with description)
  • 111 - Data file corrupt (with description)
  • 112 - Collection mutexes created
  • 113 - Could not unmarshal system yaml configuration (with description)
  • 114 - Could not marshal system yaml configuration (with description)
  • 115 - Could not decode configured shared key (with description)
  • 116 - Reconnected to lost connection (includes host:port)
  • 117 - Reconnected to lost observer connection (includes host:port)
  • 118 - Could not open/create configuration file (with description)
  • 119 - Could not open/create data file (with description)
  • 120 - No .qqueue file found. Possibly first run, if so the node will create the .qqueue file after run of this method (after first run you will normally see 505 0 recovered and processed from .qqueue. 0 being what was left on the query queue)
  • 200 - New database user created successfully
  • 201 - Database user removed successfully
  • 202 - Could not decode user username
  • 203 - Could not marshal users list array
  • 204 - There must always be one database user available
  • 205 - Could not marshal user for creation
  • 206 - Could not get node working directory for automatic backup (with description)
  • 207 - Could not create automatic backups directory (with description)
  • 208 - Could not read node backups directory (with description)
  • 209 - Could not remove .cdat backup {FILE NAME} (with description)
  • 210 - Could not get node working directory for automatic recovery (with description)
  • 211 - Node recovery from backup was successful
  • 214 - Node was unrecoverable after all attempts
  • 215 - Attempting automatic recovery with latest backup
  • 216 - Starting to sync to with master node
  • 217 - Synced up with master node (with addr)
  • 218 - Observer HOST:PORT was unavailable during relay
  • 219 - Could not encode data for sync (with description)
  • 220 - Starting to write node data to file
  • 221 - Starting to write node data to backup file
  • 222 - Node data written to file successfully
  • 223 - Node data written to backup file successfully
  • 224 - Observer connection established (with info)
  • 225 - Node connection established (with info)
  • 500 - Unknown error (with description)
  • 502 - Node could not recover query queue
  • 503 - Could not dial self to requeue queries (with description)
  • 504 - Could not commit to queued query/transaction
  • 505 - n recovered and processed from .qqueue
  • 507 - Error loading X509 key pair (with description)

Document & CDQL

  • 2000 Document inserted/updated/deleted
  • 4000 Unmarsharable JSON insert
  • 4001 Missing action
  • 4002 None existent action
  • 4003 Nested JSON objects not permitted
  • 4004 Document already exists
  • 4005 Invalid command/query
  • 4006 From is required
  • 4007 Invalid query operator
  • 4008 Set is missing =
  • 4009 Invalid insert query missing 'insert into'
  • 4010 Invalid insert query is missing parentheses
  • 4011 Invalid update query missing set
  • 4012 Could not marshal JSON
  • 4013 Unparsable boolean value
  • 4014 Unparsable float value
  • 4015 Unparsable integer value
  • 4016 Missing limit value
  • 4017 Invalid query
  • 4018 Unmarsharable JSON
  • 4019 Update sets are missing
  • 4020 In is required
  • 4021 Document key removed from collection successfully
  • 4022 No documents found to alter
  • 4023 No unique $id could be found for insert
  • 4024 Batch insertion is not supported
  • 4025 Where is missing values
  • 4026 Delete key missing in
  • 4027 Limit skip must be an integer (with description)
  • 4028 Could not convert limit value to integer (with description)
  • 4029 Invalid limiting value (with description)
  • 4030 Key cannot use reserved word
  • 4031 Key cannot use reserved symbol
  • 4032 Invalid set array values (with description)
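A client can use the numeric ranges in these lists to decide which group a response belongs to. The Go sketch below is an illustration only: statusGroup and statusOf are hypothetical helpers, and the ranges are simply read off the lists above rather than taken from the CursusDB source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusGroup maps a status code to the group it is listed under.
func statusGroup(code int) string {
	switch {
	case code == -1:
		return "Other"
	case code >= 0 && code <= 5:
		return "Authentication / Authorization"
	case code >= 100 && code <= 507:
		return "Node / Cluster"
	case code == 2000 || (code >= 4000 && code <= 4032):
		return "Document & CDQL"
	default:
		return "Unknown"
	}
}

// statusOf pulls the statusCode field out of a single JSON response object.
func statusOf(raw []byte) (int, error) {
	var r struct {
		StatusCode *int `json:"statusCode"`
	}
	if err := json.Unmarshal(raw, &r); err != nil {
		return 0, err
	}
	if r.StatusCode == nil {
		return 0, fmt.Errorf("response has no statusCode")
	}
	return *r.StatusCode, nil
}

func main() {
	raw := []byte(`{"message":"Document key removed from collection successfully.","statusCode":4021,"altered":3}`)
	code, err := statusOf(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(code, statusGroup(code)) // prints: 4021 Document & CDQL
}
```

A code in the 4000 range is not always a failure (4021 reports success), so the group alone should not be treated as an error indicator.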

Reserved Document Keys

On insert there are a variety of RESERVED keys.

  • count
  • $id
  • $indx
  • in
  • not like
  • !like
  • where
  • chan
  • const
  • continue
  • defer
  • else
  • fallthrough
  • func
  • go
  • goto
  • if
  • interface
  • map
  • select
  • struct
  • switch
  • var
  • false
  • true
  • uint8
  • uint16
  • uint32
  • uint64
  • int8
  • int16
  • int32
  • int64
  • float32
  • float64
  • complex64
  • complex128
  • byte
  • rune
  • uint
  • int
  • uintptr
  • string
  • ==
  • &&
  • ||
  • >
  • <
  • =
  • *

Ports

Default cluster port: 7681
Default node port: 7682

Logging

Logs for the CursusDB cluster and node are found where you launch your binaries: cursus.log for the cluster and curode.log for the node.

You can enable logging on either the cluster or the node by setting logging to true. This will log to file instead of stdout.

logging: true

Within your yaml configs you can set log-max-lines. This option tells either node or cluster when to truncate (clear up) the log file(s).

How are logs formatted?

[LEVEL][YOUR CONFIGURED TZ RFC822 DATE] DATA

Logs can have one of the following levels:

  • ERROR
  • FATAL
  • INFO
  • WARN

[INFO][26 Dec 23 08:34 EST] main(): 112 Collection mutexes created.
[INFO][26 Dec 23 08:34 EST] SignalListener(): -1 Received signal interrupt starting database shutdown.
[INFO][26 Dec 23 08:34 EST] WriteToFile(): 220 Starting to write node data to file.
[INFO][26 Dec 23 08:34 EST] WriteToFile(): 222 Node data written to file successfully.

Example using curush querying cluster

./curush -host 0.0.0.0
Username> ******
Password> *****
curush>select * from users;

127.0.0.1:7682: [{"$id":"17cc0a83-f78e-4cb2-924f-3a194dedec90","age":28,"last":"Padula","name":"Alex"}]
curush>select * from users;

127.0.0.1:7682: [{"$id":"17cc0a83-f78e-4cb2-924f-3a194dedec90","age":28,"last":"Padula","name":"Alex"}]
curush>insert into users({"name": "Alex", "last": "Lee", "age": 28});

{"collection": "users", "insert":{"$id":"ecaaba0f-d130-42c9-81ad-ea6fc3461379","age":28,"last":"Lee","name":"Alex"},"message":"Document inserted","statusCode":2000}
curush>select * from users;

127.0.0.1:7682: [{"$id":"17cc0a83-f78e-4cb2-924f-3a194dedec90","age":28,"last":"Padula","name":"Alex"},{"$id":"ecaaba0f-d130-42c9-81ad-ea6fc3461379","age":28,"last":"Lee","name":"Alex"}]

^ Single node

If multiple nodes you'd see a response similar to the one below

curush>select * from users;

127.0.0.1:7682: [{"$id":"17cc0a83-f78e-4cb2-924f-3a194dedec90","age":28,"last":"Doe","name":"John"},..]
127.0.0.1:7683: [{"$id":"17cc0a83-f78e-4cb2-924f-3a194dedec91","age":32,"last":"Johnson","name":"Sarah"},..]
127.0.0.1:7684: [{"$id":"17cc0a83-f78e-4cb2-924f-3a194dedec92","age":42,"last":"Stint","name":"Peter"},..]

By default, though, you won't see the above.

join-responses: false

is required to see results for each node.

join-responses joins all documents from nodes and limits based on limit. For example..

select 3 from posts order by createdOn desc;

With select 3, each node returns up to 3 documents. Say you have 5 nodes set up: the cluster gets back up to 3 * 5 = 15 documents, but it limits the joined result to 3, as that is what was requested!
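The join-and-limit step can be pictured with a short Go sketch. It is an illustration under assumptions, not the cluster's code: the Post type and joinAndLimit function are hypothetical, and ordering by createdOn descending mirrors the example query above.

```go
package main

import (
	"fmt"
	"sort"
)

// Post is a simplified document with the fields used by the example query.
type Post struct {
	Title     string
	CreatedOn int64
}

// joinAndLimit merges the documents returned by every node, orders them by
// CreatedOn descending and keeps at most n of them.
func joinAndLimit(perNode [][]Post, n int) []Post {
	var all []Post
	for _, docs := range perNode {
		all = append(all, docs...)
	}
	sort.SliceStable(all, func(i, j int) bool { return all[i].CreatedOn > all[j].CreatedOn })
	if n >= 0 && n < len(all) {
		all = all[:n]
	}
	return all
}

func main() {
	// Two simulated nodes, each already limited to its own newest documents.
	perNode := [][]Post{
		{{Title: "a", CreatedOn: 300}, {Title: "b", CreatedOn: 100}},
		{{Title: "c", CreatedOn: 400}, {Title: "d", CreatedOn: 200}},
	}
	for _, p := range joinAndLimit(perNode, 3) {
		fmt.Println(p.Title, p.CreatedOn)
	}
}
```

Each node has to return up to n documents of its own because the newest n overall could all live on a single node.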

Cluster to Node TLS connectivity & Node to Node replica TLS connectivity

If you set tls-node on the cluster to true the cluster will expect all nodes to be listening on tls.

If you set tls-replication on a cluster node to true the cluster node will expect all node replicas to be listening on tls.

What is a Node Observer?

A node observer is a backend service using the CursusDB Observer package to listen to incoming node events such as insert, update, and delete in real time.

The observer must be configured with the same shared key as your nodes and clusters.

Document Expectation & Document Relation

CursusDB expects simple JSON objects. For example take this user object:

{"username!": "alex", "email!": "[email protected]", "password": "xxx", "interests": ["programming", "music", "botany"]}

This is an object CursusDB likes.

Imagine you insert this object into a users collection:

insert into users({"username!": "alex", "email!": "[email protected]", "password": "xxx", "interests": ["programming", "music", "botany"]})

{"insert":{"$id":"17cc0a83-f78e-4cb2-924f-3a194dedec90", "username!": "alex", "email!": "[email protected]", "password": "xxx", "interests": ["programming", "music", "botany"]},"message":"Document inserted","statusCode":2000}

You can see username and email are set up to be unique using the suffixed !. If CursusDB finds a user with that email or username, you'll get back a 4004 error, which means the document already exists.

Now let's say this user can have many posts. We will create a posts collection with the first post containing the $id of the user we created.

insert into posts({"title": "First Post", "body": "This is a test post", "userId": "17cc0a83-f78e-4cb2-924f-3a194dedec90", "createdOn": 1703626015})

As you can see, we sorta just related data, so now it's fairly easy to query the database and say "hey, give me all the user's posts" like so:

select * from posts where userId = "17cc0a83-f78e-4cb2-924f-3a194dedec90";

Remember how we had the createdOn as a unix timestamp on our posts documents? Awesome we can sort all the posts and paginate them!

Skipping 10 and grabbing 10

select 10,10 from posts where userId = "17cc0a83-f78e-4cb2-924f-3a194dedec90" order by createdOn desc;
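For pagination, the skip value is just (page - 1) * pageSize. The Go helper below builds the statement for a given page. pageQuery is a hypothetical name, and the sketch assumes 1-based page numbers and a userId that contains no characters needing escaping, such as the UUID-style $id values shown above.

```go
package main

import "fmt"

// pageQuery builds the select for a 1-based page of a user's posts, newest
// first. The {LIMIT} is written as "skip,count".
func pageQuery(userID string, page, pageSize int) string {
	if page < 1 {
		page = 1
	}
	skip := (page - 1) * pageSize
	return fmt.Sprintf("select %d,%d from posts where userId = %q order by createdOn desc;",
		skip, pageSize, userID)
}

func main() {
	// Page 2 with 10 posts per page skips 10 and grabs 10.
	fmt.Println(pageQuery("17cc0a83-f78e-4cb2-924f-3a194dedec90", 2, 10))
}
```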

Let's say we want to sort the posts by title alphabetically:

select * from posts where userId = "17cc0a83-f78e-4cb2-924f-3a194dedec90" order by title asc;

This is how data should be related in CursusDB. Whether a user has many posts or, let's say, a user has one account profile, it's the same thing: just repeat the process.

Report Issues

Please report issues, enhancements, etc at:

CursusDB v MySQL BENCHMARK

Most basic setup: the CursusDB cluster and node are hosted on the same instance with no TLS. MySQL is set up on an instance of the exact same specification, also with no TLS.

💨 Mind you, Cursus (cluster) was configured with one node. If configured with multiple nodes, inserts speed up GREATLY, more so for concurrent inserts but for sequential ones as well.

CursusDB

Connection time: 64ms

Inserting 1002 records sequentially

insert into users({"first": "James", "last": "Jones", "age": 22, "active": true});

Insertion time: 481.190374ms

Read skipping 1000 selecting 1 where first is James

select 1000,1 from users where first == "James";

Read time: 743.538µs

MySQL

Connection time: 170ms

Inserting 1002 records sequentially

INSERT INTO users (first, last, age, active) VALUES ("James", "Jones", 22, true);

Insertion time: 1.928675495s

Read skipping 1000 selecting 1 where first is James

SELECT * FROM users where first = "James" LIMIT 1 OFFSET 1000;

Read time: 1.021852ms

Table used

CREATE TABLE users ( first varchar(255), last varchar(255), age int, active BOOLEAN );

Live Chat drawing using an Observer

How would a chat work with an Observer configured? Let's say you have 2 collections, a chatrooms collection and a messages collection. On insert the node will relay to an observer if configured. On the backend where the observer lives you can have a socket server sending actions from an observer to many WebSocket or WebTransport clients. Take this example: we have 2 users in a chatroom with an $id of 12718b2b-0efe-4fe6-94ec-1adea5f212c8, which is unique. Adam sends a message to Chris, which is an insert. At that point the cluster will insert into a node and the node will relay to an observer, at which point we can relay that through to our connected clients, to a specific room let's say. Very cool stuff!

drawing182.png
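The backend side of the chat flow can be sketched as a small in-process hub that fans relayed events out to the clients in a room. Everything here is hypothetical: Event, Hub, Join and Relay are illustrative names, the CursusDB Observer package is not used, and Go channels stand in for the WebSocket or WebTransport connections.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical shape for a relayed node event.
type Event struct {
	Action string // "insert", "update" or "delete"
	RoomID string
	Body   string
}

// Hub fans events out to the clients subscribed to each chatroom.
type Hub struct {
	mu    sync.Mutex
	rooms map[string][]chan Event
}

// NewHub returns an empty hub.
func NewHub() *Hub { return &Hub{rooms: make(map[string][]chan Event)} }

// Join subscribes a client to a room and returns its event channel.
func (h *Hub) Join(roomID string) <-chan Event {
	ch := make(chan Event, 16)
	h.mu.Lock()
	h.rooms[roomID] = append(h.rooms[roomID], ch)
	h.mu.Unlock()
	return ch
}

// Relay delivers e to every client in e.RoomID. If a client's buffer is
// full the event is dropped for that client, so one slow client cannot
// block the rest.
func (h *Hub) Relay(e Event) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for _, ch := range h.rooms[e.RoomID] {
		select {
		case ch <- e:
		default:
		}
	}
}

func main() {
	hub := NewHub()
	room := "12718b2b-0efe-4fe6-94ec-1adea5f212c8"
	adam, chris := hub.Join(room), hub.Join(room)
	// In a real setup this call would be made when the observer receives an insert.
	hub.Relay(Event{Action: "insert", RoomID: room, Body: "Hi Chris!"})
	fmt.Println((<-adam).Body, "/", (<-chris).Body) // prints: Hi Chris! / Hi Chris!
}
```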