September 10, 2024

Cassandra howto

Cassandra is a “big data” type database that scales really well across multiple servers, but it’s a bit confusing if you come from regular databases like MySQL.

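A Cassandra cluster is a bunch of servers (nodes) that each run Cassandra. Each node holds part of your data, and each piece is copied to as many nodes as the replication factor you choose, so if one node gets eaten you don’t have big problems: the nodes holding the other copies pick up the slack and keep serving your data.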

It’s really fast, but unless you have lots of data you probably don’t care, because MySQL is pretty fast for normal things. If you need Cassandra or other NoSQL databases (NoSQL stands for “Not Only SQL”, not “no SQL”), you already know it.

Installing Cassandra 3.5.x on Debian Jessie

Right now you have to install it on Debian from the Apache Software Foundation’s own repository, and you also need Java. This uses Oracle Java 8 from the webupd8team PPA, though you could use OpenJDK instead if you want. Add the repositories like:

vi /etc/apt/sources.list
  deb http://www.apache.org/dist/cassandra/debian 35x main
  deb-src http://www.apache.org/dist/cassandra/debian 35x main
  deb http://ppa.launchpad.net/webupd8team/java/ubuntu xenial main
  deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu xenial main
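If you’d rather not hand-edit sources.list, apt also reads every *.list file in /etc/apt/sources.list.d/, so a minimal sketch is to generate the same four lines and put them in their own file. The function name and the cassandra.list file name are hypothetical choices; 35x is the 3.5.x series used in this post.

```shell
# Hypothetical helper: print the apt source lines from this post.
# $1 = Cassandra release series as named in the Apache repo (35x for 3.5.x).
cassandra_sources() {
  series="$1"
  echo "deb http://www.apache.org/dist/cassandra/debian ${series} main"
  echo "deb-src http://www.apache.org/dist/cassandra/debian ${series} main"
  # Oracle Java 8 installer from the webupd8team PPA, as above.
  echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu xenial main"
  echo "deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu xenial main"
}
```

Usage:

cassandra_sources 35x | sudo tee /etc/apt/sources.list.d/cassandra.list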

Then import the repository keys, update apt, and install Cassandra:

gpg --keyserver pgp.mit.edu --recv-keys 749D6EEC0353B12C
gpg --export --armor 749D6EEC0353B12C | sudo apt-key add -
apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys EEA14886
apt-get update
apt-get install oracle-java8-installer ntp git
apt-cache madison cassandra
  cassandra |        3.5 | http://www.apache.org/dist/cassandra/debian/ 35x/main amd64 Packages
  cassandra |        3.5 | http://www.apache.org/dist/cassandra/debian/ 35x/main Sources
apt-cache search cassandra
  ycassa-doc - Documentation for the Pycassa library
  python-pycassa - Client library for Apache Cassandra
  cassandra - distributed storage system for structured data
  cassandra-tools - distributed storage system for structured data
apt-get install cassandra cassandra-tools
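Cassandra can take a little while to come up after it’s installed or restarted, and cqlsh just reports a connection error until it does. A minimal sketch of a wait loop, assuming cqlsh is on your PATH and the node listens on the default localhost:9042 (the function name is hypothetical):

```shell
# Hypothetical helper: poll until cqlsh can connect to the local node.
# $1 = number of attempts (default 30), with a 2 second pause after each failure.
# Returns 0 once a query succeeds, 1 if every attempt failed.
wait_for_cassandra() {
  tries="${1:-30}"
  while [ "$tries" -gt 0 ]; do
    # DESCRIBE KEYSPACES is cheap and works even on a fresh, empty node.
    if cqlsh -e 'DESCRIBE KEYSPACES' >/dev/null 2>&1; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 2
  done
  return 1
}
```

Usage:

wait_for_cassandra && cqlsh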

Okay, now we log in to the console (cqlsh) and create a keyspace, which is the container that holds your tables, roughly what MySQL calls a database:

>: cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.1.1 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> DESC KEYSPACES;
 
system_traces  system_schema  system_auth  system  system_distributed
 
cqlsh> CREATE KEYSPACE somedb WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};
cqlsh> use somedb;
cqlsh:somedb>

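Replication factor 1 means Cassandra keeps only one copy of each row, so on this single-node setup the whole keyspace lives on the server you’re working on. The class picks the replication strategy: SimpleStrategy is fine for one data center, while NetworkTopologyStrategy lets you set the number of copies per data center, which is important once you have lots of nodes and lots of data.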
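To do this from a script instead of typing into cqlsh, a minimal sketch is a couple of shell functions that just print the CQL, which you then hand to cqlsh -e. The function names are hypothetical; the second one shows the NetworkTopologyStrategy form, where you give a replication factor per data center.

```shell
# Hypothetical helpers that only print CQL; pass the result to cqlsh -e.

# CREATE KEYSPACE with SimpleStrategy, as typed into cqlsh above.
# $1 = keyspace name, $2 = replication factor (default 1: one copy of each row).
# IF NOT EXISTS makes re-running the script harmless.
keyspace_cql() {
  ks="$1"
  rf="${2:-1}"
  printf "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = {'class': 'SimpleStrategy', 'replication_factor': %s};\n" "$ks" "$rf"
}

# ALTER KEYSPACE to NetworkTopologyStrategy, which takes a replication
# factor per data center rather than one number for the whole cluster.
# $1 = keyspace name, $2 = data center name, $3 = copies in that data center.
nts_keyspace_cql() {
  printf "ALTER KEYSPACE %s WITH replication = {'class': 'NetworkTopologyStrategy', '%s': %s};\n" "$1" "$2" "$3"
}
```

Usage, the second form only once you actually have three or more nodes:

cqlsh -e "$(keyspace_cql somedb)"
cqlsh -e "$(nts_keyspace_cql somedb datacenter1 3)"
nodetool repair somedb

On a default install the data center is usually named datacenter1 (nodetool status prints it). Raising the number of copies doesn’t move existing rows by itself; running nodetool repair afterwards does that.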

Why a keyspace and not just a database? A keyspace is built to eventually spread across lots of servers. You’re not doing that now, but you could as you scale.

Now that you have a keyspace, you have to make a table that holds your data. First we’ll use the DESC command to see that you don’t have any:

cqlsh:somedb> desc tables;
  <empty>
cqlsh:somedb> CREATE TABLE people (
            ... name text,
            ... age int,
            ... sex text,
            ... PRIMARY KEY (name));
cqlsh:somedb>

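This makes name the primary key: rows are stored and looked up by name, and since it’s the only key column it’s also the partition key that decides which node a row lives on. Inserting a second row with the same name overwrites the first one instead of failing. To see what you’ve created, do: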

cqlsh:somedb> DESC SCHEMA ;
CREATE KEYSPACE somedb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;
CREATE TABLE somedb.people (
    name text PRIMARY KEY,
    age int,
    sex text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
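If you’d rather script the table creation and keep a copy of the schema, here is a minimal sketch with two hypothetical shell functions. It assumes the somedb keyspace and people table from above, and uses cqlsh’s -f (run a file) and -e (run one statement) options.

```shell
# Hypothetical helpers for scripting the steps above.

# Write the people table definition to a file so it can be loaded with
# `cqlsh -f FILE` instead of being typed at the cqlsh prompt.
# $1 = output file.
write_people_schema() {
  cat > "$1" <<'EOF'
CREATE TABLE IF NOT EXISTS somedb.people (
    name text,
    age int,
    sex text,
    PRIMARY KEY (name)
);
EOF
}

# Save one keyspace's schema (keyspace plus tables, no data) to a file.
# DESCRIBE prints plain CQL, so the file can be replayed with cqlsh -f.
# $1 = keyspace name, $2 = output file.
dump_keyspace_schema() {
  cqlsh -e "DESCRIBE KEYSPACE $1" > "$2"
}
```

Usage:

write_people_schema people.cql && cqlsh -f people.cql
dump_keyspace_schema somedb somedb-schema.cql

Replaying a dump is safest on the same Cassandra version, since the table options in the DESC output change between versions.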

Now insert something in your database like:

INSERT INTO people (name, age, sex) VALUES ( 'doug', 46, 'male');

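If that worked you won’t get any errors (cqlsh just returns to the prompt), and you can see the row is there with a SELECT. The output below also has a second row, for wendy, from another INSERT like the one above: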

cqlsh:somedb> SELECT * FROM people ;
 name  | age | sex
-------+-----+--------
 wendy |  16 | female
  doug |  46 |   male
(2 rows)
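Typing values into cqlsh is fine by hand, but a script has to quote them properly. CQL string literals use single quotes, and a single quote inside a value is written as two single quotes. A minimal sketch with hypothetical function names, assuming the somedb.people table from above:

```shell
# Hypothetical helper: wrap a value in single quotes for a CQL string
# literal, doubling any single quotes inside it (o'brien -> 'o''brien').
cql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Hypothetical helper: build an INSERT for the people table.
# $1 = name, $2 = age (an integer, matching the int column), $3 = sex.
person_insert_cql() {
  printf "INSERT INTO somedb.people (name, age, sex) VALUES (%s, %d, %s);\n" \
    "$(cql_quote "$1")" "$2" "$(cql_quote "$3")"
}
```

Usage:

cqlsh -e "$(person_insert_cql "o'brien" 30 male)"
cqlsh -e "SELECT * FROM somedb.people WHERE name = 'doug';"

Looking rows up by name, the primary key, is the cheap, normal query. Filtering on a non-key column such as sex is rejected unless you add ALLOW FILTERING or an index, because Cassandra would have to scan every partition.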

Installing a Cassandra management GUI

I used Cassandra-Cluster-Admin here. It’s a little rough, and you have to get it from GitHub like:

mkdir /usr/src/cluster-admin
cd /usr/src/cluster-admin
git clone https://github.com/sebgiroux/Cassandra-Cluster-Admin
ln -s /usr/src/cluster-admin/Cassandra-Cluster-Admin/ /var/www/html/cluster-admin
vi /usr/src/cluster-admin/Cassandra-Cluster-Admin/includes/conf.inc.php
  change port to 9042 from 9160
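The port change can also be done with sed instead of vi. This sketch assumes the port shows up in conf.inc.php as the plain number 9160, so check the file afterwards; it replaces every occurrence and keeps a .bak copy. The function name is hypothetical.

```shell
# Hypothetical helper: switch Cassandra-Cluster-Admin's configured port
# from 9160 to 9042, as in the vi step above.
# Assumes the port is written as the literal 9160; every match is replaced
# and the original file is kept as FILE.bak.
set_cluster_admin_port() {
  sed -i.bak 's/9160/9042/g' "$1"
}
```

Usage:

set_cluster_admin_port /usr/src/cluster-admin/Cassandra-Cluster-Admin/includes/conf.inc.php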

Now visit http://localhost/cluster-admin