Apache Cassandra Tutorial

Updated: 10/25/2022 by Computer Hope

Originally developed at Facebook, Apache Cassandra is licensed under the Apache License 2.0.
The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance.
Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data.
Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.


Cassandra provides the Cassandra Query Language (CQL), an SQL-like language, to create and update database schema and access data. CQL allows users to organize data within a cluster of Cassandra nodes using:

  • Keyspace: defines how a dataset is replicated, for example in which datacenters and how many copies. Keyspaces contain tables.
  • Table: defines the typed schema for a collection of partitions. New columns can be added to Cassandra tables with zero downtime. Tables contain partitions, which contain rows, which contain columns.
  • Partition: defines the mandatory part of the primary key all rows in Cassandra must have. All performant queries supply the partition key in the query.
  • Row: contains a collection of columns identified by a unique primary key made up of the partition key and optionally additional clustering keys.
  • Column: a single typed datum that belongs to a row.
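
To make these terms concrete, here is a minimal sketch. The table name readings_by_sensor and its columns are hypothetical, and the example assumes a keyspace has already been selected. Here sensor_id is the partition key and reading_time is a clustering key, so each partition holds the rows for one sensor, ordered by time.

```cql
-- Hypothetical table: one partition per sensor, rows clustered by time.
CREATE TABLE readings_by_sensor (
    sensor_id int,
    reading_time timestamp,
    temperature double,
    PRIMARY KEY ((sensor_id), reading_time)
);

-- Supplying the partition key restricts the query to a single partition.
SELECT reading_time, temperature
FROM readings_by_sensor
WHERE sensor_id = 42;
```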


CQL supports numerous advanced features over a partitioned dataset, such as:

  • Single partition lightweight transactions with atomic compare and set semantics.
  • User-defined types, functions and aggregates
  • Collection types including sets, maps, and lists.
  • Local secondary indices
  • (Experimental) materialized views
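
As an illustration of the first feature, a lightweight transaction adds an IF clause to a write, so the write is applied only when the condition holds. This is a sketch only: the accounts table and its values are hypothetical and assume a keyspace has already been selected.

```cql
-- Hypothetical table used only for this illustration.
CREATE TABLE accounts (
    username text PRIMARY KEY,
    email text
);

-- Applied only if no row with this primary key exists yet.
INSERT INTO accounts (username, email)
VALUES ('asha', 'asha@example.com')
IF NOT EXISTS;

-- Compare and set: applied only if the current email matches.
UPDATE accounts SET email = 'asha@example.org'
WHERE username = 'asha'
IF email = 'asha@example.com';
```

Each of these statements touches a single partition, which is what keeps the compare-and-set check within Cassandra's supported scope.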


Cassandra explicitly chooses not to implement operations that require cross-partition coordination, as they are typically slow and make it hard to provide highly available global semantics. For example, Cassandra does not support:

  • Cross partition transactions
  • Distributed joins
  • Foreign keys or referential integrity.

1) Creating a keyspace in Cassandra

CREATE KEYSPACE developer WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 3};

CREATE KEYSPACE developerIndian WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 } AND DURABLE_WRITES = false;
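
Once a keyspace exists, you would typically select it so that later statements do not need a keyspace prefix. A short sketch using the developer keyspace from above; IF NOT EXISTS is added here so that the CREATE statement can be rerun without an error:

```cql
CREATE KEYSPACE IF NOT EXISTS developer
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- Make developer the default keyspace for the current session.
USE developer;
```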

2) Dropping keyspaces in Cassandra

Syntax:
DROP KEYSPACE KeyspaceName;
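
For example, assuming the developer keyspace created earlier, the following removes the keyspace along with all of its tables and data. IF EXISTS avoids an error when the keyspace is already gone:

```cql
DROP KEYSPACE IF EXISTS developer;
```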

3) Altering a keyspace in Cassandra

Main points when altering a keyspace:

  • Keyspace Name: Keyspace name cannot be altered in Cassandra.
  • Strategy Name: Strategy name can be altered by using a new strategy name.
  • Replication Factor: Replication factor can be altered by using a new replication factor.
  • DURABLE_WRITES: The DURABLE_WRITES value can be altered by setting it to true or false. The default is true. If set to false, updates to the keyspace bypass the commit log, which risks losing data if a node fails before the writes are flushed to disk.
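
A minimal sketch of these points, assuming the developer keyspace from section 1 and a datacenter named datacenter1 (the name is an assumption and must match your cluster's configuration):

```cql
-- Change the strategy and the replication factor, and keep durable writes on.
ALTER KEYSPACE developer
WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3}
AND durable_writes = true;
```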

Cassandra Create Table:
In Cassandra, the CREATE TABLE command is used to create a table. A table, historically called a column family, stores data much like a table in an RDBMS.
So you can say that the CREATE TABLE command creates a column family in Cassandra.


CREATE TABLE student(
student_id int PRIMARY KEY,
student_name text,
student_city text,
student_fees varint,
student_phone varint
);

Printing the data of a table in Cassandra

SELECT * FROM student;
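
SELECT * reads every column of every row, which can be expensive on a large table. As a sketch, the queries below name the columns and filter on the primary key, or cap the result size; the id value 1 is illustrative only:

```cql
-- Read selected columns of one row, located by its primary key.
SELECT student_name, student_city FROM student WHERE student_id = 1;

-- Return at most 10 rows.
SELECT * FROM student LIMIT 10;
```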

Alter table in Cassandra

Here we drop two columns, student_fees and student_phone:
ALTER TABLE student DROP (student_fees, student_phone);
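
ALTER TABLE can also add a column. A short sketch, where student_email is a hypothetical column name chosen for illustration:

```cql
-- Existing rows simply have no value for the new column.
ALTER TABLE student ADD student_email text;
```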

Drop table in Cassandra

DROP TABLE student;

The TRUNCATE command removes all rows from a table. If you truncate a table, all of its rows are deleted permanently, but the table itself remains.

Syntax:
TRUNCATE developerTable;

Cassandra Batch
In Cassandra, BATCH is used to execute multiple modification statements (INSERT, UPDATE, DELETE) together in a single request.
It is useful when you need to update some columns and delete others at the same time.
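
The statements in the rest of this tutorial assume that a Developer table already exists. Its definition is not shown in the tutorial, so the following is only a sketch: the column types are assumptions, modelled on the student table above.

```cql
-- Assumed schema; unquoted names are case-insensitive and stored in lowercase.
CREATE TABLE Developer (
    Developer_id int PRIMARY KEY,
    Developer_fees varint,
    Developer_name text
);
```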


BEGIN BATCH
INSERT INTO Developer (Developer_id, Developer_fees, Developer_name) VALUES (1, 1100, 'prafful');
INSERT INTO Developer (Developer_id, Developer_fees, Developer_name) VALUES (4, 4000, 'rahul');
INSERT INTO Developer (Developer_id, Developer_fees, Developer_name) VALUES (3, 500, 'push');
INSERT INTO Developer (Developer_id, Developer_fees, Developer_name) VALUES (3, 4000, 'kavita');
UPDATE Developer SET Developer_fees = 8000 WHERE Developer_id = 3;
DELETE Developer_fees FROM Developer WHERE Developer_id = 2;
APPLY BATCH;

Insert script / Create data in Cassandra

The INSERT command is used to insert data into the columns of a table.
INSERT INTO Developer (Developer_id, Developer_fees, Developer_name) VALUES (5, 5000, 'rahul');
INSERT INTO Developer (Developer_id, Developer_fees, Developer_name) VALUES (6, 3000, 'pushpa');
INSERT INTO Developer (Developer_id, Developer_fees, Developer_name) VALUES (7, 2000, 'ram');

Select script in Cassandra


SELECT * FROM Developer WHERE Developer_id=2;

Update statement in Cassandra


UPDATE Developer SET Developer_fees=10000,Developer_name='Rahul'
WHERE Developer_id=2;

Delete script in Cassandra

DELETE Developer_fees FROM Developer WHERE Developer_id=4;
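
The statement above removes only the Developer_fees value and leaves the rest of the row in place. To remove the whole row, omit the column list, as in this short sketch:

```cql
-- Delete the entire row whose primary key is 4.
DELETE FROM Developer WHERE Developer_id = 4;
```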

Cassandra Collections
Cassandra collections let you store multiple elements in a single column. Cassandra supports three types of collection:
  • Set
  • List
  • Map

Example of SET collection:


1) CREATE TABLE employee ( id int, name text, email set<text>, PRIMARY KEY (id) );

2) INSERT INTO employee (id, email, name) VALUES (1, {'rahul4u@gmail.com'}, 'Ajeet');
INSERT INTO employee (id, email, name) VALUES (2, {'pushpa@gmail.com'}, 'pushpa');
INSERT INTO employee (id, email, name) VALUES (3, {'ram4u@gmail.com'}, 'ram');
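
Elements can be added to or removed from an existing set without rewriting the whole column. A sketch, assuming the email column is declared as set<text>; the new address is illustrative only:

```cql
-- Add an element; a set stores each value at most once.
UPDATE employee SET email = email + {'ajeet@example.com'} WHERE id = 1;

-- Remove an element.
UPDATE employee SET email = email - {'rahul4u@gmail.com'} WHERE id = 1;
```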


List Collection :
The list collection is used when the order of elements matters.
Let's take the above example of the "employee" table and add a new column named "department" to it.


ALTER TABLE employee ADD department list<text>;
INSERT INTO employee (id, email, name, department) VALUES (4, {'ram@gmail.com'}, 'ram', ['computer Science']);
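
Because a list keeps its elements in order, you can append, prepend, or overwrite by position. A sketch, assuming the department column is declared as list<text>; the department names are illustrative only:

```cql
-- Append to the end of the list.
UPDATE employee SET department = department + ['Mathematics'] WHERE id = 4;

-- Prepend to the front of the list.
UPDATE employee SET department = ['Physics'] + department WHERE id = 4;

-- Overwrite the element at index 0 (indexes start at 0).
UPDATE employee SET department[0] = 'Statistics' WHERE id = 4;
```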

Map Collection:
The map collection is used to store key value pairs. It maps one thing to another. For example, if you want to save course name with its prerequisite course name, you can use map collection.


CREATE TABLE dev_course
(id int,
prereq map<text, text>,
PRIMARY KEY (id) );

INSERT INTO dev_course (id, prereq) VALUES (1, {'programming': 'java', 'Neural Network': 'Artificial Intelligence'});
SELECT * FROM dev_course;
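
Individual map entries can be set or removed by key. A sketch, assuming the prereq column is declared as map<text, text>; the course names added here are illustrative only:

```cql
-- Set one entry; an existing key is overwritten.
UPDATE dev_course SET prereq['Data Science'] = 'Statistics' WHERE id = 1;

-- Merge several entries at once.
UPDATE dev_course SET prereq = prereq + {'Deep Learning': 'Neural Network'} WHERE id = 1;

-- Remove one entry by key.
DELETE prereq['programming'] FROM dev_course WHERE id = 1;
```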
