An overview of Apache Cassandra

By admin

Cassandra is an open-source, distributed, scalable NoSQL database management system designed to handle large amounts of data. Apache Cassandra offers continuous availability, linear scalability, operational simplicity and easy distribution of data across multiple data centers.
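Much of this easy data distribution comes from arranging nodes on a token ring. Each partition key is hashed to a token, and the node that owns that part of the ring stores the row. The Python sketch below is a simplified model that assumes one token per node, no virtual nodes and a single data center. It mimics SimpleStrategy's rule of placing additional replicas on the next nodes clockwise. The names `token`, `TokenRing` and `replicas` and the node names are hypothetical, and MD5 stands in for Cassandra's default Murmur3 partitioner.

```python
import bisect
import hashlib
from typing import List


def token(key: str) -> int:
    """Map a key to a 64-bit position on the ring.

    MD5 is only a stand-in here; Cassandra's default partitioner uses Murmur3."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Toy token ring: one token per node, no virtual nodes, racks or data centers."""

    def __init__(self, nodes: List[str]) -> None:
        # Place each node on the ring at the token of its name, sorted clockwise.
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, replication_factor: int) -> List[str]:
        """Nodes that store `key`: the first node whose token is >= the key's token
        (wrapping around the ring), then the next nodes clockwise until RF are chosen."""
        if not self._ring:
            return []
        rf = min(replication_factor, len(self._ring))
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(rf)]


if __name__ == "__main__":
    ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
    # Prints three distinct node names that hold replicas of "user:42".
    print(ring.replicas("user:42", replication_factor=3))
```

Because each node owns only the range between its predecessor's token and its own, adding a node takes over part of one neighbour's range rather than reshuffling every key. This is one reason capacity can grow roughly linearly as nodes are added.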

Features:

Decentralised
Supports replication across multiple data centers
High scalability
Fault tolerant
Tunable data consistency
Data compression
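Tunable data consistency means each read or write can choose how many replicas must respond before it succeeds. The Python sketch below computes that number for a few of Cassandra's consistency levels (ONE, TWO, THREE, QUORUM, ALL), assuming a single data center. It also checks the commonly cited rule that a read reflects the most recent write when the replicas read plus the replicas written exceed the replication factor. The function names are hypothetical, and multi-data-center levels such as LOCAL_QUORUM are deliberately left out.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed for a consistency level (single data center)."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": replication_factor // 2 + 1,  # a majority of the replicas
        "ALL": replication_factor,
    }
    required = levels[level]
    if required > replication_factor:
        # Cassandra would reject such a request as unavailable.
        raise ValueError(f"{level} needs {required} replicas but RF is {replication_factor}")
    return required


def is_strongly_consistent(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when every read replica set must overlap every write replica set (R + W > RF)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor
```

With a replication factor of 3, QUORUM reads and writes (2 + 2 > 3) overlap on at least one replica, whereas ONE/ONE (1 + 1) trades that guarantee for lower latency and higher availability. That trade-off is what "tunable" refers to.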

Key structures:

Node: The machine on which data is actually stored.
Data center: A collection of related nodes. A data center can be physical or virtual. Replication is configured per data center.
Cluster: A collection of one or more data centers. A cluster can span physical locations.
Commit log: All data is written to the commit log first, for durability. After all of its data has been flushed to SSTables, the commit log can be archived, deleted or recycled.
Table: A collection of ordered columns fetched by row. A table consists of rows, each identified by a primary key.
SSTable: A Sorted String Table, an immutable data file to which Cassandra periodically writes memtables (in-memory structures that buffer recent writes).
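Together, the commit log, memtables and SSTables make up Cassandra's write path. The Python sketch below is a toy model of that path. Each write is appended to the commit log, applied to the memtable, and the memtable is flushed to a new sorted, immutable SSTable once it holds a hypothetical `flush_threshold` number of keys (real Cassandra flushes based on memory usage and other settings). Reads check the memtable and then SSTables from newest to oldest. `ToyStorageEngine` and its methods are illustrative names, not Cassandra's API, and timestamps, tombstones, compaction and Bloom filters are omitted.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ToyStorageEngine:
    """Toy model of the write path: commit log -> memtable -> SSTable."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.flush_threshold = flush_threshold  # hypothetical: flush after N keys
        self.commit_log: List[Tuple[str, str]] = []  # append-only, for durability
        self.memtable: Dict[str, str] = {}  # mutable, in memory
        # Each SSTable is an immutable pair of (sorted keys, matching values).
        self.sstables: List[Tuple[Tuple[str, ...], Tuple[str, ...]]] = []

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. log first so a crash can be replayed
        self.memtable[key] = value  # 2. then update the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()  # 3. flush once the memtable is "full"

    def flush(self) -> None:
        if not self.memtable:
            return
        items = sorted(self.memtable.items())
        keys = tuple(k for k, _ in items)
        values = tuple(v for _, v in items)
        self.sstables.append((keys, values))  # new sorted, immutable SSTable
        self.memtable = {}
        # All logged writes are now in SSTables, so the log can be recycled.
        self.commit_log.clear()

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        # Newest SSTable first; in this toy, position decides which value wins.
        for keys, values in reversed(self.sstables):
            i = bisect.bisect_left(keys, key)  # binary search works because keys are sorted
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None
```

Writing sequentially to a log and a memory buffer, and never modifying SSTables in place, is what makes this design suit write-heavy workloads. Cassandra itself reconciles conflicting versions by write timestamp rather than by SSTable order, and it merges SSTables in the background through compaction.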

Uses of Apache Cassandra

Apache Cassandra has a wide range of applications. Use cases where it is a particularly good fit include:

Internet of Things (IoT) applications
Activity-tracking and monitoring applications
Write-heavy systems and time-series applications
Social media analytics and recommendation engines
Product catalogues and retail applications
Messaging

In conclusion, Apache Cassandra is one of the most powerful open-source distributed database systems available. It provides a more flexible data model than the one offered by relational databases.