An overview of Apache Cassandra

Apache Cassandra is an open source, distributed, scalable database management system designed to handle large amounts of data. It offers continuous availability, linear scalability, operational simplicity and easy data distribution across multiple data centers.

Features:

  • Decentralised
  • Supports replication across multiple data centers
  • High scalability
  • Fault tolerant
  • Tunable data consistency
  • Data compression
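
Tunable consistency means each read or write can choose how many replicas must acknowledge it. As a rough illustration, the sketch below checks whether a read is guaranteed to overlap with the latest write, using the usual rule that the read and write replica counts together must exceed the replication factor. The function names are hypothetical, and the sketch assumes a single data center and only a handful of consistency levels.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that form a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must acknowledge a request at a given consistency level.

    Only a subset of levels is modelled here (an assumption of this sketch).
    """
    required = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    # With 3 replicas, QUORUM writes plus QUORUM reads always overlap.
    print(reads_see_latest_write("QUORUM", "QUORUM", 3))
    # ONE plus ONE may hit different replicas, so a read can be stale.
    print(reads_see_latest_write("ONE", "ONE", 3))
```

Lower levels trade this guarantee for lower latency and higher availability, which is the tuning the feature name refers to.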

Key structures:

  • Node: This is where the data is actually stored.
  • Data center: This is a collection of related nodes. It can be a physical or a virtual data center. Replication is configured per data center.
  • Cluster: A cluster contains one or more data centers. It can span physical locations.
  • Commit log: Every write is first recorded in the commit log for durability. Once all of its data has been flushed to SSTables, the commit log can be archived, deleted or recycled.
  • Table: This is a collection of ordered columns fetched by row. Each row is identified by a primary key.
  • SSTable: A Sorted String Table is an immutable data file to which Cassandra periodically writes memtables (its in-memory write buffers).
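
To show how the commit log, memtable and SSTables fit together, here is a toy, in-memory sketch of a node's write path. It is not Cassandra's implementation: the `ToyNode` class, its `memtable_limit` parameter and the flush-on-size rule are assumptions made for illustration only.

```python
class ToyNode:
    """Toy model of one node's write path (illustrative, not Cassandra code)."""

    def __init__(self, memtable_limit: int = 3):
        self.commit_log = []   # append-only record of writes, kept for durability
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # immutable, key-sorted snapshots of flushed memtables
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        # 1. Record the write in the commit log first.
        self.commit_log.append((key, value))
        # 2. Apply it to the memtable.
        self.memtable[key] = value
        # 3. Flush once the memtable holds enough distinct keys.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # An SSTable is sorted by key and never modified after creation,
        # so it is stored as a tuple of (key, value) pairs.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable.clear()
        # The flushed data is now in an SSTable, so the log can be recycled.
        self.commit_log.clear()

    def read(self, key):
        # Newest data wins: check the memtable, then SSTables newest first.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None


if __name__ == "__main__":
    node = ToyNode(memtable_limit=2)
    node.write("b", 1)
    node.write("a", 2)      # second key triggers a flush
    print(node.sstables)    # one SSTable, sorted by key
    print(node.read("a"))
```

Because flushed files are never rewritten in place, a later write to the same key simply lands in a newer memtable or SSTable, and reads consult the newest source first.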

Uses of Apache Cassandra

Apache Cassandra has a wide range of applications. Some of the use cases it suits best include:

  • Internet of Things applications
  • Activity-tracking and monitoring applications
  • Write-heavy systems and time-series applications
  • Social media analytics and recommendation engines
  • Product catalogues and retail applications
  • Messaging

In conclusion, Apache Cassandra is one of the most powerful open source distributed database systems available. It provides a more flexible data model than the one offered in the relational database world.
