An Introduction to Apache Cassandra

  • Amruta Bhaskar
  • Dec 6, 2019


Let’s say you’re considering a data storage solution for an IoT or application event workload. You’ll have many questions: how do I store all of my data with its variable event length? And how do I query my large, pervasive dataset for immediate insights and repeated, ongoing improvements?

These situations need a distributed data store that can accommodate evolving, variable-length records at large scale and ingest rate, with built-in fault tolerance, high availability, and high write speeds. The data should also be manageable with a query language everybody already understands.

Today, billions of connected devices and digital environments continuously stream and store data. Smartphones and laptops, web browsers and applications, smart appliances, infrastructure controls, and sensors all generate data.

Every bit of generated data is meant to be collected, stored, refined, queried, analysed, and operationalized for the purpose of continuous improvement: constantly and iteratively providing better and safer products. Data generation is endless, and that data, once stored, grows exponentially. As long as users continue to use digital products, and as long as those products stay connected to networks, they’ll continue to generate data. Wide column stores like Apache Cassandra were developed to help organizations regain a semblance of control over these large, exponentially growing amounts of constantly changing data.

In this article, we’ll look at what Apache Cassandra is, what’s special about it, and how it distributes and stores data. We’ll consider why consistency and availability (read: performance) are core trade-offs, consider which situations are ideal (or not), and review some use cases.

Apache Cassandra is an open source, NoSQL, wide column data store that can quickly ingest and process large amounts of data. It’s also decentralized, distributed, scalable, highly available, fault-tolerant, and tunably consistent, with identical nodes clustered together to eliminate single points of failure and bottlenecks (we’ll go over each of these terms below). You can deploy Cassandra on-premises, in the cloud, or in a hybrid data environment.

Originally designed for Facebook inbox search, Cassandra is used today by CERN, GitHub, Apple, Netflix, and countless other organizations. It’s extremely well-suited for managing large amounts of semi-variable but structured data (from sensors, connected appliances and applications) for analytics, event logging, monitoring, and eCommerce purposes, particularly when high write speeds are needed.


To understand the unique value that Apache Cassandra provides, it’s helpful to look at the terms we’ve used to describe it.

• Distributed means that Cassandra adds the most value when it is distributed across many nodes and even data centres.

• Scalable means that Cassandra is easily scaled horizontally, by adding more nodes (machines) to the cluster, without disrupting your read and write workflows.

• Highly available means that your data store is fault-tolerant and your data remains available even if one or several nodes or data centres go down.

• Tunably consistent means that it’s possible to adjust the trade-off between availability and consistency of data on the Cassandra nodes, typically by configuring replication factor and consistency level settings.
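
To make the last point concrete, here is a minimal Python sketch of the commonly cited rule of thumb for tunable consistency: a read is guaranteed to overlap with the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The function names are hypothetical and only illustrate the arithmetic for a single data centre; this is not Cassandra's own code or driver API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas, e.g. 2 out of 3 or 3 out of 5."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level.

    Only a few well-known level names are modelled here (an assumption
    made to keep the sketch short).
    """
    required = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }[level]
    if required > replication_factor:
        raise ValueError(f"{level} needs more replicas than RF={replication_factor}")
    return required


def reads_overlap_writes(replication_factor: int, write_level: str, read_level: str) -> bool:
    """True when every read set must share at least one replica with every write set."""
    written = replicas_required(write_level, replication_factor)
    read = replicas_required(read_level, replication_factor)
    return read + written > replication_factor


if __name__ == "__main__":
    for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
        overlap = reads_overlap_writes(3, write_level, read_level)
        print(f"RF=3 write={write_level} read={read_level} overlap={overlap}")
```

With a replication factor of 3, writing and reading at QUORUM touches two replicas each, so the read and write sets always share a replica. Writing and reading at ONE is faster and more available, but a read may land on a replica that has not yet received the latest write.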


How Apache Cassandra distributes data

Cassandra uses a peer-to-peer distribution model that allows it to fully distribute data in the form of variable-length rows, stored by partition keys.


This happens across different cloud availability zones and multiple data centres. Cassandra is built for scalability and continuous availability, and has no single point of failure.

Many databases, like Postgres, use a master-slave replication model, in which writes go to a master node and reads are executed on slaves. To provide high availability, fault tolerance, and scalability, Cassandra’s peer-to-peer cluster architecture instead provides nodes with open channels of communication.

Cassandra uses tokens to determine which node holds what data.

A token is a 64-bit integer, and Cassandra assigns ranges of these tokens to nodes so that every possible token is owned by a node.

Adding more nodes to the cluster or removing old ones results in redistributing these token ranges among nodes. A row’s partition key is used to calculate a token using a given partitioner (a hash function for computing the token of a partition key) to determine which node owns that row.

That’s how Cassandra finds where the replicas are that hold that data.
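
The token lookup described above can be sketched in a few lines of Python. This is a simplified illustration, not Cassandra's implementation. It assumes one token per node (no virtual nodes), places replicas on the next nodes clockwise around the ring with no rack or data centre awareness, and uses a truncated MD5 digest as a stand-in hash rather than a real Cassandra partitioner. The names `token_for` and `TokenRing` and the node names are hypothetical.

```python
import bisect
import hashlib
import struct
from typing import Dict, List


def token_for(partition_key: str) -> int:
    """Hash a partition key to a signed 64-bit token.

    Stand-in partitioner: the first 8 bytes of an MD5 digest. A real
    cluster uses its configured partitioner instead.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return struct.unpack(">q", digest[:8])[0]


class TokenRing:
    """Maps tokens to nodes; each node owns the range ending at its own token."""

    def __init__(self, node_tokens: Dict[str, int]) -> None:
        # Sort nodes by token so the ring can be searched with bisect.
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, partition_key: str, replication_factor: int = 1) -> List[str]:
        """Return the owning node followed by the next nodes clockwise."""
        token = token_for(partition_key)
        # First node whose token is >= the key's token, wrapping past the end.
        start = bisect.bisect_left(self._tokens, token) % len(self._ring)
        count = min(replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


if __name__ == "__main__":
    # Three hypothetical nodes with evenly spaced tokens.
    ring = TokenRing({
        "node-a": -6148914691236517206,
        "node-b": 0,
        "node-c": 6148914691236517205,
    })
    for key in ("sensor-1", "sensor-2", "sensor-3"):
        print(key, token_for(key), ring.replicas(key, replication_factor=2))
```

Because every node can compute the same token from the same partition key, any node can work out which peers hold a row without consulting a central coordinator. Adding a node in this sketch simply inserts another token into the sorted ring, which changes the ownership of one range.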


Author: Chethan M
