When NOT to use Cassandra?

There has been a lot of talk about Cassandra lately.

Twitter, Digg, Facebook, etc. all use it.

When does it make sense to:

use Cassandra, not use Cassandra, and use an RDBMS instead of Cassandra?

Current answer

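Here I will focus on some important aspects that can help you decide whether you really need Cassandra. The list is not exhaustive; these are just the most important points that come to mind.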

Don't treat Cassandra as your first choice when you have strict requirements on relationships across your dataset.

By default, Cassandra is an AP system (in CAP terms). However, it supports tunable consistency, which means it can also be configured to behave as a CP system. So don't dismiss it just because you read somewhere that it's AP and you are looking for a CP system. Cassandra is more accurately termed “tunably consistent”: it lets you decide the level of consistency you require, balanced against the level of availability.

Don't use Cassandra if your scale is small or if you can get by with a non-distributed DB.

Think harder if your team believes that all your problems will be solved by using a distributed DB like Cassandra. Getting started with these DBs is very simple because they come with many defaults, but optimizing and mastering one to solve a specific problem requires a good (if not large) amount of engineering effort.

Cassandra is column-oriented, but at the same time each row also has a unique key. So it might be helpful to think of it as an indexed, row-oriented store. You can even use it as a document store.

Cassandra doesn't force you to define the fields beforehand. So if you are in startup mode or your features are evolving (as in agile), Cassandra embraces that. Better still, think about your queries first, and then about the data needed to answer them.

Cassandra is optimized for very high write throughput. If your use case is read-heavy (like a cache), Cassandra might not be an ideal choice.
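To make the tunable-consistency point above concrete, here is a minimal sketch in plain Python (it does not use the Cassandra driver). It relies on the commonly cited rule that a read is guaranteed to overlap the latest acknowledged write when the number of read replicas plus write replicas exceeds the replication factor. The function names are hypothetical and exist only for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Map a consistency level name to how many replicas must respond.

    Only a handful of level names are modeled here (an illustrative subset).
    """
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def reads_see_latest_write(read_level: str, write_level: str,
                           replication_factor: int) -> bool:
    """True when R + W > RF, i.e. read and write replica sets must overlap."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor
```

With a replication factor of 3, `reads_see_latest_write("ONE", "ONE", 3)` is false (the AP-leaning trade-off), while QUORUM reads with QUORUM writes satisfy the rule at the cost of needing a majority of replicas to be reachable.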

2019-08-06 10:21:05

Other answers

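Nothing is a silver bullet; everything is built to solve specific problems and has its own pros and cons. It is up to you to work out what your problem statement is and what the best solution for that problem is.

I will answer your questions one by one, in the order you asked them. Since Cassandra belongs to the NoSQL database family, it is important to understand why to use a NoSQL database before I answer your questions.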

Why use NoSQL

In the case of RDBMS, making a choice is quite easy because all the databases in this category, such as MySQL, Oracle, MS SQL, and PostgreSQL, offer almost the same kind of solutions oriented toward ACID properties. When it comes to NoSQL, the decision becomes difficult because every NoSQL database offers different solutions, and you have to understand which one is best suited to your app/system requirements.

For example, MongoDB fits use cases where your system demands a schema-less document store. HBase might fit search engines, log data analysis, or any place where scanning huge, two-dimensional, join-less tables is a requirement. Redis is built to provide in-memory search over a variety of data structures such as trees, queues, and linked lists, and can be a good fit for real-time leaderboards or pub-sub systems. Similarly, there are other databases in this category (including Cassandra) that fit different problem statements.

Now let's move on to the original questions and answer them one by one.

When to use Cassandra

Being part of the NoSQL family, Cassandra offers a solution for problems where one of your requirements is a very write-heavy system and you want a fairly responsive reporting system on top of the stored data. Consider the use case of web analytics, where log data is stored for each request and you want to build an analytical platform around it to count hits per hour, by browser, by IP, etc. in real time. You can refer to this blog post to understand more about the use cases where Cassandra fits in.
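As a rough illustration of the web analytics case, the sketch below models write-time counting in plain Python: every request increments pre-aggregated counters keyed by an hourly bucket, so "hits per hour by browser" becomes a single key lookup instead of a scan. This only mimics the access pattern in memory; it is not Cassandra code, and the class name, method names, and key layout are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timezone


def hour_bucket(ts: datetime) -> str:
    """Truncate a timezone-aware timestamp to its UTC hour, e.g. '2015-06-21T11'."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")


class HitCounters:
    """In-memory stand-in for counters keyed by (hour, dimension, value)."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, ts: datetime, browser: str, ip: str) -> None:
        # One write per query we want to answer later (query-first design).
        bucket = hour_bucket(ts)
        self._counts[(bucket, "total", "")] += 1
        self._counts[(bucket, "browser", browser)] += 1
        self._counts[(bucket, "ip", ip)] += 1

    def hits(self, bucket: str, dimension: str = "total", value: str = "") -> int:
        # A lookup by full key; missing keys count as zero.
        return self._counts[(bucket, dimension, value)]
```

The design choice is to pay for extra writes on every request so that each report is a cheap lookup, which matches the write-heavy profile described above.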

When to use an RDBMS instead of Cassandra

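Cassandra is a NoSQL database and does not provide ACID or relational data properties. If you have a strong requirement for ACID properties (for example, financial data), Cassandra is not a good fit. You can obviously build a workaround, but you will end up writing a lot of application code to simulate ACID properties, and it will seriously delay your time to market. Managing such a system with Cassandra will also be complex and tedious.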

When not to use Cassandra

I don't think this needs a separate answer if the explanation above makes sense.

2015-06-21 11:33:24

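From talking with someone who is in the middle of deploying Cassandra: it doesn't handle many-to-many relationships well. They are doing their initial testing. I spoke with a Cassandra consultant about this, and he said he wouldn't recommend it if you have this kind of problem set.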
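One way to see why many-to-many is awkward: without joins, the relationship is typically stored once per query direction, and the application has to keep both copies in step. The plain-Python sketch below mimics that pattern with two dictionaries standing in for two lookup tables. The class and attribute names are hypothetical, and this is an illustration of the modeling burden, not Cassandra code.

```python
from collections import defaultdict


class Memberships:
    """A user<->group relationship stored twice, once per lookup direction."""

    def __init__(self) -> None:
        # Stand-ins for two denormalized tables, one per query.
        self.groups_by_user: dict[str, set[str]] = defaultdict(set)
        self.users_by_group: dict[str, set[str]] = defaultdict(set)

    def add(self, user: str, group: str) -> None:
        # Every logical write becomes two physical writes.
        self.groups_by_user[user].add(group)
        self.users_by_group[group].add(user)

    def remove(self, user: str, group: str) -> None:
        # Forgetting either side leaves the two views inconsistent.
        self.groups_by_user[user].discard(group)
        self.users_by_group[group].discard(user)
```

In a relational database a single join table answers both directions; here the application owns that bookkeeping, which is the cost the answer above is pointing at.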

2010-06-06 22:21:04

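In addition to the other answers here, a single heavy query versus a load of countless light queries is another thing to consider. Automatically optimizing a single query is inherently harder in a NoSQL-style DB. I have used MongoDB and ran into performance problems when trying to compute a complex query. I have not used Cassandra, but I expect it to have the same issue.

On the other hand, if you expect your load to consist of many small queries and you want to be able to scale out easily, you can take advantage of the eventual consistency that most NoSQL databases offer. Note that eventual consistency is not really a feature of a non-relational data model, but it is much easier to implement and set up in a NoSQL-based system.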

For a single, very heavy query, any modern RDBMS engine can do a decent job of parallelizing parts of the query and taking advantage of as much CPU and memory as you throw at it (on a single machine). NoSQL databases don't have enough information about the structure of the data to make the assumptions that would allow truly intelligent parallelization of a big query. They do allow you to easily scale out to more servers (or cores), but once the query reaches a certain level of complexity you are basically forced to split it manually into parts that the NoSQL engine knows how to handle intelligently.
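A small sketch of what "splitting it manually" can look like, assuming the heavy query is an average over rows that live in several partitions: the application computes a partial (sum, count) per partition in parallel and merges the partials itself. The partitions are plain Python lists standing in for per-node data, and the function names and the `amount` field are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_count(rows):
    """Per-partition step: reduce rows to a mergeable (sum, count) pair."""
    total = 0.0
    count = 0
    for row in rows:
        total += row["amount"]
        count += 1
    return total, count


def average_amount(partitions):
    """Scatter the per-partition step, then gather and merge the partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    # None signals "no rows" instead of dividing by zero.
    return total / count if count else None
```

An RDBMS planner would derive this kind of split on its own for a plain `AVG`; here the decomposition into mergeable partials is the application's job.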

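In my experience with MongoDB, the complexity of the query meant that in the end Mongo could not optimize it or run parts of it across multiple data nodes. Mongo can run multiple queries in parallel, but it is not very good at optimizing a single query.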

2013-04-09 14:36:09

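Apache Cassandra is a distributed database for managing large amounts of structured data across many commodity servers, while providing a highly available service with no single point of failure.

The architecture is built entirely around the CAP theorem, favoring availability and partition tolerance; interestingly, it is eventually consistent.

Do not use it if you are not storing volumes of data across racks of clusters. Do not use it if you are not storing time series data. Do not use it if you are not partitioning your servers. Do not use it if you require strong consistency.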

2017-12-07 23:48:46

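MongoDB has very powerful aggregation functions and an expressive aggregation framework. It has many of the features developers are used to from the relational database world. For example, its document data/storage structure allows for more complex data models than Cassandra.

Of course, all of this comes at a price. So when you choose a database (NoSQL, NewSQL, or RDBMS), consider the problem you are trying to solve and your scalability needs. No single database does it all.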

2013-04-09 14:06:23
