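Difference between Pig and Hive? Why have both?

My background: 4 weeks in the Hadoop world. I have dabbled a bit in Hive, Pig and Hadoop using Cloudera's Hadoop VM, and have read Google's papers on Map-Reduce and GFS (PDF link).

I understand that:

Pig's language, Pig Latin, is a shift away from the SQL-like declarative style of programming (it suits the way programmers think), while Hive's query language closely resembles SQL. Pig sits on top of Hadoop and, in principle, could also sit on top of Dryad. I might be wrong, but Hive is closely coupled to Hadoop. Both Pig Latin and Hive commands compile to Map and Reduce jobs.

My question: what is the goal of having both when one (say Pig) could serve the purpose? Is it only because Pig is evangelized by Yahoo! and Hive by Facebook?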

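Current answer

Read about the differences between PIG and HIVE at this link:

http://www.aptibook.com/Articles/Pig-and-hive-advantages-disadvantages-features

It covers all the aspects. If you are unsure which one to choose, take a look at that page.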

2013-09-05 16:39:25

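Other answers

I found this the most helpful (even though it is a year old): http://yahoohadoop.tumblr.com/post/98256601751/pig-and-hive-at-yahoo

It specifically talks about Pig vs Hive and when and where each is employed at Yahoo. I found it very insightful. Some interesting notes:

On incremental changes/updates to data sets:

Joining against the new incremental data and combining the result with the result of the previous full join is the correct approach. This takes only a few minutes. Standard database operations can be implemented in this incremental way in Pig Latin, which makes Pig a good tool for this use case.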
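To make the incremental idea concrete, here is a minimal Python sketch (not Pig Latin, and not taken from the Yahoo post). It assumes only one side of the join receives new rows while the other side is unchanged; the tables `users`, `clicks_old` and `clicks_new` and the helper `hash_join` are hypothetical.

```python
from collections import defaultdict

def hash_join(left, right):
    """Inner-join two lists of (key, value) pairs on key."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lval, rval) for key, lval in left for rval in index.get(key, [])]

# Hypothetical data: a stable lookup table and a fact table that grows daily.
users = [(1, "alice"), (2, "bob")]
clicks_old = [(1, "/home"), (2, "/cart")]
clicks_new = [(1, "/checkout")]

previous_result = hash_join(clicks_old, users)  # computed on an earlier run
delta_result = hash_join(clicks_new, users)     # only the new rows are joined
incremental = previous_result + delta_result

# Reference result: re-joining everything from scratch.
full = hash_join(clicks_old + clicks_new, users)
```

Under that assumption the incremental result contains the same rows as the full re-join, but each run only touches the new data.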

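On using other tools via streaming:

Pig's integration with streaming also makes it easy for researchers to take a Perl or Python script they have already debugged on a small data set and run it against a huge data set.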
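A script of the kind described might look like the sketch below. It assumes records arrive as tab-separated lines, one per record; the function name `stream` and the (user, url) layout are made up for illustration.

```python
import io

def stream(lines_in, out):
    """Read tab-separated (user, url) records and write (user, domain)."""
    for line in lines_in:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        user, url = line.split("\t", 1)
        # Take the host part, with or without a scheme prefix.
        domain = url.split("/")[2] if "://" in url else url.split("/")[0]
        out.write(f"{user}\t{domain}\n")

# Debug on a tiny in-memory sample first.
sample = ["alice\thttp://example.com/a\n", "bob\texample.org/b\n"]
buffer = io.StringIO()
stream(sample, buffer)
result = buffer.getvalue()
```

When run as a streaming step, the same function would be called as `stream(sys.stdin, sys.stdout)`, so the logic tested on the small sample is the logic that runs on the large data set.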

On using Hive for data warehousing:

In both cases, the relational model and SQL are the best fit. Indeed, data warehousing has been one of the core use cases for SQL through much of its history. It has the right constructs to support the types of queries and tools that analysts want to use. And it is already in use by both the tools and users in the field. The Hadoop subproject Hive provides a SQL interface and relational model for Hadoop. The Hive team has begun work to integrate with BI tools via interfaces such as ODBC.

2011-11-22 20:04:31

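I believe the real answer to your question is that they are/were independent projects with no centrally coordinated goal. They started out in different spaces and have grown to overlap over time as both projects expanded.

Paraphrased from the Hadoop O'Reilly book:

Pig: a dataflow language and environment for exploring very large datasets. Hive: a distributed data warehouse.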

2010-07-28 19:08:16

What can HIVE do that is not possible in PIG?

Tables can be partitioned in HIVE but not in PIG; partitioning lets a query bypass data it does not need.
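As a rough illustration of why partitioning helps, the Python sketch below stores rows in one directory per partition value, using a `key=value` directory naming scheme, and reads back only the requested partition. The helper names, the `dt` column and the file name `data.txt` are assumptions for the example, not anything Hive prescribes here.

```python
import tempfile
from pathlib import Path

def write_partitioned(root, rows):
    """Write (dt, line) rows into one subdirectory per dt value."""
    for dt, line in rows:
        part_dir = Path(root) / f"dt={dt}"
        part_dir.mkdir(parents=True, exist_ok=True)
        with open(part_dir / "data.txt", "a") as f:
            f.write(line + "\n")

def read_partition(root, dt):
    """Read only the file under the requested partition directory."""
    part_file = Path(root) / f"dt={dt}" / "data.txt"
    return part_file.read_text().splitlines() if part_file.exists() else []

root = tempfile.mkdtemp()
rows = [("2015-03-28", "a"), ("2015-03-29", "b"), ("2015-03-29", "c")]
write_partitioned(root, rows)

# A query filtered on dt never opens the other partition's files.
selected = read_partition(root, "2015-03-29")
```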

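What can PIG do that is not possible in HIVE?

Positional references: even when you have no field names, you can refer to fields by position, such as $0 for the first field, $1 for the second, and so on.

Another fundamental difference is that PIG does not need a schema to write values, whereas HIVE does.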
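The two points above can be pictured with a small Python analogy (this is not Pig code): rows are loaded as plain tuples with no declared column names or types, and fields are picked by index, much as `$0` and `$2` would be. The `load` helper and the sample rows are hypothetical.

```python
def load(lines, delimiter="\t"):
    """Split each line into a plain tuple; no column names or types declared."""
    return [tuple(line.split(delimiter)) for line in lines]

raw = ["alice\t30\tNYC", "bob\t25\tSF"]
records = load(raw)

# Positional access: first and third field of every record.
projected = [(r[0], r[2]) for r in records]
```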

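You can connect to HIVE from any external application using JDBC and other methods, but not to PIG.

Note: both run on top of HDFS (Hadoop Distributed File System), and their statements are converted to Map Reduce programs.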
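For readers new to the model, here is a toy Python sketch of how a grouped sum (roughly "total amount per city") breaks down into map, shuffle and reduce steps. It only illustrates the idea; it is not the plan either tool actually generates, and the function names and sample data are made up.

```python
from collections import defaultdict

def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for city, amount in records:
        yield city, amount

def shuffle(pairs):
    """Group all values that share a key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collapse each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}

sales = [("NYC", 10), ("SF", 5), ("NYC", 7)]
totals = reduce_phase(shuffle(map_phase(sales)))
```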

2015-03-29 04:32:59

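Hive vs Pig

Hive is a SQL interface that serves SQL-savvy users as well as tools such as Tableau/MicroStrategy, or any other tool or language that has a SQL interface.

PIG is more like an ETL pipeline, with step-by-step commands such as declaring variables, looping, iterating, conditional statements, etc.

I prefer writing Pig scripts over Hive QL when I want to write complex step-by-step logic. When I am comfortable writing a single SQL query to pull the data I want, I use Hive. With Hive, you need to define the table before querying it (as you would in an RDBMS).

The purposes of the two differ, but under the hood both do the same thing: they convert to map reduce programs. Also, the Apache open source community keeps adding more features to both projects.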

2015-12-24 17:55:30

1. Pig Latin is a data-flow style language and is better suited to software engineers, while SQL is better suited to analysts who are used to it. For complex tasks in Hive you have to manually create temporary tables to store intermediate data; this is not necessary in Pig.
2. Pig Latin is suitable for complicated data structures (like a small graph). Pig has a data structure called DataBag, which is a collection of Tuples. Sometimes you need to calculate metrics that involve multiple tuples (there is a hidden link between the tuples; in this case I would call it a graph). It is then very easy to write a UDF to calculate those metrics. Of course it could be done in Hive, but not as conveniently as in Pig. Writing a UDF in Pig is much easier than in Hive, in my opinion.
3. Pig has no metadata support (or it is optional; in future it may integrate HCatalog). Hive stores its tables' metadata in a database.
4. You can debug a Pig script in a local environment, but that is hard to do for Hive. The reason is point 3: you need to set up Hive metadata in your local environment, which is very time consuming.
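To illustrate the kind of multi-tuple metric mentioned above, here is a plain Python sketch that treats a bag as a list of tuples and computes one value across all of them. It is not registered with Pig and does not use any Pig API; the function `max_gap` and the (user, timestamp) layout are hypothetical.

```python
def max_gap(bag):
    """Largest difference between consecutive timestamps in a bag of (user, ts) tuples."""
    timestamps = sorted(ts for _, ts in bag)
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return max(gaps) if gaps else 0

# All tuples belonging to one group, in arbitrary order.
bag = [("alice", 100), ("alice", 160), ("alice", 130)]
gap = max_gap(bag)
```

The point is that the function sees the whole group at once, so relationships between tuples (here, their ordering in time) are easy to express.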

2013-07-15 23:37:30
