When we started Spark, we wanted it to have a concise API for users, which Scala did well. At the same time, we wanted it to be fast (to work on large datasets), so many scripting languages didn’t fit the bill. Scala can be quite fast because it’s statically typed and it compiles in a known way to the JVM. Finally, running on the JVM also let us call into other Java-based big data systems, such as Cassandra, HDFS and HBase.
Matei Zaharia, CTO @ Databricks (from Quora)
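So that is why Spark chose Scala.
Since Scala is also JVM-based, Java support comes along with it.
Supporting Python follows the trend: in the US, hardly anyone doing data science works without Python.
By the way, Hadoop's tenth anniversary seems to be just around the corner! Congratulations!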
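The advantages of Java for building system-level software are, I think, self-evident, and that is probably the main reason it was chosen.
On top of that, the Apache community has always been Java-first.
Also, in terms of ease of use and adoption, building an open-source project in Scala ten years ago would really have been asking for trouble.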
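Besides, Spark was donated by the AMPLab and then taken in by Apache. Otherwise, with Spark crushing MapReduce right out of the gate, Apache would have lost face, so it might as well take it in! Hahaha!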
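Marking this to come back to; I'll fill it in bit by bit. I happen to be reading up on this topic, so I'll answer properly after work when I have time.
Reference link first; honestly, a quick Google search will turn it up.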
How does Apache Spark support different language APIs