Googles launch of Spark 3 and Hadoop 3 on its latest Dataproc 2.0 adds to the increasing sophistication of the open source environments and also continues to empower its enterprise customers to focus more on data workloads rather than infrastructure.
For data scientists, data explosion is a common business. Due to booming data usage and processing, data professionals had to switch from traditional servers to more flexible, open source, distributed, cluster environments such as Apache Spark and Hadoop which offer Python, Java, Scala, and R interfaces for any data size. Understanding this growing need of data professionals for the best cloud environment, Google has introduced Spark 3 and Hadoop 3 on its latest Dataproc image version 2.0.
What is Dataproc?
Belonging to the Google Cloud portfolio, Dataproc is a powerful tool that manages data processing workloads securely in the cloud. Data engineers and scientists can leverage this fully managed cloud service to run Apache Spark, Hadoop, Hive, and other Open Source Software (OSS) clusters at scale, without worrying about the infrastructure. Last year, Google offered the best of cloud and open source, with Cloud Dataproc on Google Kubernetes. With this, data professionals could deploy unified resource management and build resilient infrastructure across any environment at a lower price.
Some of the classic use cases for Dataproc are data processing from the Internet of Things (IoT) devices, analyzing business data for sales prospects, or to identify security challenges. Some of Google Cloud's prominent customers who have moved their on-premises Apache Hadoop to Google Cloud include Twitter, Vodafone, Pandora. Explaining the cost benefits and flexibility around Cloud Dataproc, James Malone, Product Manager at Google Cloud, says, "Customers who are migrating to the cloud from on-premises data centers often share a common complaint: their uncertainty around the costs invested in and benefits derived from their existing investments in Spark and Hadoop. Cloud economics can mitigate some of these concernsCloud Dataproc is specifically designed to stabilize pricing, even when you use your cluster ephemerally."
Tech News: New Backup-as-a-Service Solutions Address the Urgent Need for Data Protection
Google Cloud Dataproc uses image versions to bundle together operating systems, big data components, and Google Cloud Platform (GCP) connectors into a single package which is further deployed on a cluster. Since the images are updated regularly with new features and enhancements, the latest Dataproc image version 2.0 (currently in preview mode) offers a step function increase over the previous image versions and runs the latest iterations of Apache Spark and Hadoop clusters.
Ilias Papachristos, data analyst and a Lead Volunteer at the Google Development Group shares this code to create a Dataproc image version 2.0
Exploring Spark and Hadoop
It is difficult for a single computer to process petabytes of data, thus there is a growing need for a cluster of machines for data processing. But the tricky question is how do these cluster machines work to solve the data analytics process? Meet Spark and Hadoop.
Developed by Apache Software Foundation, like Hadoop, Spark is an open-source, distributed, parallel data processing framework that manages big data and machine learning applications in scalable clusters of computer servers. It offers a set of libraries in Java, Scala, and Python languages and can process data from data repositories such as the Hadoop Distributed File System (HDFS), NoSQL databases, and Apache Hive.
Tech News: AWS Launches the 6th Generation of EC2 Family Powered by AWS Graviton2 Processor
Spark has gained more popularity over Hadoop mainly for its speed, performance, and quick feedback loop.Bernard Marr, business influencer and bestselling author explained the growing popularity of Spark in this blog by highlighting its machine learning compatibility, "Spark has proven very popular and is used by many large companies for huge, multi-petabyte data storage and analysis. This has partly been because of its speed. Additionally, Spark has proven itself to be highly suited to machine learning applications."
Spark 3, the latest iteration of Apache Spark is currently in preview mode and the highlight of the new release is its performance optimization. Spark 3 will formulate end-to-end machine learning pipelines (data ingest, model training, and visualization) at a faster pace and reduced infrastructure costs. With Spark 3, data engineers can now perform adaptive queries, data pruning techniques (eliminating historical information from the database), and GPU acceleration. Moreover, the new Spark 3 has taken down a few functionalities such as Resilient Distributed Datasets, GraphX, and Python 2.7.
Additionally, the Hadoop 3 has exciting features such as native support for GPUs in the Yet Another Resource Negotiator (YARN) scheduler and YARN containerization. Christopher Crosbie, Product Manager, and Igor Dvorzhak, Software Engineer at Google Cloud explain, "In cloud-based deployments of Hadoop, there tends to be less reliance on Hadoop Distributed File System (HDFS) and YARN. HDFS storage will be substituted for Cloud Storage in most situations. YARN is still used for scheduling resources within a cluster, but in the cloud, Hadoop customers start to think about job and resource management at the cluster or VM level. Dataproc offers job-scoped clusters that are right-sized for the task at hand instead of being limited to just configuring a single clusters YARN queues with complex workload management policies."
Furthermore, the latest Dataproc image version 2.0 has modified the existing configuration settings to optimize OSS software and upgraded software and shared libraries to avoid runtime incompatibilities.
We think its a welcome addition to an increasingly sophisticated open-source environment!
Comment below or let us know on LinkedIn, Twitter, or Facebook. Wed love to hear from you!
Read the original here:
Big Push for Big Data Processing Performance and Speed with Google's Latest Dataproc 2.0 - Toolbox
- Wyplay’s Digital TV Middleware Source Code is Now Available to Members of the Frog by Wyplay Community [Last Updated On: January 5th, 2014] [Originally Added On: January 5th, 2014]
- Find Open Source Alternatives to commercial software | Open ... [Last Updated On: January 5th, 2014] [Originally Added On: January 5th, 2014]
- Open Source Initiative - Official Site [Last Updated On: January 5th, 2014] [Originally Added On: January 5th, 2014]
- SCALE 11x: Evolution of an Open Source Software Foundation - Stephen Walli - Video [Last Updated On: January 5th, 2014] [Originally Added On: January 5th, 2014]
- Bitcoin Baron Keeps a Secretive Open Source OS Alive [Last Updated On: January 22nd, 2014] [Originally Added On: January 22nd, 2014]
- osalt.com - Find Open Source Alternatives to commercial ... [Last Updated On: January 22nd, 2014] [Originally Added On: January 22nd, 2014]
- Sustainability of Open Source software communities beyond a fork - Video [Last Updated On: January 22nd, 2014] [Originally Added On: January 22nd, 2014]
- Bringing MoreWomen to Free and Open Source Software - Video [Last Updated On: January 22nd, 2014] [Originally Added On: January 22nd, 2014]
- Acquia podcast with Sensio Labs UK - Video [Last Updated On: January 22nd, 2014] [Originally Added On: January 22nd, 2014]
- xTuple ERP + OrangeHRM Open source software leaders integration - Video [Last Updated On: January 22nd, 2014] [Originally Added On: January 22nd, 2014]
- Guest articles setting out the author's position on the current status and future directions of KDE and its software [Last Updated On: January 23rd, 2014] [Originally Added On: January 23rd, 2014]
- Open Source Power for Small Business in 2014 [Last Updated On: January 23rd, 2014] [Originally Added On: January 23rd, 2014]
- EnterpriseDB Expands in Korea to Meet Rising Demand for Postgres [Last Updated On: January 24th, 2014] [Originally Added On: January 24th, 2014]
- Introduction to FOSS - Free and Open Source Software - Video [Last Updated On: January 24th, 2014] [Originally Added On: January 24th, 2014]
- Out in the Open: Teenage Hacker Transforms Web Into One Giant Bitcoin Network [Last Updated On: January 27th, 2014] [Originally Added On: January 27th, 2014]
- Who says that Open Source Software does not have support? By Rosaria Silipo - Video [Last Updated On: January 27th, 2014] [Originally Added On: January 27th, 2014]
- Microsoft Open Sources Its Internet Servers, Steps Into the Future [Last Updated On: January 28th, 2014] [Originally Added On: January 28th, 2014]
- Microsoft cloud server designs for Facebook's Open Compute Project [Last Updated On: January 28th, 2014] [Originally Added On: January 28th, 2014]
- Richard Stallman Free v Open Source Software - Video [Last Updated On: January 28th, 2014] [Originally Added On: January 28th, 2014]
- UK government looks to open source to cut costs [Last Updated On: January 30th, 2014] [Originally Added On: January 30th, 2014]
- Free Software + $20 USB Dongle = Software Defined Radio, Hak5 1524 - Video [Last Updated On: January 30th, 2014] [Originally Added On: January 30th, 2014]
- Libreoffice 4.2 challenges Microsoft Office with improved Windows integration [Last Updated On: January 31st, 2014] [Originally Added On: January 31st, 2014]
- Fallout 3 Let's Play Pt 6 - Video [Last Updated On: February 1st, 2014] [Originally Added On: February 1st, 2014]
- 14 1 29 Tom G Open Source Software 1 - Video [Last Updated On: February 1st, 2014] [Originally Added On: February 1st, 2014]
- 14 1 29 Tom G Open Source Software - Video [Last Updated On: February 1st, 2014] [Originally Added On: February 1st, 2014]
- How is open source software like great wine? - Video [Last Updated On: February 3rd, 2014] [Originally Added On: February 3rd, 2014]
- Free and open source software key for multicore hardware [Last Updated On: February 4th, 2014] [Originally Added On: February 4th, 2014]
- Blender Tutorial - 2D Animation (1) Bone Rigging, Shape Character Planes by VscorpianC - Video [Last Updated On: February 4th, 2014] [Originally Added On: February 4th, 2014]
- Obama Bit Coin Conspiracy? - Video [Last Updated On: February 4th, 2014] [Originally Added On: February 4th, 2014]
- The Pentagon's Mad Science Is Going Open Source [Last Updated On: February 5th, 2014] [Originally Added On: February 5th, 2014]
- The open source countdown has begun [Last Updated On: February 6th, 2014] [Originally Added On: February 6th, 2014]
- BLOG: Why open source will rule the data centre [Last Updated On: February 6th, 2014] [Originally Added On: February 6th, 2014]
- OpenDaylight Summit: SDN Needs Open Source and Open Standards [Last Updated On: February 10th, 2014] [Originally Added On: February 10th, 2014]
- 7 reasons not to use open source software [Last Updated On: February 12th, 2014] [Originally Added On: February 12th, 2014]
- The Open Source Initiative | Open Source Initiative [Last Updated On: February 12th, 2014] [Originally Added On: February 12th, 2014]
- Find Open Source Alternatives to commercial software ... [Last Updated On: February 12th, 2014] [Originally Added On: February 12th, 2014]
- Has Linux Conquered the Cloud? [Last Updated On: February 13th, 2014] [Originally Added On: February 13th, 2014]
- The New eRacks/NAS36 Rackmount Storage Server Achieves Price/Density Breakthrough: 100TB Storage in Only 4U for Under ... [Last Updated On: February 14th, 2014] [Originally Added On: February 14th, 2014]
- 2012 Red Hat Summit Build a PaaS using Open Source Software ~ Redhat Linux Video YouTube - Video [Last Updated On: February 14th, 2014] [Originally Added On: February 14th, 2014]
- Intel launches big data software suite - free to a good home [Last Updated On: February 15th, 2014] [Originally Added On: February 15th, 2014]
- Three college students build a health provider search site in six weeks [Last Updated On: February 16th, 2014] [Originally Added On: February 16th, 2014]
- The Asgard Show Episode 6 - Video [Last Updated On: February 16th, 2014] [Originally Added On: February 16th, 2014]
- Open source startups: Don't try to be Red Hat [Last Updated On: February 18th, 2014] [Originally Added On: February 18th, 2014]
- Open Source in the Enterprise: To Pay or Not to Pay? [Last Updated On: February 18th, 2014] [Originally Added On: February 18th, 2014]
- DEF CON 12 - Wendy Seltzer and Seth Schoen, Hacking the Spectrum - Video [Last Updated On: February 18th, 2014] [Originally Added On: February 18th, 2014]
- dev@Pulse Speaker Predictions - Jonathan Bryce - Video [Last Updated On: February 19th, 2014] [Originally Added On: February 19th, 2014]
- Facebook Boosts Its Open Source Mojo With New Project [Last Updated On: February 20th, 2014] [Originally Added On: February 20th, 2014]
- Raising Linux to Grow Open Source [Last Updated On: February 20th, 2014] [Originally Added On: February 20th, 2014]
- Apple Veteran Named PayPal's First Head of Open Source Software [Last Updated On: February 20th, 2014] [Originally Added On: February 20th, 2014]
- Open Source Software | 46 of 62 | MconneX - Video [Last Updated On: February 20th, 2014] [Originally Added On: February 20th, 2014]
- News Flash from Redmond: FOSS Causes Dissatisfaction! [Last Updated On: February 25th, 2014] [Originally Added On: February 25th, 2014]
- FOSS4G with Eric Brelsford - Video [Last Updated On: February 25th, 2014] [Originally Added On: February 25th, 2014]
- NYLUG Presents: Mark Tolliver on Palamida. Application Security for Open Source Software (6/25/08) - Video [Last Updated On: February 25th, 2014] [Originally Added On: February 25th, 2014]
- DARPA Open Catalog Makes Agency-Sponsored Software and Publications Available to All [Last Updated On: February 25th, 2014] [Originally Added On: February 25th, 2014]
- Munich opts for open source groupware from Kolab [Last Updated On: February 26th, 2014] [Originally Added On: February 26th, 2014]
- Modelling Hands Step by Step Using Free Open Source Software Seamless3d 3 - Video [Last Updated On: February 27th, 2014] [Originally Added On: February 27th, 2014]
- Accelerating the Network with Open Source Software, Erik Ekudden | OpenDaylight Summit 2014 - Video [Last Updated On: February 27th, 2014] [Originally Added On: February 27th, 2014]
- The Commercial Case for Open Source Software [Last Updated On: March 1st, 2014] [Originally Added On: March 1st, 2014]
- Beginners guide to contributing to open source software - Video [Last Updated On: March 3rd, 2014] [Originally Added On: March 3rd, 2014]
- Free Open Source Software [Last Updated On: March 4th, 2014] [Originally Added On: March 4th, 2014]
- Open Source Software - Video [Last Updated On: March 4th, 2014] [Originally Added On: March 4th, 2014]
- Open Source Software EDTC5325 - Video [Last Updated On: March 6th, 2014] [Originally Added On: March 6th, 2014]
- Broadcom Announces Open Switch Pipeline Specification Targeting Growing SDN Application Ecosystem [Last Updated On: March 7th, 2014] [Originally Added On: March 7th, 2014]
- RIT launches nation’s first minor in free and open source software and free culture [Last Updated On: March 7th, 2014] [Originally Added On: March 7th, 2014]
- Forum created to push optical SDNs [Last Updated On: March 10th, 2014] [Originally Added On: March 10th, 2014]
- Google embraces open source for 10th year of Summer of Code [Last Updated On: March 10th, 2014] [Originally Added On: March 10th, 2014]
- Is Open Source Software The Answer to Oregon's IT Problems? [Last Updated On: March 11th, 2014] [Originally Added On: March 11th, 2014]
- Spenden Ticketautomat mit Open Source Software auf der CeBIT 2014, CMS Garden - Video [Last Updated On: March 14th, 2014] [Originally Added On: March 14th, 2014]
- 2012 Red Hat Summit Build a PaaS using Open Source Software - Video [Last Updated On: March 14th, 2014] [Originally Added On: March 14th, 2014]
- CyanogenMod receiving Linux New Media Award 2014 (Best Open Source Software App for Android) - Video [Last Updated On: March 15th, 2014] [Originally Added On: March 15th, 2014]
- Real tech 25 Finding open source software you can trust - Video [Last Updated On: March 15th, 2014] [Originally Added On: March 15th, 2014]
- Tor is building an anonymous instant messenger [Last Updated On: April 10th, 2017] [Originally Added On: March 15th, 2014]
- MailPile is now in Alpha [Last Updated On: April 10th, 2017] [Originally Added On: March 15th, 2014]
- $2,400 “Introduction to Linux” course will be free and online this summer [Last Updated On: April 10th, 2017] [Originally Added On: March 16th, 2014]
- Linaro announces MediaTek as member [Last Updated On: March 18th, 2014] [Originally Added On: March 18th, 2014]
- TN state departments asked to switch over to open source software [Last Updated On: March 18th, 2014] [Originally Added On: March 18th, 2014]
- Open source project builds mobile networks without big carriers [Last Updated On: March 18th, 2014] [Originally Added On: March 18th, 2014]
- Your U.S. government uses open source software, and loves it [Last Updated On: March 18th, 2014] [Originally Added On: March 18th, 2014]
- Linux Goes to the Head of the Class [Last Updated On: March 22nd, 2014] [Originally Added On: March 22nd, 2014]
- What is open source? - Definition from WhatIs.com [Last Updated On: March 23rd, 2014] [Originally Added On: March 23rd, 2014]