How Open Source is Driving the Future of Data Science

With its reliance on a community of physically dispersed individuals and flexibility of adoption, open-source data science is becoming an even more attractive choice among cash-strapped governments, non-profits, and businesses.

Over the past decade, data science and machine learning have made their way from an obscure academic discipline to widespread corporate adoption. The academic community has a natural preference towards open source. Science is a collaborative effort, and its advancement is best served by enabling as large a community as possible to build upon existing research.

Private companies, on the other hand, have a much stronger incentivefor proprietary technology. Developing software systems is an expensiveendeavor. Naturally, a business wants to make a return on this investment.Making the results of your work freely available to competitors doesnt seemlike the smartest choice if you are a business owner.

Still, in data science, several powerful incentives pull corporateinterests in the direction of favoring open-source implementations.

Open source tools offer a lower barrier to entry thanlicensed software. Companies can experiment more easily and with fewerconstraints. They are also more likely to find talent for programming languagesand data science tools that are freely available to everyone.

A case in point is Python, the dominant programming languagefor data science, which happens to be open source. It has the most versatileand extensive capabilities for manipulating data and building machine learningmodels. Python has even superseded commercial tools like MatLab in terms ofcapabilities for data science applications.

Most data science and machine learning frameworks such asTensorFlow, SciKit-Learn, or PyTorch build directly on Python and are also open-source.

Often, their creators are large companies that are alreadydominant in their respective markets. Evidently, the benefits of making alibrary like TensorFlow open-source outweigh the costs for its creator Google.

While Google gave potential competitors a powerful deeplearning tool, it probably benefits more from the massively expanded talentpool, the sprawling deep learning innovation, and the widespread adoption ofthe framework by other companies that open-sourcing TensorFlow entailed.

Other machine learning libraries, such as XGBoost,originatedas research projects in universities. For these institutions, the benefits ofopen-source software are overwhelming for the reasons discussed above.

Most machine learning models require large amounts of datato train. Modern machine learning models, especially deep neural networks usedin computer vision and natural language processing, require vast amounts ofcomputational resources to train. This would present an almost insurmountablechallenge for smaller organizations and individuals, who simply do not havethis amount of data internally, nor the budget to run expensive model trainingexperiments. If it werent for open source data, machine learning would bealmost exclusively the domain of large corporations. This may be in theinterest of the shareholders of said corporations, but certainly not of societyat large, which benefits from the innovations produced by startups andindividuals.

Even for large corporations, the widespread availability of open-sourcedata and pre-trained machine learning models has benefits.

Many of the cutting-edge models developed by researchers atcompanies like Google and Facebook have been open-sourced. Anyone can downloadthese models from Github and use them in their custom data science projects.

But why are these corporations so generous in sharing theirmodels and their data?

From the perspective of an established corporation, it makessense to avoid risky ventures and instead aim to expand market share throughmore traditional strategies.

Startups tend to be better suited for engaging in novelhigh-risk ventures because they are smaller, more agile, and have nothing tolose.

If a large company wants to enter a novel market, or obtainnew technology, acquiring a successful startup in the desired field may be asmarter move than trying to do everything from scratch in-house.

For example, Google acquired Deep Mind in 2014 for thepotential it saw in DeepMinds research in reinforcement learning andgeneral-purpose AI.

To maximize the potential for the emergence of innovativedata science and artificial intelligence startups, it makes sense to giveambitious new upstarts the tools and data they need.

Furthermore, many of the researchers working on commercialprojects come from academic settings. They bring with them a culture ofcollaboration based on open source.

Researchers and developers are naturally inclined toshowcase their work. Therefore, a commitment to open source and the opportunityfor employees to participate in open source projects can go a long way to makea company a more attractive employer for highly coveted data science talent.

The foundational knowledge for data science includesadvanced skills in mathematics, statistics, and programming. Until a few yearsago, this knowledge was deeply buried in academic textbooks and usuallyacquired by obtaining a technical university degree.

Today, an ambitious self-starter can learn all of thesethings via resources that are freely available on the web. An army of Youtubeeducators and bloggers has emerged that makes previously dry and highlyacademic topics accessible in a fun and easy-to-digest way.

These new educational resources grow the talent pool bymaking data science more accessible for a larger group of people, which alsobenefits companies.

Without open-source software and open-source data, offeringthis type of education for free would be much more difficult.

Online education platforms offer academic curricula that often match or exceed traditional university courses in terms of quality. In many cases, these courses are accompanied by Github repositories full of open source code.

Developing and maintaining a custom data science solutionfrom scratch in-house presents a major challenge to most companies. The largera software system grows, the more susceptible it is to bugs and the moredifficult it is to find problems in the source code and deploy the system intoproduction.

Building on open source software and models cansignificantly alleviate these burdens and speed up time to market. Bugs inwidely used open-source libraries are likely to have been discovered byprevious users. If bugs do occur,developers are free to go into the code and fix them without having to worryabout violating licensing agreements. If the open-source tool turns out to notbe a good fit, no money has been sunk on a failed trial.

Even for private businesses who have a commercial interestin protecting their software, there are strong incentives for using andbuilding open-source data science solutions.

More recently, the Covid-19 pandemic has put many organizations under enormous pressure to digitize data-heavy processes as quickly as possible while physically scattering technical talent. With its reliance on a community of physically dispersed individuals and flexibility of adoption, open-source data science is becoming an even more attractive choice among cash-strapped governments, non-profits, and businesses.

Originally posted here:
How Open Source is Driving the Future of Data Science - RTInsights

Research, Evaluation and Learning at the International Rescue Committee - World - ReliefWeb [Last Updated On: August 10th, 2020] [Originally Added On: August 10th, 2020]
Conserving Biodiversity with AI - BBN Times [Last Updated On: August 10th, 2020] [Originally Added On: August 10th, 2020]
DevOps Fundamentals You Ever Wanted To Know - hackernoon.com [Last Updated On: August 10th, 2020] [Originally Added On: August 10th, 2020]
Another Perspective on Evictions - Bacon's Rebellion [Last Updated On: August 10th, 2020] [Originally Added On: August 10th, 2020]
Amitabh Bachchan on fans alternate job suggestion: My job is now insured - The Indian Express [Last Updated On: August 10th, 2020] [Originally Added On: August 10th, 2020]
Will You Soon Download Packaging Machine Controls from the Internet? - Packaging Digest [Last Updated On: August 10th, 2020] [Originally Added On: August 10th, 2020]
5 free resources every data scientist should start using today - The Next Web [Last Updated On: August 10th, 2020] [Originally Added On: August 10th, 2020]
Who's hoping to make an Epic impact on Green Bay area music scene with a new concert venue? | Streetwise - Green Bay Press Gazette [Last Updated On: August 10th, 2020] [Originally Added On: August 10th, 2020]
Industrial robots are dominating but are they safe from cyber-attacks? - TechHQ [Last Updated On: August 10th, 2020] [Originally Added On: August 10th, 2020]
Friday Rant - Rise of the Rogue-Bots? - Diginomica [Last Updated On: August 10th, 2020] [Originally Added On: August 10th, 2020]
Important Reasons Why You Should Pick RoR As Your Web-Based Development Project - Customer Think [Last Updated On: August 10th, 2020] [Originally Added On: August 10th, 2020]
Portrait of the software developer as an artist - ComputerWeekly.com [Last Updated On: August 10th, 2020] [Originally Added On: August 10th, 2020]
Python may be your safest bet for a career in coding - Gadgets Now [Last Updated On: August 10th, 2020] [Originally Added On: August 10th, 2020]
1Password is coming to Linux - ZDNet [Last Updated On: August 10th, 2020] [Originally Added On: August 10th, 2020]
IBM creates an open source tool to simplify API documentation - TechRepublic [Last Updated On: August 10th, 2020] [Originally Added On: August 10th, 2020]
Mastercard : Accelerate Ignites Next Generation of Fintech Disruptors and Partners to Build the Future of Commerce - Marketscreener.com [Last Updated On: August 12th, 2020] [Originally Added On: August 12th, 2020]
Expanding the Universe of Haptics | by Lofelt | Aug, 2020 - Medium [Last Updated On: August 12th, 2020] [Originally Added On: August 12th, 2020]
UX Designer Salary: 5 Important Things to Know - Dice Insights [Last Updated On: August 12th, 2020] [Originally Added On: August 12th, 2020]
Persistent memory reshaping advanced analytics to improve customer experiences - IT World Canada [Last Updated On: August 12th, 2020] [Originally Added On: August 12th, 2020]
NextCorps and SecondMuse Open Application Period for Programs that Help Climate Technology Startups Accelerate Hardware Manufacturing - GlobeNewswire [Last Updated On: August 12th, 2020] [Originally Added On: August 12th, 2020]
Buried deep in the ice is the GitHub code vault humanity's safeguard against devastation - ABC News [Last Updated On: August 12th, 2020] [Originally Added On: August 12th, 2020]
Top 12 Most Used Tools By Developers In 2020 - Analytics India Magazine [Last Updated On: August 12th, 2020] [Originally Added On: August 12th, 2020]
Facebook's React 17 JavaScript library: Here's why its top feature is 'no new features' - ZDNet [Last Updated On: August 12th, 2020] [Originally Added On: August 12th, 2020]
CORRECTING and REPLACING Anyscale Hosts Inaugural Ray Summit on Scalable Python and Scalable Machine Learning - Business Wire [Last Updated On: August 12th, 2020] [Originally Added On: August 12th, 2020]
Google: Here's how much we give to open source through our GitHub activity - ZDNet [Last Updated On: August 12th, 2020] [Originally Added On: August 12th, 2020]
How Chriselle Lim And Joan Nguyen Created Bmo, The Coworking Space And Virtual Classroom Of The Future (With A Childcare Twist) - Forbes [Last Updated On: August 13th, 2020] [Originally Added On: August 13th, 2020]
How Will Public Libraries Adapt To New School Year Norms? - Book Riot [Last Updated On: August 13th, 2020] [Originally Added On: August 13th, 2020]
Google: We'll test hiding the full URL in Chrome 86 to combat phishing - ZDNet [Last Updated On: August 13th, 2020] [Originally Added On: August 13th, 2020]
How to install Python 3 and PIP 3 on Ubuntu 20.04 LTS - Linux Shout - H2S Media [Last Updated On: August 13th, 2020] [Originally Added On: August 13th, 2020]
What are Bitcoin Wallets: Everything You Need to Know - Programming Insider [Last Updated On: August 13th, 2020] [Originally Added On: August 13th, 2020]
JSHint is Now Free Software after Updating License to MIT Expat - WP Tavern [Last Updated On: August 13th, 2020] [Originally Added On: August 13th, 2020]
How to learn JavaScript: These are the best online courses - Mashable [Last Updated On: August 13th, 2020] [Originally Added On: August 13th, 2020]
What developers need to know about inter-blockchain communication - ComputerWeekly.com [Last Updated On: August 14th, 2020] [Originally Added On: August 14th, 2020]
Introducing the CDK construct library for the serverless LAMP stack - idk.dev [Last Updated On: August 14th, 2020] [Originally Added On: August 14th, 2020]
IBM asked software developers to take on the wrath of Mother Nature - The Drum [Last Updated On: August 14th, 2020] [Originally Added On: August 14th, 2020]
Aspire Technology Launches First Truly Secure Public Blockchain for Creation of Digital Assets - GlobeNewswire [Last Updated On: August 14th, 2020] [Originally Added On: August 14th, 2020]
GM Creates And Shares New Workplace Safety Technologies - Pulse 2.0 [Last Updated On: August 14th, 2020] [Originally Added On: August 14th, 2020]
Key Considerations and Tools for IP Protection of Computer Programs in Europe and Beyond - Lexology [Last Updated On: August 14th, 2020] [Originally Added On: August 14th, 2020]
The state of application security: What the statistics tell us - CSO Online [Last Updated On: August 14th, 2020] [Originally Added On: August 14th, 2020]
Open Source: What's the delay on the former high/middle school on North Mulberry? - knoxpages.com [Last Updated On: August 14th, 2020] [Originally Added On: August 14th, 2020]
The Risks Associated with OSS and How to Mitigate Them - Security Boulevard [Last Updated On: August 14th, 2020] [Originally Added On: August 14th, 2020]
news digest: Microsoft launches open source website, TensorFlow Recorder released, and Stackery brings serverless to the Jamstack - SD Times -... [Last Updated On: August 14th, 2020] [Originally Added On: August 14th, 2020]
Build Your Own PaaS with Crossplane: Kubernetes, OAM, and Core Workflows - InfoQ.com [Last Updated On: August 17th, 2020] [Originally Added On: August 17th, 2020]
ISRO Is Recruiting For Vacancies with Salary Upto Rs 54000: How to Apply - The Better India [Last Updated On: August 17th, 2020] [Originally Added On: August 17th, 2020]
Does technology increase the problem of racism and discrimination? - TechTarget [Last Updated On: August 17th, 2020] [Originally Added On: August 17th, 2020]
CORRECTING and REPLACING Anyscale Hosts Inaugural Ray Summit on Scalable Python and Scalable Machine Learning - Yahoo Finance [Last Updated On: August 17th, 2020] [Originally Added On: August 17th, 2020]
In the City: Take advantage of open recreation, cultural and park amenities - Coloradoan [Last Updated On: August 17th, 2020] [Originally Added On: August 17th, 2020]
Exploring the future of modern software development - ComputerWeekly.com [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
Hadoop Developer Interview Questions: What to Know to Land the Job - Dice Insights [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
SiFive Opens Business Unit to Build Chips With Arm and RISC-V Inside - Electronic Design [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
Use Pulumi and Azure DevOps to deploy infrastructure as code - TechTarget [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
Why ASP.NET Core Is Regarded As One Of The Best Frameworks For Building Highly Scalable And Modern Web Applications - WhaTech [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
NITK figures 4th in Google Summer of Code ranking - BusinessLine [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
Learn More About Dynamo for Revit: Features, Functions, and News - ArchDaily [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
Linux Foundation showcases the greater good of open source - ComputerWeekly.com [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
Programming language Kotlin 1.4 is out: This is how it's improved quality and performance - ZDNet [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
Top 10 Languages That Paid Highest Salaries Worldwide In 2020 - Analytics India Magazine [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
Programming language Rust: Mozilla job cuts have hit us badly but here's how we'll survive - ZDNet [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
In-App Bidding Gathers Steam, But Adoption Looks Nothing Like Header Bidding On The Web - AdExchanger [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
13 thoughts on Fitting Snake Into A QR Code - Hackaday [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
Newham test and trace app was designed by man who grew up in the borough - Newham Recorder [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
'Trapped in a code' the fight over our algorithmic future - Open Democracy [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
Telegram launches one-on-one video calls on iOS and Android - The Verge [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
AWS Controllers for Kubernetes Will Be A 'Boon For Developers' - CRN: Technology news for channel partners and solution providers [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
Coding within company constraints - ComputerWeekly.com [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
Open Source and Open Standards: The Recipe for Success Featured - The Fast Mode [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
How Intel helped give the worlds first cyborg a voice - The Next Web [Last Updated On: August 21st, 2020] [Originally Added On: August 21st, 2020]
Tiger Woods, Rory McIlroy near bottom of field at The Northern Trust - ESPN [Last Updated On: August 22nd, 2020] [Originally Added On: August 22nd, 2020]
Intel Owl OSINT tool automates the intel-gathering process using a single API - The Daily Swig [Last Updated On: August 22nd, 2020] [Originally Added On: August 22nd, 2020]
IOTA Foundation presents the current projects in the mobility industry - Crypto News Flash [Last Updated On: August 22nd, 2020] [Originally Added On: August 22nd, 2020]
How 'Fortnite' and 'Second Life' Shaped the Future of Indian Market - Santa Fe Reporter [Last Updated On: August 22nd, 2020] [Originally Added On: August 22nd, 2020]
Apple Enters $ 2 Trillion Club, Github's Chinese Counterpart And More In This Week's Top News - Analytics India Magazine [Last Updated On: August 22nd, 2020] [Originally Added On: August 22nd, 2020]
As world grapples with pandemic, schools are the epicenter - ABC News [Last Updated On: August 24th, 2020] [Originally Added On: August 24th, 2020]
Why Businesses Should Embrace Modernizing Their Legacy Applications - TechBullion [Last Updated On: August 24th, 2020] [Originally Added On: August 24th, 2020]
Is It Time To Rename RPG? - IT Jungle [Last Updated On: August 24th, 2020] [Originally Added On: August 24th, 2020]
Phantasy Star Online programmers on breaking new ground and their Diablo-style isometric prototype - Polygon [Last Updated On: August 24th, 2020] [Originally Added On: August 24th, 2020]
How To Learn To Program In Python By Playing Videogames - Analytics India Magazine [Last Updated On: August 24th, 2020] [Originally Added On: August 24th, 2020]
New Microsoft program to help develop the quantum computing workforce of the future in India - Microsoft [Last Updated On: August 24th, 2020] [Originally Added On: August 24th, 2020]
How the Docker Revolution Will Change Your Programming, Part 1 - Walter Bradley Center for Natural and Artificial Intelligence [Last Updated On: August 24th, 2020] [Originally Added On: August 24th, 2020]
The art of developing happy customers - ComputerWeekly.com [Last Updated On: August 24th, 2020] [Originally Added On: August 24th, 2020]