Imagine that you're hiking, and you encounter an odd-looking winged bug that's almost bird-like. If you open the Seek app by iNaturalist and point it at the mystery critter, the camera screen will inform you that what you're looking at is called a hummingbird clearwing, a type of moth active during the day. In a sense, the Seek app works a lot like Pokémon Go, the popular augmented reality game from 2016 that had users searching outdoors for elusive fictional critters to capture.
Launched in 2018, Seek has a similar feel. The difference is that when users point their cameras at their surroundings, instead of encountering a Bulbasaur or a Butterfree, they might find real-world plant bulbs and butterflies that the camera identifies in real time. Users can learn about the types of plants and animals they come across, and can collect badges for finding different species, like reptiles, insects, birds, plants, and mushrooms.
iNaturalist can correctly recognize different living organisms (most of the time, at least) thanks to a machine-learning model built on data collected by its original app, which debuted in 2008 and is simply called iNaturalist. Its goal is to help people connect with the richly animated natural world around them.
The iNaturalist platform, which boasts around 2 million users, is a mashup of social networking and citizen science where people can observe, document, share, discuss, and learn more about nature, and create data for science and conservation. Beyond taking photos, the iNaturalist app offers more features than the gamified Seek. It has a news tab and local wildlife guides, and organizations can also use the platform to host data collection projects that focus on certain areas or species of interest.
When new users join iNaturalist, they're prompted to check a box that allows them to share their data with scientists (although you can still join if you don't check the box). Images and location information that users agree to share are tagged with a Creative Commons license; otherwise, they're held under an all-rights-reserved license. About 70 percent of the data on the platform is classified as Creative Commons. "You can think of iNaturalist as this big open data pipe that just goes out there into the scientific community and is used by scientists in many ways that we're totally surprised by," says Scott Loarie, co-director of iNaturalist.
This means that every time a user logs or photographs an animal, plant, or other organism, that becomes a data point that's streamed to a hub in the Amazon Web Services cloud. It's one of over 300 datasets in the AWS open data registry. Currently, the hub for iNaturalist holds around 160 terabytes of images. The data collection is updated regularly and open for anyone to find and use. iNaturalist's dataset is also part of the Global Biodiversity Information Facility, which brings together open datasets from around the world.
iNaturalist's Seek is a great example of an organization doing something interesting and otherwise impossible without a large, open dataset. These kinds of datasets are both a hallmark and a driving force of scientific research in the information age, a period defined by the widespread use of powerful computers. They have become a new lens through which scientists view the world around us, and have enabled the creation of tools that also make science accessible to the public.
iNaturalist's machine learning model, for one, can help its users identify around 60,000 different species. "There's two million species living around the world, we've observed about one-sixth of them with at least one data point and one photo," says Loarie. "But in order to do any sort of modeling or real synthesis or insight, you need about 100 data points [per species]." The team's goal is to have 2 million species represented, but that means they need more data and more users. They're also trying to create new tools that help them spot weird data, correct errors, or even identify emerging invasive species. "This goes along with open data. The best way to promote it is to get as little friction as possible in the movement of the data and the tools to access it," he adds.
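To make Loarie's rule of thumb concrete, here is a minimal sketch of how one might check which species clear the roughly 100-observation bar he describes. It assumes a hypothetical, simplified record format (one species name per logged observation) and made-up counts; it is not iNaturalist's actual data schema or pipeline.

```python
from collections import Counter

# Rule of thumb quoted by Loarie: about 100 data points per species
# before modeling or synthesis becomes feasible.
MIN_OBSERVATIONS = 100


def species_ready_for_modeling(observations, threshold=MIN_OBSERVATIONS):
    """Return, sorted by name, species with at least `threshold` observations.

    `observations` is an iterable of species names, one entry per logged
    observation (a hypothetical, simplified record format).
    """
    counts = Counter(observations)
    return sorted(name for name, n in counts.items() if n >= threshold)


# Hypothetical toy data, not real iNaturalist counts.
obs = (
    ["Hemaris thysbe"] * 150      # hummingbird clearwing
    + ["Danaus plexippus"] * 99   # one short of the threshold
    + ["Quercus alba"] * 100      # exactly at the threshold
)
print(species_ready_for_modeling(obs))
```

With these made-up counts, the monarch (99 observations) falls just short, while the clearwing and the white oak qualify. In practice the threshold is a rough guide rather than a hard cutoff.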
Loarie believes that sharing data, software code, and ideas more openly can create further opportunities for science to advance. "My background is in academia. When I was doing it, it was very much this 'publish or perish, your data stays on your laptop, and you hope no one else steals your data or scoops you' [mindset]," he says. "One of the things that's really cool to see is how much more collaborative science has gotten over the last few decades. You can do science so much faster and at such bigger scales if you're more collaborative with it. And I think journals and institutions are becoming more amenable to it."
Over the last decade, open data (data that can be used, adapted, and shared by anyone) has been a boon in the scientific community, riding on a growing trend toward more open science. Open science means that the raw data, analysis software, algorithms, papers, and documents used in a project are shared early as part of the scientific process. In theory, this makes studies easier to reproduce.
Many government organizations and city offices are also releasing open datasets to the public. A 2012 law requires New York City to share all of the non-confidential data its agencies collect for city operations through an accessible web portal. Each spring, NYC hosts an open data week highlighting datasets and research that has used them. A central team at the Office of Technology and Innovation, along with data coordinators from each agency, helps establish standards and best practices, and maintains and manages the infrastructure for the open data program. For researchers who want to outsource their data infrastructure, places like Amazon and CERN offer services to help organize and manage data.
This push toward open science was greatly accelerated during the recent COVID-19 pandemic, when an unprecedented number of COVID-related research findings and equipment designs were shared almost instantly. Scientists rapidly published genetic information on the virus, which aided vaccine development efforts.
"If the folks who had done the sequencing had held it and guarded it, it would've slowed the whole process down," says John Durant, a science historian and director of the MIT Museum.
The move to open data is partly about trying to ensure transparency and reliability, he adds. "How are you going to be confident that results being reported are reliable if they come out of a dataset you can't see, or an algorithmic process you can't explain, or a statistical analysis that you don't really understand? Then it's very hard to have confidence in the results."
Open data cannot exist without lots and lots of data in the first place. In this glorious age of big data, that is an opportunity. "From the time when I trained in biology, way back, you were using traditional techniques, the amount of information you had, they were quite important, but they were small," says Durant. "But today, you can generate information on an almost bewildering scale." Our ability to collect and accrue data has increased exponentially in the last few decades thanks to better computers, smarter software, and cheaper sensors.
"A big dataset is almost like a universe of its own," Durant says. "It has a potentially infinite number of internal mathematical features, correlations, and you can go fishing in this until you find something that looks interesting." Having the dataset open to the public means that different researchers can derive all kinds of insights from varying perspectives that deviate from the original intention for the data.
"All sorts of new disciplines, or sub-disciplines, have emerged in the last few years which are derived from a change in the role of data," he adds, pointing to data science and bioinformatics as just two of many examples. "There are whole branches of science that are now sort of meta-scientific, where people don't actually collect data, but they go into a number of datasets and look for higher level generalizations."
Many of the traditional fields have also undergone technological revamps. Take the environmental sciences. "If you want to cover more ground, more species, over a longer period of time, that becomes intractable for one person to manage without using technology tools or collaboration tools," says Loarie. "That definitely pushed the ecology field more into the technical space. I'm sure every field has a similar story like that."
But as the amount of data keeps growing, wrangling these numbers and stats manually becomes virtually impossible. "You would only be able to handle these quantities of data using very advanced computing techniques. This is part of the scientific world we live in today," Durant adds.
That's where machine learning algorithms come in. These are sets of computer instructions that can calculate statistical relationships in data. Simple algorithms using limited amounts of data are still fairly comprehensible: if the computer makes an error, you can likely trace it back to where the error occurred in the calculation. And if these algorithms are open source, other scientists can read the code to see how the computer got from the input to the output. But more often than not, AI algorithms are described as black boxes, meaning that even the researchers who created them don't fully understand what's going on inside or how the machine arrives at its decisions. And that can lead to harmful biases.
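For contrast with black-box models, here is a minimal sketch of the kind of simple, fully inspectable algorithm the paragraph describes: ordinary least-squares fitting of a straight line. It is offered only as an illustration of traceability, not as a model iNaturalist or any researcher quoted here uses, and the input data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Every intermediate value (means, sums of squares) can be printed and
    checked by hand, which is what makes errors easy to trace.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean; zero means a vertical line, which OLS cannot fit.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # How x and y vary together around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

Here the means are 2.5 and 6, the sums are 5 and 10, so the slope is 10 / 5 = 2 and the intercept is 6 - 2 × 2.5 = 1. Each number can be verified on paper, which is exactly the property that deep neural networks with millions of parameters lack.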
This is one of the core challenges that the field faces. "Algorithmic bias is a product of an age where we are using big data systems in ways that we do or sometimes don't fully have control over, or fully know and understand the implications of," Durant says. This is where making data and code open can help.
Another problem that researchers have to consider is maintaining the quality of big datasets, since poor-quality data can undermine the effectiveness of analytics tools. This is where the peer-review process plays an important role. Loarie has observed that the fields of data and computer science move incredibly fast when it comes to publishing and getting findings out on the internet, whether through preprints, electronic conference papers, or some other form. "I do think that the one thing that the electronic version of science struggles with is how to scale the peer-review process, which keeps misinformation at bay," he says. This kind of peer review is important in iNaturalist's data processing, too. Loarie notes that although the quality of iNaturalist's data as a whole is very high, there's still a small amount of misinformation that has to be checked through community management.
Lastly, open science raises a whole set of questions about how funding and incentives might change, an issue that experts have been actively exploring. Storing huge amounts of data certainly is not free.
"What people don't think about, that for us is almost more important, is that to move data around the internet, there's bandwidth charges," Loarie says. "So, if someone were to download a million photos from the iNaturalist open data bucket, and wanted to do an analysis of it, just downloading that data incurs charges."
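The arithmetic behind Loarie's point is simple: transfer charges scale with the total volume moved. The sketch below is a back-of-the-envelope estimate only. The average photo size and the per-gigabyte rate are hypothetical placeholders rather than iNaturalist's actual file sizes or any cloud provider's published price, and real bills also depend on region, volume tiers, and free allowances.

```python
def estimate_transfer_cost(num_files, avg_file_mb, price_per_gb):
    """Rough data-transfer cost: total gigabytes moved times a per-GB rate.

    All inputs are caller-supplied assumptions. Megabytes are converted to
    gigabytes with a factor of 1,024 (an assumption about billing units).
    """
    if num_files < 0 or avg_file_mb < 0 or price_per_gb < 0:
        raise ValueError("inputs must be non-negative")
    total_gb = num_files * avg_file_mb / 1024
    return total_gb * price_per_gb


# Hypothetical inputs: 1 million photos averaging 2 MB each,
# at an assumed rate of $0.09 per GB.
cost = estimate_transfer_cost(1_000_000, 2, 0.09)
print(f"${cost:,.2f}")
```

With these made-up numbers, a million photos come to roughly 1,950 GB and a bill of about $176 for a single download. The cost grows linearly, so ten researchers each pulling the same set would pay ten times as much, which is why covering bandwidth matters for an open dataset.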
iNaturalist is a small nonprofit that's part of the California Academy of Sciences and National Geographic Society. That's where Amazon is helping. The AWS Open Data Sponsorship Program, launched in 2009, covers the cost of storage and the bandwidth charges for datasets it deems of high value to user communities, Maggie Carter, global lead of AWS Global Social Impact, says in an email. The program also provides the code needed to access the data and sends out notifications when datasets are updated. It currently sponsors around 300 datasets, ranging from audio recordings of rainforests and whales to satellite imagery, DNA sequences, and US Census data.
At a time when big data centers are being closely scrutinized for their energy use, Amazon sees a centralized open data hub as more energy-efficient than having everyone in the program host their own local storage infrastructure. "We see natural efficiencies with an open data model. The whole premise of the AWS Open Data program is to store the data once, and then have everyone work on top of that one authoritative dataset. This means less duplicate data that needs to be stored elsewhere," Carter says, adding that this can result in a lower overall carbon footprint. Additionally, AWS is trying to run its operations on 100 percent renewable energy by 2025.
Despite the challenges, Loarie thinks that useful and applicable data should be shared whenever possible. Many other scientists are on board with this idea. Another platform, Cornell University's eBird, also uses citizen science to accrue open data for the scientific community, and eBird data has in turn fed back into tools for its users, like bird song identification, that aim to make it easier and more engaging to interact with wildlife in nature. Outside of citizen science, some researchers, like those working to establish a Global Library of Underwater Biological Sound, are seeking to pool professionally collected data from several institutions and research groups into a massive open dataset.
"A lot of people hold onto data, and they hold onto proprietary algorithms, because they think that's the key to getting the revenue and the recognition that's going to help their program be sustainable," says Loarie. "I think all of us who are involved in the open data world, we're kinda taking a leap of faith that the advantages of this outweigh the cost."
Excerpt from:
Open data is a blessing for science, but it comes with its own curses - Popular Science