Python is widely regarded as a great language for geospatial projects. Anita Graser is a legendary open-source geospatial Python expert. She's been working with QGIS and Python since 2008 as an integration solution to automate mapping and to look at data in different ways, not just from the command line or in graphs but also in maps. Let's hear from her why Python may or may not be a good option for your GIS project.
A. It wasn't always clear that Python would be the best language for GIS, not until ArcPy and PyQGIS came out around 12 years ago. These two implementations taught us that Python is versatile and easy to learn, and that you can manipulate data with it. Who in the GIS world wouldn't want to use a flexible tool for wrangling their data from a file or a database into something usable? Python does precisely that.
It's also easy to interface it with PostgreSQL and PostGIS, and the possibilities are endless from then on for automating workflows with scripts. For model builders, for example, it's possible to export models as Python scripts or write them from scratch, whichever workflow you prefer. There is also ample opportunity to build extensions for desktop GIS and server-side GIS applications using Python, with plugins in open-source as well as proprietary systems.
There are many reasons why Python is now the universal language of GIS: it's a glue that holds things together. Once you know Python and realize its usefulness for geospatial data manipulation, you are no longer just pushing buttons provided to you; you are in control and have the freedom to create your own tools and processes. It has an element of self-documentation that's hard to find elsewhere. You can't forget to document a certain parameter when you're writing code, and you can look it up later if you need to go back. This is helpful when you inherit someone else's workflow.
Python is widely adopted in the geospatial world, and as such geospatial processes written in Python are sharable and repeatable. While there may be different environment variables that need to be tweaked, and data that also needs to be shared, it is possible to share your work and let others use your code and build on top of it.
A. If you already know some programming language, it's possible to get into geospatial and pick up Python specifics as you go, because it's not a hard language to learn.
If you don't have a programming background, you'd be smart to cover the basics first, such as loops, functions, and classes.
In both cases, most users, especially GIS people, do better if they have geospatial-specific motivation and inspiration. They want to see something on a map, really quickly. They want the first steps into this new, unknown territory to be related to what they do in geospatial.
A good intro to writing Python code is to create a model in a graphical model builder and then export it as a Python script. You can play around with feeding data to the different parameters in the script and see how they affect the outcome. This also gives you an understanding of how Python code is structured and how the different components are chained together.
When people see that they're not tied to the standard tools in the graphical interface, they realize how flexible programming is and how much they can get out of a model builder. This is real motivation.
Model builder scripts are only the first step. Once you start executing things outside of the program, like manipulating parameters, you'll come across things you can't solve quickly with a model builder. Knowing Python and how you can program something from scratch is a great motivation.
A. Scala is efficient and advanced. Knowing Scala and Java is immensely helpful; they are related and can be used in combination with each other. Either of those would be able to solve challenges for large datasets that need to be manipulated effectively in distributed computing environments.
A. GeoPandas is a relatively new, open-source library that's a spatial extension for another library called Pandas. Pandas has been around since 2008 and is designed to make data analysis easy.
Pandas uses a concept called data frames: they're tables of data, or time series of data if indexed by timestamp. Pandas acts like a database by using indexes to filter the data.
It comes with convenient functions to read and write files with missing numbers. If you have null values (no measurements have been recorded in a time series, for example), Pandas gives you options to calculate values for those rows or correctly interpret the null value in the same way a database would.
This could be the last observed value or the interpolation between the previously observed value and the following value that's in the data set. Who doesn't want these functions when working with real-world data?
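As a minimal sketch of the two options just described, assuming Pandas is installed (the dates and temperature values are invented purely for illustration):

```python
import pandas as pd

# Hypothetical daily time series with two missing measurements.
idx = pd.date_range("2024-01-01", periods=5, freq="D")
temps = pd.Series([10.0, None, None, 16.0, 18.0], index=idx)

# Option 1: carry the last observed value forward into the gaps.
forward_filled = temps.ffill()

# Option 2: interpolate linearly between the previous and the next
# observed value.
interpolated = temps.interpolate(method="linear")

print(forward_filled.tolist())
print(interpolated.tolist())
```

Which option is appropriate depends on the data: carrying the last value forward suits quantities that stay constant until they change, while interpolation suits quantities that vary smoothly.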
The Pandas library also comes with the ability to pivot and reshape tables, group data, do merges, and plot.
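A short sketch of these table operations, assuming Pandas is installed. The station names, cities, and readings are made up for the example, and plotting is left out because it needs an additional plotting library:

```python
import pandas as pd

# Hypothetical temperature readings in "long" format.
readings = pd.DataFrame({
    "station": ["A", "A", "B", "B"],
    "day": ["Mon", "Tue", "Mon", "Tue"],
    "temp_c": [10.0, 12.0, 8.0, 9.0],
})
# Hypothetical lookup table describing each station.
stations = pd.DataFrame({"station": ["A", "B"], "city": ["Vienna", "Graz"]})

# Pivot/reshape: one row per station, one column per day.
wide = readings.pivot(index="station", columns="day", values="temp_c")

# Group: mean temperature per station.
mean_by_station = readings.groupby("station")["temp_c"].mean()

# Merge: attach the city to every reading, like a database join.
merged = readings.merge(stations, on="station", how="left")

print(wide)
print(mean_by_station)
print(merged)
```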
There's a lot you can then do in Python that would generally require a database. You can write a standalone script and no longer depend on a database or on having to carry out your data analysis in cookie-cutter ways.
In 2013, GeoPandas entered the scene and made it possible to store geometries in the data frames (much like Postgres and PostGIS) by building on the existing Pandas libraries. Libraries such as:
GeoPandas is a fantastic tool for geospatial programmers because it's easy to write standalone code that can be used outside of the typical desktop GIS environment. It's a good choice for non-GIS programmers who are familiar with Pandas, and it makes it easier to build geospatial capabilities into existing Python codebases without the need to install desktop environments like QGIS or ArcGIS.
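To give a feel for what such standalone code can look like, here is a minimal sketch that assumes GeoPandas and Shapely are installed. The city coordinates are approximate and the 10 km buffer distance is an arbitrary choice for illustration:

```python
import geopandas as gpd
from shapely.geometry import Point

# A data frame with a geometry column; coordinates are approximate
# longitude/latitude values in WGS 84 (EPSG:4326).
cities = gpd.GeoDataFrame(
    {"name": ["Vienna", "Graz"]},
    geometry=[Point(16.37, 48.21), Point(15.44, 47.07)],
    crs="EPSG:4326",
)

# Reproject to a metric coordinate system (UTM zone 33N) so that
# distances are expressed in meters.
projected = cities.to_crs("EPSG:32633")

# Build a 10 km buffer polygon around each city.
buffered = projected.buffer(10_000)

print(projected)
print(buffered.area)
```

No desktop GIS is involved here: the script runs anywhere a Python environment with these libraries is available.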
Good programmers take what's working (GeoPandas) and build on top of it or extend it. There is no need to reinvent the wheel every time. Use what's already working and build a component yourself that will solve your particular problem. If Fiona has been reading your geospatial file formats for years, integrate that. Assemble compatible modules. The nature of modules is evolving, and versions keep changing, so remember to check their compatibility!
A. You should always follow the installation instructions of the respective library you use. The maintainers know the current working configuration best. In the case of GeoPandas, use the Conda installation. (Python installations come with pip for installing packages. Pip, however, doesn't work with some of the GeoPandas dependencies, particularly on Windows.)
Conda is therefore recommended by GeoPandas to cover all major operating systems. You can run Conda from the command line or use a desktop application, Anaconda, with a graphical user interface. It lists available packages, and you can click the ones you want. It will automatically resolve dependencies and install the correct versions to ensure a working environment.
Once you've done your setup, Anaconda has multiple IDEs (integrated development environments) or editors. Spyder and PyCharm are two options; they are available for free or with a free community edition, respectively. PyCharm has the advantage that it has the exact same layout as IntelliJ, a popular Java editor that Java developers are familiar with. It has convenient functions for refactoring and for making code easy to read and self-explanatory.
A. If you work with movement data, you need a specific tool. There is a library called MovingPandas, and if you have vehicles, people, or goods that move and you need to track them or analyze the data, it's the library you should go to and use.
A. Python has proven to be a reliable companion to data scientists from various backgrounds. Libraries like GeoPandas fill the gap between nonspatial data scientists and people with geospatial expertise. They can work together on integrating spatial analysis capabilities with the machine learning, deep learning, and AI that most data scientists work with.
For research, there is considerable potential to improve reproducibility, particularly with technologies such as Jupyter Notebooks. You can record and analyze step by step and show the intermediate results and the plots you might generate for a report or for a scientific paper in the context of that code.
In the past, you wrote a script, you ran it, and it dumped images into a directory. You then had to look at both sides, the script and the output directory, to find a figure and decide whether it made sense and reflected what was going on.
In Jupyter Notebooks, you execute one part of the notebook, called a cell, and it will immediately plot the output under that cell. It can be text or interactive graphs, such as a Leaflet map or a plot. You can see how this would make it easier to debug issues and understand the data analysis flow. If you've ever had the honor of inheriting someone else's data processing workflow, you'll appreciate this step-by-step debugging functionality and way of managing the code.
Python's popularity is still on the rise, and there aren't many contenders on the horizon. There is something for everyone in Python. It's easy to get into as a beginner, and it's efficient, especially if you can write some parts in Cython, which works under the hood where users do not see it and is much more performant. Once you get into Python, there aren't too many reasons why you'd want to abandon it.
People from the Java community, and people who work in Big Data settings (Hadoop and Spark), have started to build a bridge to Python. PySpark allows Python to interface with these Java Virtual Machine worlds and Big Data settings. Python will be around for a long time, and I encourage people to learn it.
A. If you're working with a pre-established system that's written in a Java-based language, it's not recommended that you introduce this interface without a valid reason to do so. You'd be better off sticking to the Java world. There are libraries for geospatial use, such as GeoTools. Mixing and matching languages isn't a good idea.
If you are starting from scratch, and your work is related to data science, then use Python.
Thus, the reasons to learn Python are many. We hope you feel inspired to begin your journey of discovery!
See the article here:
Geospatial Python: Do you need to learn it? - Geospatial World