New spatial functions in BigQuery, starting with Uber H3-js
As you may have seen in last week's announcement we are now working very closely with the Google BigQuery team to support the creation of a next generation spatial data infrastructure stack.
BigQuery GIS does have support for some spatial functions but this is a limited set that we believe will continue growing over time. In the meantime if you need to perform a spatial operation that is not supported out of the box you can create your own User Defined Functions. UDFs in BigQuery can be programmed in Javascript for which there are a lot of spatial libraries already available. In order to use them you have to make slight modifications push them to cloud storage and then reference them which can be quite time consuming. Right now every user of BigQuery would have to do the same which is also a bit of a pain.
Luckily BigQuery works as if it were a huge multitenant database where all the databases of all users are on the same server and there are only permissions separating them. This means you can create a database and give permissions to the entire world so that everyone can use it for doing joins or calling functions. This is how the public data project works and the reason why we are collaborating to add location datasets. This means there is no need to install anything allowing you to use these functions directly on your queries. For example:
{% highlight sql %}SELECT jslibs.h3.ST_H3(ST_GEOGPOINT(-74.00 40.7128) 7){% endhighlight %}
And all of these UDFs functions are now available as an Open Source project and live directly on BigQuery for anybody to use.
BigQuery JS Libs: A repository of pre-packaged libraries to be used as functions inside BigQuery.
Here is a more complex SQL example:
{% highlight sql %}WITH data as ( SELECT jslibs.h3.compact( jslibs.h3.ST_H3_POLYFILLFROMGEOG(tract_geom 11)) as geo FROM bigquery-public-data.geo_census_tracts.census_tracts_new_york WHERE geo_id='36081000100')
SELECT jslibs.h3.ST_H3_BOUNDARY(h3) as h3geo FROM data UNNEST(geo) as h3{% endhighlight %}
Where Can You Run These functions?
The functions are deployed and ready to use so they can be used as if they were just another regular function: jslibs.h3.ST_H3. Again no need to install anything just use them on your regular SQL on BigQuery.
As most of these functions are for spatial operations it is useful to visualize them. For visualization you can use BigQuery Geo Viz QGIS or our CARTOframes library! Check it out in Google Collab.
What About Information Security?
Just a quick note that although all the functions are stored in a CARTO BigQuery project when you use them in your SQL we do not get to "see" or know that the functions are being used so there is no need to worry from a security perspective.
Starting with Uber H3
The open source project H3 is a hexagonal hierarchical geospatial indexing system which is really useful when working with grid structures and we use it extensively at CARTO as part of our Data Observatory.
{% include icons/icon-bookmark.svg %} Data Scientist Isaac Brodsky presented Uber's H3 project at this year's Spatial Data Science Conference Watch the recording today!
For this reason we ported most of the H3 JS library API into BigQuery allowing you to run the majority of H3 API functions. The best way to know which functions have been ported is just to look in the Github repo.
We have ported one single function (buffer) from the open source library Turf.JS to make buffers. Lots of Data Scientists need to do buffers in BigQuery so here they are:
{% highlight sql %}SELECT jslibs.turf.ST_BUFFER(ST_GEOGPOINT(-74.00 40.7128) 1 'kilometers' 10) as geo{% endhighlight %}
We have also added some support for Quadkeys and S2 libraries but this is just the beginning.
If you have proposals for new functions to make available or a specific library that you would like to see ported please send us a Pull Request on the project or send us a note.
Special thanks to Mamata Akella CARTO's Head of Cartography for her support in the creation of the maps in this post.