Databricks support for H3 in collaboration with CARTO


Introducing H3 for Databricks with CARTO! Unlock fast and efficient big data analytics in the cloud with Spatial Indexes

This post may describe functionality for an old version of CARTO. Find out about the latest and cloud-native version here.

Over the past few months we have been working with Databricks to add built-in support for H3, and this functionality was released recently. Native support for H3 means less friction when using H3 Spatial Indexes in Databricks, and it also brings major speed improvements thanks to Photon acceleration.

We have been using these new H3 capabilities in CARTO for some time now and we are pretty excited about the impressive speed improvements.

This is a significant development that supports our mission to unlock massive-scale spatial analytics natively in the Lakehouse platform.

H3 is a powerful global hierarchical grid system - sometimes referred to as a Spatial Index - which enables the processing and analysis of truly big geospatial data. By leveraging the power of H3, Databricks users can now perform faster and more efficient workflows, as well as unlock totally new types of spatial analysis.

The announcement from Databricks - which you can read more about here - comes as part of their milestone 11.2 Runtime release and offers huge advancements for users who are processing and analyzing geospatial data.

CARTO has collaborated closely with Databricks to be able to bring these latest advances to our cloud-native Location Intelligence platform. Our expertise and knowledge of the spatial problems that Databricks customers are solving helped to inform the H3 roadmap. In particular, we worked with Databricks to define a list of geospatial functions that customers need to get the most value from H3, which you can read about later in this post!

To see H3 in action on Databricks using CARTO, check out our presentation at the recent Data + AI Summit.

What is H3?

H3 is a global hierarchical grid system which was developed to efficiently manage and analyze large geospatial datasets. The concept of a global hierarchical grid is simple: it consists of multiple resolutions which “map” directly to each other. H3 has 16 resolutions, with average cell areas ranging from roughly 4 million km² down to 0.895 m². Within each “parent” hexagonal cell, 7 “child” hexagonal cells can be found.
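Those resolution statistics can be sanity-checked from first principles. H3 has 122 base cells (110 hexagons and 12 pentagons), and each step down in resolution subdivides a cell into roughly 7 children; the exact cell count at resolution r is 2 + 120·7^r, and the average cell area is Earth's surface area divided by that count. The pure-Python sketch below (an illustration, not the h3 library) reproduces the figures quoted above:

```python
# Illustrative check of H3's published resolution statistics.
# Exact H3 cell count at resolution r is 2 + 120 * 7**r (122 base cells,
# pentagons subdivide into 6 rather than 7 children, hence the formula).

EARTH_SURFACE_KM2 = 510_065_622  # approximate surface area of the Earth

def cell_count(res: int) -> int:
    """Exact number of H3 cells at a given resolution (0-15)."""
    return 2 + 120 * 7**res

def avg_cell_area_km2(res: int) -> float:
    """Average cell area at a resolution, in square kilometres."""
    return EARTH_SURFACE_KM2 / cell_count(res)

for res in (0, 15):
    print(res, cell_count(res), avg_cell_area_km2(res))
# Resolution 0 works out to ~4.2 million km² per cell;
# resolution 15 to ~0.895 m² per cell.
```

Seven-fold subdivision means a handful of resolutions spans everything from continental overviews down to individual parking spaces.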

Globe with H3

What are the advantages of using H3?

There are a number of advantages to using H3 for your spatial analytics  including:

  •  Efficiency and performance: an H3 cell is stored as an index (a compact integer or string identifier) rather than a complex geometry. This makes cells smaller to store and faster to analyze.  
  •  Scalability: harness the true, highly distributed power of Spatial Data Warehouses. Unlike with geometries, query costs do not increase exponentially with bigger, more complex areas.  
  •  Flexibility: cross-analyze data from multiple geographies by aggregating to one common grid system, often referred to as a “Support Geography.”  
  •  Collaboration: easily share and transfer data with other H3 users, since everyone references the same indexing system.  
  •  Analysis: converting data into a continuous grid system enables new forms of analysis.  
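The efficiency point is quite concrete: an H3 cell is a single 64-bit integer, so hierarchy operations reduce to bit manipulation. The sketch below is based on the published H3 index bit layout (a 4-bit resolution field at bits 52-55 and fifteen 3-bit per-resolution digits, with unused digits set to 0b111); it is an illustration of why index operations are cheap, not something you would write yourself - in Databricks you would simply call h3_toparent():

```python
# Sketch: coarsening an H3 cell by rewriting its 64-bit index directly,
# following the public H3 index layout. Illustration only.

RES_OFFSET = 52                 # resolution field occupies bits 52-55
RES_MASK = 0xF << RES_OFFSET

def get_resolution(cell: int) -> int:
    """Read the 4-bit resolution field of an H3 cell index."""
    return (cell >> RES_OFFSET) & 0xF

def to_parent(cell: int, parent_res: int) -> int:
    """Coarsen a cell by rewriting the resolution field and blanking digits."""
    res = get_resolution(cell)
    assert 0 <= parent_res <= res
    cell = (cell & ~RES_MASK) | (parent_res << RES_OFFSET)
    for r in range(parent_res + 1, res + 1):    # digits finer than parent_res
        cell |= 0b111 << ((15 - r) * 3)         # 7 (0b111) marks "unused"
    return cell

child = 0x8928308280FFFFF       # a resolution-9 cell from the H3 docs
print(hex(to_parent(child, 8)))
```

No trigonometry, no coordinate parsing - just a few integer operations, which is exactly what makes grid-indexed analytics so fast at scale.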

Want to learn more about Spatial Indexes? Check out our blog post to find out why hexagons are such a powerful tool for Location Intelligence.

Using H3 in Databricks with CARTO

Thanks to our Spatial Extension for Databricks, CARTO users can connect directly to their Databricks cluster to access data and perform massive-scale data visualization and analytics. The latest enhancements to the Databricks platform bring added H3 functionality that allows dynamic aggregation natively within Databricks. In addition, our spatial data catalog opens up a wealth of H3-indexed datasets that can be used for highly efficient data enrichment workflows.

This Databricks release includes 28 native H3 functions. These include:

  • Functions to generate H3 cells and grids from geometries or well-known text (WKT), such as h3_polyfillash3(), which generates an H3 grid of a defined resolution covering the extent of a polygon. Similarly, h3_longlatash3() can be used to generate an H3 cell at a defined coordinate.
  • The reverse of this: functions which create a geometry feature from H3. These include h3_boundaryaswkt() - which converts an H3 cell into a polygon - and h3_centeraswkb() - which converts an H3 cell into a point at its centroid. Being able to easily move between H3 and geometry types is very useful for running spatial filters and joins.
  • Distance-based functions which are far cheaper to compute than distance analysis using geometry functions such as ST_DISTANCE(). For instance, h3_kring() creates a ring of H3 cells around an origin cell at a defined grid distance, and h3_distance() returns the grid distance between two cells.
  • Functions to move between resolutions, such as h3_tochildren() and h3_toparent().

…And many more! Check out the full list here.
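To get an intuition for why grid distances and k-rings are so cheap, consider an idealized flat hexagonal grid addressed with axial coordinates (q, r). This is a simplification - the real H3 grid lives on an icosahedron and h3_distance()/h3_kring() handle that geometry - but the arithmetic flavor is the same:

```python
# Sketch: grid distance and k-ring size on an idealized flat hex grid with
# axial coordinates (q, r). A simplification of h3_distance()/h3_kring(),
# not the real H3 implementation.

def hex_distance(a: tuple, b: tuple) -> int:
    """Grid distance between two axial-coordinate hex cells."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2

def kring_size(k: int) -> int:
    """Cells in a filled k-ring around an origin cell: 1 + 3k(k+1)."""
    return 1 + 3 * k * (k + 1)

print(hex_distance((0, 0), (2, -1)))  # 2 grid steps
print(kring_size(1))                  # 7: the origin plus its 6 neighbours
```

A few integer additions replace the spherical trigonometry that a geometry-based ST_DISTANCE() would need, which is why distance filters over H3 grids scale so well.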

Getting started with H3 in CARTO: An example

The example SQL code below illustrates how a point table can easily be aggregated to an H3 grid. This code can be executed in your Databricks console or directly in CARTO Builder. Once executed, this query can be visualized as a dynamic H3 layer (using the H3 field), which renders faster than a conventional geometry table and is rendered dynamically; the more you zoom in, the more detailed the rendering.

--define inputs
WITH points AS (SELECT geom, value FROM POINTS),
study_area AS (SELECT carto.ST_MAKEPOLYGON(carto.ST_GEOMFROMWKT('LINESTRING(long lat, long lat, long lat, long lat)')) AS geom),
--cover the study area with H3 cells (resolution 9 used here as an example)
h3grid AS (SELECT explode(h3_polyfillash3(geom, 9)) AS h3 FROM study_area)

--aggregate the point variables to h3 cells, based on the geospatial relationship "contains"
SELECT h3grid.h3, COUNT(points.geom) AS count, SUM(points.value) AS value_total
FROM h3grid
LEFT JOIN points
  ON carto.ST_CONTAINS(carto.ST_GEOMFROMWKT(h3_boundaryaswkt(h3grid.h3)), points.geom)
GROUP BY h3grid.h3

In addition to the new H3 functions available in Databricks  users can also leverage functions from CARTO’s Analytics Toolbox to undertake complex geospatial analysis. Check out our guide to installing our Analytics Toolbox for Databricks here and start unleashing the power of big spatial data!

Our roadmap includes even more enhancements to support CARTO users running advanced spatial analytics in the Databricks Lakehouse platform. Stay tuned for more exciting announcements in the coming weeks!

And if you would like to test drive the CARTO Location Intelligence platform in full  why not sign up for our free 14-day trial!