MVT generation: Mapnik vs PostGIS
With the last release of PostGIS a new set of functions were added to generate Mapbox vector tiles (MVT). The most important of these functions is ST_AsMVT which allows users to create a full tile from a SQL query. Björn Harrtell completed the majority of the project with partial sponsorship from CARTO.
You might be wondering "What's an MVT"? A map is usually created with a set of small images called tiles that are rendered together to create a mosaic. MVT's instead of using small images are created using the data that is stored in those tiles. That allows for rendering of maps client-side with WebGL (BTW we are hiring someone who loves rendering stuff on the client or any other render technology.
At CARTO we use Mapnik to render those tiles using a combination of Postgres & PostGIS. We wondered if rendering those tiles straight from Postgres/PostGIS would be faster.
The benchmark
First of all I created a benchmark took it as a baseline and then compared PostGIS'sST_AsMVT against Mapnik's MVT renderer.
The benchmark was to generate a tile with a lot ofpoints and then check how long it takes to render.
So let's get our hands dirty and create a tile with 100K points:
Mapnik
Let's now request the tile with 100K points using our Maps API:
Out of 223 ms what portion of that time is spent in DBquerying features versus encoding them into the MVT?
Let's instrument mapnik in order to help answer that question.
First we have to locate the interesting pieces of code:
- here's the datasource that queries the database
- and here's the processor which encodes vector tiles
Using systemtap we can hook into those functions to getan idea of how much time is spent retrieving features versus encodingthem into an MVT. Here's a simple script to do that:
then we can request the same tile and gather execution times:
This tells us that the update_tile function takes about 217 ms ofwhich 93 ms are spent retrieving features from the database.
If we execute the query used to retrieve features directly from the database weget the following numbers:
that is significantly faster than the execution times reported inthe database log:
This is due to the mapnik postgis input plugin iteratingon the result set from postgres WKB format and transforming it to themapnik internal format.
Don't freak out at those magic numbers in the ST_Simplify call they are calculated using the tile zoom to simplify the geometry to a resolution enough to be rendered correctly but without sending too many vertices to the rendering process.
So roughly speaking out of those 223 ms:
- 93 ms are spent retrieving data and transforming it
- 124 ms are spent on the actual encoding of the tile
- the rest (around 6 ms) is overhead because of authorization datatransfer etc.
ST_AsMVT
What if we made a similar query to spit the MVT straight from the database?
Note we removed the ST_AsBinary. It is not required by ST_AsMVT as it works with PostGIS native geometry types (saving some time as well).
We may also consider removing the ST_Simplify and ST_SnapToGrid as they aim to facilitate raster rendering and it's done by the MVT encoding mechanism. On the other hand the equivalent query also requires a transformation from webmercator to the coordinate space of an MVT which is achieved through ST_AsMVTGeom. The final query looks like this:
The execution time is significantly worse than in the previous case. Why is that?
In order to answer that question we need to look into how ST_AsMVT and ST_AsMVTGeom are implemented:
- ST_AsMVT is actually anaggregate.Here isan interesting discussion about that design choice for theinterface. As mentioned in that thread there are some advantages tohaving the data first before encoding. However nothingprevents you from concatenating MVT's (e.g: from different layers)into a single tile.
- ST_AsMVTGeom is meant to transform an arbitrary geometry into thecoordinate space of an MVT. It is kind of a requirement ofST_AsMVT to have the input geometries in such coordinatespace. Basically it performs an affine transformation from onecoordinate space to another plus some additional checks.
There's an excellent tutorial on how to profile postgreshere.
Using those tools we can profile our query and get an overviewof where the time is spent. Take a look at this kcachegrind screenshot:
By looking at that graph and the code you can figure out that a lot ofthe time is spent copying checking and passing geometriesaround. Here's a PR thatoptimizes some of those paths for the points case.
Here are the results after applying those optimizations:
To sum up: around 200 ms for data retrieval and encoding of anMVT. That is about 10% faster than mapnik.
Putting all the pieces together
You can find an implementation of a Windshaft renderer (the main piece of ourMaps API) based on ST_AsMVT in the 1000x branch.
Let's run our little benchmark again using this new renderer:
Ouch! The bulk of the job is meant to be done in the database (takingabout 200 ms). I expected it to have some overhead on the node sideof things but not that much. How come?
We can alsoprofile the node serverto get some insights about where the time is being spent:
What you can see is some initial setup time then the querybeing executed in the database and outside node (in grey taking about210 ms in this very instance) and then node copying parsing andprocessing the results of the query.
It is worth noting that despite the node server is able to performnon-blocking IO it has to wait for the query results and cannotbegin streaming the encoded tile until the query returns.
Actually if we just request a tile with 10k features instead the time goes down to about 25 ms on my machine which is on par withmapnik and that after all makes perfect sense.
Conclusions
Most of the above has been centered around executionspeed. This was my main focus at the time of writing the code andexploring the possibilities.
It is true that generating the MVT's straight in the database ispotentially faster as it can avoid data transfer conversion betweeninternal representations of the features to encode and soon. It also opens up the possibility of offloading the encoding operation(which is CPU-intensive) from the tiler to the database. Depending onyour set up this may be (or may be not) a good idea.
We also found out that MVT's are good for rendering client side butnot so good when you want to render tiles with a lot of features.
However it is not all about speed. More importantly having MVT's inthe database as a building block lets you build the SQL on top which gives you a lot of flexibility.
I hope you enjoyed reading this post as much as I enjoyed writingit. If that is the case you might also be interested injoining us. Thanks!