Measuring tile rendering processes to improve performance
One of the most important parts of our infrastructure is how we generate tiles. This process consists of two parts: gathering data and rendering tiles. As part of our mission to maintain the best possible product for our clients, we set out to improve the performance of our tiles. To do so, we needed to better understand and measure the tile rendering process. We determined that data gathering had a much wider margin for improvement than the actual tile rendering, due to its time and resource consumption.
Put simply, the tile rendering process has two main steps:
- Gather the required data from the database, filtered by the tile bounding box.
- Send this data to the renderer and generate the tile, which can be a raster tile (PNG), a vector tile (MVT), a grid, and so on.
Depending on the size of the table, how the data is queried, and other more minor factors, calls to the database could take anywhere from 10 milliseconds to 20 seconds. We knew we were rendering the same tiles multiple times, but we didn't know what this implied for rendering performance. In any case, we didn't want to keep executing the same query without a good reason to do so.
Our first task was to understand how many times we were drawing the same tile by gathering the necessary data. The only reliable source of this data at the time was our database logs, so we needed to process our PostgreSQL logs and extract information such as the query execution time, the XYZ coordinates of the requested tile, the database, the affected tables, the username, and so on.
There are great log analyzers for PostgreSQL, such as pgBadger. However, these analyzers did not allow us to run operations such as transforming the tile bounding box to XYZ format, so we ended up creating a custom parser to better serve our needs.
Processing the PostgreSQL logs was more challenging than we had originally assessed. While there is a common, configurable prefix for each log line, everything else can be custom, and we discovered many consistency issues. We faced a series of challenges with our approach:
- Some queries were split across multiple lines and needed to be merged.
- There was a ton of information we did not need, which slowed down our processing.
- There were inconsistencies in line formatting.
- Prepared statements have three parts (parse, bind, and execute), each with its own execution time that we needed to aggregate.
- We had huge amounts of log data to process.
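The one predictable part of each line is the prefix mentioned above, controlled by PostgreSQL's `log_line_prefix` setting. A logging configuration along these lines (illustrative, not necessarily our exact prefix) is what gives a parser something stable to anchor on:

```ini
# postgresql.conf -- illustrative logging settings
log_min_duration_statement = 0                      # log every statement together with its duration
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '  # timestamp, backend PID, session line number, user, database
```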
Since we are working with geospatial databases, we needed to extract the XYZ value from the bounding box of the query, which is originally expressed as coordinates in the EPSG:3857 (Web Mercator) projection:
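Here is a sketch of that conversion, assuming a standard Web Mercator tile grid and a square, unbuffered, single-tile bounding box (metatiles and buffers, discussed below, would need to be stripped out first); the function name is just an example:

```sql
-- Illustrative: derive z/x/y from an EPSG:3857 bounding box.
CREATE OR REPLACE FUNCTION bbox_to_xyz(
    xmin float8, ymin float8, xmax float8, ymax float8,
    OUT z int, OUT x int, OUT y int
) AS $$
DECLARE
  world      constant float8 := 40075016.6855785;  -- width of the world in meters in EPSG:3857
  half_world constant float8 := 20037508.3427892;  -- distance from the origin to the grid edge
  tile_size  float8;
BEGIN
  z := round(ln(world / (xmax - xmin)) / ln(2))::int;  -- zoom level from the bbox width
  tile_size := world / power(2, z);
  x := floor((xmin + half_world) / tile_size)::int;    -- column, counted from the left edge
  y := floor((half_world - ymax) / tile_size)::int;    -- row, counted from the top edge
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Usage: SELECT * FROM bbox_to_xyz(-15028131.25, 0, -10018754.17, 5009377.08);  --> z=3, x=1, y=3
```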
In addition to this, we had to take into account the metatiling and buffer-size values we were using for each query, to help us make better assumptions about optimization strategies.
Here is an example of the kind of query used to generate a tile:
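(The table, columns, and exact bounding box below are illustrative; the bbox corresponds to tile z=3/x=1/y=3.)

```sql
-- Illustrative tile query: fetch the geometries that intersect a tile's
-- Web Mercator bounding box.
SELECT id, name, the_geom_webmercator
FROM roads
WHERE the_geom_webmercator && ST_MakeEnvelope(
  -15028131.25, 0,            -- xmin, ymin (EPSG:3857)
  -10018754.17, 5009377.08,   -- xmax, ymax
  3857
);
```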
and this is the kind of result we get in CSV format after we process the log:
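(The columns and values below are illustrative; they match the fields listed earlier rather than our exact output.)

```csv
duration_ms,z,x,y,database,table_name,username
152.3,3,1,3,tiler_production,roads,tileuser
```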
As you can see, this gives us much more user-friendly data to work with.
Once we had processed all the logs, we loaded the resulting data into the database, which then allowed us to extract the most useful information. Now we know query runtimes and how many times we're generating the same tile, which helps guide us in improving the process.
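With the data loaded, that kind of question becomes a simple aggregation. As a sketch, assuming a hypothetical tile_query_log table with the columns from the CSV above:

```sql
-- Illustrative: which tiles do we regenerate most often, and how much time do they cost us?
SELECT z, x, y, database, table_name,
       count(*)         AS times_generated,
       sum(duration_ms) AS total_ms,
       avg(duration_ms) AS avg_ms
FROM tile_query_log
GROUP BY z, x, y, database, table_name
ORDER BY times_generated DESC, total_ms DESC
LIMIT 20;
```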
Do you want to know more about how we improve our tile generation process? More CARTO blog posts are coming to our Inside blog.
Did you like it? Join us