CARTO Core Team and 5x
CARTO is open source and is built on core open source components, but historically most of the code we have written has been in the "CARTO" parts, and we have used the core components "as is" from the open source community.
While this has worked well in the past, we want to increase the velocity at which we improve our core infrastructure, and that means getting intimate with the core projects: PostGIS, Mapnik, PostgreSQL, Leaflet, MapBox GL, and others.
Our new core technology team is charged with being the in-house experts on the key components, and the first problem they have tackled is squeezing more performance out of the core technology for our key use cases. We called the project "5x" as an aspirational goal -- can we get multiples of performance improvement from our stack? We knew "5x" was going to be a challenge, but by trying to get some percentage improvement from each step along the way, we hoped to at least get a respectable improvement in global performance.
Our Time Budget
A typical CARTO visualization might consist of a map and a couple of widget elements.
The map will be composed of perhaps 12 (visible) tiles, which the browser will download in parallel, 3 or 4 at a time. At 4 concurrent downloads, 12 tiles means three rounds of requests, so delivering a completed visualization in under 2 seconds implies the tiles need to be delivered in under 0.5s and the widgets in no more than 1s.
Ideally everything should be faster so that more load can be stacked onto the servers without affecting overall performance.
The time budget for a tile can be broken down even further:
- database retrieval time
- data transit to map renderer
- map render time
- map image compression and
- image transit to browser.
The time budget for a widget is basically all on the database (an example query is sketched after this list):
- database query execution and
- data transit to JavaScript widget.
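For context, a widget request usually boils down to a single aggregation query over the data in the current map extent. Here is a minimal sketch of the kind of SQL a category widget might run; the table and column names are hypothetical:

```sql
-- Hypothetical category-widget query: count rows per category
-- within the current map extent (web mercator coordinates).
SELECT category, count(*) AS freq
  FROM observations
 WHERE geom && ST_MakeEnvelope(-20037508, -20037508,
                               20037508, 20037508, 3857)
 GROUP BY category
 ORDER BY freq DESC
 LIMIT 6;
```

Both the execution of queries like this and the transit of their results have to fit inside the 1s widget budget.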
The project goal was to add incremental improvements to as many slices as possible which would hopefully together add up to a meaningful difference.
Measure Twice, Cut Once
In order to continuously improve the core components we needed to monitor how changes affected the overall system against both a long-term baseline (for project-level measurements) and short-term baselines (for patch-level measurements).
To get those measurements we:
- Enhanced the metrics support in Mapnik so we could measure the amount of time spent retrieving data, rendering data, and compressing output.
- Built an internal performance harness so we can measure the cost of various workloads end-to-end.
- Carried out micro-benchmarks of particular workloads at the component level. For PostGIS, that meant running particular SQL against sample data. For Mapnik, that meant running particular kinds of data (large collections of points or lines or polygons) through the renderer with various stylings. (An example harness script is sketched after this list.)
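As an illustration of the pattern (the file name, table name, and transaction count here are hypothetical), a PostGIS micro-benchmark can be as simple as a custom pgbench script run repeatedly against a sample table:

```sql
-- twkb_bench.sql: measure TWKB point-encoding throughput.
-- Run with: pgbench -n -f twkb_bench.sql -t 100 benchmark_db
SELECT sum(length(ST_AsTWKB(geom))) FROM sample_points;
```

Timing the same script before and after a patch gives a patch-level baseline that is much less noisy than a full end-to-end run.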
Using the measurements as a guide we then attacked the performance problem.
Low Hanging Fruit
Profiling, performance testing, and a little research turned up three major opportunities for performance improvement:
- PostgreSQL parallelism was the biggest potential win. With version 10 coming out shortly, we had an opportunity to get "free" improvements "just" by ensuring all the code in CARTO was parallel safe and marked as such (see the SQL sketch after this list). Reviewing all the code for parallel safety also surfaced a number of other potential efficiency improvements.
- Mapnik turned out to have a couple of areas where performance could be improved: caching features rather than re-querying them, and improving the algorithms used for rendering large collections of points.
- PostGIS had some small bottlenecks in the critical path for CARTO rendering, including some inefficient memory handling in TWKB that impacted point-encoding performance.
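To illustrate the parallel-safety work in the first item: PostgreSQL only considers a parallel plan when every function in the query is marked parallel safe, so a single unmarked function can force a serial plan. The marking itself is a one-line change; here is a minimal sketch, using ST_X as an arbitrary example function:

```sql
-- Declare a function safe to execute inside parallel workers.
ALTER FUNCTION ST_X(geometry) PARALLEL SAFE;

-- Inspect the parallel flag of an installed function:
-- 's' = safe, 'r' = restricted, 'u' = unsafe.
SELECT proname, proparallel FROM pg_proc WHERE proname = 'st_x';
```

The bulk of the work is not the marking but the review: verifying that each function really is safe to run inside a parallel worker.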
Most importantly, during our work on core code improvements we brought all the core software into the CARTO build and deployment chain, so these and future improvements can be quickly deployed to production without manual intervention.
We want to bring our improvements back to the community versions while keeping early access to them in the CARTO infrastructure, so we follow a policy of contributing improvements to the community development versions while back-patching them into our own local branches (PostGIS, Mapnik, PostgreSQL).
And In the End
Did we get to "5x"? No. In our end-to-end benchmarks we notched a range of improvements, from a few percent to a few times, depending on the use case. We also found our integration benchmarks were sensitive to pressure from other load on our testing servers, so we relied mostly on micro-benchmarks of individual components to confirm local performance improvements.
While the performance improvements have been gratifying, some of the biggest wins have been the little improvements we made along the way:
- New function support in pgbench to make it easier to write spatial test harnesses (see the example after this list).
- Multiple correctness fixes in the PostGIS code base.
- PostGIS support for the development version (11) of PostgreSQL.
- Memory management fixes for PostGIS.
- Minor fixes in Mapnik.
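Returning to the first item: function support in pgbench scripts is what makes spatial harnesses convenient to write. As a sketch (the table name and coordinate ranges are made up), a script can probe a random query window on each transaction:

```sql
-- Hypothetical pgbench script: probe a random window of web
-- mercator space each transaction, using pgbench's built-in
-- random() function to pick the corner coordinates.
\set x random(-20037508, 20036508)
\set y random(-20037508, 20036508)
SELECT count(*)
  FROM points
 WHERE geom && ST_MakeEnvelope(:x, :y, :x + 1000, :y + 1000, 3857);
```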
We made a lot of performance improvements across all the major projects that CARTO is built upon; you may have already noticed them. We've also shown that optimization can only get you so far -- sometimes taking a whole new approach is a better plan. A good example is the vector and raster data aggregations work we have been doing, reducing the amount of data transferred through clever summarization and advanced styling.
More changes from our team are still rolling out and you can expect further platform improvements as time goes on. Keep on mapping!