CartoDB loves Open Data

Summary

Unlock the power of live, API-accessible databases with CartoDB. Enhance your open data publishing, simplify SQL queries, and make data shine online.

This post may describe functionality for an old version of CARTO. Find out about the latest and cloud-native version here.

As more governments and businesses adopt proactive open data policies and programs, the infrastructure of data publishing is becoming increasingly important. The time-honored tradition of publishing file-based machine-readable data on the web is still alive and well, but live, API-accessible databases can help when publishing data that is updated frequently or is so large that publishing it as files becomes inefficient. Luckily for you, if you have a CartoDB account, you already have a live, API-accessible database at your disposal! Read on to learn how CartoDB can help make your open data shine.

PLUTO Custom Data Downloader

With CartoDB, any data you import becomes a bona fide PostgreSQL database table. We're not talking about Postgres buried deep in the stack with a pretty UI obscuring it (our UI is indeed pretty); you have direct access to the database and can run any SQL queries you want on it. The SQL pane in CartoDB's UI is where most people start interacting with their tables, but the same tables are also accessible via our SQL API.

That's all well and good, but what if I want other people to have access to my data… to make it a bit more… open? CartoDB has the notion of public tables, where unauthenticated read access to the database is available both via our GUI and via the SQL API. If you set a table's privacy to public, anyone can access its various download links or run SELECT queries to their heart's content. This makes CartoDB the SIMPLEST way to go from a file on your computer to an easily accessible published database on the web.

An Example: NYC's PLUTO Dataset

The New York City Department of City Planning publishes a cadastral dataset called PLUTO, which contains a wealth of information about every tax lot in the city. The dataset includes zoning information, tax exemption status, and number of floors, and it has a detailed polygon for each parcel of land. As you can imagine, this is a very large dataset: it includes over 800,000 features with over 80 attributes each. Check out our own Andrew Hill's tour of NYC PLUTO data if you want to learn more about PLUTO.

The city publishes PLUTO as five separate file-based datasets, one for each of the five boroughs, on their infamous "Bytes of the Big Apple" open data site. While these chunks are more digestible, they are all still very large in their own right. PLUTO is only updated a couple of times a year, but due to its large size, it would be a great fit for publishing as a CartoDB public table.



PLUTO Public Dataset Page

So how is this any better than just publishing one big static file? Enter the SQL API, where CartoDB becomes really powerful. The full-dataset download links above are really just "SELECT * FROM {tablename}" queries executed against the SQL API, but with a little more SQL, a user can grab a much more specific subset of this very large dataset. The same SQL queries you apply in the editor to limit what data to show on your map can also be used to download raw data via the SQL API:

Here's an API call that returns just the address and zip code from a single row (note the LIMIT 1):

[https://cwhong.cartodb.com:443/api/v2/sql?q=select address, zipcode from public.pluto15v1 LIMIT 1](https://cwhong.cartodb.com:443/api/v2/sql?q=select address, zipcode from public.pluto15v1 LIMIT 1)

Go ahead and click it; you'll get back some JSON. Here's the same query, but requesting the data as CSV instead of JSON:

[https://cwhong.cartodb.com:443/api/v2/sql?format=CSV&q=select address, zipcode from public.pluto15v1 LIMIT 1](https://cwhong.cartodb.com:443/api/v2/sql?format=CSV&q=select address, zipcode from public.pluto15v1 LIMIT 1)

Depending on your browser, clicking the above link should start a file download.
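Because these links are plain HTTP GET requests, they are easy to build programmatically. The following is a minimal sketch using only the Python standard library. It assumes the `https://{account}.cartodb.com/api/v2/sql` endpoint shape used in the links in this post (legacy CartoDB hostnames like this may no longer resolve today), and `sql_api_url` is a hypothetical helper name. Leaving out `format` gives the default JSON response; passing `CSV`, `GeoJSON`, or `SHP` mirrors the other links.

```python
from urllib.parse import quote, urlencode


def sql_api_url(account, query, fmt=None):
    """Build a SQL API v2 GET URL for an unauthenticated query (sketch)."""
    params = {}
    if fmt is not None:
        # e.g. "CSV", "GeoJSON", "SHP"; leaving it out returns JSON.
        params["format"] = fmt
    params["q"] = query
    # quote_via=quote encodes spaces as %20 rather than "+".
    return f"https://{account}.cartodb.com/api/v2/sql?{urlencode(params, quote_via=quote)}"


# The full-table download link is just SELECT * against the table...
full_download = sql_api_url("cwhong", "SELECT * FROM pluto15v1", fmt="CSV")
# ...and a bit more SQL narrows it to a small subset, returned as JSON.
sample = sql_api_url("cwhong", "SELECT address, zipcode FROM public.pluto15v1 LIMIT 1")

print(full_download)
# https://cwhong.cartodb.com/api/v2/sql?format=CSV&q=SELECT%20%2A%20FROM%20pluto15v1
```

Percent-encoding the query up front means the URL survives copy-and-paste into any client, rather than relying on the browser to escape spaces and quotes for you.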

To further illustrate this, I'll provide some cdbfiddle examples from the same dataset that use different SQL queries. The same SQL used to define the map can also be passed to the SQL API to get raw data.

Get everything in zipcode 11201 (Downtown Brooklyn):

Here's the same query as an API call, this time specifying GeoJSON format (again, depending on your browser, clicking this link should start a file download!):

[https://cwhong.cartodb.com:443/api/v2/sql?format=GeoJSON&q=SELECT the_geom, the_geom_webmercator, address, zipcode FROM pluto15v1 WHERE zipcode = 11201](https://cwhong.cartodb.com:443/api/v2/sql?format=GeoJSON&q=SELECT the_geom, the_geom_webmercator, address, zipcode FROM pluto15v1 WHERE zipcode = 11201)
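Once a response comes back, JSON, CSV, and GeoJSON bodies can all be reduced to the same list of row dictionaries. This is a sketch under a few assumptions: that the plain JSON body carries its results under a `rows` key (as the CARTO SQL API v2 documents), that CSV output starts with a header line, and that GeoJSON output is a standard FeatureCollection whose non-geometry columns sit under each feature's `properties`. `parse_rows` is a hypothetical name.

```python
import csv
import io
import json


def parse_rows(body, fmt="JSON"):
    """Turn a SQL API response body into a list of row dicts (sketch)."""
    fmt = fmt.upper()
    if fmt == "CSV":
        # The header line supplies the dict keys; every value is a string.
        return list(csv.DictReader(io.StringIO(body)))
    data = json.loads(body)
    if fmt == "GEOJSON":
        # Attribute columns live under each Feature's "properties";
        # the geometry itself is dropped here.
        return [feature["properties"] for feature in data["features"]]
    # Default JSON output: result rows are under "rows".
    return data["rows"]
```

One design consequence worth noting: CSV values always come back as strings, whereas the JSON formats keep numeric columns (such as a numeric zipcode) as numbers, so code that compares values may need to convert CSV fields first.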

Get everything where the primary zoning is Commercial:

This time let's get a shapefile from the SQL API:

[https://cwhong.cartodb.com:443/api/v2/sql?format=SHP&q=SELECT the_geom, the_geom_webmercator, address, zipcode, allzoning1 FROM pluto15v1 WHERE allzoning1 ILIKE '%C%'](https://cwhong.cartodb.com:443/api/v2/sql?format=SHP&q=SELECT the_geom, the_geom_webmercator, address, zipcode, allzoning1 FROM pluto15v1 WHERE allzoning1 ILIKE %27%25C%25%27)
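Notice that the link target spells the pattern `'%C%'` as `%27%25C%25%27`. The `%` wildcards in the SQL must themselves be escaped as `%25`; otherwise the server would try to decode `%C%` as a (malformed) percent-escape. Python's standard `urllib.parse.quote` shows the round trip:

```python
from urllib.parse import quote, unquote

# The ILIKE predicate from the shapefile query above.
predicate = "allzoning1 ILIKE '%C%'"

# safe="" escapes every reserved character, including the single
# quotes and the % wildcards themselves.
encoded = quote(predicate, safe="")
print(encoded)  # allzoning1%20ILIKE%20%27%25C%25%27

# Decoding restores the original SQL exactly.
assert unquote(encoded) == predicate
```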

With a little bit of front-end web development, it's possible to build a custom interface for this data that helps the user home in on a specific subset of the data to download without writing SQL. This PLUTO downloader tool does just that. The UI allows the user to choose a geographic area, a set of attributes, and a format. Behind the scenes, it builds a SQL query and sends it to CartoDB, which serves up the data on demand!
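The core of such a tool can be sketched in a few lines. This is not the downloader's actual code: it is a hypothetical example that models the "geographic area" as a list of zip codes and checks every choice against whitelists (the column and format sets below are illustrative assumptions) so that raw user input never ends up in the SQL text. Spatial formats get `the_geom` prepended, since GeoJSON and shapefiles need a geometry to write.

```python
from urllib.parse import quote, urlencode

# Hypothetical whitelists: only choices the UI actually offers are accepted,
# so raw user input never lands in the SQL text.
ALLOWED_COLUMNS = {"address", "zipcode", "allzoning1"}
ALLOWED_FORMATS = {"CSV", "GeoJSON", "SHP"}
GEO_FORMATS = {"GeoJSON", "SHP"}  # formats that need a geometry column


def build_download(zipcodes, columns, fmt, account="cwhong", table="pluto15v1"):
    """Sketch: turn a user's choices into (sql, download_url)."""
    if not zipcodes or not columns:
        raise ValueError("choose at least one zip code and one column")
    unknown = set(columns) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")

    cols = list(columns)
    if fmt in GEO_FORMATS:
        cols.insert(0, "the_geom")
    # int() rejects anything that is not a plain integer zip code.
    zips = ", ".join(str(int(z)) for z in zipcodes)
    sql = f"SELECT {', '.join(cols)} FROM {table} WHERE zipcode IN ({zips})"

    params = urlencode({"format": fmt, "q": sql}, quote_via=quote)
    return sql, f"https://{account}.cartodb.com/api/v2/sql?{params}"


sql, url = build_download([11201, 11217], ["address", "zipcode"], "GeoJSON")
print(sql)
# SELECT the_geom, address, zipcode FROM pluto15v1 WHERE zipcode IN (11201, 11217)
```

Whitelisting (rather than escaping free text) keeps the sketch simple: the UI only ever offers a fixed menu of columns and formats, so anything outside that menu can be rejected outright.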

The big takeaway from this blog post is that you shouldn't think of CartoDB simply as a map rendering tool; it serves up raw data just as elegantly and efficiently as it does map tiles.

But what about the catalog? CartoDB will provide a list of your public tables on your public profile page. This is certainly not a substitute for a fully baked open data catalog with standards-compliant metadata, but it does tie together all the public tables in your account and make them a bit more discoverable.

The "by-hand" option, if you don't have too many datasets to manage, would simply be to create a page on your website or blog with links to each CartoDB table's landing page, information about the datasets, and an embedded map preview.

Another option for the catalog side of the equation is CKAN (currently in use by data.gov and many other open data programs worldwide), where you could quickly set up a listing for data that lives in a CartoDB public table. We've even worked on a script that can programmatically create a CKAN dataset listing for a CartoDB table, adding the name, description, and assets for the various SQL API download links. Data.gov has developed a CKAN extension that adds [Open in CartoDB] functionality to all of their dataset listings, allowing for one-click import of data into a user's CartoDB account. Ontodia, an NYC open data consultancy and CartoDB partner, has also developed a tighter CKAN integration that allows for cloning of data between the CKAN datastore and CartoDB, and for including CartoDB maps in a CKAN dataset page.
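To give a feel for what such a script does, here is a hypothetical sketch (not the script mentioned above) that assembles a CKAN dataset payload with one resource per SQL API download format. The field names `name`, `title`, `notes`, and `resources` (each with `name`, `format`, `url`) follow CKAN's `package_create` action; the titles and description are placeholder values. Actually creating the listing would mean POSTing this JSON to `/api/3/action/package_create` on your CKAN instance with an API token, and some instances also require fields such as `owner_org`.

```python
import json
import re
from urllib.parse import quote, urlencode


def sql_api_link(account, table, fmt):
    """Full-table download link: just SELECT * via the SQL API."""
    params = urlencode({"format": fmt, "q": f"SELECT * FROM {table}"}, quote_via=quote)
    return f"https://{account}.cartodb.com/api/v2/sql?{params}"


def ckan_package(account, table, title, description, formats=("CSV", "GeoJSON", "SHP")):
    """Build a package_create payload for a CartoDB public table (sketch)."""
    # CKAN dataset names must be lowercase alphanumerics, '-' or '_'.
    name = re.sub(r"[^a-z0-9_-]+", "-", f"{account}-{table}".lower())
    return {
        "name": name,
        "title": title,
        "notes": description,
        "resources": [
            {"name": f"{title} ({fmt})", "format": fmt, "url": sql_api_link(account, table, fmt)}
            for fmt in formats
        ],
    }


# Placeholder listing values for illustration only.
payload = ckan_package("cwhong", "pluto15v1", "NYC PLUTO 15v1", "Tax lot data published as a CartoDB public table.")
print(json.dumps(payload, indent=2))
```

Because each resource URL is just a SQL API query, the catalog entry never holds a stale copy of the data: every download is served live from the public table.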

New York City's IT Department (DoITT) is currently publishing geospatial open data for the New York City Subway, along with a citywide building footprints dataset, via their enterprise CartoDB account.

Public Tables == Published Tables

When you make one of your CartoDB datasets public, you've essentially published it. Downloads are a click away, and API access is easy using regular old SQL statements. It only takes a few more steps to document and publicize your public table, whether on a static page, in a data catalog, or via a custom download tool.

Happy open data publishing!