Guest post: Visualize a year of Citi Bike rides with Kimono and CartoDB
Today we are excited to bring you a guest post from our friends at KimonoLabs. If you aren't familiar with Kimono, they offer a set of tools that enable you to create APIs from data and information scattered throughout the Web. We highly recommend checking them out. Below, they share the steps for turning a website into an API and then show how to use CartoDB to map that API in some really interesting ways. This was originally posted on Kimono's blog; thanks for the permission to repost!
Tutorial: Mapping your own location data in 10 minutes
Data is more accessible, tangible, and interesting when you can visualize and interact with the figures on a page. That’s why we love teaming up with our friends over at CartoDB! Kimono is a smart web scraper that lets you turn data on a website into an API – a structured feed of updating data. CartoDB lets you take that data set and create beautiful interactive maps. In this post, we will use kimono to get over a year’s worth of bike trip data from New York City’s Citi Bike bike sharing program. We’ll then use cartoDB to plot our friend Andrew’s movements on a map. A big thanks to Andrew for riding his bike a lot and sharing his data with us!
Here’s all that you’ll need to build your own data-driven map:
- A kimono account (it’s free) and the kimono Chrome extension
- A cartoDB account (it’s also free)
- A website with location data – we’ll use data from NYC’s Citi Bike bike sharing program, but you can capture anything you like (e.g. Uber, Lyft, public transportation routes)
- 10 minutes
Create an API
Navigate to the website with the data you want to map. For this example, we are using data from Andrew’s Citi Bike account.
Click on the kimono Chrome extension and the kimono toolbar will appear on top of the webpage.
Notice the flashing lock icon on the toolbar. This indicates that the page requires you to log in. Click on the lock icon.
Kimono will then direct you to the site’s login page (if you need to navigate further to get to the login page, click the navigation icon and go to the appropriate login page).
Once at the login page, you must identify the username, password, and submit fields – this teaches kimono how to log in to this site. Do this by clicking the username, password, and submit icons on the toolbar and then clicking on the matching field on the webpage.
Click done and then enter your login credentials. Kimono securely stores the credentials so that it can access your data automatically on the schedule you specify.
Extract Data
Once you’ve completed the login cycle, you will see the original page with your data. Here we want to grab four types of data – the start stations, start times, end stations, and end times. To do this, click on one of the start stations. Kimono will suggest other start stations to you. Click the check mark to accept all the start stations into your first data property. The number in the yellow circle will increase to reflect the number of data points in that property.
Now click the grey plus to add a new property and repeat this process with start times, end stations, and end times. You can preview the structured data that kimono will extract in the data preview pane and make more granular adjustments in the data model view.
Click DONE to create an API. Select daily to make sure your data is refreshed every day. Once it’s done, click the link to check out your new API.
(Pro-tip: if you are having trouble selecting just the data you want, try clicking and dragging to select just the part you want, and kimono will strip off extraneous text.)
Configure Your API
You can view the first page of data extracted on the API detail page for your new API.
The API we just set up only extracts from one page of rides. If you have several pages of data, you’ll need to configure your API to extract from multiple pages as well. To do this, go to the crawl setup tab and click on ‘crawl strategy’ to specify the type of crawl you want to do; select generated URL list in this case. Then, on the lower right, you will see the URL generator.
Kimono has broken your source URL into its relevant sub-components. For us, the number after ‘trips’ specifies the page you’re on. To the right of the number parameter, click ‘range’ and specify 1 to 20 (instead of 20, use whatever number corresponds to the total number of pages of data that you have).
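The range setting amounts to substituting each page number into the same URL pattern. Kimono builds this list for you, so the sketch below is only an illustration of what the range produces. The helper name `generatePageUrls` and the base URL `https://example.com/trips/` are hypothetical placeholders, not the real Citi Bike address.

```javascript
// Hypothetical helper: builds one URL per page by appending the page number
// to a base URL. The base URL below is a placeholder for illustration only.
function generatePageUrls(baseUrl, firstPage, lastPage) {
  var urls = [];
  for (var page = firstPage; page <= lastPage; page++) {
    urls.push(baseUrl + page);
  }
  return urls;
}

var urls = generatePageUrls('https://example.com/trips/', 1, 20);
console.log(urls.length); // 20
console.log(urls[0]);     // https://example.com/trips/1
```

Swap 20 for the total number of pages in your own account, just as you would in the URL generator.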
You will see a list of URLs generated by kimono below. Click save changes, then hit ‘start crawl’ above. Once the crawl completes, go back to the data preview tab and download the CSV. Open it up in Excel and remove the top row (the row that says ‘collection1’) to get it formatted for use with cartoDB.
Now that we have our structured data set, let’s start mapping our route data.
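If you would rather not open a spreadsheet, the same cleanup can be scripted. This is a minimal sketch, assuming the downloaded CSV carries the collection name alone on its first line as described above. The helper name `stripCollectionRow` is hypothetical and the sample rows are made up for illustration.

```javascript
// Hypothetical helper: drops the first line of the CSV text when it holds
// only the collection name (e.g. "collection1"), keeping the header and data rows.
function stripCollectionRow(csvText, collectionName) {
  var lines = csvText.split(/\r?\n/);
  // Ignore surrounding quotes and trailing empty cells on the first line.
  var first = lines[0].replace(/"/g, '').replace(/,+$/, '').trim();
  if (first === collectionName) {
    lines.shift();
  }
  return lines.join('\n');
}

// Made-up sample data for illustration only.
var sample = [
  'collection1',
  'start station,start time,end station,end time',
  'Station A,1/1/14 8:05,Station B,1/1/14 8:20'
].join('\n');

var cleaned = stripCollectionRow(sample, 'collection1');
console.log(cleaned.split('\n')[0]); // start station,start time,end station,end time
```

In Node.js you could read the downloaded file with `fs.readFileSync(path, 'utf8')`, pass the text through this function, and write the result back out before uploading it.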
Geo-Code Your Data
Log in to your cartoDB account and select ‘tables’. Click the large plus on the right to add a new table.
Choose ‘data file’ and select the kimono CSV file that we just downloaded. Once the data is loaded into cartoDB, click the drop-down next to the property with your start station data. Select ‘georeference’ to translate this into coordinates, i.e. latitude/longitude pairs.
Select referencing ‘by street address’, specify the city and country, and hit continue. You’ll see a new column appear with latitude and longitude data for each station. We’re almost done.
Map the Results
At the top, click on ‘map view’ and click the wizard/wand icon on the right.
To create an animated map with categories, select ‘Torque Cat’, then use the drop-down menus to set ‘time column’ to your start time property. Then set the category column to the end station and use the fields below to map colors to end stations, by region for example.
Ta-da! You’re done! You just built an awesome animated map. But suppose you wanted to calculate a few more interesting things and plot the output? With kimono’s filter functions, you can do just that. We’re beta testing this feature right now, so just email us at support@kimonolabs.com and we’ll give you early access to the feature.
Filter functions allow you to write JavaScript functions that operate on the data returned by your API. With filter functions enabled, your APIs will return the processed output. For example, we wrote a frequency function that counts how many times each station appears in the dataset in total, and how many of those times fall during the day and during the night. This lets us create a heatmap of where Andrew spends the most time and how that changes by time of day. Once we’ve enabled you for filter functions, you can access them from the ‘advanced’ tab of your API.
Copy in our frequency function below to start:
    function transform(data, callback) {
      var collection = data.results.collection1; // shortcut for our collection
      var totalTimes = {}; // object for histogram for all times. Key is a string of station name.

      // helper function to return if is during day or night: between 7am and 7pm = day.
      // assumes a Date-able string as input
      var dayOrNight = function (date) {
        var time = new Date(date);
        // >= 7 so that rides starting between 7:00 and 7:59 count as day
        if (19 > time.getHours() && time.getHours() >= 7) {
          return 'day';
        } else {
          return 'night';
        }
      };

      // helps populate totalTimes with key and value pair of station and total, day, night times
      var addToHistograms = function (station, date) {
        // initialize for a given station
        if (!totalTimes.hasOwnProperty(station)) {
          totalTimes[station] = { 'station': station, 'total': 0, 'day': 0, 'night': 0 };
        }
        // add for a given station
        totalTimes[station].total += 1;
        if (dayOrNight(date) === 'day') {
          totalTimes[station].day += 1;
        } else {
          totalTimes[station].night += 1;
        }
      };

      // iterate through property2s (start destination) for collection1 and add them to totalTimes
      for (var i = 0; i < collection.length; i++) {
        var station = collection[i].property2;
        var date = collection[i].property3;
        addToHistograms(station, date);
      }

      // do the same for end destination
      for (var j = 0; j < collection.length; j++) {
        station = collection[j].property4;
        date = collection[j].property5;
        addToHistograms(station, date);
      }

      // delete old data
      collection.splice(0, collection.length);

      // pop off totalTimes by just the value and add to the collection array (for csv formatting purposes)
      for (var key in totalTimes) {
        if (totalTimes.hasOwnProperty(key)) {
          collection.push(totalTimes[key]);
        }
      }

      callback(null, data);
    }
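Before pasting a filter function into the ‘advanced’ tab, it can be handy to try it locally. The sketch below assumes the payload shape used above (`data.results.collection1`) and a filter that calls its callback synchronously as `callback(err, data)`. The `runFilter` harness, the small `countRides` stand-in filter, and the sample rides are all hypothetical and are not part of kimono.

```javascript
// Hypothetical local harness: wraps rows in the payload shape the filter
// expects and returns whatever the filter passes to its callback.
// Assumes the filter invokes the callback synchronously.
function runFilter(transform, rows) {
  var output = null;
  // Copy the rows so the filter can splice the collection without touching the input.
  var payload = { results: { collection1: rows.slice() } };
  transform(payload, function (err, data) {
    if (err) { throw err; }
    output = data;
  });
  return output;
}

// A tiny stand-in filter: replaces the rows with a single ride count.
function countRides(data, callback) {
  var collection = data.results.collection1;
  var count = collection.length;
  collection.splice(0, collection.length);
  collection.push({ rides: count });
  callback(null, data);
}

// Made-up rides; the property names follow the frequency function above.
var rides = [
  { property2: 'Station A', property3: '2014-01-01T08:05:00',
    property4: 'Station B', property5: '2014-01-01T08:20:00' },
  { property2: 'Station B', property3: '2014-01-01T21:10:00',
    property4: 'Station A', property5: '2014-01-01T21:30:00' }
];

var result = runFilter(countRides, rides);
console.log(result.results.collection1); // [ { rides: 2 } ]
```

Passing the frequency function in place of `countRides` should give one row per station, which is a quick way to check the counts on a handful of rides.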
Using cartoDB’s bubble plot setting, we can quickly turn this into a heatmap of where Andrew spends his time.
That’s just a quick preview of some powerful maps you can build with kimono and cartoDB. We’re excited to see what you will build with the tools. Tell us what you create at contribute@kimonolabs.com and reach out to us at support@kimonolabs.com if you get stuck.