The GDELT Project describe how to use Google BigQuery to visualize their dataset of online news coverage focusing on COVID-19
Supported by Google Jigsaw the GDELT Project monitors the world's broadcast print and web news from nearly every corner of every country in over 100 languages and identifies the people locations organizations themes sources emotions counts quotes images and events driving our global society every second of every day creating a free open platform for computing on the entire world.
A GDELT Project visualization highlighting the 25 000 newsmakers mentioned most frequently and the connections among them.
Recently they launched the GDELT Global Geographic Graph the underlying dataset powering the GDELT GEO 2.0 API covering more than 1.7 billion location mentions from worldwide English language online news coverage back to April 4 2017 with full details of each mention including a 600-character contextual snippet of its context and usage.
The GDELT Project is a realtime network diagram and database of global human society for open research
This post guest written by GDELT looks into how the integration with Google BigQuery can be leveraged to map this huge dataset focusing on news coverage in the context of COVID-19 at different points during the pandemic.
The Global Geographic Graph now spans more than 1.7 billion location mentions in worldwide English language news coverage back to 2017 making it possible to map the geography of the global news narrative.
Creating a map from the Global Geographic Graph traditionally involved running a BigQuery SQL query to compile the results downloading them as a CSV file and then importing that file into CARTO to create the actual map. However CARTO actually has a built-in BigQuery connector meaning you can simply paste your SQL query directly into CARTO which runs the query on BigQuery and downloads the results into your map completely seamlessly and can even rerun the query on a fixed internal to keep the map constantly synced!
Let's say we want to create a map of locations mentioned in English language news coverage in the context of COVID-19 during the month of May. Just log into your CARTO account and select "New Dataset" and click on "BigQuery" as your data source and then click on the "Connect" button. You'll get the dialog below:
The "Billing Project ID" dropdown will automatically populate with the list of your Google Cloud projects (this is the project against whose quota the BigQuery query will be billed). You then paste your query in the SQL Query box and pick a name for the new CARTO dataset and then click "Connect to BQ"! That's quite literally all there is to it! CARTO will then send your query to BigQuery to be executed and within a few moments the results will be streamed back to CARTO and populate your new dataset!
Here's the actual query we're going to run (we use "GeoType > 1" to include country-level matches):
The "ST_GeogPoint(Lon Lat) as the_geom" at the beginning of the query takes our latitude and longitude coordinates and converts them into a Geography "Point" type.
Alternatively since the Global Geographic Graph actually already has a GEOGRAPHY column called "the_geom" you could also use:
It is important to remember that grouping by a geography column like we're doing here is extremely bad practice for observed datasets due to the imprecise nature of most observed/sampled geographic data. When collecting GPS points a stationary object can appear to constantly move in a small radius around its actual location due to GPS jitter while sampling imprecision and error can yield clusters of points all slightly different from each other.
Simply snapping to a grid doesn't solve this issue since regardless of the grid resolution a cluster of points will be split across the rounded coordinates. In the case of the Global Geographic Graph all locations are drawn from a set of centroid-based gazeteers in which every reference to Paris France will always yield precisely the same coordinate and thus here we can safely group by our coordinates but for other kinds of data we strongly recommend against it. With observed GIS to perform this kind of "Top N" grouping you would need to cluster your points first.
Within a few moments the data appears as a standard CARTO table and you can instantly map it:
But what if we want to make an interactive clickable map that displays the top 5 articles mentioning each location?
The query below selects the top 5 articles per location and concatenates their titles and URLs into a block of HTML and selects one of their social media images as the header image.
Once again the data is imported into a CARTO dataset within moments. Following the same instructions as we used in April (but this time skipping the part about saving the data out of BigQuery as a CSV and loading back into CARTO since we don't have to do that anymore!) we get the interactive map below.
The GDELT Project also created a video documenting this process which can be viewed below.
As has been demonstrated with our connector you only need to enter a SQL query from BigQuery to connect your data & make it available on the CARTO platform—allowing you to easily visualize perform spatial analyses & build applications.
Want to know more?
This article was originally published on GDELT's blog.