August 2021 By Anibal Gonzalez

From Zero to (almost) Hero: Part 1

Findings from our work on Toyota Mobility Foundation’s City Architecture for Tomorrow Challenge

GeoSpock’s Asia team recently participated in Toyota Mobility Foundation’s City Architecture for Tomorrow Challenge (CATCH), which aims to reimagine and design future city infrastructures through dynamic, intelligent, data-driven and connected solutions. 

The challenge was a great opportunity to showcase GeoSpock DB, our geospatial analytics database, as well as our data science expertise in the mobility and smart city sectors. In this blog we’ll examine our work for the semi-finals, including the processes we went through in designing our solution proposal, the data we worked with whilst building it out, and the techniques employed to help address the programme’s stated challenges.

Challenge Overview

The Malaysian capital of Kuala Lumpur (KL) was the host city for CATCH. Like many fast-growing urban regions, the South East Asian hub is experiencing challenges related to optimising infrastructure and public transport systems to meet the growing needs of its population. In particular, CATCH sought to explore solutions which help address the following:

  • An increasing and ageing population placing greater strain on public transport,
  • Growing levels of traffic congestion and longer commuting times,
  • Ensuring public transport matches the changing needs of the city,
  • Delivering mobility to non-motorised transport,
  • Pedestrian and commuter safety infrastructure planning and development,
  • Reducing pollution and its impact on public health.

Although these are broad challenges, KL, like many other cities, has in recent years made great progress collecting and acquiring data to measure and quantify their impacts. Across the city, data is gathered from numerous sources and sensor types, from environmental sensors to vehicle telematics. Combining these diverse datasets to generate new insights and ultimately make recommendations to improve city life for all inhabitants was at the heart of our ideas for the challenge — so-called data for good.

As much of the data generated by cities is naturally geospatial, CATCH was a good fit for our expertise. Sensors, vehicles, and areas of interest all have a geospatial component as they are located physically in space. GeoSpock DB, as a geospatially optimised database, provides state-of-the-art access times to geospatial datasets. By taking advantage of GeoSpock DB’s geo-temporal indexing, we are able to filter down terabytes of rapidly accumulating event data — for example, the GPS positions of buses — to a manageable size.

GeoSpock DB powers city-wide visualisations that give bus and traffic operators insight not only into the status of individual buses but also into the state of traffic across the city, since the buses themselves act as traffic probes.

Our Approach

During the course of CATCH, we adopted a two-pronged strategy to address the challenge statements:

  1. To deliver a data fusion platform that would provide secure public-private data sharing and collaboration, making data analytics of complex, multi-source datasets easily accessible to support multiple agencies and companies — and enable many future use cases.
  2. To develop reference analytics that could be later brought in-house by the relevant stakeholders and users for further refinement.

Our Semi-Final Solution

In the semi-finals, our aim was to focus on the first part of our strategy — building out a data fusion platform and illustrating its potential utility across a diverse range of mobility and smart city related use cases.

With this in mind, during this stage we ETL’ed and ingested into GeoSpock DB a combination of datasets: those made available by the organisers (for example, the GPS location of buses), datasets sourced from other competitions (such as road segment speed), and privately sourced datasets (such as weather across KL). We then examined a number of illustrative use cases across the city.

Examining the impact of weather on traffic speed

Using the ingested datasets, we took advantage of GeoSpock DB’s data fusion capabilities and combined weather and traffic data to analyse the effect of rain on vehicle speed. 

As the visualisation below shows, we can use tools such as geofencing to flexibly analyse different areas of interest. In this case, a geofence with a radius of roughly 3 km around KL city centre lets us visualise and analyse the behaviour of traffic around the city.

Visualisation of road segment speed across KL. The circle is a geofence of 3km around the city centre.

SELECT HOUR(timestamp) AS hour,
       DAY_OF_WEEK(timestamp) AS day_of_week,
       AVG(hourly_avg) AS hourly_avg_speed
FROM speed_road_segment
-- The points are raw longitude/latitude, so the distance is in degrees;
-- 0.03 degrees is roughly 3 km near the equator
WHERE ST_Distance(ST_Point(longitude, latitude),
                  ST_Point(101.703389, 3.151557)) < 0.03
GROUP BY HOUR(timestamp), DAY_OF_WEEK(timestamp)
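
The geofence in the query above is expressed in degrees. As a rough cross-check, the same filter can be sketched in Python using a great-circle (haversine) distance in kilometres. This is only an illustration, not the way GeoSpock DB evaluates the query: the function names and the 3 km default are our own assumptions, and the centre point is the one used in the query.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

# (latitude, longitude) of the city-centre point used in the SQL query
KL_CENTRE = (3.151557, 101.703389)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_geofence(lat, lon, centre=KL_CENTRE, radius_km=3.0):
    """True if the point lies within radius_km of the centre (hypothetical helper)."""
    return haversine_km(lat, lon, centre[0], centre[1]) <= radius_km


# How many kilometres does 0.03 degrees of longitude span at KL's latitude?
edge_km = haversine_km(KL_CENTRE[0], KL_CENTRE[1],
                       KL_CENTRE[0], KL_CENTRE[1] + 0.03)
print(f"0.03 degrees of longitude at KL is about {edge_km:.2f} km")
```

So close to the equator, the 0.03 degree threshold corresponds to a radius slightly above 3 km, which is why the simple degree-based comparison is a reasonable approximation for KL.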

With this kind of query, we can quickly investigate the effect of rain on KL’s city centre traffic as shown in the chart below.

Average city centre vehicle speed in dry (blue) and wet (orange) conditions

This plots the average speed of vehicles within the city centre during wet and dry conditions at different hours of the day; pale shaded areas show the 95% confidence interval. From the chart we can see that speeds are, on average, higher in dry conditions, and lower and more variable when it is wet. However, as with any kind of data analysis, understanding data quality and availability is crucial to drawing sound conclusions. In this instance, not all hours of the day have enough wet weather data to generate a line plot with confidence intervals, or even a single point!
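
To make the point about data availability concrete, here is a minimal sketch of how a per-hour mean and 95% confidence interval could be computed, with hours that have too few samples left out. It is not necessarily how the chart above was produced: the normal approximation (z = 1.96), the min_samples threshold and the record layout are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def hourly_speed_ci(records, min_samples=5, z=1.96):
    """Group (hour, is_wet, speed) records and return, per (hour, is_wet),
    a tuple (mean, lower, upper), or None when there are too few samples.

    Uses a normal approximation for the confidence interval (an assumption).
    """
    min_samples = max(2, min_samples)  # stdev needs at least two values
    groups = defaultdict(list)
    for hour, is_wet, speed in records:
        groups[(hour, is_wet)].append(speed)

    result = {}
    for key, speeds in groups.items():
        if len(speeds) < min_samples:
            result[key] = None  # not enough data to draw a point or a band
            continue
        m = mean(speeds)
        half_width = z * stdev(speeds) / math.sqrt(len(speeds))
        result[key] = (m, m - half_width, m + half_width)
    return result


# Made-up values, purely to show the shape of the input and output
example = [(8, False, 30.0), (8, False, 32.0), (8, False, 34.0), (8, True, 20.0)]
for key, value in sorted(hourly_speed_ci(example, min_samples=3).items()):
    print(key, value)
```

In this toy input the wet 8am group has a single reading, so it is reported as None rather than being plotted with a misleading interval.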

Improving understanding of bus and route ridership variation over time and space

As well as looking at environmental factors, it is also possible to perform simple SQL queries on the data within GeoSpock DB to merge bus position and ridership data. The KL bus network employs a tap-on/tap-off fare card system, which gives insight into where and when passengers board and alight on individual bus routes. Combining these datasets enables easy analysis of passenger levels on individual buses at different times and locations, and is the first step towards an improved understanding of network capacity. An example of this process is the query that follows:

SELECT bus.bus_id, bus.latitude, bus.longitude, bus.timestamp, ride.ridership
FROM bus_data AS bus
JOIN bus_ridership AS ride
  ON bus.id = ride.id AND bus.timestamp = ride.timestamp
-- selected_id is a placeholder for the identifier of the bus of interest
WHERE bus.id = selected_id

Using GeoSpock DB we can quickly pinpoint times and areas of interest for further investigation. For example, the bus depicted below is close to maximum occupancy at about 6.20 pm on Friday, when people are boarding near the city centre to head home.
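
As a rough illustration of how boarding and alighting counts translate into occupancy, the sketch below accumulates net boardings over time for a single bus and reports the peak. The record layout (timestamp, boardings, alightings) and the function names are assumptions; the actual ridership column returned by the query may be structured differently.

```python
from itertools import accumulate


def running_occupancy(events):
    """events: (timestamp, boardings, alightings) tuples for one bus.

    Returns a list of (timestamp, passengers_on_board) in time order,
    assuming the bus starts empty.
    """
    ordered = sorted(events)  # sort by timestamp
    net = [boarded - alighted for _, boarded, alighted in ordered]
    return [(ts, occ) for (ts, _, _), occ in zip(ordered, accumulate(net))]


def peak_occupancy(events):
    """Return the (timestamp, occupancy) pair with the highest occupancy."""
    return max(running_occupancy(events), key=lambda pair: pair[1])


# Made-up tap-on/tap-off counts, for illustration only
sample = [("18:00", 10, 0), ("18:10", 15, 2), ("18:20", 12, 1), ("18:30", 0, 20)]
print(running_occupancy(sample))
print(peak_occupancy(sample))  # ('18:20', 34)
```

Comparing the peak against the bus’s known capacity is what flags a vehicle as close to full at a given time and place.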

Individual bus ridership visualisation: positive and negative values reflect the number of boardings and alightings per minute, respectively. Each bus stop along the bus line is represented by a green dot.

Individual bus ridership board and alight counts

Using this type of insight, operators can identify capacity hotspots in time and space, and take steps to ensure a more smoothly running network. For example, using this analysis it is possible to determine the precise schedule and stopping locations for a dedicated peak hours express bus designed to service a portion of a route at particularly busy dates and times.

Diagram illustrating an express route optimised to ease congestion on a busy portion of a standard route

By observing where and when the majority of passengers get on and off buses during high capacity periods, the operating hours and calling points of this new express route can be identified. This allows bus routes that serve the same road segment to be optimised based on commuter density, road capacity and bus stop accessibility. 
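
One simple way to turn this idea into code is to rank stops by passenger activity during the peak period and keep the busiest ones until a chosen share of activity is covered. This is a hypothetical heuristic for illustration, not the optimisation we delivered; the function name, the coverage parameter and the input layout are assumptions.

```python
def express_calling_points(stop_activity, coverage=0.75):
    """Pick candidate calling points for an express service.

    stop_activity: list of (stop_name, boardings_plus_alightings) in route
    order, counted over the peak period of interest.
    Returns the busiest stops, in route order, that together account for at
    least `coverage` of all passenger activity.
    """
    total = sum(count for _, count in stop_activity)
    if total == 0:
        return []

    # Indices of stops, busiest first
    ranked = sorted(range(len(stop_activity)),
                    key=lambda i: stop_activity[i][1], reverse=True)

    chosen, covered = [], 0
    for i in ranked:
        if covered >= coverage * total:
            break
        chosen.append(i)
        covered += stop_activity[i][1]

    # Restore route order so the result reads as a timetable
    return [stop_activity[i][0] for i in sorted(chosen)]


# Made-up peak-hour counts for five stops on one route
route = [("A", 50), ("B", 5), ("C", 30), ("D", 5), ("E", 10)]
print(express_calling_points(route))  # ['A', 'C']
```

In practice the choice would also need to account for road capacity and bus stop accessibility, as noted above.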

More generally, data-driven analysis of capacity at individual bus level, route level and across the entire city fleet, as supported efficiently through GeoSpock DB, can enable the smoothing out of localised problems via more efficient resource allocation — ensuring that the most appropriate bus size and type is used in any given situation.

Facilitating predictive modelling of transport demand

We can also use GeoSpock DB to generate the features required to feed a time series predictive model, demonstrating how the platform can be incorporated into an AI/ML workflow. 

To illustrate this case, we focused on another dataset, comprising taxi pick-ups for KL city centre. In order to divide the city into smaller sections, we geohashed this dataset (a geohash is a method of encoding a given region of the world as an alphanumeric string; different geohashes relate to different regions, and can be defined to varying levels of precision).

In particular, we consider geohash w283fm, a precision level 6 geohash corresponding to a roughly 1.2 km x 0.61 km region in central KL. We then predicted taxi pickup demand within this geohash from 8am onward on the 30th of September, using an XGBoost model trained on data fed from GeoSpock DB.
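
For readers unfamiliar with geohashing, the following is a minimal pure-Python sketch of the standard encoding: longitude and latitude ranges are halved alternately, and every five bits become one base-32 character. The function names are our own, and in practice a geohash library or a database function would normally be used instead.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_lon = True  # bits alternate, starting with longitude
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid  # keep the upper half
        else:
            bits.append(0)
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon

    chars = []
    for i in range(0, len(bits), 5):
        index = 0
        for bit in bits[i:i + 5]:
            index = (index << 1) | bit
        chars.append(_BASE32[index])
    return "".join(chars)


def cell_size_degrees(precision):
    """Width (longitude) and height (latitude) of a cell, in degrees."""
    n_bits = precision * 5
    lon_bits = (n_bits + 1) // 2
    lat_bits = n_bits // 2
    return 360 / 2 ** lon_bits, 180 / 2 ** lat_bits


# The city-centre point used in the earlier geofence query
print(geohash_encode(3.151557, 101.703389, precision=6))  # w283fm
print(cell_size_degrees(6))
```

At precision 6 a cell spans about 0.011 degrees of longitude by 0.0055 degrees of latitude, which near the equator works out to the roughly 1.2 km x 0.61 km quoted above.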

Visual representation of taxi pickups per geohash where height and colour (red) represent a higher pickup rate.

The orange line in the chart on the left shows that the trained XGBoost model was able to predict the demand trends even though the dataset is somewhat sparse.

The aim of this use case is to show that once onboarded to the GeoSpock DB data fusion platform, data is immediately ready for AI/ML training. The predictive analytics example here enables taxi operators to direct drivers to busy areas in anticipation of increasing demand. Similar geospatial AI/ML can also be applied to other use cases such as traffic accident hotspot prediction, or in a multi-modal fashion across transport types, for example to predict demand for other public transport services following the arrival of a train into a station.
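
To give a flavour of what feature generation for a time series model can look like, the sketch below turns hourly pickup counts for one geohash into rows of lagged values plus the hour of day. The feature set, the number of lags and the function name are assumptions for illustration; they are not the exact features used for the model described above.

```python
def make_lag_features(hourly_counts, n_lags=3):
    """Build a supervised-learning table from an hourly series.

    hourly_counts: list of (hour_of_day, pickup_count) in time order.
    Each feature row is [hour_of_day, count at t-n_lags, ..., count at t-1],
    and the target is the count at time t.
    """
    features, targets = [], []
    for t in range(n_lags, len(hourly_counts)):
        hour, count = hourly_counts[t]
        lags = [c for _, c in hourly_counts[t - n_lags:t]]
        features.append([hour] + lags)
        targets.append(count)
    return features, targets


# Made-up hourly pickups for a single geohash
series = [(5, 1), (6, 2), (7, 3), (8, 4), (9, 5)]
X, y = make_lag_features(series)
print(X)  # [[8, 1, 2, 3], [9, 2, 3, 4]]
print(y)  # [4, 5]
```

Rows in this shape can be passed to a gradient-boosted regressor such as XGBoost for training; zero-filling hours with no pickups before building the lags is one way to handle the sparsity mentioned above.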

Looking ahead to the finals!

In the finals, we moved from illustrating the broader capabilities of GeoSpock DB across mobility and smart city use cases to delivering more specific, tailored solutions for individual challenge stakeholders. In particular, during this stage we proposed, developed and delivered the following solutions:

  • Route hopping analysis and route optimisation,
  • Short-term traffic volume prediction across KL,
  • Site selection for smart CCTV.

To learn more, see part 2 of the blog (coming soon)!

Dr Anibal Gonzalez is a Senior Data Scientist at GeoSpock.

*(banner image by Nik Radzi on Unsplash)
