July 2020

Performance report (II): A database for the extreme data age

Data is the heartbeat of the modern enterprise. Through data, organisations learn more about their environments, their operations and their customers. So it's no wonder more and more organisations are placing data at the centre of their commercial strategy.

As the central component of the digital enterprise, the database is as important as the data itself in realising an organisation’s digital ambitions. The database stores, connects and fuses together different threads, feeds and sources of information – helping organisations turn data into decisions, and insights into action.

As extreme-scale data environments become commonplace, organisations require equally extreme database performance – and that's why we've created our unique extreme-scale space-time database GeoSpock DB.

To illustrate just what it can do, we simulated a year’s worth of an entire city's traffic data to generate an extreme-scale dataset capable of challenging the capabilities of even the most data-centric organisations. Four million daily journeys, across six different vehicle types, were each mapped to follow the major roadways of Singapore. With an average journey time of 15 minutes, and a one-second sample interval, this created a truly extreme dataset – 108 TB in size and containing 1.3 trillion unique rows.
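The headline figures follow directly from the simulation parameters. A quick back-of-envelope check, using only numbers stated above (the variable names are ours):

```python
# Back-of-envelope check of the simulated dataset's scale.
# All input figures come from the report; only the arithmetic is ours.
daily_journeys = 4_000_000      # journeys simulated per day
days = 365                      # one year of city traffic
avg_journey_s = 15 * 60         # average journey time: 15 minutes
sample_interval_s = 1           # one position sample per second

journeys = daily_journeys * days
rows = journeys * (avg_journey_s // sample_interval_s)

print(f"{journeys:,} journeys")  # 1,460,000,000 journeys
print(f"{rows:,} rows")          # 1,314,000,000,000 rows (~1.3 trillion)
```

So a year of simulated traffic yields roughly 1.46 billion journeys and about 1.3 trillion rows, matching the dataset size quoted above.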

We used GeoSpock DB to ingest the data and its subsets to understand speed, scalability and cost across a full range of data scales. Better ingest delivers more data, more quickly, to where it’s needed most. And the best ingest does it without increasing costs. That means less time spent processing, aggregating and downsizing data – and more time spent creating value from it. 

To put GeoSpock DB to the test, we ran benchmarks on datasets of different sizes to measure performance across the full spectrum of enterprise data scales. Leveraging the cloud allows GeoSpock DB to scale its resources to suit the size of the data challenge, so the tests also included results from machine clusters of varying sizes.

The result was that GeoSpock DB successfully ingested data across all scales. The largest dataset – 1.3 trillion rows representing 1.46 billion vehicle journeys – was ingested at a cost of just $0.70 per CPU hour, exactly the same as that of the smallest dataset.

High-speed ingestion was achieved – 1.29 billion rows per machine per hour – and remained high and constant across all data scenarios. A year's worth of vehicle traffic data, for example, was ingested in just 920 CPU hours. By employing a 200-machine cluster, ingest of the full 108 TB dataset took under five hours – less than a single overnight run.
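The cluster timing can be cross-checked from the per-machine rate. A rough sketch, using the figures reported above (variable names are ours):

```python
# Cross-check of wall-clock ingest time from the reported per-machine rate.
rows = 1_300_000_000_000        # ~1.3 trillion rows in the full dataset
rows_per_machine_hour = 1.29e9  # reported ingest rate per machine per hour
machines = 200                  # cluster size used for the largest run

machine_hours = rows / rows_per_machine_hour
wall_clock_hours = machine_hours / machines
print(f"{wall_clock_hours:.1f} h")  # roughly five hours on the 200-machine cluster
```

Spreading the work across 200 machines brings the total down to a single overnight run, as reported.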

When it comes to cost, average ingest rates were just $0.56 per billion rows. Even the largest 108 TB dataset cost only $640 to ingest. Parallel operation decouples total cost from ingest speed, delivering a single price of ingest per dataset – no matter how fast you need it.
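The cost figures above can be related to each other. A quick check (the $0.56 figure is an average across all dataset sizes; the largest run comes in below it, and its per-CPU-hour cost matches the figure quoted earlier):

```python
# Relating the reported cost figures for the largest dataset.
total_cost = 640.0    # reported cost to ingest the 108 TB dataset, in USD
rows_billions = 1_300 # ~1.3 trillion rows, expressed in billions
cpu_hours = 920       # reported CPU hours for the full-year ingest

print(f"${total_cost / rows_billions:.2f} per billion rows")  # $0.49
print(f"${total_cost / cpu_hours:.2f} per CPU hour")          # $0.70
```

So the largest dataset was ingested below the $0.56-per-billion-rows average, at the same $0.70 per CPU hour as the smallest.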

For the first time, organisations now have access to a database truly fit for the extreme data age.

Read more about our extreme-scale database
