July 2020

Performance report III: Many questions – one high-performance answer

Querying turns data into decisions. The right questions, asked of the right data, can lead to results with the power to transform a business.

But querying can be time consuming and costly. Searching vast data volumes to find highly specific answers is inefficient at best. And, if the volume is too big or the question too complex, it might not be possible at all.

High-performance querying delivers results faster and at lower computational cost – enabling more questions and generating more answers. Our extreme-scale space-time database, GeoSpock DB, takes things even further, redefining the approach to querying. Its smart indexing capability drastically accelerates performance by providing the signposts needed to direct queries more precisely within the database.

Put simply, GeoSpock DB knows the fastest route from question to answer – no matter how big the dataset or how complex the query. That gives you more evidence, more time and more money.

We put GeoSpock DB to the test using a simulated year’s worth of an entire city’s traffic data to generate an extreme-scale dataset – 108 TB in size and containing 1.3 trillion unique rows.

The data was interrogated using five common geospatial and temporal query types. We ran multiple tests on each query type, randomly varying the size of the query extents to get a full picture of speed and scalability. Our query architecture doesn’t use any form of pre-caching, so the results reflect the performance of GeoSpock DB you can expect at any time, regardless of query history.

The five query types were:

  1. Bounding box queries – What data falls within a given area?
  2. Time range queries – What events occurred within a specific time frame?
  3. Combined bounding box and time range queries – What events occurred in this area within this time frame?
  4. Unique vehicles within a box and time range – What are the registration plates for each unique vehicle that passed through a given area in a given time?
  5. Vehicle type breakdowns within a box and time range – What is the percentage of each vehicle type that passed through a given area in a given time frame?
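
To make the five query types concrete, here is a minimal Python sketch that expresses each one as a filter over a small in-memory list of records. It is an illustration only, not GeoSpock DB's query interface: the `Event` record, its field names and every function name are assumptions made for this example, and the vehicle type breakdown assumes percentages are taken over unique vehicles rather than over raw events.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical traffic record; field names are assumptions."""
    plate: str
    vehicle_type: str
    lon: float
    lat: float
    t: int  # seconds since some epoch


def in_box(e, box):
    # box = (min_lon, min_lat, max_lon, max_lat), inclusive on all sides
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= e.lon <= max_lon and min_lat <= e.lat <= max_lat


def in_time(e, window):
    # window = (start, end), half-open: start <= t < end
    start, end = window
    return start <= e.t < end


def bounding_box(events, box):
    """1. What data falls within a given area?"""
    return [e for e in events if in_box(e, box)]


def time_range(events, window):
    """2. What events occurred within a specific time frame?"""
    return [e for e in events if in_time(e, window)]


def box_and_time(events, box, window):
    """3. What events occurred in this area within this time frame?"""
    return [e for e in events if in_box(e, box) and in_time(e, window)]


def unique_vehicles(events, box, window):
    """4. Registration plates of each unique vehicle in the area and time."""
    return {e.plate for e in box_and_time(events, box, window)}


def vehicle_type_breakdown(events, box, window):
    """5. Percentage of each vehicle type among the unique vehicles seen."""
    type_by_plate = {e.plate: e.vehicle_type
                     for e in box_and_time(events, box, window)}
    counts = Counter(type_by_plate.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {vtype: 100.0 * n / total for vtype, n in counts.items()}
```

Each function here scans every record, which is exactly the cost that indexing is meant to avoid; the sketch only pins down what each question asks.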

When it came to query speed, GeoSpock DB was fast. Our queries executed at a constant average rate of 300 GB per hour. Larger queries used more machines but still delivered the same rates of performance.

Even our largest query, which scanned more than a terabyte of data, took just 230 minutes of CPU time to complete. That means just over four minutes of real time on a 50-machine cluster – extreme-scale results in less time than it takes to make a cup of coffee. 
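
The arithmetic behind those figures can be sketched in a few lines of Python. This is a back-of-envelope check, not output from the tests: it assumes the CPU time divides evenly across the cluster, and it reads the 300-per-hour scan rate as gigabytes per machine-hour, which is an interpretation rather than something the results state. The function names are made up for the example.

```python
def wall_clock_minutes(cpu_minutes, machines):
    """Elapsed time if the CPU time is spread evenly over all machines."""
    return cpu_minutes / machines


def implied_gb_scanned(rate_gb_per_hour, cpu_minutes):
    """Data volume implied by a per-machine scan rate and total CPU time."""
    return rate_gb_per_hour * cpu_minutes / 60.0


# Figures quoted in the text: 230 CPU-minutes, a 50-machine cluster.
elapsed = wall_clock_minutes(230, 50)   # 4.6 minutes of real time
volume = implied_gb_scanned(300, 230)   # 1150 GB, i.e. "more than a terabyte"
```

Under those assumptions the quoted numbers are mutually consistent: a little over four minutes of elapsed time, and a scanned volume slightly above one terabyte.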

GeoSpock DB achieved predictable, stable, scalable querying across a full range of query types and scales. Scalability applied not just to purely spatial or purely temporal-based queries but also to questions involving additional processing and analysis.

Computational effort and cost were also low across all query types and scales, with average rates of only $0.11 per 100 GB of data scanned. Even the largest, most complex queries executed for little more than $1 – dramatically lowering the cost of data analysis.
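
As a rough illustration of what that rate means in practice, the following Python sketch converts a scanned volume into an estimated cost. It assumes cost scales linearly with the volume scanned and that 1 TB is 1,000 of the units quoted above; the function name and the example volumes are hypothetical.

```python
COST_PER_100_UNITS = 0.11  # dollars per 100 units of data scanned, as quoted


def query_cost(units_scanned):
    """Estimated cost in dollars, assuming cost is linear in data scanned."""
    return units_scanned / 100.0 * COST_PER_100_UNITS


# A query scanning one terabyte (1,000 units) comes to about $1.10,
# in line with "little more than $1" for the largest queries.
one_terabyte = query_cost(1_000)

# For comparison, a hypothetical scan of the whole 108 TB dataset at the
# same rate would come to roughly $119, which is why scanning only the
# relevant portion matters.
full_scan = query_cost(108_000)
```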

Smart indexing is a key element of GeoSpock DB's high-performance querying. Indexing imposes order on data. Searching for answers in unindexed data is like looking for a book in a disorganised library: nearly impossible. The only way to look is to search every shelf, every aisle and every floor, and then do the same thing for every new book you need. Time consuming, inefficient and ineffective.

GeoSpock DB’s smart indexes drastically reduce that effort. Indexing organises and signposts the data, so you know exactly where to look to get the answers you need. Our indexing adapts to the data contained within the database, ensuring it remains efficient whatever the data complexity and scale.

Instead of scanning all the data looking for answers, GeoSpock DB directs queries to the right data portion from the start – whether your interest is by space, time, device or any combination of the three. So our query performance depends on the much smaller amount of data included in the query selection, not on the total size of the dataset itself – making it fast, efficient and effective.

GeoSpock DB remakes the data analysis paradigm. Fast, cost-efficient querying, on unlimited data scales, allows you to make the most of your data – so that it can help you make the most of your business.

Ask us for a copy of our performance report
