BigQuery BI Engine is a fast, in-memory analysis service that lets users analyze data stored in BigQuery with rapid response times and with high concurrency to accelerate certain BigQuery SQL queries. BI Engine caches data instead of query results, allowing different queries over the same data to be accelerated as you look at different aspects of the data. By using BI Engine with BigQuery streaming, you can perform real-time data analysis over streaming data without sacrificing write speeds or data freshness.

​​BI Engine architecture

The BI Engine SQL interface expands BI Engine support to any business intelligence (BI) tool that works with BigQuery such as Looker, Tableau, Power BI, and custom applications to accelerate data exploration and analysis. With BI Engine, you can build rich, interactive dashboards and reports in BI tool of your choice without compromising performance, scale,security, or data freshness. To learn more about the BI Engine SQL interface, please refer here.

The following diagram shows the updated architecture for BI Engine:

1 BI Engine.jpg

Shown here is one simple example of a Looker dashboard that was created with BI Engine capacity reservation (top) versus the same dashboard without any reservation (bottom).This dashboard is created from the BigQuery public dataset `bigquery-public-data.chicago_taxi_trips.taxi_trips`  to analyze the Sum of total_trip cost and logarithmic average of total trip cost over time.

2 BI Engine.gif
total_trip cost for past 5 years

BI Engine will cache the minimum amount of data possible to resolve a query to maximize the capacity of the reservation. Running business intelligence on big data can be tricky.

Running business intelligence on big data can be challenging. BigQuery BI Engine is a blazing-fast, in-memory analysis service for BigQuery that allows users to analyze complex data sets interactively with sub-second query response time and with high concurrency. BigQuery BI Engine seamlessly integrates with familiar tools like Data Studio and Looker, along with other popular BI tools, to accelerate data exploration and analysis.  Discover how BI Engine works and see a demo across different business intelligence tools.

Here is a query against the same public dataset, ‘bigquery-public-data.chicago_taxi_trips.taxi_trips,’ to demonstrate BI Engine performance with/without reserved BigQuery slots.

Example Query

code_block
[StructValue([(u’code’, u”SELECTrn (DATE(trip_end_timestamp , ‘America/Chicago’)) AS trip_end_timestamp_date,rn (DATE(trip_start_timestamp , ‘America/Chicago’)) AS trip_start_timestamp_date,rn COALESCE(SUM(CAST(trip_total AS FLOAT64)), 0) AS sum_trip_total,rn CONCAT (‘Hour :’,(DATETIME_DIFF(trip_end_timestamp,trip_start_timestamp,DAY) * 1440) ,’ , ‘,’Day :’,(DATETIME_DIFF(trip_end_timestamp,trip_start_timestamp,DAY)) ) AS trip_time,rn CASE WHENrn ROUND(fare + tips + tolls + extras) = trip_total THEN ‘Tallied’rn WHEN ROUND(fare + tips + tolls + extras) < trip_total THEN ‘Tallied Less’rn WHEN ROUND(fare + tips + tolls + extras) > trip_total THEN ‘Tallied More’rn WHEN (ROUND(fare + tips + tolls + extras) = 0.0 AND trip_total = 0.0) THEN ‘Tallied 0’rn ELSE ‘N/A’ END AS trip_total_tally,rn REGEXP_REPLACE(TRIM(company),’null’,’N/A’) as company,rn CASE WHENrn TRIM(payment_type) = ‘Unknown’ THEN ‘N/A’rn WHEN payment_type IS NULL THEN ‘N/A’ ELSE payment_type END AS payment_typern FROMrn `bigquery-public-data.chicago_taxi_trips.taxi_trips`rn GROUP BYrn 1,rn 2,rn 4,rn 5,rn 6,rn 7rnORDER BYrn 1 DESC,rn 2 ,rn 4 DESC,rn 5 ,rn 6 ,rn 7rnLIMIT 5000″), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ee1a4db3b10>)])]

The above query was run with the below combinations: 

  • Without any BigQuery slot reservation/BI Engine reservation,  the query observed 7.6X more average slots and 6.3X more job run time compared to the run with reservations (last stats in the result). 

  • Without BI Engine reservation but with BigQuery slot reservation, the query observed 6.9X more average slots and 5.9X more job run time compared to the run with reservations (last stats in the result). 

  • With BI Engine reservation and no BigQuery slot reservation, the query observed 1.5 more average slots and the job completed in sub-seconds (868 ms). 

  • With both BI Engine reservation and BigQuery slot reservation, only 23 average slots were used and the job completed in sub-second as shown in results.This is the most cost effective way in regards to average slots and run time compared to all other options (23.27 avg_slots , 855 ms run time).

3 BI Engine.jpg

INFORMATION_SCHEMA is a series of views that provide access to metadata about datasets, routines, tables, views, jobs, reservations, and streaming data. You can query the INFORMATION_SCHEMA.JOBS_BY_* view to retrieve real-time metadata about BigQuery jobs. This view contains currently running jobs, and the history of jobs completed in the past 180 days.

Query to determine bi_engine_statistics and number of slots. More schema information can be found here.

code_block
[StructValue([(u’code’, u”SELECTrn project_id,rn job_id,rn reservation_id,rn job_type,rn TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND) AS job_duration_mseconds,rn CASErn WHEN job_id = ‘bquxjob_54033cc8_18164d54ada’ THEN ‘YES_BQ_RESERV_NO_BIENGINE’rn WHEN job_id = ‘bquxjob_202f17eb_18149bb47c3’ THEN ‘NO_BQ_RESERV_NO_BIENGINE’rn WHEN job_id = ‘bquxjob_404f2321_18164e0f801’ THEN ‘YES_BQ_RESERV_YES_BIENGINE’rnWHEN job_id = ‘bquxjob_48c8910d_18164e520ac’ THEN ‘NO_BQ_RESERV_YES_BIENGINE’ ELSE ‘NA’ END as query_method,rn bi_engine_statistics,rn — Average slot utilization per job is calculated by dividingrn– total_slot_ms by the millisecond duration of the jobrn SAFE_DIVIDE(total_slot_ms,(TIMESTAMP_DIFF(end_time, start_time, MILLISECOND))) AS avg_slotsrnFROMrnregion-us.INFORMATION_SCHEMA.JOBS_BY_PROJECTrnwhere creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 80 DAY) AND CURRENT_TIMESTAMP()rnAND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AND CURRENT_TIMESTAMP()rnANd job_id in (‘bquxjob_202f17eb_18149bb47c3′,’bquxjob_54033cc8_18164d54ada’,’bquxjob_404f2321_18164e0f801′,’bquxjob_48c8910d_18164e520ac’)rnORDER BY avg_slots DESC”), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ee18b949590>)])]

From the observation, the most effective way of improving performance  for BI queries is to use BI ENGINE reservation along with BigQuery slot reservation.This will increase query performance, throughput and also utilizes less number of slots. Reserving BI Engine capacity will let you save on slots in your projects.

BigQuery BI Engine optimizes the standard SQL functions and operators when connecting business intelligence (BI) tools to BigQuery. Optimized SQL functions and operators for BI Engine are found here.

Monitor BI Engine with Cloud Monitoring

BigQuery BI Engine integrates with Cloud Monitoring so you can monitor BI Engine metrics and configure alerts.

For information on using Monitoring to create charts for your BI Engine metrics, see Creating charts in the Monitoring documentation.

4 BI Engine.jpg

We ran the same query without BI engine reservation and noticed 15.47 GB were processed.

5 BI Engine.jpg

After BI Engine capacity reservation, in Monitoring under BIE Reservation Used Bytes dashboard we got a compression ratio of ~11.74x (15.47 GB / 1.317 MB). However compression is very data dependent, primarily compression depends on the data cardinality. Customers should run tests on their data to determine their compression rate.

6 BI Engine.jpg

Monitoring metrics ‘Reservation Total Bytes’ gives information about the BI engine capacity reservation whereas ‘Reservation Used Bytes’ gives information about the total used_bytes. Customers can make use of these 2 metrics to come up with the right capacity for reservation. 

When a project has BI engine capacity reserved, queries running in BigQuery will use BI engine to accelerate the compatible subquery performance.​​The degree of acceleration of the query falls into one of the below mentioned modes:

BI Engine Mode FULL – BI Engine compute was used to accelerate leaf stages of the query but the data needed may be in memory or may need to be scanned from a disk. Even when BI Engine compute is utilized, BQ slots may also be used for parts of the query. The more complex the query,the more slots are used.This mode executes all leaf stages in BI Engine (and sometimes all stages).

BI Engine Mode PARTIAL – BI Engine accelerates compatible subqueries and BigQuery processes the subqueries that are not compatible with BI Engine.This mode also provides bi-engine-reason for not using BI Engine mode fully.This mode executes some leaf stages in BI Engine and rest in BigQuery.

BI Engine Mode DISABLED – When BI Engine process subqueries that are not compatible for acceleration, all leaf stages will get processed in BigQuery. This mode also provides bi-engine-reason for not using BI Engine mode fully/partially.

7 BI Engine.jpg

Note that when you purchase a flat rate reservation, BI Engine capacity (GB) will be provided as part of the monthly flat-rate price. You can get up to 100 GB of BI Engine capacity included for free with a 2000-slot annual commitment. As BI Engine reduces the number of slots processed for BI queries, purchasing less slots by topping up little BI Engine capacity along with freely offered capacity might suffice your requirement instead of going in for more slots!

If you’ve never sat at your desk waiting for a dashboard to load, this webinar isn’t for you. For everyone else, read on. We’re surrounded by mini data experiences in our personal lives (think doctor visits, wearable tech, online bills, etc..) and they help us make decisions in context, on the go, and near instantly. Now, why should it be any different in our workplace, even if working from home? Lots of the data-rich applications of yesteryear, or BI tools connected to un-optimized databases are simply not designed to accommodate the types of experiences we as human beings expect. Let’s change that, shall we? It is 2021.  In this webinar, we’ll introduce new innovations from Google Cloud and Looker, and hear customer stories that’ll inspire you to start designing and building your next set of 2021-proof data experiences. You’ll learn how BigQuery BI Engine works, how to enable it in Looker (hint, it’s already enabled!) and how to make sure all your data end users get their data fast, and fresh.

References

To learn more about how BI Engine and BigQuery can help your enterprise, try out listed Quickstarts page 

Related Article

Introducing Firehose: An open source tool from Gojek for seamless data ingestion to BigQuery and Cloud Storage

The Firehose open source tool allows Gojek to turbocharge the rate it streams its data into BigQuery and Cloud Storage.

Read Article