
Log streaming: Google BigQuery

  Last updated August 24, 2017

Fastly's Real-Time Log Streaming feature can send log files to BigQuery, Google's managed enterprise data warehouse.

Prerequisites

Before adding BigQuery as a logging endpoint for Fastly services, you will need to create a service account, obtain its private key and client email, enable the BigQuery API, and create a BigQuery dataset and table. Each of these steps is described below.

Creating a service account

BigQuery uses service accounts for third-party application authentication. To create a new service account, see Google's guide on generating service account credentials. When you create the service account, set the key type to JSON.

Obtaining the private key and client email

After you create the service account, download the JSON file to your computer. This file contains the credentials for your BigQuery service account. Open the file and make a note of the private_key and client_email.
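
If you're scripting your setup, these two values can also be read from the key file programmatically. Below is a minimal Python sketch; the filename fastly-logging-key.json is a placeholder for whatever your downloaded key file is called.

import json

# Path to the JSON key file downloaded from the Google Cloud console.
# "fastly-logging-key.json" is a placeholder; use your actual filename.
with open("fastly-logging-key.json") as f:
    credentials = json.load(f)

# These two values are needed when you configure the Fastly endpoint.
print(credentials["client_email"])
print(credentials["private_key"])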

Enabling the BigQuery API

To send your Fastly logs to BigQuery, you'll need to enable the BigQuery API in the Google Cloud Platform API Manager.

Creating the BigQuery dataset

After you've enabled the BigQuery API, follow these instructions to create a BigQuery dataset:

  1. Log in to BigQuery.
  2. Click the arrow next to your account name on the sidebar and select Create new dataset.

    The Create Dataset window appears.

  3. In the Dataset ID field, type a name for the dataset (e.g., fastly_bigquery).
  4. Click the OK button.
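
If you'd rather create the dataset from code, Google's google-cloud-bigquery Python client can do the same thing. This is a minimal sketch, not the only approach: it assumes the client library is installed (pip install google-cloud-bigquery) and that the GOOGLE_APPLICATION_CREDENTIALS environment variable points at the service account key file you downloaded earlier.

from google.cloud import bigquery

# The client picks up credentials from GOOGLE_APPLICATION_CREDENTIALS.
client = bigquery.Client()

# "fastly_bigquery" matches the Dataset ID used in the steps above.
dataset = client.create_dataset("fastly_bigquery")
print("Created dataset: " + dataset.dataset_id)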

Adding a BigQuery table

After you've created the BigQuery dataset, you'll need to add a BigQuery table and define its schema. BigQuery offers several ways of creating the schema; the instructions below use the web interface to add the fields manually.

Follow these instructions to add a BigQuery table:

  1. On the BigQuery website, click the arrow next to the dataset name on the sidebar and select Create new table.

    The Create Table page appears.

  2. In the Source Data section, select Create empty table.
  3. In the Table name field, type a name for the table (e.g., logs).
  4. In the Schema section of the BigQuery website, use the interface to add fields and complete the schema. See the example schema section for details.
  5. Click the Create Table button.
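
The table and its schema can also be created programmatically. Here's a sketch using the same google-cloud-bigquery Python client, with field definitions taken from the example schema section below (abridged here; the remaining fields are all STRING and follow the same pattern). The project ID your-project-id is a placeholder.

from google.cloud import bigquery

client = bigquery.Client()

# Field definitions mirror the example schema section below
# (abridged; the remaining STRING fields follow the same pattern).
schema = [
    bigquery.SchemaField("timestamp", "STRING"),
    bigquery.SchemaField("time_elapsed", "FLOAT"),
    bigquery.SchemaField("is_tls", "BOOLEAN"),
    bigquery.SchemaField("client_ip", "STRING"),
    bigquery.SchemaField("cache_status", "STRING"),
]

# "your-project-id" is a placeholder; the dataset and table names
# match the ones used in the steps above.
table = bigquery.Table("your-project-id.fastly_bigquery.logs", schema=schema)
client.create_table(table)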

Adding BigQuery as a logging endpoint

Follow these instructions to add BigQuery as a logging endpoint:

  1. Review the information in our Setting Up Remote Log Streaming guide.
  2. Click the BigQuery logo. The Create a BigQuery endpoint page appears.

  3. Fill out the Create a BigQuery endpoint fields as follows:
    • In the Name field, type a human-readable name for the endpoint.
    • In the Log format field, enter the data to send to BigQuery. See the example format section for details.
    • In the Email field, type the client_email address associated with the BigQuery account.
    • In the Secret key field, type the secret key associated with the BigQuery account.
    • In the Project ID field, type the ID of your Google Cloud Platform project.
    • In the Dataset field, type the name of your BigQuery dataset.
    • In the Table field, type the name of your BigQuery table.
    • In the Template field, optionally type a strftime-compatible string to use as the template suffix for your table (e.g., %Y%m%d to write to a separate table each day).
  4. Click Create to create the new logging endpoint.
  5. Click the Activate button to deploy your configuration changes.
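
If you manage your Fastly configuration through the API rather than the web interface, the endpoint can be created with a single request. The sketch below is illustrative only: it assumes Fastly's BigQuery logging API accepts the field names shown (check Fastly's API reference to confirm), and the service ID, version number, and API token are placeholders.

import requests

# Placeholders: substitute your own values.
service_id = "YOUR_SERVICE_ID"
version = "1"  # an editable (not yet activated) service version
api_token = "YOUR_FASTLY_API_TOKEN"

url = "https://api.fastly.com/service/{}/version/{}/logging/bigquery".format(
    service_id, version)

payload = {
    "name": "bigquery-logs",  # human-readable endpoint name
    "format": '{"timestamp":"%{begin:%Y-%m-%dT%H:%M:%S%z}t"}',  # abridged; use the full example format
    "user": "logging@example.iam.gserviceaccount.com",  # client_email from the key file
    "secret_key": "-----BEGIN PRIVATE KEY-----\n...",  # private_key from the key file (placeholder)
    "project_id": "your-project-id",
    "dataset": "fastly_bigquery",
    "table": "logs",
}

response = requests.post(url, data=payload, headers={"Fastly-Key": api_token})
response.raise_for_status()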

Example format

Data sent to BigQuery must be serialized as a JSON object, and every field in the JSON object must map to a field in your table's schema. The JSON can contain nested data (e.g., the value of a key in your object can be another object). Here's an example format string for sending data to BigQuery:

{
  "timestamp":"%{begin:%Y-%m-%dT%H:%M:%S%z}t",
  "time_elapsed":%{time.elapsed.usec}V,
  "is_tls":%{if(req.is_ssl, "true", "false")}V,
  "client_ip":"%{req.http.Fastly-Client-IP}V",
  "geo_city":"%{client.geo.city}V",
  "geo_country_code":"%{client.geo.country_code}V",
  "request":"%{req.request}V",
  "host":"%{req.http.Fastly-Orig-Host}V",
  "url":"%{cstr_escape(req.url)}V",
  "request_referer":"%{cstr_escape(req.http.Referer)}V",
  "request_user_agent":"%{cstr_escape(req.http.User-Agent)}V",
  "request_accept_language":"%{cstr_escape(req.http.Accept-Language)}V",
  "request_accept_charset":"%{cstr_escape(req.http.Accept-Charset)}V",
  "cache_status":"%{regsub(fastly_info.state, "^(HIT-(SYNTH)|(HITPASS|HIT|MISS|PASS|ERROR|PIPE)).*", "\\2\\3") }V"
}

Example schema

The BigQuery schema for the example format shown above would look something like this:

timestamp:STRING,time_elapsed:FLOAT,is_tls:BOOLEAN,client_ip:STRING,geo_city:STRING,geo_country_code:STRING,request:STRING,host:STRING,url:STRING,request_referer:STRING,request_user_agent:STRING,request_accept_language:STRING,request_accept_charset:STRING,cache_status:STRING
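
Once the endpoint is active and logs begin to flow, a quick query can confirm that rows are arriving. A minimal sketch, again using the google-cloud-bigquery Python client with the placeholder names from above:

from google.cloud import bigquery

client = bigquery.Client()

# Group log rows by cache status to confirm delivery;
# "your-project-id" is a placeholder.
query = """
    SELECT cache_status, COUNT(*) AS requests
    FROM `your-project-id.fastly_bigquery.logs`
    GROUP BY cache_status
"""
for row in client.query(query):
    print(row.cache_status, row.requests)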
