GCEの Container-Optimized OS から Stackdriver logging にログを吐き出す

ログの飛ばし方

方法は他にもあるかもしれないが試したのは以下の2つ。

  1. コンテナ起動時に log-driver=gcplogs をつける
  2. fluentd イメージを使って飛ばす

とりあえず gcplogs を付けたやりかたを試したのだが、あまり有用な感じではなかった。

GCE の docker で log-driver=gcplogs にしてもログを送ってくれない

これは結論を先にいうと、ログ自体は Stackdriver logging に送ってくれていた。

ただし、リソースをグローバルにしないと見れなかった。

GCE のページから飛ぶとリソースがその VM に指定されていたためなかなか気づかなかった。

グローバルでログを見ることはほぼなくて、GCE のページからログページに飛んで見れるのが理想。

タグとかをつければ実現できそうな気もするけど、 gcplogs は使わない方向で進んでみた。

GAE のログの見やすさを目指す

GAE のログ画面
GAE log

GCE docker (log-driver=gcplogs) のログ画面

GCE log

GAE 見やすいな神か・・・。

GAE に乗り込んで確認してみた

fluentd

GAE では gcr.io/google-appengine/fluentd-logger イメージでログを飛ばしている。

設定ファイルは /etc/google-fluentd/google-fluentd.conf
詳細は以下のようになっていた。

<source>
  type tail
  format none
  path /var/log/app_engine/app/app*.log
  pos_file /var/tmp/fluentd.app.pos
  read_from_head true
  rotate_wait 10s
  tag app
</source>

<source>
  type tail
  format json
  path /var/log/app_engine/app/app*.json
  pos_file /var/tmp/fluentd.app_json.pos
  read_from_head true
  rotate_wait 10s
  tag app
</source>

<source>
  type tail
  format none
  path /var/log/app_engine/app/request*.log
  pos_file /var/tmp/fluentd.request.pos
  read_from_head true
  rotate_wait 10s
  tag request
</source>

<source>
  type tail
  format none
  path /var/log/app_engine/app/STDOUT*.log
  pos_file /var/tmp/fluentd.STDOUT.pos
  read_from_head true
  rotate_wait 10s
  tag stdout
</source>

<source>
  type tail
  format none
  path /var/log/app_engine/app/STDERR*.log
  pos_file /var/tmp/fluentd.STDERR.pos
  read_from_head true
  rotate_wait 10s
  tag stderr
</source>

<source>
  type tail
  format none
  path /var/log/app_engine/app/custom_logs/*.log
  pos_file /var/tmp/fluentd.custom_logs.pos
  read_from_head true
  rotate_wait 10s
  tag custom.*
</source>

<source>
  type tail
  format json
  path /var/log/app_engine/app/custom_logs/*.json
  pos_file /var/tmp/fluentd.custom_logs_json.pos
  read_from_head true
  rotate_wait 10s
  tag custom.*
</source>

<source>
  type tail
  format none
  path /var/log/app_engine/monitoring/*.log
  pos_file /var/tmp/fluentd.mvm-monitoring.pos
  read_from_head true
  rotate_wait 10s
  tag monitoring.*
</source>

<source>
  type tail
  format none
  path /var/log/syslog
  pos_file /var/tmp/fluentd.syslog.pos
  read_from_head true
  rotate_wait 10s
  tag vm.syslog
</source>

# Parse nginx request (access) logs, which may inc. multiple custom suffixes:
#   tracecontext="(hexadecimal traceId)/options"
#   timestampSeconds="(epoch seconds, floating point)"
#   latencySeconds="(latency)"
# where the / and the options are themselves optional. Instead of using the
# default "format nginx" directive, this uses a custom regex to capture both
# standard and modified nginx logs and add the traceId or latencySeconds if
# either is found.
<source>
  type tail
  # The default format ends with ..."(?<agent>[^"]*)")?$/ . This regex is
  # the same as the default up until the end, at which point, before the $,
  # there are optional groups which are specified above.
  # The timeSeconds and timeNanos are used preferentially by cloud logging
  # in place of the standard time capture, which grants us the ability to do
  # sub-second granular logs.
  # When making changes here, test with http://fluentular.herokuapp.com .
  format /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) [(?<time>[^]]*)] "(?<method>S+)(?: +(?<path>[^"]*) +(?<protocol>S*))?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^"]*)" "(?<agent>[^"]*)")?(?: tracecontext="(?<traceId>[^/"]*)[^"]*")?(?: timestampSeconds="(?<timestampSeconds>[^."]*).(?<timestampNanos>[^"]*)")?(?: latencySeconds="(?<latencySeconds>[^"]*)")?(?: x-forwarded-for="(?:-|(?<realClientIP>[^,"-]+),?)[^"]*")?(?: appLatencySeconds="(?<appLatencySeconds>[^"]*)")?.*$/
  time_format %d/%b/%Y:%H:%M:%S %z
  path /var/log/nginx/access.log
  pos_file /var/tmp/fluentd.nginx-access.pos
  read_from_head true
  rotate_wait 10s
  tag nginx.unmatched
</source>

<filter nginx.unmatched>
  @type record_transformer
  enable_ruby
  <record>
    # If realClientIP is set, use that as the remote.
    remote ${record["realClientIP"] or record["remote"]}
  </record>
</filter>

# Sort nginx logs into reasonable buckets.
<match nginx.unmatched>
  type copy
  <store>
    type rewrite_tag_filter
    rewriterule1 path /_ah/health nginx.health_check
    rewriterule2 path /_ah/vm_health nginx.health_check
    rewriterule3 path /liveness_check nginx.health_check
    rewriterule4 path /readiness_check nginx.health_check
    rewriterule5 path .* nginx.request
  </store>
  <store>
    type rewrite_tag_filter
    rewriterule1 path /_ah/health nginx.health_check_copy
    rewriterule2 path /_ah/vm_health nginx.health_check_copy
    rewriterule3 path /liveness_check nginx.health_check_copy
    rewriterule4 path /readiness_check nginx.health_check_copy
  </store>
</match>

# Save health checks to file on disk every minute
# Path is path + time + .log
<match nginx.health_check_copy>
  type file
  flush_at_shutdown true
  flush_interval 2m
  path /var/log/app_engine/health_check
  time_slice_format %Y%m%d%H%M
</match>

# Parse nginx error logs in the common format
# see http://docs.fluentd.org/articles/common-log-formats
<source>
  type tail
  path /var/log/nginx/error.log

  format multiline
  format_firstline /^d{4}/d{2}/d{2} d{2}:d{2}:d{2} [w+] d+.d+: /
  format1 /^(?<time>d{4}/d{2}/d{2} d{2}:d{2}:d{2}) (?<message>.*)/
  multiline_flush_interval 3s
  tag nginx.error
</source>

# Link /var/lib/docker/containers we want to save to /var/log/docker_containers
<source>
  type tail
  path /var/log/saved_docker/*/*-json.log
  pos_file /var/tmp/saved-docker.pos
  time_format %Y-%m-%dT%H:%M:%S
  tag docker.*
  read_from_head true
  format json
</source>

# Do not collect fluentd's own logs to avoid infinite loops.
<match fluent.**>
  type null
</match>

<filter **>
  @type record_transformer
  <record>
    instanceName "#{ENV['GAE_INSTANCE_NAME']}"
  </record>
</filter>

# Transform nginx request (access) log fields into a nested hash suitable
# for consumption by Cloud Logging UI's request_log formatters.
# This extra step is necessary because fluentd regex input cannot be nested.
# An alternative would be to change nginx config to output valid json with
# the correct nested hash keys.
<filter nginx.{health_check,request}>
  type record_transformer
  <record>
    httpRequest {
      "requestMethod": "${record["method"]}",
      "referer": "${record["referer"]}",
      "remoteIp": "${record["remote"]}",
      "userAgent": "${record["agent"]}",
      "requestUrl": "${record["path"]}",
      "responseSize": "${record["size"]}",
      "status": "${record["code"]}",
      "latency": "${record["latencySeconds"]}s",
      "protocol": "${record["protocol"]}"
    }
    trace "${record["traceId"]}"
  </record>
  renew_record true
  keep_keys time,traceId,timestampSeconds,timestampNanos,latencySeconds,appLatencySeconds,instanceName
</filter>

# Docker container logs go through the from_docker container
<match docker.var.log.saved_docker.*.*.log>
  type from_docker
  stdout_tag raw.stdout
  stderr_tag raw.stderr
</match>

# Detect exceptions from stdout and stderr of Docker containers.
<match raw.**>
  type detect_exceptions
  remove_tag_prefix raw
  message message
  multiline_flush_interval 5
  max_bytes 50000
  max_lines 500
</match>

<match **>
  type google_cloud
  # Detect and parse JSON written to text logs.
  detect_json true
  buffer_chunk_limit 1m
  flush_interval 5s
  # Never wait longer than 5 minutes between retries
  max_retry_wait 300
  num_threads 8
  disable_retry_limit
  # Send these fields as labels instead of in the struct_payload
  label_map {
    "thread": "appengine.googleapis.com/thread_id",
    "traceId": "appengine.googleapis.com/trace_id",
    "instanceName": "appengine.googleapis.com/instance_name"
  }
  use_grpc true
</match>

nginx

設定ファイルは /etc/nginx/nginx.conf
詳細は以下のようになっていた。

daemon off;

worker_processes auto;

events {
  worker_connections 4096;
  multi_accept on;
}

http {
  include mime.types;
  server_tokens off;

  variables_hash_max_size 2048;

  # set max body size to 32m as appengine supports.
  client_max_body_size 32m;

  tcp_nodelay on;
  tcp_nopush on;

  underscores_in_headers on;

  # GCLB uses a 10 minutes keep-alive timeout. Setting it to a bit more here
  # to avoid a race condition between the two timeouts.
  keepalive_timeout 650;
  # Effectively unlimited number of keepalive requests in the case of GAE flex.
   keepalive_requests 4294967295; 

  upstream app_server {
    keepalive 192;
    server gaeapp:8080;
  }
  upstream mvm_agent {
    server 127.0.0.1:8000;
  }

  geo $source_type {
     default ext;
     xxx.xxx.0.0/8 lo;
     xxx.xxx.0.0/16 sb;
     xxx.xxx.0.0/16 lb;
     xxx.xxx.0.0/22 lb;
     xxx.xxx.0.0/12 do;
  }

  map $http_upgrade $ws_connection_header_value {
    default "";
    websocket upgrade;
  }
  # ngx_http_realip_module gets the second IP address from the last of the X-Forwarded-For header
  # X-Forwarded-For: [USER REQUEST PROVIDED X-F-F.]USER-IP.GCLB_IP
  set_real_ip_from  0.0.0.0/0;
  set_real_ip_from  0::/0;
  real_ip_header    X-Forwarded-For;

    iap_jwt_verify off;
    iap_jwt_verify_project_number xxxxxxxxxx;
    iap_jwt_verify_app_id xxxxxxxxxx;
    iap_jwt_verify_key_file /iap_watcher/iap_verify_keys.txt;
    iap_jwt_verify_iap_state_file /iap_watcher/iap_state;
    iap_jwt_verify_state_cache_time_sec 300;
    iap_jwt_verify_key_cache_time_sec 43200;
    iap_jwt_verify_logs_only on;

  server {    
      iap_jwt_verify on;        

    # self signed ssl for load balancer traffic
    listen 8443 default_server ssl;
    ssl_certificate /etc/ssl/localcerts/lb.crt;
    ssl_certificate_key /etc/ssl/localcerts/lb.key;
    ssl_protocols TLSv1.2;
    ssl_ciphers EECDH+AES256:!SHA1;
    ssl_prefer_server_ciphers on;
    ssl_session_timeout 3h;

    proxy_pass_header Server;

      gzip on;
      gzip_proxied any;
      gzip_types text/html text/plain text/css text/xml text/javascript application/json application/javascript application/xml application/xml+rss;
      gzip_vary on;

    # Allow more space for request headers.
    large_client_header_buffers 4 32k;

    # Allow more space for response headers. These settings apply for response
    # only, not requests which buffering is disabled below.
    proxy_buffer_size 64k;
    proxy_buffers 32 4k;
    proxy_busy_buffers_size 72k;

    # Explicitly set client buffer size matching nginx default.
    client_body_buffer_size 16k;

    # If version header present, make sure it's correct.
    if ($http_x_appengine_version !~ '(?:^$)|(?:^20180809t235555(?:..*)?$)') {
      return 444;
    }

    set $x_forwarded_for_test "";

    # If request comes from sb, lo, or do, do not care about x-forwarded-for header.
    if ($source_type !~ sb|lo|do) {
      set $x_forwarded_for_test $http_x_forwarded_for;
    }

    # For local health checks only.
    if ($http_x_google_vme_health_check = 1) {
      set $x_forwarded_for_test "";
    }

    # If the request does not come from sb, strip trusted appengine headers if
    # it did not pass through the trusted appengine pipeline.

    if ($x_forwarded_for_test !~ 'b(?:(?:10.0.0.1)|(?:2002:a[0-9a-f:]*)|(?:130.211.[0-3].[0-9]{1,3})|(?:xxx.xxx.[0-9]{1,3}.[0-9]{1,3}))$') {

      set $http_x_appengine_api_ticket "";
      set $http_x_appengine_auth_domain "";
      set $http_x_appengine_blobchunksize "";
      set $http_x_appengine_blobsize "";
      set $http_x_appengine_blobupload "";
      set $http_x_appengine_cron "";
      set $http_x_appengine_current_namespace "";
      set $http_x_appengine_datacenter "";
      set $http_x_appengine_default_namespace "";
      set $http_x_appengine_default_version_hostname "";
      set $http_x_appengine_federated_identity "";
      set $http_x_appengine_federated_provider "";
      set $http_x_appengine_https "";
      set $http_x_appengine_inbound_appid "";
      set $http_x_appengine_inbound_user_email "";
      set $http_x_appengine_inbound_user_id "";
      set $http_x_appengine_inbound_user_is_admin "";
      set $http_x_appengine_queuename "";
      set $http_x_appengine_request_id_hash "";
      set $http_x_appengine_request_log_id "";
      set $http_x_appengine_tasketa "";
      set $http_x_appengine_taskexecutioncount "";
      set $http_x_appengine_taskname "";
      set $http_x_appengine_taskretrycount "";
      set $http_x_appengine_taskretryreason "";
      set $http_x_appengine_upload_creation "";
      set $http_x_appengine_user_email "";
      set $http_x_appengine_user_id "";
      set $http_x_appengine_user_is_admin "";
      set $http_x_appengine_user_nickname "";
      set $http_x_appengine_user_organization "";
    }

    location / {

      proxy_pass http://app_server;
      proxy_redirect off;
      proxy_http_version 1.1;
      proxy_set_header Connection "";
      proxy_set_header Host $host;

      proxy_set_header X-Forwarded-Host $server_name;

      proxy_send_timeout 3600s;
      proxy_read_timeout 3600s;

      proxy_set_header X-AppEngine-Api-Ticket $http_x_appengine_api_ticket;    
      proxy_set_header X-AppEngine-Auth-Domain $http_x_appengine_auth_domain;
      proxy_set_header X-AppEngine-BlobChunkSize $http_x_appengine_blobchunksize;
      proxy_set_header X-AppEngine-BlobSize $http_x_appengine_blobsize;
      proxy_set_header X-AppEngine-BlobUpload $http_x_appengine_blobupload;
      proxy_set_header X-AppEngine-Cron $http_x_appengine_cron;
      proxy_set_header X-AppEngine-Current-Namespace $http_x_appengine_current_namespace;
      proxy_set_header X-AppEngine-Datacenter $http_x_appengine_datacenter;
      proxy_set_header X-AppEngine-Default-Namespace $http_x_appengine_default_namespace;
      proxy_set_header X-AppEngine-Default-Version-Hostname $http_x_appengine_default_version_hostname;
      proxy_set_header X-AppEngine-Federated-Identity $http_x_appengine_federated_identity;
      proxy_set_header X-AppEngine-Federated-Provider $http_x_appengine_federated_provider;
      proxy_set_header X-AppEngine-Https $http_x_appengine_https;
      proxy_set_header X-AppEngine-Inbound-AppId $http_x_appengine_inbound_appid;
      proxy_set_header X-AppEngine-Inbound-User-Email $http_x_appengine_inbound_user_email;
      proxy_set_header X-AppEngine-Inbound-User-Id $http_x_appengine_inbound_user_id;
      proxy_set_header X-AppEngine-Inbound-User-Is-Admin $http_x_appengine_inbound_user_is_admin;
      proxy_set_header X-AppEngine-QueueName $http_x_appengine_queuename;
      proxy_set_header X-AppEngine-Request-Id-Hash $http_x_appengine_request_id_hash;
      proxy_set_header X-AppEngine-Request-Log-Id $http_x_appengine_request_log_id;
      proxy_set_header X-AppEngine-TaskETA $http_x_appengine_tasketa;
      proxy_set_header X-AppEngine-TaskExecutionCount $http_x_appengine_taskexecutioncount;
      proxy_set_header X-AppEngine-TaskName $http_x_appengine_taskname;
      proxy_set_header X-AppEngine-TaskRetryCount $http_x_appengine_taskretrycount;
      proxy_set_header X-AppEngine-TaskRetryReason $http_x_appengine_taskretryreason;
      proxy_set_header X-AppEngine-Upload-Creation $http_x_appengine_upload_creation;
      proxy_set_header X-AppEngine-User-Email $http_x_appengine_user_email;
      proxy_set_header X-AppEngine-User-Id $http_x_appengine_user_id;
      proxy_set_header X-AppEngine-User-Is-Admin $http_x_appengine_user_is_admin;
      proxy_set_header X-AppEngine-User-Nickname $http_x_appengine_user_nickname;
      proxy_set_header X-AppEngine-User-Organization $http_x_appengine_user_organization;
      proxy_set_header X-AppEngine-Version "";

      location = /_ah/health {

          iap_jwt_verify off;
          if ( -f /tmp/nginx/lameducked ) {
            return 503 'lameducked';
          }

          if ( -f /tmp/nginx/app_lameducked ) {
            return 503 'app lameducked';
          }

          if ( !-f /var/lib/google/ae/disk_not_full ) {
            return 503 'disk full';
          }

          proxy_intercept_errors on;
          proxy_pass http://app_server;
          proxy_redirect off;
          proxy_http_version 1.1;
          proxy_set_header Connection "";
          proxy_set_header Host $host;

          proxy_set_header X-Forwarded-Host $server_name;

          proxy_send_timeout 3600s;
          proxy_read_timeout 3600s;


          error_page 301 302 303 307 401 402 403 404 405 =200 /_ah/vm_health;
      }

      location = /_ah/vm_health {

          iap_jwt_verify off;
          if ( -f /tmp/nginx/lameducked ) {
            return 503 'lameducked';
          }

          if ( -f /tmp/nginx/app_lameducked ) {
            return 503 'app lameducked';
          }

          if ( !-f /var/lib/google/ae/disk_not_full ) {
            return 503 'disk full';
          }

          return 200 'ok';        
      }
    }
    include /var/lib/nginx/extra/*.conf;
  }
  server {
    # expose /nginx_status but on a different port (8090) to avoid
    # external visibility / conflicts with the app.
    listen 8090;
    location /nginx_status {
      stub_status on;
      access_log off;
    }
    location / {
      root /dev/null;
    }
  }

  # Add session affinity entry to log_format line i.i.f. the GCLB cookie
  # is present.
  map $cookie_gclb $session_affinity_log_entry {
    '' '';
    default sessionAffinity=$cookie_gclb;
  }

  # Output nginx access logs in the standard format, plus additional custom
  # fields containing "X-Cloud-Trace-Context" header, the current epoch
  # timestamp, the request latency, and "X-Forwarded-For" at the end.
  # If you make changes to the log format below, you MUST validate this against
  # the parsing regex at:
  # GoogleCloudPlatform/appengine-sidecars-docker/fluentd_logger/managed_vms.conf
  # (In general, adding to the end of the list does not require a change if the
  # field does not need to be logged.)

  log_format custom '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent" '
                    'tracecontext="$http_x_cloud_trace_context" '
                    'timestampSeconds="${msec}000000" '
                    'latencySeconds="$request_time" '
                    'x-forwarded-for="$http_x_forwarded_for" '
                    'appLatencySeconds="$upstream_response_time" '
                    'appStatusCode="$upstream_status" '
                    'upgrade="$http_upgrade" '

                      'iap_jwt_action="$iap_jwt_action" '

                    '$session_affinity_log_entry';

  access_log /var/log/nginx/access.log custom;
  error_log /var/log/nginx/error.log warn;
}

構成

fluentd は詳しくないけど、GAE のタイプの記述だとファイルに吐かれたログを読み取って投げるっぽい。

docker の log-driver を fluentd にする場合は forward input とあるので、fluentd が待ち受けている特定のポートにログを投げつけるようなイメージでいいのかな?

いろいろやってできたのがこれ。App Engine の設定ファイルをベースに改変した。

ファイルはGitHubから見れる。

参考