daft.io.GCSConfig

class GCSConfig(project_id=None, credentials=None, token=None, anonymous=None, max_connections=None, retry_initial_backoff_ms=None, connect_timeout_ms=None, read_timeout_ms=None, num_tries=None)

Create configurations to be used when accessing Google Cloud Storage.

Credentials may be provided directly with the credentials parameter, or set with the GOOGLE_APPLICATION_CREDENTIALS_JSON or GOOGLE_APPLICATION_CREDENTIALS environment variables.

Parameters:
  • project_id (str, optional) – Google Project ID, defaults to the value in the credentials file or from the Google Cloud metadata service

  • credentials (str, optional) – Path to credentials file or JSON string with credentials

  • token (str, optional) – OAuth2 token to use for authentication. You likely want to use credentials instead, since credentials can be used to refresh the token. This parameter is mainly intended for tokens vended by a data catalog.

  • anonymous (bool, optional) – Whether or not to use “anonymous mode”, which will access Google Cloud Storage without any credentials. Defaults to False

  • max_connections (int, optional) – Maximum number of connections to GCS at any time per I/O thread, defaults to 8

  • retry_initial_backoff_ms (int, optional) – Initial backoff duration in milliseconds for a GCS retry, defaults to 1000 ms

  • connect_timeout_ms (int, optional) – Timeout duration in milliseconds to wait to make a connection to GCS, defaults to 30000 ms (30 seconds)

  • read_timeout_ms (int, optional) – Timeout duration in milliseconds to wait to read the first byte from GCS, defaults to 30000 ms (30 seconds)

  • num_tries (int, optional) – Number of attempts to make a connection, defaults to 5

Example

>>> import daft
>>> from daft.io import GCSConfig, IOConfig
>>> io_config = IOConfig(gcs=GCSConfig(anonymous=True))
>>> daft.read_parquet("gs://some-path", io_config=io_config)

__init__()

Methods

__init__()

replace([project_id, credentials, token, ...])

Attributes

anonymous

Whether to use anonymous mode

connect_timeout_ms

credentials

Credentials file path or string to use when accessing Google Cloud Storage

max_connections

num_tries

project_id

Project ID to use when accessing Google Cloud Storage

read_timeout_ms

retry_initial_backoff_ms

token

OAuth2 token to use when accessing Google Cloud Storage