daft.io.S3Config

class daft.io.S3Config(region_name=None, endpoint_url=None, key_id=None, session_token=None, access_key=None, max_connections=None, retry_initial_backoff_ms=None, connect_timeout_ms=None, read_timeout_ms=None, num_tries=None, retry_mode=None, anonymous=None, use_ssl=None, verify_ssl=None, check_hostname_ssl=None, requester_pays=None, force_virtual_addressing=None)

Create configurations to be used when accessing an S3-compatible system.

Parameters:
  • region_name – Name of the region to be used (used when accessing AWS S3), defaults to “us-east-1”. If wrongly provided, Daft will attempt to auto-detect the buckets’ region at the cost of extra S3 requests.

  • endpoint_url – URL to the S3 endpoint, defaults to the AWS endpoints

  • key_id – AWS Access Key ID, defaults to auto-detection from the current environment

  • access_key – AWS Secret Access Key, defaults to auto-detection from the current environment

  • max_connections – Maximum number of connections to S3 at any time, defaults to 64

  • session_token – AWS Session Token, required only if key_id and access_key are temporary credentials

  • retry_initial_backoff_ms – Initial backoff duration in milliseconds for an S3 retry, defaults to 1000ms

  • connect_timeout_ms – Timeout in milliseconds for establishing a connection to S3, defaults to 10 seconds (10000 ms)

  • read_timeout_ms – Timeout in milliseconds for reading the first byte from S3, defaults to 10 seconds (10000 ms)

  • num_tries – Number of attempts to make a connection, defaults to 5

  • retry_mode – Retry mode when a request fails; currently supported values are standard and adaptive, defaults to adaptive

  • anonymous – Whether or not to use “anonymous mode”, which will access S3 without any credentials

  • use_ssl – Whether or not to use SSL, which requires accessing S3 over HTTPS rather than HTTP, defaults to True

  • verify_ssl – Whether or not to verify SSL certificates; if False, S3 is accessed without checking that the certificates are valid, defaults to True

  • check_hostname_ssl – Whether or not to verify the hostname when verifying SSL certificates (this was the legacy behavior for OpenSSL), defaults to True

  • requester_pays – Whether or not the authenticated user will assume transfer costs, which is required by some providers of bulk data, defaults to False

  • force_virtual_addressing – Force S3 client to use virtual addressing in all cases. If False, virtual addressing will only be used if endpoint_url is empty, defaults to False

Example

>>> import daft
>>> from daft.io import IOConfig, S3Config
>>> io_config = IOConfig(s3=S3Config(key_id="xxx", access_key="xxx"))
>>> daft.read_parquet("s3://some-path", io_config=io_config)

Methods

__init__()

from_env()

Creates an S3Config from the current environment, auto-discovering variables such as credentials, regions and more.

replace([region_name, endpoint_url, key_id, ...])

Attributes

access_key

AWS Secret Access Key

anonymous

AWS Anonymous Mode

check_hostname_ssl

AWS Check SSL Hostname

connect_timeout_ms

AWS Connection Timeout in Milliseconds

endpoint_url

S3-compatible endpoint to use

force_virtual_addressing

AWS force virtual addressing

key_id

AWS Access Key ID

max_connections

AWS max connections per IO thread

num_tries

AWS Number of Retries

read_timeout_ms

AWS Read Timeout in Milliseconds

region_name

Region to use when accessing AWS S3

requester_pays

AWS Requester Pays

retry_initial_backoff_ms

AWS Retry Initial Backoff Time in Milliseconds

retry_mode

AWS Retry Mode

session_token

AWS Session Token

use_ssl

AWS Use SSL

verify_ssl

AWS Verify SSL