daft.io.S3Config


class S3Config(region_name=None, endpoint_url=None, key_id=None, session_token=None, access_key=None, credentials_provider=None, buffer_time=None, max_connections=None, retry_initial_backoff_ms=None, connect_timeout_ms=None, read_timeout_ms=None, num_tries=None, retry_mode=None, anonymous=None, use_ssl=None, verify_ssl=None, check_hostname_ssl=None, requester_pays=None, force_virtual_addressing=None, profile_name=None)

Creates a configuration to be used when accessing an S3-compatible system.

Parameters:
  • region_name (str, optional) – Name of the region to use when accessing AWS S3, defaults to “us-east-1”. If the wrong region is provided, Daft will attempt to auto-detect the bucket’s region at the cost of extra S3 requests.

  • endpoint_url (str, optional) – URL to the S3 endpoint, defaults to the AWS endpoints

  • key_id (str, optional) – AWS Access Key ID, defaults to auto-detection from the current environment

  • access_key (str, optional) – AWS Secret Access Key, defaults to auto-detection from the current environment

  • credentials_provider (Callable[[], S3Credentials], optional) – Custom credentials provider function, which should return an S3Credentials object

  • buffer_time (int, optional) – Number of seconds before the actual expiration time at which credentials returned by credentials_provider are already considered expired, defaults to 10s

  • max_connections (int, optional) – Maximum number of connections to S3 at any time, defaults to 64

  • session_token (str, optional) – AWS Session Token, required only if key_id and access_key are temporary credentials

  • retry_initial_backoff_ms (int, optional) – Initial backoff duration in milliseconds for an S3 retry, defaults to 1000ms

  • connect_timeout_ms (int, optional) – Timeout in milliseconds for establishing a connection to S3, defaults to 10 seconds

  • read_timeout_ms (int, optional) – Timeout in milliseconds for reading the first byte from S3, defaults to 10 seconds

  • num_tries (int, optional) – Number of attempts to make a connection, defaults to 5

  • retry_mode (str, optional) – Retry mode to use when a request fails; currently supported values are “standard” and “adaptive”, defaults to “adaptive”

  • anonymous (bool, optional) – Whether or not to use “anonymous mode”, which will access S3 without any credentials

  • use_ssl (bool, optional) – Whether or not to use SSL, which requires accessing S3 over HTTPS rather than HTTP, defaults to True

  • verify_ssl (bool, optional) – Whether or not to verify SSL certificates. If False, S3 is accessed without checking whether the certificates are valid. Defaults to True

  • check_hostname_ssl (bool, optional) – Whether or not to verify the hostname when verifying SSL certificates (this was the legacy behavior for OpenSSL), defaults to True

  • requester_pays (bool, optional) – Whether or not the authenticated user will assume transfer costs, which is required by some providers of bulk data, defaults to False

  • force_virtual_addressing (bool, optional) – Force the S3 client to use virtual addressing in all cases. If False, virtual addressing is used only when endpoint_url is empty. Defaults to False

  • profile_name (str, optional) – Name of the AWS profile to load. Defaults to None, in which case the AWS_PROFILE environment variable is checked before falling back to the default profile
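
The interaction between credentials_provider and buffer_time can be pictured with a small stand-alone sketch. It uses only the standard library: FakeCredentials is a hypothetical stand-in for S3Credentials (its expiry field is assumed for illustration), and is_considered_expired is a hypothetical helper, not part of Daft. It shows the rule described above: credentials count as expired once the current time is within buffer_time seconds of their actual expiration. How Daft treats the exact boundary is not stated here, so the `>=` comparison is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class FakeCredentials:
    """Hypothetical stand-in for S3Credentials, for illustration only."""
    key_id: str
    access_key: str
    expiry: Optional[datetime] = None  # assumed field; None means "never expires"


def is_considered_expired(creds, now, buffer_time=10):
    """Return True once `now` is within `buffer_time` seconds of the expiry."""
    if creds.expiry is None:
        return False
    return now >= creds.expiry - timedelta(seconds=buffer_time)


expiry = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
creds = FakeCredentials("xxx", "xxx", expiry=expiry)

# 30 seconds before expiry: still usable with the default 10 s buffer.
still_valid = not is_considered_expired(creds, expiry - timedelta(seconds=30))
# 5 seconds before expiry: inside the buffer, so already treated as expired.
in_buffer = is_considered_expired(creds, expiry - timedelta(seconds=5))
```

A larger buffer_time trades a little credential lifetime for a lower chance that a request is sent with credentials that expire in flight.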

Example

>>> import daft
>>> from daft.io import IOConfig, S3Config
>>> io_config = IOConfig(s3=S3Config(key_id="xxx", access_key="xxx"))
>>> daft.read_parquet("s3://some-path", io_config=io_config)
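
For an S3-compatible store other than AWS, the same pattern applies with endpoint_url set. The sketch below is illustrative only: the helper names, the local endpoint URL, and the choice to derive use_ssl from the URL scheme are assumptions, not Daft behavior. daft is imported lazily inside make_io_config so that the keyword arguments can be inspected on their own.

```python
from urllib.parse import urlparse


def s3_compatible_kwargs(endpoint_url, key_id, access_key):
    """Build S3Config keyword arguments for a non-AWS, S3-compatible endpoint.

    Hypothetical helper: use_ssl is derived from the URL scheme as a convenience.
    """
    return {
        "endpoint_url": endpoint_url,
        "key_id": key_id,
        "access_key": access_key,
        "use_ssl": urlparse(endpoint_url).scheme == "https",
    }


def make_io_config(**s3_kwargs):
    """Wrap the keyword arguments in an IOConfig (requires daft to be installed)."""
    from daft.io import IOConfig, S3Config

    return IOConfig(s3=S3Config(**s3_kwargs))


# Placeholder endpoint and credentials; replace with real values.
kwargs = s3_compatible_kwargs("http://localhost:9000", "xxx", "xxx")
```

Passing make_io_config(**kwargs) as io_config to daft.read_parquet would then direct reads to that endpoint instead of AWS.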

Methods

__init__()

from_env()

Creates an S3Config from the current environment, auto-discovering variables such as credentials, regions and more.

replace([region_name, endpoint_url, key_id, ...])

Attributes

access_key

AWS Secret Access Key

anonymous

AWS Anonymous Mode

buffer_time

AWS Buffer Time in Seconds

check_hostname_ssl

AWS Check SSL Hostname

connect_timeout_ms

AWS Connection Timeout in Milliseconds

credentials_provider

Custom credentials provider function

endpoint_url

S3-compatible endpoint to use

force_virtual_addressing

AWS force virtual addressing

key_id

AWS Access Key ID

max_connections

AWS max connections per IO thread

num_tries

AWS Number of Tries

profile_name

AWS profile name

read_timeout_ms

AWS Read Timeout in Milliseconds

region_name

Region to use when accessing AWS S3

requester_pays

AWS Requester Pays

retry_initial_backoff_ms

AWS Retry Initial Backoff Time in Milliseconds

retry_mode

AWS Retry Mode

session_token

AWS Session Token

use_ssl

AWS Use SSL

verify_ssl

AWS Verify SSL
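
The lookup order documented for profile_name (an explicit argument, then the AWS_PROFILE environment variable, then the default profile) can be sketched with the standard library alone. resolve_profile is a hypothetical helper written for illustration, and the literal name "default" for the fallback profile is an assumption based on the usual AWS convention; Daft's internal resolution may differ in detail.

```python
import os


def resolve_profile(profile_name=None, environ=None):
    """Pick a profile: explicit name, then AWS_PROFILE, then "default"."""
    environ = os.environ if environ is None else environ
    if profile_name is not None:
        return profile_name
    # An unset or empty AWS_PROFILE falls through to the default profile.
    return environ.get("AWS_PROFILE") or "default"


explicit = resolve_profile("analytics", {"AWS_PROFILE": "dev"})
from_env = resolve_profile(None, {"AWS_PROFILE": "dev"})
fallback = resolve_profile(None, {})
```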