daft.io.S3Config
class S3Config(region_name=None, endpoint_url=None, key_id=None, session_token=None, access_key=None, credentials_provider=None, buffer_time=None, max_connections=None, retry_initial_backoff_ms=None, connect_timeout_ms=None, read_timeout_ms=None, num_tries=None, retry_mode=None, anonymous=None, use_ssl=None, verify_ssl=None, check_hostname_ssl=None, requester_pays=None, force_virtual_addressing=None, profile_name=None)
Create configurations to be used when accessing an S3-compatible system.
Parameters:
region_name (str, optional) – Name of the region to use when accessing AWS S3, defaults to “us-east-1”. If an incorrect region is provided, Daft will attempt to auto-detect the bucket’s region at the cost of extra S3 requests.
endpoint_url (str, optional) – URL of the S3 endpoint, defaults to the AWS S3 endpoints
key_id (str, optional) – AWS Access Key ID, defaults to auto-detection from the current environment
access_key (str, optional) – AWS Secret Access Key, defaults to auto-detection from the current environment
credentials_provider (Callable[[], S3Credentials], optional) – Custom credentials provider function, which should return an S3Credentials object
buffer_time (int, optional) – Amount of time, in seconds, before the actual credential expiration time at which credentials given by credentials_provider are considered expired, defaults to 10s
max_connections (int, optional) – Maximum number of connections to S3 at any time, defaults to 64
session_token (str, optional) – AWS Session Token, required only if key_id and access_key are temporary credentials
retry_initial_backoff_ms (int, optional) – Initial backoff duration in milliseconds for an S3 retry, defaults to 1000ms
connect_timeout_ms (int, optional) – Timeout in milliseconds to wait when establishing a connection to S3, defaults to 10 seconds (10000ms)
read_timeout_ms (int, optional) – Timeout in milliseconds to wait for the first byte of a read from S3, defaults to 10 seconds (10000ms)
num_tries (int, optional) – Number of attempts to make a connection, defaults to 5
retry_mode (str, optional) – Retry mode to use when a request fails; currently supported values are standard and adaptive, defaults to adaptive
anonymous (bool, optional) – Whether or not to use “anonymous mode”, which will access S3 without any credentials
use_ssl (bool, optional) – Whether or not to use SSL, which requires accessing S3 over HTTPS rather than HTTP, defaults to True
verify_ssl (bool, optional) – Whether or not to verify SSL certificates; if False, S3 is accessed without checking whether the certificates are valid, defaults to True
check_hostname_ssl (bool, optional) – Whether or not to verify the hostname when verifying SSL certificates (the legacy behavior of OpenSSL), defaults to True
requester_pays (bool, optional) – Whether or not the authenticated user will assume transfer costs, which is required by some providers of bulk data, defaults to False
force_virtual_addressing (bool, optional) – Force the S3 client to use virtual addressing in all cases. If False, virtual addressing is only used if endpoint_url is empty, defaults to False
profile_name (str, optional) – Name of the AWS profile to load, defaults to None, in which case the AWS_PROFILE environment variable is checked, falling back to the default profile
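The buffer_time rule can be illustrated with a small standalone check. This is a hypothetical helper (is_considered_expired is not part of Daft) that assumes the credential expiry is available as a timezone-aware datetime; it only mirrors the documented behavior that credentials from credentials_provider are treated as expired buffer_time seconds early, and is not Daft's refresh logic.

```python
from datetime import datetime, timedelta, timezone


def is_considered_expired(expiry: datetime, now: datetime, buffer_time: int = 10) -> bool:
    """Hypothetical helper: True if credentials expiring at `expiry` should be refreshed at `now`.

    Credentials count as expired once `now` is within `buffer_time` seconds
    of the real expiration time, mirroring the documented buffer_time rule.
    """
    return now >= expiry - timedelta(seconds=buffer_time)


expiry = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(is_considered_expired(expiry, expiry - timedelta(seconds=30)))  # False: 30s left
print(is_considered_expired(expiry, expiry - timedelta(seconds=5)))   # True: inside the 10s buffer
```

Refreshing slightly early means a request started just before expiry does not go out with credentials that lapse mid-flight.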
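To show how num_tries and retry_initial_backoff_ms interact, the sketch below assumes a simple doubling backoff with no jitter and no cap. That schedule is an assumption for illustration only; the actual delays, and how the adaptive retry mode adjusts them, are not specified above. Because num_tries counts total attempts, the default of 5 allows at most 4 retries and therefore at most 4 waits.

```python
from typing import List


def backoff_schedule_ms(num_tries: int = 5, retry_initial_backoff_ms: int = 1000) -> List[int]:
    """Hypothetical schedule of waits (ms) between attempts, ASSUMING the wait doubles each retry.

    `num_tries` is the total number of attempts, so there are num_tries - 1 waits.
    Real retry behavior (jitter, caps, adaptive rate limiting) is not modeled.
    """
    return [retry_initial_backoff_ms * 2**i for i in range(max(num_tries - 1, 0))]


print(backoff_schedule_ms())  # [1000, 2000, 4000, 8000]
```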
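The force_virtual_addressing rule decides whether the bucket name goes into the hostname (virtual-hosted style) or into the URL path (path style). The hypothetical object_url helper below applies the documented rule: virtual addressing when endpoint_url is empty or force_virtual_addressing is True, path style otherwise. The AWS hostname pattern and the example endpoints are illustrative assumptions, and use_ssl is not modeled; this is not Daft's request-building code.

```python
from typing import Optional
from urllib.parse import urlsplit


def object_url(
    bucket: str,
    key: str,
    region_name: str = "us-east-1",
    endpoint_url: Optional[str] = None,
    force_virtual_addressing: bool = False,
) -> str:
    """Hypothetical helper: build an object URL following the documented addressing rule."""
    if not endpoint_url:
        # No custom endpoint: AWS S3, virtual-hosted style (bucket in the hostname).
        return f"https://{bucket}.s3.{region_name}.amazonaws.com/{key}"
    parts = urlsplit(endpoint_url)  # keep the endpoint's own scheme and host:port
    if force_virtual_addressing:
        # Custom endpoint, forced virtual-hosted style.
        return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"
    # Custom endpoint, default path style (bucket in the path).
    return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"


print(object_url("my-bucket", "a/b.parquet"))
# https://my-bucket.s3.us-east-1.amazonaws.com/a/b.parquet
print(object_url("my-bucket", "a/b.parquet", endpoint_url="http://localhost:9000"))
# http://localhost:9000/my-bucket/a/b.parquet
```

Path style is the usual default for self-hosted S3-compatible servers, which often have no per-bucket DNS names; forcing virtual addressing is for providers that require it.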
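The fallback order for profile_name can be written out as a short function. resolve_profile is a hypothetical helper that takes the environment as a mapping so the lookup is easy to test; it sketches the documented precedence only and is not Daft's credential-loading code.

```python
import os
from typing import Mapping, Optional


def resolve_profile(profile_name: Optional[str] = None, env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: pick the AWS profile in the documented order.

    1. An explicit `profile_name`, if not None.
    2. Otherwise the AWS_PROFILE environment variable, if set to a non-empty value.
    3. Otherwise the profile named "default".
    """
    env = os.environ if env is None else env
    if profile_name is not None:
        return profile_name
    return env.get("AWS_PROFILE") or "default"


print(resolve_profile(env={"AWS_PROFILE": "dev"}))  # dev
print(resolve_profile(env={}))                      # default
```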
Example
>>> import daft
>>> from daft.io import IOConfig, S3Config
>>> io_config = IOConfig(s3=S3Config(key_id="xxx", access_key="xxx"))
>>> daft.read_parquet("s3://some-path", io_config=io_config)
__init__()
Methods
__init__()
from_env() – Creates an S3Config from the current environment, auto-discovering variables such as credentials, regions and more.
replace([region_name, endpoint_url, key_id, ...])
Attributes
access_key – AWS Secret Access Key
anonymous – AWS Anonymous Mode
buffer_time – AWS Buffer Time in Seconds
check_hostname_ssl – AWS Check SSL Hostname
connect_timeout_ms – AWS Connection Timeout in Milliseconds
credentials_provider – Custom credentials provider function
endpoint_url – S3-compatible endpoint to use
force_virtual_addressing – AWS Force Virtual Addressing
key_id – AWS Access Key ID
max_connections – AWS max connections per IO thread
num_tries – AWS Number of Tries
profile_name – AWS Profile Name
read_timeout_ms – AWS Read Timeout in Milliseconds
region_name – Region to use when accessing AWS S3
requester_pays – AWS Requester Pays
retry_initial_backoff_ms – AWS Retry Initial Backoff Time in Milliseconds
retry_mode – AWS Retry Mode
session_token – AWS Session Token
use_ssl – AWS Use SSL
verify_ssl – AWS Verify SSL