daft.expressions.expressions.ExpressionUrlNamespace.download

ExpressionUrlNamespace.download(max_connections: int = 32, on_error: Literal['raise'] | Literal['null'] = 'raise', fs: fsspec.AbstractFileSystem | None = None, io_config: IOConfig | None = None, use_native_downloader: bool = False) → Expression

Treats each string as a URL and downloads its contents into a bytes column.

Parameters
  • max_connections – The maximum number of connections to use per thread for downloading URLs. Defaults to 32.

  • on_error – Behavior when a URL download error is encountered - “raise” to raise the error immediately, or “null” to log the error and fall back to a Null value. Defaults to “raise”.

  • fs (fsspec.AbstractFileSystem) – fsspec FileSystem to use for downloading data. By default, Daft will automatically construct a FileSystem instance internally.

  • use_native_downloader (bool) – Use the native downloader rather than the Python-based one. Defaults to False.
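
The on_error and max_connections semantics described above can be illustrated with a small standard-library sketch. This is not Daft's implementation: the helper names `fetch` and `download_all` are hypothetical, and `max_connections` is simplified here to a total cap on worker threads rather than a per-thread connection limit.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional
from urllib.request import urlopen


def fetch(url: str, on_error: str = "raise") -> Optional[bytes]:
    """Download one URL, mimicking the documented on_error modes (hypothetical helper)."""
    if on_error not in ("raise", "null"):
        raise ValueError(f"on_error must be 'raise' or 'null', got {on_error!r}")
    try:
        with urlopen(url) as response:
            return response.read()
    except Exception as exc:
        if on_error == "raise":
            # "raise": surface the error immediately.
            raise
        # "null": log the error and fall back to a null value.
        logging.warning("download failed for %r: %s", url, exc)
        return None


def download_all(
    urls: List[str], max_connections: int = 32, on_error: str = "raise"
) -> List[Optional[bytes]]:
    """Download every URL with at most max_connections concurrent workers."""
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(lambda u: fetch(u, on_error), urls))
```

With on_error="null", a failed URL produces None at its position in the result while the other downloads still succeed; with on_error="raise", the first failure propagates to the caller.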

Returns

a Binary expression containing the bytes contents of each URL, or None if an error occurred during download and on_error is “null”

Return type

Expression