daft.read_csv

Contents

daft.read_csv#

daft.read_csv(path: Union[str, List[str]], schema_hints: Optional[Dict[str, DataType]] = None, has_headers: bool = True, column_names: Optional[List[str]] = None, delimiter: Optional[str] = None, double_quote: bool = True, quote: Optional[str] = None, escape_char: Optional[str] = None, comment: Optional[str] = None, io_config: Optional[IOConfig] = None, use_native_downloader: bool = True, _buffer_size: Optional[int] = None, _chunk_size: Optional[int] = None) DataFrame[source]#

Creates a DataFrame from CSV file(s)

Example

>>> df = daft.read_csv("/path/to/file.csv")
>>> df = daft.read_csv("/path/to/directory")
>>> df = daft.read_csv("/path/to/files-*.csv")
>>> df = daft.read_csv("s3://path/to/files-*.csv")
Parameters:
  • path (str) – Path to CSV (allows for wildcards)

  • schema_hints (dict[str, DataType]) – A mapping between column names and datatypes - passing this option will override the specified columns on the inferred schema with the specified DataTypes

  • has_headers (bool) – Whether the CSV has a header or not, defaults to True

  • delimiter (Str) – Delimiter used in the CSV, defaults to “,”

  • doubled_quote (bool) – Whether to support double quote escapes, defaults to True

  • escape_char (str) – Character to use as the escape character for double quotes, or defaults to "

  • comment (str) – Character to treat as the start of a comment line, or None to not support comments

  • io_config (IOConfig) – Config to be used with the native downloader

  • use_native_downloader – Whether to use the native downloader instead of PyArrow for reading Parquet. This is currently experimental.

Returns:

parsed DataFrame

Return type:

DataFrame