daft.read_csv

daft.read_csv(path: Union[str, List[str]], schema_hints: Optional[Dict[str, DataType]] = None, infer_schema: bool = True, schema: Optional[Dict[str, DataType]] = None, has_headers: bool = True, delimiter: Optional[str] = None, double_quote: bool = True, quote: Optional[str] = None, escape_char: Optional[str] = None, comment: Optional[str] = None, allow_variable_columns: bool = False, io_config: Optional[IOConfig] = None, use_native_downloader: bool = True, _buffer_size: Optional[int] = None, _chunk_size: Optional[int] = None) → DataFrame

Creates a DataFrame from CSV file(s)

Example

>>> df = daft.read_csv("/path/to/file.csv")
>>> df = daft.read_csv("/path/to/directory")
>>> df = daft.read_csv("/path/to/files-*.csv")
>>> df = daft.read_csv("s3://path/to/files-*.csv")
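
The signature types path as Union[str, List[str]], so a list of paths should also be accepted. The following is a minimal sketch of that usage, not an official example: the helper names write_parts and read_parts and the file names are hypothetical, and Daft is imported lazily so the file-writing helper runs without it.

```python
import csv
import os
from typing import List


def write_parts(directory: str, num_parts: int = 2) -> List[str]:
    """Write num_parts small CSV files that share one header; return their paths."""
    paths = []
    for i in range(num_parts):
        path = os.path.join(directory, f"files-{i}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "value"])
            writer.writerow([i, f"row-{i}"])
        paths.append(path)
    return paths


def read_parts(directory: str):
    """Read the same files two ways: as an explicit list and through a wildcard."""
    import daft  # requires Daft to be installed

    paths = write_parts(directory)
    df_from_list = daft.read_csv(paths)
    df_from_glob = daft.read_csv(os.path.join(directory, "files-*.csv"))
    return df_from_list, df_from_glob
```

Calling read_parts on a temporary directory should give two DataFrames over the same rows, assuming all files share a compatible schema.
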
Parameters:
  • path (str or list[str]) – Path, or list of paths, to the CSV file(s); wildcards are allowed

  • schema_hints (dict[str, DataType]) –

    A mapping from column names to DataTypes. After the schema is inferred, the listed columns are overridden with the DataTypes given here.

    Deprecated since version 0.2.27: Schema hints are deprecated and will be removed in the next release. Please use schema and infer_schema instead.

  • infer_schema (bool) – Whether to infer the schema of the CSV, defaults to True.

  • schema (dict[str, DataType]) – A schema that is used as the definitive schema for the CSV if infer_schema is False, otherwise it is used as a schema hint that is applied after the schema is inferred.

  • has_headers (bool) – Whether the CSV has a header or not, defaults to True

  • delimiter (str) – Delimiter used in the CSV, defaults to “,”

  • double_quote (bool) – Whether to support double quote escapes (a quote character written twice inside a quoted field), defaults to True

  • escape_char (str) – Character to use as the escape character for double quotes; if not given, defaults to "

  • comment (str) – Character to treat as the start of a comment line, or None to not support comments

  • allow_variable_columns (bool) – Whether to allow a variable number of columns in the CSV, defaults to False. If set to True, Daft will append nulls to rows with fewer columns than the schema, and ignore extra columns in rows with more columns

  • io_config (IOConfig) – Config to be used with the native downloader

  • use_native_downloader (bool) – Whether to use the native downloader instead of PyArrow for reading CSV, defaults to True. This is currently experimental.
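
The row handling described for allow_variable_columns can be illustrated in plain Python. This is only a sketch of the documented rule (pad short rows with nulls, drop extra fields), not Daft's implementation; normalize_row and read_ragged are hypothetical helper names.

```python
from typing import List, Optional, Sequence


def normalize_row(row: Sequence[str], num_columns: int) -> List[Optional[str]]:
    """Pad a short row with None and drop any fields beyond num_columns."""
    padded: List[Optional[str]] = list(row[:num_columns])
    padded.extend([None] * (num_columns - len(padded)))
    return padded


def read_ragged(path: str):
    """Let Daft apply the equivalent rule while reading a ragged CSV."""
    import daft  # requires Daft to be installed

    return daft.read_csv(path, allow_variable_columns=True)
```

With a three-column schema, normalize_row(["1", "a"], 3) yields ["1", "a", None], and a four-field row is cut back to its first three fields.
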

Returns:

parsed DataFrame

Return type:

DataFrame
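
As a sketch of how schema, infer_schema, and delimiter might be combined, the example below writes a small pipe-delimited file and reads it with a fixed schema. The file contents and the helper names write_sample_csv and read_sample are hypothetical; the DataType constructors are assumed to match the installed Daft version.

```python
import csv
import os


def write_sample_csv(directory: str) -> str:
    """Write a small pipe-delimited CSV with a header row and return its path."""
    path = os.path.join(directory, "sample.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="|")
        writer.writerow(["id", "name", "score"])
        writer.writerow([1, "ada", 9.5])
        writer.writerow([2, "grace", 8.0])
    return path


def read_sample(path: str):
    """Read the file with an explicit schema instead of inferring one."""
    import daft  # requires Daft to be installed
    from daft import DataType

    df = daft.read_csv(
        path,
        infer_schema=False,  # use `schema` as the definitive schema
        schema={
            "id": DataType.int64(),
            "name": DataType.string(),
            "score": DataType.float64(),
        },
        delimiter="|",
    )
    # Daft DataFrames are lazy; collect() materializes the rows.
    return df.collect()
```

Leaving infer_schema at its default of True while still passing schema would instead apply the given types as hints on top of the inferred schema, as described for the schema parameter.
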