daft.read_json

read_json(path: Union[str, List[str]], infer_schema: bool = True, schema: Optional[Dict[str, DataType]] = None, io_config: Optional[IOConfig] = None, file_path_column: Optional[str] = None, hive_partitioning: bool = False, use_native_downloader: bool = True, schema_hints: Optional[Dict[str, DataType]] = None, _buffer_size: Optional[int] = None, _chunk_size: Optional[int] = None) -> DataFrame

Creates a DataFrame from line-delimited JSON file(s).

Example

>>> df = daft.read_json("/path/to/file.json")
>>> df = daft.read_json("/path/to/directory")
>>> df = daft.read_json("/path/to/files-*.json")
>>> df = daft.read_json("s3://path/to/files-*.json")
Parameters:
  • path (str or list[str]) – Path to JSON file(s), a directory, or a glob pattern (allows for wildcards); a list of such paths is also accepted

  • infer_schema (bool) – Whether to infer the schema of the JSON, defaults to True.

  • schema (dict[str, DataType]) – A schema to use as the definitive schema for the JSON if infer_schema is False; otherwise it is applied as a schema hint after the schema is inferred (see the example after this list).

  • io_config (IOConfig) – Config to be used with the native downloader (see the example at the end of this section).

  • file_path_column – Include the source path(s) as a column with this name. Defaults to None.

  • hive_partitioning – Whether to infer Hive-style partitions from file paths and include them as columns in the DataFrame. Defaults to False.

  • use_native_downloader – Whether to use the native downloader instead of PyArrow for reading JSON. This is currently experimental.
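
As a sketch of how infer_schema and schema interact, the following disables inference and supplies the definitive schema (the column names "id" and "name" are hypothetical):

>>> import daft
>>> from daft import DataType
>>> # infer_schema=False makes the provided schema definitive rather than a hint
>>> df = daft.read_json(
...     "/path/to/file.json",
...     infer_schema=False,
...     schema={"id": DataType.int64(), "name": DataType.string()},
... )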

Returns:

parsed DataFrame

Return type:

DataFrame
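
When reading from object storage, an IOConfig can carry region and credential settings for the native downloader. A minimal sketch, assuming anonymous access to a public bucket (the bucket path is hypothetical):

>>> from daft.io import IOConfig, S3Config
>>> # Configure S3 access for the native downloader
>>> io_config = IOConfig(s3=S3Config(region_name="us-west-2", anonymous=True))
>>> df = daft.read_json("s3://bucket/files-*.json", io_config=io_config)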