daft.read_json

daft.read_json(path: Union[str, List[str]], schema_hints: Optional[Dict[str, DataType]] = None, infer_schema: bool = True, schema: Optional[Dict[str, DataType]] = None, io_config: Optional[IOConfig] = None, use_native_downloader: bool = True, _buffer_size: Optional[int] = None, _chunk_size: Optional[int] = None) -> DataFrame

Creates a DataFrame from one or more line-delimited JSON files.

Example

>>> df = daft.read_json("/path/to/file.json")
>>> df = daft.read_json("/path/to/directory")
>>> df = daft.read_json("/path/to/files-*.json")
>>> df = daft.read_json("s3://path/to/files-*.json")
Parameters:
  • path (str or list[str]) – Path, or list of paths, to the JSON files (wildcards are allowed)

  • schema_hints (dict[str, DataType]) –

    A mapping from column names to datatypes. Passing this option overrides the listed columns of the inferred schema with the specified DataTypes.

    Deprecated since version 0.2.28.

    Schema hints are deprecated and will be removed in the next release. Please use schema and infer_schema instead.

  • infer_schema (bool) – Whether to infer the schema of the JSON, defaults to True.

  • schema (dict[str, DataType]) – The definitive schema for the JSON if infer_schema is False; otherwise, a schema hint that is applied after the schema is inferred.

  • io_config (IOConfig) – Config to be used with the native downloader

  • use_native_downloader (bool) – Whether to use the native downloader instead of PyArrow for reading JSON, defaults to True. This is currently experimental.
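
The interaction between schema and infer_schema described above can be sketched as follows. This is a minimal illustration, not an official example: the helper names write_json_lines and read_with_explicit_schema, the file name, and the column names id and name are all hypothetical. The sketch assumes a Daft version in which read_json accepts the schema parameter.

```python
import json
import os
import tempfile


def write_json_lines(path, records):
    """Write one JSON object per line (the line-delimited layout read_json reads)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_with_explicit_schema(path):
    """Read line-delimited JSON without inference, using a fixed schema."""
    # Imported inside the function so the helper above works without Daft installed.
    import daft
    from daft import DataType

    # Assumption: the columns "id" and "name" are hypothetical and must match the file.
    # With infer_schema=False, `schema` is used as the definitive schema.
    return daft.read_json(
        path,
        infer_schema=False,
        schema={"id": DataType.int64(), "name": DataType.string()},
    )


# Usage (requires Daft to be installed):
#   path = os.path.join(tempfile.mkdtemp(), "rows.json")
#   write_json_lines(path, [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
#   df = read_with_explicit_schema(path)
#   df.show()
```

Leaving infer_schema at its default of True while still passing schema would instead treat the mapping as a hint applied on top of the inferred schema, as the parameter description states.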

Returns:

parsed DataFrame

Return type:

DataFrame