StructType

- class pyspark.sql.types.StructType(fields=None)
Struct type, consisting of a list of StructField. This is the data type representing a Row.

Iterating a StructType will iterate over its StructFields. A contained StructField can be accessed by its name or position.

Examples
>>> from pyspark.sql.types import *
>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct1["f1"]
StructField('f1', StringType(), True)
>>> struct1[0]
StructField('f1', StringType(), True)
>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", CharType(10), True)])
>>> struct2 = StructType([StructField("f1", CharType(10), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", VarcharType(10), True)])
>>> struct2 = StructType([StructField("f1", VarcharType(10), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct2 = StructType([StructField("f1", StringType(), True),
...     StructField("f2", IntegerType(), False)])
>>> struct1 == struct2
False
The example below demonstrates how to create a DataFrame from a schema built with StructType and StructField:
>>> data = [("Alice", ["Java", "Scala"]), ("Bob", ["Python", "Scala"])]
>>> schema = StructType([
...     StructField("name", StringType()),
...     StructField("languagesSkills", ArrayType(StringType())),
... ])
>>> df = spark.createDataFrame(data=data, schema=schema)
>>> df.printSchema()
root
 |-- name: string (nullable = true)
 |-- languagesSkills: array (nullable = true)
 |    |-- element: string (containsNull = true)
>>> df.show()
+-----+---------------+
| name|languagesSkills|
+-----+---------------+
|Alice|  [Java, Scala]|
|  Bob|[Python, Scala]|
+-----+---------------+
Methods
- add(field[, data_type, nullable, metadata]): Construct a StructType by adding new elements to it, to define the schema.
- fieldNames(): Returns all field names in a list.
- fromDDL(ddl): Creates DataType for a given DDL-formatted string.
- fromInternal(obj): Converts an internal SQL object into a native Python object.
- fromJson(json): Constructs StructType from a schema defined in JSON format.
- json(): Returns the JSON string representation of this type.
- needConversion(): Whether this type needs conversion between Python objects and internal SQL objects.
- toInternal(obj): Converts a Python object into an internal SQL object.
- toNullable(): Returns the same data type but with all nullability fields set to true (StructField.nullable, ArrayType.containsNull, and MapType.valueContainsNull).
- treeString([maxDepth]): Returns a string representation of the schema in tree format.
- typeName(): Returns the name of this type.

Methods Documentation
- add(field, data_type=None, nullable=True, metadata=None)
Construct a StructType by adding new elements to it, to define the schema. The method accepts either:

- A single parameter which is a StructField object.
- Between 2 and 4 parameters as (name, data_type, nullable (optional), metadata (optional)). The data_type parameter may be either a string or a DataType object.
- Parameters
    - field : str or StructField
      Either the name of the field or a StructField object.
    - data_type : DataType, optional
      If present, the DataType of the StructField to create.
    - nullable : bool, optional
      Whether the field to add should be nullable (default True).
    - metadata : dict, optional
      Any additional metadata (default None).
- Returns
    - StructType
Examples
>>> from pyspark.sql.types import IntegerType, StringType, StructField, StructType
>>> struct1 = StructType().add("f1", StringType(), True).add("f2", StringType(), True, None)
>>> struct2 = StructType([StructField("f1", StringType(), True),
...     StructField("f2", StringType(), True, None)])
>>> struct1 == struct2
True
>>> struct1 = StructType().add(StructField("f1", StringType(), True))
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType().add("f1", "string", True)
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
- fieldNames()
Returns all field names in a list.
Examples
>>> from pyspark.sql.types import StringType, StructField, StructType
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> struct.fieldNames()
['f1']
- classmethod fromDDL(ddl)

Creates DataType for a given DDL-formatted string.

New in version 4.0.0.
- Parameters
    - ddl : str
      DDL-formatted string representation of types, e.g. pyspark.sql.types.DataType.simpleString, except that the top-level struct type can omit the struct<> wrapper, for compatibility with spark.createDataFrame and Python UDFs.
- Returns
    - DataType
Examples
Create a StructType by the corresponding DDL formatted string.
>>> from pyspark.sql.types import DataType
>>> DataType.fromDDL("b string, a int")
StructType([StructField('b', StringType(), True), StructField('a', IntegerType(), True)])
Create a single DataType by the corresponding DDL formatted string.
>>> DataType.fromDDL("decimal(10,10)")
DecimalType(10,10)
Create a StructType by the legacy string format.
>>> DataType.fromDDL("b: string, a: int")
StructType([StructField('b', StringType(), True), StructField('a', IntegerType(), True)])
- classmethod fromJson(json)

Constructs StructType from a schema defined in JSON format.

Below is a JSON schema it must adhere to:
{
  "title": "StructType",
  "description": "Schema of StructType in json format",
  "type": "object",
  "properties": {
    "fields": {
      "description": "Array of struct fields",
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": {
            "description": "Name of the field",
            "type": "string"
          },
          "type": {
            "description": "Type of the field. Can either be another nested StructType or primitive type",
            "type": "object/string"
          },
          "nullable": {
            "description": "If nulls are allowed",
            "type": "boolean"
          },
          "metadata": {
            "description": "Additional metadata to supply",
            "type": "object"
          },
          "required": [
            "name",
            "type",
            "nullable",
            "metadata"
          ]
        }
      }
    }
  }
}
- Parameters
    - json : dict or a dict-like object, e.g. a parsed JSON object
      This "dict" must have a "fields" key that maps to an array of fields, each of which must have the keys name, type, nullable, and metadata.
- Returns
    - StructType
Examples
>>> json_str = '''
... {
...     "fields": [
...         {
...             "metadata": {},
...             "name": "Person",
...             "nullable": true,
...             "type": {
...                 "fields": [
...                     {
...                         "metadata": {},
...                         "name": "name",
...                         "nullable": false,
...                         "type": "string"
...                     },
...                     {
...                         "metadata": {},
...                         "name": "surname",
...                         "nullable": false,
...                         "type": "string"
...                     }
...                 ],
...                 "type": "struct"
...             }
...         }
...     ],
...     "type": "struct"
... }
... '''
>>> import json
>>> scheme = StructType.fromJson(json.loads(json_str))
>>> scheme.simpleString()
'struct<Person:struct<name:string,surname:string>>'
- json()
- needConversion()

Whether this type needs conversion between a Python object and an internal SQL object.

This is used to avoid unnecessary conversion for ArrayType/MapType/StructType.
- toNullable()

Returns the same data type but with all nullability fields set to true (StructField.nullable, ArrayType.containsNull, and MapType.valueContainsNull).
New in version 4.0.0.
- Returns
    - DataType
Examples
Example 1: Simple nullability conversion
>>> StructType([StructField("a", IntegerType(), nullable=False)]).toNullable()
StructType([StructField('a', IntegerType(), True)])
Example 2: Nested nullability conversion
>>> StructType([
...     StructField("a",
...         StructType([
...             StructField("b", IntegerType(), nullable=False),
...             StructField("c", StructType([
...                 StructField("d", IntegerType(), nullable=False)
...             ]))
...         ]),
...         nullable=False)
... ]).toNullable()
StructType([StructField('a', StructType([StructField('b', IntegerType(), True), StructField('c', StructType([StructField('d', IntegerType(), True)]), True)]), True)])
- classmethod typeName()