StructType#

class pyspark.sql.types.StructType(fields=None)[source]#

Struct type, consisting of a list of StructField.

This is the data type representing a Row.

Iterating over a StructType iterates over its StructFields. A contained StructField can be accessed by its name or position.

Examples

>>> from pyspark.sql.types import *
>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct1["f1"]
StructField('f1', StringType(), True)
>>> struct1[0]
StructField('f1', StringType(), True)
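
Iterating over the struct yields the contained fields; for example, collecting the field names:

>>> [f.name for f in struct1]
['f1']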
>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", CharType(10), True)])
>>> struct2 = StructType([StructField("f1", CharType(10), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", VarcharType(10), True)])
>>> struct2 = StructType([StructField("f1", VarcharType(10), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct2 = StructType([StructField("f1", StringType(), True),
...     StructField("f2", IntegerType(), False)])
>>> struct1 == struct2
False

The example below demonstrates how to create a DataFrame based on a struct created using StructType and StructField:

>>> data = [("Alice", ["Java", "Scala"]), ("Bob", ["Python", "Scala"])]
>>> schema = StructType([
...     StructField("name", StringType()),
...     StructField("languagesSkills", ArrayType(StringType())),
... ])
>>> df = spark.createDataFrame(data=data, schema=schema)
>>> df.printSchema()
root
 |-- name: string (nullable = true)
 |-- languagesSkills: array (nullable = true)
 |    |-- element: string (containsNull = true)
>>> df.show()
+-----+---------------+
| name|languagesSkills|
+-----+---------------+
|Alice|  [Java, Scala]|
|  Bob|[Python, Scala]|
+-----+---------------+

Methods

add(field[, data_type, nullable, metadata])

Construct a StructType by adding new elements to it, to define the schema.

fieldNames()

Returns all field names in a list.

fromDDL(ddl)

Creates DataType for a given DDL-formatted string.

fromInternal(obj)

Converts an internal SQL object into a native Python object.

fromJson(json)

Constructs StructType from a schema defined in JSON format.

json()

jsonValue()

needConversion()

Whether this type needs conversion between Python objects and internal SQL objects.

simpleString()

toInternal(obj)

Converts a Python object into an internal SQL object.

toNullable()

Returns the same data type, but with all nullability fields set to true (StructField.nullable, ArrayType.containsNull, and MapType.valueContainsNull).

treeString([maxDepth])

typeName()

Methods Documentation

add(field, data_type=None, nullable=True, metadata=None)[source]#

Construct a StructType by adding new elements to it, to define the schema. The method accepts either:

  1. A single parameter which is a StructField object.

  2. Between 2 and 4 parameters as (name, data_type, nullable (optional), metadata (optional)). The data_type parameter may be either a string or a DataType object.

Parameters
field : str or StructField

Either the name of the field or a StructField object

data_type : DataType, optional

If present, the DataType of the StructField to create

nullable : bool, optional

Whether the field to add should be nullable (default True)

metadata : dict, optional

Any additional metadata (default None)

Returns
StructType

Examples

>>> from pyspark.sql.types import IntegerType, StringType, StructField, StructType
>>> struct1 = StructType().add("f1", StringType(), True).add("f2", StringType(), True, None)
>>> struct2 = StructType([StructField("f1", StringType(), True),
...     StructField("f2", StringType(), True, None)])
>>> struct1 == struct2
True
>>> struct1 = StructType().add(StructField("f1", StringType(), True))
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType().add("f1", "string", True)
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
fieldNames()[source]#

Returns all field names in a list.

Examples

>>> from pyspark.sql.types import StringType, StructField, StructType
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> struct.fieldNames()
['f1']
classmethod fromDDL(ddl)#

Creates DataType for a given DDL-formatted string.

New in version 4.0.0.

Parameters
ddl : str

DDL-formatted string representation of types, e.g. pyspark.sql.types.DataType.simpleString, except that the top-level struct type can omit the struct<> wrapper, for compatibility with spark.createDataFrame and Python UDFs.

Returns
DataType

Examples

Create a StructType from the corresponding DDL-formatted string.

>>> from pyspark.sql.types import DataType
>>> DataType.fromDDL("b string, a int")
StructType([StructField('b', StringType(), True), StructField('a', IntegerType(), True)])

Create a single DataType from the corresponding DDL-formatted string.

>>> DataType.fromDDL("decimal(10,10)")
DecimalType(10,10)

Create a StructType from the legacy string format.

>>> DataType.fromDDL("b: string, a: int")
StructType([StructField('b', StringType(), True), StructField('a', IntegerType(), True)])
fromInternal(obj)[source]#

Converts an internal SQL object into a native Python object.
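
This method is normally invoked by Spark itself during row deserialization rather than by user code. As an illustrative sketch, converting an internal tuple back into a Row:

>>> from pyspark.sql.types import StringType, StructField, StructType
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> struct.fromInternal(("hello",))
Row(f1='hello')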

classmethod fromJson(json)[source]#

Constructs StructType from a schema defined in JSON format.

Below is the JSON schema the input must adhere to:

{
   "title": "StructType",
   "description": "Schema of StructType in json format",
   "type": "object",
   "properties": {
      "fields": {
         "description": "Array of struct fields",
         "type": "array",
         "items": {
            "type": "object",
            "properties": {
               "name": {
                  "description": "Name of the field",
                  "type": "string"
               },
               "type": {
                  "description": "Type of the field. Can either be another nested StructType or a primitive type",
                  "type": "object/string"
               },
               "nullable": {
                  "description": "If nulls are allowed",
                  "type": "boolean"
               },
               "metadata": {
                  "description": "Additional metadata to supply",
                  "type": "object"
               }
            },
            "required": [
               "name",
               "type",
               "nullable",
               "metadata"
            ]
         }
      }
   }
}
Parameters
json : dict or a dict-like object, e.g. a JSON object

This “dict” must have a “fields” key that returns an array of fields, each of which must have the keys “name”, “type”, “nullable”, and “metadata”.

Returns
StructType

Examples

>>> json_str = '''
...  {
...      "fields": [
...          {
...              "metadata": {},
...              "name": "Person",
...              "nullable": true,
...              "type": {
...                  "fields": [
...                      {
...                          "metadata": {},
...                          "name": "name",
...                          "nullable": false,
...                          "type": "string"
...                      },
...                      {
...                          "metadata": {},
...                          "name": "surname",
...                          "nullable": false,
...                          "type": "string"
...                      }
...                  ],
...                  "type": "struct"
...              }
...          }
...      ],
...      "type": "struct"
...  }
...  '''
>>> import json
>>> scheme = StructType.fromJson(json.loads(json_str))
>>> scheme.simpleString()
'struct<Person:struct<name:string,surname:string>>'
json()#
jsonValue()[source]#
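
These return the schema as a JSON string and as a JSON-compatible dict, respectively. An illustrative round trip through the dict form:

>>> from pyspark.sql.types import StringType, StructField, StructType
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> struct.jsonValue()
{'type': 'struct', 'fields': [{'name': 'f1', 'type': 'string', 'nullable': True, 'metadata': {}}]}
>>> StructType.fromJson(struct.jsonValue()) == struct
True
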
needConversion()[source]#

Whether this type needs conversion between Python objects and internal SQL objects.

This is used to avoid the unnecessary conversion for ArrayType/MapType/StructType.
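
For example, a StructType itself reports that it needs conversion, since Row objects are converted to plain tuples internally:

>>> from pyspark.sql.types import StringType, StructField, StructType
>>> StructType([StructField("f1", StringType(), True)]).needConversion()
True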

simpleString()[source]#
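
Returns a compact struct<...> string for the schema, for example:

>>> from pyspark.sql.types import StringType, StructField, StructType
>>> StructType([StructField("f1", StringType(), True)]).simpleString()
'struct<f1:string>'
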
toInternal(obj)[source]#

Converts a Python object into an internal SQL object.
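
Like fromInternal, this is normally called by Spark itself. An illustrative sketch of converting a Row into the internal tuple form:

>>> from pyspark.sql import Row
>>> from pyspark.sql.types import StringType, StructField, StructType
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> struct.toInternal(Row(f1="hello"))
('hello',)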

toNullable()[source]#

Returns the same data type, but with all nullability fields set to true (StructField.nullable, ArrayType.containsNull, and MapType.valueContainsNull).

New in version 4.0.0.

Returns
StructType

Examples

Example 1: Simple nullability conversion

>>> StructType([StructField("a", IntegerType(), nullable=False)]).toNullable()
StructType([StructField('a', IntegerType(), True)])

Example 2: Nested nullability conversion

>>> StructType([
...     StructField("a",
...         StructType([
...             StructField("b", IntegerType(), nullable=False),
...             StructField("c", StructType([
...                 StructField("d", IntegerType(), nullable=False)
...             ]))
...         ]),
...         nullable=False)
... ]).toNullable()
StructType([StructField('a', StructType([StructField('b', IntegerType(), True),
StructField('c', StructType([StructField('d', IntegerType(), True)]), True)]), True)])
treeString(maxDepth=2147483647)[source]#
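
Returns a tree representation of the schema; the optional maxDepth caps how many nested levels are shown. A minimal sketch, assuming the output mirrors the DataFrame.printSchema output shown above:

>>> from pyspark.sql.types import StringType, StructField, StructType
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> print(struct.treeString())
root
 |-- f1: string (nullable = true)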
classmethod typeName()#
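
For example, the type name used in JSON representations of the schema:

>>> from pyspark.sql.types import StructType
>>> StructType.typeName()
'struct'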