StructType

- class pyspark.sql.types.StructType(fields=None)
Struct type, consisting of a list of StructField. This is the data type representing a Row.

Iterating a StructType will iterate over its StructFields. A contained StructField can be accessed by its name or position.

Examples
>>> from pyspark.sql.types import *
>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct1["f1"]
StructField('f1', StringType(), True)
>>> struct1[0]
StructField('f1', StringType(), True)
>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", CharType(10), True)])
>>> struct2 = StructType([StructField("f1", CharType(10), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", VarcharType(10), True)])
>>> struct2 = StructType([StructField("f1", VarcharType(10), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType([StructField("f1", StringType(), True)])
>>> struct2 = StructType([StructField("f1", StringType(), True),
...     StructField("f2", IntegerType(), False)])
>>> struct1 == struct2
False
The example below demonstrates how to create a DataFrame from a schema built with StructType and StructField:
>>> data = [("Alice", ["Java", "Scala"]), ("Bob", ["Python", "Scala"])]
>>> schema = StructType([
...     StructField("name", StringType()),
...     StructField("languagesSkills", ArrayType(StringType())),
... ])
>>> df = spark.createDataFrame(data=data, schema=schema)
>>> df.printSchema()
root
 |-- name: string (nullable = true)
 |-- languagesSkills: array (nullable = true)
 |    |-- element: string (containsNull = true)
>>> df.show()
+-----+---------------+
| name|languagesSkills|
+-----+---------------+
|Alice|  [Java, Scala]|
|  Bob|[Python, Scala]|
+-----+---------------+
Methods
- add(field[, data_type, nullable, metadata]): Construct a StructType by adding new elements to it, to define the schema.
- fieldNames(): Returns all field names in a list.
- fromDDL(ddl): Creates DataType for a given DDL-formatted string.
- fromInternal(obj): Converts an internal SQL object into a native Python object.
- fromJson(json): Constructs StructType from a schema defined in JSON format.
- json(): Returns the JSON string representation of this type.
- needConversion(): Whether this type needs conversion between Python objects and internal SQL objects.
- toInternal(obj): Converts a Python object into an internal SQL object.
- toNullable(): Returns the same data type but with all nullability fields set to true (StructField.nullable, ArrayType.containsNull, and MapType.valueContainsNull).
- treeString([maxDepth]): Returns a string representation of the schema in tree format.
- typeName(): Returns the name of this type.

Methods Documentation
- add(field, data_type=None, nullable=True, metadata=None)
Construct a StructType by adding new elements to it, to define the schema. The method accepts either:

- A single parameter which is a StructField object.
- Between 2 and 4 parameters as (name, data_type, nullable (optional), metadata (optional)). The data_type parameter may be either a string or a DataType object.
- Parameters
    - field : str or StructField
      Either the name of the field or a StructField object.
    - data_type : DataType, optional
      If present, the DataType of the StructField to create.
    - nullable : bool, optional
      Whether the field to add should be nullable (default True).
    - metadata : dict, optional
      Any additional metadata (default None).
- Returns
    - StructType
Examples
>>> from pyspark.sql.types import IntegerType, StringType, StructField, StructType
>>> struct1 = StructType().add("f1", StringType(), True).add("f2", StringType(), True, None)
>>> struct2 = StructType([StructField("f1", StringType(), True),
...     StructField("f2", StringType(), True, None)])
>>> struct1 == struct2
True
>>> struct1 = StructType().add(StructField("f1", StringType(), True))
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
>>> struct1 = StructType().add("f1", "string", True)
>>> struct2 = StructType([StructField("f1", StringType(), True)])
>>> struct1 == struct2
True
- fieldNames()
Returns all field names in a list.
Examples
>>> from pyspark.sql.types import StringType, StructField, StructType
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> struct.fieldNames()
['f1']
- classmethod fromDDL(ddl)

Creates DataType for a given DDL-formatted string.

New in version 4.0.0.
- Parameters
    - ddl : str
      DDL-formatted string representation of types, e.g. pyspark.sql.types.DataType.simpleString, except that the top-level struct type can omit the struct<> wrapper, for compatibility with spark.createDataFrame and Python UDFs.
- Returns
    - DataType
Examples
Create a StructType by the corresponding DDL formatted string.
>>> from pyspark.sql.types import DataType
>>> DataType.fromDDL("b string, a int")
StructType([StructField('b', StringType(), True), StructField('a', IntegerType(), True)])
Create a single DataType by the corresponding DDL formatted string.
>>> DataType.fromDDL("decimal(10,10)")
DecimalType(10,10)
Create a StructType by the legacy string format.
>>> DataType.fromDDL("b: string, a: int")
StructType([StructField('b', StringType(), True), StructField('a', IntegerType(), True)])
- classmethod fromJson(json)

Constructs StructType from a schema defined in JSON format.

Below is a JSON schema it must adhere to:
{
  "title": "StructType",
  "description": "Schema of StructType in json format",
  "type": "object",
  "properties": {
    "fields": {
      "description": "Array of struct fields",
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": {
            "description": "Name of the field",
            "type": "string"
          },
          "type": {
            "description": "Type of the field. Can either be another nested StructType or primitive type",
            "type": "object/string"
          },
          "nullable": {
            "description": "If nulls are allowed",
            "type": "boolean"
          },
          "metadata": {
            "description": "Additional metadata to supply",
            "type": "object"
          },
          "required": [
            "name",
            "type",
            "nullable",
            "metadata"
          ]
        }
      }
    }
  }
}
- Parameters
    - json : dict or a dict-like object, e.g. a parsed JSON object
      This "dict" must have a "fields" key that maps to an array of fields, each of which must have the keys name, type, nullable, and metadata.
- Returns
    - StructType
Examples
>>> json_str = '''
... {
...     "fields": [
...         {
...             "metadata": {},
...             "name": "Person",
...             "nullable": true,
...             "type": {
...                 "fields": [
...                     {
...                         "metadata": {},
...                         "name": "name",
...                         "nullable": false,
...                         "type": "string"
...                     },
...                     {
...                         "metadata": {},
...                         "name": "surname",
...                         "nullable": false,
...                         "type": "string"
...                     }
...                 ],
...                 "type": "struct"
...             }
...         }
...     ],
...     "type": "struct"
... }
... '''
>>> import json
>>> scheme = StructType.fromJson(json.loads(json_str))
>>> scheme.simpleString()
'struct<Person:struct<name:string,surname:string>>'
- json()
- needConversion()

Whether this type needs conversion between a Python object and an internal SQL object.

This is used to avoid unnecessary conversion for ArrayType/MapType/StructType.
- toNullable()

Returns the same data type but with all nullability fields set to true (StructField.nullable, ArrayType.containsNull, and MapType.valueContainsNull).
New in version 4.0.0.
- Returns
    - DataType
Examples
Example 1: Simple nullability conversion
>>> StructType([StructField("a", IntegerType(), nullable=False)]).toNullable()
StructType([StructField('a', IntegerType(), True)])
Example 2: Nested nullability conversion
>>> StructType([
...     StructField("a",
...         StructType([
...             StructField("b", IntegerType(), nullable=False),
...             StructField("c", StructType([
...                 StructField("d", IntegerType(), nullable=False)
...             ]))
...         ]),
...         nullable=False)
... ]).toNullable()
StructType([StructField('a', StructType([StructField('b', IntegerType(), True), StructField('c', StructType([StructField('d', IntegerType(), True)]), True)]), True)])
- classmethod typeName()