Towards Robust and Error-Free Python Code With Pydantic
Finally! Python autocomplete that actually works (like in C# and Java) and type hinting support.
Python is a dynamically typed language, which means that type checking is performed at run-time, while the code executes. A type error is therefore raised only when the offending line actually runs. Languages such as Java, C# and C are statically typed, meaning type checking is performed at compile-time. In this case, the error is reported before the program is run.
In a statically typed language, the type of a variable cannot change once it is declared, because the compiler needs to know the types beforehand. A variable declared as an int in C, for example, cannot be changed to a string later.
We can do this in Python however:
myVar = 1
myVar = "hello" #this works
This flexibility comes at a cost: dynamically typed languages are generally slower at execution than statically typed ones. A lot of checking has to be done at run-time to figure out the type of variables and other constructs so that the program can be executed. This creates overhead.
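To make the idea of run-time type checking concrete, here is a minimal sketch. The function name total_weight and the deliberate bug are made up for illustration. The program starts without complaint, and the type error only appears once the faulty line is executed:

```python
def total_weight(weights):
    # Hypothetical helper with a deliberate bug: adding an int to a str.
    # Nothing checks this line until it actually runs.
    return sum(weights) + " kg"


print("program started")  # runs fine, the bug has not been noticed yet

try:
    total_weight([1, 2, 3])
except TypeError as exc:
    error_message = str(exc)
    print("run-time error:", error_message)
```

In a statically typed language, the equivalent mistake would typically be rejected by the compiler before the program could start.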
Now that Python is the go-to language for machine learning, there is a growing need for APIs and web applications that serve machine learning models. It is a lot simpler to use a single language for creating models and wrapping them up in a user-facing application than a variety of languages.
However, for these full-stack applications the chances of type errors increase when type checking is performed at run-time rather than compile-time. This is where Python type hinting helps. It allows us to declare the types of programming constructs directly in our code.
First, we look at the basics of type hints.
Python type hints and code completion
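To define a type hint for a function argument, we write a : (colon) followed by the type after the variable name. For container types such as lists, dicts and sets, we import List, Dict and Set from the typing module (from Python 3.9 onward, the built-in list, dict and set can also be subscripted directly).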
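As a quick illustration of the syntax, here is a small sketch. The function names count_items and unique_items are invented for this example. It annotates arguments, return values and a local variable:

```python
from typing import Dict, List, Set


def count_items(items: List[str]) -> Dict[str, int]:
    # The argument is hinted as a list of strings,
    # the return value as a dict mapping strings to ints.
    counts: Dict[str, int] = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


def unique_items(items: List[str]) -> Set[str]:
    return set(items)


basket = ["apple", "pear", "apple"]
print(count_items(basket))  # {'apple': 2, 'pear': 1}
print(unique_items(basket))

# The hints are stored on the function and are visible to editors and tools.
print(count_items.__annotations__)
```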
Let’s look at some code:
We define a function get_stuff() that appends the provided item to the list fridge. Afterward, all items in the fridge are capitalized.
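A minimal sketch of such a function could look like this. The exact body is an assumption based on the description, using the names get_stuff, fridge, toUpper and x that appear in the text:

```python
from typing import List


def get_stuff(item: str, fridge: List[str]) -> List[str]:
    # Add the new item to the fridge.
    fridge.append(item)

    def toUpper(x: str) -> str:
        # Capitalize a single item.
        return x.capitalize()

    # Capitalize every item in the fridge.
    return list(map(toUpper, fridge))


result = get_stuff("orange", ["apple", "grape", "pear"])
print(result)
```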
The code works as expected returning the list of fruits:
['Apple', 'Grape', 'Pear', 'Orange']
Since we define fridge to be a list of strings, VS Code (with the Pylance and Python extensions) provides instant code completion. If you type fridge. the editor pops up suggestions for the available list methods, such as append() and sort().
Similarly, because fridge is a list of strings, typing x. shows all operations possible on each item in the fridge, which is a string, such as capitalize() and upper().
As you can see, type hinting saves a ton of time as there is no need to go back and forth looking up methods and attributes from online documentation.
Pydantic Models
Data validation and settings management using python type annotations. pydantic enforces type hints at runtime, and provides user friendly errors when data is invalid. Source: Pydantic
Although Python supports type hinting, the hints are not enforced at run-time. Passing an object of an incorrect type is still possible and causes an error as soon as an unsupported operation is attempted, for example calling str methods on an int. Pydantic is a Python library that validates data against the type hints when a model is created, so such problems are caught early with a clear error instead of surfacing deep inside the code.
Let’s see an example to consolidate this point.
Let’s say we get some bad input to our function and the fridge contains an int along with the strings. The rest of the code remains unchanged and we call get_stuff() with the modified fridge:
print(get_stuff("orange", ["apple", 1, "pear"]))
What happens?
We get a run-time error: an AttributeError, because an int object has no capitalize() method.
Even though we declared x to be of type str, the get_stuff() function happily accepts a list with one int element, and toUpper() attempts to call capitalize() on the int object.
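The failure can be reproduced with a short sketch. The function body is a simplified assumption, not the original listing. The type hints promise a list of strings, but nothing stops the int from getting through:

```python
from typing import List


def get_stuff(item: str, fridge: List[str]) -> List[str]:
    fridge.append(item)
    # Assumes every x is a str; the hint alone does not guarantee it.
    return [x.capitalize() for x in fridge]


try:
    get_stuff("orange", ["apple", 1, "pear"])
except AttributeError as exc:
    error_message = str(exc)
    print("AttributeError:", error_message)
```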
At this point it may seem like the benefits of type hinting are limited to autocompletion only.
We can refactor the code to use Pydantic. We define a data model that inherits from Pydantic’s BaseModel. This is the main way to create data models in Pydantic.
Since this is a blueprint for how our data should be represented, we define it as a class.
Go ahead and install Pydantic with:
pip3 install pydantic
Then define a Fridge class that inherits from BaseModel like so:
from typing import List
from pydantic import BaseModel
class Fridge(BaseModel):
    items: List[str]
We give the Fridge class an attribute called items, which will be a list of strings.
We create an instance of a Fridge and pass it as an argument when we call the get_stuff() function.
The refactored code looks as follows:
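A minimal sketch of what the refactor might look like follows. The function body is an assumption. Because int-to-string handling differs between Pydantic versions, the mixed-input case is wrapped so the sketch runs on either version:

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Fridge(BaseModel):
    items: List[str]


def get_stuff(item: str, fridge: Fridge) -> List[str]:
    # Work on a copy so the validated model is left untouched.
    contents = list(fridge.items)
    contents.append(item)
    return [x.capitalize() for x in contents]


# A set is accepted and converted to a list during validation.
# Set ordering is arbitrary, so the result is sorted for display.
fridge = Fridge(items={"apple", "pear"})
result = get_stuff("orange", fridge)
print(sorted(result))

# Mixed input: Pydantic v1 coerces the int to "1", while Pydantic v2
# rejects it by default, so both outcomes are handled here.
try:
    outcome = Fridge(items=["apple", 1, "pear"]).items
except ValidationError:
    outcome = "rejected"
print(outcome)

# A value that is clearly not a string fails validation in either version.
try:
    Fridge(items=["apple", ["not", "a", "string"]])
    nested_ok = True
except ValidationError:
    nested_ok = False
print(nested_ok)
```

Either way, bad data is dealt with when the Fridge is created, not later inside get_stuff().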
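If we now run it again, you will notice the code is error free!
The int is cast to a string and kept in the list, giving the following return value:
['1', 'Apple', 'Pear', 'Orange']
Note that this coercion is Pydantic v1 behavior. Pydantic v2 no longer converts numbers to strings by default and raises a ValidationError for the int instead, unless the model config enables coerce_numbers_to_str.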
You will also notice that we pass a Python set instead of a list when we create an instance of a Fridge object. Here again, Pydantic takes care of casting the set to a list!
You might be wondering what should be done if we do want a list of mixed types, such as a list that contains either strings or integers. For that we can use the Union type annotation (imported from the typing module), which acts like a logical OR.
For example, the Fridge would be as follows:
class Fridge(BaseModel):
    items: List[Union[int, str]]
Passing the following list to Fridge would now work:
[1, "apple", "orange", "pear"]
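Please note that Pydantic v1 gives precedence to the first type listed in the Union, trying each type in order. (Pydantic v2 defaults to a "smart" mode that prefers an exact type match, so the order matters less there.) So if we had instead written:
class Fridge(BaseModel):
    items: List[Union[str, int]]
Then, under Pydantic v1, the int in the passed list would be cast to a string even though int appears in the type annotation. This would give the following, which is not what we want: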
["1", "apple", "orange", "pear"]
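The two orderings can be compared side by side with a short sketch. StrFirstFridge is a hypothetical name introduced only for the comparison. The first result is the same on Pydantic v1 and v2, while the second depends on the installed version, so no fixed output is shown for it:

```python
from typing import List, Union

from pydantic import BaseModel


class Fridge(BaseModel):
    items: List[Union[int, str]]


class StrFirstFridge(BaseModel):  # hypothetical name, same model with str first
    items: List[Union[str, int]]


data = [1, "apple", "orange", "pear"]

int_first = Fridge(items=data).items
print(int_first)  # [1, 'apple', 'orange', 'pear']

# With str listed first, the leading element is '1' on Pydantic v1
# (left-to-right matching) and stays 1 on Pydantic v2 (smart mode).
str_first = StrFirstFridge(items=data).items
print(str_first)
```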
OK, we have covered a lot of ground!
Pydantic really shines when it comes to modelling more complex data types.
For that we need to look at recursive models.
Recursive Models
It is also possible to define recursive models in Pydantic for more complex data models.
A recursive model is a model that contains another model as a type definition in one of its attributes.
So instead of List[str] we could have List[Cars], where Cars would be a Pydantic model defined in our code.
Onto another example!
Let’s assume we also want to store the number of each fruit in the fridge. To do this, we create a Fruit data model:
class Fruit(BaseModel):
    name: str
    num: int
In the Fridge data model we can define the list to be a list of Fruit objects instead of a list of strings:
class Fridge(BaseModel):
    items: List[Fruit]
The full code is as follows:
We call get_most_fruits() with a Fridge object containing a list of Fruit objects. Pretty straightforward.
We wish to return the fruit with the highest number. Before doing operations on the list of fruit, we use the jsonable_encoder() function (from the fastapi.encoders module of FastAPI) to convert the list into a JSON-compatible type. Without this step, each element in the list is a Fruit object rather than a dict, so its fields would be read as attributes (fruit.num) instead of by key.
After the encoding stage, we get a list of dict objects with key-value pairs corresponding to the name and num fields defined in the Fruit class.
We can now sort this list and return the fruit with the highest number.
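For comparison, here is a sketch of get_most_fruits() that skips the encoding step and reads the num attribute of each Fruit directly. This is an alternative under that assumption, not the listing described above, and the sample data is made up:

```python
from typing import List

from pydantic import BaseModel


class Fruit(BaseModel):
    name: str
    num: int


class Fridge(BaseModel):
    items: List[Fruit]


def get_most_fruits(fridge: Fridge) -> Fruit:
    # Each item is a validated Fruit, so num is always an int here.
    return max(fridge.items, key=lambda fruit: fruit.num)


# Nested dicts are validated and turned into Fruit objects.
fridge = Fridge(
    items=[
        {"name": "apple", "num": 3},
        {"name": "pear", "num": 7},
        Fruit(name="orange", num=5),
    ]
)

top = get_most_fruits(fridge)
print(top.name, top.num)  # pear 7
```

The nested model means that every entry is checked against the Fruit definition when the Fridge is built, so the function itself needs no defensive type checks.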
Conclusion
In this post we had a recap of dynamically and statically typed languages.
We looked at type hinting in Python and use of Pydantic to enforce the type hints.
To conclude, especially in large and complex software projects, type hinting helps:
- Speed up software development through IDE autocompletion.
- Contribute to increased code quality by making code easier to understand and read.
- Improve coding style and overall software design.
- Create more robust and error-free software by reducing run-time errors.
Hope you learned something useful in this post.
Next time we will look at FastAPI, a popular Python web framework that fully supports Pydantic.