
Carry Out JSONField Schema Validation Using JSON Models

March 30, 2022


This article shows how a schema for a JSONField can be written manually or generated automatically, and how the generated schema has to be modified in the latter case.

Among the many field types available for Django models, JSONField is one whose data is often required to follow a strict format, or schema, when it appears in the data used to test the generated APIs. One can certainly write the entire schema by hand, but what if there were a way to have it generated instead? This is where JSON models come to the rescue. In this article, we will use example data, shown in snippets, in ‘Insomnia’, a popular platform that helps design and test APIs. We will cover two scenarios: writing the schema ourselves, and then using JSON models to have the schema generated for us without much manual work.

Initial setup
Before looking at how a JSONField entry can be validated in the Django REST framework (leaving a defined schema out of the picture for now), note that, alongside the other non-JSON fields, a POST operation succeeds if the request includes either of the following:

An empty JSONField
A JSONField filled with various types of values

The following examples will help understand this better.

models.py

from django.db import models

class Student(models.Model):
    student_name = models.CharField(max_length=100)
    roll_no = models.IntegerField()
    detailed_data = models.JSONField()

serializers.py

from rest_framework import serializers
from studentapp.models import Student

class StudentSerializer(serializers.ModelSerializer):
    class Meta:
        model = Student
        fields = '__all__'

views.py

from studentapp.serializers import StudentSerializer
from rest_framework.views import APIView
from rest_framework.response import Response
from django.http import JsonResponse,HttpResponse
from rest_framework import status

class StudentView(APIView):
    def post(self, request):
        serializer = StudentSerializer(data=request.data)
        if serializer.is_valid():
            serializer.save()
            # Reached only when the serializer is valid
            return Response(serializer.data, status=status.HTTP_201_CREATED)
        # Reached when the serializer is not valid
        return Response(serializer.errors, status=status.HTTP_400_BAD_REQUEST)

urls.py

from django.contrib import admin
from django.urls import path,include
from studentapp.views import StudentView
urlpatterns = [
    path('student/', StudentView.as_view()),
]

As can be seen in the Student model above, we have three fields: a character field, an integer field and a JSONField. Let us enter three types of data in our JSONField: an empty value, a dictionary and a list of dictionaries. With the help of the code snippets given in Figures 1, 2 and 3, let’s see how the two points mentioned in the beginning come into play.
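As a rough sketch of the three kinds of request body described above, the payloads can be written as plain Python dictionaries. All names and values here are invented for illustration, and `{}` is only one plausible reading of an "empty" value.

```python
import json

# Hypothetical request bodies for POST /student/; every value is illustrative.
payload_empty = {"student_name": "Asha", "roll_no": 1, "detailed_data": {}}
payload_dict = {
    "student_name": "Ravi",
    "roll_no": 2,
    "detailed_data": {"hobby": "chess"},
}
payload_list = {
    "student_name": "Meera",
    "roll_no": 3,
    "detailed_data": [{"subject": "maths"}, {"subject": "physics"}],
}

payloads = [payload_empty, payload_dict, payload_list]

# Each body serialises to valid JSON; without a schema, the shape of
# 'detailed_data' is not constrained any further.
for payload in payloads:
    print(json.dumps(payload))
```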

Figure 1: When entering an empty value into our JSONField ‘detailed_data’

Our ‘StudentSerializer’ strictly checks the presence and types of the values entered for the ‘student_name’ and ‘roll_no’ fields of the Student model. For the JSONField ‘detailed_data’, however, it merely checks that the field exists, regardless of the type of data entered (as observed in Figures 1, 2 and 3), because we do not yet have a defined format that the JSONField must follow. This is where the ‘schema’ comes in. A schema is the outline, or structure, that the data within the JSONField needs to abide by. If the values we enter into ‘detailed_data’ in Insomnia abide by the schema, the validation of ‘detailed_data’ is successful. However, if the value entered for a field within ‘detailed_data’ is of a type different from the one defined in the schema, or if any field within ‘detailed_data’ is missing, we encounter validation errors. Let’s understand this by taking a closer look at the ‘detailed_data’ field of our Student model.

Figure 2: When entering a key-value pair into our JSONField

Let’s follow the layout given below for the fields within our ‘detailed_data’ JSONField:

1. Date of joining: Date format

2. Mother’s name: Character type

3. Father’s name: Character type

4. Teacher information: A dictionary comprising name, date of birth, department and date of joining

5. Semester 1 subjects: A list of dictionaries, with each dictionary comprising subject name, professor and score

6. Semester 2 subjects: Another list of dictionaries with the same structure and subfields as mentioned in point 5.
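To make the layout concrete, here is a sketch of what one ‘detailed_data’ value might look like. The key names follow the schema used later in this article; the names, dates and scores are invented, and the DD-MM-YYYY date format is an assumption.

```python
import json

# Illustrative data only: all names, dates and scores are made up.
# Dates are written as DD-MM-YYYY, which is an assumed format.
detailed_data = {
    "date_of_joining": "01-08-2021",
    "mother_name": "Lakshmi",
    "father_name": "Suresh",
    "teacher_information": {
        "name": "Anita Rao",
        "date_of_birth": "15-04-1980",
        "department": "Physics",
        "joining_date": "10-06-2010",
    },
    "semester_1_subjects": [
        {"subject_name": "Mechanics", "professor": "Dr. Iyer", "score": 82},
        {"subject_name": "Calculus", "professor": "Dr. Khan", "score": 91},
    ],
    "semester_2_subjects": [
        {"subject_name": "Optics", "professor": "Dr. Iyer", "score": 77},
    ],
}

print(json.dumps(detailed_data, indent=2))
```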

Figure 3: When entering a list of dictionaries

Given the above, our objective is to first check if the serialisation of ‘student_name’ and ‘roll_no’ is valid, followed by checking for the validity of the contents of ‘detailed_data’ by comparing with the schema. We can do either of the following: manually write the schema or use JSON models. The latter is recommended since it auto-generates the schema, but to get an idea of what the schema looks like, we will go through both methods.

Methods of schema validation

Writing the schema manually
To map our schema onto the target data that we want to enter in the JSONField, consider Figure 4.

The schema suitable to fit the type of data is as follows:

{
    "type": "object",
    "properties": {
        "date_of_joining": {"type": "string", "pattern": "^[0-9]{2}-[0-9]{2}-[0-9]{4}$"},
        "mother_name": {"type": "string"},
        "father_name": {"type": "string"},
        "teacher_information": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "date_of_birth": {"type": "string", "pattern": "^[0-9]{2}-[0-9]{2}-[0-9]{4}$"},
                "department": {"type": "string"},
                "joining_date": {"type": "string", "pattern": "^[0-9]{2}-[0-9]{2}-[0-9]{4}$"}
            },
            "required": ["name", "date_of_birth", "department", "joining_date"],  # (1)
            "additionalProperties": False
        },
        "semester_1_subjects": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "subject_name": {"type": "string"},
                    "professor": {"type": "string"},
                    "score": {"type": "integer"}
                },
                "required": ["subject_name", "professor", "score"],  # (2)
                "additionalProperties": False
            }
        },
        "semester_2_subjects": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "subject_name": {"type": "string"},
                    "professor": {"type": "string"},
                    "score": {"type": "integer"}
                },
                "required": ["subject_name", "professor", "score"],  # (3)
                "additionalProperties": False
            }
        }
    },
    "required": ["date_of_joining", "mother_name", "father_name", "teacher_information", "semester_1_subjects", "semester_2_subjects"],  # (4)
    "additionalProperties": False
}

The list of fields given as the value of the ‘required’ key in line (1) implies that those fields (within ‘teacher_information’) MUST be included; if even one of them is missing, a validation error occurs. The same applies to lines (2) and (3), which cover the fields within each entry of ‘semester_1_subjects’ and ‘semester_2_subjects’. Line (4) does the same for the outer layer: it makes the six top-level fields, including ‘teacher_information’, ‘semester_1_subjects’ and ‘semester_2_subjects’ themselves, compulsory.
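The effect of ‘required’ and ‘additionalProperties’ can be tried outside Django. The following is a minimal sketch that assumes the jsonschema package is installed; the schema is a cut-down, illustrative version of the ‘teacher_information’ part, and the sample values are invented.

```python
from jsonschema import Draft7Validator

# A cut-down version of the 'teacher_information' part of the schema.
teacher_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "department": {"type": "string"},
    },
    "required": ["name", "department"],
    "additionalProperties": False,
}

validator = Draft7Validator(teacher_schema)

complete = {"name": "Anita Rao", "department": "Physics"}
missing_field = {"name": "Anita Rao"}  # 'department' is left out
extra_field = {"name": "Anita Rao", "department": "Physics", "age": 40}

# is_valid() returns True only when the instance satisfies the whole schema.
print(validator.is_valid(complete))
print(validator.is_valid(missing_field))
print(validator.is_valid(extra_field))
```

Only the first instance passes: the second omits a required field, and the third carries a key that ‘additionalProperties’: False does not allow.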

Next, we include our schema in views.py and use Draft7Validator to compare the data being entered against the defined schema.

Updated views.py

import jsonschema
from jsonschema import Draft7Validator

class StudentView(APIView):
    def post(self, request):
        data = request.data  # (1)
        serializer = StudentSerializer(data=request.data)
        if serializer.is_valid():
            myschema = ...  # placeholder: insert the schema defined above here
            v = Draft7Validator(myschema)  # (5)
            if len(list(v.iter_errors(data["detailed_data"]))) != 0:
                return Response({"error": str(list(v.iter_errors(data["detailed_data"])))})
            else:
                serializer.save()
                return Response(serializer.data, status=status.HTTP_201_CREATED)  # (10)
        else:
            return Response(serializer.errors, status=status.HTTP_400_BAD_REQUEST)

The aim of line (1) is to obtain the data entered by the user and store it in a variable named ‘data’. In line (2), we map this data against the Student model via our StudentSerializer and check for the presence of the ‘student_name’, ‘roll_no’ and ‘detailed_data’ fields, while also validating the types of the values entered for the first two. If the serializer is valid, i.e., if none of the three fields is missing and the values of the first two fields are of the right types, we go on to check the data within the ‘detailed_data’ JSONField. After the manually written schema is assigned to ‘myschema’ in line (4), line (5) passes the schema to Draft7Validator and assigns the resulting validator to ‘v’. The call v.iter_errors(data["detailed_data"]) in line (6) yields any validation errors that exist, such as missing fields or values of the wrong type. We collect these errors in a list and check its length. If the length is 0, the data entered in the JSONField abides by the schema and validation is successful. If not, the validation errors are returned as a list, serving as the value of the ‘error’ key in line (7).
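The view above calls iter_errors twice and returns the string form of the error objects. As one possible refinement, the errors can be collected once and turned into readable messages. This is only a sketch: the helper name `collect_errors` is hypothetical, the small schema is illustrative, and the jsonschema package is assumed to be installed.

```python
from jsonschema import Draft7Validator


def collect_errors(schema, instance):
    """Return a sorted list of readable messages, one per validation error."""
    validator = Draft7Validator(schema)
    messages = []
    for error in validator.iter_errors(instance):
        # absolute_path holds the keys/indexes leading to the offending value;
        # it is empty when the error concerns the top-level object itself.
        location = "/".join(str(part) for part in error.absolute_path) or "<root>"
        messages.append(f"{location}: {error.message}")
    return sorted(messages)


# Illustrative schema: one typed field and two required fields.
schema = {
    "type": "object",
    "properties": {"score": {"type": "integer"}},
    "required": ["score", "subject_name"],
}

# 'score' has the wrong type and 'subject_name' is missing.
errors = collect_errors(schema, {"score": "ninety"})
for line in errors:
    print(line)
```

In a view, an empty list returned by such a helper would mean the data can be saved, while a non-empty list could be sent back under the ‘error’ key.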

Figure 5: An example of a flawless POST operation

In Figure 5, we can see that values for all the fields are exactly of the types written in the schema. Additionally, there are no missing fields, which results in the particular entry being saved without any validation errors. Let’s try making a few changes to see how the validation errors are shown. Let’s not include the first ‘professor’ key-value pair in ‘semester_2_subjects’. Do the same for the second ‘subject_name’ within ‘semester_1_subjects’. Let’s write an integer type value for ‘department’ within ‘teacher_information’ and a wrong pattern for ‘date_of_joining’.
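It is worth noting what the ‘pattern’ for the date fields does and does not check. The sketch below uses only the Python standard library: it applies the same regular expression as the schema and, for comparison, an actual calendar check. The helper names are hypothetical and the DD-MM-YYYY reading of the pattern is an assumption.

```python
import re
from datetime import datetime

# Same expression as the 'pattern' used for the date fields in the schema.
DATE_PATTERN = re.compile(r"^[0-9]{2}-[0-9]{2}-[0-9]{4}$")


def matches_pattern(value):
    """True when the string has the two-two-four digit shape."""
    return DATE_PATTERN.match(value) is not None


def is_real_date(value, fmt="%d-%m-%Y"):
    """True when the string is also a valid calendar date (DD-MM-YYYY assumed)."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


for candidate in ["30-03-2022", "2022-03-30", "99-99-2022"]:
    print(candidate, matches_pattern(candidate), is_real_date(candidate))
```

The last candidate has the right shape but is not a real date, so the pattern alone would accept it; a stricter check would have to be added separately if that matters.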

We have covered the various types of validation errors. This method requires us to write out the schema manually, but what if you didn’t want to do so and what if it could be done for you instead? This is where JSON models come into play.

Using JSON models
To generate the previous schema, the following additions have to be made in models.py.

models.py

from jsonmodels import models as m1,fields

class teacherField(m1.Base):
    name = fields.StringField(required=True)
    date_of_birth = fields.DateField(required=True)
    department = fields.StringField(required=True)
    joining_date = fields.DateField(required=True)

class sem1Field(m1.Base):
    subject_name = fields.StringField(required=True)
    professor = fields.StringField(required=True)
    score = fields.IntField(required=True)

class sem2Field(m1.Base):
    subject_name = fields.StringField(required=True)
    professor = fields.StringField(required=True)
    score = fields.IntField(required=True)

class detailed_data_field(m1.Base):
    date_of_joining = fields.DateField(required=True)
    mother_name = fields.StringField(required=True)
    father_name = fields.StringField(required=True)
    teacher_information = fields.EmbeddedField(teacherField)
    semester_1_subjects = fields.EmbeddedField(sem1Field)
    semester_2_subjects = fields.EmbeddedField(sem2Field)

We treat the nested fields within ‘detailed_data’, namely ‘teacher_information’ (a dictionary), ‘semester_1_subjects’ and ‘semester_2_subjects’ (both lists of dictionaries), as separate JSON models and then reference them through embedded fields in the outer ‘detailed_data_field’ JSON model. The ‘EmbeddedField’ used in the last three lines above plays a role similar to that of a foreign key in a database: it points one model at another.

Figure 6: Display of list of validation errors

Updated views.py

from studentapp.models import detailed_data_field  # (1)
class StudentView(APIView):
    def post(self, request):
        data = request.data
        serializer = StudentSerializer(data=request.data)  # (5)
        if serializer.is_valid():
            data1 = detailed_data_field()
            myschema = data1.to_json_schema()
            print(myschema)
            v = Draft7Validator(myschema)  # (10)
            if len(list(v.iter_errors(data["detailed_data"]))) != 0:
                return Response({"error": str(list(v.iter_errors(data["detailed_data"])))})
            else:
                serializer.save()
                return Response(serializer.data, status=status.HTTP_201_CREATED)  # (15)
        else:
            return Response(serializer.errors, status=status.HTTP_400_BAD_REQUEST)

In line (7), an instance of our overall JSON model ‘detailed_data_field’ is assigned to the variable ‘data1’, and its schema is generated in line (8). The generated schema is shown below.

{
    'type': 'object',
    'additionalProperties': False,
    'properties': {
        'date_of_joining': {'type': 'string'},
        'father_name': {'type': 'string'},
        'mother_name': {'type': 'string'},
        'semester_1_subjects': {
            'type': 'object',
            'additionalProperties': False,
            'properties': {
                'professor': {'type': 'string'},
                'score': {'type': 'number'},
                'subject_name': {'type': 'string'}
            },
            'required': ['professor', 'score', 'subject_name']
        },
        'semester_2_subjects': {
            'type': 'object',
            'additionalProperties': False,
            'properties': {
                'professor': {'type': 'string'},
                'score': {'type': 'number'},
                'subject_name': {'type': 'string'}
            },
            'required': ['professor', 'score', 'subject_name']
        },
        'teacher_information': {
            'type': 'object',
            'additionalProperties': False,
            'properties': {
                'date_of_birth': {'type': 'string'},
                'department': {'type': 'string'},
                'joining_date': {'type': 'string'},
                'name': {'type': 'string'}
            },
            'required': ['date_of_birth', 'department', 'joining_date', 'name']
        }
    },
    'required': ['date_of_joining', 'father_name', 'mother_name']
}

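We can observe in the auto-generated schema that for ‘semester_1_subjects’ and ‘semester_2_subjects’, the ‘type’ is ‘object’, whereas it should actually be ‘array’. This follows from declaring them with EmbeddedField, which describes a single embedded object and not a list of them. The jsonmodels library also offers a ListField for list-valued data, which could be explored as an alternative; here we keep the EmbeddedField definitions and adjust the generated schema by hand. To visualise the difference, let us compare the ‘semester_1_subjects’ structure from the auto-generated schema with the manually written one.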

Auto-generated schema

'semester_1_subjects': {
    'type': 'object',
    'additionalProperties': False,
    'properties': {
        'professor': {'type': 'string'},
        'score': {'type': 'number'},
        'subject_name': {'type': 'string'}
    },
    'required': ['professor', 'score', 'subject_name']
},

Manually written schema

"semester_1_subjects": {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "subject_name": {"type": "string"},
            "professor": {"type": "string"},
            "score": {"type": "integer"}
        },
        "required": ["subject_name", "professor", "score"],
        "additionalProperties": False
    }
},

Figure 8: Successful validation of JSONField schema

We need to make the auto-generated structure of ‘semester_1_subjects’ look like that of the manually written schema. To do so, we take the entire auto-generated dictionary for ‘semester_1_subjects’ and make it the value of a new key, ‘items’, inside a fresh dictionary; we then add the key-value pair ‘type’: ‘array’ to that dictionary and use it as the new value of ‘semester_1_subjects’. The structure of ‘semester_1_subjects’ then matches the one defined in the manually written schema. We do the same for ‘semester_2_subjects’. We also need to append ‘semester_1_subjects’, ‘semester_2_subjects’ and ‘teacher_information’ to the ‘required’ list at the end of the auto-generated schema, because the embedded fields were not declared with required=True. The ‘pattern’ for the date-type fields needs to be included as well. Let us take a look at the new view, which factors in all these changes.

class StudentView(APIView):  # (1)
    def post(self, request):
        data = request.data
        serializer = StudentSerializer(data=request.data)
        if serializer.is_valid():  # (5)
            data1 = detailed_data_field()
            myschema = data1.to_json_schema()
            myfields = ['semester_1_subjects', 'semester_2_subjects']
            for i in myfields:
                d = {}  # (10)
                d["items"] = myschema["properties"][i]
                myschema["properties"][i] = d
                myschema["properties"][i]["type"] = "array"
            myschema["required"].extend(["teacher_information", "semester_1_subjects", "semester_2_subjects"])
            myschema["properties"]["date_of_joining"]["pattern"] = '^[0-9]{2}-[0-9]{2}-[0-9]{4}$'  # (15)
            myschema["properties"]["teacher_information"]["properties"]["joining_date"]["pattern"] = '^[0-9]{2}-[0-9]{2}-[0-9]{4}$'
            myschema["properties"]["teacher_information"]["properties"]["date_of_birth"]["pattern"] = '^[0-9]{2}-[0-9]{2}-[0-9]{4}$'
            v = Draft7Validator(myschema)
            if len(list(v.iter_errors(data["detailed_data"]))) != 0:
                return Response({"error": str(list(v.iter_errors(data["detailed_data"])))})  # (20)
            else:
                serializer.save()
                return Response(serializer.data, status=status.HTTP_201_CREATED)
        else:
            return Response(serializer.errors, status=status.HTTP_400_BAD_REQUEST)  # (25)

The section of views from lines (10) to (13) aims to convert the structure of ‘semester_1_subjects’ and ‘semester_2_subjects’ from ‘type’:’object’ to ‘type’:’array’. Line (14) intends to append ‘teacher_information’, ‘semester_1_subjects’ and ‘semester_2_subjects’ to the outermost ‘required’ list. Lines (15) to (17) help include a ‘pattern’ key-value pair in the schema structure for date-related fields.
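The same dictionary manipulation can be pulled out of the view and tried on its own. The following is a minimal standard-library sketch under stated assumptions: the function name `patch_generated_schema` and its parameters are hypothetical, and the small ‘generated’ dictionary only imitates the shape of the auto-generated schema shown earlier.

```python
import copy

DATE_PATTERN = r"^[0-9]{2}-[0-9]{2}-[0-9]{4}$"


def patch_generated_schema(schema, array_fields, extra_required, date_paths):
    """Return a patched copy of an auto-generated schema (sketch only)."""
    patched = copy.deepcopy(schema)  # leave the caller's schema untouched
    for field in array_fields:
        # Wrap the object schema so that it describes each item of an array.
        patched["properties"][field] = {
            "type": "array",
            "items": patched["properties"][field],
        }
    for field in extra_required:
        if field not in patched["required"]:
            patched["required"].append(field)
    for path in date_paths:
        # Walk down through nested 'properties' to reach the date field.
        node = patched
        for key in path:
            node = node["properties"][key]
        node["pattern"] = DATE_PATTERN
    return patched


# A small stand-in for the generated schema, for illustration only.
generated = {
    "type": "object",
    "properties": {
        "date_of_joining": {"type": "string"},
        "semester_1_subjects": {
            "type": "object",
            "properties": {"score": {"type": "number"}},
            "required": ["score"],
        },
    },
    "required": ["date_of_joining"],
}

patched = patch_generated_schema(
    generated,
    array_fields=["semester_1_subjects"],
    extra_required=["semester_1_subjects"],
    date_paths=[["date_of_joining"]],
)
print(patched["properties"]["semester_1_subjects"]["type"])
```

Working on a deep copy keeps the generated schema intact, which makes the before-and-after structures easy to compare while experimenting.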

So you now have a basic understanding of how the structure of the data in a JSONField of a Django model, exposed through the Django REST framework, can be cross-checked against a well-defined schema. Covering the lengthier method, i.e., writing the schema yourself, was essential to understand what a schema looks like. You also have an idea of how JSON models help generate the schema for you. Even though the auto-generated schema is not an exact reflection of the schema you may write yourself, you can manipulate it to suit your needs using a fundamental Python data structure, the dictionary, and its basic operations.
