Using UUIDs as primary keys

A Guide for Django and Postgres

If you’re designing a REST API, auto-incremented primary keys can be a threat. They expose a lot of information about your API and its internal structure. UUIDs can help hide this information and make your API more secure. In the following, I’m going to explain what a primary key is and what problems can occur with auto-incremented primary keys.

1. What is a primary key?

A primary key is used in a relational database to uniquely identify a record (a row in a table). There are two types of primary keys: natural primary keys and surrogate primary keys.

Natural primary keys

Natural primary keys are made of data that already exists in the table. For example, if a table stores a phone number for each person, you could use the phone number as a natural primary key, since it’s very likely to be a unique value.
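As a minimal sketch, assuming Python’s built-in sqlite3 module as a stand-in for Postgres and a hypothetical contacts table, this shows the database rejecting a second row with the same natural key:

```python
import sqlite3

# In-memory SQLite database standing in for Postgres; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (phone_number TEXT PRIMARY KEY, name TEXT NOT NULL)")


def add_contact(phone_number: str, name: str) -> bool:
    """Insert a contact; return False if the phone number (the natural key) already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO contacts (phone_number, name) VALUES (?, ?)",
                (phone_number, name),
            )
        return True
    except sqlite3.IntegrityError:  # raised for a duplicate primary key
        return False


print(add_contact("555-0100", "Alice"))  # True
print(add_contact("555-0100", "Bob"))    # False: the natural key is already taken
```

The key carries meaning here, so the uniqueness rule is tied to real-world data: if two people ever shared a phone number, the schema would have to change.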

Surrogate primary keys

Surrogate primary keys have nothing to do with the rest of the data. They’re “artificial” identifiers for a record. A common choice is an auto-incremented integer that the database generates for you: the first record gets the id 1, the next one 2, and so on.
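A minimal sketch of a surrogate key, again using sqlite3 in place of Postgres (where a serial or identity column plays the same role) and a hypothetical users table. The database assigns the ids itself:

```python
import sqlite3

# SQLite stands in for Postgres; the users table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, email TEXT NOT NULL)")


def create_user(email: str) -> int:
    """Insert a user and return the id the database assigned to it."""
    with conn:
        cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    return cur.lastrowid


print(create_user("alice@example.com"))  # 1
print(create_user("bob@example.com"))    # 2
```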

2. Why auto-incremented primary keys are bad

At first glance, an auto-incremented primary key seems very convenient. In most cases the database handles the increment, and you only have to save your records to get back their primary keys. But if you’re working on a public API, you don’t want this internal information to reach the outside world. For example, consider a REST API with endpoints like:

/users/ # list all users
/users/1234 # get the user with the id 1234

As you can see, the internal auto-incremented primary key is part of the URL. If someone creates a new user account, the id they get back tells them roughly how many users are in the database. By repeating this over time, they can compute how fast the user base grows. In some businesses this is important, confidential information that you don’t want to be public. Also, if the REST API is missing a permission check, an attacker could scrape the information of all users simply by decrementing the id in the URL.
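The arithmetic behind both leaks is simple. The helper functions below are hypothetical, and the estimate assumes every new account takes the next integer (real sequences can skip values, for example after a rolled-back insert, so it is only approximate):

```python
def estimate_new_users(first_id: int, second_id: int) -> int:
    """Accounts created by others between two of our own sign-ups, assuming each
    new account receives the next integer id (gaps make this an estimate)."""
    return second_id - first_id - 1


def candidate_user_urls(known_id: int, count: int) -> list[str]:
    """URLs an attacker could try by counting down from a known id (ids start at 1)."""
    lowest = max(known_id - count, 0)
    return [f"/users/{i}" for i in range(known_id, lowest, -1)]


# Hypothetical sign-ups a week apart returned the ids 1234 and 1534.
print(estimate_new_users(1234, 1534))  # 299
print(candidate_user_urls(1234, 3))    # ['/users/1234', '/users/1233', '/users/1232']
```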

Another, more design-specific argument is that an auto-incremented key only exists once the database has generated it. If you are inserting multiple records, you have to go to the database to learn each record’s primary key, which can mean extra round trips, for example when later records need to reference earlier ones. Further, you don’t get identifiers that are unique across tables, since every new table starts counting from 1. This can cause problems, especially when you migrate data from development to staging or from staging to production.
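To illustrate the migration problem, here is a sketch with two separate in-memory sqlite3 databases standing in for development and staging (the table and email values are hypothetical). Both hand out the id 1 to different users, so the ids clash as soon as the data is merged:

```python
import sqlite3
import uuid


def new_database() -> sqlite3.Connection:
    """A fresh in-memory database with an auto-incremented users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, email TEXT)")
    return conn


def insert_user(conn: sqlite3.Connection, email: str) -> int:
    """Insert a user and return the id assigned by that particular database."""
    with conn:
        return conn.execute("INSERT INTO users (email) VALUES (?)", (email,)).lastrowid


dev, staging = new_database(), new_database()
dev_id = insert_user(dev, "dev-user@example.com")
staging_id = insert_user(staging, "staging-user@example.com")
print(dev_id == staging_id)  # True: both databases started counting at 1

# Keys generated by the application do not depend on any database's counter.
print(uuid.uuid4() == uuid.uuid4())  # False (a collision is astronomically unlikely)
```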

3. What is a UUID?

A UUID (Universally Unique Identifier) is a 128-bit value used to identify information. It is usually written as 32 hexadecimal digits in five hyphen-separated groups, like this:

5bec9289-3a5e-436d-97ad-aff4722d61c7

We are going to use version 4 UUIDs (UUIDv4), which are generated from (pseudo-)random numbers. With 122 random bits, the chance of two of them colliding is negligible, so they can uniquely identify all of our records across multiple tables and databases. Unlike auto-incremented primary keys, UUIDs don’t reveal information about your data. Further, you know the primary key before you insert the record into the database, since you generate it yourself. Django and Postgres offer a convenient and simple way to use UUIDs as primary keys.
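In Python, the standard library’s uuid module generates version 4 UUIDs. A short sketch of what you get and how the key is available before anything touches the database (the order dictionary is just a hypothetical related record):

```python
import uuid

# Generated in the application, before any INSERT is sent to the database.
user_id = uuid.uuid4()

print(user_id)            # random each run, e.g. 5bec9289-3a5e-436d-97ad-aff4722d61c7
print(user_id.version)    # 4
print(len(str(user_id)))  # 36: 32 hex digits plus 4 hyphens

# The key can be referenced right away, e.g. by a related record built in memory.
order = {"id": uuid.uuid4(), "user_id": user_id}
```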

4. How to implement with Django and Postgres

If you search the Postgres documentation for UUID, you’ll find this:

“PostgreSQL provides storage and comparison functions for UUIDs, but the core database does not include any function for generating UUIDs”

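Postgres has its own built-in uuid data type and indexes it with a regular B-tree index, so looking up a record by UUID works just like looking up any other indexed key. Keep in mind, though, that each key takes 16 bytes instead of the 4 or 8 bytes of an integer, so the index gets larger. As the quote says, the UUID has to be generated somewhere else. (This applies to older PostgreSQL releases; since PostgreSQL 13, the core database includes a gen_random_uuid() function.) We are going to generate the UUID in Django. Django offers a UUIDField to store UUID values; on Postgres it is stored in the native uuid type. If you instantiate a UUIDField with the parameter primary_key=True, Django uses this field as the model’s primary key instead of adding its default auto-incremented id field. The UUID itself is generated through the field’s default parameter: import the uuid module from the Python standard library and pass the function uuid.uuid4 (without parentheses) as the default, so that Django calls it to create a new UUID for every new object. Finally, set editable=False so that the primary key is not shown in the Django admin or in ModelForms and cannot be changed through them.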
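Why uuid.uuid4 must be passed without parentheses can be sketched in plain Python. The resolve_default function below is a hypothetical, simplified stand-in for how a field picks a value for a new object (Django likewise calls a callable default each time), not Django’s actual code:

```python
import uuid


def resolve_default(default):
    """Hypothetical stand-in for a field's default handling: a callable default is
    called once per new object, anything else is reused unchanged."""
    return default() if callable(default) else default


# default=uuid.uuid4: the function itself, so every object gets a fresh UUID.
good = [resolve_default(uuid.uuid4) for _ in range(3)]

# default=uuid.uuid4(): evaluated once when the field is declared, shared by every object.
shared = uuid.uuid4()
bad = [resolve_default(shared) for _ in range(3)]

print(len(set(good)))  # 3
print(len(set(bad)))   # 1
```

With the second form, the second object you save would fail with a duplicate primary key error, because it reuses the first object’s UUID.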

Put together, the primary key is declared as a UUIDField with primary_key=True, default=uuid.uuid4 and editable=False.

This is a simple way to use UUIDs as your primary keys and to avoid the problems I’ve mentioned above.

Thank you for reading!