What is a UUID in MySQL?
A UUID is a Universal Unique Identifier specified by RFC 4122 (It is a Universally Unique Identifier URN Namespace) and 128-bit long value. It is designed in such a way that it generates a number which is unique globally according to space and time. we will get two different values, and even they were performed on two separate devices that are not connected. MySQL UUID is defined as “Universally Unique Identifier” which shows the usage of a Primary key for a table. Basically, it is described on the basis of RFC 4122 that is a URN Namespace i.e. Universally Unique Identifier (UUID). Here, we will show UUID as PK (Primary Key) and its pros and cons while application of it in MySQL query. UUID is considered as a numeral which is distinct worldwide in universe and time interval. If UUID are produced on two independent severs that are associated to each other then also the both UUID values are projected to be unique.
NOTE: Although UUID() always generates unique values, they are not guessable or predictable. It means it always returns a random number.
Structure of UUID
UUID in MySQL returns a value which is 128-bit long. It is represented in a human-readable format as a UTF8 string of five hexadecimal numbers in the below format:
- The first three numbers are generated as a part of timestamp format, such as low, middle, and high. Here, the high part contains the UUID version number.
- The fourth number is responsible for preserving the temporal uniqueness whenever the timestamp value loses monotonicity.
- The fifth number represents the IEEE 802 node number, which indicates spatial uniqueness. If the letter is not available, it will substitute the random number that does not guarantee the spatial uniqueness.
The following are the valid string format of the UUID values in MySQL which is an arrangement of 32 digit of the hexadecimal format along with four hyphens (-):
We can generate UUID values in MySQL using the function as follows:
mysql> SELECT UUID();
The above function returns a UUID value in agreement with UUID version 1 described in RFC 4122. After successful execution of the above statement, it will generate the UUID value as follows:
Say we have a table of users, and each user has a UUID. MySQL has a UUID() function, which makes MySQL generate a UUID value and return it in a human-readable form, of type VARCHAR(36). Let’s try it on MySQL 5.7.8:
|123456||mysql> select uuid();+————————————–+| uuid() |+————————————–+| aab5d5fd-70c1-11e5-a4fb-b026b977eb28 |+————————————–+|
So the first idea would be to simply do this:
|12||create table users(id varchar(36), name varchar(200));insert into users values(uuid(), ‘Andromeda’);|
But this human-readable form of UUID isn’t compact; let’s observe that:
- the fours dashes are superfluous
- each pair of characters is actually a hexadecimal number in the range 00-FF; that makes 16 numbers in total (above: 0xAA, 0xB5, etc), each of them can be stored in a byte.
|12||create table users(id_bin binary(16), name varchar(200));insert into users values(unhex(replace(uuid(),’-‘,”)), ‘Andromeda’);|
This binary form uses 16 bytes which is much smaller than VARCHAR(36) used by the human-readable form (which I’ll call the “text” form now). If the UUID has to be a primary key, the gain is even greater, as in InnoDB the primary key value is copied into all secondary index values. BINARY(16) is… well… just binary! No character set, no collation, just sixteen bytes. Perfect for our need. Perhaps the text form is still necessary in some application, so let’s keep it as an additional column in the table; but, to minimize disk occupation, let’s make the text form a virtual generated column (that’s a new feature of MySQL 5.7, described in the documentation of CREATE TABLE). This column will be calculated by a formula of the binary-form column: we convert the binary form back to hexadecimal digits and insert dashes.
|12345678910111213141516171819202122||create table users(id_bin binary(16),id_text varchar(36) generated always as (insert( insert( insert( insert(hex(id_bin),9,0,’-‘), 14,0,’-‘), 19,0,’-‘), 24,0,’-‘) ) virtual,name varchar(200)); insert into users (id_bin,name) values(unhex(replace(uuid(),’-‘,”)), ‘Andromeda’); select id_text, name from users;+————————————–+———–+| id_text | name |+————————————–+———–+| C2770D2E-70E6-11E5-A4FB-B026B977EB28 | Andromeda |+————————————–+———–+|
I didn’t include id_bin in the SELECT because it would come out as cryptic characters (of ASCII code 0xC2, 0x77, etc: generally not in the human-readable range of characters). There’s no reason we should need to look at the content of id_bin; but, if you do, you can visualize its hexadecimal codes by using HEX(id_bin).
Note that id_text is declared VIRTUAL, so takes no space in the table on disk.
Another benefit of making id_text a generated column, is that it eliminates any risk of inconsistency between the two columns. Indeed, if id_text were a plain column, one could do
|1||update users set id_bin = <something>;|
without updating id_text, accidentally. But as a generated column, id_text is never updatable directly: instead, it is automatically updated when one updates id_bin. In other words, information is in only one place (id_bin) and the database guarantees consistency.
Then, what about queries? for example, we could want to find a user by UUID:
|1||select * from users where <it has UUID XYZ>;|
Should the WHERE clause specify the binary or the text form? It depends:
- If we create an index over the binary form:
1alter table users add unique(id_bin);
then, for this index to be used, WHERE should specify the binary form:
1WHERE id_bin = binary_form_of_XYZ
- If instead we create an index over the text form:
1alter table users add unique(id_text);
then, WHERE should specify the text form:
1WHERE id_text = text_form_of_XYZ
Even though id_text is a virtual column, it is possible, as above, to add an index on it (in that case, the index does occupy disk space). That is a new feature introduced in MySQL . However, if we have a choice, as the binary form is shorter, it looks more logical to index it, not the text form – the index will be smaller and thus faster to traverse, faster to backup… Finally, there is the question of how to smartly re-arrange bytes in the binary form. To understand that, we need to learn more about UUIDs. They exist in several versions, and different sources can generate different versions. MySQL’s UUID() uses version 1, which implies, as explained in paragraph of the RFC, that the three leftmost dash-separated groups are a 8-byte timestamp: leftmost group is the low four bytes of the timestamp; second group is the middle two bytes, third group is the high (most significant) two bytes of the timestamp. Thus the leftmost group varies the fastest (10 times per microsecond). We can verify that:
|1234567891011||mysql> select uuid(); do sleep(2); select uuid();+————————————–+| uuid() |+————————————–+| 3b96402f-70c5-11e5-a4fb-b026b977eb28 |+————————————–++————————————–+| uuid() |+————————————–+| 3cc7f7dc-70c5-11e5-a4fb-b026b977eb28 |+————————————–+|
You can see how the 8 leftmost characters changed while the others did not.
So, in a sequence of UUIDs continuously generated by a single machine, all UUIDs have different first bytes. Inserting this sequence into an indexed column (in binary or text form) will thus modify a different index page each time, preventing in-memory caching. So it makes sense to re-arrange the UUID, making the rapidly-changing parts go last, before we store into id_bin. Again, note that this idea applies only to UUIDs of version 1.
Below, the binary form is re-arranged, by changing time-low/time-mid/time-high to time-high/time-mid/time-low.
|1234567891011||create table users(id_bin binary(16), name varchar(200)); set @u = unhex(replace(uuid(),’-‘,”)); insert into users (id_bin,name)values( concat(substr(@u, 7, 2), substr(@u, 5, 2), substr(@u, 1, 4), substr(@u, 9, 8)), ‘Andromeda’);|
I used a user variable above (@u), because each SUBSTR() invokation needs to reference the UUID value, but I cannot afford to write UUID() four times: it would generate a new UUID each time! So I call UUID() once, remove dashes, convert it to binary, store it in a variable, and do the four SUBSTR on it.
However, I still want the text form to be in “non-rearranged” order, because… perhaps this text form will be used for some error logging, debugging? If humans are to read it, I don’t want to confuse them by using a rearranged order.
Adding id_text can be done in CREATE TABLE, or as a follow-up ALTER TABLE:
|12345678910111213141516||alter table users addid_text varchar(36) generated always as( insert( insert( insert( insert( hex( concat(substr(id_bin,5,4),substr(id_bin,3,2), substr(id_bin,1,2),substr(id_bin,9,8)) ), 9,0,’-‘), 14,0,’-‘), 19,0,’-‘), 24,0,’-‘)) virtual;|
which takes the parts out of the binary form (with SUBSTR), puts them in the “normal” position (CONCAT), converts bytes to hexadecimal digits (HEX) and inserts dashes. Right, it’s a complex expression, but you type it only once, when creating the generated column.
Now look at the data:
|123456||select id_bin, hex(id_bin), name, id_text from users;+——————+———————————-+———–+————————————–+| id_bin | hex(id_bin) | name | id_text |+——————+———————————-+———–+————————————–+| �p�:�ˤ� &�w� | 11E570EA3A059CCBA4FBB026B977EB28 | Andromeda | 3A059CCB-70EA-11E5-A4FB-B026B977EB28 |+——————+———————————-+———–+————————————–+|
- the cryptic characters (rearranged binary UUID)
- the corresponding hex codes for the first column
- the name
- the non-rearranged text UUID. See how the leftmost 3A059CCB of this text form, the rapidly-changing part, is in the middle of the binary form (column 2), so the binary form will give more efficient indexing.
MySQL UUID vs. AUTO_INCREMENT PRIMARY KEY
UUIDs in MySQL are a good alternative to AUTO_INCREMENT PRIMARY KEY. The following are advantages of UUID over AUTO_INCREMENT PRIMARY KEY:
The following are the advantages of using UUID for a primary key:
- UUID values in MySQL are unique across tables, databases, and servers. It allows us to merge rows from distributed/different databases across servers.
- UUID values do not provide information about our data, which means it is hard to guess. Thus it is safe to use in URLs.
- UUID values can be generated offline means we can generate it anywhere without exchanging information with the database server.
- It also simplifies replication (logic in the application). For example, if we want to insert data into parent and child tables, we must first insert data into the parent table, get generated id, and then fill record into the child table. Using UUID, we can generate the PRIMARY KEY value of the parent table and insert rows into both tables simultaneously.
Besides the advantages, the following are the disadvantages of using UUID for a primary key:
- If we store the UUID (16-bytes) values in the databases, it occupies more space/storage than integers (4-bytes) or big integers (8-bytes).
- It makes the debugging more difficult. For example, we can imagine the expression WHERE id = ‘185e6dfd-1cc8-11eb-9a2c-107d1a24f935’ instead of WHERE id = 5.
- It can also cause performance issues because of the unordered values and their size.
MySQL UUID solution
We can overcome these issues in MySQL by using the functions given below. These functions allow us to store UUID values in a compact format (BINARY) and display them in a human-readable format (VARCHAR). The name of the functions are:
NOTE: It is to note that these functions are only available in the MySQL version 8.0 or later.
The UUID_TO_BIN() function is used to convert the UUID values from a human-readable format into a compact format for storing it in the databases.
The BIN_TO_UUID() function is used to convert the UUID from the compact format to a human-readable format for displaying.
The IS_UUID() function is used to validate the string format of UUID. It returns 1 when the argument is valid and returns 0 for an invalid argument. If the argument is NULL, it will return NULL.
MySQL UUID Example
Let us understand how to use the UUID with the help of an example. First, we will create a new table named employee using the below statement:
CREATE TABLE employee ( emp_id BINARY(16) PRIMARY KEY, name VARCHAR(255) );
Next, we need to insert the values into the table. Also, if we want to add the UUID values into the emp_id column, we must use the UUID() and UUID_TO_BIN() functions as follows:
INSERT INTO employee (emp_id, name) VALUES (UUID_TO_BIN(UUID()),'John Doe'), (UUID_TO_BIN(UUID()),'Johnny Dope'), (UUID_TO_BIN(UUID()),'Jason Gillespie');
Now, execute the SELECT statement to verify the inserted record.
Finally, we will query data from a UUID column using the BIN_TO_UUID() function that converts binary format to a human-readable format. See the below statement:
mysql> SELECT BIN_TO_UUID(emp_id) AS ID, name FROM employee;
We will get the output as follows:
MySQL UUID vs UUID(short)
UUID() and UUID(short) both are different functions in MySQL. The basic differences between them are discussed in the below comparison chart:
|A UUID is a Universal Unique Identifier specified by RFC 4122 and 128-bit long value represented as UTF8 string of five hexadecimal numbers.||This function produces a Short Universal Unique Identifier as a 64-bit unsigned integer, which differs from the string-format 128-bit identifiers produced by the UUID() function.|
|It produces a value that conforms to a 16 byte version1 UUID. The version 1 UUID is the server ID’s bitwise conglomeration, the current timestamp, a few bytes, and utility bits.||It’s returned value contains a server ID’s bitwise conglomeration, a fairly static time component, and a sequentially increasing 24-bit integer.|
|In the UUID, the server ID is 6 bytes long that makes the space unique.||In the UUID (short), the server ID is only one byte; that’s why it loses the space uniqueness.|