MySQL SET

If you are interested learn about the MySQL Enum

SET is a data type of String object that can hold zeterro or more, or any number of string values. They must be chosen from a predefined list of values specified during table creation. It is one of the rarely used data type in the MySQL database. It is referred to as a complex type because its implementation increases the complexity.

The SET and ENUM data types work in a similar way because MySQL uses it with a predefined list of values. But one thing makes it different that are ENUM data type can hold only a single member of the predefined list of values, whereas SET data type allows us to store zero or any number of values together. If we define the SET column values with multiple set members, then it is required to separate it using the comma (,) operator. For example, we have defined a column as:

SET (''one'', ''two'') NOT NULL  

Then, our column can have any of the following values:

''  
'one'  
'two'  
'one,two'  

Why set is used in MySQL?

The MySQL SET datatype allows us to compare multiple values without using complex JOIN operations. We can manipulate the set with binary functions to do complex comparisons by comparing bit values on a single column instead of comparing multiple rows of multiple tables.

What is data set in SQL?

A dataset is a snapshot of all the information in a database at a given moment in time. The data in a dataset is further segmented into structures called tables. A table contains information that goes together. For example, all of the people in an address book could go in a table called Contacts.

Storage of SET data type

A SET data type can store a maximum of 64 distinct members in its column. If the SET column contains duplicate values, it throws an error when the strict SQL mode is enabled. When we create a table and if the SET member values in the table definition have spaces, then they are automatically deleted from the table definition after the execution of the CREATE statement. The SET data type is stored as a numeric value within the MySQL tables. It occupies from one to eight( i.e., 1, 2, 3, 4, 8) bytes, which depends on the number of elements contains in a SET column. The following table explains the storage requirement for the SET data type in MySQL:

No. of ElementsNo. of Bytes
1-81
9-162
17-243
25-324
33-648

In MySQL, the SET element stored in the bitmap where each member represented by a single bit value. Let us look at the following table that illustrates it more clearly:

CREATE TABLE myset_test(  
    Id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,  
    Myset SET('Java','Python','Android','PHP')  
  );  

In the above definition, each of the set members assigned a single bit that have following decimal and binary values:

SET ElementDecimal ValueBinary Value
Java10001
Python20010
Android40100
PHP81000

Now, if we assign a value 9 to this column, in terms of binary, it will be 1001. So, it selects the first and fourth SET elements, and the resulting value is ”Java, PHP”, which will be the fifth member of the SET column. Sometimes our SET data type column has more than one element; in that case, there is no need to insert the value in a specific order. It means the element can be in any order. MySQL also allows us to insert the same element many times in the SET data type column. When we retrieved the value, it will return each element once based on the order in which they were listed during the time of table creation. Let us understand it with the help of following example where we are going to create a table named “myset_test” that contains a column SET(‘Java’, ‘Python’, ‘Android’, ‘PHP’):

mysql> CREATE TABLE myset_test (  
  Id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,   
  Myset_col SET('Java', 'Python', 'Android', 'PHP')  
);  

Next, we need to insert the values into the table:

INSERT INTO myset_test(Myset_col) VALUES ('Java,PHP'), ('PHP,Java'),   
('Java,PHP,Java'), ('Java,PHP,PHP'), ('PHP,Java,PHP');  

Now, when we retrieve the value from the table, we will get all these values to appear as “Java, PHP” so execute the below statement to see this:

Mysql> SELECT * FROM myset_test;

In the output, we can see that our SET column have only (Java, PHP) values:

MySQL set

How to update SET data

MySQL SET data type can updates the data in three ways, which are given below:

1. By completely replacing the SET data

If we want to replace the complete data into a SET, it is required to specify the new values. Execute the following statements where the first query updates row = 4 with values = ‘Java, Python’ and second query updates row = 5 with values = 11. Here, 11 means 8+2+1=Java, Python, and PHP.

mysql> UPDATE myset_test SET myset_col = 'Java, Python' WHERE Id = 4;  
mysql> UPDATE myset_test SET myset_col = 11 WHERE Id = 5;  

After the successful execution, execute the below command to verify the output:

mysql> SELECT * FROM myset_test;  

Output

MySQL set

2. By adding SET members

Sometimes, there is a need to add the element into an existing SET column, then a CONCAT() or CONCAT_WS() function allows us to insert a new element in the comma-separated list. Execute this statement to understand it more clearly:

mysql> UPDATE myset_test SET myset_col = CONCAT(myset_col, ", Python") WHERE Id = 3;  

After the successful execution, execute the below command to verify the output:

mysql> SELECT * FROM myset_test;  

Output

MySQL set

3. By removing SET members

If you want to drop any element from the existing SET, then a REPLACE() function can be used to remove the element SET column. Execute this statement to understand it more clearly:

mysql> UPDATE myset_test SET myset_col = REPLACE(myset_col, "Python", "") WHERE Id = 4;  

Again, if you use the decimal value to remove the element from an existing set, use the bitwise AND(&) operator and bitwise NOT(~) operator. Execute the below statement:

mysql> UPDATE myset_test SET myset_col = myset_col & ~2 WHERE Id = 4;  

After the successful execution, execute the below command to verify the output:

mysql> SELECT * FROM myset_test;  

Output

MySQL set

If we want to see the integer values corresponding to the SET member, then we need to add the “+0” with the column name. Let us see the following example:

Mysql> SELECT Id, myset_col, myset_col+0 FROM myset_test;  

When we execute the above statement, the following result appears where integer values corresponding to the SET member included:

MySQL set

Disadvantages of Using SET data type

The SET data type does not recommend us to use it in our database because of the following disadvantages:

  • The SET data type limits us to 64 members in the column.
  • We cannot include commas in the SET elements.
  • The SET data is not normalized.
  • The SET data type has only one index that represents the whole set data. So, we cannot search for a particular element or member of the SET.

Why we use SET data type

The following are reasons to use the SET data type in MySQL:

  • It can handle multiple values without much trouble.
  • It allows us to compare multiple values without using complex JOIN operations.
  • Its schema is simple that uses only one column instead of three different tables (for example, Person, Interest, Link the persons to particular interest) to store elements.
  • It allows us to use binary functions for complex comparisons by comparing bit values on a single column.

INSERTing SET Data

There are two ways to INSERT data into MySQL SET columns: by decimal value or by using strings. To INSERT the combination of Travel and Sports into our sample tables, we can use the following:

  INSERT INTO set_test(myset) VALUES(3);
  INSERT INTO set_test(myset) VALUES('Travel,Sports');

Because the decimal value of Travel is 1 and the decimal value of Sports is 2, the combination of the two can be inserted at the sum of the decimal values, or 3. Another example of matching INSERT statements:

  INSERT INTO set_test(myset) VALUES(7);
  INSERT INTO set_test(myset) VALUES('Travel,Sports,Dancing');

Updating SET Data

There are three kinds of UPDATE that we need to deal with: complete replacement of the SET data, addition of set members, and removal of set members. We will deal with these in the order listed above.

To completely replace the data in a SET, we simply specify the new values:

  UPDATE set_test SET myset = 'Travel,Dining' WHERE rowid = 1;
  UPDATE set_test SET myset = 2 WHERE rowid = 2;

To add an element to an existing SET we use the CONCAT() function to add the new element to our comma separated list. To work with decimal values, we can use the bitwise OR operator |.

  UPDATE set_test SET myset = CONCAT(myset,",Travel")
      WHERE rowid = 3;
  UPDATE set_test SET myset = myset | 1 WHERE rowid = 3;

We can also use the CONCAT_WS() function, which handles list separators for us:

  UPDATE set_test SET myset = CONCAT_WS(',',myset,'Dancing')
      WHERE rowid = 6;

To remove set elements from an existing set, we use the REPLACE function to remove the element. If using decimal values, we use a combination of the bitwise AND operator & with the bitwise NOT operator ~.

  UPDATE set_test SET myset = REPLACE(myset,'Dancing','')
      WHERE rowid = 6;
  UPDATE set_test SET myset = myset & ~2 WHERE rowid = 6;

With the bitwise operators we can add and remove multiple items at once by using the sum of the decimal values of the elements we wish to remove.

SELECTing SET Data

There are several ways we can select data from a SET column, but they come down to two distinct types: SELECT queries that have a specific WHERE clause, and SELECT queries that have pattern matching in the WHERE clause. Before discussing the different WHERE clauses, it would probably be good to cover what the data will look like as it is returned from the SELECT query. If the column is merely SELECTed, you will get a comma separated list of the elements involved in your SET. If you add a +0 to the column name, you will see the integer values in the columns.

  mysql> SELECT rowid, myset, myset+0
      -> FROM set_test;
  +-------+-----------------------+---------+
  | rowid | myset                 | myset+0 |
  +-------+-----------------------+---------+
  |     1 | Sports                |       2 |
  |     2 | Travel,Sports         |       3 |
  |     3 | Travel,Dancing        |       5 |
  |     4 | Travel,Sports         |       3 |
  |     5 | Travel,Sports,Dancing |       7 |
  |     6 | Travel,Dancing        |       5 |
  |     7 | Sports                |       2 |
  |     8 | Travel,Dancing        |       5 |
  +-------+-----------------------+---------+
  8 rows in set (0.00 sec)

All of the SELECTs we will be doing will be based on one of these two values. Let’s look at WHERE clauses with specific values. When searching against a specific set of values, we can specify them in our where clause:

  SELECT * FROM set_test WHERE myset = 'Travel,Sports';
  SELECT * FROM set_test WHERE myset = 3;

To check for items with a specific element, we can use a couple of methods:

  SELECT * FROM set_test WHERE myset LIKE '%Sports%';
  SELECT * FROM set_test WHERE myset & 2;

It should be noted that the bitwise AND exhibits an OR-like behavior when used with combinations of values. Take for example the following query, which involves Travel (1) and Sports (2).

  SELECT * FROM set_test WHERE myset & 3;

This query actually returns rows that contain Travel OR Sports, and not just rows that contain both. For rows with have both Travel AND Sports, we need to also check the result of our bitwise AND (&) operation against the value we are performing the bitwise AND (&) against, or use an AND clause in our query:

  SELECT * FROM set_test WHERE myset & 3 = 3;
  SELECT * FROM set_test WHERE myset & 1 AND myset & 2;

This returns all rows that contain Travel AND contain Sports, and also returns the combination of the two. The & 3 = 3 syntax is preferable as it does not require an increasing number of AND clauses.

We can also use multiple wildcard queries like the following:

  SELECT * FROM set_test WHERE myset LIKE '%Travel%Sports%';

It should be noted that the elements must be in order (that is, in the order they are listed in the initial table creation) as order is important in the multiple wildcard queries. For example ‘%Travel%Sports%’ will return rows, but ‘%Sports%Travel%’ will not. Another way to retrieve rows is to use the FIND_IN_SET() function as follows:

  SELECT * FROM set_test WHERE FIND_IN_SET('Sports',myset) > 0;

FIND_IN_SET() will return 0 if there is no match and a number greater than zero if there is a match. NULL will be returned if either the set or the search string is NULL.

Clever Bit Tricks

We can take advantage of the bitmap approach to storing SET data to rank our users in order of matching interests to help our users find people who best match their own interests, thus allowing you to find users who also like Sports, Travel, and Dancing (or at least like some of the things you do). Here’s how it works: each of our users has their interests stored in the SET datatype defined above. We want to rank their fellow users by how well their interests overlap. First let’s look at the interests in binary form using the BIN() function (we use LPAD() to add leading zeroes):

  mysql> SELECT rowid, myset, LPAD(BIN(myset+0),4,'0') AS binset
      -> FROM set_test;
  +-------+-----------------------------------+--------------+
  | rowid | myset                             | binset       |
  +-------+-----------------------------------+--------------+
  |     1 | Travel,Sports,Dancing             | 0111         |
  |     2 | Travel,Dancing                    | 0101         |
  |     3 | Travel                            | 0001         |
  |     4 | Dancing                           | 0100         |
  |     5 | Dancing                           | 0100         |
  |     6 | Sports,Dancing                    | 0110         |
  |     7 | Travel,Sports,Dancing,Fine Dining | 1111         |
  |     8 | Travel,Fine Dining                | 1001         |
  |     9 | Sports,Fine Dining                | 1010         |
  |    10 | Travel,Dancing,Fine Dining        | 1101         |
  +-------+-----------------------------------+--------------+
  10 rows in set (0.00 sec)

What we want to do is order our table by the number of matching interests. For our example, we will order our users by how well they match with user 8, who is interested in Travel, Dancing, and Fine Dining.

  mysql> SELECT A.rowid, A.myset, BIN(A.myset+0 & B.myset+0)
      -> FROM set_test AS A, set_test AS B
      -> WHERE b.rowid = 8
      -> AND A.rowid != 8
      -> ORDER BY BIT_COUNT(A.myset & B.myset) DESC;
  +-------+-----------------------------------+---------+
  | rowid | myset                             | matches |
  +-------+-----------------------------------+---------+
  |     7 | Travel,Sports,Dancing,Fine Dining |       2 |
  |    10 | Travel,Dancing,Fine Dining        |       2 |
  |     1 | Travel,Sports,Dancing             |       1 |
  |     2 | Travel,Dancing                    |       1 |
  |     3 | Travel                            |       1 |
  |     9 | Sports,Fine Dining                |       1 |
  |     4 | Dancing                           |       0 |
  |     5 | Dancing                           |       0 |
  |     6 | Sports,Dancing                    |       0 |
  +-------+-----------------------------------+---------+
  9 rows in set (0.00 sec)

This results in our users being organized by how well they match with user 8. The BIT_COUNT() function in the above example will return the number of set bits in the expression passed to it. Our first two matches (users 7 and 10) have two overlapping interests, the next four have one match, and the remaining three users have no overlapping interests (they could be trimmed out by adding the following to the WHERE clause:

  AND BIT_COUNT(A.myset & B.myset)

Because of the functions being performed on the rows, this type of query cannot benefit from an index. This means that the entire table will have to be scanned to find matches to user 8. However, as this is a bitwise operation, it should be quite efficient in spite of the full table scan (I was able to perform this operation on a randomly generated table of 500,000 rows and have 250,000 rows returned in 1.89 seconds on my 1.6 Ghz PC). By comparison, performing this operation on the normalized three table schema I described above would require a three table JOIN with N AND clauses in the WHERE section of the query, one for every interest user 8 possessed.

Retrieving Set Members

To retrieve a list of the set members, use the DESCRIBE syntax as follows:

  mysql> DESCRIBE set_test myset;
  +-------+-------------------------------------------+------+-----+---------+-------+
  | Field | Type                                      | Null | Key | Default | Extra |
  +-------+-------------------------------------------+------+-----+---------+-------+
  | myset | set('Travel','Sports','Dancing','Dining') | YES  |     | NULL    |       |
  +-------+-------------------------------------------+------+-----+---------+-------+
  1 row in set (0.00 sec)

If working programmically, you need to use the second column and strip the ‘set(‘ and ‘)’ to have a comma separated list, which most programming languages can break into an array automatically (in Visual Basic you would use the split function).

Conclusion

The MySQL SET datatype is not the perfect solution for all MySQL databases, but can be quite powerful when appropriate. If you need to track less than 64 attributes for a given entity and make comparisons between different entities, the MySQL SET datatype may be ideal for your needs.

MySQL SET
Show Buttons
Hide Buttons