Introduction to MySQL Collation

A collation in MySQL is a set of rules used to compare the characters in a specific character set. It is a sequence of orders to any particular set. MySQL supports various character sets, and each character set always uses one or more collation, at least one default collation. MySQL does not allow us to have any two character sets use the same collation.

We can use the below statement to see all default collations of character sets available in MySQL database server:

SHOW <strong>CHARACTER</strong> <strong>SET</strong>;  

It will return the following output:

MySQL Collation

In this image, the Default Collation column values specify the default collations name for each character sets.

It is to note that a collation for any character set always starts with the character set name and ends with _cs (case sensitive), _ci (case insensitive), or _bin (binary). By default, the above statement (SHOW CHARACTER SET) produces all available collations in MySQL. If we want to get the all collation name for any specific character set, use an optional LIKE or WHERE clause with the SHOW COLLATION statement that returns the matched character set names. Following is the syntax used to find collation names for specific character set:

SHOW COLLATION LIKE 'character_set_name%';  

The below statement displays all collation names for latin1 character set in MySQL:

SHOW COLLATION LIKE 'latin1%';  

We can see the output as follows where each character set has at a default collation such as latin1_danish_ci is the default collation for the latin1 character set:

MySQL Collation

The collation also provides us to sort the character string. It is performs ordering of the character based on the weights. Each character available in a character string maps to a weight. If the weights of characters comparison are the same, it is called as equal. Again, if the weights of characters are different, it performs comparison based on the relative magnitude of their weights. MySQL provides the WEIGHT_STRING() function to find the weights for the string’s character. This function returns the value that indicates weights as a binary string. Therefore, we need to use HEX(WEIGHT_STRING(str)) function to display the weights in printable form. Let us understand it with the help of an example. The below statement take a non-binary case-insensitive string ‘AaBbCc’ and returns weights that do not differ for the given letters:

<strong>SELECT</strong> HEX(WEIGHT_STRING('AaBbCc' <strong>COLLATE</strong> latin1_swedish_ci)) <strong>AS</strong> Weight;  

See the following output where weights A=41=a, B=42=b, and C=43=c are equals:

MySQL Collation

If we take a binary string ‘AaBbCc’ then its weights can differ for the given letters:

<strong>SELECT</strong> HEX(WEIGHT_STRING(<strong>BINARY</strong> 'AaBbCc')) <strong>AS</strong> Weight;  

It will give output as follows where weights A=41, a=61, B=42, b=62, C=43, and c=63 are different:

MySQL Collation

Setting Character Set and Collation

MySQL allows us to configure character set and collation in four ways, which are given below:

  1. Setting at the Server Level
  2. Setting at the Database Level
  3. Setting at the Table Level
  4. Setting at the Column Level

Setting Character Set and Collation at Server Level

We know that MySQL uses latin1 as the default character set and default collation as latin1_swedish_ci. MySQL also allows us to change these default settings at server start-up. If we set only a character set at the server start-up, MySQL will use the specified character set’s default collation. If we set both character set and collation explicitly, MySQL will use these settings for all databases created in the database server. See the below statement that specifies the utf8 as character set and utf8_unicode_cs as collation for the server via command-line tool:

mysqld --character-set-server = utf8 --collation-server = utf8_unicode_ci  

Setting Character Set and Collation at Database Level

When a database is created in MySQL, there is no need to set the character set and collation. It is because MySQL will use the default character set and collation of the server for the database creation. We can modify the default settings at the database level while creating a database using CREATE DATABASE or ALTER DATABASE statement.

Below is the syntax of creating a database using CREATE DATABASE Statement:

  1. CREATE DATABASE database_name  
  2. CHARACTER SET character_set_name;  
  3. COLLATE collation_name  

The below is the syntax of creating database using ALTER DATABASE Statement:

  1. ALTER DATABASE database_name  
  2. CHARACTER SET character_set_name  
  3. COLLATE collation_name;  

When we specify the character set and collation at the database level, MySQL uses it for all tables created within the database.

Example

This example explains how to set the character set and collation at the database level.  

CREATE DATABASE demo_db  
CHARACTER SET utf8  
COLLATE utf8_unicode_ci;  

In the above statement, we specify the character set and collation for the database explicitly. Therefore, it does not use default settings at the server level.

Setting Character Set and Collation at Table Level

A database would contain tables with character sets and collations. These character sets and collations are different from the default settings. When a table is created with a CREATE TABLE statement or alter the table’s structure with the ALTER TABLE statement, we can specify the default character set and collation for a specific table.

Below is the syntax of creating a database using the ALTER TABLE Statement.

CREATE TABLE table_name(  
   ...  
)  
CHARACTER SET character_set_name  
COLLATE collation_name  

The following is the syntax of creating a database using the ALTER TABLE Statement:

ALTER TABLE table_name(  
  ...  
)  
CHARACTER SET character_set_name  
COLLATE collation_name  

Example

This example explains how to set the character set and collation at table level.

USE demodb;   
CREATE TABLE demotable1(  
Column1 char(25)  
);  

In the above example, we have not specified the character set and collation at the table level. Therefore, it uses the database character set and collation. Here, table demotable1 has utf8 as a character set and utf8_unicode_ci as collation.

If we want to change the character set to latin1 and collation to latin1_german1_ci for the above table, use the below statement:

ALTER TABLE demptable1  
CHARACTER SET latin1  
COLLATE latin1_german1_ci;  

Setting Character Set and Collation at Column Level

A column can be of different types such as CHAR, VARCHAR, or TEXT. It can have its own character set and collation, which is different from the table’s default settings. We can change the character set, and collation setting for the column in the column’s definition using the CREATE TABLE or ALTER TABLE statements as follows:

column_name [column_type] (length)  
CHARACTER SET character_set_name  
COLLATE collation_name 

Example

This example explains how to set the character set and collation at the column level.

ALTER TABLE demotable1  
MODIFY column1 VARCHAR(25)  
CHARACTER SET utf8mb4  
COLLATION utf8mb4_0900_ai_ci;  

In the above example, we have changed the character set and collation at the column level determined in the table creation. Therefore, it uses the utf8mb4 as a character set and utf8mb4_0900_ai_ci as a collation.

Key Points in Setting Character Set and Collation

The following are the key points or rules for setting the character set and collation:

  • We can use both character set and collation in MySQL if we set them explicitly.
  • If we set the only character set without specifying the collation, MySQL uses the character set’s default collation.
  • If we set only a collation without specifying the character set, MySQL uses the character set associated with the collation.
  • MySQL uses the default character set and collation if we do not specify both character set and collation.

How to set Character set and collation at the Database Level?

If you want to set the character set and collation at the database level, we use the below syntax.

CREATE DATABASE<databaseName>
CHARACTER SET<characterset_name>;
COLLATE<collation_name>

To alter the default setting of the character set and collate use the below syntax: –

ALTER DATABASE<databaseName>
CHARACTER SET<characterset_name>
COLLATE<collation_name>;

How to set Character set and collation at Table Level?

If you want to set the character set and collation at table level, we use the below syntax.

CREATE TABLE TEST_TABLE(
COLUMN1 <DATA_TYPE>,
COLUMN2 <DATA_TYPE>,
COLUMN3 <DATA_TYPE>,
.
.
COLUMNN <DATA_TYPE>,
)
CHARACTER SET<characterset_name>
COLLATE<collation_name>;

To alter the default setting of the character set and collate use the below syntax: –

ALTER TABLE TEST_TABLE(
COLUMN1 <DATA_TYPE>,
COLUMN2 <DATA_TYPE>,
COLUMN3 <DATA_TYPE>,
.
.
COLUMNN <DATA_TYPE>,
)
CHARACTER SET<characterset_name>
COLLATE<collation_name>;

How to set Character set and collation at Column Level?

If you want to set the character set and collation at table level, we use the below syntax. A column of type CHAR , VARCHAR or TEXT can have its own character set and collation that is different from the default character set and collation of the table.

You can either specify a character set and a collation for the column in the column’s definition of either CREATE TABLE or ALTER TABLE statement as follows:

CREATE TABLE TEST_TABLE(
COLUMN1 [CHAR | VARCHAR | TEXT](length)
CHARACTER SET<characterset_name>
COLLATE<collation_name>,
COLUMN2 [CHAR | VARCHAR | TEXT](length)
CHARACTER SET<characterset_name>
COLLATE<collation_name>,
.
.
COLUMN_N [CHAR | VARCHAR | TEXT](length)
CHARACTER SET<characterset_name>
COLLATE<collation_name>,
);

To alter the default setting of the character set and collate use the below syntax: –

ALTER TABLE TABLE_NAME MODIFY
COLUMN1 [CHAR | VARCHAR | TEXT](length)
CHARACTER SET<characterset_name>
COLLATE<collation_name>,
COLUMN2 [CHAR | VARCHAR | TEXT](length)
CHARACTER SET<characterset_name>
COLLATE<collation_name>,
.
.
COLUMN_N [CHAR | VARCHAR | TEXT](length)
CHARACTER SET<characterset_name>
COLLATE<collation_name>,
);

How to set Character set and collation at Database Level?

If you want to set the character set and collation at the database level, we use the below syntax.

Code:

CREATE DATABASE test
CHARACTER SET ascii
COLLATE ascii_general_ci;

To alter the default setting of the character set and collate use the below syntax: –

Code:

ALTER DATABASE test
CHARACTER SET UTF8
COLLATE utf8_bin;

How to set Character set and Collation at Table Level?

If you want to set the character set and collation at table level, we use the below syntax.

Code:

CREATE TABLE TEST_TABLE(
STD_ID INT,
STD_NAME VARCHAR(20),
STD_MARKS INT,
STD_AVERAGE FLOAT(5,2)
)
CHARACTER SET UTF8
COLLATE utf8_bin;

To alter the default setting of the character set and collate use the below syntax: –

Code:

ALTER TABLE TEST_TABLE
CHARACTER SET latin1 COLLATE latin1_swedish_ci;

Output:

set and collate

How to set Character set and collation at Column Level?

Code:

CREATE TABLE TEST_TABLE(
STD_ID INT,
STD_NAME VARCHAR(20)CHARACTER SET UTF8 COLLATE utf8_bin,
STD_MARKS INT,
STD_AVERAGE FLOAT(5,2)
);

To alter the default setting of the character set and collate use the below syntax: –

Code:

ALTER TABLE TEST_TABLE MODIFY
STD_NAME VARCHAR(20) CHARACTER SET latin1 COLLATE latin1_swedish_ci;

Output:

Test table

Conclusion

Things that need to put in mind from the above session: –

  • MySQL collation is nothing but a set of rules that are used to compare the characters in a particular character set.
  • The character set and collations can be set at four levels and they are at: – server level, database level, table level, and column level.
  • Some rules that need to put in mind are that: – Two character sets can’t have the same collation. Instead, each character set can have one or more different collations.
Introduction to MySQL Collation
Show Buttons
Hide Buttons