What is the utf8mb4_0900_ai_ci collation?

What is the meaning of the MySQL collation utf8mb4_0900_ai_ci?

  • utf8mb4 means that each character is stored using at most 4 bytes in the UTF-8 encoding scheme.
  • 0900 refers to the Unicode Collation Algorithm version, here 9.0.0. (The Unicode Collation Algorithm is the method used to compare two Unicode strings in a way that conforms to the requirements of the Unicode Standard.)
  • ai refers to accent insensitivity. That is, there is no difference between e, è, é, ê and ë when sorting or comparing.
  • ci refers to case insensitivity. That is, there is no difference between p and P when sorting or comparing.
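
The effect of the ai and ci suffixes can be checked directly with a few comparisons. This is a minimal sketch, assuming a MySQL 8.0 or later server and a client that sends UTF-8; the column aliases are only illustrative names.

```sql
-- Assumption: MySQL 8.0+; make sure string literals are treated as utf8mb4.
SET NAMES utf8mb4;

-- COLLATE on one operand sets the collation used for the whole comparison.
SELECT
    'e' = 'é' COLLATE utf8mb4_0900_ai_ci AS accent_insensitive,  -- 1 (equal)
    'p' = 'P' COLLATE utf8mb4_0900_ai_ci AS case_insensitive,    -- 1 (equal)
    'e' = 'é' COLLATE utf8mb4_0900_as_cs AS accent_sensitive,    -- 0 (different)
    'p' = 'P' COLLATE utf8mb4_0900_as_cs AS case_sensitive;      -- 0 (different)

-- LENGTH() counts bytes, CHAR_LENGTH() counts characters:
-- one character takes between 1 and 4 bytes in utf8mb4.
SELECT
    LENGTH('a')       AS ascii_bytes,   -- 1
    LENGTH('é')       AS accent_bytes,  -- 2
    LENGTH('€')       AS euro_bytes,    -- 3
    LENGTH('😀')      AS emoji_bytes,   -- 4
    CHAR_LENGTH('😀') AS emoji_chars;   -- 1
```

The collation changes only how strings are compared and sorted; the stored bytes are the same under either collation.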

utf8mb4 has become the default character set, with utf8mb4_0900_ai_ci as the default collation, in MySQL 8.0.1 and later. Previously, the default character set was latin1 (with latin1_swedish_ci as its collation), and utf8mb4_general_ci was the default collation for utf8mb4. Because utf8mb4 is now the default character set, new tables can store characters outside the Basic Multilingual Plane, such as emoji, by default. If accent sensitivity and case sensitivity are required, you may use utf8mb4_0900_as_cs instead.
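
To see which defaults a server actually uses, and to opt in to the accent- and case-sensitive collation for one table, something like the following should work. It is a sketch assuming MySQL 8.0 or later; the table name example_words and its column are hypothetical.

```sql
-- Show the server-wide default character set and collation.
SHOW VARIABLES LIKE 'character_set_server';
SHOW VARIABLES LIKE 'collation_server';

-- Show the default collation of the utf8mb4 character set.
SHOW COLLATION WHERE Charset = 'utf8mb4' AND `Default` = 'Yes';

-- Hypothetical table that overrides the default with the
-- accent-sensitive, case-sensitive variant.
CREATE TABLE example_words (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    word VARCHAR(50) NOT NULL
) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_cs;

-- Under as_cs, 'resume' and 'résumé' are distinct values,
-- so this WHERE clause matches only the unaccented row.
INSERT INTO example_words (word) VALUES ('resume'), ('résumé');
SELECT word FROM example_words WHERE word = 'resume';
```

Setting the collation per table or per column keeps the server default untouched, which is usually the safer choice when only a few columns need strict comparison.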

If you are interested in the details, the MySQL developers have explained the motivation behind the switch to utf8mb4_0900_ai_ci as the default collation in this article: New collations in MySQL 8.0.0.