[experimental] MaterializedMySQL
This database engine is experimental. To use it, set allow_experimental_database_materialized_mysql to 1 in your configuration files or by using the SET command:
SET allow_experimental_database_materialized_mysql=1
Creates a ClickHouse database with all the tables existing in MySQL, and all the data in those tables. The ClickHouse server works as MySQL replica. It reads binlog and performs DDL and DML queries.
Creating a Database
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster]
ENGINE = MaterializedMySQL('host:port', ['database' | database], 'user', 'password') [SETTINGS ...]
[TABLE OVERRIDE table1 (...), TABLE OVERRIDE table2 (...)]
Engine Parameters
host:port— MySQL server endpoint.database— MySQL database name.user— MySQL user.password— User password.
Engine Settings
max_rows_in_buffer
max_rows_in_buffer — Maximum number of rows that data is allowed to cache in memory (for single table and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: 65 505.
max_bytes_in_buffer
max_bytes_in_buffer — Maximum number of bytes that data is allowed to cache in memory (for single table and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: 1 048 576.
max_flush_data_time
max_flush_data_time — Maximum number of milliseconds that data is allowed to cache in memory (for database and the cache data unable to query). When this time is exceeded, the data will be materialized. Default: 1000.
max_wait_time_when_mysql_unavailable
max_wait_time_when_mysql_unavailable — Retry interval when MySQL is not available (milliseconds). Negative value disables retry. Default: 1000.
allows_query_when_mysql_lost
allows_query_when_mysql_lost — Allows to query a materialized table when MySQL is lost. Default: 0 (false).
allow_startup_database_without_connection_to_mysql
allow_startup_database_without_connection_to_mysql — Allow to create and attach database without available connection to MySQL. Default: 0 (false).
materialized_mysql_tables_list
materialized_mysql_tables_list — a comma-separated list of mysql database tables, which will be replicated by MaterializedMySQL database engine. Default value: empty list — means whole tables will be replicated.
CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***')
SETTINGS
allows_query_when_mysql_lost=true,
max_wait_time_when_mysql_unavailable=10000;
Settings on MySQL-server Side
For the correct work of MaterializedMySQL, there are few mandatory MySQL-side configuration settings that must be set:
default_authentication_plugin
default_authentication_plugin = mysql_native_password since MaterializedMySQL can only authorize with this method.
gtid_mode
gtid_mode = on since GTID based logging is a mandatory for providing correct MaterializedMySQL replication.
While turning on gtid_mode you should also specify enforce_gtid_consistency = on.
Virtual Columns
When working with the MaterializedMySQL database engine, ReplacingMergeTree tables are used with virtual _sign and _version columns.
_version
_version — Transaction counter. Type UInt64.
_sign
_sign — Deletion mark. Type Int8. Possible values:
- `1` — Row is not deleted,
- `-1` — Row is deleted.
Data Types Support
| MySQL | ClickHouse |
|---|---|
| TINY | Int8 |
| SHORT | Int16 |
| INT24 | Int32 |
| LONG | UInt32 |
| LONGLONG | UInt64 |
| FLOAT | Float32 |
| DOUBLE | Float64 |
| DECIMAL, NEWDECIMAL | Decimal |
| DATE, NEWDATE | Date |
| DATETIME, TIMESTAMP | DateTime |
| DATETIME2, TIMESTAMP2 | DateTime64 |
| YEAR | UInt16 |
| TIME | Int64 |
| ENUM | Enum |
| STRING | String |
| VARCHAR, VAR_STRING | String |
| BLOB | String |
| GEOMETRY | String |
| BINARY | FixedString |
| BIT | UInt64 |
| SET | UInt64 |
Nullable is supported.
The data of TIME type in MySQL is converted to microseconds in ClickHouse.
Other types are not supported. If MySQL table contains a column of such type, ClickHouse throws an exception and stops replication.
Specifics and Recommendations
Compatibility Restrictions
Apart of the data types limitations there are few restrictions comparing to MySQL databases, that should be resolved before replication will be possible:
Each table in
MySQLshould containPRIMARY KEY.Replication for tables, those are containing rows with
ENUMfield values out of range (specified inENUMsignature) will not work.
DDL Queries
MySQL DDL queries are converted into the corresponding ClickHouse DDL queries (ALTER, CREATE, DROP, RENAME). If ClickHouse cannot parse some DDL query, the query is ignored.
Data Replication
MaterializedMySQL does not support direct INSERT, DELETE and UPDATE queries. However, they are supported in terms of data replication:
MySQL
INSERTquery is converted intoINSERTwith_sign=1.MySQL
DELETEquery is converted intoINSERTwith_sign=-1.MySQL
UPDATEquery is converted intoINSERTwith_sign=-1andINSERTwith_sign=1if the primary key has been changed, orINSERTwith_sign=1if not.
Selecting from MaterializedMySQL Tables
SELECT query from MaterializedMySQL tables has some specifics:
If
_versionis not specified in theSELECTquery, the FINAL modifier is used, so only rows withMAX(_version)are returned for each primary key value.If
_signis not specified in theSELECTquery,WHERE _sign=1is used by default. So the deleted rows are not included into the result set.The result includes columns comments in case they exist in MySQL database tables.
Index Conversion
MySQL PRIMARY KEY and INDEX clauses are converted into ORDER BY tuples in ClickHouse tables.
ClickHouse has only one physical order, which is determined by ORDER BY clause. To create a new physical order, use
materialized views.
Notes
- Rows with
_sign=-1are not deleted physically from the tables. - Cascade
UPDATE/DELETEqueries are not supported by theMaterializedMySQLengine, as they are not visible in the MySQL binlog. - Replication can be easily broken.
- Manual operations on database and tables are forbidden.
MaterializedMySQLis affected by the optimize_on_insert setting. Data is merged in the corresponding table in theMaterializedMySQLdatabase when a table in the MySQL server changes.
Table Overrides
Table overrides can be used to customize the ClickHouse DDL queries, allowing you to make schema optimizations for your application. This is especially useful for controlling partitioning, which is important for the overall performance of MaterializedMySQL.
These are the schema conversion manipulations you can do with table overrides for MaterializedMySQL:
- Modify column type. Must be compatible with the original type, or replication will fail. For example, you can modify a UInt32 column to UInt64, but you can not modify a String column to Array(String).
- Modify column TTL.
- Modify column compression codec.
- Add ALIAS columns.
- Add skipping indexes. Note that you need to enable
use_skip_indexes_if_finalsetting to make them work (MaterializedMySQL is usingSELECT ... FINALby default) - Add projections. Note that projection optimizations are
disabled when using
SELECT ... FINAL(which MaterializedMySQL does by default), so their utility is limited here.INDEX ... TYPE hypothesisas [described in the v21.12 blog post]](https://clickhouse.com/blog/en/2021/clickhouse-v21.12-released/) may be more useful in this case. - Modify PARTITION BY
- Modify ORDER BY
- Modify PRIMARY KEY
- Add SAMPLE BY
- Add table TTL
CREATE DATABASE db_name ENGINE = MaterializedMySQL(...)
[SETTINGS ...]
[TABLE OVERRIDE table_name (
[COLUMNS (
[col_name [datatype] [ALIAS expr] [CODEC(...)] [TTL expr], ...]
[INDEX index_name expr TYPE indextype[(...)] GRANULARITY val, ...]
[PROJECTION projection_name (SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY]), ...]
)]
[ORDER BY expr]
[PRIMARY KEY expr]
[PARTITION BY expr]
[SAMPLE BY expr]
[TTL expr]
), ...]
Example:
CREATE DATABASE db_name ENGINE = MaterializedMySQL(...)
TABLE OVERRIDE table1 (
COLUMNS (
userid UUID,
category LowCardinality(String),
timestamp DateTime CODEC(Delta, Default)
)
PARTITION BY toYear(timestamp)
),
TABLE OVERRIDE table2 (
COLUMNS (
client_ip String TTL created + INTERVAL 72 HOUR
)
SAMPLE BY ip_hash
)
The COLUMNS list is sparse; existing columns are modified as specified, extra ALIAS columns are added. It is not
possible to add ordinary or MATERIALIZED columns. Modified columns with a different type must be assignable from the
original type. There is currently no validation of this or similar issues when the CREATE DATABASE query executes, so
extra care needs to be taken.
You may specify overrides for tables that do not exist yet.
It is easy to break replication with table overrides if not used with care. For example:
- If an ALIAS column is added with a table override, and a column with the same name is later added to the source MySQL table, the converted ALTER TABLE query in ClickHouse will fail and replication stops.
- It is currently possible to add overrides that reference nullable columns where not-nullable are required, such as in
ORDER BYorPARTITION BY. This will cause CREATE TABLE queries that will fail, also causing replication to stop.
Examples of Use
Queries in MySQL:
mysql> CREATE DATABASE db;
mysql> CREATE TABLE db.test (a INT PRIMARY KEY, b INT);
mysql> INSERT INTO db.test VALUES (1, 11), (2, 22);
mysql> DELETE FROM db.test WHERE a=1;
mysql> ALTER TABLE db.test ADD COLUMN c VARCHAR(16);
mysql> UPDATE db.test SET c='Wow!', b=222;
mysql> SELECT * FROM test;
┌─a─┬───b─┬─c────┐
│ 2 │ 222 │ Wow! │
└───┴─────┴──────┘
Database in ClickHouse, exchanging data with the MySQL server:
The database and the table created:
CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***');
SHOW TABLES FROM mysql;
┌─name─┐
│ test │
└──────┘
After inserting data:
SELECT * FROM mysql.test;
┌─a─┬──b─┐
│ 1 │ 11 │
│ 2 │ 22 │
└───┴────┘
After deleting data, adding the column and updating:
SELECT * FROM mysql.test;
┌─a─┬───b─┬─c────┐
│ 2 │ 222 │ Wow! │
└───┴─────┴──────┘