2023 Changelog
Table of Contents
ClickHouse release v23.12, 2023-12-28
ClickHouse release v23.11, 2023-12-06
ClickHouse release v23.10, 2023-11-02
ClickHouse release v23.9, 2023-09-28
ClickHouse release v23.8 LTS, 2023-08-31
ClickHouse release v23.7, 2023-07-27
ClickHouse release v23.6, 2023-06-30
ClickHouse release v23.5, 2023-06-08
ClickHouse release v23.4, 2023-04-26
ClickHouse release v23.3 LTS, 2023-03-30
ClickHouse release v23.2, 2023-02-23
ClickHouse release v23.1, 2023-01-25
Changelog for 2022
ClickHouse release 23.12, 2023-12-28
Backward Incompatible Change
- Fix check for non-deterministic functions in TTL expressions. Previously, you could create a TTL expression with non-deterministic functions in some cases, which could lead to undefined behavior later. This fixes #37250. Disallow TTL expressions that don't depend on any columns of a table by default. It can be allowed back by SET allow_suspicious_ttl_expressions = 1 or SET compatibility = '23.11'. Closes #37286. #51858 (Alexey Milovidov).
- The MergeTree setting clean_deleted_rows is deprecated and no longer has any effect. The CLEANUP keyword for OPTIMIZE is not allowed by default (it can be unlocked with the allow_experimental_replacing_merge_with_cleanup setting). #58267 (Alexander Tokmakov). This fixes #57930. This closes #54988. This closes #54570. This closes #50346. This closes #47579. The feature has to be removed because it is not good. We have to remove it as quickly as possible, because there is no other option. #57932 (Alexey Milovidov).
New Feature
- Implement Refreshable Materialized Views, requested in #33919. #56946 (Michael Kolupaev, Michael Guzov).
- Introduce PASTE JOIN, which allows users to join tables without an ON clause simply by row numbers. Example: SELECT * FROM (SELECT number AS a FROM numbers(2)) AS t1 PASTE JOIN (SELECT number AS a FROM numbers(2) ORDER BY a DESC) AS t2. #57995 (Yarik Briukhovetskyi).
- The ORDER BY clause now supports specifying ALL, meaning that ClickHouse sorts by all columns in the SELECT clause. Example: SELECT col1, col2 FROM tab WHERE [...] ORDER BY ALL. #57875 (zhongyuankai).
- Added a new mutation command ALTER TABLE <table> APPLY DELETED MASK, which enforces applying the mask written by lightweight deletes and removes rows marked as deleted from disk. #57433 (Anton Popov).
- A handler /binary opens a visual viewer of symbols inside the ClickHouse binary. #58211 (Alexey Milovidov).
- Added a new SQL function sqid to generate Sqids (https://sqids.org/), example: SELECT sqid(125, 126). #57512 (Robert Schulze).
- Add a new function seriesPeriodDetectFFT to detect series period using FFT. #57574 (Bhavna Jindal).
- Add an HTTP endpoint for checking if Keeper is ready to accept traffic. #55876 (Konstantin Bogdanov).
- Add 'union' mode for schema inference. In this mode the resulting table schema is the union of all files' schemas (so the schema is inferred from each file). The mode of schema inference is controlled by the setting schema_inference_mode with two possible values: default and union. Closes #55428. #55892 (Kruglov Pavel).
- Add a new setting input_format_csv_try_infer_numbers_from_strings that allows inferring numbers from strings in CSV format. Closes #56455. #56859 (Kruglov Pavel).
- When the number of databases or tables exceeds a configurable threshold, show a warning to the user. #57375 (凌涛).
- Dictionary with HASHED_ARRAY (and COMPLEX_KEY_HASHED_ARRAY) layout supports SHARDS similarly to HASHED. #57544 (vdimir).
- Add asynchronous metrics for total primary key bytes and total allocated primary key bytes in memory. #57551 (Bharat Nallan).
- Add SHA512_256 function. #57645 (Bharat Nallan).
- Add FORMAT_BYTES as an alias for formatReadableSize. #57592 (Bharat Nallan).
- Allow passing optional session token to the s3 table function. #57850 (Shani Elharrar).
- Introduce a new setting http_make_head_request. If it is turned off, the URL table engine will not do a HEAD request to determine the file size. This is needed to support inefficient, misconfigured, or not capable HTTP servers. #54602 (Fionera).
- It is now possible to refer to ALIAS column in index (non-primary-key) definitions (issue #55650). Example: CREATE TABLE tab(col UInt32, col_alias ALIAS col + 1, INDEX idx (col_alias) TYPE minmax) ENGINE = MergeTree ORDER BY col;. #57546 (Robert Schulze).
- Added a new setting readonly which can be used to specify that an S3 disk is read only. It can be useful to create a table on a disk of s3_plain type, while having read only access to the underlying S3 bucket. #57977 (Pengyuan Bian).
- The primary key analysis in MergeTree tables will now be applied to predicates that include the virtual column _part_offset (optionally with _part). This feature can serve as a special kind of a secondary index. #58224 (Amos Bird).
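The PASTE JOIN and ORDER BY ALL entries above describe pairing rows by position and sorting by every selected column. A minimal Python sketch of those semantics on in-memory lists may help; it is a model of the described behavior, not ClickHouse code. The helper names paste_join and order_by_all are hypothetical, and equal-length inputs are an assumption of the sketch.

```python
def paste_join(left_rows, right_rows):
    """Pair rows by position, as the changelog describes PASTE JOIN.

    Assumption: both inputs have the same number of rows. What ClickHouse
    does for inputs of different lengths is not modeled here.
    """
    if len(left_rows) != len(right_rows):
        raise ValueError("this sketch only models equal-length inputs")
    return [left + right for left, right in zip(left_rows, right_rows)]


def order_by_all(rows):
    """Sort by every column, left to right, ascending (models ORDER BY ALL)."""
    return sorted(rows)


# Mirrors the changelog example: numbers(2) pasted with numbers(2) ORDER BY a DESC.
t1 = [(n,) for n in range(2)]
t2 = sorted([(n,) for n in range(2)], reverse=True)
pasted = paste_join(t1, t2)
print(pasted)

ordered = order_by_all([(2, "b"), (1, "z"), (1, "a")])
print(ordered)
```

No join key is involved in the first helper: the n-th row of the left input is simply concatenated with the n-th row of the right input.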
Performance Improvement
- Extract non-intersecting parts ranges from MergeTree table during FINAL processing. That way we can avoid additional FINAL logic for these non-intersecting parts ranges. When the number of duplicate values with the same primary key is low, performance will be almost the same as without FINAL. Improve reading performance for MergeTree FINAL when the do_not_merge_across_partitions_select_final setting is set. #58120 (Maksim Kita).
- Made copy between s3 disks using a s3-server-side copy instead of copying through the buffer. Improves BACKUP/RESTORE operations and clickhouse-disks copy command. #56744 (MikhailBurdukov).
- Hash JOIN respects the setting max_joined_block_size_rows and does not produce large blocks for ALL JOIN. #56996 (vdimir).
- Release memory for aggregation earlier. This may avoid unnecessary external aggregation. #57691 (Nikolai Kochetov).
- Improve performance of string serialization. #57717 (Maksim Kita).
- Support trivial count optimization for Merge-engine tables. #57867 (skyoct).
- Optimized aggregation in some cases. #57872 (Anton Popov).
- The hasAny function can now take advantage of the full-text skipping indices. #57878 (Jpnock).
- Function if(cond, then, else) (and its alias cond ? then : else) was optimized to use branch-free evaluation. #57885 (zhanglistar).
- MergeTree automatically derives the do_not_merge_across_partitions_select_final setting if the partition key expression contains only columns from the primary key expression. #58218 (Maksim Kita).
- Speedup MIN and MAX for native types. #58231 (Raúl Marín).
- Implement SLRU cache policy for filesystem cache. #57076 (Kseniia Sumarokova).
- The limit for the number of connections per endpoint for background fetches was raised from 15 to the value of the background_fetches_pool_size setting. The MergeTree-level setting replicated_max_parallel_fetches_for_host became obsolete. The MergeTree-level settings replicated_fetches_http_connection_timeout, replicated_fetches_http_send_timeout and replicated_fetches_http_receive_timeout were moved to the server level. The setting keep_alive_timeout was added to the list of server-level settings. #57523 (Nikita Mikhaylov).
- Make querying system.filesystem_cache not memory intensive. #57687 (Kseniia Sumarokova).
- Reduce memory usage on strings deserialization. #57787 (Maksim Kita).
- More efficient constructor for Enum - it makes sense when Enum has a boatload of values. #57887 (Duc Canh Le).
- An improvement for reading from the filesystem cache: always use pread method. #57970 (Nikita Taranov).
- Add optimization for AND notEquals chain in logical expression optimizer. This optimization is only available with the experimental Analyzer enabled. #58214 (Kevin Mingtarja).
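The SLRU cache policy mentioned above is a segmented LRU: new keys enter a probationary segment, and a repeated access promotes them to a protected segment, so one-off reads cannot flush frequently used entries. The following is a textbook-style Python sketch of that idea. It is not the ClickHouse filesystem cache implementation, and the class name, method names and segment sizes are all hypothetical.

```python
from collections import OrderedDict


class SLRUCache:
    """Segmented LRU sketch with a probationary and a protected segment."""

    def __init__(self, probationary_size, protected_size):
        self.probationary_size = probationary_size
        self.protected_size = protected_size
        self.probationary = OrderedDict()  # oldest entry first
        self.protected = OrderedDict()     # oldest entry first

    def get(self, key):
        if key in self.protected:
            self.protected.move_to_end(key)  # refresh recency
            return self.protected[key]
        if key in self.probationary:
            # A repeated access promotes the entry to the protected segment.
            value = self.probationary.pop(key)
            self._insert_protected(key, value)
            return value
        return None

    def put(self, key, value):
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
        elif key in self.probationary:
            self.probationary.pop(key)
            self._insert_protected(key, value)
        else:
            self._insert_probationary(key, value)

    def _insert_protected(self, key, value):
        self.protected[key] = value
        if len(self.protected) > self.protected_size:
            # Demote the least recently used protected entry instead of dropping it.
            old_key, old_value = self.protected.popitem(last=False)
            self._insert_probationary(old_key, old_value)

    def _insert_probationary(self, key, value):
        self.probationary[key] = value
        if len(self.probationary) > self.probationary_size:
            self.probationary.popitem(last=False)  # evict the oldest entry


cache = SLRUCache(probationary_size=2, protected_size=1)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # second access promotes "a"
cache.put("c", 3)
cache.put("d", 4)  # one-off keys evict "b", while "a" stays protected
print(sorted(cache.protected), sorted(cache.probationary))
```

In this run the scan of new keys evicts only the probationary entry "b"; the promoted entry "a" survives.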
Improvement
- Support for soft memory limit in Keeper. It will refuse requests if the memory usage is close to the maximum. #57271 (Han Fei). #57699 (Han Fei).
- Make inserts into distributed tables handle updated cluster configuration properly. When the list of cluster nodes is dynamically updated, the Directory Monitor of the distribution table will update it. #42826 (zhongyuankai).
- Do not allow creating a replicated table with inconsistent merge parameters. #56833 (Duc Canh Le).
- Show uncompressed size in system.tables. #56618. #57186 (Chen Lixiang).
- Add skip_unavailable_shards as a setting for Distributed tables that is similar to the corresponding query-level setting. Closes #43666. #57218 (Gagan Goel).
- The function substring (aliases: substr, mid) can now be used with Enum types. Previously, the first function argument had to be a value of type String or FixedString. This improves compatibility with 3rd party tools such as Tableau via MySQL interface. #57277 (Serge Klochkov).
- Function format now supports arbitrary argument types (instead of only String and FixedString arguments). This is important to calculate SELECT format('The {0} to all questions is {1}', 'answer', 42). #57549 (Robert Schulze).
- Allow using the date_trunc function with a case-insensitive first argument. Both cases are now supported: SELECT date_trunc('day', now()) and SELECT date_trunc('DAY', now()). #57624 (Yarik Briukhovetskyi).
- Better hints when a table doesn't exist. #57342 (Bharat Nallan).
- Allow overriding the max_partition_size_to_drop and max_table_size_to_drop server settings at query time. #57452 (Jordi Villar).
- Slightly better inference of unnamed tuples in JSON formats. #57751 (Kruglov Pavel).
- Add support for read-only flag when connecting to Keeper (fixes #53749). #57479 (Mikhail Koviazin).
- Fix possible distributed sends stuck due to "No such file or directory" (during recovering a batch from disk). Fix possible issues with error_count from system.distribution_queue (in case of distributed_directory_monitor_max_sleep_time_ms > 5 min). Introduce a profile event to track async INSERT failures: DistributedAsyncInsertionFailures. #57480 (Azat Khuzhin).
- Support PostgreSQL generated columns and default column values in MaterializedPostgreSQL (experimental feature). Closes #40449. #57568 (Kseniia Sumarokova).
- Allow to apply some filesystem cache config settings changes without server restart. #57578 (Kseniia Sumarokova).
- Properly handling PostgreSQL table structure with empty array. #57618 (Mike Kot).
- Expose the total number of errors occurred since last server restart as a ClickHouseErrorMetric_ALL metric. #57627 (Nikita Mikhaylov).
- Allow nodes in the configuration file with from_env/from_zk reference and non empty element with replace=1. #57628 (Azat Khuzhin).
- A table function fuzzJSON which allows generating a lot of malformed JSON for fuzzing. #57646 (Julia Kartseva).
- Allow IPv6 to UInt128 conversion and binary arithmetic. #57707 (Yakov Olkhovskiy).
- Add a setting for async inserts deduplication cache: how long we wait for a cache update. Deprecate setting async_block_ids_cache_min_update_interval_ms. Now cache is updated only in case of conflicts. #57743 (alesapin).
- sleep() function now can be cancelled with KILL QUERY. #57746 (Vitaly Baranov).
- Forbid CREATE TABLE ... AS SELECT queries for Replicated table engines in the experimental Replicated database because they are not supported. Reference #35408. #57796 (Nikolay Degterinsky).
- Fix and improve transforming queries for external databases, to recursively obtain all compatible predicates. #57888 (flynn).
- Support dynamic reloading of the filesystem cache size. Closes #57866. #57897 (Kseniia Sumarokova).
- Correctly support system.stack_trace for threads with blocked SIGRTMIN (these threads can exist in low-quality external libraries such as Apache rdkafka). #57907 (Azat Khuzhin). Also, send the signal to threads only if it is not blocked, to avoid waiting for storage_system_stack_trace_pipe_read_timeout_ms when it does not make any sense. #58136 (Azat Khuzhin).
- Tolerate keeper failures in the quorum inserts' check. #57986 (Raúl Marín).
- Add max/peak RSS (MemoryResidentMax) into system.asynchronous_metrics. #58095 (Azat Khuzhin).
- Allow users to use s3-style links (https:// and s3://) without mentioning the region if it's not default. Also find the correct region if the user mentioned the wrong one. #58148 (Yarik Briukhovetskyi).
- clickhouse-format --obfuscate will know about Settings, MergeTreeSettings, and time zones and keep their names unchanged. #58179 (Alexey Milovidov).
- Added explicit finalize() function in ZipArchiveWriter. Simplify too complicated code in ZipArchiveWriter. This fixes #58074. #58202 (Vitaly Baranov).
- Make caches with the same path use the same cache objects. This behaviour existed before, but was broken in 23.4. If such caches with the same path have different sets of cache settings, an exception will be thrown stating that this is not allowed. #58264 (Kseniia Sumarokova).
- Parallel replicas (experimental feature): friendly settings #57542 (Igor Nikonov).
- Parallel replicas (experimental feature): announcement response handling improvement #57749 (Igor Nikonov).
- Parallel replicas (experimental feature): give more respect to min_number_of_marks in ParallelReplicasReadingCoordinator #57763 (Nikita Taranov).
- Parallel replicas (experimental feature): disable parallel replicas with IN (subquery) #58133 (Igor Nikonov).
- Parallel replicas (experimental feature): add profile event 'ParallelReplicasUsedCount' #58173 (Igor Nikonov).
- Non POST requests such as HEAD will be readonly similar to GET. #58060 (San).
- Add bytes_uncompressed column to system.part_log #58167 (Jordi Villar).
- Add base backup name to system.backups and system.backup_log tables #58178 (Pradeep Chhetri).
- Add support for specifying query parameters in the command line in clickhouse-local #58210 (Pradeep Chhetri).
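Two of the entries above relax argument handling: format accepts arguments of any type, and date_trunc accepts its unit in any letter case. A small Python model of both behaviors is sketched below. The function names mirror the SQL functions for readability only; this is an illustration and not the ClickHouse implementation, and only the 'day' and 'hour' units are modeled.

```python
from datetime import datetime


def format_any(pattern, *args):
    """Substitute {0}, {1}, ... after converting every argument to text."""
    return pattern.format(*(str(arg) for arg in args))


def date_trunc(unit, value):
    """Truncate a datetime; the unit is matched case-insensitively."""
    unit = unit.lower()
    if unit == "day":
        return value.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "hour":
        return value.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unit not modeled in this sketch: {unit}")


message = format_any("The {0} to all questions is {1}", "answer", 42)
moment = datetime(2023, 12, 28, 13, 45, 10)
print(message)
print(date_trunc("day", moment) == date_trunc("DAY", moment))
```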
Build/Testing/Packaging Improvement
- Randomize more settings #39663 (Anton Popov).
- Randomize disabled optimizations in CI #57315 (Raúl Marín).
- Allow usage of Azure-related table engines/functions on macOS. #51866 (Alexey Milovidov).
- ClickHouse Fast Test now uses Musl instead of GLibc. #57711 (Alexey Milovidov). The fully-static Musl build is available to download from the CI.
- Run ClickBench for every commit. This closes #57708. #57712 (Alexey Milovidov).
- Remove the usage of a harmful C/POSIX select function from external libraries. #57467 (Igor Nikonov).
- Settings only available in ClickHouse Cloud will be also present in the open-source ClickHouse build for convenience. #57638 (Nikita Mikhaylov).
Bug Fix (user-visible misbehavior in an official stable release)
- Fixed a possibility of sorting order breakage in TTL GROUP BY #49103 (Nikita Mikhaylov).
- Fix: split lttb bucket strategy, first bucket and last bucket should only contain single point #57003 (FFish).
- Fix possible deadlock in the Template format during sync after error #57004 (Kruglov Pavel).
- Fix early stop while parsing a file with skipping lots of errors #57006 (Kruglov Pavel).
- Prevent dictionary's ACL bypass via the dictionary table function #57362 (Salvatore Mesoraca).
- Fix another case of a "non-ready set" error found by Fuzzer. #57423 (Nikolai Kochetov).
- Fix several issues regarding PostgreSQL array_ndims usage. #57436 (Ryan Jacobs).
- Fix RWLock inconsistency after write lock timeout #57454 (Vitaly Baranov). Fix RWLock inconsistency after write lock timeout (again) #57733 (Vitaly Baranov).
- Fix: don't exclude ephemeral column when building pushing to view chain #57461 (Yakov Olkhovskiy).
- MaterializedPostgreSQL (experimental issue): fix issue #41922, add test for #41923 #57515 (Kseniia Sumarokova).
- Ignore ON CLUSTER clause in grant/revoke queries for management of replicated access entities. #57538 (MikhailBurdukov).
- Fix crash in clickhouse-local #57553 (Nikolay Degterinsky).
- A fix for Hash JOIN. #57564 (vdimir).
- Fix possible error in PostgreSQL source #57567 (Kseniia Sumarokova).
- Fix type correction in Hash JOIN for nested LowCardinality. #57614 (vdimir).
- Avoid hangs of system.stack_trace by correctly prohibiting parallel reading from it. #57641 (Azat Khuzhin).
- Fix an error for aggregation of sparse columns with any(...) RESPECT NULL #57710 (Azat Khuzhin).
- Fix unary operators parsing #57713 (Nikolay Degterinsky).
- Fix dependency loading for the experimental table engine MaterializedPostgreSQL. #57754 (Kseniia Sumarokova).
- Fix retries for disconnected nodes for BACKUP/RESTORE ON CLUSTER #57764 (Vitaly Baranov).
- Fix result of external aggregation in case of partially materialized projection #57790 (Anton Popov).
- Fix merge in aggregation functions with *Map combinator #57795 (Anton Popov).
- Disable system.kafka_consumers because it has a bug. #57822 (Azat Khuzhin).
- Fix LowCardinality keys support in Merge JOIN. #57827 (vdimir).
- A fix for InterpreterCreateQuery related to the sample block. #57855 (Maksim Kita).
- addresses_expr were ignored for named collections from PostgreSQL. #57874 (joelynch).
- Fix invalid memory access in BLAKE3 (Rust) #57876 (Raúl Marín). Then it was rewritten from Rust to C++ for better memory-safety. #57994 (Raúl Marín).
- Normalize function names in CREATE INDEX #57906 (Alexander Tokmakov).
- Fix handling of unavailable replicas before first request happened #57933 (Nikita Taranov).
- Fix literal alias misclassification #57988 (Chen768959).
- Fix invalid preprocessing on Keeper #58069 (Antonio Andelic).
- Fix integer overflow in the Poco library, related to UTF32Encoding #58073 (Andrey Fedotov).
- Fix parallel replicas (experimental feature) in presence of a scalar subquery with a big integer value #58118 (Alexey Milovidov).
- Fix accurateCastOrNull for out-of-range DateTime #58139 (Andrey Zvonov).
- Fix possible PARAMETER_OUT_OF_BOUND error during subcolumns reading from a wide part in MergeTree #58175 (Kruglov Pavel).
- Fix a slow-down of CREATE VIEW with an enormous number of subqueries #58220 (Tao Wang).
- Fix parallel parsing for JSONCompactEachRow #58181 (Alexey Milovidov). #58250 (Kruglov Pavel).
ClickHouse release 23.11, 2023-12-06
Backward Incompatible Change
- The default ClickHouse server configuration file has enabled access_management (user manipulation by SQL queries) and named_collection_control (manipulation of named collection by SQL queries) for the default user by default. This closes #56482. #56619 (Alexey Milovidov).
- Multiple improvements for RESPECT NULLS/IGNORE NULLS for window functions. If you use them as aggregate functions and store the states of aggregate functions with these modifiers, they might become incompatible. #57189 (Raúl Marín).
- Remove optimization optimize_move_functions_out_of_any. #57190 (Raúl Marín).
- Formatters %l/%k/%c in function parseDateTime are now able to parse hours/months without leading zeros, e.g. select parseDateTime('2023-11-26 8:14', '%F %k:%i') now works. Set parsedatetime_parse_without_leading_zeros = 0 to restore the previous behavior which required two digits. Function formatDateTime is now also able to print hours/months without leading zeros. This is controlled by setting formatdatetime_format_without_leading_zeros but off by default to not break existing use cases. #55872 (Azat Khuzhin).
- You can no longer use the aggregate function avgWeighted with arguments of type Decimal. Workaround: convert arguments to Float64. This closes #43928. This closes #31768. This closes #56435. If you have used this function inside materialized views or projections with Decimal arguments, contact support@clickhouse.com. Fixed error in aggregate function sumMap and made it slower around 1.5..2 times. It does not matter because the function is garbage anyway. This closes #54955. This closes #53134. This closes #55148. Fix a bug in function groupArraySample - it used the same random seed in case more than one aggregate state is generated in a query. #56350 (Alexey Milovidov).
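The parseDateTime change above can be pictured as relaxing the hour field from exactly two digits to one or two digits. Below is a minimal Python model for the single pattern '%F %k:%i' used in the changelog example. The function name parse_f_k_i and its flag are hypothetical; the flag only mimics the role of parsedatetime_parse_without_leading_zeros and is not how ClickHouse implements it.

```python
import re
from datetime import datetime


def parse_f_k_i(text, allow_missing_leading_zero=True):
    """Parse text shaped like '%F %k:%i' (date, hour, minute).

    With allow_missing_leading_zero=False the hour must have two digits,
    which models the previous behavior described in the changelog.
    """
    hour = r"\d{1,2}" if allow_missing_leading_zero else r"\d{2}"
    match = re.fullmatch(rf"(\d{{4}})-(\d{{2}})-(\d{{2}}) ({hour}):(\d{{2}})", text)
    if match is None:
        raise ValueError(f"cannot parse {text!r}")
    year, month, day, hh, mm = (int(group) for group in match.groups())
    return datetime(year, month, day, hh, mm)


parsed = parse_f_k_i("2023-11-26 8:14")
print(parsed)

try:
    parse_f_k_i("2023-11-26 8:14", allow_missing_leading_zero=False)
    strict_rejects = False
except ValueError:
    strict_rejects = True
print(strict_rejects)
```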
New Feature
- Added server setting async_load_databases for asynchronous loading of databases and tables. Speeds up the server start time. Applies to databases with Ordinary, Atomic and Replicated engines. Their tables load metadata asynchronously. Query to a table increases the priority of the load job and waits for it to be done. Added a new table system.asynchronous_loader for introspection. #49351 (Sergei Trifonov).
- Add system table blob_storage_log. It allows auditing all the data written to S3 and other object storages. #52918 (vdimir).
- Use statistics to order prewhere conditions better. #53240 (Han Fei).
- Added support for compression in the Keeper's protocol. It can be enabled on the ClickHouse side by using the flag use_compression inside the zookeeper section. Keep in mind that only ClickHouse Keeper supports compression, while Apache ZooKeeper does not. Resolves #49507. #54957 (SmitaRKulkarni).
- Introduce the feature storage_metadata_write_full_object_key. If it is set to true, metadata files are written in the new format. With that format ClickHouse stores the full remote object key in the metadata file, which allows better flexibility and optimization. #55566 (Sema Checherinda).
- Add new settings and syntax to protect named collections' fields from being overridden. This is meant to prevent a malicious user from obtaining unauthorized access to secrets. #55782 (Salvatore Mesoraca).
- Add hostname column to all system log tables - it is useful if you make the system tables replicated, shared, or distributed. #55894 (Bharat Nallan).
- Add CHECK ALL TABLES query. #56022 (vdimir).
- Added function fromDaysSinceYearZero which is similar to MySQL's FROM_DAYS. E.g. SELECT fromDaysSinceYearZero(739136) returns 2023-09-08. #56088 (Joanna Hulboj).
- Add an external Python tool to view backups and to extract information from them without using ClickHouse. #56268 (Vitaly Baranov).
- Implement a new setting called preferred_optimize_projection_name. If it is set to a non-empty string, the specified projection would be used if possible instead of choosing from all the candidates. #56309 (Yarik Briukhovetskyi).
- Add 4-letter command for yielding/resigning leadership (https://github.com/ClickHouse/ClickHouse/issues/56352). #56354 (Pradeep Chhetri). #56620 (Pradeep Chhetri).
- Added a new SQL function, arrayRandomSample(arr, k), which returns a sample of k elements from the input array. Similar functionality could previously be achieved only with less convenient syntax, e.g. SELECT arrayReduce('groupArraySample(3)', range(10)). #56416 (Robert Schulze).
- Added support for Float16 type data to use in .npy files. Closes #56344. #56424 (Yarik Briukhovetskyi).
- Added a system view information_schema.statistics for better compatibility with Tableau Online. #56425 (Serge Klochkov).
- Add system.symbols table useful for introspection of the binary. #56548 (Alexey Milovidov).
- Configurable dashboards. Queries for charts are now loaded using a query, which by default uses a new system.dashboards table. #56771 (Sergei Trifonov).
- Introduce fileCluster table function - it is useful if you mount a shared filesystem (NFS and similar) into the user_files directory. #56868 (Andrey Zvonov).
- Add _size virtual column with file size in bytes to s3/file/hdfs/url/azureBlobStorage engines. #57126 (Kruglov Pavel).
- Expose the number of errors for each error code occurred on a server since last restart from the Prometheus endpoint. #57209 (Nikita Mikhaylov).
- ClickHouse keeper reports its running availability zone at /keeper/availability-zone path. This can be configured via <availability_zone><value>us-west-1a</value></availability_zone>. #56715 (Jianfei Hu).
- Make ALTER materialized_view MODIFY QUERY non experimental and deprecate allow_experimental_alter_materialized_view_structure setting. Fixes #15206. #57311 (alesapin).
- Setting join_algorithm respects specified order #51745 (vdimir).
- Add support for the well-known Protobuf types in the Protobuf format. #56741 (János Benjamin Antal).
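The fromDaysSinceYearZero example above (739136 gives 2023-09-08) can be cross-checked with Python's date ordinals. Python counts 0001-01-01 as day 1, so the two numberings differ by a constant. In this sketch that constant, 365, is inferred from the changelog example rather than taken from ClickHouse source code, and the helper name is hypothetical.

```python
from datetime import date

# Offset between the day count used in the changelog example and Python's
# proleptic Gregorian ordinal (0001-01-01 is ordinal 1). Assumption: derived
# from the single example in the changelog, not from ClickHouse internals.
DAYS_OFFSET = 365


def from_days_since_year_zero(days):
    """Convert a day count since year zero to a calendar date."""
    return date.fromordinal(days - DAYS_OFFSET)


result = from_days_since_year_zero(739136)
print(result)  # 2023-09-08
```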
Performance Improvement
- Adaptive timeouts for interacting with S3. The first attempt is made with low send and receive timeouts. #56314 (Sema Checherinda).
- Increase the default value of max_concurrent_queries from 100 to 1000. This makes sense when there is a large number of connecting clients, which are slowly sending or receiving data, so the server is not limited by CPU, or when the number of CPU cores is larger than 100. Also, enable the concurrency control by default, and set the desired number of query processing threads in total as twice the number of CPU cores. It improves performance in scenarios with a very large number of concurrent queries. #46927 (Alexey Milovidov).
- Support parallel evaluation of window functions. Fixes #34688. #39631 (Dmitry Novik).
- Numbers table engine (of the system.numbers table) now analyzes the condition to generate the needed subset of data, like table's index. #50909 (JackyWoo).
- Improved the performance of filtering by IN (...) condition for Merge table engine. #54905 (Nikita Taranov).
- An improvement which takes place when the filesystem cache is full and there are big reads. #55158 (Kseniia Sumarokova).
- Add ability to disable checksums for S3 to avoid excessive pass over the file (this is controlled by the setting s3_disable_checksum). #55559 (Azat Khuzhin).
- Now we read synchronously from remote tables when data is in page cache (like we do for local tables). It is faster, it doesn't require synchronisation inside the thread pool, it doesn't hesitate to do seeks on local FS, and it reduces CPU wait. #55841 (Nikita Taranov).
- Optimization for getting value from map, arrayElement. It brings about a 30% speedup by reducing the reserved memory and reducing resize calls. #55957 (lgbo).
- Optimization of multi-stage filtering with AVX-512. The performance experiments of the OnTime dataset on the ICX device (Intel Xeon Platinum 8380 CPU, 80 cores, 160 threads) show that this change could bring the improvements of 7.4%, 5.9%, 4.7%, 3.0%, and 4.6% to the QPS of the query Q2, Q3, Q4, Q5 and Q6 respectively while having no impact on others. #56079 (Zhiguo Zhou).
- Limit the number of threads busy inside the query profiler. If there are more - they will skip profiling. #56105 (Alexey Milovidov).
- Decrease the amount of virtual function calls in window functions. #56120 (Maksim Kita).
- Allow recursive Tuple field pruning in ORC data format to speed up scanning. #56122 (李扬).
- Trivial count optimization for Npy data format: queries like select count() from 'data.npy' will work much faster because of caching the results. #56304 (Yarik Briukhovetskyi).
- Queries with aggregation and a large number of streams will use less memory during the plan's construction. #57074 (Alexey Milovidov).
- Improve performance of executing queries for use cases with many users and highly concurrent queries (>2000 QPS) by optimizing the access to ProcessList. #57106 (Andrej Hoos).
- Trivial improvement on array join, reuse some intermediate results. #57183 (李扬).
- There are cases when stack unwinding was slow. Not anymore. #57221 (Alexey Milovidov).
- Now we use default read pool for reading from external storage when max_streams = 1. It is beneficial when read prefetches are enabled. #57334 (Nikita Taranov).
- Keeper improvement: improve memory-usage during startup by delaying log preprocessing. #55660 (Antonio Andelic).
- Improved performance of glob matching for File and HDFS storages. #56141 (Andrey Zvonov).
- Posting lists in experimental full text indexes are now compressed which reduces their size by 10-30%. #56226 (Harry Lee).
- Parallelise BackupEntriesCollector in backups. #56312 (Kseniia Sumarokova).
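The adaptive S3 timeouts entry above says only that the first attempt is made with low send and receive timeouts. One way to picture that retry shape is the Python sketch below. All numeric values, the function names and the fake request are hypothetical, and the real ClickHouse logic may differ in how later attempts are configured.

```python
def timeouts_for_attempt(attempt, low=(1.0, 1.0), normal=(30.0, 30.0)):
    """Return (send_timeout, receive_timeout) in seconds for a 1-based attempt.

    Assumption: only the first attempt uses the low timeouts.
    """
    return low if attempt == 1 else normal


def call_with_adaptive_timeouts(request, max_attempts=3):
    """Retry a request, starting with low timeouts and relaxing them afterwards."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        send_timeout, receive_timeout = timeouts_for_attempt(attempt)
        try:
            return request(send_timeout, receive_timeout)
        except TimeoutError as error:
            last_error = error
    raise last_error


calls = []


def fake_request(send_timeout, receive_timeout):
    """Stand-in for a slow endpoint: fails unless it is given at least 5 seconds."""
    calls.append((send_timeout, receive_timeout))
    if receive_timeout < 5.0:
        raise TimeoutError("receive timed out")
    return "ok"


result = call_with_adaptive_timeouts(fake_request)
print(result, calls)
```

A fast endpoint answers within the low timeouts on the first attempt, while a slow one costs only one short failed attempt before the normal timeouts apply.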
Improvement
- Add a new MergeTree setting add_implicit_sign_column_constraint_for_collapsing_engine (disabled by default). When enabled, it adds an implicit CHECK constraint for CollapsingMergeTree tables that restricts the value of the Sign column to be only -1 or 1. #56701. #56986 (Kevin Mingtarja).
- Enable adding new disk to storage configuration without restart. #56367 (Duc Canh Le).
- Support creating and materializing index in the same alter query, also support "modify TTL" and "materialize TTL" in the same query. Closes #55651. #56331 (flynn).
- Add a new table function named fuzzJSON with rows containing perturbed versions of the source JSON string with random variations. #56490 (Julia Kartseva).
- Engine Merge filters the records according to the row policies of the underlying tables, so you don't have to create another row policy on a Merge table. #50209 (Ilya Golshtein).
- Add a setting max_execution_time_leaf to limit the execution time on shard for distributed query, and timeout_overflow_mode_leaf to control the behaviour if timeout happens. #51823 (Duc Canh Le).
- Add ClickHouse setting to disable tunneling for HTTPS requests over HTTP proxy. #55033 (Arthur Passos).
- Set background_fetches_pool_size to 16 and background_schedule_pool_size to 512, which is better for production usage with frequent small insertions. #54327 (Denny Crane).
- When reading a CSV file in which a line ends with \r that is not followed by \n, ClickHouse throws the exception "Cannot parse CSV format: found \r (CR) not followed by \n (LF). Line must end by \n (LF) or \r\n (CR LF) or \n\r." In ClickHouse, a CSV line must end with \n, \r\n, or \n\r, so \r must be followed by \n, but in some situations the input data is abnormal and has \r alone at the end of a line. #54340 (KevinyhZou).
- Update Arrow library to release-13.0.0 that supports new encodings. Closes #44505. #54800 (Kruglov Pavel).
- Improve performance of ON CLUSTER queries by removing heavy system calls to get all network interfaces when looking for local ip address in the DDL entry hosts list. #54909 (Duc Canh Le).
- Fixed accounting of memory allocated before attaching a thread to a query or a user. #56089 (Nikita Taranov).
- Add support for LARGE_LIST in Apache Arrow formats. #56118 (edef).
- Allow manual compaction of EmbeddedRocksDB via OPTIMIZE query. #56225 (Azat Khuzhin).
- Add ability to specify BlockBasedTableOptions for EmbeddedRocksDB tables. #56264 (Azat Khuzhin).
- SHOW COLUMNS now displays MySQL's equivalent data type name when the connection was made through the MySQL protocol. Previously, this was the case when setting use_mysql_types_in_show_columns = 1. The setting is retained but made obsolete. #56277 (Robert Schulze).
- Fixed possible "The local set of parts of table doesn't look like the set of parts in ZooKeeper" error if server was restarted just after TRUNCATE or DROP PARTITION. #56282 (Alexander Tokmakov).
- Fixed handling of non-const query strings in functions formatQuery/formatQuerySingleLine. Also added OrNull variants of both functions that return a NULL when a query cannot be parsed instead of throwing an exception. #56327 (Robert Schulze).
- Allow backup of materialized view with dropped inner table instead of failing the backup. #56387 (Kseniia Sumarokova).
- Queries to system.replicas initiate requests to ZooKeeper when certain columns are queried. When there are thousands of tables these requests might produce a considerable load on ZooKeeper. If there are multiple simultaneous queries to system.replicas, they make the same requests multiple times. The change is to "deduplicate" requests from concurrent queries. #56420 (Alexander Gololobov).
- Fix translation to MySQL compatible query for querying external databases. #56456 (flynn).
- Add support for backing up and restoring tables using KeeperMap engine. #56460 (Antonio Andelic).
- A 404 response for CompleteMultipartUpload has to be rechecked. The operation could have been completed on the server even if the client got a timeout or another network error; the next retry of CompleteMultipartUpload then receives a 404 response. If the object key exists, that operation is considered successful. #56475 (Sema Checherinda).
- Enable the HTTP OPTIONS method by default - it simplifies requesting ClickHouse from a web browser. #56483 (Alexey Milovidov).
- The value for dns_max_consecutive_failures was changed by mistake in #46550 - this is reverted and adjusted to a better value. Also, increased the HTTP keep-alive timeout to a reasonable value from production. #56485 (Alexey Milovidov).
- Load base backups lazily (a base backup won't be loaded until it's needed). Also add some log message and profile events for backups. #56516 (Vitaly Baranov).
- Setting query_cache_store_results_of_queries_with_nondeterministic_functions (with values false or true) was marked obsolete. It was replaced by setting query_cache_nondeterministic_function_handling, a three-valued enum that controls how the query cache handles queries with non-deterministic functions: a) throw an exception (default behavior), b) save the non-deterministic query result regardless, or c) ignore, i.e. don't throw an exception and don't cache the result. #56519 (Robert Schulze).
- Rewrite equality with is null check in JOIN ON section. Experimental Analyzer only. #56538 (vdimir).
- Function concat now supports arbitrary argument types (instead of only String and FixedString arguments). This makes it behave more similar to MySQL concat implementation. For example, SELECT concat('ab', 42) now returns ab42. #56540 (Serge Klochkov).
- Allow getting cache configuration from 'named_collection' section in config or from SQL created named collections. #56541 (Kseniia Sumarokova).
- PostgreSQL database engine: Make the removal of outdated tables less aggressive with unsuccessful postgres connection. #56609 (jsc0218).
- Previously, it took too much time to connect to PostgreSQL when the URL was wrong, so the relevant query got stuck and was eventually cancelled. #56648 (jsc0218).
- Keeper improvement: disable compressed logs by default in Keeper. #56763 (Antonio Andelic).
- Add config setting wait_dictionaries_load_at_startup. #56782 (Vitaly Baranov).
- There was a potential vulnerability in previous ClickHouse versions: if a user has connected and unsuccessfully tried to authenticate with the "interserver secret" method, the server didn't terminate the connection immediately but continued to receive and ignore the leftover packets from the client. While these packets are ignored, they are still parsed, and if they use a compression method with another known vulnerability, it will lead to exploitation of it without authentication. This issue was found with ClickHouse Bug Bounty Program by https://twitter.com/malacupa. #56794 (Alexey Milovidov).
- Fetching a part now waits until that part is fully committed on the remote replica. It is better not to send a part in the PreActive state. In the case of zero copy, this is a mandatory restriction. #56808 (Sema Checherinda).
- Fix possible postgresql logical replication conversion error when using experimental MaterializedPostgreSQL. #53721 (takakawa).
- Implement user-level setting alter_move_to_space_execute_async which allows executing queries ALTER TABLE ... MOVE PARTITION|PART TO DISK|VOLUME asynchronously. The size of the pool for background executions is controlled by background_move_pool_size. Default behavior is synchronous execution. Fixes #47643. #56809 (alesapin).
- Allow filtering by engine when scanning system.tables, avoiding unnecessary (potentially time-consuming) connections. #56813 (jsc0218).
- Show total_bytes and total_rows in system tables for RocksDB storage. #56816 (Aleksandr Musorin).
- Allow basic commands in ALTER for TEMPORARY tables. #56892 (Sergey).
- LZ4 compression: buffer the compressed block in the rare case when the output buffer's capacity is not enough to write the compressed block directly into it. #56938 (Sema Checherinda).
- Add metrics for the number of queued jobs, which is useful for the IO thread pool. #56958 (Alexey Milovidov).
- Add a setting for PostgreSQL table engine setting in the config file. Added a check for the setting. Added documentation around the additional setting. #56959 (Peignon Melvyn).
- Function concat can now be called with a single argument, e.g., SELECT concat('abc'). This makes its behavior more consistent with MySQL's concat implementation. #57000 (Serge Klochkov).
- Signs all x-amz-* headers as required by AWS S3 docs. #57001 (Arthur Passos).
- Function fromDaysSinceYearZero (alias: FROM_DAYS) can now be used with unsigned and signed integer types (previously, it had to be an unsigned integer). This improves compatibility with 3rd party tools such as Tableau Online. #57002 (Serge Klochkov).
- Add system.s3queue_log to default config. #57036 (Kseniia Sumarokova).
- Change the default for wait_dictionaries_load_at_startup to true, and use this setting only if dictionaries_lazy_load is false. #57133 (Vitaly Baranov).
- Check dictionary source type on creation even if dictionaries_lazy_load is enabled. #57134 (Vitaly Baranov).
- Plan-level optimizations can now be enabled/disabled individually. Previously, it was only possible to disable them all. The setting which previously did that (query_plan_enable_optimizations) is retained and can still be used to disable all optimizations. #57152 (Robert Schulze).
- The server's exit code will correspond to the exception code. For example, if the server cannot start due to memory limit, it will exit with the code 241 = MEMORY_LIMIT_EXCEEDED. In previous versions, the exit code for exceptions was always 70 = Poco::Util::ExitCode::EXIT_SOFTWARE. #57153 (Alexey Milovidov).
- Do not demangle and symbolize stack frames from functional C++ header. #57201 (Mike Kot).
- HTTP server page /dashboard now supports charts with multiple lines. #57236 (Sergei Trifonov).
- The max_memory_usage_in_client command line option supports a string value with a suffix (K, M, G, etc). Closes #56879. #57273 (Yarik Briukhovetskyi).
- Bumped Intel QPL (used by codec DEFLATE_QPL) from v1.2.0 to v1.3.1. Also fixed a bug in case of BOF (Block On Fault) = 0, changed to handle page faults by falling back to SW path. #57291 (jasperzhu).
- Increase default replicated_deduplication_window of MergeTree settings from 100 to 1k. #57335 (sichenzhao).
- Stop using INCONSISTENT_METADATA_FOR_BACKUP that much. If possible prefer to continue scanning instead of stopping and starting the scanning for backup from the beginning. #57385 (Vitaly Baranov).
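Several of the user-facing improvements listed above can be tried directly from clickhouse-client. The statements below are a minimal sketch rather than a reference: the table name events, the partition ID and the disk name cold are hypothetical, and the value 'ignore' is assumed to be the spelling of the third enum option described in the query cache entry.

```sql
-- concat now accepts non-String arguments; per the entry above this returns ab42.
SELECT concat('ab', 42);

-- concat can also be called with a single argument.
SELECT concat('abc');

-- fromDaysSinceYearZero (alias FROM_DAYS) now also accepts signed integers.
SELECT fromDaysSinceYearZero(toInt32(739000));

-- Neither throw nor cache when the query uses a non-deterministic function.
-- Assumption: 'ignore' is the enum value for option (c) in the entry above.
SELECT now()
SETTINGS use_query_cache = 1,
         query_cache_nondeterministic_function_handling = 'ignore';

-- Execute a MOVE in the background instead of synchronously.
-- Table, partition ID and disk name are hypothetical.
SET alter_move_to_space_execute_async = 1;
ALTER TABLE events MOVE PARTITION ID '202311' TO DISK 'cold';
```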
Build/Testing/Packaging Improvement
- Add SQLLogic test. #56078 (Han Fei).
- Make clickhouse-local and clickhouse-client available under short names (ch, chl, chc) for usability. #56634 (Alexey Milovidov).
- Optimized build size further by removing unused code from external libraries. #56786 (Alexey Milovidov).
- Add automatic check that there are no large translation units. #56559 (Alexey Milovidov).
- Lower the size of the single-binary distribution. This closes #55181. #56617 (Alexey Milovidov).
- Information about the sizes of every translation unit and binary file after each build will be sent to the CI database in ClickHouse Cloud. This closes #56107. #56636 (Alexey Milovidov).
- Certain files of "Apache Arrow" library (which we use only for non-essential things like parsing the arrow format) were rebuilt all the time regardless of the build cache. This is fixed. #56657 (Alexey Milovidov).
- Avoid recompiling translation units depending on the autogenerated source file about version. #56660 (Alexey Milovidov).
- Tracing data of the linker invocations will be sent to the CI database in ClickHouse Cloud. #56725 (Alexey Milovidov).
- Use DWARF 5 debug symbols for the clickhouse binary (was DWARF 4 previously). #56770 (Michael Kolupaev).
- Add a new build option SANITIZE_COVERAGE. If it is enabled, the code is instrumented to track the coverage. The collected information is available inside ClickHouse with: (1) a new function coverage that returns an array of unique addresses in the code found after the previous coverage reset; (2) SYSTEM RESET COVERAGE query that resets the accumulated data. This allows us to compare the coverage of different tests, including differential code coverage. Continuation of #20539. #56102 (Alexey Milovidov).
- Some of the stack frames might not be resolved when collecting stacks. In such cases the raw address might be helpful. #56267 (Alexander Gololobov).
- Add an option to disable libssh. #56333 (Alexey Milovidov).
- Enable temporary_data_in_cache in S3 tests in CI. #48425 (vdimir).
- Set the max memory usage for clickhouse-client (1G) in the CI. #56873 (Nikita Mikhaylov).
Bug Fix (user-visible misbehavior in an official stable release)
- Fix experimental Analyzer - insertion from select with subquery referencing insertion table should process only insertion block. #50857 (Yakov Olkhovskiy).
- Fix a bug in str_to_map function. #56423 (Arthur Passos).
- Keeper reconfig: add timeout before yielding/taking leadership #53481 (Mike Kot).
- Fix incorrect header in grace hash join and filter pushdown #53922 (vdimir).
- Select from system tables when table based on table function. #55540 (MikhailBurdukov).
- RFC: Fix "Cannot find column X in source stream" for Distributed queries with LIMIT BY #55836 (Azat Khuzhin).
- Fix 'Cannot read from file:' while running client in a background #55976 (Kruglov Pavel).
- Fix clickhouse-local exit on bad send_logs_level setting #55994 (Kruglov Pavel).
- Bug fix explain ast with parameterized view #56004 (SmitaRKulkarni).
- Fix a crash during table loading on startup #56232 (Nikolay Degterinsky).
- Fix ClickHouse-sourced dictionaries with an explicit query #56236 (Nikolay Degterinsky).
- Fix segfault in signal handler for Keeper #56266 (Antonio Andelic).
- Fix incomplete query result for UNION in view() function. #56274 (Nikolai Kochetov).
- Fix inconsistency of "cast('0' as DateTime64(3))" and "cast('0' as Nullable(DateTime64(3)))" #56286 (李扬).
- Fix rare race condition related to Memory allocation failure #56303 (alesapin).
- Fix restore from backup with flatten_nested and data_type_default_nullable #56306 (Kseniia Sumarokova).
- Fix crash in case of adding a column with type Object(JSON) #56307 (Nikita Mikhaylov).
- Fix crash in filterPushDown #56380 (vdimir).
- Fix restore from backup with mat view and dropped source table #56383 (Kseniia Sumarokova).
- Fix segfault during Kerberos initialization #56401 (Nikolay Degterinsky).
- Fix buffer overflow in T64 #56434 (Alexey Milovidov).
- Fix nullable primary key in final (2) #56452 (Amos Bird).
- Fix ON CLUSTER queries without database on initial node #56484 (Nikolay Degterinsky).
- Fix startup failure due to TTL dependency #56489 (Nikolay Degterinsky).
- Fix ALTER COMMENT queries ON CLUSTER #56491 (Nikolay Degterinsky).
- Fix ALTER COLUMN with ALIAS #56493 (Nikolay Degterinsky).
- Fix empty NAMED COLLECTIONs #56494 (Nikolay Degterinsky).
- Fix two cases of projection analysis. #56502 (Amos Bird).
- Fix handling of aliases in query cache #56545 (Robert Schulze).
- Fix conversion from Nullable(Enum) to Nullable(String) #56644 (Nikolay Degterinsky).
- More reliable log handling in Keeper #56670 (Antonio Andelic).
- Fix configuration merge for nodes with substitution attributes #56694 (Konstantin Bogdanov).
- Fix duplicate usage of table function input(). #56695 (Nikolai Kochetov).
- Fix: RabbitMQ OpenSSL dynamic loading issue #56703 (Igor Nikonov).
- Fix crash in GCD codec in case when zeros present in data #56704 (Nikita Mikhaylov).
- Fix 'mutex lock failed: Invalid argument' in clickhouse-local during insert into function #56710 (Kruglov Pavel).
- Fix Date text parsing in optimistic path #56765 (Kruglov Pavel).
- Fix crash in FPC codec #56795 (Alexey Milovidov).
- DatabaseReplicated: fix DDL query timeout after recovering a replica #56796 (Alexander Tokmakov).
- Fix incorrect nullable columns reporting in MySQL binary protocol #56799 (Serge Klochkov).
- Support Iceberg metadata files for metastore tables #56810 (Kruglov Pavel).
- Fix TSAN report under transform #56817 (Raúl Marín).
- Fix SET query and SETTINGS formatting #56825 (Nikolay Degterinsky).
- Fix failure to start due to table dependency in joinGet #56828 (Nikolay Degterinsky).
- Fix flattening existing Nested columns during ADD COLUMN #56830 (Nikolay Degterinsky).
- Fix allow cr end of line for csv #56901 (KevinyhZou).
- Fix tryBase64Decode with invalid input #56913 (Robert Schulze).
- Fix generating deep nested columns in CapnProto/Protobuf schemas #56941 (Kruglov Pavel).
- Prevent incompatible ALTER of projection columns #56948 (Amos Bird).
- Fix sqlite file path validation #56984 (San).
- S3Queue: fix metadata reference increment #56990 (Kseniia Sumarokova).
- S3Queue minor fix #56999 (Kseniia Sumarokova).
- Fix file path validation for DatabaseFileSystem #57029 (San).
- Fix fuzzBits with ARRAY JOIN #57033 (Antonio Andelic).
- Fix Nullptr dereference in partial merge join with joined_subquery_re… #57048 (vdimir).
- Fix race condition in RemoteSource #57052 (Raúl Marín).
- Implement bitHammingDistance for big integers #57073 (Alexey Milovidov).
- S3-style links bug fix #57075 (Yarik Briukhovetskyi).
- Fix JSON_QUERY function with multiple numeric paths #57096 (KevinyhZou).
- Fix buffer overflow in Gorilla codec #57107 (Nikolay Degterinsky).
- Close interserver connection on any exception before authentication #57142 (Antonio Andelic).
- Fix segfault after ALTER UPDATE with Nullable MATERIALIZED column #57147 (Nikolay Degterinsky).
- Fix incorrect JOIN plan optimization with partially materialized normal projection #57196 (Amos Bird).
- Ignore comments when comparing column descriptions #57259 (Antonio Andelic).
- Fix ReadonlyReplica metric for all cases #57267 (Antonio Andelic).
- Background merges correctly use temporary data storage in the cache #57275 (vdimir).
- Keeper fix for changelog and snapshots #57299 (Antonio Andelic).
- Ignore finished ON CLUSTER tasks if hostname changed #57339 (Alexander Tokmakov).
- MergeTree mutations reuse source part index granularity #57352 (Maksim Kita).
- FS cache: add a limit for background download #57424 (Kseniia Sumarokova).
ClickHouse release 23.10, 2023-11-02
Backward Incompatible Change
- There is no longer an option to automatically remove broken data parts. This closes #55174. #55184 (Alexey Milovidov). #55557 (Jihyuk Bok).
- The obsolete in-memory data parts can no longer be read from the write-ahead log. If you have configured in-memory parts before, they have to be removed before the upgrade. #55186 (Alexey Milovidov).
- Remove the integration with Meilisearch. Reason: it was compatible only with the old version 0.18. The recent version of Meilisearch changed the protocol and does not work anymore. Note: we would appreciate it if you help to return it back. #55189 (Alexey Milovidov).
- Rename directory monitor concept into background INSERT. All the settings *directory_monitor* have been renamed to distributed_background_insert*. Backward compatibility should be preserved (since the old settings have been kept as aliases). #55978 (Azat Khuzhin).
- Do not interpret the send_timeout set on the client side as the receive_timeout on the server side and vice versa. #56035 (Azat Khuzhin).
- Comparison of time intervals with different units will throw an exception. This closes #55942. You might have occasionally relied on the previous behavior, where the underlying numeric values were compared regardless of the units. #56090 (Alexey Milovidov).
- Rewrote the experimental S3Queue table engine completely: changed the way we keep information in ZooKeeper, which allows making fewer ZooKeeper requests; added caching of ZooKeeper state in cases when we know the state will not change; made the polling from S3 less aggressive; changed the way the TTL and max set for tracked files are maintained (it is now a background process). Added system.s3queue and system.s3queue_log tables. Closes #54998. #54422 (Kseniia Sumarokova).
- Arbitrary paths on HTTP endpoint are no longer interpreted as a request to the /query endpoint. #55521 (Konstantin Bogdanov).
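Two of the incompatible changes above are easy to check against existing queries and configuration. The statements below are a small sketch under stated assumptions: the changelog does not say which exception the first comparison raises, and the concrete setting name in the last statement is an assumed instance of the distributed_background_insert* pattern named in the rename entry.

```sql
-- Intervals with different units can no longer be compared:
-- per the entry above, this now throws instead of comparing 1 with 60.
SELECT INTERVAL 1 HOUR = INTERVAL 60 MINUTE;

-- Comparing intervals that share a unit is not covered by the change.
SELECT INTERVAL 60 MINUTE = INTERVAL 60 MINUTE;

-- New-style name for a former *directory_monitor* setting.
-- Assumption: this particular setting name follows the renamed pattern.
SET distributed_background_insert_sleep_time_ms = 100;
```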
New Feature
- Add function arrayFold(accumulator, x1, ..., xn -> expression, initial, array1, ..., arrayn) which applies a lambda function to multiple arrays of the same cardinality and collects the result in an accumulator. #49794 (Lirikl).
- Support for Npy format. SELECT * FROM file('example_array.npy', Npy). #55982 (Yarik Briukhovetskyi).
- If a table has a space-filling curve in its key, e.g., ORDER BY mortonEncode(x, y), the conditions on its arguments, e.g., x >= 10 AND x <= 20 AND y >= 20 AND y <= 30 can be used for indexing. A setting analyze_index_with_space_filling_curves is added to enable or disable this analysis. This closes #41195. Continuation of #4538. Continuation of #6286. Continuation of #28130. Continuation of #41753. #55642 (Alexey Milovidov).
- A new setting called force_optimize_projection_name takes the name of a projection as an argument. If its value is set to a non-empty string, ClickHouse checks that this projection is used in the query at least once. Closes #55331. #56134 (Yarik Briukhovetskyi).
- Support asynchronous inserts with external data via native protocol. Previously it worked only if data is inlined into query. #54730 (Anton Popov).
- Added aggregation function lttb which uses the Largest-Triangle-Three-Buckets algorithm for downsampling data for visualization. #53145 (Sinan).
- Query CHECK TABLE has better performance and usability (sends progress updates, cancellable). Support checking a particular part with CHECK TABLE ... PART 'part_name'. #53404 (vdimir).
- Added function jsonMergePatch. When working with JSON data as strings, it provides a way to merge these strings (of JSON objects) together to form a single string containing a single JSON object. #54364 (Memo).
- The second part of Kusto Query Language dialect support. Phase 1 implementation has been merged. #42510 (larryluogit).
- Added a new SQL function, arrayRandomSample(arr, k) which returns a sample of k elements from the input array. Similar functionality could previously be achieved only with less convenient syntax, e.g. "SELECT arrayReduce('groupArraySample(3)', range(10))". #54391 (itayisraelov).
- Introduce -ArgMin/-ArgMax aggregate combinators which allow to aggregate by min/max values only. One use case can be found in #54818. This PR also reorganizes combinators into a dedicated folder. #54947 (Amos Bird).
- Allow to drop cache for Protobuf format with SYSTEM DROP SCHEMA FORMAT CACHE [FOR Protobuf]. #55064 (Aleksandr Musorin).
- Add external HTTP Basic authenticator. #55199 (Aleksei Filatov).
- Added function byteSwap which reverses the bytes of unsigned integers. This is particularly useful for reversing values of types which are represented as unsigned integers internally such as IPv4. #55211 (Priyansh Agrawal).
- Added function formatQuery which returns a formatted version (possibly spanning multiple lines) of a SQL query string. Also added function formatQuerySingleLine which does the same but the returned string will not contain linebreaks. #55239 (Salvatore Mesoraca).
- Added DWARF input format that reads debug symbols from an ELF executable/library/object file. #55450 (Michael Kolupaev).
- Allow to save unparsed records and errors in RabbitMQ, NATS and FileLog engines. Add virtual columns _error and _raw_message (for NATS and RabbitMQ), _raw_record (for FileLog) that are filled when ClickHouse fails to parse a new record. The behaviour is controlled by storage settings nats_handle_error_mode for NATS, rabbitmq_handle_error_mode for RabbitMQ, handle_error_mode for FileLog, similar to kafka_handle_error_mode. If it's set to default, an exception will be thrown when ClickHouse fails to parse a record; if it's set to stream, the error and raw record will be saved into virtual columns. Closes #36035. #55477 (Kruglov Pavel).
- Keeper client improvement: add get_all_children_number command that returns number of all children nodes under a specific path. #55485 (guoxiaolong).
- Keeper client improvement: add get_direct_children_number command that returns number of direct children nodes under a path. #55898 (xuzifu666).
- Add statement SHOW SETTING setting_name which is a simpler version of existing statement SHOW SETTINGS. #55979 (Maksim Kita).
- Added fields substreams and filenames to the system.parts_columns table. #55108 (Anton Popov).
- Add support for SHOW MERGES query. #55815 (megao).
- Introduce a setting create_table_empty_primary_key_by_default for default ORDER BY (). #55899 (Srikanth Chekuri).
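The new functions and statements listed above can be exercised with short queries like the following. This is an illustrative sketch only: the table name events and the part name all_1_1_0 are hypothetical, the query strings passed to formatQuery are arbitrary, and no outputs are shown because some results (such as the random sample) vary between runs.

```sql
-- Read a NumPy file (file name taken from the Npy entry above).
SELECT * FROM file('example_array.npy', Npy);

-- Random sample of 3 elements from an array.
SELECT arrayRandomSample(range(10), 3);

-- Reverse the bytes of an unsigned integer.
SELECT byteSwap(toUInt32(1));

-- Format a query string, possibly multi-line, then on a single line.
SELECT formatQuery('select a, b from tab where a > 3 and b < 3');
SELECT formatQuerySingleLine('select a, b from tab where a > 3 and b < 3');

-- Merge JSON objects that are stored as strings.
SELECT jsonMergePatch('{"a": 1}', '{"b": 2}');

-- Show a single setting, and list currently running merges.
SHOW SETTING max_threads;
SHOW MERGES;

-- Check one part only. Table and part names are hypothetical.
CHECK TABLE events PART 'all_1_1_0';
```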
Performance Improvement
- Add option query_plan_preserve_num_streams_after_window_functions to preserve the number of streams after evaluating window functions to allow parallel stream processing. #50771 (frinkr).
- Release more streams if data is small. #53867 (Jiebin Sun).
- RoaringBitmaps being optimized before serialization. #55044 (UnamedRus).
- Posting lists in inverted indexes are now optimized to use the smallest possible representation for internal bitmaps. Depending on the repetitiveness of the data, this may significantly reduce the space consumption of inverted indexes. #55069 (Harry Lee).
- Fix contention on Context lock, this significantly improves performance for a lot of short-running concurrent queries. #55121 (Maksim Kita).
- Improved the performance of inverted index creation by 30%. This was achieved by replacing std::unordered_map with absl::flat_hash_map. #55210 (Harry Lee).
- Support ORC filter push down (rowgroup level). #55330 (李扬).
- Improve performance of external aggregation with a lot of temporary files. #55489 (Maksim Kita).
- Set a reasonable size for the marks cache for secondary indices by default to avoid loading the marks over and over again. #55654 (Alexey Milovidov).
- Avoid unnecessary reconstruction of index granules when reading skip indexes. This addresses #55653. #55683 (Amos Bird).
- Cache CAST function in set during execution to improve the performance of function IN when set element type doesn't exactly match column type. #55712 (Duc Canh Le).
- Performance improvement for ColumnVector::insertMany and ColumnVector::insertManyFrom. #55714 (frinkr).
- Optimized Map subscript operations by predicting the next row's key position and reducing the number of comparisons. #55929 (lgbo).
- Support struct fields pruning in Parquet (in previous versions it didn't work in some cases). #56117 (lgbo).
- Add the ability to tune the number of parallel replicas used in a query execution based on the estimation of rows to read. #51692 (Raúl Marín).
- Optimized external aggregation memory consumption in case many temporary files were generated. #54798 (Nikita Taranov).
- Distributed queries executed in async_socket_for_remote mode (default) now respect max_threads limit. Previously, some queries could create excessive threads (up to max_distributed_connections), causing server performance issues. #53504 (filimonov).
- Caching skip-able entries while executing DDL from Zookeeper distributed DDL queue. #54828 (Duc Canh Le).
- Experimental inverted indexes do not store tokens with too many matches (i.e. row ids in the posting list). This saves space and avoids ineffective index lookups when sequential scans would be equally fast or faster. The previous heuristic (density parameter passed to the index definition) that controlled when tokens would not be stored was too confusing for users. A much simpler heuristic based on parameter max_rows_per_postings_list (default: 64k) is introduced which directly controls the maximum allowed number of row ids in a postings list. #55616 (Harry Lee).
- Improve write performance to EmbeddedRocksDB tables. #55732 (Duc Canh Le).
- Improved overall resilience for ClickHouse in case of many parts within partition (more than 1000). It might reduce the number of TOO_MANY_PARTS errors. #55526 (Nikita Mikhaylov).
- Reduced memory consumption during loading of hierarchical dictionaries. #55838 (Nikita Taranov).
- All dictionaries support setting dictionary_use_async_executor. #55839 (vdimir).
- Prevent excessive memory usage when deserializing AggregateFunctionTopKGenericData. #55947 (Raúl Marín).
- On a Keeper with lots of watches, AsyncMetrics threads can consume 100% of CPU for a noticeable time in DB::KeeperStorage::getSessionsWithWatchesCount. The fix is to avoid traversing heavy watches and list_watches sets. #56054 (Alexander Gololobov).
- Add setting optimize_trivial_approximate_count_query to use count approximation for storage EmbeddedRocksDB. Enable trivial count for StorageJoin. #55806 (Duc Canh Le).
Improvement
- Functions toDayOfWeek (MySQL alias: DAYOFWEEK), toYearWeek (YEARWEEK) and toWeek (WEEK) now support String arguments. This makes their behavior consistent with MySQL's behavior. #55589 (Robert Schulze).
- Introduced setting date_time_overflow_behavior with possible values ignore, throw, saturate that controls the overflow behavior when converting from Date, Date32, DateTime64, Integer or Float to Date, Date32, DateTime or DateTime64. #55696 (Andrey Zvonov).
- Implement query parameters support for ALTER TABLE ... ACTION PARTITION [ID] {parameter_name:ParameterType}. Merges #49516. Closes #49449. #55604 (alesapin).
- Print processor ids in a prettier manner in EXPLAIN. #48852 (Vlad Seliverstov).
- Creating a direct dictionary with a lifetime field will be rejected at create time (as the lifetime does not make sense for direct dictionaries). Fixes: #27861. #49043 (Rory Crispin).
- Allow parameters in queries with partitions like ALTER TABLE t DROP PARTITION. Closes #49449. #49516 (Nikolay Degterinsky).
- Add a new column xid for system.zookeeper_connection. #50702 (helifu).
- Display the correct server settings in system.server_settings after configuration reload. #53774 (helifu).
- Add support for mathematical minus − character in queries, similar to -. #54100 (Alexey Milovidov).
- Add replica groups to the experimental Replicated database engine. Closes #53620. #54421 (Nikolay Degterinsky).
- It is better to retry retriable s3 errors than totally fail the query. Set bigger value to the s3_retry_attempts by default. #54770 (Sema Checherinda).
- Add load balancing mode hostname_levenshtein_distance. #54826 (JackyWoo).
- Improve hiding secrets in logs. #55089 (Vitaly Baranov).
- For now the projection analysis will be performed only on top of query plan. The setting query_plan_optimize_projection became obsolete (it was enabled by default long time ago). #55112 (Nikita Mikhaylov).
- When function untuple is called on a tuple with named elements and itself has an alias (e.g. select untuple(tuple(1)::Tuple(element_alias Int)) AS untuple_alias), the result column name is now generated from the untuple alias and the tuple element alias (in the example: "untuple_alias.element_alias"). #55123 (garcher22).
- Added setting describe_include_virtual_columns, which allows to include virtual columns of table into result of DESCRIBE query. Added setting describe_compact_output. If it is set to true, DESCRIBE query returns only names and types of columns without extra information. #55129 (Anton Popov).
- Sometimes OPTIMIZE with optimize_throw_if_noop=1 may fail with the error unknown reason while the real cause is different projections in different parts. This behavior is fixed. #55130 (Nikita Mikhaylov).
- Allow to have several MaterializedPostgreSQL tables following the same Postgres table. By default this behaviour is not enabled (for compatibility, because it is a backward-incompatible change), but can be turned on with setting materialized_postgresql_use_unique_replication_consumer_identifier. Closes #54918. #55145 (Kseniia Sumarokova).
- Allow to parse negative DateTime64 and DateTime with fractional part from short strings. #55146 (Andrey Zvonov).
- To improve compatibility with MySQL, 1. information_schema.tables now includes the new field table_rows, and 2. information_schema.columns now includes the new field extra. #55215 (Robert Schulze).
- Clickhouse-client won't show "0 rows in set" if it is zero and if exception was thrown. #55240 (Salvatore Mesoraca).
- Support rename table without keyword TABLE like RENAME db.t1 to db.t2. #55373 (凌涛).
- Add internal_replication to system.clusters. #55377 (Konstantin Morozov).
- Select remote proxy resolver based on request protocol, add proxy feature docs and remove DB::ProxyConfiguration::Protocol::ANY. #55430 (Arthur Passos).
- Avoid retrying keeper operations on INSERT after table shutdown. #55519 (Azat Khuzhin).
- SHOW COLUMNS now correctly reports type FixedString as BLOB if setting use_mysql_types_in_show_columns is on. Also added two new settings, mysql_map_string_to_text_in_show_columns and mysql_map_fixed_string_to_text_in_show_columns to switch the output for types String and FixedString as TEXT or BLOB. #55617 (Serge Klochkov).
- During ReplicatedMergeTree tables startup, the ClickHouse server checks the set of parts for unexpected parts (exist locally, but not in ZooKeeper). All unexpected parts are moved to the detached directory, and the server tries to restore some ancestor (covered) parts instead. Now the server tries to restore the closest ancestors instead of random covered parts. #55645 (alesapin).
- The advanced dashboard now supports draggable charts on touch devices. This closes #54206. #55649 (Alexey Milovidov).
- Use the default query format if declared when outputting exception with http_write_exception_in_output_format. #55739 (Raúl Marín).
- Provide a better message for common MATERIALIZED VIEW pitfalls. #55826 (Raúl Marín).
- If you dropped the current database, you will still be able to run some queries in clickhouse-local and switch to another database. This makes the behavior consistent with clickhouse-client. This closes #55834. #55853 (Alexey Milovidov).
- Functions (add|subtract)(Year|Quarter|Month|Week|Day|Hour|Minute|Second|Millisecond|Microsecond|Nanosecond) now support string-encoded date arguments, e.g. SELECT addDays('2023-10-22', 1). This increases compatibility with MySQL and is needed by Tableau Online. #55869 (Robert Schulze).
- The setting apply_deleted_mask when disabled allows to read rows that were marked as deleted by lightweight DELETE queries. This is useful for debugging. #55952 (Alexander Gololobov).
- Allow skipping null values when serializing Tuple to JSON objects, which makes it possible to keep compatibility with Spark's to_json function, which is also useful for gluten. #55956 (李扬).
- Functions (add|sub)Date now support string-encoded date arguments, e.g. SELECT addDate('2023-10-22 11:12:13', INTERVAL 5 MINUTE). The same support for string-encoded date arguments is added to the plus and minus operators, e.g. SELECT '2023-10-23' + INTERVAL 1 DAY. This increases compatibility with MySQL and is needed by Tableau Online. #55960 (Robert Schulze).
- Allow unquoted strings with CR (\r) in CSV format. Closes #39930. #56046 (Kruglov Pavel).
- Allow to run clickhouse-keeper using embedded config. #56086 (Maksim Kita).
- Set limit of the maximum configuration value for queued.min.messages to avoid problem with start fetching data with Kafka. #56121 (Stas Morozov).
- Fixed a typo in SQL function minSampleSizeContinous (renamed minSampleSizeContinuous). Old name is preserved for backward compatibility. This closes: #56139. #56143 (Dorota Szeremeta).
- Print path for broken parts on disk before shutting down the server. Before this change if a part is corrupted on disk and server cannot start, it was almost impossible to understand which part is broken. This is fixed. #56181 (Duc Canh Le).
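The MySQL-compatibility and usability improvements above mostly show up as queries that used to fail and now run. The sketch below collects a few of them; the first three statements are the examples quoted in the entries themselves, while the remaining argument values and the names db.t1 and db.t2 are illustrative assumptions.

```sql
-- String-encoded date arguments (examples quoted from the entries above).
SELECT addDays('2023-10-22', 1);
SELECT addDate('2023-10-22 11:12:13', INTERVAL 5 MINUTE);
SELECT '2023-10-23' + INTERVAL 1 DAY;

-- Week-related functions now accept String arguments as well.
SELECT toDayOfWeek('2023-10-22'), toWeek('2023-10-22'), toYearWeek('2023-10-22');

-- Pick how out-of-range conversions behave: ignore, throw or saturate.
SET date_time_overflow_behavior = 'saturate';
SELECT toDate(toDateTime64('2200-01-01 00:00:00', 3));

-- DESCRIBE with only column names and types.
DESCRIBE TABLE system.one SETTINGS describe_compact_output = 1;

-- RENAME without the TABLE keyword. Database and table names are hypothetical.
RENAME db.t1 TO db.t2;
```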
Build/Testing/Packaging Improvement
- If the database in Docker is already initialized, it doesn't need to be initialized again upon subsequent launches. This can potentially fix the issue of infinite container restarts when the database fails to load within 1000 attempts (relevant for very large databases and multi-node setups). #50724 (Alexander Nikolaev).
- Resource with source code including submodules is built in Darwin special build task. It may be used to build ClickHouse without checking out the submodules. #51435 (Ilya Yatsishin).
- An error was occurring when building ClickHouse with the AVX series of instructions enabled globally (which isn't recommended). The reason is that snappy does not enable SNAPPY_HAVE_X86_CRC32. #55049 (monchickey).
- Solve issue with launching standalone clickhouse-keeper from clickhouse-server package. #55226 (Mikhail f. Shiryaev).
- In the tests, RabbitMQ version is updated to 3.12.6. Improved logs collection for RabbitMQ tests. #55424 (Ilya Yatsishin).
- Modified the error message difference between openssl and boringssl to fix the functional test. #55975 (MeenaRenganathan22).
- Use upstream repo for apache datasketches. #55787 (Nikita Taranov).
Bug Fix (user-visible misbehavior in an official stable release)
- Skip hardlinking inverted index files in mutation #47663 (cangyin).
- Fixed a bug where the match function (regex) with a pattern containing alternation produced an incorrect key condition. Closes #53222. #54696 (Yakov Olkhovskiy).
- Fix 'Cannot find column' in read-in-order optimization with ARRAY JOIN #51746 (Nikolai Kochetov).
- Support missed experimental Object(Nullable(json)) subcolumns in query. #54052 (zps).
- Re-add fix for accurateCastOrNull #54629 (Salvatore Mesoraca).
- Fix detecting DEFAULT for columns of a Distributed table created without AS #55060 (Vitaly Baranov).
- Proper cleanup in case of exception in ctor of ShellCommandSource #55103 (Alexander Gololobov).
- Fix deadlock in LDAP assigned role update #55119 (Julian Maicher).
- Suppress error statistics update for internal exceptions #55128 (Robert Schulze).
- Fix deadlock in backups #55132 (alesapin).
- Fix storage Iceberg files retrieval #55144 (Kseniia Sumarokova).
- Fix partition pruning of extra columns in set. #55172 (Amos Bird).
- Fix recalculation of skip indexes in ALTER UPDATE queries when table has adaptive granularity #55202 (Duc Canh Le).
- Fix for background download in fs cache #55252 (Kseniia Sumarokova).
- Avoid possible memory leaks in compressors in case of missing buffer finalization #55262 (Azat Khuzhin).
- Fix functions execution over sparse columns #55275 (Azat Khuzhin).
- Fix incorrect merging of Nested for SELECT FINAL FROM SummingMergeTree #55276 (Azat Khuzhin).
- Fix bug with inability to drop detached partition in replicated merge tree on top of S3 without zero copy #55309 (alesapin).
- Fix a crash in MergeSortingPartialResultTransform (due to zero chunks after `remerge`) #55335 (Azat Khuzhin).
- Fix data-race in CreatingSetsTransform (on errors) due to throwing shared exception #55338 (Azat Khuzhin).
- Fix trash optimization (up to a certain extent) #55353 (Alexey Milovidov).
- Fix leak in StorageHDFS #55370 (Azat Khuzhin).
- Fix parsing of arrays in cast operator #55417 (Anton Popov).
- Fix filtering by virtual columns with OR filter in query #55418 (Azat Khuzhin).
- Fix MongoDB connection issues #55419 (Nikolay Degterinsky).
- Fix MySQL interface boolean representation #55427 (Serge Klochkov).
- Fix MySQL text protocol DateTime formatting and LowCardinality(Nullable(T)) types reporting #55479 (Serge Klochkov).
- Make `use_mysql_types_in_show_columns` affect only `SHOW COLUMNS` #55481 (Robert Schulze).
- Fix stack symbolizer parsing `DW_FORM_ref_addr` incorrectly and sometimes crashing #55483 (Michael Kolupaev).
- Destroy fiber in case of exception in cancelBefore in AsyncTaskExecutor #55516 (Kruglov Pavel).
- Fix Query Parameters not working with custom HTTP handlers #55521 (Konstantin Bogdanov).
- Fix checking of non handled data for Values format #55527 (Azat Khuzhin).
- Fix 'Invalid cursor state' in odbc interacting with MS SQL Server #55558 (vdimir).
- Fix max execution time and 'break' overflow mode #55577 (Alexander Gololobov).
- Fix crash in QueryNormalizer with cyclic aliases #55602 (vdimir).
- Disable wrong optimization and add a test #55609 (Alexey Milovidov).
- Merging #52352 #55621 (Alexey Milovidov).
- Add a test to avoid incorrect decimal sorting #55662 (Amos Bird).
- Fix progress bar for s3 and azure Cluster functions with url without globs #55666 (Kruglov Pavel).
- Fix filtering by virtual columns with OR filter in query (resubmit) #55678 (Azat Khuzhin).
- Fixes and improvements for Iceberg storage #55695 (Kruglov Pavel).
- Fix data race in CreatingSetsTransform (v2) #55786 (Azat Khuzhin).
- Throw exception when parsing illegal string as float if precise_float_parsing is true #55861 (李扬).
- Disable predicate pushdown if the CTE contains stateful functions #55871 (Raúl Marín).
- Fix normalize ASTSelectWithUnionQuery, as it was stripping `FORMAT` from the query #55887 (flynn).
- Try to fix possible segfault in Native ORC input format #55891 (Kruglov Pavel).
- Fix window functions in case of sparse columns. #55895 (János Benjamin Antal).
- fix: StorageNull supports subcolumns #55912 (FFish).
- Do not write retriable errors for Replicated mutate/merge into error log #55944 (Azat Khuzhin).
- Fix `SHOW DATABASES LIMIT <N>` #55962 (Raúl Marín).
- Fix autogenerated Protobuf schema with fields with underscore #55974 (Kruglov Pavel).
- Fix dateTime64ToSnowflake64() with non-default scale #55983 (Robert Schulze).
- Fix output/input of Arrow dictionary column #55989 (Kruglov Pavel).
- Fix fetching schema from schema registry in AvroConfluent #55991 (Kruglov Pavel).
- Fix 'Block structure mismatch' on concurrent ALTER and INSERTs in Buffer table #55995 (Michael Kolupaev).
- Fix incorrect free space accounting for least_used JBOD policy #56030 (Azat Khuzhin).
- Fix missing scalar issue when evaluating subqueries inside table functions #56057 (Amos Bird).
- Fix wrong query result when http_write_exception_in_output_format=1 #56135 (Kruglov Pavel).
- Fix schema cache for fallback JSON->JSONEachRow with changed settings #56172 (Kruglov Pavel).
- Add error handler to odbc-bridge #56185 (Yakov Olkhovskiy).
ClickHouse release 23.9, 2023-09-28
Backward Incompatible Change
- Remove the `status_info` configuration option and dictionaries status from the default Prometheus handler. #54090 (Alexey Milovidov).
- The experimental parts metadata cache is removed from the codebase. #54215 (Alexey Milovidov).
- Disable setting `input_format_json_try_infer_numbers_from_strings` by default, so we don't try to infer numbers from strings in JSON formats by default to avoid possible parsing errors when sample data contains strings that look like a number. #55099 (Kruglov Pavel).
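The parsing risk behind the `input_format_json_try_infer_numbers_from_strings` change can be illustrated with a toy model. The sketch below is not ClickHouse's inference code: the functions `looks_like_int`, `infer_column_type` and `parse_value` are hypothetical names, and the only types considered are `Int64` and `String`. It shows how a sample made up of number-like strings can lead to a numeric column type that a later row cannot satisfy.

```python
def looks_like_int(text):
    """Return True if the string parses as an integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def infer_column_type(sample, infer_numbers_from_strings):
    """Toy schema inference over a sample of JSON string values (hypothetical helper)."""
    if infer_numbers_from_strings and sample and all(looks_like_int(s) for s in sample):
        return "Int64"
    return "String"


def parse_value(text, column_type):
    """Parse one value according to the inferred column type."""
    return int(text) if column_type == "Int64" else text


sample = ["42", "7"]   # rows seen during schema inference
later_value = "42-A"   # a row that appears only after the sample

for enabled in (True, False):
    column_type = infer_column_type(sample, enabled)
    try:
        parsed = parse_value(later_value, column_type)
        print(enabled, column_type, repr(parsed))
    except ValueError as error:
        print(enabled, column_type, "parse error:", error)
```

With inference from strings enabled, the toy model picks `Int64` from the sample and then fails on the later row; with it disabled, the column stays `String` and every row is readable.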
New Feature
- Improve schema inference from JSON formats: 1) Now it's possible to infer named Tuples from JSON objects without experimental JSON type under a setting `input_format_json_try_infer_named_tuples_from_objects` in JSON formats. Previously without experimental type JSON we could only infer JSON objects as Strings or Maps, now we can infer named Tuple. Resulting Tuple type will contain all keys of objects that were read in data sample during schema inference. It can be useful for reading structured JSON data without sparse objects. The setting is enabled by default. 2) Allow parsing JSON array into a column with type String under setting `input_format_json_read_arrays_as_strings`. It can help reading arrays with values with different types. 3) Allow to use type String for JSON keys with unknown types (null/[]/{}) in sample data under setting `input_format_json_infer_incomplete_types_as_strings`. Now in JSON formats we can read any value into String column and we can avoid getting error `Cannot determine type for column 'column_name' by first 25000 rows of data, most likely this column contains only Nulls or empty Arrays/Maps` during schema inference by using type String for unknown types, so the data will be read successfully. #54427 (Kruglov Pavel).
- Added IO scheduling support for remote disks. Storage configuration for disk types `s3`, `s3_plain`, `hdfs` and `azure_blob_storage` can now contain `read_resource` and `write_resource` elements holding resource names. Scheduling policies for these resources can be configured in a separate server configuration section `resources`. Queries can be marked using setting `workload` and classified using server configuration section `workload_classifiers` to achieve diverse resource scheduling goals. More details in the docs. #47009 (Sergei Trifonov). Added "bandwidth_limit" IO scheduling node type. It allows you to specify `max_speed` and `max_burst` constraints on traffic passing through this node. #54618 (Sergei Trifonov).
- Added new type of authentication based on SSH keys. It works only for the native TCP protocol. #41109 (George Gamezardashvili).
- Added a new column `_block_number` for MergeTree tables. #44532. #47532 (SmitaRKulkarni).
- Add `IF EMPTY` clause for `DROP TABLE` queries. #48915 (Pavel Novitskiy).
- SQL functions `toString(datetime, timezone)` and `formatDateTime(datetime, format, timezone)` now support non-constant timezone arguments. #53680 (Yarik Briukhovetskyi).
- Add support for `ALTER TABLE MODIFY COMMENT`. Note: something similar was added by an external contributor a long time ago, but the feature did not work at all and only confused users. This closes #36377. #51304 (Alexey Milovidov). Note: this command does not propagate between replicas, so the replicas of a table could have different comments.
- Added `GCD` a.k.a. "greatest common divisor" as a new data compression codec. The codec computes the GCD of all column values, and then divides each value by the GCD. The GCD codec is a data preparation codec (similar to Delta and DoubleDelta) and cannot be used stand-alone. It works with data of integer, decimal and date/time types. A viable use case for the GCD codec is column values that change (increase/decrease) in multiples of the GCD, e.g. 24 - 28 - 16 - 24 - 8 - 24 (assuming GCD = 4). #53149 (Alexander Nam).
- Two new type aliases `DECIMAL(P)` (as shortcut for `DECIMAL(P, 0)`) and `DECIMAL` (as shortcut for `DECIMAL(10, 0)`) were added. This makes ClickHouse more compatible with MySQL's SQL dialect. #53328 (Val Doroshchuk).
- Added a new system log table `backup_log` to track all `BACKUP` and `RESTORE` operations. #53638 (Victor Krasnov).
- Added a format setting `output_format_markdown_escape_special_characters` (default: false). The setting controls whether special characters like `!`, `#`, `$` etc. are escaped (i.e. prefixed by a backslash) in the `Markdown` output format. #53860 (irenjj).
- Add function `decodeHTMLComponent`. #54097 (Bharat Nallan).
- Added `peak_threads_usage` to query_log table. #54335 (Alexey Gerasimchuck).
- Add `SHOW FUNCTIONS` support to clickhouse-client. #54337 (Julia Kartseva).
- Added function `toDaysSinceYearZero` with alias `TO_DAYS` (for compatibility with MySQL) which returns the number of days passed since `0001-01-01` (in Proleptic Gregorian Calendar). #54479 (Robert Schulze). Function `toDaysSinceYearZero` now supports arguments of type `DateTime` and `DateTime64`. #54856 (Serge Klochkov).
- Added functions
`YYYYMMDDToDate`, `YYYYMMDDToDate32`, `YYYYMMDDhhmmssToDateTime` and `YYYYMMDDhhmmssToDateTime64`. They convert a date or date with time encoded as integer (e.g. 20230911) into a native date or date with time. As such, they provide the opposite functionality of existing functions `toYYYYMMDD` and `toYYYYMMDDhhmmss`. #54509 (Quanfa Fu) (Robert Schulze).
- Add several string distance functions, including
`byteHammingDistance`, `editDistance`. #54935 (flynn).
- Allow specifying the expiration date and, optionally, the time for user credentials with `VALID UNTIL datetime` clause. #51261 (Nikolay Degterinsky).
- Allow S3-style URLs for table functions `s3`, `gcs`, `oss`. URL is automatically converted to HTTP. Example: `'s3://clickhouse-public-datasets/hits.csv'` is converted to `'https://clickhouse-public-datasets.s3.amazonaws.com/hits.csv'`. #54931 (Yarik Briukhovetskyi).
- Add new setting `print_pretty_type_names` to print pretty deep nested types like Tuple/Maps/Arrays. #55095 (Kruglov Pavel).
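The arithmetic behind the GCD codec entry in the list above can be sketched in a few lines. This is only an illustration of the "compute the GCD, then divide each value by it" idea on a list of Python integers. It is not the codec's actual implementation or on-disk format, and the function names `gcd_encode` and `gcd_decode` are hypothetical.

```python
from functools import reduce
from math import gcd


def gcd_encode(values):
    """Return (divisor, quotients): each value divided by the GCD of all values."""
    divisor = reduce(gcd, values, 0)
    if divisor == 0:
        # Empty input or all zeros: nothing to factor out.
        divisor = 1
    return divisor, [value // divisor for value in values]


def gcd_decode(divisor, quotients):
    """Restore the original values by multiplying the quotients back."""
    return [quotient * divisor for quotient in quotients]


values = [24, 28, 16, 24, 8, 24]
divisor, quotients = gcd_encode(values)
print(divisor, quotients)
assert gcd_decode(divisor, quotients) == values
```

The quotients are smaller numbers than the inputs, which is why the release note describes GCD as a data preparation step to be followed by another codec rather than a stand-alone one.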
Performance Improvement
- Speed up reading from S3 by enabling prefetches by default. #53709 (Alexey Milovidov).
- Do not implicitly read PK and version columns in lonely parts if unnecessary for queries with FINAL. #53919 (Duc Canh Le).
- Optimize group by constant keys. Will optimize queries with group by `_file/_path` after https://github.com/ClickHouse/ClickHouse/pull/53529. #53549 (Kruglov Pavel).
- Improve performance of sorting for `Decimal` columns. Improve performance of insertion into `MergeTree` if ORDER BY contains a `Decimal` column. Improve performance of sorting when data is already sorted or almost sorted. #35961 (Maksim Kita).
- Improve performance for huge query analysis. Fixes #51224. #51469 (frinkr).
- An optimization to rewrite `COUNT(DISTINCT ...)` and various `uniq` variants to `count` if it is selected from a subquery with GROUP BY. #52082 #52645 (JackyWoo).
- Remove manual calls to `mmap/mremap/munmap` and delegate all this work to `jemalloc`, which slightly improves performance. #52792 (Nikita Taranov).
- Fixed high CPU consumption when working with NATS. #54399 (Vasilev Pyotr).
- Since we use separate instructions for executing `toString` with datetime argument, it is possible to improve performance a bit for non-datetime arguments and have some parts of the code cleaner. Follows up #53680. #54443 (Yarik Briukhovetskyi).
- Instead of serializing json elements into a `std::stringstream`, this PR tries to put the serialization result into `ColumnString` directly. #54613 (lgbo).
- Enable ORDER BY optimization for reading data in corresponding order from a MergeTree table in case that the table is behind a view. #54628 (Vitaly Baranov).
- Improve JSON SQL functions by reusing `GeneratorJSONPath` and removing several shared pointers. #54735 (lgbo).
- Keeper tries to batch flush requests for better performance. #53049 (Antonio Andelic).
- Now `clickhouse-client` processes files in parallel in case of `INFILE 'glob_expression'`. Closes #54218. #54533 (Max K.).
- Allow to use primary key for IN function where primary key column types are different from `IN` function right side column types. Example: `SELECT id FROM test_table WHERE id IN (SELECT '5')`. Closes #48936. #54544 (Maksim Kita).
- Hash JOIN tries to shrink internal buffers consuming half of maximal available memory (set by `max_bytes_in_join`). #54584 (vdimir).
- Respect `max_block_size` for array join to avoid possible OOM. Close #54290. #54664 (