PostgreSQL vs ClickHouse: Filtering and Sorting
Introduction
Using a database-per-service approach in a microservice architecture is standard practice for good reason. It allows each microservice to be independently maintainable and deployable. The downside appears when a client needs a single datagrid assembled from multiple services, with sorting and filtering across any column.
In that model, one service has to fetch data from the others, assemble the combined result set in memory, then sort, filter, and page before returning a response. That can work at small scale, but it does not scale well as row counts grow.
The two alternatives explored here are a shared PostgreSQL database with logical schemas and a cross-schema view for read-heavy grid queries, and a ClickHouse read-only projection that combines data across services for analytical-style reads.
I set up performance tests for all three approaches in https://github.com/markorskip/olap-vs-api-performance-test. The benchmark stack uses Spring Boot, PostgreSQL, and ClickHouse. I tested datasets of 1K, 10K, 100K, 1M, 10M, and 100M rows. I was not able to load 1 billion rows on my laptop.
I ran everything locally in Docker and verified the results both through the browser and with Python scripts.
Benchmark Summary
This document summarizes the benchmark runs captured from the local Docker-based benchmark stack on 2026-03-19. Each dataset size was loaded into PostgreSQL using the repo’s dataset presets, ClickHouse was refreshed from PostgreSQL after each load, and query latency was measured through POST /api/grid/orders/benchmark with 3 iterations using page=0, size=100, sortBy=orderedAt, and sortDirection=desc.
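A minimal sketch of that measurement loop, assuming a `run_query` callable as a stand-in for the real HTTP call to the benchmark endpoint:

```python
import time
from statistics import mean

def benchmark(run_query, iterations=3):
    """Call run_query several times and return the average latency in ms."""
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        # In the real harness this is POST /api/grid/orders/benchmark
        # with page=0, size=100, sortBy=orderedAt, sortDirection=desc.
        run_query()
        latencies.append((time.perf_counter() - start) * 1000)
    return mean(latencies)

# Stand-in query that takes roughly 10 ms:
avg_ms = benchmark(lambda: time.sleep(0.01))
```

Averaging a small number of iterations smooths out scheduler noise without hiding the scaling trend the tables below are after.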
For test practicality, the harness used a single PostgreSQL instance even for the API aggregation scenario. That does not materially change what scenario 1 is measuring in this article. The primary cost in that scenario is the application-layer fan-out, in-memory join, sort, filter, and pagination performed by the grid service. Whether the source data sits in one physical PostgreSQL instance or in several service-owned PostgreSQL instances is not the main variable in this benchmark, because the aggregation still happens in the microservice rather than in the database.
The data model in this benchmark is intentionally synthetic. Orders, customers, and products are only example entities used to create a repeatable joined grid workload. The goal is not to model a specific production domain, but to test how different architectural read patterns behave under the same logical access pattern as row counts grow.
The reason for this benchmark is architectural rather than purely technical. The current application behavior is an API aggregation pattern: the grid service reads orders from one source, calls other services for related customer and product data, then joins, sorts, and pages the result in application memory. That matches the current microservice-friendly approach, but it becomes expensive as data volume grows because every query has to coordinate multiple data sources and materialize more rows inside the application.
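As a sketch of why that pattern gets expensive, the in-memory join below materializes every joined row before slicing out a single page. Entity and field names are illustrative, not the repo's actual model:

```python
def aggregate_grid(orders, customers_by_id, products_by_id,
                   sort_by="ordered_at", descending=True, page=0, size=100):
    """Join, sort, and page entirely in application memory.

    Every call materializes the full joined result before slicing one
    page, which is why this pattern degrades as row counts grow.
    """
    rows = [
        {**order,
         "customer_name": customers_by_id[order["customer_id"]],
         "product_name": products_by_id[order["product_id"]]}
        for order in orders
    ]
    rows.sort(key=lambda r: r[sort_by], reverse=descending)
    return rows[page * size:(page + 1) * size]
```

Because the full joined list is built and sorted on every request, the cost scales with total row count rather than with page size.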
This project benchmarks that current approach against two alternatives that map directly to the architectural decision under consideration:
- API aggregation in memory: keep the current approach, with aggregation and sorting in the application layer
- Single PostgreSQL database with logical schemas: move read-heavy grid queries into one operational database while preserving domain separation through schemas
- ClickHouse read-only projection: keep operational ownership elsewhere, but expose the grid from a read-optimized analytical store refreshed from source data
In practical terms, the question is not just which query is faster. The question is whether the current API aggregation pattern should remain the primary read path, whether those reads should be consolidated into one PostgreSQL database with logical separation, or whether they should move to a dedicated read-only ClickHouse projection. The benchmark is meant to show where the current API aggregation model stops scaling, where a schema-separated PostgreSQL approach starts to degrade, and where a ClickHouse projection becomes worth the refresh and storage overhead.
Architecture Diagrams
The diagrams below show the conceptual deployment models each benchmark scenario represents. The benchmark harness itself is simplified as described above.
1. API Aggregation In Memory
Each microservice owns its own PostgreSQL database. The grid service calls each service, assembles the combined result in memory, then sorts, filters, and pages before returning the response.
```mermaid
flowchart LR
Client["Client"]
GridService["Grid Service"]
JoinStep["Assemble, sort, filter, and page in memory"]
OrdersService["Orders Service"]
CustomersService["Customers Service"]
ProductsService["Products Service"]
OrdersDb[("Orders PostgreSQL")]
CustomersDb[("Customers PostgreSQL")]
ProductsDb[("Products PostgreSQL")]
Client -->|datagrid request| GridService
GridService -->|fetch orders| OrdersService
GridService -->|fetch customers| CustomersService
GridService -->|fetch products| ProductsService
OrdersService --> OrdersDb
CustomersService --> CustomersDb
ProductsService --> ProductsDb
OrdersService --> GridService
CustomersService --> GridService
ProductsService --> GridService
GridService --> JoinStep
JoinStep -->|paged response| Client
```
2. Shared PostgreSQL With Logical Schemas And A View
Each microservice still owns its data model, but ownership is enforced by schema boundaries inside one shared PostgreSQL database. The grid service queries a database view that joins across schemas and lets PostgreSQL handle sorting and filtering.
```mermaid
flowchart LR
Client["Client"]
GridService["Grid Service"]
OrdersService["Orders Service"]
CustomersService["Customers Service"]
ProductsService["Products Service"]
Client -->|datagrid request| GridService
subgraph SharedDb["Shared PostgreSQL Database"]
direction TB
GridView["Datagrid View"]
subgraph Schemas["Logical Schemas"]
direction LR
OrdersSchema["orders schema"]
CustomersSchema["customers schema"]
ProductsSchema["products schema"]
end
GridView --> OrdersSchema
GridView --> CustomersSchema
GridView --> ProductsSchema
end
OrdersService --> OrdersSchema
CustomersService --> CustomersSchema
ProductsService --> ProductsSchema
GridService -->|query view with sort, filter, paging| GridView
GridService -->|paged response| Client
```
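In this model the grid service pushes sorting, filtering, and paging down to PostgreSQL. A sketch of how such a query might be built safely (the view and column names are placeholders; sort identifiers cannot be bound as SQL parameters, so they must be validated against a whitelist before interpolation):

```python
# Whitelisted sortable columns of the hypothetical cross-schema view.
SORTABLE = {"ordered_at", "customer_name", "product_name"}

def build_grid_query(sort_by="ordered_at", direction="desc", page=0, size=100):
    """Build a paged query against the datagrid view.

    Only LIMIT/OFFSET values are passed as bind parameters; the sort
    column is checked against the whitelist to prevent SQL injection.
    """
    if sort_by not in SORTABLE:
        raise ValueError(f"unsortable column: {sort_by}")
    order = "DESC" if direction.lower() == "desc" else "ASC"
    sql = (f"SELECT * FROM grid.datagrid_view "
           f"ORDER BY {sort_by} {order} LIMIT %s OFFSET %s")
    return sql, (size, page * size)
```

PostgreSQL then performs the join, sort, and page slice itself, which is why this scenario stays in single-digit milliseconds at small sizes in the results below.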
3. ClickHouse Read-Only Projection
Each microservice keeps its own PostgreSQL database. A dedicated ClickHouse read service maintains a denormalized projection refreshed from those source databases and serves the datagrid from ClickHouse.
```mermaid
flowchart LR
Client["Client"]
GridReadService["ClickHouse Grid Service"]
SyncJob["Projection Refresh Pipeline"]
OrdersService["Orders Service"]
CustomersService["Customers Service"]
ProductsService["Products Service"]
OrdersDb[("Orders PostgreSQL")]
CustomersDb[("Customers PostgreSQL")]
ProductsDb[("Products PostgreSQL")]
ClickHouseDb[("ClickHouse Projection")]
Client -->|datagrid request| GridReadService
GridReadService -->|query with sort, filter, paging| ClickHouseDb
GridReadService -->|paged response| Client
OrdersService --> OrdersDb
CustomersService --> CustomersDb
ProductsService --> ProductsDb
OrdersDb --> SyncJob
CustomersDb --> SyncJob
ProductsDb --> SyncJob
SyncJob --> ClickHouseDb
```
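The key difference from the in-memory aggregation scenario is that the join still happens, but once per refresh instead of once per request. A stand-in sketch of one refresh cycle (the `fetch_*` callables are placeholders for reads against the three source databases; a production pipeline would more likely use CDC or ClickHouse's `postgresql()` table function):

```python
def build_projection_rows(fetch_orders, fetch_customers, fetch_products):
    """Denormalize source rows into the flat shape stored in ClickHouse.

    Rows come out pre-joined, so grid queries against the projection
    need no joins at read time.
    """
    customers = {c["id"]: c["name"] for c in fetch_customers()}
    products = {p["id"]: p["name"] for p in fetch_products()}
    return [
        (o["id"], o["ordered_at"],
         customers[o["customer_id"]], products[o["product_id"]])
        for o in fetch_orders()
    ]
```

Paying the join cost per refresh rather than per request is what keeps the ClickHouse latency curve flat in the results below, at the price of projection staleness between refreshes.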
Storage values are reported using the current application metrics:
- PostgreSQL: whole database size via `pg_database_size(current_database())`
- ClickHouse: active data size via `sum(bytes_on_disk)` from `system.parts`
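Both metrics return raw byte counts; the tables below report MiB and the summary quotes GiB, using the conversions sketched here:

```python
def bytes_to_mib(n):
    """Convert a raw byte count (as returned by pg_database_size or
    sum(bytes_on_disk)) to MiB, rounded to two decimals."""
    return round(n / (1024 ** 2), 2)

def mib_to_gib(mib):
    """Convert MiB to GiB, rounded to two decimals."""
    return round(mib / 1024, 2)
```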
Query Time Results
| Dataset | API Aggregation In Memory Avg (ms) | Single PostgreSQL Database With Logical Schemas Avg (ms) | ClickHouse Read-Only Projection Avg (ms) | Notes |
|---|---|---|---|---|
| 1K | 25.00 | 2.33 | 21.67 | All three patterns completed |
| 10K | 142.33 | 3.67 | 24.33 | All three patterns completed |
| 100K | N/A | 18.33 | 25.67 | API aggregation returned HTTP 500 |
| 1M | N/A | 52.33 | 23.33 | API aggregation returned HTTP 500 |
| 10M | N/A | 868.33 | 52.33 | API aggregation skipped as impractical |
| 100M | N/A | 15897.67 | 68.67 | API aggregation skipped as impractical |
Database Size Results
| Dataset | PostgreSQL DB Size (MiB) | ClickHouse Active Data Size (MiB) | Load Time (s) | ClickHouse Refresh Time (s) |
|---|---|---|---|---|
| 1K | 8.24 | 0.03 | 0.31 | 0.12 |
| 10K | 11.72 | 0.35 | 0.40 | 0.13 |
| 100K | 45.41 | 3.32 | 1.57 | 0.24 |
| 1M | 170.68 | 27.74 | 8.32 | 1.36 |
| 10M | 1626.60 | 353.33 | 91.65 | 16.70 |
| 100M | 15975.97 | 4035.30 | 1182.29 | 300.21 |
Performance Charts
Small-Scale Latency Comparison
Series order: API aggregation in memory, single PostgreSQL database with logical schemas, ClickHouse read-only projection.
```mermaid
xychart-beta
title "Small-Scale Query Latency"
x-axis ["1K", "10K"]
y-axis "Average latency (ms)" 0 --> 160
bar [25, 142]
bar [2, 4]
bar [22, 24]
```
Full-Scale Latency Comparison
Series order: single PostgreSQL database with logical schemas, ClickHouse read-only projection.
```mermaid
xychart-beta
title "Full-Scale Query Latency"
x-axis ["1K", "10K", "100K", "1M", "10M", "100M"]
y-axis "Average latency (ms)" 0 --> 17000
line [2, 4, 18, 52, 868, 15898]
line [22, 24, 26, 23, 52, 69]
```
Database Size Comparison
Series order: PostgreSQL, ClickHouse.
```mermaid
xychart-beta
title "Database Size by Dataset"
x-axis ["1K", "10K", "100K", "1M", "10M", "100M"]
y-axis "Size (MiB)" 0 --> 17000
line [8, 12, 45, 171, 1627, 15976]
line [0, 0, 3, 28, 353, 4035]
```
Summary
The single PostgreSQL database with logical schemas was the fastest option through 100K rows, but it degraded sharply after that point. The ClickHouse read-only projection was slightly slower at the smallest sizes, then overtook PostgreSQL by 1M rows and stayed comparatively flat all the way to 100M.
The API aggregation in memory approach only completed at 1K and 10K, failed with HTTP 500 by 100K, and was not attempted at 10M and 100M because the design requires materializing and joining the entire dataset in memory. That failure mode is itself a useful result because it shows the practical upper bound of the current aggregation model.
From an architecture perspective, the current API aggregation approach appears viable only at small scale. A single PostgreSQL database with logical schemas looks like the lowest-friction replacement if the workload stays in the 1K to 1M range and query latency in the tens of milliseconds is acceptable. If the target is predictable performance at 10M to 100M rows for grid-style queries, the ClickHouse read-only projection is the stronger read path, provided the system can tolerate refresh latency and the operational complexity of maintaining a second store.
On storage, ClickHouse remained materially smaller than PostgreSQL across every tested size. At 100M, PostgreSQL reached about 15.60 GiB of database size while ClickHouse active data was about 3.94 GiB, with a 100M PostgreSQL load taking about 19.7 minutes and the corresponding ClickHouse refresh taking about 5.0 minutes.