Slides from Citus Con 2023.
Describes our experience with Citus & JSON for real-time analytics, covering common caveats and workarounds, maintenance practices, and interaction with users.
best new games on Facebook in 2014.
• 2015. Knights and Brides won Facebook's best web game award.
• 2020. VIZOR ranked 12th in the TOP 30 EMEA-headquartered overall publishers by revenue (iOS App Store & Google Play), according to data.ai.
• 2021. VIZOR ranked 12th in the TOP 30 overall EMEA-headquartered publishers worldwide by consumer spend (iOS App Store & Google Play), according to data.ai.
USING (user_id)
INNER JOIN profiles.volatile vol ON (
    event."user_id" = vol."user_id"
    AND vol.updated_time <= event.time
    AND event.time < vol.up_to_time
);
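For context, this fragment is the tail of a point-in-time join between the event stream and a slowly changing (volatile) profile table. A fuller sketch of such a query, with an assumed SELECT list and the permanent-profile join borrowed from a later slide:

SELECT event.user_id, event.time, event.data, vol.data AS profile_data
FROM events.some_event AS event
INNER JOIN profiles.permanent perm USING (user_id)
INNER JOIN profiles.volatile vol ON (
    event."user_id" = vol."user_id"
    AND vol.updated_time <= event.time
    AND event.time < vol.up_to_time
);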
PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows. This happens transparently to the user, with only small impact on most of the backend code. The technique is affectionately known as TOAST (or "the best thing since sliced bread"). The TOAST infrastructure is also used to improve handling of large data values in-memory. docs
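A quick way to see how much of a JSON-heavy event table actually lives in TOAST; table and column names are illustrative, taken from the examples in these slides:

-- Heap size vs. everything else (TOAST + indexes) for the event table
SELECT pg_size_pretty(pg_relation_size('events.some_event')) AS heap_size,
       pg_size_pretty(pg_total_relation_size('events.some_event')
                      - pg_relation_size('events.some_event')) AS toast_and_indexes;

-- Stored (possibly compressed/TOASTed) size of individual jsonb values
SELECT pg_column_size(data) AS stored_bytes
FROM events.some_event
LIMIT 10;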
GIN stands for Generalized Inverted Index. GIN is designed for handling cases where the items to be indexed are composite values, and the queries to be handled by the index need to search for element values that appear within the composite items. For example, the items could be documents, and the queries could be searches for documents containing specific words. docs
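A typical GIN setup for the jsonb event payloads discussed here. Note that a default (jsonb_ops) GIN index serves containment (@>) and key-existence (?) operators rather than ->> equality, so the lookup is written in containment form; names are illustrative:

CREATE INDEX IF NOT EXISTS some_event_data_gin
    ON events.some_event USING GIN (data);

SELECT *
FROM events.some_event
WHERE data @> '{"some_key": "some_value"}';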
SELECT * from citus_add_node('worker-3', 5432);
-- Two screens later...
SELECT * from citus_add_node('worker-n', 5432);

-- Blazingly fast
SELECT * FROM some_event WHERE data ->> 'some_key' = 'some_value';
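For that last query to fan out over the newly added workers, the event table has to be distributed first; a one-line sketch, assuming user_id is the distribution column:

SELECT create_distributed_table('events.some_event', 'user_id');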
    NULL CONSTRAINT pk_permanent PRIMARY KEY,
    -- ...
);

CREATE INDEX IF NOT EXISTS volatile_profiles_user_id_time_index
    ON profiles.volatile (user_id, update_time, up_to_time);
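The slide shows only the index; a minimal sketch of the profiles.volatile table it belongs to, with assumed column names and types kept consistent with the index and the interval join shown earlier:

CREATE TABLE IF NOT EXISTS profiles.volatile (
    user_id     TEXT        NOT NULL,
    update_time TIMESTAMPTZ NOT NULL,
    up_to_time  TIMESTAMPTZ NOT NULL,
    data        JSONB,
    PRIMARY KEY (user_id, update_time)
);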
UPDATE events.some_event event_to_inject
SET some_key = data_to_inject.some_key
FROM (SELECT ... FROM events."some_event" AS event ...) data_to_inject
WHERE event_to_inject.user_id = data_to_inject.user_id
  AND event_to_inject.time = data_to_inject.time;
END;
UNION ALL
SELECT user_id, time, data ->> 'some_key' AS some_key
FROM events.some_event AS event
INNER JOIN profiles.permanent perm USING (user_id)
WHERE event.time > current_date
        ->> 'some_key') AS some_key_avg
FROM events.some_event
GROUP BY user_id
UNION ALL
SELECT *
FROM dblink('tenant_two', $$ SELECT ... FROM events.some_event ...;$$)
    AS t(user_id TEXT, some_key_avg NUMERIC);
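The dblink('tenant_two', ...) call assumes a connection that can be resolved by name: either a foreign server called tenant_two or a named connection opened beforehand. A minimal sketch of the latter, with a placeholder connection string:

CREATE EXTENSION IF NOT EXISTS dblink;
SELECT dblink_connect('tenant_two', 'host=tenant-two.example dbname=analytics user=report');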
CREATE OR REPLACE PROCEDURE admin.kill_long_analytical_queries()
LANGUAGE plpgsql
AS $proc$
DECLARE
    problematic_query RECORD;
BEGIN
    FOR problematic_query IN (
        SELECT pid, query
        FROM (
            SELECT now() - query_start AS run_time, query, pid
            FROM pg_stat_activity
            WHERE usename IN (...)
        ) sub
        WHERE run_time >= INTERVAL '20 MINUTES'
    )
    LOOP
        RAISE WARNING 'Killing pid % that was running %', problematic_query.pid, problematic_query.query;
        PERFORM pg_terminate_backend(problematic_query.pid);
    END LOOP;
END;
$proc$;

SELECT cron.schedule('*/5 * * * *', $$CALL admin.kill_long_analytical_queries()$$);
-c "SELECT pid FROM pg_stat_activity WHERE usename!='postgres'" ) | head -20 ) for PID in ${PIDS} do QUERY_ID=$( psql postgres -qt -c "SELECT COALESCE(query_id, 0) FROM pg_stat_activity WHERE pid=${PID}" | sed 's/ //g' ) MEM=$( ps -o size --no-headers ${PID} ) if [ -n "${MEM}" ] then echo '{"key":"processes_postgres_memory_usage_kilobytes", "pid":"'$ {PID}'", "query_id":"'${QUERY_ID}'", "value":'${MEM}'},' fi done
       total_exec_time
FROM pg_stat_statements t1
JOIN pg_database t2 ON (t1.dbid = t2.oid)
JOIN pg_roles t3 ON (t1.userid = t3.oid)
WHERE t3.rolname IN (...)
  AND total_exec_time > 0;
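pg_stat_statements is only populated if the module is preloaded and the extension has been created; the standard setup:

-- postgresql.conf (requires a restart):
--   shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;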
= 60000;

COPY long_query_examples
FROM '/var/lib/postgresql/14/data/log/postgresql-<day_of_week>.csv'
WITH CSV
WHERE command_tag IN ('SELECT', 'REFRESH MATERIALIZED VIEW', 'CALL')
  AND message LIKE '%%duration:%%';
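COPY needs a target table whose columns match the csvlog format; long_query_examples presumably follows the table layout given in the PostgreSQL 14 logging documentation:

CREATE TABLE long_query_examples (
    log_time timestamp(3) with time zone,
    user_name text,
    database_name text,
    process_id integer,
    connection_from text,
    session_id text,
    session_line_num bigint,
    command_tag text,
    session_start_time timestamp with time zone,
    virtual_transaction_id text,
    transaction_id bigint,
    error_severity text,
    sql_state_code text,
    message text,
    detail text,
    hint text,
    internal_query text,
    internal_query_pos integer,
    context text,
    query text,
    query_pos integer,
    location text,
    application_name text,
    backend_type text,
    leader_pid integer,
    query_id bigint,
    PRIMARY KEY (session_id, session_line_num)
);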
    '2020-01-01' AND time < '2020-02-01')
SELECT '*******' AS user_id_masked, count(er.user_id) AS "amount"
FROM sc
INNER JOIN events."another_event" er ON sc.data ->> 'country' = er.data ->> 'country'
GROUP BY er.user_id;
    NOT NULL
) PARTITION BY RANGE (logdate);

SELECT create_distributed_table('measurement', 'city_id');

CREATE TABLE measurement_y2006m02
    PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01')
    PARTITION BY RANGE (peaktemp);

[0A000] ERROR: distributing multi-level partitioned tables is not supported
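Dropping the second PARTITION BY level sidesteps the error: Citus does support a single level of range partitioning on a distributed table, so the sub-partition is declared as a plain partition instead, roughly:

CREATE TABLE measurement_y2006m02
    PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');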