- Jun 22, 2023
- Jun 21, 2023
Avi Kivity authored
* seastar 32ab15cda6...29a0e64513 (1):
  > reactor: change shares for default IO class from 1 to 200
Fixes #13753.
In 5.3: 37e6e652
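Seastar's I/O scheduler splits contended disk bandwidth between classes roughly in proportion to their shares, so the effect of this change can be estimated with simple arithmetic. A minimal Python sketch, assuming an idealized proportional split and a hypothetical competing class with 1000 shares (the real share values of other classes are not given here):

```python
def bandwidth_fraction(shares: dict[str, int], io_class: str) -> float:
    """Idealized share of contended bandwidth: own shares / sum of all shares."""
    return shares[io_class] / sum(shares.values())


# Hypothetical competitor: one other busy class with 1000 shares.
before = bandwidth_fraction({"default": 1, "other": 1000}, "default")
after = bandwidth_fraction({"default": 200, "other": 1000}, "default")
print(f"default class: {before:.4f} -> {after:.4f}")  # default class: 0.0010 -> 0.1667
```

In this idealized model, a class with 1 share gets about 0.1% of the bandwidth while another class is busy; with 200 shares it keeps a meaningful slice.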
- Jun 15, 2023
Botond Dénes authored
Backport the following improvements for test.py topology tests for CI stability:
- https://github.com/scylladb/scylladb/pull/12652
- https://github.com/scylladb/scylladb/pull/12630
- https://github.com/scylladb/scylladb/pull/12619
- https://github.com/scylladb/scylladb/pull/12686
- picked from https://github.com/scylladb/scylladb/pull/12726: 9ceb6aba
- picked from https://github.com/scylladb/scylladb/pull/12173: fc604844
- https://github.com/scylladb/scylladb/pull/12765
- https://github.com/scylladb/scylladb/pull/12804
- https://github.com/scylladb/scylladb/pull/13342
- https://github.com/scylladb/scylladb/pull/13589
- picked from https://github.com/scylladb/scylladb/pull/13135: 7309a1bd
- picked from https://github.com/scylladb/scylladb/pull/13134: 21b505e6, a4411e9e, c1d0ee2b, 8e3392c6, 794d0e40, e407956e
- https://github.com/scylladb/scylladb/pull/13271
- https://github.com/scylladb/scylladb/pull/13399
- picked from https://github.com/scylladb/scylladb/pull/12699: 3508a4e4, 08d754e1, 62a945cc, 041ee3ff
- https://github.com/scylladb/scylladb/pull/13438 (but skipped the test_mutation_schema_change.py fix since I didn't backport this new test)
- https://github.com/scylladb/scylladb/pull/13427
- https://github.com/scylladb/scylladb/pull/13756
- https://github.com/scylladb/scylladb/pull/13789
- https://github.com/scylladb/scylladb/pull/13933 (but skipped the test_snapshot.py fix since I didn't backport this new test)
Closes #14215
* github.com:scylladb/scylladb:
  test: pylib: fix `read_barrier` implementation
  test: pylib: random_tables: perform read barrier in `verify_schema`
  test: issue a read barrier before checking ring consistency
  Merge 'scylla_cluster.py: fix read_last_line' from Gusev Petr
  test/pylib: ManagerClient helpers to wait for...
  test: pylib: Add a way to create cql connections with particular coordinators
  test/pylib: get gossiper alive endpoints
  test/topology: default replication factor 3
  test/pylib: configurable replication factor
  scylla_cluster.py: optimize node logs reading
  test/pylib: RandomTables.add_column with value column
  scylla_cluster.py: add start flag to server_add
  ServerInfo: drop host_id
  scylla_cluster.py: add config to server_add
  scylla_cluster.py: add expected_error to server_start
  scylla_cluster.py: ScyllaServer.start, refactor error reporting
  scylla_cluster.py: fix ScyllaServer.start, reset cmd if start failed
  test: improve logging in ScyllaCluster
  test: topology smp test with custom cluster
  test/pylib: topology: support clusters of initial size 0
  Merge 'test/pylib: split and refactor topology tests' from Alecco
  Merge 'test/pylib: use larger timeout for decommission/removenode' from Kamil Braun
  test: Increase START_TIMEOUT
  test/pylib: one-shot error injection helper
  test: topology: wait for token ring/group 0 consistency after decommission
  test: topology: verify that group 0 and token ring are consistent
  Merge 'pytest: start after ungraceful stop' from Alecco
  Merge 'test.py: improve test failure handling' from Kamil Braun
- Jun 14, 2023
Pavel Emelyanov authored
This includes seastar update titled 'Merge 'Split rpc::server stop into two parts''
* br-5.2-backport-ms-shutdown:
  messaging_service: Shutdown rpc server on shutdown
  messaging_service: Generalize stop_servers()
  messaging_service: Restore indentation after previous patch
  messaging_service: Coroutinize stop()
  messaging_service: Coroutinize stop_servers()
  Update seastar submodule
refs: #14031
Pavel Emelyanov authored
The RPC server now has a lighter .shutdown() method that does just what messaging_service's shutdown() needs, so call it. On stop, call the regular stop() to finalize the stopping process.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Pavel Emelyanov authored
Turn stop_servers() into do_with_servers() and make it accept the method to call and the message to print. This makes it possible to reuse the helper in the next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Pavel Emelyanov authored
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Pavel Emelyanov authored
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Pavel Emelyanov authored
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Pavel Emelyanov authored
* seastar 8c86e6de...32ab15cd (1):
  > rpc: Introduce server::shutdown()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
- Jun 13, 2023
Anna Stuchlik authored
Fixes https://github.com/scylladb/scylladb/issues/14097
This commit removes Ubuntu 18 from the supported platforms for ScyllaDB Enterprise 2023.1. The update is in sync with the change made for ScyllaDB 5.2. This commit must be backported to branch-5.2 and branch-5.3.
Closes #14118 (cherry picked from commit b7022cd7)
Raphael S. Carvalho authored
After c7826aa9, sstable runs are cleaned up together. The procedure that executes cleanup held references to all input sstables, so that it could later retry the same cleanup job on failure. It turns out this did not take into account that incremental compaction exhausts the input set incrementally: because cleanup kept referencing the already-exhausted sstables, it incurred the 100% space overhead that incremental compaction is meant to avoid. To fix it, cleanup now updates its input set by removing the sstables that were already cleaned up. On failure, cleanup retries the same job with the remaining sstables that weren't exhausted by incremental compaction. A new unit test reproduces the failure and passes with the fix.
Fixes #14035.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #14038 (cherry picked from commit 23443e05)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #14193
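The retry-with-shrinking-input idea can be sketched in Python. This is only an illustration: Scylla's compaction code is C++, and `cleanup_with_retry`, `run_cleanup` and the `on_exhausted` callback are hypothetical names, not the actual API.

```python
from typing import Callable, List

# Hypothetical callback type: run_cleanup(inputs, on_exhausted) cleans `inputs`,
# calls on_exhausted(sst) for each input fully consumed by incremental compaction,
# and may raise on failure.
CleanupFn = Callable[[List[str], Callable[[str], None]], None]


def cleanup_with_retry(sstables: List[str], run_cleanup: CleanupFn,
                       max_attempts: int = 3) -> None:
    remaining = list(sstables)

    def on_exhausted(sst: str) -> None:
        # Drop the job's reference so the exhausted sstable's space can be freed.
        remaining.remove(sst)

    for attempt in range(1, max_attempts + 1):
        try:
            run_cleanup(list(remaining), on_exhausted)
            return
        except RuntimeError:
            if attempt == max_attempts:
                raise
            # Otherwise retry, but only with sstables not exhausted yet.


# Usage: a cleanup that fails once after exhausting two sstables.
attempts: List[List[str]] = []


def flaky_cleanup(inputs: List[str], on_exhausted: Callable[[str], None]) -> None:
    attempts.append(list(inputs))
    for sst in inputs[:2]:
        on_exhausted(sst)
    if len(attempts) == 1:
        raise RuntimeError("transient failure")
    for sst in inputs[2:]:
        on_exhausted(sst)


cleanup_with_retry(["a", "b", "c", "d"], flaky_cleanup)
print(attempts)  # [['a', 'b', 'c', 'd'], ['c', 'd']]
```

The second attempt sees only the two sstables that survived the first one, which is the behavior the fix describes.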
- Jun 12, 2023
Kamil Braun authored
The previous implementation didn't actually perform a read barrier, because the statement failed at an early prepare/validate step, which happened before the read barrier was even performed. Change it to a statement that does not fail and doesn't perform any schema change, but still requires a read barrier.
This breaks one test that uses `RandomTables.verify_schema()` when only one node is alive, because `verify_schema` performs a read barrier, which cannot succeed in that situation. Unbreak it by skipping the read barrier in this case (it makes sense in this particular test).
Closes #13933 (cherry picked from commit 64dc76db)
Backport note: skipped the test_snapshot.py change, as the test doesn't exist on this branch.
Kamil Braun authored
`RandomTables.verify_schema` is often called in topology tests after performing a schema change. It compares the schema tables fetched from some node to the expected latest schema stored by the `RandomTables` object. However, there's no guarantee that the latest schema change has already propagated to the node being queried: the schema change may have been performed on a different node and not yet applied on all nodes. To fix that, pick a specific node, perform a read barrier on it, and then use that node to fetch the schema tables.
Fixes #13788 Closes #13789 (cherry picked from commit 3f3dcf45)
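A minimal asyncio sketch of the pattern (pick one node, run a read barrier on it, then query that same node). `FakeNode`, its integer schema versions and its `read_barrier()` are hypothetical stand-ins, not the pylib `RandomTables` or `ManagerClient` API:

```python
import asyncio
from typing import List


class FakeNode:
    """Hypothetical node whose applied schema version may lag behind group 0."""

    def __init__(self, name: str, applied: int, committed: int) -> None:
        self.name = name
        self.applied = applied      # latest schema change applied locally
        self.committed = committed  # latest schema change committed in group 0

    async def read_barrier(self) -> None:
        # Simulate syncing with the leader: apply everything committed so far.
        await asyncio.sleep(0)
        self.applied = self.committed

    async def schema_version(self) -> int:
        return self.applied


async def verify_schema(nodes: List[FakeNode], expected: int) -> None:
    node = nodes[0]                       # pick one specific node,
    await node.read_barrier()             # bring it up to date,
    actual = await node.schema_version()  # and query that same node
    assert actual == expected, f"{node.name}: schema {actual} != expected {expected}"


# n1 lags behind; without the barrier the check would fail.
asyncio.run(verify_schema([FakeNode("n1", applied=4, committed=5)], expected=5))
```

Querying a different node than the one that ran the barrier would reintroduce the race, which is why the sketch uses a single `node` throughout.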
Konstantin Osipov authored
Raft replication doesn't guarantee that all replicas see identical Raft state at all times; it only guarantees the same order of events on all replicas. When comparing Raft state with gossip state on a node, first issue a read barrier to ensure the node has the latest Raft state. To issue a read barrier, it is sufficient to alter a non-existing state: to validate the DDL, the node needs to sync with the leader and fetch its latest group 0 state.
Fixes #13518 (flaky topology test). Closes #13756 (cherry picked from commit e7c9ca56)
Kamil Braun authored
This is a follow-up to #13399, the patch addresses the issues mentioned there:
* linesep can be split between blocks;
* linesep can be part of UTF-8 sequence;
* avoid excessively long lines, limit to 256 chars;
* the logic of the function made simpler and more maintainable.
Closes #13427
* github.com:scylladb/scylladb:
  pylib_test: add tests for read_last_line
  pytest: add pylib_test directory
  scylla_cluster.py: fix read_last_line
  scylla_cluster.py: move read_last_line to util.py
(cherry picked from commit 70f2b093)
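A minimal sketch of a backwards block reader that covers the cases listed above. It is not the code merged in #13427; the signature and the `block_size` parameter are assumptions. Working on raw bytes sidesteps UTF-8 boundary problems: in UTF-8 the byte 0x0A never occurs inside a multi-byte sequence, so searching for `b"\n"` is safe, and a character straddling two blocks is only decoded after both halves have been collected.

```python
import os


def read_last_line(path: str, max_line_bytes: int = 256, block_size: int = 64) -> str:
    """Return the last non-empty line of `path`, truncated to its last
    `max_line_bytes` bytes, reading backwards in `block_size` chunks."""
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)  # seek() returns the new absolute position
        tail = b""
        while pos > 0:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            tail = f.read(step) + tail
            # Ignore trailing terminators; a "\r\n" split across blocks is
            # handled because we always search the accumulated bytes.
            stripped = tail.rstrip(b"\r\n")
            nl = stripped.rfind(b"\n")
            if nl != -1:
                tail = stripped[nl + 1:]
                break
            if len(stripped) >= max_line_bytes:
                tail = stripped  # enough bytes of an over-long line
                break
        else:
            tail = tail.rstrip(b"\r\n")  # reached the start of the file
    # Keep the last max_line_bytes bytes; errors="ignore" drops a UTF-8
    # sequence that this cut may have split.
    return tail[-max_line_bytes:].decode("utf-8", errors="ignore")
```

Stopping as soon as `max_line_bytes` bytes of the last line are available keeps the amount read bounded even for a log that ends in one huge line.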
Alejo Sanchez authored
server to see other servers after start/restart
When starting/restarting a server, provide a way to wait for the server to see at least n other servers. Also leave the implementation methods available for manual use, and update previous tests: one to wait for a specific server to be seen, and one to wait for a specific server to not be seen (down).
Fixes #13147
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes #13438 (cherry picked from commit 11561a73)
Backport note: skipped the test_mutation_schema_change.py fix as the test doesn't exist on this branch.
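A minimal polling sketch of "wait until the server sees at least n other servers". `wait_for_peers` and the `get_alive_endpoints` callable (which could wrap a REST query such as the gossiper helper from 62a945cc) are hypothetical, as are the default timeout and polling period:

```python
import asyncio
import time
from typing import Awaitable, Callable, Set


async def wait_for_peers(get_alive_endpoints: Callable[[], Awaitable[Set[str]]],
                         self_ip: str, n: int,
                         timeout: float = 30.0, period: float = 0.5) -> Set[str]:
    """Poll until the node sees at least n *other* alive endpoints."""
    deadline = time.monotonic() + timeout
    while True:
        others = (await get_alive_endpoints()) - {self_ip}
        if len(others) >= n:
            return others
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{self_ip} sees {len(others)} peers, expected at least {n}")
        await asyncio.sleep(period)
```

Using `time.monotonic()` for the deadline keeps the timeout immune to wall-clock adjustments during long test runs.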
Tomasz Grabiec authored
Usage:
    await manager.driver_connect(server=servers[0])
    manager.cql.execute(f"...", execution_profile='whitelist')
(cherry picked from commit 041ee3ff)
Alejo Sanchez authored
Helper to get list of gossiper alive endpoints from REST API.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> (cherry picked from commit 62a945cc)
Alejo Sanchez authored
In most tests there will be nodes down, so increase the replication factor to 3 to avoid problems with partitions belonging to down nodes. Use replication factor 1 for Raft upgrade tests. (cherry picked from commit 08d754e1)
Alejo Sanchez authored
Make replication factor configurable for the RandomTables helper.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> (cherry picked from commit 3508a4e4)
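For illustration, a configurable replication factor typically ends up in the keyspace definition. `create_keyspace_cql` is a hypothetical helper (not the RandomTables code) and the choice of SimpleStrategy is an assumption; the CQL `CREATE KEYSPACE ... WITH replication` syntax itself is standard:

```python
def create_keyspace_cql(name: str, replication_factor: int = 3) -> str:
    """Build a CREATE KEYSPACE statement; RF 3 mirrors the topology-test default."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return (f"CREATE KEYSPACE {name} WITH replication = "
            f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}}")


print(create_keyspace_cql("test_ks"))        # RF 3
print(create_keyspace_cql("upgrade_ks", 1))  # RF 1, as for Raft upgrade tests
```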
Petr Gusev authored
There are two places in scylla_cluster where we read the node logs, and in both of them we read the entire file into memory. This is not efficient and may cause an OOM. In the first case we need the last line of the log file, so we seek to the end and move backwards looking for a newline character. In the second case we look through the log file to find the expected_error. The readlines() method returns a Python list object, which means it reads the entire file into memory. It's sufficient to just remove it, since iterating over the file object already yields lines lazily, one by one. This is a follow-up for #13134. Closes #13399 (cherry picked from commit 09636b20)
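The lazy scan for `expected_error` can be sketched as follows; `log_contains` is a hypothetical helper name, and `errors="replace"` is an assumption so that stray invalid bytes in a log don't abort the search:

```python
def log_contains(path: str, expected_error: str) -> bool:
    """Return True if any line of the log contains expected_error.

    Iterating over the file object yields one line at a time, so memory use is
    bounded by the longest line (plus the read buffer), unlike f.readlines(),
    which builds a list of every line first.
    """
    with open(path, encoding="utf-8", errors="replace") as f:
        return any(expected_error in line for line in f)
```

`any()` also short-circuits, so the scan stops at the first matching line instead of reading the rest of the log.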
Alejo Sanchez authored
When adding extra columns in a test, make them value columns. Name them with the "v_" prefix and use the value column number counter.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13271 (cherry picked from commit 81b40c10)
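A minimal sketch of that naming scheme. `Columns` and the exact `v_<n>` format are assumptions; only the "v_" prefix and the shared value-column counter come from the commit message:

```python
import itertools


class Columns:
    """Hypothetical table schema: one key column plus value columns v_0, v_1, ..."""

    def __init__(self, n_values: int) -> None:
        self.names = ["pk"] + [f"v_{i}" for i in range(n_values)]
        # Continue numbering after the initial value columns.
        self._counter = itertools.count(n_values)

    def add_column(self) -> str:
        """Add an extra column as a value column and return its name."""
        name = f"v_{next(self._counter)}"
        self.names.append(name)
        return name


cols = Columns(2)
cols.add_column()
print(cols.names)  # ['pk', 'v_0', 'v_1', 'v_2']
```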
Petr Gusev authored
Sometimes when creating a node it's useful to just install it and not start it. For example, we may want to try to start it later with an expected error. The ScyllaServer.install method has been made exception-safe: if an exception occurs, it reverts to the original state. This avoids duplicating the try/except logic in its two call sites. (cherry picked from commit e407956e)
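The exception-safety pattern can be sketched like this. `install`, the directory layout and the `scylla.yaml` file name are illustrative assumptions rather than ScyllaServer's real code; the point is that a failure undoes the partial work before re-raising, so callers need no try/except of their own:

```python
import shutil
from pathlib import Path


def install(workdir: Path, config: str) -> None:
    """Create a node directory with its config, or leave nothing behind."""
    workdir.mkdir(parents=True)  # raises if it exists, before anything needs undoing
    try:
        (workdir / "scylla.yaml").write_text(config)
        if not config.strip():
            # Stand-in for any later install step failing.
            raise ValueError("empty config")
    except BaseException:
        shutil.rmtree(workdir, ignore_errors=True)  # revert to the original state
        raise
```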
Petr Gusev authored
We are going to allow the ScyllaCluster.add_server function not to start the server if the caller has requested that with a special parameter. The host_id can only be obtained from a running node, so add_server won't be able to return it in this case. I've grepped the tests for host_id and there doesn't seem to be any reference to it in the code. (cherry picked from commit 794d0e40)
Petr Gusev authored
Sometimes when creating a node it's useful to pass a custom node config. (cherry picked from commit 8e3392c6)
Petr Gusev authored
Sometimes it's useful to check that the node has failed to start for a particular reason. If server_start can't find expected_error in the node's log or if the node has started without errors, it throws an exception. (cherry picked from commit c1d0ee2b)
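A sketch of the decision logic described above. `check_start_result` and its arguments are hypothetical; the real `server_start` derives this information from the node process and its log file:

```python
from typing import Iterable, Optional


def check_start_result(started: bool, log_lines: Iterable[str],
                       expected_error: Optional[str]) -> None:
    """Raise unless the start outcome matches what the caller asked for."""
    if expected_error is None:
        if not started:
            raise RuntimeError("node failed to start")
        return
    if started:
        raise RuntimeError(f"node started, but expected it to fail with {expected_error!r}")
    if not any(expected_error in line for line in log_lines):
        raise RuntimeError(f"node failed to start, but {expected_error!r} is not in its log")
```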
Petr Gusev authored
Extract the function that encapsulates all the error reporting logic. We are going to use it in several other places to implement the expected_error feature. (cherry picked from commit a4411e9e)
Petr Gusev authored
The ScyllaServer expects cmd to be None if the Scylla process is not running. Otherwise, if start failed and the test called update_config, the latter will try to send a signal to a non-existent process via cmd. (cherry picked from commit 21b505e6)
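A minimal sketch of the invariant "`cmd` is None unless a process is running". `Server`, `FakeProcess`, the injected callables and the SIGHUP-style reload signal are assumptions made for illustration:

```python
from typing import Callable, List, Optional


class FakeProcess:
    """Stand-in for a process handle; records signals instead of sending them."""

    def __init__(self) -> None:
        self.signals: List[str] = []

    def send_signal(self, sig: str) -> None:
        self.signals.append(sig)


class Server:
    def __init__(self, launch: Callable[[], FakeProcess],
                 wait_ready: Callable[[FakeProcess], None]) -> None:
        self.launch = launch
        self.wait_ready = wait_ready
        self.cmd: Optional[FakeProcess] = None  # None <=> no running process

    def start(self) -> None:
        self.cmd = self.launch()
        try:
            self.wait_ready(self.cmd)
        except Exception:
            self.cmd = None  # start failed: don't keep a handle to a dead process
            raise

    def update_config(self) -> None:
        # Only signal a live process (reload-on-signal is an assumption here).
        if self.cmd is not None:
            self.cmd.send_signal("SIGHUP")
```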
Konstantin Osipov authored
Print IP addresses and cluster identifiers in more log messages, as this helps debugging. (cherry picked from commit 7309a1bd)
Alejo Sanchez authored
Instead of decommissioning nodes of the initial cluster, use a custom cluster.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13589 (cherry picked from commit ce87aedd)
Alejo Sanchez authored
To allow tests with custom clusters, allow configuring an initial cluster size of 0. Add a proof-of-concept test, to be removed later.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13342 (cherry picked from commit e3b46250)
Nadav Har'El authored
Move long running topology tests out of `test_topology.py` and into their own files, so they can be run in parallel. While there, merge simple schema tests. Closes #12804
* github.com:scylladb/scylladb:
  test/topology: rename topology test file
  test/topology: lint and type for topology tests
  test/topology: move topology ip tests to own file
  test/topology: move topology test remove garbaje...
  test/topology: move topology rejoin test to own file
  test/topology: merge topology schema tests and...
  test/topology: isolate topology smp params test
  test/topology: move topology helpers to common file
(cherry picked from commit a24600a6)
Botond Dénes authored
Recently we enabled RBNO (repair-based node operations) by default in all topology operations. This made the operations a bit slower (repair-based topology operations do more work than classic streaming), and in debug mode, with a large number of concurrent tests running, they might time out. The timeout for bootstrap was already increased before; do the same for decommission/removenode. The previously used timeout was 300 seconds (the default used by the aiohttp library when it makes HTTP requests); now use the TOPOLOGY_TIMEOUT constant from ScyllaServer, which is 1000 seconds. Closes #12765
* github.com:scylladb/scylladb:
  test/pylib: use larger timeout for decommission/removenode
  test/pylib: scylla_cluster: rename START_TIMEOUT to TOPOLOGY_TIMEOUT
(cherry picked from commit e55f475d)
Alejo Sanchez authored
The existing helper, an async context manager, only worked for non-one-shot error injections. Fix it and add another helper for one-shot injections that doesn't use a context manager. Fix the tests using the previous helper.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> (cherry picked from commit 9ceb6aba)
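The two helper shapes can be sketched with `contextlib.asynccontextmanager`. `FakeInjectionApi`, `inject_error` and `inject_error_one_shot` are hypothetical names, not pylib's actual helpers or Scylla's REST endpoints:

```python
from contextlib import asynccontextmanager
from typing import AsyncIterator, Dict


class FakeInjectionApi:
    """Hypothetical stand-in for a node's error-injection control endpoint."""

    def __init__(self) -> None:
        self.enabled: Dict[str, bool] = {}  # injection name -> one_shot

    async def enable(self, name: str, one_shot: bool) -> None:
        self.enabled[name] = one_shot

    async def disable(self, name: str) -> None:
        self.enabled.pop(name, None)

    def hit(self, name: str) -> bool:
        """Called at the injection point; a one-shot injection disarms itself."""
        if name not in self.enabled:
            return False
        if self.enabled[name]:
            del self.enabled[name]
        return True


@asynccontextmanager
async def inject_error(api: FakeInjectionApi, name: str) -> AsyncIterator[None]:
    """Persistent injection, active only inside the `async with` block."""
    await api.enable(name, one_shot=False)
    try:
        yield
    finally:
        await api.disable(name)


async def inject_error_one_shot(api: FakeInjectionApi, name: str) -> None:
    """One-shot injection: no context manager, it fires once and is gone."""
    await api.enable(name, one_shot=True)
```

A one-shot injection has nothing to clean up on exit, which is why a plain coroutine fits it better than a context manager.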
Kamil Braun authored
One of the tests checked for immediate consistency after a decommission operation had finished, but it turns out that after decommission, too, it may take some time for the token ring to be updated on other nodes. Replace the check with a wait. Also do the wait in another test that performs a sequence of decommissions: we won't attempt to start another decommission until every node learns that the previously decommissioned node has left. Closes #12686 (cherry picked from commit 40142a51)
Kamil Braun authored
After topology changes like removing a node, verify that the set of group 0 members and token ring members is the same. Modify `get_token_ring_host_ids` to only return NORMAL members. The previous version which used the `/storage_service/host_id` endpoint might have returned non-NORMAL members as well. Fixes: #12153 Closes #12619 (cherry picked from commit fa9cf81a)
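The consistency check can be sketched as a set comparison that only counts NORMAL token-ring members. `check_group0_matches_ring` and its input shapes (a set of host IDs and a host-to-state mapping) are assumptions for illustration:

```python
from typing import Dict, Set


def check_group0_matches_ring(group0: Set[str], ring_states: Dict[str, str]) -> None:
    """Raise if group 0 members differ from the NORMAL token-ring members."""
    normal = {host for host, state in ring_states.items() if state == "NORMAL"}
    if group0 != normal:
        raise AssertionError(
            f"only in group 0: {sorted(group0 - normal)}, "
            f"only in token ring: {sorted(normal - group0)}")
```

Filtering to NORMAL first is what keeps a node that is still joining or leaving from producing a spurious mismatch.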
Kamil Braun authored
If a server is stopped suddenly (i.e. not gracefully), schema tables might be in an inconsistent state. Add a test case and enable the Scylla configuration option force_schema_commit_log to handle this. Fixes #12218 Closes #12630
* github.com:scylladb/scylladb:
  pytest: test start after ungraceful stop
  test.py: enable force_schema_commit_log
(cherry picked from commit 5eadea30)
Nadav Har'El authored
Improve logging by printing the cluster at the end of each test. Stop performing operations like attempting queries or dropping keyspaces on dirty clusters. Dirty clusters might be completely dead and these operations would only cause more "errors" to happen after a failed test, making it harder to find the real cause of failure. Mark cluster as dirty when a test that uses it fails - after a failed test, we shouldn't assume that the cluster is in a usable state, so we shouldn't reuse it for another test. Rely on the `is_dirty` flag in `PythonTest`s and `CQLApprovalTest`s, similarly to what `TopologyTest`s do. Closes #12652
* github.com:scylladb/scylladb:
  test.py: rely on ScyllaCluster.is_dirty flag for recycling clusters
  test/topology: don't drop random_tables keyspace after a failed test
  test/pylib: mark cluster as dirty after a failed test
  test: pylib, topology: don't perform operations after test on a dirty cluster
  test/pylib: print cluster at the end of test
(cherry picked from commit 2653865b)
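A minimal sketch of the dirty-cluster policy described above. `Cluster`, `ClusterPool` and `run_test` are hypothetical simplifications of test.py's cluster pooling, not its real classes:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Cluster:
    name: str
    is_dirty: bool = False
    actions: List[str] = field(default_factory=list)


class ClusterPool:
    def __init__(self) -> None:
        self._free: List[Cluster] = []
        self._created = 0

    def get(self) -> Cluster:
        if self._free:
            return self._free.pop()
        self._created += 1
        return Cluster(f"cluster-{self._created}")

    def put(self, cluster: Cluster) -> None:
        # Recycle only clean clusters; dirty ones are discarded.
        if not cluster.is_dirty:
            self._free.append(cluster)


def run_test(pool: ClusterPool, test: Callable[[Cluster], None]) -> bool:
    cluster = pool.get()
    try:
        test(cluster)
        passed = True
    except Exception:
        cluster.is_dirty = True  # never assume a cluster is usable after a failure
        passed = False
    print(f"after test: {cluster}")  # log the cluster at the end of each test
    if not cluster.is_dirty:
        cluster.actions.append("drop test keyspace")  # post-test cleanup on clean clusters only
    pool.put(cluster)
    return passed
```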