  1. Jun 22, 2023
  2. Jun 21, 2023
  3. Jun 15, 2023
  4. Jun 14, 2023
  5. Jun 13, 2023
  6. Jun 12, 2023
    • test: pylib: fix `read_barrier` implementation · f4115528
      Kamil Braun authored
      The previous implementation didn't actually do a read barrier, because
      the statement failed on an early prepare/validate step which happened
      before read barrier was even performed.
      
      Change it to a statement which does not fail and doesn't perform any
      schema change but requires a read barrier.
      
      This breaks one test which uses `RandomTables.verify_schema()` when only
      one node is alive, but `verify_schema` performs a read barrier. Unbreak
      it by skipping the read barrier in this case (it makes sense in this
      particular test).
      
      Closes #13933
      
      (cherry picked from commit 64dc76db)
      Backport note: skipped the test_snapshot.py change, as the test doesn't
      exist on this branch.
    • test: pylib: random_tables: perform read barrier in `verify_schema` · 9c941aba
      Kamil Braun authored
      `RandomTables.verify_schema` is often called in topology tests after
      performing a schema change. It compares the schema tables fetched from
      some node to the expected latest schema stored by the `RandomTables`
      object.
      
      However there's no guarantee that the latest schema change has already
      propagated to the node which we query. We could have performed the
      schema change on a different node and the change may not have been
      applied yet on all nodes.
      
      To fix that, pick a specific node and perform a read barrier on it, then
      use that node to fetch the schema tables.
      
      Fixes #13788
      
      Closes #13789
      
      (cherry picked from commit 3f3dcf45)
    • test: issue a read barrier before checking ring consistency · 094bcac3
      Konstantin Osipov authored
      Raft replication doesn't guarantee that all replicas see
      identical Raft state at all times; it only guarantees the
      same order of events on all replicas.
      
      When comparing raft state with gossip state on a node, first
      issue a read barrier to ensure the node has the latest raft state.
      
      To issue a read barrier it is sufficient to alter a non-existing
      state: in order to validate the DDL the node needs to sync with the
      leader and fetch its latest group0 state.
      
      Fixes #13518 (flaky topology test).
      
      Closes #13756
      
      (cherry picked from commit e7c9ca56)
    • Merge 'scylla_cluster.py: fix read_last_line' from Gusev Petr · e49a531a
      Kamil Braun authored
      This is a follow-up to #13399, the patch
      addresses the issues mentioned there:
      * linesep can be split between blocks;
      * linesep can be part of a UTF-8 sequence;
      * avoid excessively long lines, limit to 256 chars;
      * the logic of the function is made simpler and more maintainable.
      
      Closes #13427
      
      * github.com:scylladb/scylladb:
        pylib_test: add tests for read_last_line
        pytest: add pylib_test directory
        scylla_cluster.py: fix read_last_line
        scylla_cluster.py: move read_last_line to util.py
      
      (cherry picked from commit 70f2b093)
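
      The concerns listed above can be illustrated with a minimal sketch of
      a backward block reader. This is not the code from the patch: the
      signature, the `max_chars` and `block_size` parameters and the
      decoding policy are assumptions made for illustration. Searching the
      accumulated tail rather than each block separately means a line
      separator split across two blocks is still found, and decoding only
      after the line is isolated avoids cutting a UTF-8 sequence in half.

```python
import os


def read_last_line(path: str, max_chars: int = 256, block_size: int = 4096) -> str:
    """Return the last line of a file, reading backwards in blocks (sketch)."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b""
        while pos > 0:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            # Prepend the block so the search always sees a contiguous tail.
            tail = f.read(step) + tail
            # Ignore the separator that terminates the last line itself.
            body = tail.rstrip(b"\r\n")
            if b"\n" in body:
                tail = body[body.rfind(b"\n") + 1:]
                break
            # A UTF-8 character takes at most 4 bytes, so this many bytes
            # are enough to produce max_chars characters.
            if len(body) > 4 * max_chars:
                break
    # A truncated tail may start mid-character; "replace" keeps decoding safe.
    line = tail.rstrip(b"\r\n").decode("utf-8", errors="replace")
    return line[-max_chars:]
```

      Only the last `max_chars` characters are returned, so a log consisting
      of one very long line does not have to be read in full.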
    • test/pylib: ManagerClient helpers to wait for... · bcf99a37
      Alejo Sanchez authored
      
      server to see other servers after start/restart
      
      When starting/restarting a server, provide a way to wait for the server
      to see at least n other servers.
      
      Also leave the implementation methods available for manual use and
      update previous tests, one to wait for a specific server to be seen, and
      one to wait for a specific server to not be seen (down).
      
      Fixes #13147
      
      Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
      
      Closes #13438
      
      (cherry picked from commit 11561a73)
      Backport note: skipped the test_mutation_schema_change.py fix as the
      test doesn't exist on this branch.
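
      A rough sketch of the waiting pattern described above: poll a
      condition until it holds or a deadline passes. The names `wait_for`
      and `wait_to_see_n_servers`, and the `get_alive` callable standing in
      for a query of the server's view of the cluster, are hypothetical and
      are not the ManagerClient API.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Optional, TypeVar

T = TypeVar("T")


async def wait_for(pred: Callable[[], Awaitable[Optional[T]]],
                   deadline: float, period: float = 0.1) -> T:
    """Poll pred() until it returns something other than None (sketch)."""
    while True:
        result = await pred()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before deadline")
        await asyncio.sleep(period)


async def wait_to_see_n_servers(get_alive: Callable[[], Awaitable[List[str]]],
                                n: int, timeout: float = 5.0,
                                period: float = 0.1) -> List[str]:
    """Wait until get_alive() reports at least n servers (hypothetical helper)."""
    async def sees_enough() -> Optional[List[str]]:
        alive = await get_alive()
        return alive if len(alive) >= n else None

    return await wait_for(sees_enough, time.monotonic() + timeout, period)
```

      Waiting for a specific server to disappear would follow the same
      shape, with the predicate returning a value once the address is no
      longer in the list.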
    • test: pylib: Add a way to create cql connections with particular coordinators · fe4af957
      Tomasz Grabiec authored
      Usage:
      
        await manager.driver_connect(server=servers[0])
        manager.cql.execute(f"...", execution_profile='whitelist')
      
      (cherry picked from commit 041ee3ff)
    • test/pylib: get gossiper alive endpoints · ac5dff7d
      Alejo Sanchez authored
      
      Helper to get the list of gossiper alive endpoints from the REST API.
      
      Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
      (cherry picked from commit 62a945cc)
    • test/topology: default replication factor 3 · ad99456a
      Alejo Sanchez authored
      Most tests will have nodes down, so increase the replication factor to
      3 to avoid problems with partitions belonging to down nodes.
      
      Use replication factor 1 for raft upgrade tests.
      
      (cherry picked from commit 08d754e1)
    • test/pylib: configurable replication factor · 937e890f
      Alejo Sanchez authored
      
      Make replication factor configurable for the RandomTables helper.
      
      Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
      (cherry picked from commit 3508a4e4)
    • scylla_cluster.py: optimize node logs reading · 12eec5bb
      Petr Gusev authored
      There are two places in scylla_cluster
      where we read the node logs, and in both of
      them we read the entire file into memory.
      This is inefficient and may cause an OOM.
      
      In the first case we need the last line of the
      log file, so we seek to the end and move backwards
      looking for a newline character.
      
      In the second case we look through the
      log file to find the expected_error.
      The readlines() method returns a Python
      list object, which means it reads the entire
      file into memory. Removing the call is sufficient,
      since iterating over the file object
      already yields lines lazily one by one.
      
      This is a follow-up for #13134.
      
      Closes #13399
      
      (cherry picked from commit 09636b20)
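
      The second case can be sketched as follows. The function name
      `log_contains` and its parameters are assumptions for illustration,
      not the code in scylla_cluster.py; the point is only that iterating
      over the file object reads one line at a time, whereas `readlines()`
      would build a list of every line first.

```python
def log_contains(path: str, expected_error: str) -> bool:
    """Return True if any line of the log contains expected_error (sketch)."""
    # errors="replace" keeps the scan going if the log has invalid UTF-8.
    with open(path, encoding="utf-8", errors="replace") as log:
        # The file object is a lazy iterator over lines, so memory use
        # stays bounded by the longest line rather than the file size.
        for line in log:
            if expected_error in line:
                return True
    return False
```

      The scan also stops at the first match instead of reading the rest of
      the file.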
    • test/pylib: RandomTables.add_column with value column · 59847389
      Alejo Sanchez authored
      
      When adding extra columns in a test, make them value columns. Name them
      with the "v_" prefix and use the value column number counter.
      
      Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
      
      Closes #13271
      
      (cherry picked from commit 81b40c10)
    • scylla_cluster.py: add start flag to server_add · 7a8c5db5
      Petr Gusev authored
      Sometimes when creating a node it's useful
      to just install it without starting it. For example,
      we may want to try to start it later with
      an expected error.
      
      The ScyllaServer.install method has been made
      exception safe: if an exception occurs, it
      reverts to the original state. This avoids
      duplicating the try/except logic
      in two of its call sites.
      
      (cherry picked from commit e407956e)
    • ServerInfo: drop host_id · 15ea5bf5
      Petr Gusev authored
      We are going to allow the
      ScyllaCluster.add_server function not to
      start the server if the caller has requested
      that with a special parameter. The host_id
      can only be obtained from a running node, so
      add_server won't be able to return it in
      this case. I've grepped the tests for host_id
      and there doesn't seem to be any
      reference to it in the code.
      
      (cherry picked from commit 794d0e40)
    • scylla_cluster.py: add config to server_add · 3ab61075
      Petr Gusev authored
      Sometimes when creating a node it's useful
      to pass a custom node config.
      
      (cherry picked from commit 8e3392c6)
    • scylla_cluster.py: add expected_error to server_start · 1959eddf
      Petr Gusev authored
      Sometimes it's useful to check that the node has failed
      to start for a particular reason. If server_start can't
      find expected_error in the node's log or if the
      node has started without errors, it throws an exception.
      
      (cherry picked from commit c1d0ee2b)
    • scylla_cluster.py: ScyllaServer.start, refactor error reporting · 43525aec
      Petr Gusev authored
      Extract a function that encapsulates all the error
      reporting logic. We are going to use it in several
      other places to implement the expected_error feature.
      
      (cherry picked from commit a4411e9e)
    • scylla_cluster.py: fix ScyllaServer.start, reset cmd if start failed · 930c4e65
      Petr Gusev authored
      The ScyllaServer expects cmd to be None if the
      Scylla process is not running. Otherwise, if start failed
      and the test called update_config, the latter will
      try to send a signal to a non-existent process via cmd.
      
      (cherry picked from commit 21b505e6)
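
      The invariant described above (cmd is None whenever no process is
      running) can be sketched like this. The `Server` class, the
      `check_ready` callback and the simplified `update_config` are
      hypothetical stand-ins, not ScyllaServer itself.

```python
import subprocess
import sys
from typing import Callable, List, Optional


class Server:
    """Toy server wrapper: cmd is None whenever no process is running."""

    def __init__(self) -> None:
        self.cmd: Optional[subprocess.Popen] = None

    def start(self, argv: List[str],
              check_ready: Callable[[subprocess.Popen], None]) -> None:
        self.cmd = subprocess.Popen(argv)
        try:
            # Stand-in for "wait until the node is up"; raises on failure.
            check_ready(self.cmd)
        except Exception:
            # Start failed: reap the process and restore the invariant so
            # later calls do not try to signal a process that is gone.
            self.cmd.kill()
            self.cmd.wait()
            self.cmd = None
            raise

    def update_config(self) -> bool:
        """Return True if a live process would need to be signalled."""
        return self.cmd is not None


if __name__ == "__main__":
    def never_ready(proc: subprocess.Popen) -> None:
        raise RuntimeError("node failed to start")

    server = Server()
    try:
        server.start([sys.executable, "-c", "import time; time.sleep(30)"],
                     never_ready)
    except RuntimeError:
        pass
    print(server.cmd)
```

      Without the reset in the except branch, `cmd` would keep pointing at
      the dead process after a failed start.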
    • test: improve logging in ScyllaCluster · d2caaef1
      Konstantin Osipov authored
      Print IP addresses and cluster identifiers in more log messages,
      which helps debugging.
      
      (cherry picked from commit 7309a1bd)
    • test: topology smp test with custom cluster · 6474edd6
      Alejo Sanchez authored
      
      Instead of decommissioning the initial cluster, use a custom cluster.
      
      Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
      
      Closes #13589
      
      (cherry picked from commit ce87aedd)
    • test/pylib: topology: support clusters of initial size 0 · b39cdadf
      Alejo Sanchez authored
      
      To allow tests with custom clusters, allow configuring an initial
      cluster size of 0.
      
      Add a proof-of-concept test to be removed later.
      
      Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
      
      Closes #13342
      
      (cherry picked from commit e3b46250)
    • Merge 'test/pylib: split and refactor topology tests' from Alecco · 7b60cdda
      Nadav Har'El authored
      Move long-running topology tests out of `test_topology.py` and into their own files, so they can be run in parallel.
      
      While there, merge simple schema tests.
      
      Closes #12804
      
      * github.com:scylladb/scylladb:
        test/topology: rename topology test file
        test/topology: lint and type for topology tests
        test/topology: move topology ip tests to own file
        test/topology: move topology test remove garbage...
        test/topology: move topology rejoin test to own file
        test/topology: merge topology schema tests and...
        test/topology: isolate topology smp params test
        test/topology: move topology helpers to common file
      
      (cherry picked from commit a24600a6)
    • Merge 'test/pylib: use larger timeout for decommission/removenode' from Kamil Braun · ea80fe20
      Botond Dénes authored
      Recently we enabled RBNO by default in all topology operations. This
      made the operations a bit slower (repair-based topology ops are a bit
      slower than classic streaming because they do more work), and in debug
      mode with a large number of concurrent tests running, they might time out.
      
      The timeout for bootstrap was already increased before; do the same for
      decommission/removenode. The previously used timeout was 300 seconds
      (this is the default used by the aiohttp library when it makes HTTP
      requests); now use the TOPOLOGY_TIMEOUT constant from ScyllaServer, which
      is 1000 seconds.
      
      Closes #12765
      
      * github.com:scylladb/scylladb:
        test/pylib: use larger timeout for decommission/removenode
        test/pylib: scylla_cluster: rename START_TIMEOUT to TOPOLOGY_TIMEOUT
      
      (cherry picked from commit e55f475d)
    • test: Increase START_TIMEOUT · f90fe6f3
      Asias He authored
      It has been observed that the CI machine is slow to run the test. Increase
      the timeout for adding servers.
      
      (cherry picked from commit fc604844)
    • test/pylib: one-shot error injection helper · 6e2c5473
      Alejo Sanchez authored
      
      The existing helper with an async context manager only worked for
      non-one-shot error injections. Fix it and add another helper for
      one-shot injections without a context manager.
      
      Fix tests using the previous helper.
      
      Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
      (cherry picked from commit 9ceb6aba)
    • test: topology: wait for token ring/group 0 consistency after decommission · 91aa2cd8
      Kamil Braun authored
      One of the tests checked for immediate consistency right after a
      decommission operation finished, but it turns out that even after
      decommission it might take some time for the token ring to be updated
      on other nodes. Replace the check with a wait.
      
      Also do the wait in another test that performs a sequence of
      decommissions. We won't attempt to start another decommission until
      every node learns that the previously decommissioned node has left.
      
      Closes #12686
      
      (cherry picked from commit 40142a51)
    • test: topology: verify that group 0 and token ring are consistent · 05c3f7ec
      Kamil Braun authored
      After topology changes like removing a node, verify that the set of
      group 0 members and the set of token ring members are the same.
      
      Modify `get_token_ring_host_ids` to only return NORMAL members. The
      previous version which used the `/storage_service/host_id` endpoint
      might have returned non-NORMAL members as well.
      
      Fixes: #12153
      
      Closes #12619
      
      (cherry picked from commit fa9cf81a)
    • Merge 'pytest: start after ungraceful stop' from Alecco · 3aa73e8b
      Kamil Braun authored
      If a server is stopped suddenly (i.e. not gracefully), schema tables might
      be in an inconsistent state. Add a test case and enable the Scylla
      configuration option (force_schema_commit_log) to handle this.
      
      Fixes #12218
      
      Closes #12630
      
      * github.com:scylladb/scylladb:
        pytest: test start after ungraceful stop
        test.py: enable force_schema_commit_log
      
      (cherry picked from commit 5eadea30)
    • Merge 'test.py: improve test failure handling' from Kamil Braun · a0ba3b33
      Nadav Har'El authored
      Improve logging by printing the cluster at the end of each test.
      
      Stop performing operations like attempting queries or dropping keyspaces on dirty clusters. Dirty clusters might be completely dead and these operations would only cause more "errors" to happen after a failed test, making it harder to find the real cause of failure.
      
      Mark the cluster as dirty when a test that uses it fails. After a failed test, we shouldn't assume that the cluster is in a usable state, so we shouldn't reuse it for another test.
      
      Rely on the `is_dirty` flag in `PythonTest`s and `CQLApprovalTest`s, similarly to what `TopologyTest`s do.
      
      Closes #12652
      
      * github.com:scylladb/scylladb:
        test.py: rely on ScyllaCluster.is_dirty flag for recycling clusters
        test/topology: don't drop random_tables keyspace after a failed test
        test/pylib: mark cluster as dirty after a failed test
        test: pylib, topology: don't perform operations after test on a dirty cluster
        test/pylib: print cluster at the end of test
      
      (cherry picked from commit 2653865b)