Commit graph

233 commits

Author SHA1 Message Date
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
0923fd0419 Locking updates for v6.20:
Lock debugging:
 
  - Implement compiler-driven static analysis locking context
    checking, using the upcoming Clang 22 compiler's context
    analysis features. (Marco Elver)
 
    We removed Sparse context analysis support, because prior to
    removal even a defconfig kernel produced 1,700+ context
    tracking Sparse warnings, the overwhelming majority of which
    are false positives. On an allmodconfig kernel the number of
    false positive context tracking Sparse warnings grows to
    over 5,200... On the plus side of the balance actual locking
    bugs found by Sparse context analysis is also rather ... sparse:
    I found only 3 such commits in the last 3 years. So the
    rate of false positives and the maintenance overhead is
    rather high and there appears to be no active policy in
    place to achieve a zero-warnings baseline to move the
    annotations & fixers to developers who introduce new code.
 
    Clang context analysis is more complete and more aggressive
    in trying to find bugs, at least in principle. Plus it has
    a different model to enabling it: it's enabled subsystem by
    subsystem, which results in zero warnings on all relevant
    kernel builds (as far as our testing managed to cover it).
    Which allowed us to enable it by default, similar to other
    compiler warnings, with the expectation that there are no
    warnings going forward. This enforces a zero-warnings baseline
    on clang-22+ builds. (Which are still limited in distribution,
    admittedly.)
 
    Hopefully the Clang approach can lead to a more maintainable
    zero-warnings status quo and policy, with more and more
    subsystems and drivers enabling the feature. Context tracking
    can be enabled for all kernel code via WARN_CONTEXT_ANALYSIS_ALL=y
    (default disabled), but this will generate a lot of false positives.
 
    ( Having said that, Sparse support could still be added back,
      if anyone is interested - the removal patch is still
      relatively straightforward to revert at this stage. )
 
 Rust integration updates: (Alice Ryhl, Fujita Tomonori, Boqun Feng)
 
   - Add support for Atomic<i8/i16/bool> and replace most Rust native
     AtomicBool usages with Atomic<bool>
 
   - Clean up LockClassKey and improve its documentation
 
   - Add missing Send and Sync trait implementation for SetOnce
 
   - Make ARef Unpin as it is supposed to be
 
   - Add __rust_helper to a few Rust helpers as a preparation for
     helper LTO
 
   - Inline various lock related functions to avoid additional
     function calls.
 
 WW mutexes:
 
   - Extend ww_mutex tests and other test-ww_mutex updates (John Stultz)
 
 Misc fixes and cleanups:
 
   - rcu: Mark lockdep_assert_rcu_helper() __always_inline
     (Arnd Bergmann)
 
   - locking/local_lock: Include more missing headers (Peter Zijlstra)
 
   - seqlock: fix scoped_seqlock_read kernel-doc (Randy Dunlap)
 
   - rust: sync: Replace `kernel::c_str!` with C-Strings
     (Tamir Duberstein)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmIXiURHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gH+A/9GX5UmU6+HuDfDrCtXm9GDve6wkwahvcW
 jLDxOYjs764I2BhyjZnjKjyF5zw60hbykem7Wcf5EV2YH30nM4XRgEWVJfkr1UAI
 Pra415X4DdOzZ6qYQIpO8Udt1LtR7BMSaXITVLJaLicxEoOVtq3SKxjqyhCFs7UW
 MfJdqleB+RMLqq3LlzgB4l43eKk1xyeHh+oQwI0RSxuIpVZme3p4TObnCKjIWnK7
 Ihd+dkgC852WBjANgNL7F/sd5UsF5QX3wjtOrLhMKvkIgTPdXln0g398pivjN/G/
 Kpnw18SFeb159JfJu8eMotsYvVnQ0D5aOcTBfL4qvOHCImhpcu2s6ik9BcXqt2yT
 8IiuWk9xEM3Ok+I/I4ClT5cf5GYpyigV2QsXxn+IjDX5Na8v4zlHh0r8SElP8fOt
 7dpQx7iw8UghAib3AzA3suN78Oh39m8l5BNobj7LAjnqOQcVvoPo4o7/48ntuH7A
 38EucFrXfxQBMfGbMwvxEmgYuX7MyVfQLaPE06MHy1BkZkffT8Um38TB0iNtZmtf
 WUx01yLKWYspehlwFi319uVI4/Zp7FnTfqa5uKv1oSXVdL9vZojSXUzrgDV7FVqT
 Z4xAAw/kwNHpUG7y0zNOqd6PukovG1t+CjbLvK+eHPwc5c0vEGG2oTRAfEvvP1z/
 kesYDmCyJnk=
 =N1gA
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2026-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
 "Lock debugging:

   - Implement compiler-driven static analysis locking context checking,
     using the upcoming Clang 22 compiler's context analysis features
     (Marco Elver)

     We removed Sparse context analysis support, because prior to
     removal even a defconfig kernel produced 1,700+ context tracking
     Sparse warnings, the overwhelming majority of which are false
     positives. On an allmodconfig kernel the number of false positive
     context tracking Sparse warnings grows to over 5,200... On the plus
     side of the balance actual locking bugs found by Sparse context
     analysis is also rather ... sparse: I found only 3 such commits in
     the last 3 years. So the rate of false positives and the
     maintenance overhead is rather high and there appears to be no
     active policy in place to achieve a zero-warnings baseline to move
     the annotations & fixers to developers who introduce new code.

     Clang context analysis is more complete and more aggressive in
     trying to find bugs, at least in principle. Plus it has a different
     model to enabling it: it's enabled subsystem by subsystem, which
     results in zero warnings on all relevant kernel builds (as far as
     our testing managed to cover it). Which allowed us to enable it by
     default, similar to other compiler warnings, with the expectation
     that there are no warnings going forward. This enforces a
     zero-warnings baseline on clang-22+ builds (Which are still limited
     in distribution, admittedly)

     Hopefully the Clang approach can lead to a more maintainable
     zero-warnings status quo and policy, with more and more subsystems
     and drivers enabling the feature. Context tracking can be enabled
     for all kernel code via WARN_CONTEXT_ANALYSIS_ALL=y (default
     disabled), but this will generate a lot of false positives.

     ( Having said that, Sparse support could still be added back,
       if anyone is interested - the removal patch is still
       relatively straightforward to revert at this stage. )

  Rust integration updates: (Alice Ryhl, Fujita Tomonori, Boqun Feng)

    - Add support for Atomic<i8/i16/bool> and replace most Rust native
      AtomicBool usages with Atomic<bool>

    - Clean up LockClassKey and improve its documentation

    - Add missing Send and Sync trait implementation for SetOnce

    - Make ARef Unpin as it is supposed to be

    - Add __rust_helper to a few Rust helpers as a preparation for
      helper LTO

    - Inline various lock related functions to avoid additional function
      calls

  WW mutexes:

    - Extend ww_mutex tests and other test-ww_mutex updates (John
      Stultz)

  Misc fixes and cleanups:

    - rcu: Mark lockdep_assert_rcu_helper() __always_inline (Arnd
      Bergmann)

    - locking/local_lock: Include more missing headers (Peter Zijlstra)

    - seqlock: fix scoped_seqlock_read kernel-doc (Randy Dunlap)

    - rust: sync: Replace `kernel::c_str!` with C-Strings (Tamir
      Duberstein)"

* tag 'locking-core-2026-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (90 commits)
  locking/rwlock: Fix write_trylock_irqsave() with CONFIG_INLINE_WRITE_TRYLOCK
  rcu: Mark lockdep_assert_rcu_helper() __always_inline
  compiler-context-analysis: Remove __assume_ctx_lock from initializers
  tomoyo: Use scoped init guard
  crypto: Use scoped init guard
  kcov: Use scoped init guard
  compiler-context-analysis: Introduce scoped init guards
  cleanup: Make __DEFINE_LOCK_GUARD handle commas in initializers
  seqlock: fix scoped_seqlock_read kernel-doc
  tools: Update context analysis macros in compiler_types.h
  rust: sync: Replace `kernel::c_str!` with C-Strings
  rust: sync: Inline various lock related methods
  rust: helpers: Move #define __rust_helper out of atomic.c
  rust: wait: Add __rust_helper to helpers
  rust: time: Add __rust_helper to helpers
  rust: task: Add __rust_helper to helpers
  rust: sync: Add __rust_helper to helpers
  rust: refcount: Add __rust_helper to helpers
  rust: rcu: Add __rust_helper to helpers
  rust: processor: Add __rust_helper to helpers
  ...
2026-02-10 12:28:44 -08:00
Ezrak1e
080e5563f8 dlm: validate length in dlm_search_rsb_tree
The len parameter in dlm_dump_rsb_name() is not validated and comes
from network messages. When it exceeds DLM_RESNAME_MAXLEN, it can
cause out-of-bounds write in dlm_search_rsb_tree().

Add length validation to prevent potential buffer overflow.

Signed-off-by: Ezrak1e <ezrakiez@gmail.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2026-01-20 12:07:31 -06:00
Alexander Aring
1416bd508c dlm: fix recovery pending middle conversion
During a workload involving conversions between lock modes PR and CW,
lock recovery can create a "conversion deadlock" state between locks
that have been recovered.  When this occurs, kernel warning messages
are logged, e.g.

  "dlm: WARN: pending deadlock 1e node 0 2 1bf21"

  "dlm: receive_rcom_lock_args 2e middle convert gr 3 rq 2 remote 2 1e"

After this occurs, the deadlocked conversions both appear on the convert
queue of the resource being locked, and the conversion requests do not
complete.

Outside of recovery, conversions that would produce a deadlock are
resolved immediately, and return -EDEADLK.  The locks are not placed
on the convert queue in the deadlocked state.

To fix this problem, an lkb under conversion between PR/CW is rebuilt
during recovery on a new master's granted queue, with the currently
granted mode, rather than being rebuilt on the new master's convert
queue, with the currently granted mode and the newly requested mode.
The in-progress convert is then resent to the new master after
recovery, so the conversion deadlock will be processed outside of
the recovery context and handled as described above.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2026-01-20 11:33:33 -06:00
Marco Elver
38f1311a22 compiler-context-analysis: Change __cond_acquires to take return value
While Sparse is oblivious to the return value of conditional acquire
functions, Clang's context analysis needs to know the return value
which indicates successful acquisition.

Add the additional argument, and convert existing uses.

Notably, Clang's interpretation of the value merely relates to the use
in a later conditional branch, i.e. 1 ==> context lock acquired in
branch taken if condition non-zero, and 0 ==> context lock acquired in
branch taken if condition is zero. Given the precise value does not
matter, introduce symbolic variants to use instead of either 0 or 1,
which should be more intuitive.

No functional change intended.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-10-elver@google.com
2026-01-05 16:43:29 +01:00
Alexander Aring
a8abcff174 dlm: move to rinfo for all middle conversion cases
Since commit f74dacb4c8 ("dlm: fix recovery of middle conversions")
we introduced additional debugging information if we hit the middle
conversion by using log_limit(). The DLM log_limit() functionality
requires a DLM debug option being enabled. As this case is so rarely and
excempt any potential introduced new issue with recovery we switching it
to log_rinfo() ad this is ratelimited under normal DLM loglevel.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2025-08-14 15:16:04 -05:00
Ingo Molnar
41cb08555c treewide, timers: Rename from_timer() to timer_container_of()
Move this API to the canonical timer_*() namespace.

[ tglx: Redone against pre rc1 ]

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-06-08 09:07:37 +02:00
Alexander Aring
a3672304ab dlm: fix error if active rsb is not hashed
If an active rsb is not hashed anymore and this could occur because we
releases and acquired locks we need to signal the followed code that
the lookup failed. Since the lookup was successful, but it isn't part of
the rsb hash anymore we need to signal it by setting error to -EBADR as
dlm_search_rsb_tree() does it.

Cc: stable@vger.kernel.org
Fixes: 5be323b0c6 ("dlm: move dlm_search_rsb_tree() out of lock")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2025-02-28 19:24:21 -05:00
Alexander Aring
94e6e889a7 dlm: fix error if inactive rsb is not hashed
If an inactive rsb is not hashed anymore and this could occur because we
releases and acquired locks we need to signal the followed code that the
lookup failed. Since the lookup was successful, but it isn't part of the
rsb hash anymore we need to signal it by setting error to -EBADR as
dlm_search_rsb_tree() does it.

Cc: stable@vger.kernel.org
Fixes: 01fdeca1cc ("dlm: use rcu to avoid an extra rsb struct lookup")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2025-02-28 19:24:21 -05:00
Alexander Aring
134129520b dlm: fix removal of rsb struct that is master and dir record
An rsb struct was not being removed in the case where it
was both the master and the dir record.  This case (master
and dir node) was missed in the condition for doing add_scan()
from deactivate_rsb().  Fixing this triggers a related WARN_ON
that needs to be fixed, and requires adjusting where two
del_scan() calls are made.

Fixes: c217adfc8c ("dlm: fix add_scan and del_scan usage")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-12-19 13:07:02 -06:00
Alexander Aring
f74dacb4c8 dlm: fix recovery of middle conversions
In one special case, recovery is unable to reliably rebuild
lock state by simply recreating lkb structs as sent from the
lock holders.  That case is when the lkb's include conversions
between PR and CW modes.

The recovery code has always recognized this special case,
but the implemention has always been broken, and would set
invalid modes in recovered lkb's.  Unpredictable or bogus
errors could then be returned for further locking calls on
these locks.

This bug has gone unnoticed for so long due to some
combination of:
- applications never or infrequently converting between PR/CW
- recovery not occuring during these conversions
- if the recovery bug does occur, the caller may not notice,
  depending on what further locking calls are made, e.g. if
  the lock is simply unlocked it may go unnoticed

However, a core analysis from a recent gfs2 bug report points
to this broken code.

PR = Protected Read
CW = Concurrent Write
PR and CW are incompatible
PR and PR are compatible
CW and CW are compatible

Example 1

node C, resource R
granted: PR node A
granted: PR node B
granted: NL node C
granted: NL node D

- A sends convert PR->CW to C
- C fails before A gets a reply
- recovery occurs

At this point, A does not know if it still holds
the lock in PR, or if its conversion to CW was granted:
- If A's conversion to CW was granted, then another
  node's CW lock may also have been granted.
- If A's conversion to CW was not granted, it still
  holds a PR lock, and other nodes may also hold PR locks.

So, the new master of R cannot simply recreate the lock
from A using granted mode PR and requested mode CW.
The new master must look at all the recovered locks to
determine the correct granted modes, and ensure that all
the recovered locks are recreated in compatible states.

The correct lock recovery steps in this example are:
- node D becomes the new master of R
- node B sends D its lkb, granted PR
- node A sends D its lkb, convert PR->CW
- D determines the correct lock state is:
  granted: PR node B
  convert: PR->CW node A

The lkb sent by each node was recreated without
any change on the new master node.

Example 2

node C, resource R
granted: PR node A
granted: NL node C
granted: NL node D
waiting: CW node B

- A sends convert PR->CW to C
- C grants the conversion to CW for A
- C grants the waiting request for CW to B
- C sends granted message to B, but fails
  before it can send the granted message to A
- B receives the granted message from C

At this point:
- A believes it is converting PR->CW
- B believes it is holding a CW lock

The correct lock recovery steps in this example are:
- node D becomes the new master of R
- node A sends D its lkb, convert PR->CW
- node B sends D its lkb, granted CW
- D determins the correct lock state is:
  granted: CW node B
  granted: CW node A

The lkb sent by B is recreated without change,
but the lkb sent by A is changed because the
granted mode was not compatible.

Fixes to make this work correctly:

recover_convert_waiter: should not make any changes
to a converting lkb that is still waiting for a reply
message.  It was previously setting grmode to IV, which
is invalid state, so the lkb would not be handled
correctly by other code.

receive_rcom_lock_args: was checking the wrong lkb field
(wait_type instead of status) to determine if the lkb is
being converted, and in need of inspection for this special
recovery.  It was also setting grmode to IV in the lkb,
causing it to be mishandled by other code.
Now, this function just puts the lkb, directly as sent,
onto the convert queue of the resource being recovered,
and corrects it in recover_conversion() later, if needed.

recover_conversion: the job of this function is to detect
and correct lkb states for the special PR/CW conversions.
The new code now checks for recovered lkbs on the granted
queue with grmode PR or CW, and takes the real grmode from
that.  Then it looks for lkbs on the convert queue with an
incompatible grmode (i.e. grmode PR when the real grmode is
CW, or v.v.)  These converting lkbs need to be fixed.
They are fixed by temporarily setting their grmode to NL,
so that grmodes are not incompatible and won't confuse other
locking code.  The converting lkb will then be granted at
the end of recovery, replacing the temporary NL grmode.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-11-15 13:39:36 -06:00
Alexander Aring
dfe5a6cc42 dlm: make add_to_waiters() that it can't fail
If add_to_waiters() fails we have a problem because the previous called
functions such as validate_lock_args() or validate_unlock_args() sets
specific lkb values that are set for a request, there exists no way back
to revert those changes. When there is a pending lock request the
original request arguments will be overwritten with unknown
consequences.

The good news are that I believe those cases that we fail in
add_to_waiters() can't happen or very unlikely to happen (only if the DLM
user does stupid API things), but if so we have the above mentioned
problem.

There are two conditions that will be removed here. The first one is the
-EINVAL case which contains is_overlap_unlock() or (is_overlap_cancel()
and mstype == DLM_MSG_CANCEL).

The is_overlap_unlock() is missing for the normal UNLOCK case which is
moved to validate_unlock_args(). The is_overlap_cancel() already happens
in validate_unlock_args() when DLM_LKF_CANCEL is set. In case of
validate_lock_args() we check on is_overlap() when it is not a new request,
on a new request the lkb is always new and does not have those values set.

The -EBUSY check can't happen in case as for non new lock requests (when
DLM_LKF_CONVERT is set) we already check in validate_lock_args() for
lkb_wait_type and is_overlap(). Then there is only
validate_unlock_args() that will never hit the default case because
dlm_unlock() will produce DLM_MSG_UNLOCK and DLM_MSG_CANCEL messages.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-10-04 10:31:31 -05:00
Alexander Aring
b98333c67d dlm: fix possible lkb_resource null dereference
This patch fixes a possible null pointer dereference when this function is
called from request_lock() as lkb->lkb_resource is not assigned yet,
only after validate_lock_args() by calling attach_lkb(). Another issue
is that a resource name could be a non printable bytearray and we cannot
assume to be ASCII coded.

The log functionality is probably never being hit when DLM is used in
normal way and no debug logging is enabled. The null pointer dereference
can only occur on a new created lkb that does not have the resource
assigned yet, it probably never hits the null pointer dereference but we
should be sure that other changes might not change this behaviour and we
actually can hit the mentioned null pointer dereference.

In this patch we just drop the printout of the resource name, the lkb id
is enough to make a possible connection to a resource name if this
exists.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-10-04 10:31:31 -05:00
Alexander Aring
c846f732b9 dlm: move lkb xarray lookup out of lock
This patch moves the xarray lookup functionality for the lkb out of the
ls_lkbxa_lock read lock handling. We can do that as the xarray should be
possible to access lockless in case of reader like xa_load(). We confirm
under ls_lkbxa_lock that the lkb is still part of the data structure and
take a reference when its still part of ls_lkbxa to avoid being freed
after doing the lookup. To do a check if the lkb is still part of the
ls_lkbxa data structure we use a kref_read() as the last put will remove
it from the ls_lkbxa data structure and any reference taken means it is
still part of ls_lkbxa.

A similar approach was done with the DLM rsb rhashtable just with a flag
instead of the refcounter because the refcounter has a slightly
different meaning.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-08-08 15:15:08 -05:00
Alexander Aring
5be323b0c6 dlm: move dlm_search_rsb_tree() out of lock
The rhashtable structure is lockless for readers such as
rhashtable_lookup_fast(). It should be save to call this lookup
functionality out of holding ls_rsbtbl_lock to get the rsb pointer out
of the hash. This reduce the contention time of ls_rsbtbl_lock in some
cases. We still need to check if the rsb is part of the check as this
state can be changed while ls_rsbtbl_lock is not held. If its part of
the rhashtable data structure we take a reference to be sure it will not
be freed after we drop the ls_rsbtbl_lock read lock.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-08-08 15:15:08 -05:00
Alexander Aring
98ff7d95d9 dlm: use RSB_HASHED to avoid lookup twice
Since commit 01fdeca1cc ("dlm: use rcu to avoid an extra rsb struct
lookup") _dlm_master_lookup() is called under rcu lock that prevents
that the rsb structure is being freed. There was a missing change to
avoid an additional lookup and just check that the rsb is still part of
the ls_rsbtbl structure. This patch is doing such check instead of
lookup the rsb structure again.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-08-08 15:15:08 -05:00
Alexander Aring
d47b822974 dlm: warn about invalid nodeid comparsions
This patch adds a warn on if is_master() and dlm_is_removed() checks on
invalid nodeid states that are probably not what the caller wants to do
here. The is_master() function checking on r->res_nodeid is invalid when
it is set to -1, whereas the dlm_is_removed() has a different meaning
as "nodeid member" and also 0 is invalid.

We run into these cases and this patch changes those cases as we never
will run into them. There should be no functional changes as the
condition should return the same result. However this patch signals now
on caller level that there might be an "extra" case to handle here.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-08-08 15:15:07 -05:00
Alexander Aring
d3b3d2d8e1 dlm: remove unnecessary refcounts
This patch removes unnecessary refcounts that are obviously not
necessary because either when the pointer is passed as parameter or it
is part of a list we should already hold a reference to it.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-08-08 15:14:42 -05:00
Alexander Aring
1169788544 dlm: cleanup memory allocation helpers
This patch removes a unnecessary parameter from DLM memory allocation
helpers and reduce some functions by just directly reply the pointer
address of the allocated memory.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-08-08 15:14:41 -05:00
Alexander Aring
01fdeca1cc dlm: use rcu to avoid an extra rsb struct lookup
Use rcu to free rsb structs, and hold the rcu read lock
while looking up rsb structs.  This allows us to avoid an
extra hash table lookup for an rsb.  A new rsb flag HASHED
is added which is set while the rsb is in the hash table.
This flag is checked in place of repeating the hash table
lookup.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-06-10 15:26:03 -05:00
David Teigland
c217adfc8c dlm: fix add_scan and del_scan usage
Remove a few calls to add_scan() and del_scan() in cases where
the rsb is a dir record, so the rsb should never be placed on
the scan list at all.  Add WARN_ON to catch cases where this
is done.

Signed-off-by: David Teigland <teigland@redhat.com>
2024-06-10 15:21:28 -05:00
David Teigland
4f5957a980 dlm: change list and timer names
The old terminology of "toss" and "keep" is no longer an
accurate description of the rsb states and lists, so change
the names to "inactive" and "active".  The old names had
also been copied into the scanning code, which is changed
back to use the "scan" name.

- "active" rsb structs have lkb's attached, and are ref counted.
- "inactive" rsb structs have no lkb's attached, are not ref counted.
- "scan" list is for rsb's that can be freed after a timeout period.
- "slow" lists are for infrequent iterations through active or
   inactive rsb structs.
- inactive rsb structs that are directory records will not be put
  on the scan list, since they are not freed based on timeouts.
- inactive rsb structs that are not directory records will be
  put on the scan list to be freed, since they are not longer needed.

Signed-off-by: David Teigland <teigland@redhat.com>
2024-06-10 15:11:46 -05:00
Alexander Aring
f455eb8490 dlm: move lkb idr to xarray datastructure
According to kernel doc idr is deprecated and xarrays should be used
nowadays. This patch is moving the lkb idr implementation to xarrays.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-05-31 11:04:54 -05:00
Alexander Aring
1ffefc19c4 dlm: drop own rsb pre allocation mechanism
This patch drops the own written rsb pre allocation mechanism as this is
already done by using kmem caches, we don't need another layer on top of
that to running some pre allocation scheme.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-05-31 11:04:54 -05:00
Alexander Aring
a2155402bf dlm: don't kref_init rsbs created for toss list
This patch removes a kref_init() that isn't necessary because the rsb is
created for toss list. Under toss list the rsb should not have any
reference counting logic. If in theory the rsb gets to into keep list
then a kref_init() for res_ref will be initiated.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-05-31 11:04:54 -05:00
Alexander Aring
7b012732d0 dlm: fix sleep in atomic context
This patch changes the orphans mutex to a spinlock since commit
c288745f1d ("dlm: avoid blocking receive at the end of recovery") is
using a rwlock_t to lock the DLM message receive path and do_purge() can
be called while this lock is held that forbids to sleep.

We need to use spin_lock_bh() because also a user context that calls
dlm_user_purge() can call do_purge() and since commit 92d59adfaf
("dlm: do message processing in softirq context") the DLM message
receive path is done under softirq context.

Fixes: c288745f1d ("dlm: avoid blocking receive at the end of recovery")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/gfs2/9ad928eb-2ece-4ad9-a79c-d2bce228e4bc@moroto.mountain/
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-17 14:20:04 -05:00
Alexander Aring
15fd7e5517 dlm: use rwlock for lkbidr
Convert the lock for lkbidr to an rwlock.  Most idr lookups will use
the read lock.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16 14:45:57 -05:00
Alexander Aring
e91313591b dlm: use rwlock for rsb hash table
The conversion to rhashtable introduced a hash table lock per lockspace,
in place of per bucket locks.  To make this more scalable, switch to
using a rwlock for hash table access.  The common case fast path uses
it as a read lock.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16 14:45:31 -05:00
Alexander Aring
b1f2381c1a dlm: drop dlm_scand kthread and use timers
Currently the scand kthread acts like a garbage collection for expired
rsbs on toss list, to clean them up after a certain timeout. It triggers
every couple of seconds and iterates over the toss list while holding
ls_rsbtbl_lock for the whole hash bucket iteration.

To reduce the amount of time holding ls_rsbtbl_lock, we now handle the
disposal of expired rsbs using a per-lockspace timer that expires for the
earliest tossed rsb on the lockspace toss queue. This toss queue is
ordered according to the rsb res_toss_time with the earliest tossed rsb
as the first entry. The toss timer will only trylock() necessary locks,
since it is low priority garbage collection, and will rearm the timer
if trylock() fails. If the timer function does not find any expired
rsb's, it rearms the timer with the next earliest expired rsb.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16 14:40:27 -05:00
Alexander Aring
6644925abf dlm: do not use ref counts for rsb in the toss state
In the past we had problems when an rsb had a reference counter greater
than one while in the toss state.  An rsb in the toss state is not
actively used for locking, and should not have any other references
apart from the single ref keeping it on the rsb hash.  Shift to freeing
rsb's directly rather than using kref_put to free them, since the ref
counting is not meant to be used in this state.  Add warnings if ref
counting is seen while an rsb is in the toss state.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16 14:34:39 -05:00
Alexander Aring
6c648035cb dlm: switch to use rhashtable for rsbs
Replace our own hash table with the more advanced rhashtable
for keeping rsb structs.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16 14:34:39 -05:00
Alexander Aring
93a693d19d dlm: add rsb lists for iteration
To prepare for using rhashtable, add two rsb lists for iterating
through rsb's in two uncommon cases where this is necesssary:
- when dumping rsb state from debugfs, now using seq_list.
- when looking at all rsb's during recovery.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16 14:33:25 -05:00
Alexander Aring
2d90354027 dlm: merge toss and keep hash table lists into one list
There are several places where lock processing can perform two hash table
lookups, first in the "keep" list, and if not found, in the "toss" list.
This patch introduces a new rsb state flag "RSB_TOSS" to represent the
difference between the state of being on keep vs toss list, so that the
two lists can be combined.  This avoids cases of two lookups.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16 13:49:13 -05:00
Alexander Aring
dcdaad05ca dlm: change to single hashtable lock
Prepare to replace our own hash table with rhashtable by replacing
the per-bucket locks in our own hash table with a single lock.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16 13:46:41 -05:00
Alexander Aring
578acf9a87 dlm: use spin_lock_bh for message processing
Use spin_lock_bh for all spinlocks involved in message processing,
in preparation for softirq message processing.  DLM lock requests
from user space involve dlm processing in user context, in addition
to the standard kernel context, necessitating bh variants.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 11:45:23 -05:00
Alexander Aring
308533b4b1 dlm: remove schedule in receive path
Remove an explicit schedule() call in the message processing path,
in preparation for softirq message processing.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 11:44:49 -05:00
Alexander Aring
d52c9b8fef dlm: convert ls_recv_active from rw_semaphore to rwlock
Convert ls_recv_active rw_semaphore to an rwlock to avoid
sleeping, in preparation for softirq message processing.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 11:44:49 -05:00
Alexander Aring
c288745f1d dlm: avoid blocking receive at the end of recovery
The end of the recovery process transitioned to normal message
processing by temporarily blocking the receiving context,
processing saved messages, then unblocking the receiving
context.  To avoid blocking the receiving context, the old
wait_queue and mutex are replaced by a new rwlock and the new
RECV_MSG_BLOCKED flag.  Received messages are added to the
list of saved messages, protected by the rwlock, until the
flag is cleared, which happens when all saved messages have
been processed.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 11:44:49 -05:00
Alexander Aring
cc396e2355 dlm: convert res_lock to spinlock
Convert the rsb struct res_lock from a mutex to a spinlock
in preparation for processing messages in softirq context.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 11:44:49 -05:00
Alexander Aring
097691dbad dlm: convert ls_waiters_mutex to spinlock
Convert the waiters mutex to a spinlock in prepration for
processing messages in softirq context.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 11:44:49 -05:00
Alexander Aring
6b52ea7916 dlm: drop mutex use in waiters recovery
The waiters_mutex no longer needs to be used in the waiters recovery
functions dlm_recover_waiters_pre() and dlm_recover_waiters_pre().
During recovery, ordinary locking operations are paused, and the
recovery thread is the only context accessing the waiters list,
so the lock is not needed.

Access to the waiters list from debugfs functions is avoided by
taking the top level recovery lock in the debugfs dump function.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 11:44:49 -05:00
Alexander Aring
3a747f4a2e dlm: move rsb root_list to ls_recover() stack
Move the rsb root_list from the lockspace to a stack variable since
it is now only used by the ls_recover() function.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 11:44:49 -05:00
Alexander Aring
aff46e0f24 dlm: use a new list for recovery of master rsb names
Add a new "masters_list" for master rsb structs, with a new
rwlock. The new list is created and used during the recovery
process to send the master rsb names to new nodes. With this
change, the current "root_list" can be used without locking.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 11:44:49 -05:00
Alexander Aring
455597a55f dlm: switch to GFP_ATOMIC in dlm allocations
Replace GFP_NOFS with GFP_ATOMIC.  Also stop using idr_preload which
uses a non-bh spin_lock.  This is further preparation for softirq
message processing.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 11:43:49 -05:00
Alexander Aring
98808644b9 dlm: remove allocation parameter in msg allocation
Remove the context parameter for message allocations and
always use GFP_ATOMIC. This prepares for softirq message
processing.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 09:58:14 -05:00
Alexander Aring
2bec1bbd55 dlm: remove callback reference counting
Get rid of the unnecessary refcounting on callback structs.
Copy interesting callback info into the lkb struct rather
than maintaining pointers to callback structs from the lkb.
This goes back to the way things were done prior to
commit 61bed0baa4 ("fs: dlm: use a non-static queue for callbacks").

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-01 13:31:13 -05:00
Alexander Aring
986ae3c2a8 dlm: fix race between final callback and remove
This patch fixes the following issue:

node 1 is dir
node 2 is master
node 3 is other

1->2: unlock
2: put final lkb, rsb moved to toss
2->1: unlock_reply
1: queue lkb callback with EUNLOCK
2->1: remove
1: receive_remove ignored (rsb on keep because of queued lkb callback)
1: complete lkb callback, put_lkb, move rsb to toss
3->1: lookup
1->3: lookup_reply master=2
3->2: request
2->3: request_reply EBADR

In summary:
An unexpected lkb reference causes the rsb to remain on the wrong list.
The rsb being on the wrong list causes receive_remove to be ignored.
An ignored receive_remove causes inconsistent dir and master state.

This sequence requires an unusually long delay in delivering the unlock
callback, because the remove message from 2->1 usually happens after
some seconds.  So, it's not known exactly how frequently this sequence
occurs in pratice.  It's possible that the same end result could also
have another unknown cause.

The solution for this issue is to further separate callback state
from the lkb, so that an lkb reference (and from that, an rsb ref)
are not held while a callback remains queued.  Then, within the
unlock_reply, the lkb will be freed and the rsb moved to the toss
list. So, the receive_remove will not be ignored.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-01 13:31:12 -05:00
David Teigland
c53309b912 dlm: add comments about forced waiters reset
When a lock is waiting for a reply for a remote operation, and recovery
interrupts this "waiters" state, the remote operation is voided by the
recovery, and no reply will be processed.  The lkb waiters state for the
remote operation is forcibly reset/cleared, so that the lock operation
can be restarted after recovery.  Improve the comments describing this.

Signed-off-by: David Teigland <teigland@redhat.com>
2024-03-15 17:11:39 -05:00
David Teigland
484b4f90c2 dlm: revert atomic_t lkb_wait_count
Revert "fs: dlm: handle lkb wait count as atomic_t"
This reverts commit 75a7d60134.

This counter does not need to be atomic.  As the comment in
the reverted commit mentions, the counter is protected by
the rsb lock.

Signed-off-by: David Teigland <teigland@redhat.com>
2024-03-15 14:39:28 -05:00