Merge drm/drm-next into drm-misc-next

Kickstart 6.9 development cycle.

Signed-off-by: Maxime Ripard <mripard@kernel.org>
This commit is contained in:
Maxime Ripard 2024-01-29 14:20:23 +01:00
commit 4db102dcb0
No known key found for this signature in database
GPG key ID: E3EF0D6F671851C5
11106 changed files with 525524 additions and 241183 deletions

View file

@ -82,11 +82,16 @@ ForEachMacros:
- '__for_each_thread'
- '__hlist_for_each_rcu'
- '__map__for_each_symbol_by_name'
- '__pci_bus_for_each_res0'
- '__pci_bus_for_each_res1'
- '__pci_dev_for_each_res0'
- '__pci_dev_for_each_res1'
- '__perf_evlist__for_each_entry'
- '__perf_evlist__for_each_entry_reverse'
- '__perf_evlist__for_each_entry_safe'
- '__rq_for_each_bio'
- '__shost_for_each_device'
- '__sym_for_each'
- 'apei_estatus_for_each_section'
- 'ata_for_each_dev'
- 'ata_for_each_link'
@ -105,13 +110,12 @@ ForEachMacros:
- 'bip_for_each_vec'
- 'bond_for_each_slave'
- 'bond_for_each_slave_rcu'
- 'bpf__perf_for_each_map'
- 'bpf__perf_for_each_map_named'
- 'bpf_for_each'
- 'bpf_for_each_reg_in_vstate'
- 'bpf_for_each_reg_in_vstate_mask'
- 'bpf_for_each_spilled_reg'
- 'bpf_object__for_each_map'
- 'bpf_object__for_each_program'
- 'bpf_object__for_each_safe'
- 'bpf_perf_object__for_each'
- 'btree_for_each_safe128'
- 'btree_for_each_safe32'
- 'btree_for_each_safe64'
@ -119,6 +123,7 @@ ForEachMacros:
- 'card_for_each_dev'
- 'cgroup_taskset_for_each'
- 'cgroup_taskset_for_each_leader'
- 'cpu_aggr_map__for_each_idx'
- 'cpufreq_for_each_efficient_entry_idx'
- 'cpufreq_for_each_entry'
- 'cpufreq_for_each_entry_idx'
@ -128,11 +133,14 @@ ForEachMacros:
- 'css_for_each_descendant_post'
- 'css_for_each_descendant_pre'
- 'damon_for_each_region'
- 'damon_for_each_region_from'
- 'damon_for_each_region_safe'
- 'damon_for_each_scheme'
- 'damon_for_each_scheme_safe'
- 'damon_for_each_target'
- 'damon_for_each_target_safe'
- 'damos_for_each_filter'
- 'damos_for_each_filter_safe'
- 'data__for_each_file'
- 'data__for_each_file_new'
- 'data__for_each_file_start'
@ -151,6 +159,8 @@ ForEachMacros:
- 'drm_client_for_each_connector_iter'
- 'drm_client_for_each_modeset'
- 'drm_connector_for_each_possible_encoder'
- 'drm_exec_for_each_locked_object'
- 'drm_exec_for_each_locked_object_reverse'
- 'drm_for_each_bridge_in_chain'
- 'drm_for_each_connector_iter'
- 'drm_for_each_crtc'
@ -162,22 +172,32 @@ ForEachMacros:
- 'drm_for_each_plane'
- 'drm_for_each_plane_mask'
- 'drm_for_each_privobj'
- 'drm_gem_for_each_gpuva'
- 'drm_gem_for_each_gpuva_safe'
- 'drm_gpuva_for_each_op'
- 'drm_gpuva_for_each_op_from_reverse'
- 'drm_gpuva_for_each_op_safe'
- 'drm_gpuvm_for_each_va'
- 'drm_gpuvm_for_each_va_range'
- 'drm_gpuvm_for_each_va_range_safe'
- 'drm_gpuvm_for_each_va_safe'
- 'drm_mm_for_each_hole'
- 'drm_mm_for_each_node'
- 'drm_mm_for_each_node_in_range'
- 'drm_mm_for_each_node_safe'
- 'dsa_switch_for_each_available_port'
- 'dsa_switch_for_each_cpu_port'
- 'dsa_switch_for_each_cpu_port_continue_reverse'
- 'dsa_switch_for_each_port'
- 'dsa_switch_for_each_port_continue_reverse'
- 'dsa_switch_for_each_port_safe'
- 'dsa_switch_for_each_user_port'
- 'dsa_tree_for_each_cpu_port'
- 'dsa_tree_for_each_user_port'
- 'dsa_tree_for_each_user_port_continue_reverse'
- 'dso__for_each_symbol'
- 'dsos__for_each_with_build_id'
- 'elf_hash_for_each_possible'
- 'elf_section__for_each_rel'
- 'elf_section__for_each_rela'
- 'elf_symtab__for_each_symbol'
- 'evlist__for_each_cpu'
- 'evlist__for_each_entry'
@ -186,12 +206,15 @@ ForEachMacros:
- 'evlist__for_each_entry_reverse'
- 'evlist__for_each_entry_safe'
- 'flow_action_for_each'
- 'for_each_acpi_consumer_dev'
- 'for_each_acpi_dev_match'
- 'for_each_active_dev_scope'
- 'for_each_active_drhd_unit'
- 'for_each_active_iommu'
- 'for_each_active_route'
- 'for_each_aggr_pgid'
- 'for_each_and_bit'
- 'for_each_andnot_bit'
- 'for_each_available_child_of_node'
- 'for_each_bench'
- 'for_each_bio'
@ -222,10 +245,13 @@ ForEachMacros:
- 'for_each_compatible_node'
- 'for_each_component_dais'
- 'for_each_component_dais_safe'
- 'for_each_conduit'
- 'for_each_console'
- 'for_each_console_srcu'
- 'for_each_cpu'
- 'for_each_cpu_and'
- 'for_each_cpu_andnot'
- 'for_each_cpu_or'
- 'for_each_cpu_wrap'
- 'for_each_dapm_widgets'
- 'for_each_dedup_cand'
@ -254,9 +280,11 @@ ForEachMacros:
- 'for_each_free_mem_range'
- 'for_each_free_mem_range_reverse'
- 'for_each_func_rsrc'
- 'for_each_group_device'
- 'for_each_gpiochip_node'
- 'for_each_group_evsel'
- 'for_each_group_evsel_head'
- 'for_each_group_member'
- 'for_each_group_member_head'
- 'for_each_hstate'
- 'for_each_if'
- 'for_each_inject_fn'
@ -273,6 +301,7 @@ ForEachMacros:
- 'for_each_lru'
- 'for_each_matching_node'
- 'for_each_matching_node_and_match'
- 'for_each_media_entity_data_link'
- 'for_each_mem_pfn_range'
- 'for_each_mem_range'
- 'for_each_mem_range_rev'
@ -281,6 +310,8 @@ ForEachMacros:
- 'for_each_memory'
- 'for_each_migratetype_order'
- 'for_each_missing_reg'
- 'for_each_mle_subelement'
- 'for_each_mod_mem_type'
- 'for_each_net'
- 'for_each_net_continue_reverse'
- 'for_each_net_rcu'
@ -288,6 +319,7 @@ ForEachMacros:
- 'for_each_netdev_continue'
- 'for_each_netdev_continue_rcu'
- 'for_each_netdev_continue_reverse'
- 'for_each_netdev_dump'
- 'for_each_netdev_feature'
- 'for_each_netdev_in_bond_rcu'
- 'for_each_netdev_rcu'
@ -308,6 +340,7 @@ ForEachMacros:
- 'for_each_node_with_cpus'
- 'for_each_node_with_property'
- 'for_each_nonreserved_multicast_dest_pgid'
- 'for_each_numa_hop_mask'
- 'for_each_of_allnodes'
- 'for_each_of_allnodes_from'
- 'for_each_of_cpu_node'
@ -326,6 +359,7 @@ ForEachMacros:
- 'for_each_online_cpu'
- 'for_each_online_node'
- 'for_each_online_pgdat'
- 'for_each_or_bit'
- 'for_each_path'
- 'for_each_pci_bridge'
- 'for_each_pci_dev'
@ -333,6 +367,7 @@ ForEachMacros:
- 'for_each_physmem_range'
- 'for_each_populated_zone'
- 'for_each_possible_cpu'
- 'for_each_present_blessed_reg'
- 'for_each_present_cpu'
- 'for_each_prime_number'
- 'for_each_prime_number_from'
@ -348,7 +383,8 @@ ForEachMacros:
- 'for_each_property_of_node'
- 'for_each_reg'
- 'for_each_reg_filtered'
- 'for_each_registered_fb'
- 'for_each_reloc'
- 'for_each_reloc_from'
- 'for_each_requested_gpio'
- 'for_each_requested_gpio_in_range'
- 'for_each_reserved_mem_range'
@ -357,10 +393,12 @@ ForEachMacros:
- 'for_each_rtd_components'
- 'for_each_rtd_cpu_dais'
- 'for_each_rtd_dais'
- 'for_each_sband_iftype_data'
- 'for_each_script'
- 'for_each_sec'
- 'for_each_set_bit'
- 'for_each_set_bit_from'
- 'for_each_set_bit_wrap'
- 'for_each_set_bitrange'
- 'for_each_set_bitrange_from'
- 'for_each_set_clump8'
@ -371,8 +409,8 @@ ForEachMacros:
- 'for_each_sgtable_dma_sg'
- 'for_each_sgtable_page'
- 'for_each_sgtable_sg'
- 'for_each_shell_test'
- 'for_each_sibling_event'
- 'for_each_sta_active_link'
- 'for_each_subelement'
- 'for_each_subelement_extid'
- 'for_each_subelement_id'
@ -380,10 +418,15 @@ ForEachMacros:
- 'for_each_subsystem'
- 'for_each_supported_activate_fn'
- 'for_each_supported_inject_fn'
- 'for_each_sym'
- 'for_each_test'
- 'for_each_thread'
- 'for_each_token'
- 'for_each_unicast_dest_pgid'
- 'for_each_valid_link'
- 'for_each_vif_active_link'
- 'for_each_vma'
- 'for_each_vma_range'
- 'for_each_vsi'
- 'for_each_wakeup_source'
- 'for_each_zone'
@ -392,10 +435,12 @@ ForEachMacros:
- 'func_for_each_insn'
- 'fwnode_for_each_available_child_node'
- 'fwnode_for_each_child_node'
- 'fwnode_for_each_parent_node'
- 'fwnode_graph_for_each_endpoint'
- 'gadget_for_each_ep'
- 'genradix_for_each'
- 'genradix_for_each_from'
- 'genradix_for_each_reverse'
- 'hash_for_each'
- 'hash_for_each_possible'
- 'hash_for_each_possible_rcu'
@ -439,14 +484,9 @@ ForEachMacros:
- 'in_dev_for_each_ifa_rcu'
- 'in_dev_for_each_ifa_rtnl'
- 'inet_bind_bucket_for_each'
- 'inet_lhash2_for_each_icsk'
- 'inet_lhash2_for_each_icsk_continue'
- 'inet_lhash2_for_each_icsk_rcu'
- 'interval_tree_for_each_double_span'
- 'interval_tree_for_each_span'
- 'intlist__for_each_entry'
- 'intlist__for_each_entry_safe'
- 'iopt_for_each_contig_area'
- 'kcore_copy__for_each_phdr'
- 'key_for_each'
- 'key_for_each_safe'
@ -483,28 +523,38 @@ ForEachMacros:
- 'list_for_each_from'
- 'list_for_each_prev'
- 'list_for_each_prev_safe'
- 'list_for_each_rcu'
- 'list_for_each_reverse'
- 'list_for_each_safe'
- 'llist_for_each'
- 'llist_for_each_entry'
- 'llist_for_each_entry_safe'
- 'llist_for_each_safe'
- 'lwq_for_each_safe'
- 'map__for_each_symbol'
- 'map__for_each_symbol_by_name'
- 'map_for_each_event'
- 'map_for_each_metric'
- 'maps__for_each_entry'
- 'maps__for_each_entry_safe'
- 'mas_for_each'
- 'mci_for_each_dimm'
- 'media_device_for_each_entity'
- 'media_device_for_each_intf'
- 'media_device_for_each_link'
- 'media_device_for_each_pad'
- 'media_entity_for_each_pad'
- 'media_pipeline_for_each_entity'
- 'media_pipeline_for_each_pad'
- 'mlx5_lag_for_each_peer_mdev'
- 'msi_domain_for_each_desc'
- 'msi_for_each_desc'
- 'mt_for_each'
- 'nanddev_io_for_each_page'
- 'netdev_for_each_lower_dev'
- 'netdev_for_each_lower_private'
- 'netdev_for_each_lower_private_rcu'
- 'netdev_for_each_mc_addr'
- 'netdev_for_each_synced_mc_addr'
- 'netdev_for_each_synced_uc_addr'
- 'netdev_for_each_uc_addr'
- 'netdev_for_each_upper_dev_rcu'
- 'netdev_hw_addr_list_for_each'
@ -529,6 +579,7 @@ ForEachMacros:
- 'perf_config_sections__for_each_entry'
- 'perf_config_set__for_each_entry'
- 'perf_cpu_map__for_each_cpu'
- 'perf_cpu_map__for_each_idx'
- 'perf_evlist__for_each_entry'
- 'perf_evlist__for_each_entry_reverse'
- 'perf_evlist__for_each_entry_safe'
@ -538,9 +589,7 @@ ForEachMacros:
- 'perf_hpp_list__for_each_format_safe'
- 'perf_hpp_list__for_each_sort_list'
- 'perf_hpp_list__for_each_sort_list_safe'
- 'perf_pmu__for_each_hybrid_pmu'
- 'ping_portaddr_for_each_entry'
- 'ping_portaddr_for_each_entry_rcu'
- 'perf_tool_event__for_each_event'
- 'plist_for_each'
- 'plist_for_each_continue'
- 'plist_for_each_entry'
@ -577,6 +626,7 @@ ForEachMacros:
- 'rq_for_each_segment'
- 'rq_list_for_each'
- 'rq_list_for_each_safe'
- 'sample_read_group__for_each'
- 'scsi_for_each_prot_sg'
- 'scsi_for_each_sg'
- 'sctp_for_each_hentry'
@ -584,10 +634,12 @@ ForEachMacros:
- 'sec_for_each_insn'
- 'sec_for_each_insn_continue'
- 'sec_for_each_insn_from'
- 'sec_for_each_sym'
- 'shdma_for_each_chan'
- 'shost_for_each_device'
- 'sk_for_each'
- 'sk_for_each_bound'
- 'sk_for_each_bound_bhash2'
- 'sk_for_each_entry_offset_rcu'
- 'sk_for_each_from'
- 'sk_for_each_rcu'
@ -609,6 +661,8 @@ ForEachMacros:
- 'tb_property_for_each'
- 'tcf_act_for_each_action'
- 'tcf_exts_for_each_action'
- 'ttm_resource_manager_for_each_res'
- 'twsk_for_each_bound_bhash2'
- 'udp_portaddr_for_each_entry'
- 'udp_portaddr_for_each_entry_rcu'
- 'usb_hub_for_each_child'

32
.editorconfig Normal file
View file

@ -0,0 +1,32 @@
# SPDX-License-Identifier: GPL-2.0-only
root = true
[{*.{awk,c,dts,dtsi,dtso,h,mk,s,S},Kconfig,Makefile,Makefile.*}]
charset = utf-8
end_of_line = lf
trim_trailing_whitespace = true
insert_final_newline = true
indent_style = tab
indent_size = 8
[*.{json,py,rs}]
charset = utf-8
end_of_line = lf
trim_trailing_whitespace = true
insert_final_newline = true
indent_style = space
indent_size = 4
# this must be below the general *.py to overwrite it
[tools/{perf,power,rcu,testing/kunit}/**.py,]
indent_style = tab
indent_size = 8
[*.yaml]
charset = utf-8
end_of_line = lf
trim_trailing_whitespace = unset
insert_final_newline = true
indent_style = space
indent_size = 2

1
.gitignore vendored
View file

@ -96,6 +96,7 @@ modules.order
#
!.clang-format
!.cocciconfig
!.editorconfig
!.get_maintainer.ignore
!.gitattributes
!.gitignore

View file

@ -117,6 +117,7 @@ Changbin Du <changbin.du@intel.com> <changbin.du@gmail.com>
Changbin Du <changbin.du@intel.com> <changbin.du@intel.com>
Chao Yu <chao@kernel.org> <chao2.yu@samsung.com>
Chao Yu <chao@kernel.org> <yuchao0@huawei.com>
Chester Lin <chester62515@gmail.com> <clin@suse.com>
Chris Chiu <chris.chiu@canonical.com> <chiu@endlessm.com>
Chris Chiu <chris.chiu@canonical.com> <chiu@endlessos.org>
Chris Lew <quic_clew@quicinc.com> <clew@codeaurora.org>
@ -190,6 +191,10 @@ Gao Xiang <xiang@kernel.org> <gaoxiang25@huawei.com>
Gao Xiang <xiang@kernel.org> <hsiangkao@aol.com>
Gao Xiang <xiang@kernel.org> <hsiangkao@linux.alibaba.com>
Gao Xiang <xiang@kernel.org> <hsiangkao@redhat.com>
Geliang Tang <geliang.tang@linux.dev> <geliang.tang@suse.com>
Geliang Tang <geliang.tang@linux.dev> <geliangtang@xiaomi.com>
Geliang Tang <geliang.tang@linux.dev> <geliangtang@gmail.com>
Geliang Tang <geliang.tang@linux.dev> <geliangtang@163.com>
Georgi Djakov <djakov@kernel.org> <georgi.djakov@linaro.org>
Gerald Schaefer <gerald.schaefer@linux.ibm.com> <geraldsc@de.ibm.com>
Gerald Schaefer <gerald.schaefer@linux.ibm.com> <gerald.schaefer@de.ibm.com>
@ -265,6 +270,9 @@ Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
Jernej Skrabec <jernej.skrabec@gmail.com> <jernej.skrabec@siol.net>
Jessica Zhang <quic_jesszhan@quicinc.com> <jesszhan@codeaurora.org>
Jilai Wang <quic_jilaiw@quicinc.com> <jilaiw@codeaurora.org>
Jiri Kosina <jikos@kernel.org> <jikos@jikos.cz>
Jiri Kosina <jikos@kernel.org> <jkosina@suse.cz>
Jiri Kosina <jikos@kernel.org> <jkosina@suse.com>
Jiri Pirko <jiri@resnulli.us> <jiri@nvidia.com>
Jiri Pirko <jiri@resnulli.us> <jiri@mellanox.com>
Jiri Pirko <jiri@resnulli.us> <jpirko@redhat.com>
@ -355,7 +363,6 @@ Maheshwar Ajja <quic_majja@quicinc.com> <majja@codeaurora.org>
Malathi Gottam <quic_mgottam@quicinc.com> <mgottam@codeaurora.org>
Manikanta Pubbisetty <quic_mpubbise@quicinc.com> <mpubbise@codeaurora.org>
Manivannan Sadhasivam <mani@kernel.org> <manivannanece23@gmail.com>
Manivannan Sadhasivam <mani@kernel.org> <manivannan.sadhasivam@linaro.org>
Manoj Basapathi <quic_manojbm@quicinc.com> <manojbm@codeaurora.org>
Marcin Nowakowski <marcin.nowakowski@mips.com> <marcin.nowakowski@imgtec.com>
Marc Zyngier <maz@kernel.org> <marc.zyngier@arm.com>
@ -369,7 +376,7 @@ Martin Kepplinger <martink@posteo.de> <martin.kepplinger@ginzinger.com>
Martin Kepplinger <martink@posteo.de> <martin.kepplinger@puri.sm>
Martin Kepplinger <martink@posteo.de> <martin.kepplinger@theobroma-systems.com>
Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> <martyna.szapar-mudlaw@intel.com>
Mathieu Othacehe <m.othacehe@gmail.com>
Mathieu Othacehe <m.othacehe@gmail.com> <othacehe@gnu.org>
Mat Martineau <martineau@kernel.org> <mathew.j.martineau@linux.intel.com>
Mat Martineau <martineau@kernel.org> <mathewm@codeaurora.org>
Matthew Wilcox <willy@infradead.org> <matthew.r.wilcox@intel.com>
@ -383,9 +390,10 @@ Matthias Fuchs <socketcan@esd.eu> <matthias.fuchs@esd.eu>
Matthieu Baerts <matttbe@kernel.org> <matthieu.baerts@tessares.net>
Matthieu CASTET <castet.matthieu@free.fr>
Matti Vaittinen <mazziesaccount@gmail.com> <matti.vaittinen@fi.rohmeurope.com>
Matt Ranostay <matt.ranostay@konsulko.com> <matt@ranostay.consulting>
Matt Ranostay <mranostay@gmail.com> Matthew Ranostay <mranostay@embeddedalley.com>
Matt Ranostay <mranostay@gmail.com> <matt.ranostay@intel.com>
Matt Ranostay <matt@ranostay.sg> <matt.ranostay@konsulko.com>
Matt Ranostay <matt@ranostay.sg> <matt@ranostay.consulting>
Matt Ranostay <matt@ranostay.sg> Matthew Ranostay <mranostay@embeddedalley.com>
Matt Ranostay <matt@ranostay.sg> <matt.ranostay@intel.com>
Matt Redfearn <matt.redfearn@mips.com> <matt.redfearn@imgtec.com>
Maulik Shah <quic_mkshah@quicinc.com> <mkshah@codeaurora.org>
Mauro Carvalho Chehab <mchehab@kernel.org> <maurochehab@gmail.com>
@ -428,6 +436,7 @@ Muna Sinada <quic_msinada@quicinc.com> <msinada@codeaurora.org>
Murali Nalajala <quic_mnalajal@quicinc.com> <mnalajal@codeaurora.org>
Mythri P K <mythripk@ti.com>
Nadia Yvette Chambers <nyc@holomorphy.com> William Lee Irwin III <wli@holomorphy.com>
Naoya Horiguchi <naoya.horiguchi@nec.com> <n-horiguchi@ah.jp.nec.com>
Nathan Chancellor <nathan@kernel.org> <natechancellor@gmail.com>
Neeraj Upadhyay <quic_neeraju@quicinc.com> <neeraju@codeaurora.org>
Neil Armstrong <neil.armstrong@linaro.org> <narmstrong@baylibre.com>
@ -469,6 +478,8 @@ Paul E. McKenney <paulmck@kernel.org> <paulmck@linux.vnet.ibm.com>
Paul E. McKenney <paulmck@kernel.org> <paulmck@us.ibm.com>
Paul Mackerras <paulus@ozlabs.org> <paulus@samba.org>
Paul Mackerras <paulus@ozlabs.org> <paulus@au1.ibm.com>
Paul Moore <paul@paul-moore.com> <paul.moore@hp.com>
Paul Moore <paul@paul-moore.com> <pmoore@redhat.com>
Pavankumar Kondeti <quic_pkondeti@quicinc.com> <pkondeti@codeaurora.org>
Peter A Jonsson <pj@ludd.ltu.se>
Peter Oruba <peter.oruba@amd.com>
@ -493,6 +504,9 @@ Ralf Baechle <ralf@linux-mips.org>
Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
Ram Chandra Jangir <quic_rjangir@quicinc.com> <rjangir@codeaurora.org>
Randy Dunlap <rdunlap@infradead.org> <rdunlap@xenotime.net>
Randy Dunlap <rdunlap@infradead.org> <randy.dunlap@oracle.com>
Randy Dunlap <rdunlap@infradead.org> <rddunlap@osdl.org>
Randy Dunlap <rdunlap@infradead.org> <randy.dunlap@intel.com>
Ravi Kumar Bokka <quic_rbokka@quicinc.com> <rbokka@codeaurora.org>
Ravi Kumar Siddojigari <quic_rsiddoji@quicinc.com> <rsiddoji@codeaurora.org>
Rémi Denis-Courmont <rdenis@simphalempin.com>
@ -533,6 +547,8 @@ Sebastian Reichel <sre@kernel.org> <sebastian.reichel@collabora.co.uk>
Sebastian Reichel <sre@kernel.org> <sre@debian.org>
Sedat Dilek <sedat.dilek@gmail.com> <sedat.dilek@credativ.de>
Senthilkumar N L <quic_snlakshm@quicinc.com> <snlakshm@codeaurora.org>
Serge Hallyn <sergeh@kernel.org> <serge.hallyn@canonical.com>
Serge Hallyn <sergeh@kernel.org> <serue@us.ibm.com>
Seth Forshee <sforshee@kernel.org> <seth.forshee@canonical.com>
Shannon Nelson <shannon.nelson@amd.com> <snelson@pensando.io>
Shannon Nelson <shannon.nelson@amd.com> <shannon.nelson@intel.com>
@ -569,6 +585,7 @@ Surabhi Vishnoi <quic_svishnoi@quicinc.com> <svishnoi@codeaurora.org>
Takashi YOSHII <takashi.yoshii.zj@renesas.com>
Tamizh Chelvam Raja <quic_tamizhr@quicinc.com> <tamizhr@codeaurora.org>
Taniya Das <quic_tdas@quicinc.com> <tdas@codeaurora.org>
Tanzir Hasan <tanzhasanwork@gmail.com> <tanzirh@google.com>
Tejun Heo <htejun@gmail.com>
Tomeu Vizoso <tomeu@tomeuvizoso.net> <tomeu.vizoso@collabora.com>
Thomas Graf <tgraf@suug.ch>
@ -595,6 +612,9 @@ Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Uwe Kleine-König <ukleinek@strlen.de>
Uwe Kleine-König <ukl@pengutronix.de>
Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Vadim Fedorenko <vadim.fedorenko@linux.dev> <vadfed@fb.com>
Vadim Fedorenko <vadim.fedorenko@linux.dev> <vadfed@meta.com>
Vadim Fedorenko <vadim.fedorenko@linux.dev> <vfedorenko@novek.ru>
Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Vara Reddy <quic_varar@quicinc.com> <varar@codeaurora.org>
Varadarajan Narayanan <quic_varada@quicinc.com> <varada@codeaurora.org>
@ -629,4 +649,5 @@ Wolfram Sang <wsa@kernel.org> <w.sang@pengutronix.de>
Wolfram Sang <wsa@kernel.org> <wsa@the-dreams.de>
Yakir Yang <kuankuan.y@gmail.com> <ykk@rock-chips.com>
Yusuke Goda <goda.yusuke@renesas.com>
Zack Rusin <zack.rusin@broadcom.com> <zackr@vmware.com>
Zhu Yanjun <zyjzyj2000@gmail.com> <yanjunz@nvidia.com>

67
CREDITS
View file

@ -9,10 +9,6 @@
Linus
----------
N: Matt Mackal
E: mpm@selenic.com
D: SLOB slab allocator
N: Matti Aarnio
E: mea@nic.funet.fi
D: Alpha systems hacking, IPv6 and other network related stuff
@ -183,6 +179,7 @@ E: ralf@gnu.org
P: 1024/AF7B30C1 CF 97 C2 CC 6D AE A7 FE C8 BA 9C FC 88 DE 32 C3
D: Linux/MIPS port
D: Linux/68k hacker
D: AX25 maintainer
S: Hauptstrasse 19
S: 79837 St. Blasien
S: Germany
@ -323,6 +320,9 @@ N: Ohad Ben Cohen
E: ohad@wizery.com
D: Remote Processor (remoteproc) subsystem
D: Remote Processor Messaging (rpmsg) subsystem
D: Hardware spinlock (hwspinlock) subsystem
D: OMAP hwspinlock driver
D: OMAP remoteproc driver
N: Krzysztof Benedyczak
E: golbi@mat.uni.torun.pl
@ -678,6 +678,10 @@ D: Media subsystem (V4L/DVB) drivers and core
D: EDAC drivers and EDAC 3.0 core rework
S: Brazil
N: Landen Chao
E: Landen.Chao@mediatek.com
D: MT7531 Ethernet switch support
N: Raymond Chen
E: raymondc@microsoft.com
D: Author of Configure script
@ -815,6 +819,10 @@ D: Support for Xircom PGSDB9 (firmware and host driver)
S: Bucharest
S: Romania
N: John Crispin
E: john@phrozen.org
D: MediaTek MT7623 Gigabit ethernet support
N: Laurence Culhane
E: loz@holmes.demon.co.uk
D: Wrote the initial alpha SLIP code
@ -1425,6 +1433,10 @@ S: University of Stellenbosch
S: Stellenbosch, Western Cape
S: South Africa
N: Andy Gross
E: agross@kernel.org
D: Qualcomm SoC subsystem and drivers
N: Grant Grundler
E: grantgrundler@gmail.com
W: http://obmouse.sourceforge.net/
@ -1535,6 +1547,10 @@ N: Andrew Haylett
E: ajh@primag.co.uk
D: Selection mechanism
N: Johan Hedberg
E: johan.hedberg@gmail.com
D: Bluetooth subsystem maintainer
N: Andre Hedrick
E: andre@linux-ide.org
E: andre@linuxdiskcert.org
@ -1572,6 +1588,10 @@ S: Ampferstr. 50 / 4
S: 6020 Innsbruck
S: Austria
N: Mark Hemment
E: markhe@nextd.demon.co.uk
D: SLAB allocator implementation
N: Richard Henderson
E: rth@twiddle.net
E: rth@cygnus.com
@ -1828,6 +1848,13 @@ S: K osmidomkum 723
S: 160 00 Praha 6
S: Czech Republic
N: Michael Kerrisk
E: mtk.manpages@gmail.com
W: https://man7.org/
P: 4096R/3A35CE5E E522 595B 52ED A4E6 BFCC CB5E 8561 9911 3A35 CE5E
D: Maintainer of the Linux man-pages project
D: Linux man pages online, at <https://man7.org/linux/man-pages/>
N: Niels Kristian Bech Jensen
E: nkbj1970@hotmail.com
D: Miscellaneous kernel updates and fixes.
@ -1855,6 +1882,10 @@ D: Fedora kernel maintenance (2003-2014).
D: 'Trinity' and similar fuzz testing work.
D: Misc/Other.
N: Tom Joseph
E: tjoseph@cadence.com
D: Cadence PCIe driver
N: Martin Josfsson
E: gandalf@wlug.westbo.se
P: 1024D/F6B6D3B1 7610 7CED 5C34 4AA6 DBA2 8BE1 5A6D AF95 F6B6 D3B1
@ -2126,6 +2157,10 @@ S: 2213 La Terrace Circle
S: San Jose, CA 95123
S: USA
N: Mike Kravetz
E: mike.kravetz@oracle.com
D: Maintenance and development of the hugetlb subsystem
N: Andreas S. Krebs
E: akrebs@altavista.net
D: CYPRESS CY82C693 chipset IDE, Digital's PC-Alpha 164SX boards
@ -2437,6 +2472,10 @@ D: work on suspend-to-ram/disk, killing duplicates from ioctl32,
D: Altera SoCFPGA and Nokia N900 support.
S: Czech Republic
N: Olivia Mackall
E: olivia@selenic.com
D: SLOB slab allocator
N: Paul Mackerras
E: paulus@samba.org
D: PPP driver
@ -2944,6 +2983,14 @@ D: IPX development and support
N: Venkatesh Pallipadi (Venki)
D: x86/HPET
N: Antti Palosaari
E: crope@iki.fi
D: Various DVB drivers
W: https://palosaari.fi/linux/
S: Yliopistokatu 1 D 513
S: FI-90570 Oulu
S: FINLAND
N: Kyungmin Park
E: kyungmin.park@samsung.com
D: Samsung S5Pv210 and Exynos4210 mobile platforms
@ -3018,6 +3065,10 @@ S: Demonstratsii 8-382
S: Tula 300000
S: Russia
N: Thomas Petazzoni
E: thomas.petazzoni@bootlin.com
D: Driver for the Marvell Armada 370/XP network unit.
N: Gordon Peters
E: GordPeters@smarttech.com
D: Isochronous receive for IEEE 1394 driver (OHCI module).
@ -3916,6 +3967,10 @@ S: 21513 Conradia Ct
S: Cupertino, CA 95014
S: USA
N: Manohar Vanga
E: manohar.vanga@gmail.com
D: VME subsystem maintainer
N: Thibaut Varène
E: hacks+kernel@slashdirt.org
W: http://hacks.slashdirt.org/
@ -4016,6 +4071,10 @@ D: Fixes for the NE/2-driver
D: Miscellaneous MCA-support
D: Cleanup of the Config-files
N: Martyn Welch
E: martyn@welchs.me.uk
D: VME subsystem maintainer
N: Matt Welsh
E: mdw@metalab.unc.edu
W: http://www.cs.berkeley.edu/~mdw

View file

@ -1,4 +1,4 @@
What: /sys/kernel/debug/accel/<n>/addr
What: /sys/kernel/debug/accel/<parent_device>/addr
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org
@ -8,34 +8,34 @@ Description: Sets the device address to be used for read or write through
only when the IOMMU is disabled.
The acceptable value is a string that starts with "0x"
What: /sys/kernel/debug/accel/<n>/clk_gate
What: /sys/kernel/debug/accel/<parent_device>/clk_gate
Date: May 2020
KernelVersion: 5.8
Contact: ogabbay@kernel.org
Description: This setting is now deprecated as clock gating is handled solely by the f/w
What: /sys/kernel/debug/accel/<n>/command_buffers
What: /sys/kernel/debug/accel/<parent_device>/command_buffers
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org
Description: Displays a list with information about the currently allocated
command buffers
What: /sys/kernel/debug/accel/<n>/command_submission
What: /sys/kernel/debug/accel/<parent_device>/command_submission
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org
Description: Displays a list with information about the currently active
command submissions
What: /sys/kernel/debug/accel/<n>/command_submission_jobs
What: /sys/kernel/debug/accel/<parent_device>/command_submission_jobs
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org
Description: Displays a list with detailed information about each JOB (CB) of
each active command submission
What: /sys/kernel/debug/accel/<n>/data32
What: /sys/kernel/debug/accel/<parent_device>/data32
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org
@ -50,7 +50,7 @@ Description: Allows the root user to read or write directly through the
If the IOMMU is disabled, it also allows the root user to read
or write from the host a device VA of a host mapped memory
What: /sys/kernel/debug/accel/<n>/data64
What: /sys/kernel/debug/accel/<parent_device>/data64
Date: Jan 2020
KernelVersion: 5.6
Contact: ogabbay@kernel.org
@ -65,7 +65,7 @@ Description: Allows the root user to read or write 64 bit data directly
If the IOMMU is disabled, it also allows the root user to read
or write from the host a device VA of a host mapped memory
What: /sys/kernel/debug/accel/<n>/data_dma
What: /sys/kernel/debug/accel/<parent_device>/data_dma
Date: Apr 2021
KernelVersion: 5.13
Contact: ogabbay@kernel.org
@ -83,7 +83,7 @@ Description: Allows the root user to read from the device's internal
workloads.
Only supported on GAUDI at this stage.
What: /sys/kernel/debug/accel/<n>/device
What: /sys/kernel/debug/accel/<parent_device>/device
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org
@ -91,14 +91,14 @@ Description: Enables the root user to set the device to specific state.
Valid values are "disable", "enable", "suspend", "resume".
User can read this property to see the valid values
What: /sys/kernel/debug/accel/<n>/device_release_watchdog_timeout
What: /sys/kernel/debug/accel/<parent_device>/device_release_watchdog_timeout
Date: Oct 2022
KernelVersion: 6.2
Contact: ttayar@habana.ai
Description: The watchdog timeout value in seconds for a device release upon
certain error cases, after which the device is reset.
What: /sys/kernel/debug/accel/<n>/dma_size
What: /sys/kernel/debug/accel/<parent_device>/dma_size
Date: Apr 2021
KernelVersion: 5.13
Contact: ogabbay@kernel.org
@ -108,7 +108,7 @@ Description: Specify the size of the DMA transaction when using DMA to read
When the write is finished, the user can read the "data_dma"
blob
What: /sys/kernel/debug/accel/<n>/dump_razwi_events
What: /sys/kernel/debug/accel/<parent_device>/dump_razwi_events
Date: Aug 2022
KernelVersion: 5.20
Contact: fkassabri@habana.ai
@ -117,7 +117,7 @@ Description: Dumps all razwi events to dmesg if exist.
the routine will clear the status register.
Usage: cat dump_razwi_events
What: /sys/kernel/debug/accel/<n>/dump_security_violations
What: /sys/kernel/debug/accel/<parent_device>/dump_security_violations
Date: Jan 2021
KernelVersion: 5.12
Contact: ogabbay@kernel.org
@ -125,14 +125,14 @@ Description: Dumps all security violations to dmesg. This will also ack
all security violations meanings those violations will not be
dumped next time user calls this API
What: /sys/kernel/debug/accel/<n>/engines
What: /sys/kernel/debug/accel/<parent_device>/engines
Date: Jul 2019
KernelVersion: 5.3
Contact: ogabbay@kernel.org
Description: Displays the status registers values of the device engines and
their derived idle status
What: /sys/kernel/debug/accel/<n>/i2c_addr
What: /sys/kernel/debug/accel/<parent_device>/i2c_addr
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org
@ -140,7 +140,7 @@ Description: Sets I2C device address for I2C transaction that is generated
by the device's CPU, Not available when device is loaded with secured
firmware
What: /sys/kernel/debug/accel/<n>/i2c_bus
What: /sys/kernel/debug/accel/<parent_device>/i2c_bus
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org
@ -148,7 +148,7 @@ Description: Sets I2C bus address for I2C transaction that is generated by
the device's CPU, Not available when device is loaded with secured
firmware
What: /sys/kernel/debug/accel/<n>/i2c_data
What: /sys/kernel/debug/accel/<parent_device>/i2c_data
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org
@ -157,7 +157,7 @@ Description: Triggers an I2C transaction that is generated by the device's
reading from the file generates a read transaction, Not available
when device is loaded with secured firmware
What: /sys/kernel/debug/accel/<n>/i2c_len
What: /sys/kernel/debug/accel/<parent_device>/i2c_len
Date: Dec 2021
KernelVersion: 5.17
Contact: obitton@habana.ai
@ -165,7 +165,7 @@ Description: Sets I2C length in bytes for I2C transaction that is generated b
the device's CPU, Not available when device is loaded with secured
firmware
What: /sys/kernel/debug/accel/<n>/i2c_reg
What: /sys/kernel/debug/accel/<parent_device>/i2c_reg
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org
@ -173,35 +173,35 @@ Description: Sets I2C register id for I2C transaction that is generated by
the device's CPU, Not available when device is loaded with secured
firmware
What: /sys/kernel/debug/accel/<n>/led0
What: /sys/kernel/debug/accel/<parent_device>/led0
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org
Description: Sets the state of the first S/W led on the device, Not available
when device is loaded with secured firmware
What: /sys/kernel/debug/accel/<n>/led1
What: /sys/kernel/debug/accel/<parent_device>/led1
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org
Description: Sets the state of the second S/W led on the device, Not available
when device is loaded with secured firmware
What: /sys/kernel/debug/accel/<n>/led2
What: /sys/kernel/debug/accel/<parent_device>/led2
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org
Description: Sets the state of the third S/W led on the device, Not available
when device is loaded with secured firmware
What: /sys/kernel/debug/accel/<n>/memory_scrub
What: /sys/kernel/debug/accel/<parent_device>/memory_scrub
Date: May 2022
KernelVersion: 5.19
Contact: dhirschfeld@habana.ai
Description: Allows the root user to scrub the dram memory. The scrubbing
value can be set using the debugfs file memory_scrub_val.
What: /sys/kernel/debug/accel/<n>/memory_scrub_val
What: /sys/kernel/debug/accel/<parent_device>/memory_scrub_val
Date: May 2022
KernelVersion: 5.19
Contact: dhirschfeld@habana.ai
@ -209,7 +209,7 @@ Description: The value to which the dram will be set to when the user
scrubs the dram using 'memory_scrub' debugfs file and
the scrubbing value when using module param 'memory_scrub'
What: /sys/kernel/debug/accel/<n>/mmu
What: /sys/kernel/debug/accel/<parent_device>/mmu
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org
@ -219,7 +219,7 @@ Description: Displays the hop values and physical address for a given ASID
e.g. to display info about VA 0x1000 for ASID 1 you need to do:
echo "1 0x1000" > /sys/kernel/debug/accel/0/mmu
What: /sys/kernel/debug/accel/<n>/mmu_error
What: /sys/kernel/debug/accel/<parent_device>/mmu_error
Date: Mar 2021
KernelVersion: 5.12
Contact: fkassabri@habana.ai
@ -229,7 +229,7 @@ Description: Check and display page fault or access violation mmu errors for
echo "0x200" > /sys/kernel/debug/accel/0/mmu_error
cat /sys/kernel/debug/accel/0/mmu_error
What: /sys/kernel/debug/accel/<n>/monitor_dump
What: /sys/kernel/debug/accel/<parent_device>/monitor_dump
Date: Mar 2022
KernelVersion: 5.19
Contact: osharabi@habana.ai
@ -243,7 +243,7 @@ Description: Allows the root user to dump monitors status from the device's
This interface doesn't support concurrency in the same device.
Only supported on GAUDI.
What: /sys/kernel/debug/accel/<n>/monitor_dump_trig
What: /sys/kernel/debug/accel/<parent_device>/monitor_dump_trig
Date: Mar 2022
KernelVersion: 5.19
Contact: osharabi@habana.ai
@ -253,14 +253,14 @@ Description: Triggers dump of monitor data. The value to trigger the operatio
When the write is finished, the user can read the "monitor_dump"
blob
What: /sys/kernel/debug/accel/<n>/set_power_state
What: /sys/kernel/debug/accel/<parent_device>/set_power_state
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org
Description: Sets the PCI power state. Valid values are "1" for D0 and "2"
for D3Hot
What: /sys/kernel/debug/accel/<n>/skip_reset_on_timeout
What: /sys/kernel/debug/accel/<parent_device>/skip_reset_on_timeout
Date: Jun 2021
KernelVersion: 5.13
Contact: ynudelman@habana.ai
@ -268,7 +268,7 @@ Description: Sets the skip reset on timeout option for the device. Value of
"0" means device will be reset in case some CS has timed out,
otherwise it will not be reset.
What: /sys/kernel/debug/accel/<n>/state_dump
What: /sys/kernel/debug/accel/<parent_device>/state_dump
Date: Oct 2021
KernelVersion: 5.15
Contact: ynudelman@habana.ai
@ -279,7 +279,7 @@ Description: Gets the state dump occurring on a CS timeout or failure.
Writing an integer X discards X state dumps, so that the
next read would return X+1-st newest state dump.
What: /sys/kernel/debug/accel/<n>/stop_on_err
What: /sys/kernel/debug/accel/<parent_device>/stop_on_err
Date: Mar 2020
KernelVersion: 5.6
Contact: ogabbay@kernel.org
@ -287,13 +287,13 @@ Description: Sets the stop-on_error option for the device engines. Value of
"0" is for disable, otherwise enable.
Relevant only for GOYA and GAUDI.
What: /sys/kernel/debug/accel/<n>/timeout_locked
What: /sys/kernel/debug/accel/<parent_device>/timeout_locked
Date: Sep 2021
KernelVersion: 5.16
Contact: obitton@habana.ai
Description: Sets the command submission timeout value in seconds.
What: /sys/kernel/debug/accel/<n>/userptr
What: /sys/kernel/debug/accel/<parent_device>/userptr
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org
@ -301,7 +301,7 @@ Description: Displays a list with information about the current user
pointers (user virtual addresses) that are pinned and mapped
to DMA addresses
What: /sys/kernel/debug/accel/<n>/userptr_lookup
What: /sys/kernel/debug/accel/<parent_device>/userptr_lookup
Date: Oct 2021
KernelVersion: 5.15
Contact: ogabbay@kernel.org
@ -309,7 +309,7 @@ Description: Allows to search for specific user pointers (user virtual
addresses) that are pinned and mapped to DMA addresses, and see
their resolution to the specific dma address.
What: /sys/kernel/debug/accel/<n>/vm
What: /sys/kernel/debug/accel/<parent_device>/vm
Date: Jan 2019
KernelVersion: 5.1
Contact: ogabbay@kernel.org

View file

@ -0,0 +1,228 @@
What: /sys/kernel/debug/qat_<device>_<BDF>/telemetry/control
Date: March 2024
KernelVersion: 6.8
Contact: qat-linux@intel.com
Description: (RW) Enables/disables the reporting of telemetry metrics.
Allowed values to write:
========================
* 0: disable telemetry
* 1: enable telemetry
* 2, 3, 4: enable telemetry and calculate minimum, maximum
and average for each counter over 2, 3 or 4 samples
Returned values:
================
* 1-4: telemetry is enabled and running
* 0: telemetry is disabled
Example.
Writing '3' to this file starts the collection of
telemetry metrics. Samples are collected every second and
stored in a circular buffer of size 3. These values are then
used to calculate the minimum, maximum and average for each
counter. After enabling, counters can be retrieved through
the ``device_data`` file::
echo 3 > /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/control
Writing '0' to this file stops the collection of telemetry
metrics::
echo 0 > /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/control
This attribute is only available for qat_4xxx devices.
What: /sys/kernel/debug/qat_<device>_<BDF>/telemetry/device_data
Date: March 2024
KernelVersion: 6.8
Contact: qat-linux@intel.com
Description: (RO) Reports device telemetry counters.
Reads report metrics about performance and utilization of
a QAT device:
======================= ========================================
Field Description
======================= ========================================
sample_cnt number of acquisitions of telemetry data
from the device. Reads are performed
every 1000 ms.
pci_trans_cnt number of PCIe partial transactions
max_rd_lat maximum logged read latency [ns] (could
be any read operation)
rd_lat_acc_avg average read latency [ns]
max_gp_lat max get to put latency [ns] (only takes
samples for AE0)
gp_lat_acc_avg average get to put latency [ns]
bw_in PCIe, write bandwidth [Mbps]
bw_out PCIe, read bandwidth [Mbps]
at_page_req_lat_avg Address Translator(AT), average page
request latency [ns]
at_trans_lat_avg AT, average page translation latency [ns]
at_max_tlb_used AT, maximum uTLB used
util_cpr<N> utilization of Compression slice N [%]
exec_cpr<N> execution count of Compression slice N
util_xlt<N> utilization of Translator slice N [%]
exec_xlt<N> execution count of Translator slice N
util_dcpr<N> utilization of Decompression slice N [%]
exec_dcpr<N> execution count of Decompression slice N
util_pke<N> utilization of PKE N [%]
exec_pke<N> execution count of PKE N
util_ucs<N> utilization of UCS slice N [%]
exec_ucs<N> execution count of UCS slice N
util_wat<N> utilization of Wireless Authentication
slice N [%]
exec_wat<N> execution count of Wireless Authentication
slice N
util_wcp<N> utilization of Wireless Cipher slice N [%]
exec_wcp<N> execution count of Wireless Cipher slice N
util_cph<N> utilization of Cipher slice N [%]
exec_cph<N> execution count of Cipher slice N
util_ath<N> utilization of Authentication slice N [%]
exec_ath<N> execution count of Authentication slice N
======================= ========================================
The telemetry report file can be read with the following command::
cat /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/device_data
If ``control`` is set to 1, only the current values of the
counters are displayed::
<counter_name> <current>
If ``control`` is 2, 3 or 4, counters are displayed in the
following format::
<counter_name> <current> <min> <max> <avg>
If a device lacks of a specific accelerator, the corresponding
attribute is not reported.
This attribute is only available for qat_4xxx devices.
What: /sys/kernel/debug/qat_<device>_<BDF>/telemetry/rp_<A/B/C/D>_data
Date: March 2024
KernelVersion: 6.8
Contact: qat-linux@intel.com
Description: (RW) Selects up to 4 Ring Pairs (RP) to monitor, one per file,
and report telemetry counters related to each.
Allowed values to write:
========================
* 0 to ``<num_rps - 1>``:
Ring pair to be monitored. The value of ``num_rps`` can be
retrieved through ``/sys/bus/pci/devices/<BDF>/qat/num_rps``.
See Documentation/ABI/testing/sysfs-driver-qat.
Reads report metrics about performance and utilization of
the selected RP:
======================= ========================================
Field Description
======================= ========================================
sample_cnt number of acquisitions of telemetry data
from the device. Reads are performed
every 1000 ms
rp_num RP number associated with slot <A/B/C/D>
service_type service associated to the RP
pci_trans_cnt number of PCIe partial transactions
gp_lat_acc_avg average get to put latency [ns]
bw_in PCIe, write bandwidth [Mbps]
bw_out PCIe, read bandwidth [Mbps]
at_glob_devtlb_hit Message descriptor DevTLB hit rate
at_glob_devtlb_miss Message descriptor DevTLB miss rate
tl_at_payld_devtlb_hit Payload DevTLB hit rate
tl_at_payld_devtlb_miss Payload DevTLB miss rate
======================= ========================================
Example.
Writing the value '32' to the file ``rp_C_data`` starts the
collection of telemetry metrics for ring pair 32::
echo 32 > /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/rp_C_data
Once a ring pair is selected, statistics can be read accessing
the file::
cat /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/rp_C_data
If ``control`` is set to 1, only the current values of the
counters are displayed::
<counter_name> <current>
If ``control`` is 2, 3 or 4, counters are displayed in the
following format::
<counter_name> <current> <min> <max> <avg>
On QAT GEN4 devices there are 64 RPs on a PF, so the allowed
values are 0..63. This number is absolute to the device.
If Virtual Functions (VF) are used, the ring pair number can
be derived from the Bus, Device, Function of the VF:
============ ====== ====== ====== ======
PCI BDF/VF RP0 RP1 RP2 RP3
============ ====== ====== ====== ======
0000:6b:0.1 RP 0 RP 1 RP 2 RP 3
0000:6b:0.2 RP 4 RP 5 RP 6 RP 7
0000:6b:0.3 RP 8 RP 9 RP 10 RP 11
0000:6b:0.4 RP 12 RP 13 RP 14 RP 15
0000:6b:0.5 RP 16 RP 17 RP 18 RP 19
0000:6b:0.6 RP 20 RP 21 RP 22 RP 23
0000:6b:0.7 RP 24 RP 25 RP 26 RP 27
0000:6b:1.0 RP 28 RP 29 RP 30 RP 31
0000:6b:1.1 RP 32 RP 33 RP 34 RP 35
0000:6b:1.2 RP 36 RP 37 RP 38 RP 39
0000:6b:1.3 RP 40 RP 41 RP 42 RP 43
0000:6b:1.4 RP 44 RP 45 RP 46 RP 47
0000:6b:1.5 RP 48 RP 49 RP 50 RP 51
0000:6b:1.6 RP 52 RP 53 RP 54 RP 55
0000:6b:1.7 RP 56 RP 57 RP 58 RP 59
0000:6b:2.0 RP 60 RP 61 RP 62 RP 63
============ ====== ====== ====== ======
The mapping is only valid for the BDFs of VFs on the host.
The service provided on a ring-pair varies depending on the
configuration. The configuration for a given device can be
queried and set using ``cfg_services``.
See Documentation/ABI/testing/sysfs-driver-qat for details.
The following table reports how ring pairs are mapped to VFs
on the PF 0000:6b:0.0 configured for `sym;asym` or `asym;sym`:
=========== ============ =========== ============ ===========
PCI BDF/VF RP0/service RP1/service RP2/service RP3/service
=========== ============ =========== ============ ===========
0000:6b:0.1 RP 0 asym RP 1 sym RP 2 asym RP 3 sym
0000:6b:0.2 RP 4 asym RP 5 sym RP 6 asym RP 7 sym
0000:6b:0.3 RP 8 asym RP 9 sym RP10 asym RP11 sym
... ... ... ... ...
=========== ============ =========== ============ ===========
All VFs follow the same pattern.
The following table reports how ring pairs are mapped to VFs on
the PF 0000:6b:0.0 configured for `dc`:
=========== ============ =========== ============ ===========
PCI BDF/VF RP0/service RP1/service RP2/service RP3/service
=========== ============ =========== ============ ===========
0000:6b:0.1 RP 0 dc RP 1 dc RP 2 dc RP 3 dc
0000:6b:0.2 RP 4 dc RP 5 dc RP 6 dc RP 7 dc
0000:6b:0.3 RP 8 dc RP 9 dc RP10 dc RP11 dc
... ... ... ... ...
=========== ============ =========== ============ ===========
The mapping of a RP to a service can be retrieved using
``rp2srv`` from sysfs.
See Documentation/ABI/testing/sysfs-driver-qat for details.
This attribute is only available for qat_4xxx devices.

View file

@ -101,7 +101,7 @@ What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/status
Date: Apr 2020
Contact: linux-crypto@vger.kernel.org
Description: Dump the status of the QM.
Four states: initiated, started, stopped and closed.
Two states: work, stop.
Available for both PF and VF, and take no other effect on HPRE.
What: /sys/kernel/debug/hisi_hpre/<bdf>/qm/diff_regs

View file

@ -81,7 +81,7 @@ What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/status
Date: Apr 2020
Contact: linux-crypto@vger.kernel.org
Description: Dump the status of the QM.
Four states: initiated, started, stopped and closed.
Two states: work, stop.
Available for both PF and VF, and take no other effect on SEC.
What: /sys/kernel/debug/hisi_sec2/<bdf>/qm/diff_regs

View file

@ -94,7 +94,7 @@ What: /sys/kernel/debug/hisi_zip/<bdf>/qm/status
Date: Apr 2020
Contact: linux-crypto@vger.kernel.org
Description: Dump the status of the QM.
Four states: initiated, started, stopped and closed.
Two states: work, stop.
Available for both PF and VF, and take no other effect on ZIP.
What: /sys/kernel/debug/hisi_zip/<bdf>/qm/diff_regs

View file

@ -0,0 +1,25 @@
What: /sys/kernel/debug/vfio
Date: December 2023
KernelVersion: 6.8
Contact: Longfang Liu <liulongfang@huawei.com>
Description: This debugfs file directory is used for debugging
of vfio devices, it's a common directory for all vfio devices.
Vfio core will create a device subdirectory under this
directory.
What: /sys/kernel/debug/vfio/<device>/migration
Date: December 2023
KernelVersion: 6.8
Contact: Longfang Liu <liulongfang@huawei.com>
Description: This debugfs file directory is used for debugging
of vfio devices that support live migration.
The debugfs of each vfio device that supports live migration
could be created under this directory.
What: /sys/kernel/debug/vfio/<device>/migration/state
Date: December 2023
KernelVersion: 6.8
Contact: Longfang Liu <liulongfang@huawei.com>
Description: Read the live migration status of the vfio device.
The contents of the state file reflects the migration state
relative to those defined in the vfio_device_mig_state enum

View file

@ -98,6 +98,13 @@ Description:
# echo 1 > /sys/bus/cdx/devices/.../remove
What: /sys/bus/cdx/devices/.../resource<N>
Date: July 2023
Contact: puneet.gupta@amd.com
Description:
The resource binary file contains the content of the memory
regions. These files can be m'maped from userspace.
What: /sys/bus/cdx/devices/.../modalias
Date: July 2023
Contact: nipun.gupta@amd.com

View file

@ -91,3 +91,19 @@ Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (RW) Size of the trace buffer for TMC-ETR when used in SYSFS
mode. Writable only for TMC-ETR configurations. The value
should be aligned to the kernel pagesize.
What: /sys/bus/coresight/devices/<memory_map>.tmc/buf_modes_available
Date: August 2023
KernelVersion: 6.7
Contact: Anshuman Khandual <anshuman.khandual@arm.com>
Description: (Read) Shows all supported Coresight TMC-ETR buffer modes available
for the users to configure explicitly. This file is avaialble only
for TMC ETR devices.
What: /sys/bus/coresight/devices/<memory_map>.tmc/buf_mode_preferred
Date: August 2023
KernelVersion: 6.7
Contact: Anshuman Khandual <anshuman.khandual@arm.com>
Description: (RW) Current Coresight TMC-ETR buffer mode selected. But user could
only provide a mode which is supported for a given ETR device. This
file is available only for TMC ETR devices.

View file

@ -11,3 +11,162 @@ Description:
Accepts only one of the 2 values - 1 or 2.
1 : Generate 64 bits data
2 : Generate 32 bits data
What: /sys/bus/coresight/devices/<tpdm-name>/reset_dataset
Date: March 2023
KernelVersion 6.7
Contact: Jinlong Mao (QUIC) <quic_jinlmao@quicinc.com>, Tao Zhang (QUIC) <quic_taozha@quicinc.com>
Description:
(Write) Reset the dataset of the tpdm.
Accepts only one value - 1.
1 : Reset the dataset of the tpdm
What: /sys/bus/coresight/devices/<tpdm-name>/dsb_trig_type
Date: March 2023
KernelVersion 6.7
Contact: Jinlong Mao (QUIC) <quic_jinlmao@quicinc.com>, Tao Zhang (QUIC) <quic_taozha@quicinc.com>
Description:
(RW) Set/Get the trigger type of the DSB for tpdm.
Accepts only one of the 2 values - 0 or 1.
0 : Set the DSB trigger type to false
1 : Set the DSB trigger type to true
What: /sys/bus/coresight/devices/<tpdm-name>/dsb_trig_ts
Date: March 2023
KernelVersion 6.7
Contact: Jinlong Mao (QUIC) <quic_jinlmao@quicinc.com>, Tao Zhang (QUIC) <quic_taozha@quicinc.com>
Description:
(RW) Set/Get the trigger timestamp of the DSB for tpdm.
Accepts only one of the 2 values - 0 or 1.
0 : Set the DSB trigger type to false
1 : Set the DSB trigger type to true
What: /sys/bus/coresight/devices/<tpdm-name>/dsb_mode
Date: March 2023
KernelVersion 6.7
Contact: Jinlong Mao (QUIC) <quic_jinlmao@quicinc.com>, Tao Zhang (QUIC) <quic_taozha@quicinc.com>
Description:
(RW) Set/Get the programming mode of the DSB for tpdm.
Accepts the value needs to be greater than 0. What data
bits do is listed below.
Bit[0:1] : Test mode control bit for choosing the inputs.
Bit[3] : Set to 0 for low performance mode. Set to 1 for high
performance mode.
Bit[4:8] : Select byte lane for high performance mode.
What: /sys/bus/coresight/devices/<tpdm-name>/dsb_edge/ctrl_idx
Date: March 2023
KernelVersion 6.7
Contact: Jinlong Mao (QUIC) <quic_jinlmao@quicinc.com>, Tao Zhang (QUIC) <quic_taozha@quicinc.com>
Description:
(RW) Set/Get the index number of the edge detection for the DSB
subunit TPDM. Since there are at most 256 edge detections, this
value ranges from 0 to 255.
What: /sys/bus/coresight/devices/<tpdm-name>/dsb_edge/ctrl_val
Date: March 2023
KernelVersion 6.7
Contact: Jinlong Mao (QUIC) <quic_jinlmao@quicinc.com>, Tao Zhang (QUIC) <quic_taozha@quicinc.com>
Description:
Write a data to control the edge detection corresponding to
the index number. Before writing data to this sysfs file,
"ctrl_idx" should be written first to configure the index
number of the edge detection which needs to be controlled.
Accepts only one of the following values.
0 - Rising edge detection
1 - Falling edge detection
2 - Rising and falling edge detection (toggle detection)
What: /sys/bus/coresight/devices/<tpdm-name>/dsb_edge/ctrl_mask
Date: March 2023
KernelVersion 6.7
Contact: Jinlong Mao (QUIC) <quic_jinlmao@quicinc.com>, Tao Zhang (QUIC) <quic_taozha@quicinc.com>
Description:
Write a data to mask the edge detection corresponding to the index
number. Before writing data to this sysfs file, "ctrl_idx" should
be written first to configure the index number of the edge detection
which needs to be masked.
Accepts only one of the 2 values - 0 or 1.
What: /sys/bus/coresight/devices/<tpdm-name>/dsb_edge/edcr[0:15]
Date: March 2023
KernelVersion 6.7
Contact: Jinlong Mao (QUIC) <quic_jinlmao@quicinc.com>, Tao Zhang (QUIC) <quic_taozha@quicinc.com>
Description:
Read a set of the edge control value of the DSB in TPDM.
What: /sys/bus/coresight/devices/<tpdm-name>/dsb_edge/edcmr[0:7]
Date: March 2023
KernelVersion 6.7
Contact: Jinlong Mao (QUIC) <quic_jinlmao@quicinc.com>, Tao Zhang (QUIC) <quic_taozha@quicinc.com>
Description:
Read a set of the edge control mask of the DSB in TPDM.
What: /sys/bus/coresight/devices/<tpdm-name>/dsb_trig_patt/xpr[0:7]
Date: March 2023
KernelVersion 6.7
Contact: Jinlong Mao (QUIC) <quic_jinlmao@quicinc.com>, Tao Zhang (QUIC) <quic_taozha@quicinc.com>
Description:
(RW) Set/Get the value of the trigger pattern for the DSB
subunit TPDM.
What: /sys/bus/coresight/devices/<tpdm-name>/dsb_trig_patt/xpmr[0:7]
Date: March 2023
KernelVersion 6.7
Contact: Jinlong Mao (QUIC) <quic_jinlmao@quicinc.com>, Tao Zhang (QUIC) <quic_taozha@quicinc.com>
Description:
(RW) Set/Get the mask of the trigger pattern for the DSB
subunit TPDM.
What: /sys/bus/coresight/devices/<tpdm-name>/dsb_patt/tpr[0:7]
Date: March 2023
KernelVersion 6.7
Contact: Jinlong Mao (QUIC) <quic_jinlmao@quicinc.com>, Tao Zhang (QUIC) <quic_taozha@quicinc.com>
Description:
(RW) Set/Get the value of the pattern for the DSB subunit TPDM.
What: /sys/bus/coresight/devices/<tpdm-name>/dsb_patt/tpmr[0:7]
Date: March 2023
KernelVersion 6.7
Contact: Jinlong Mao (QUIC) <quic_jinlmao@quicinc.com>, Tao Zhang (QUIC) <quic_taozha@quicinc.com>
Description:
(RW) Set/Get the mask of the pattern for the DSB subunit TPDM.
What: /sys/bus/coresight/devices/<tpdm-name>/dsb_patt/enable_ts
Date: March 2023
KernelVersion 6.7
Contact: Jinlong Mao (QUIC) <quic_jinlmao@quicinc.com>, Tao Zhang (QUIC) <quic_taozha@quicinc.com>
Description:
(Write) Set the pattern timestamp of DSB tpdm. Read
the pattern timestamp of DSB tpdm.
Accepts only one of the 2 values - 0 or 1.
0 : Disable DSB pattern timestamp.
1 : Enable DSB pattern timestamp.
What: /sys/bus/coresight/devices/<tpdm-name>/dsb_patt/set_type
Date: March 2023
KernelVersion 6.7
Contact: Jinlong Mao (QUIC) <quic_jinlmao@quicinc.com>, Tao Zhang (QUIC) <quic_taozha@quicinc.com>
Description:
(Write) Set the pattern type of DSB tpdm. Read
the pattern type of DSB tpdm.
Accepts only one of the 2 values - 0 or 1.
0 : Set the DSB pattern type to value.
1 : Set the DSB pattern type to toggle.
What: /sys/bus/coresight/devices/<tpdm-name>/dsb_msr/msr[0:31]
Date: March 2023
KernelVersion 6.7
Contact: Jinlong Mao (QUIC) <quic_jinlmao@quicinc.com>, Tao Zhang (QUIC) <quic_taozha@quicinc.com>
Description:
(RW) Set/Get the MSR(mux select register) for the DSB subunit
TPDM.

View file

@ -28,6 +28,23 @@ Description:
Payload in the CXL-2.0 specification.
What: /sys/bus/cxl/devices/memX/ram/qos_class
Date: May, 2023
KernelVersion: v6.8
Contact: linux-cxl@vger.kernel.org
Description:
(RO) For CXL host platforms that support "QoS Telemmetry"
this attribute conveys a comma delimited list of platform
specific cookies that identifies a QoS performance class
for the volatile partition of the CXL mem device. These
class-ids can be compared against a similar "qos_class"
published for a root decoder. While it is not required
that the endpoints map their local memory-class to a
matching platform class, mismatches are not recommended
and there are platform specific performance related
side-effects that may result. First class-id is displayed.
What: /sys/bus/cxl/devices/memX/pmem/size
Date: December, 2020
KernelVersion: v5.12
@ -38,6 +55,23 @@ Description:
Payload in the CXL-2.0 specification.
What: /sys/bus/cxl/devices/memX/pmem/qos_class
Date: May, 2023
KernelVersion: v6.8
Contact: linux-cxl@vger.kernel.org
Description:
(RO) For CXL host platforms that support "QoS Telemmetry"
this attribute conveys a comma delimited list of platform
specific cookies that identifies a QoS performance class
for the persistent partition of the CXL mem device. These
class-ids can be compared against a similar "qos_class"
published for a root decoder. While it is not required
that the endpoints map their local memory-class to a
matching platform class, mismatches are not recommended
and there are platform specific performance related
side-effects that may result. First class-id is displayed.
What: /sys/bus/cxl/devices/memX/serial
Date: January, 2022
KernelVersion: v5.18

View file

@ -16,3 +16,9 @@ Description:
Example output in powerpc:
grep . /sys/bus/event_source/devices/cpu/caps/*
/sys/bus/event_source/devices/cpu/caps/pmu_name:POWER9
The "branch_counter_nr" in the supported platform exposes the
maximum number of counters which can be shown in the u64 counters
of PERF_SAMPLE_BRANCH_COUNTERS, while the "branch_counter_width"
exposes the width of each counter. Both of them can be used by
the perf tool to parse the logged counters in each branch.

View file

@ -88,6 +88,21 @@ Description:
This entry describes the HDRCAP of the master controller
driving the bus.
What: /sys/bus/i3c/devices/i3c-<bus-id>/hotjoin
KernelVersion: 6.8
Contact: linux-i3c@vger.kernel.org
Description:
I3Cs Hot-Join mechanism allows an I3C Device to inform the
Active Controller that a newly-joined Target is present on the
I3C Bus and is ready to receive a Dynamic Address, in order to
become fully functional on the Bus. Hot-Join is used when the
Target is mounted on the same I3C bus and remains depowered
until needed or until the Target is physically inserted into the
I3C bus
This entry allows to enable or disable Hot-join of the Current
Controller driving the bus.
What: /sys/bus/i3c/devices/i3c-<bus-id>/<bus-id>-<device-pid>
KernelVersion: 5.0
Contact: linux-i3c@vger.kernel.org

View file

@ -362,10 +362,21 @@ Description:
What: /sys/bus/iio/devices/iio:deviceX/in_accel_x_peak_raw
What: /sys/bus/iio/devices/iio:deviceX/in_accel_y_peak_raw
What: /sys/bus/iio/devices/iio:deviceX/in_accel_z_peak_raw
What: /sys/bus/iio/devices/iio:deviceX/in_humidityrelative_peak_raw
What: /sys/bus/iio/devices/iio:deviceX/in_temp_peak_raw
KernelVersion: 2.6.36
Contact: linux-iio@vger.kernel.org
Description:
Highest value since some reset condition. These
Highest value since some reset condition. These
attributes allow access to this and are otherwise
the direct equivalent of the <type>Y[_name]_raw attributes.
What: /sys/bus/iio/devices/iio:deviceX/in_humidityrelative_trough_raw
What: /sys/bus/iio/devices/iio:deviceX/in_temp_trough_raw
KernelVersion: 6.7
Contact: linux-iio@vger.kernel.org
Description:
Lowest value since some reset condition. These
attributes allow access to this and are otherwise
the direct equivalent of the <type>Y[_name]_raw attributes.
@ -618,7 +629,9 @@ KernelVersion: 2.6.35
Contact: linux-iio@vger.kernel.org
Description:
If a discrete set of scale values is available, they
are listed in this attribute.
are listed in this attribute. Unlike illumination,
multiplying intensity by intensity_scale does not
yield value with any standardized unit.
What: /sys/bus/iio/devices/iio:deviceX/out_voltageY_hardwaregain
What: /sys/bus/iio/devices/iio:deviceX/in_intensity_hardwaregain
@ -1574,6 +1587,8 @@ What: /sys/.../iio:deviceX/in_intensityY_raw
What: /sys/.../iio:deviceX/in_intensityY_ir_raw
What: /sys/.../iio:deviceX/in_intensityY_both_raw
What: /sys/.../iio:deviceX/in_intensityY_uv_raw
What: /sys/.../iio:deviceX/in_intensityY_uva_raw
What: /sys/.../iio:deviceX/in_intensityY_uvb_raw
What: /sys/.../iio:deviceX/in_intensityY_duv_raw
KernelVersion: 3.4
Contact: linux-iio@vger.kernel.org
@ -1582,8 +1597,9 @@ Description:
that measurements contain visible and infrared light
components or just infrared light, respectively. Modifier
uv indicates that measurements contain ultraviolet light
components. Modifier duv indicates that measurements
contain deep ultraviolet light components.
components. Modifiers uva, uvb and duv indicate that
measurements contain A, B or deep (C) ultraviolet light
components respectively.
What: /sys/.../iio:deviceX/in_uvindex_input
KernelVersion: 4.6
@ -2254,3 +2270,21 @@ Description:
If a label is defined for this event add that to the event
specific attributes. This is useful for userspace to be able to
better identify an individual event.
What: /sys/.../events/in_accel_gesture_tap_wait_timeout
KernelVersion: 6.7
Contact: linux-iio@vger.kernel.org
Description:
Enable tap gesture confirmation with timeout.
What: /sys/.../events/in_accel_gesture_tap_wait_dur
KernelVersion: 6.7
Contact: linux-iio@vger.kernel.org
Description:
Timeout value in seconds for tap gesture confirmation.
What: /sys/.../events/in_accel_gesture_tap_wait_dur_available
KernelVersion: 6.7
Contact: linux-iio@vger.kernel.org
Description:
List of available timeout value for tap gesture confirmation.

View file

@ -6,3 +6,12 @@ Description:
OP-TEE bus provides reference to registered drivers under this directory. The <uuid>
matches Trusted Application (TA) driver and corresponding TA in secure OS. Drivers
are free to create needed API under optee-ta-<uuid> directory.
What: /sys/bus/tee/devices/optee-ta-<uuid>/need_supplicant
Date: November 2023
KernelVersion: 6.7
Contact: op-tee@lists.trustedfirmware.org
Description:
Allows to distinguish whether an OP-TEE based TA/device requires user-space
tee-supplicant to function properly or not. This attribute will be present for
devices which depend on tee-supplicant to be running.

View file

@ -25,6 +25,9 @@ KernelVersion: 5.14
Contact: linux-mtd@lists.infradead.org
Description: (RO) Part name of the SPI NOR flash.
The attribute is optional. User space should not rely on
it to be present or even correct. Instead, user space
should read the jedec_id attribute.
What: /sys/bus/spi/devices/.../spi-nor/sfdp
Date: April 2021

View file

@ -52,6 +52,9 @@ Description:
echo 0 > /sys/class/devfreq/.../trans_stat
If the transition table is bigger than PAGE_SIZE, reading
this will return an -EFBIG error.
What: /sys/class/devfreq/.../available_frequencies
Date: October 2012
Contact: Nishanth Menon <nm@ti.com>

View file

@ -381,6 +381,15 @@ Description:
RW
What: /sys/class/hwmon/hwmonX/tempY_max_alarm
Description:
Maximum temperature alarm flag.
- 0: OK
- 1: temperature has reached tempY_max
RO
What: /sys/class/hwmon/hwmonX/tempY_min
Description:
Temperature min value.
@ -389,6 +398,15 @@ Description:
RW
What: /sys/class/hwmon/hwmonX/tempY_min_alarm
Description:
Minimum temperature alarm flag.
- 0: OK
- 1: temperature has reached tempY_min
RO
What: /sys/class/hwmon/hwmonX/tempY_max_hyst
Description:
Temperature hysteresis value for max limit.
@ -434,12 +452,7 @@ Description:
- 0: OK
- 1: temperature has reached tempY_crit
RW
Contrary to regular alarm flags which clear themselves
automatically when read, this one sticks until cleared by
the user. This is done by writing 0 to the file. Writing
other values is unsupported.
RO
What: /sys/class/hwmon/hwmonX/tempY_crit_hyst
Description:
@ -462,6 +475,15 @@ Description:
RW
What: /sys/class/hwmon/hwmonX/tempY_emergency_alarm
Description:
Emergency high temperature alarm flag.
- 0: OK
- 1: temperature has reached tempY_emergency
RO
What: /sys/class/hwmon/hwmonX/tempY_emergency_hyst
Description:
Temperature hysteresis value for emergency limit.
@ -887,15 +909,15 @@ Description:
RW
What: /sys/class/hwmon/hwmonX/humidityY_input
What: /sys/class/hwmon/hwmonX/humidityY_alarm
Description:
Humidity
Humidity limit detection
Unit: milli-percent (per cent mille, pcm)
- 0: OK
- 1: Humidity limit has been reached
RO
What: /sys/class/hwmon/hwmonX/humidityY_enable
Description:
Enable or disable the sensors
@ -908,6 +930,74 @@ Description:
RW
What: /sys/class/hwmon/hwmonX/humidityY_fault
Description:
Reports a humidity sensor failure.
- 1: Failed
- 0: Ok
RO
What: /sys/class/hwmon/hwmonX/humidityY_input
Description:
Humidity
Unit: milli-percent (per cent mille, pcm)
RO
What: /sys/class/hwmon/hwmonX/humidityY_label
Description:
Suggested humidity channel label.
Text string
Should only be created if the driver has hints about what
this humidity channel is being used for, and user-space
doesn't. In all other cases, the label is provided by
user-space.
RO
What: /sys/class/hwmon/hwmonX/humidityY_max
Description:
Humidity max value.
Unit: milli-percent (per cent mille, pcm)
RW
What: /sys/class/hwmon/hwmonX/humidityY_max_hyst
Description:
Humidity hysteresis value for max limit.
Unit: milli-percent (per cent mille, pcm)
Must be reported as an absolute humidity, NOT a delta
from the max value.
RW
What: /sys/class/hwmon/hwmonX/humidityY_min
Description:
Humidity min value.
Unit: milli-percent (per cent mille, pcm)
RW
What: /sys/class/hwmon/hwmonX/humidityY_min_hyst
Description:
Humidity hysteresis value for min limit.
Unit: milli-percent (per cent mille, pcm)
Must be reported as an absolute humidity, NOT a delta
from the min value.
RW
What: /sys/class/hwmon/hwmonX/humidityY_rated_min
Description:
Minimum rated humidity.

View file

@ -59,15 +59,6 @@ Description:
brightness. Reading this file when no hw brightness change
event has happened will return an ENODATA error.
What: /sys/class/leds/<led>/color
Date: June 2023
KernelVersion: 6.5
Description:
Color of the LED.
This is a read-only file. Reading this file returns the color
of the LED as a string (e.g: "red", "green", "multicolor").
What: /sys/class/leds/<led>/trigger
Date: March 2006
KernelVersion: 2.6.17

View file

@ -114,6 +114,45 @@ Description:
speed of 1000Mbps of the named network device.
Setting this value also immediately changes the LED state.
What: /sys/class/leds/<led>/link_2500
Date: Nov 2023
KernelVersion: 6.8
Contact: linux-leds@vger.kernel.org
Description:
Signal the link speed state of 2500Mbps of the named network device.
If set to 0 (default), the LED's normal state is off.
If set to 1, the LED's normal state reflects the link state
speed of 2500Mbps of the named network device.
Setting this value also immediately changes the LED state.
What: /sys/class/leds/<led>/link_5000
Date: Nov 2023
KernelVersion: 6.8
Contact: linux-leds@vger.kernel.org
Description:
Signal the link speed state of 5000Mbps of the named network device.
If set to 0 (default), the LED's normal state is off.
If set to 1, the LED's normal state reflects the link state
speed of 5000Mbps of the named network device.
Setting this value also immediately changes the LED state.
What: /sys/class/leds/<led>/link_10000
Date: Nov 2023
KernelVersion: 6.8
Contact: linux-leds@vger.kernel.org
Description:
Signal the link speed state of 10000Mbps of the named network device.
If set to 0 (default), the LED's normal state is off.
If set to 1, the LED's normal state reflects the link state
speed of 10000Mbps of the named network device.
Setting this value also immediately changes the LED state.
What: /sys/class/leds/<led>/half_duplex
Date: Jun 2023
KernelVersion: 6.5

View file

@ -4,3 +4,59 @@ KernelVersion: 5.10
Contact: linux-leds@vger.kernel.org
Description:
Specifies the tty device name of the triggering tty
What: /sys/class/leds/<led>/rx
Date: February 2024
KernelVersion: 6.8
Description:
Signal reception (rx) of data on the named tty device.
If set to 0, the LED will not blink on reception.
If set to 1 (default), the LED will blink on reception.
What: /sys/class/leds/<led>/tx
Date: February 2024
KernelVersion: 6.8
Description:
Signal transmission (tx) of data on the named tty device.
If set to 0, the LED will not blink on transmission.
If set to 1 (default), the LED will blink on transmission.
What: /sys/class/leds/<led>/cts
Date: February 2024
KernelVersion: 6.8
Description:
CTS = Clear To Send
DCE is ready to accept data from the DTE.
If the line state is detected, the LED is switched on.
If set to 0 (default), the LED will not evaluate CTS.
If set to 1, the LED will evaluate CTS.
What: /sys/class/leds/<led>/dsr
Date: February 2024
KernelVersion: 6.8
Description:
DSR = Data Set Ready
DCE is ready to receive and send data.
If the line state is detected, the LED is switched on.
If set to 0 (default), the LED will not evaluate DSR.
If set to 1, the LED will evaluate DSR.
What: /sys/class/leds/<led>/dcd
Date: February 2024
KernelVersion: 6.8
Description:
DCD = Data Carrier Detect
DTE is receiving a carrier from the DCE.
If the line state is detected, the LED is switched on.
If set to 0 (default), the LED will not evaluate CAR (DCD).
If set to 1, the LED will evaluate CAR (DCD).
What: /sys/class/leds/<led>/rng
Date: February 2024
KernelVersion: 6.8
Description:
RNG = Ring Indicator
DCE has detected an incoming ring signal on the telephone
line. If the line state is detected, the LED is switched on.
If set to 0 (default), the LED will not evaluate RNG.
If set to 1, the LED will evaluate RNG.

View file

@ -3,7 +3,7 @@ What: /sys/devices/platform/HISI04Bx:00/chipX/linked_full_lane
What: /sys/devices/platform/HISI04Bx:00/chipX/crc_err_cnt
Date: November 2023
KernelVersion: 6.6
Contact: Huisong Li <lihuisong@huawei.org>
Contact: Huisong Li <lihuisong@huawei.com>
Description:
The /sys/devices/platform/HISI04Bx:00/chipX/ directory
contains read-only attributes exposing some summarization
@ -26,7 +26,7 @@ What: /sys/devices/platform/HISI04Bx:00/chipX/dieY/linked_full_lane
What: /sys/devices/platform/HISI04Bx:00/chipX/dieY/crc_err_cnt
Date: November 2023
KernelVersion: 6.6
Contact: Huisong Li <lihuisong@huawei.org>
Contact: Huisong Li <lihuisong@huawei.com>
Description:
The /sys/devices/platform/HISI04Bx:00/chipX/dieY/ directory
contains read-only attributes exposing some summarization
@ -54,7 +54,7 @@ What: /sys/devices/platform/HISI04Bx:00/chipX/dieY/hccsN/lane_mask
What: /sys/devices/platform/HISI04Bx:00/chipX/dieY/hccsN/crc_err_cnt
Date: November 2023
KernelVersion: 6.6
Contact: Huisong Li <lihuisong@huawei.org>
Contact: Huisong Li <lihuisong@huawei.com>
Description:
The /sys/devices/platform/HISI04Bx/chipX/dieX/hccsN/ directory
contains read-only attributes exposing information about

View file

@ -149,6 +149,18 @@ Contact: ogabbay@kernel.org
Description: Displays the current clock frequency, in Hz, of the MME compute
engine. This property is valid only for the Goya ASIC family
What: /sys/class/accel/accel<n>/device/module_id
Date: Nov 2023
KernelVersion: not yet upstreamed
Contact: ogabbay@kernel.org
Description: Displays the device's module id
What: /sys/class/accel/accel<n>/device/parent_device
Date: Nov 2023
KernelVersion: 6.8
Contact: ttayar@habana.ai
Description: Displays the name of the parent device of the accel device
What: /sys/class/accel/accel<n>/device/pci_addr
Date: Jan 2019
KernelVersion: 5.1

View file

@ -0,0 +1,70 @@
What: /sys/devices/.../hwmon/hwmon<i>/power1_max
Date: September 2023
KernelVersion: 6.5
Contact: intel-xe@lists.freedesktop.org
Description: RW. Card reactive sustained (PL1) power limit in microwatts.
The power controller will throttle the operating frequency
if the power averaged over a window (typically seconds)
exceeds this limit. A read value of 0 means that the PL1
power limit is disabled, writing 0 disables the
limit. Writing values > 0 and <= TDP will enable the power limit.
Only supported for particular Intel xe graphics platforms.
What: /sys/devices/.../hwmon/hwmon<i>/power1_rated_max
Date: September 2023
KernelVersion: 6.5
Contact: intel-xe@lists.freedesktop.org
Description: RO. Card default power limit (default TDP setting).
Only supported for particular Intel xe graphics platforms.
What: /sys/devices/.../hwmon/hwmon<i>/power1_crit
Date: September 2023
KernelVersion: 6.5
Contact: intel-xe@lists.freedesktop.org
Description: RW. Card reactive critical (I1) power limit in microwatts.
Card reactive critical (I1) power limit in microwatts is exposed
for client products. The power controller will throttle the
operating frequency if the power averaged over a window exceeds
this limit.
Only supported for particular Intel xe graphics platforms.
What: /sys/devices/.../hwmon/hwmon<i>/curr1_crit
Date: September 2023
KernelVersion: 6.5
Contact: intel-xe@lists.freedesktop.org
Description: RW. Card reactive critical (I1) power limit in milliamperes.
Card reactive critical (I1) power limit in milliamperes is
exposed for server products. The power controller will throttle
the operating frequency if the power averaged over a window
exceeds this limit.
What: /sys/devices/.../hwmon/hwmon<i>/in0_input
Date: September 2023
KernelVersion: 6.5
Contact: intel-xe@lists.freedesktop.org
Description: RO. Current Voltage in millivolt.
Only supported for particular Intel xe graphics platforms.
What: /sys/devices/.../hwmon/hwmon<i>/energy1_input
Date: September 2023
KernelVersion: 6.5
Contact: intel-xe@lists.freedesktop.org
Description: RO. Energy input of device in microjoules.
Only supported for particular Intel xe graphics platforms.
What: /sys/devices/.../hwmon/hwmon<i>/power1_max_interval
Date: October 2023
KernelVersion: 6.6
Contact: intel-xe@lists.freedesktop.org
Description: RW. Sustained power limit interval (Tau in PL1/Tau) in
milliseconds over which sustained power is averaged.
Only supported for particular Intel xe graphics platforms.

View file

@ -1223,6 +1223,55 @@ Description: This file shows the total latency (in micro seconds) of write
The file is read only.
What: /sys/bus/platform/drivers/ufshcd/*/power_info/lane
What: /sys/bus/platform/devices/*.ufs/power_info/lane
Date: September 2023
Contact: Can Guo <quic_cang@quicinc.com>
Description: This file shows how many lanes are enabled on the UFS link,
i.e., an output 2 means UFS link is operating with 2 lanes.
The file is read only.
What: /sys/bus/platform/drivers/ufshcd/*/power_info/mode
What: /sys/bus/platform/devices/*.ufs/power_info/mode
Date: September 2023
Contact: Can Guo <quic_cang@quicinc.com>
Description: This file shows the PA power mode of UFS.
The file is read only.
What: /sys/bus/platform/drivers/ufshcd/*/power_info/rate
What: /sys/bus/platform/devices/*.ufs/power_info/rate
Date: September 2023
Contact: Can Guo <quic_cang@quicinc.com>
Description: This file shows the speed rate of UFS link.
The file is read only.
What: /sys/bus/platform/drivers/ufshcd/*/power_info/gear
What: /sys/bus/platform/devices/*.ufs/power_info/gear
Date: September 2023
Contact: Can Guo <quic_cang@quicinc.com>
Description: This file shows the gear of UFS link.
The file is read only.
What: /sys/bus/platform/drivers/ufshcd/*/power_info/dev_pm
What: /sys/bus/platform/devices/*.ufs/power_info/dev_pm
Date: September 2023
Contact: Can Guo <quic_cang@quicinc.com>
Description: This file shows the UFS device power mode.
The file is read only.
What: /sys/bus/platform/drivers/ufshcd/*/power_info/link_state
What: /sys/bus/platform/devices/*.ufs/power_info/link_state
Date: September 2023
Contact: Can Guo <quic_cang@quicinc.com>
Description: This file shows the state of UFS link.
The file is read only.
What: /sys/bus/platform/drivers/ufshcd/*/device_descriptor/wb_presv_us_en
What: /sys/bus/platform/devices/*.ufs/device_descriptor/wb_presv_us_en
Date: June 2020
@ -1474,3 +1523,10 @@ Description: Indicates status of Write Booster.
The file is read only.
What: /sys/bus/platform/drivers/ufshcd/*/rtc_update_ms
What: /sys/bus/platform/devices/*.ufs/rtc_update_ms
Date: November 2023
Contact: Bean Huo <beanhuo@micron.com>
Description:
rtc_update_ms indicates how often the host should synchronize or update the
UFS RTC. If set to 0, this will disable UFS RTC periodic update.

View file

@ -0,0 +1,8 @@
What: /sys/firmware/initrd
Date: December 2023
Contact: Alexander Graf <graf@amazon.com>
Description:
When the kernel was booted with an initrd and the
"retain_initrd" option is set on the kernel command
line, /sys/firmware/initrd contains the contents of the
initrd that the kernel was booted with.

View file

@ -498,6 +498,21 @@ Description: Show status of f2fs checkpoint in real time.
CP_RESIZEFS_FLAG 0x00004000
=============================== ==============================
What: /sys/fs/f2fs/<disk>/stat/issued_discard
Date: December 2023
Contact: "Zhiguo Niu" <zhiguo.niu@unisoc.com>
Description: Shows the number of issued discard.
What: /sys/fs/f2fs/<disk>/stat/queued_discard
Date: December 2023
Contact: "Zhiguo Niu" <zhiguo.niu@unisoc.com>
Description: Shows the number of queued discard.
What: /sys/fs/f2fs/<disk>/stat/undiscard_blks
Date: December 2023
Contact: "Zhiguo Niu" <zhiguo.niu@unisoc.com>
Description: Shows the total number of undiscard blocks.
What: /sys/fs/f2fs/<disk>/ckpt_thread_ioprio
Date: January 2021
Contact: "Daeho Jeong" <daehojeong@google.com>
@ -740,3 +755,9 @@ Description: When compress cache is on, it controls cached page
If cached page percent exceed threshold, then deny caching compress page.
The value should be in range of (0, 100], by default it was initialized
as 20(%).
What: /sys/fs/f2fs/<disk>/discard_io_aware
Date: November 2023
Contact: "Chao Yu" <chao@kernel.org>
Description: It controls to enable/disable IO aware feature for background discard.
By default, the value is 1 which indicates IO aware is on.

View file

@ -25,12 +25,14 @@ Description: Writing 'on' or 'off' to this file makes the kdamond starts or
stops, respectively. Reading the file returns the keywords
based on the current status. Writing 'commit' to this file
makes the kdamond reads the user inputs in the sysfs files
except 'state' again. Writing 'update_schemes_stats' to the
file updates contents of schemes stats files of the kdamond.
Writing 'update_schemes_tried_regions' to the file updates
contents of 'tried_regions' directory of every scheme directory
of this kdamond. Writing 'update_schemes_tried_bytes' to the
file updates only '.../tried_regions/total_bytes' files of this
except 'state' again. Writing 'commit_schemes_quota_goals' to
this file makes the kdamond reads the quota goal files again.
Writing 'update_schemes_stats' to the file updates contents of
schemes stats files of the kdamond. Writing
'update_schemes_tried_regions' to the file updates contents of
'tried_regions' directory of every scheme directory of this
kdamond. Writing 'update_schemes_tried_bytes' to the file
updates only '.../tried_regions/total_bytes' files of this
kdamond. Writing 'clear_schemes_tried_regions' to the file
removes contents of the 'tried_regions' directory.
@ -212,6 +214,25 @@ Contact: SeongJae Park <sj@kernel.org>
Description: Writing to and reading from this file sets and gets the quotas
charge reset interval of the scheme in milliseconds.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/goals/nr_goals
Date: Nov 2023
Contact: SeongJae Park <sj@kernel.org>
Description: Writing a number 'N' to this file creates the number of
directories for setting automatic tuning of the scheme's
aggressiveness named '0' to 'N-1' under the goals/ directory.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/goals/<G>/target_value
Date: Nov 2023
Contact: SeongJae Park <sj@kernel.org>
Description: Writing to and reading from this file sets and gets the target
value of the goal metric.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/goals/<G>/current_value
Date: Nov 2023
Contact: SeongJae Park <sj@kernel.org>
Description: Writing to and reading from this file sets and gets the current
value of the goal metric.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/weights/sz_permil
Date: Mar 2022
Contact: SeongJae Park <sj@kernel.org>

View file

@ -0,0 +1,21 @@
What: /sys/bus/nvmem/devices/.../cells/<cell-name>
Date: May 2023
KernelVersion: 6.5
Contact: Miquel Raynal <miquel.raynal@bootlin.com>
Description:
The "cells" folder contains one file per cell exposed by the
NVMEM device. The name of the file is: <name>@<where>, with
<name> being the cell name and <where> its location in the NVMEM
device, in hexadecimal (without the '0x' prefix, to mimic device
tree node names). The length of the file is the size of the cell
(when known). The content of the file is the binary content of
the cell (may sometimes be ASCII, likely without trailing
character).
Note: This file is only present if CONFIG_NVMEM_SYSFS
is enabled.
Example::
hexdump -C /sys/bus/nvmem/devices/1-00563/cells/product-name@d
00000000 54 4e 34 38 4d 2d 50 2d 44 4e |TN48M-P-DN|
0000000a

View file

@ -0,0 +1,29 @@
What: /sys/devices/platform/silicom-platform/uc_version
Date: November 2023
KernelVersion: 6.7
Contact: Henry Shi <henrys@silicom-usa.com>
Description:
This file allows to read microcontroller firmware
version of current platform.
What: /sys/devices/platform/silicom-platform/power_cycle
Date: November 2023
KernelVersion: 6.7
Contact: Henry Shi <henrys@silicom-usa.com>
This file allow user to power cycle the platform.
Default value is 0; when set to 1, it powers down
the platform, waits 5 seconds, then powers on the
device. It returns to default value after power cycle.
0 - default value.
What: /sys/devices/platform/silicom-platform/efuse_status
Date: November 2023
KernelVersion: 6.7
Contact: Henry Shi <henrys@silicom-usa.com>
Description:
This file is read only. It returns the current
OTP status:
0 - not programmed.
1 - programmed.

View file

@ -97,7 +97,21 @@ quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
cp $(if $(patsubst /%,,$(DOCS_CSS)),$(abspath $(srctree)/$(DOCS_CSS)),$(DOCS_CSS)) $(BUILDDIR)/$3/_static/; \
fi
htmldocs:
YNL_INDEX:=$(srctree)/Documentation/networking/netlink_spec/index.rst
YNL_RST_DIR:=$(srctree)/Documentation/networking/netlink_spec
YNL_YAML_DIR:=$(srctree)/Documentation/netlink/specs
YNL_TOOL:=$(srctree)/tools/net/ynl/ynl-gen-rst.py
YNL_RST_FILES_TMP := $(patsubst %.yaml,%.rst,$(wildcard $(YNL_YAML_DIR)/*.yaml))
YNL_RST_FILES := $(patsubst $(YNL_YAML_DIR)%,$(YNL_RST_DIR)%, $(YNL_RST_FILES_TMP))
$(YNL_INDEX): $(YNL_RST_FILES)
$(Q)$(YNL_TOOL) -o $@ -x
$(YNL_RST_DIR)/%.rst: $(YNL_YAML_DIR)/%.yaml $(YNL_TOOL)
$(Q)$(YNL_TOOL) -i $< -o $@
htmldocs: $(YNL_INDEX)
@$(srctree)/scripts/sphinx-pre-install --version-check
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))

View file

@ -61,7 +61,7 @@ Conditions
==========
The use of threaded interrupts is the most likely condition to trigger
this problem today. Threaded interrupts may not be reenabled after the IRQ
this problem today. Threaded interrupts may not be re-enabled after the IRQ
handler wakes. These "one shot" conditions mean that the threaded interrupt
needs to keep the interrupt line masked until the threaded handler has run.
Especially when dealing with high data rate interrupts, the thread needs to

View file

@ -236,7 +236,7 @@ including a full 'lspci -v' so we can add the quirks to the kernel.
Disabling MSIs below a bridge
-----------------------------
Some PCI bridges are not able to route MSIs between busses properly.
Some PCI bridges are not able to route MSIs between buses properly.
In this case, MSIs must be disabled on all devices behind the bridge.
Some bridges allow you to enable MSIs by changing some bits in their

26
Documentation/RAS/ras.rst Normal file
View file

@ -0,0 +1,26 @@
.. SPDX-License-Identifier: GPL-2.0
Reliability, Availability and Serviceability features
=====================================================
This documents different aspects of the RAS functionality present in the
kernel.
Error decoding
---------------
* x86
Error decoding on AMD systems should be done using the rasdaemon tool:
https://github.com/mchehab/rasdaemon/
While the daemon is running, it would automatically log and decode
errors. If not, one can still decode such errors by supplying the
hardware information from the error::
$ rasdaemon -p --status <STATUS> --ipid <IPID> --smca
Also, the user can pass particular family and model to decode the error
string::
$ rasdaemon -p --status <STATUS> --ipid <IPID> --smca --family <CPU Family> --model <CPU Model> --bank <BANK_NUM>

View file

@ -241,15 +241,22 @@ over a rather long period of time, but improvements are always welcome!
srcu_struct. The rules for the expedited RCU grace-period-wait
primitives are the same as for their non-expedited counterparts.
If the updater uses call_rcu_tasks() or synchronize_rcu_tasks(),
then the readers must refrain from executing voluntary
context switches, that is, from blocking. If the updater uses
call_rcu_tasks_trace() or synchronize_rcu_tasks_trace(), then
the corresponding readers must use rcu_read_lock_trace() and
rcu_read_unlock_trace(). If an updater uses call_rcu_tasks_rude()
or synchronize_rcu_tasks_rude(), then the corresponding readers
must use anything that disables preemption, for example,
preempt_disable() and preempt_enable().
Similarly, it is necessary to correctly use the RCU Tasks flavors:
a. If the updater uses synchronize_rcu_tasks() or
call_rcu_tasks(), then the readers must refrain from
executing voluntary context switches, that is, from
blocking.
b. If the updater uses call_rcu_tasks_trace()
or synchronize_rcu_tasks_trace(), then the
corresponding readers must use rcu_read_lock_trace()
and rcu_read_unlock_trace().
c. If an updater uses call_rcu_tasks_rude() or
synchronize_rcu_tasks_rude(), then the corresponding
readers must use anything that disables preemption,
for example, preempt_disable() and preempt_enable().
Mixing things up will result in confusion and broken kernels, and
has even resulted in an exploitable security issue. Therefore,

View file

@ -3,13 +3,26 @@
PROPER CARE AND FEEDING OF RETURN VALUES FROM rcu_dereference()
===============================================================
Most of the time, you can use values from rcu_dereference() or one of
the similar primitives without worries. Dereferencing (prefix "*"),
field selection ("->"), assignment ("="), address-of ("&"), addition and
subtraction of constants, and casts all work quite naturally and safely.
Proper care and feeding of address and data dependencies is critically
important to correct use of things like RCU. To this end, the pointers
returned from the rcu_dereference() family of primitives carry address and
data dependencies. These dependencies extend from the rcu_dereference()
macro's load of the pointer to the later use of that pointer to compute
either the address of a later memory access (representing an address
dependency) or the value written by a later memory access (representing
a data dependency).
It is nevertheless possible to get into trouble with other operations.
Follow these rules to keep your RCU code working properly:
Most of the time, these dependencies are preserved, permitting you to
freely use values from rcu_dereference(). For example, dereferencing
(prefix "*"), field selection ("->"), assignment ("="), address-of
("&"), casts, and addition or subtraction of constants all work quite
naturally and safely. However, because current compilers do not take
either address or data dependencies into account it is still possible
to get into trouble.
Follow these rules to preserve the address and data dependencies emanating
from your calls to rcu_dereference() and friends, thus keeping your RCU
readers working properly:
- You must use one of the rcu_dereference() family of primitives
to load an RCU-protected pointer, otherwise CONFIG_PROVE_RCU

View file

@ -185,7 +185,7 @@ argument.
Not all changes require that all scenarios be run. For example, a change
to Tree SRCU might run only the SRCU-N and SRCU-P scenarios using the
--configs argument to kvm.sh as follows: "--configs 'SRCU-N SRCU-P'".
Large systems can run multiple copies of of the full set of scenarios,
Large systems can run multiple copies of the full set of scenarios,
for example, a system with 448 hardware threads can run five instances
of the full set concurrently. To make this happen::

View file

@ -7,5 +7,5 @@ marked to be removed at some later point in time.
The description of the interface will document the reason why it is
obsolete and when it can be expected to be removed.
.. kernel-abi:: $srctree/Documentation/ABI/obsolete
.. kernel-abi:: ABI/obsolete
:rst:

View file

@ -1,5 +1,5 @@
ABI removed symbols
===================
.. kernel-abi:: $srctree/Documentation/ABI/removed
.. kernel-abi:: ABI/removed
:rst:

View file

@ -10,5 +10,5 @@ for at least 2 years.
Most interfaces (like syscalls) are expected to never change and always
be available.
.. kernel-abi:: $srctree/Documentation/ABI/stable
.. kernel-abi:: ABI/stable
:rst:

View file

@ -16,5 +16,5 @@ Programs that use these interfaces are strongly encouraged to add their
name to the description of these interfaces, so that the kernel
developers can easily notify them if any changes occur.
.. kernel-abi:: $srctree/Documentation/ABI/testing
.. kernel-abi:: ABI/testing
:rst:

View file

@ -75,4 +75,4 @@ taking two different snapshots of feedback counters at time T1 and T2.
delivered_counter_delta = fbc_t2[del] - fbc_t1[del]
reference_counter_delta = fbc_t2[ref] - fbc_t1[ref]
delivered_perf = (refernce_perf x delivered_counter_delta) / reference_counter_delta
delivered_perf = (reference_perf x delivered_counter_delta) / reference_counter_delta

View file

@ -328,7 +328,7 @@ as idle::
From now on, any pages on zram are idle pages. The idle mark
will be removed until someone requests access of the block.
IOW, unless there is access request, those pages are still idle pages.
Additionally, when CONFIG_ZRAM_MEMORY_TRACKING is enabled pages can be
Additionally, when CONFIG_ZRAM_TRACK_ENTRY_ACTIME is enabled pages can be
marked as idle based on how long (in seconds) it's been since they were
last accessed::

View file

@ -1093,7 +1093,11 @@ All time durations are in microseconds.
A read-write single value file which exists on non-root
cgroups. The default is "100".
The weight in the range [1, 10000].
For non idle groups (cpu.idle = 0), the weight is in the
range [1, 10000].
If the cgroup has been configured to be SCHED_IDLE (cpu.idle = 1),
then the weight will show as a 0.
cpu.weight.nice
A read-write single value file which exists on non-root
@ -1157,6 +1161,16 @@ All time durations are in microseconds.
values similar to the sched_setattr(2). This maximum utilization
value is used to clamp the task specific maximum utilization clamp.
cpu.idle
A read-write single value file which exists on non-root cgroups.
The default is 0.
This is the cgroup analog of the per-task SCHED_IDLE sched policy.
Setting this value to a 1 will make the scheduling policy of the
cgroup SCHED_IDLE. The threads inside the cgroup will retain their
own relative priorities, but the cgroup itself will be treated as
very low priority relative to its peers.
Memory
@ -1679,6 +1693,21 @@ PAGE_SIZE multiple when read back.
limit, it will refuse to take any more stores before existing
entries fault back in or are written out to disk.
memory.zswap.writeback
A read-write single value file. The default value is "1". The
initial value of the root cgroup is 1, and when a new cgroup is
created, it inherits the current value of its parent.
When this is set to 0, all swapping attempts to swapping devices
are disabled. This included both zswap writebacks, and swapping due
to zswap store failures. If the zswap store failures are recurring
(for e.g if the pages are incompressible), users can observe
reclaim inefficiency after disabling writeback (because the same
pages might be rejected again and again).
Note that this is subtly different from setting memory.swap.max to
0, as it still allows for pages to be written to the zswap pool.
memory.pressure
A read-only nested-keyed file.
@ -2316,6 +2345,13 @@ Cpuset Interface Files
treated to have an implicit value of "cpuset.cpus" in the
formation of local partition.
cpuset.cpus.isolated
A read-only and root cgroup only multiple values file.
This file shows the set of all isolated CPUs used in existing
isolated partitions. It will be empty if no isolated partition
is created.
cpuset.cpus.partition
A read-write single value file which exists on non-root
cpuset-enabled cgroups. This flag is owned by the parent cgroup
@ -2358,11 +2394,11 @@ Cpuset Interface Files
partition or scheduling domain. The set of exclusive CPUs is
determined by the value of its "cpuset.cpus.exclusive.effective".
When set to "isolated", the CPUs in that partition will
be in an isolated state without any load balancing from the
scheduler. Tasks placed in such a partition with multiple
CPUs should be carefully distributed and bound to each of the
individual CPUs for optimal performance.
When set to "isolated", the CPUs in that partition will be in
an isolated state without any load balancing from the scheduler
and excluded from the unbound workqueues. Tasks placed in such
a partition with multiple CPUs should be carefully distributed
and bound to each of the individual CPUs for optimal performance.
A partition root ("root" or "isolated") can be in one of the
two possible states - valid or invalid. An invalid partition

View file

@ -2,7 +2,8 @@
TODO
====
Version 2.14 December 21, 2018
As of 6.7 kernel. See https://wiki.samba.org/index.php/LinuxCIFSKernel
for list of features added by release
A Partial List of Missing Features
==================================
@ -12,22 +13,22 @@ for visible, important contributions to this module. Here
is a partial list of the known problems and missing features:
a) SMB3 (and SMB3.1.1) missing optional features:
multichannel performance optimizations, algorithmic channel selection,
directory leases optimizations,
support for faster packet signing (GMAC),
support for compression over the network,
T10 copy offload ie "ODX" (copy chunk, and "Duplicate Extents" ioctl
are currently the only two server side copy mechanisms supported)
- multichannel (partially integrated), integration of multichannel with RDMA
- directory leases (improved metadata caching). Currently only implemented for root dir
- T10 copy offload ie "ODX" (copy chunk, and "Duplicate Extents" ioctl
currently the only two server side copy mechanisms supported)
b) Better optimized compounding and error handling for sparse file support,
perhaps addition of new optional SMB3.1.1 fsctls to make collapse range
and insert range more atomic
b) improved sparse file support (fiemap and SEEK_HOLE are implemented
but additional features would be supportable by the protocol such
as FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_INSERT_RANGE)
c) Directory entry caching relies on a 1 second timer, rather than
using Directory Leases, currently only the root file handle is cached longer
by leveraging Directory Leases
c) Support for SMB3.1.1 over QUIC (and perhaps other socket based protocols
like SCTP)
d) quota support (needs minor kernel change since quota calls otherwise
won't make it to network filesystems or deviceless filesystems).
won't make it to network filesystems or deviceless filesystems).
e) Additional use cases can be optimized to use "compounding" (e.g.
open/query/close and open/setinfo/close) to reduce the number of
@ -92,23 +93,20 @@ t) split cifs and smb3 support into separate modules so legacy (and less
v) Additional testing of POSIX Extensions for SMB3.1.1
w) Add support for additional strong encryption types, and additional spnego
authentication mechanisms (see MS-SMB2). GCM-256 is now partially implemented.
w) Support for the Mac SMB3.1.1 extensions to improve interop with Apple servers
x) Finish support for SMB3.1.1 compression
x) Support for additional authentication options (e.g. IAKERB, peer-to-peer
Kerberos, SCRAM and others supported by existing servers)
y) Improved tracing, more eBPF trace points, better scripts for performance
analysis
Known Bugs
==========
See https://bugzilla.samba.org - search on product "CifsVFS" for
current bug list. Also check http://bugzilla.kernel.org (Product = File System, Component = CIFS)
1) existing symbolic links (Windows reparse points) are recognized but
can not be created remotely. They are implemented for Samba and those that
support the CIFS Unix extensions, although earlier versions of Samba
overly restrict the pathnames.
2) follow_link and readdir code does not follow dfs junctions
but recognizes them
and xfstest results e.g. https://wiki.samba.org/index.php/Xfstest-results-smb3
Misc testing to do
==================

View file

@ -81,7 +81,7 @@ much older and less secure than the default dialect SMB3 which includes
many advanced security features such as downgrade attack detection
and encrypted shares and stronger signing and authentication algorithms.
There are additional mount options that may be helpful for SMB3 to get
improved POSIX behavior (NB: can use vers=3.0 to force only SMB3, never 2.1):
improved POSIX behavior (NB: can use vers=3 to force SMB3 or later, never 2.1):
``mfsymlinks`` and either ``cifsacl`` or ``modefromsid`` (usually with ``idsfromsid``)
@ -715,6 +715,7 @@ DebugData Displays information about active CIFS sessions and
Stats Lists summary resource usage information as well as per
share statistics.
open_files List all the open file handles on all active SMB sessions.
mount_params List of all mount parameters available for the module
======================= =======================================================
Configuration pseudo-files:
@ -864,6 +865,11 @@ i.e.::
echo "value" > /sys/module/cifs/parameters/<param>
More detailed descriptions of the available module parameters and their values
can be seen by doing:
modinfo cifs (or modinfo smb3)
================= ==========================================================
1. enable_oplocks Enable or disable oplocks. Oplocks are enabled by default.
[Y/y/1]. To disable use any of [N/n/0].

View file

@ -2704,6 +2704,9 @@
...
185 = /dev/ttyNX15 Hilscher netX serial port 15
186 = /dev/ttyJ0 JTAG1 DCC protocol based serial port emulation
If maximum number of uartlite serial ports is more than 4, then the driver
uses dynamic allocation instead of static allocation for major number.
187 = /dev/ttyUL0 Xilinx uartlite - port 0
...
190 = /dev/ttyUL3 Xilinx uartlite - port 3

View file

@ -321,13 +321,13 @@ Examples
:#> ddcmd 'format "nfsd: READ" +p'
// enable messages in files of which the paths include string "usb"
:#> ddcmd 'file *usb* +p' > /proc/dynamic_debug/control
:#> ddcmd 'file *usb* +p'
// enable all messages
:#> ddcmd '+p' > /proc/dynamic_debug/control
:#> ddcmd '+p'
// add module, function to all enabled messages
:#> ddcmd '+mf' > /proc/dynamic_debug/control
:#> ddcmd '+mf'
// boot-args example, with newlines and comments for readability
Kernel command line: ...

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features
.. kernel-feat:: features

View file

@ -14,10 +14,9 @@ into that core.
To make the most effective use of these mechanisms, you
should download the support software as well. Download the
latest version of the "rng-tools" package from the
hw_random driver's official Web site:
latest version of the "rng-tools" package from:
http://sourceforge.net/projects/gkernel/
https://github.com/nhorman/rng-tools
Those tools use /dev/hwrng to fill the kernel entropy pool,
which is used internally and exported by the /dev/urandom and

View file

@ -119,6 +119,7 @@ configure specific aspects of kernel behavior to your liking.
parport
perf-security
pm/index
pmf
pnp
rapidio
ras

View file

@ -172,7 +172,7 @@ variables.
Offset of the free_list's member. This value is used to compute the number
of free pages.
Each zone has a free_area structure array called free_area[MAX_ORDER + 1].
Each zone has a free_area structure array called free_area[NR_PAGE_ORDERS].
The free_list represents a linked list of free page blocks.
(list_head, next|prev)
@ -189,11 +189,11 @@ Offsets of the vmap_area's members. They carry vmalloc-specific
information. Makedumpfile gets the start address of the vmalloc region
from this.
(zone.free_area, MAX_ORDER + 1)
-------------------------------
(zone.free_area, NR_PAGE_ORDERS)
--------------------------------
Free areas descriptor. User-space tools use this value to iterate the
free_area ranges. MAX_ORDER is used by the zone buddy allocator.
free_area ranges. NR_PAGE_ORDERS is used by the zone buddy allocator.
prb
---

View file

@ -1,3 +1,14 @@
accept_memory= [MM]
Format: { eager | lazy }
default: lazy
By default, unaccepted memory is accepted lazily to
avoid prolonged boot times. The lazy option will add
some runtime overhead until all memory is eventually
accepted. In most cases the overhead is negligible.
For some workloads or for debugging purposes
accept_memory=eager can be used to accept all memory
at once during boot.
acpi= [HW,ACPI,X86,ARM64,RISCV64]
Advanced Configuration and Power Interface
Format: { force | on | off | strict | noirq | rsdt |
@ -877,9 +888,9 @@
memory region [offset, offset + size] for that kernel
image. If '@offset' is omitted, then a suitable offset
is selected automatically.
[KNL, X86-64, ARM64, RISCV] Select a region under 4G first, and
fall back to reserve region above 4G when '@offset'
hasn't been specified.
[KNL, X86-64, ARM64, RISCV, LoongArch] Select a region
under 4G first, and fall back to reserve region above
4G when '@offset' hasn't been specified.
See Documentation/admin-guide/kdump/kdump.rst for further details.
crashkernel=range1:size1[,range2:size2,...][@offset]
@ -890,25 +901,27 @@
Documentation/admin-guide/kdump/kdump.rst for an example.
crashkernel=size[KMG],high
[KNL, X86-64, ARM64, RISCV] range could be above 4G.
[KNL, X86-64, ARM64, RISCV, LoongArch] range could be
above 4G.
Allow kernel to allocate physical memory region from top,
so could be above 4G if system have more than 4G ram
installed. Otherwise memory region will be allocated
below 4G, if available.
It will be ignored if crashkernel=X is specified.
crashkernel=size[KMG],low
[KNL, X86-64, ARM64, RISCV] range under 4G. When crashkernel=X,high
is passed, kernel could allocate physical memory region
above 4G, that cause second kernel crash on system
that require some amount of low memory, e.g. swiotlb
requires at least 64M+32K low memory, also enough extra
low memory is needed to make sure DMA buffers for 32-bit
devices won't run out. Kernel would try to allocate
[KNL, X86-64, ARM64, RISCV, LoongArch] range under 4G.
When crashkernel=X,high is passed, kernel could allocate
physical memory region above 4G, that cause second kernel
crash on system that require some amount of low memory,
e.g. swiotlb requires at least 64M+32K low memory, also
enough extra low memory is needed to make sure DMA buffers
for 32-bit devices won't run out. Kernel would try to allocate
default size of memory below 4G automatically. The default
size is platform dependent.
--> x86: max(swiotlb_size_or_default() + 8MiB, 256MiB)
--> arm64: 128MiB
--> riscv: 128MiB
--> loongarch: 128MiB
This one lets the user specify own low range under 4G
for second kernel instead.
0: to disable low allocation.
@ -970,17 +983,17 @@
buddy allocator. Bigger value increase the probability
of catching random memory corruption, but reduce the
amount of memory for normal system use. The maximum
possible value is MAX_ORDER/2. Setting this parameter
to 1 or 2 should be enough to identify most random
memory corruption problems caused by bugs in kernel or
driver code when a CPU writes to (or reads from) a
random memory location. Note that there exists a class
of memory corruptions problems caused by buggy H/W or
F/W or by drivers badly programming DMA (basically when
memory is written at bus level and the CPU MMU is
bypassed) which are not detectable by
CONFIG_DEBUG_PAGEALLOC, hence this option will not help
tracking down these problems.
possible value is MAX_PAGE_ORDER/2. Setting this
parameter to 1 or 2 should be enough to identify most
random memory corruption problems caused by bugs in
kernel or driver code when a CPU writes to (or reads
from) a random memory location. Note that there exists
a class of memory corruptions problems caused by buggy
H/W or F/W or by drivers badly programming DMA
(basically when memory is written at bus level and the
CPU MMU is bypassed) which are not detectable by
CONFIG_DEBUG_PAGEALLOC, hence this option will not
help tracking down these problems.
debug_pagealloc=
[KNL] When CONFIG_DEBUG_PAGEALLOC is set, this parameter
@ -2438,7 +2451,7 @@
between unregistering the boot console and initializing
the real console.
keepinitrd [HW,ARM]
keepinitrd [HW,ARM] See retain_initrd.
kernelcore= [KNL,X86,IA-64,PPC]
Format: nn[KMGTPE] | nn% | "mirror"
@ -3985,9 +3998,9 @@
vulnerability. System may allow data leaks with this
option.
no-steal-acc [X86,PV_OPS,ARM64,PPC/PSERIES] Disable paravirtualized
steal time accounting. steal time is computed, but
won't influence scheduler behaviour
no-steal-acc [X86,PV_OPS,ARM64,PPC/PSERIES,RISCV] Disable
paravirtualized steal time accounting. steal time is
computed, but won't influence scheduler behaviour
nosync [HW,M68K] Disables sync negotiation for all devices.
@ -4136,7 +4149,7 @@
[KNL] Minimal page reporting order
Format: <integer>
Adjust the minimal page reporting order. The page
reporting is disabled when it exceeds MAX_ORDER.
reporting is disabled when it exceeds MAX_PAGE_ORDER.
panic= [KNL] Kernel behaviour on panic: delay <timeout>
timeout > 0: seconds before rebooting
@ -5302,6 +5315,12 @@
Dump ftrace buffer after reporting RCU CPU
stall warning.
rcupdate.rcu_cpu_stall_notifiers= [KNL]
Provide RCU CPU stall notifiers, but see the
warnings in the RCU_CPU_STALL_NOTIFIER Kconfig
option's help text. TL;DR: You almost certainly
do not want rcupdate.rcu_cpu_stall_notifiers.
rcupdate.rcu_cpu_stall_suppress= [KNL]
Suppress RCU CPU stall warning messages.
@ -5544,6 +5563,13 @@
print every Nth verbose statement, where N is the value
specified.
regulator_ignore_unused
[REGULATOR]
Prevents regulator framework from disabling regulators
that are unused, due no driver claiming them. This may
be useful for debug and development, but should not be
needed on a platform with proper driver support.
relax_domain_level=
[KNL, SMP] Set scheduler's default relax_domain_level.
See Documentation/admin-guide/cgroup-v1/cpusets.rst.
@ -5580,7 +5606,8 @@
Useful for devices that are detected asynchronously
(e.g. USB and MMC devices).
retain_initrd [RAM] Keep initrd memory after extraction
retain_initrd [RAM] Keep initrd memory after extraction. After boot, it will
be accessible via /sys/firmware/initrd.
retbleed= [X86] Control mitigation of RETBleed (Arbitrary
Speculative Code Execution with Return Instructions)
@ -6908,6 +6935,9 @@
pause after every control message);
o = USB_QUIRK_HUB_SLOW_RESET (Hub needs extra
delay after resetting its port);
p = USB_QUIRK_SHORT_SET_ADDRESS_REQ_TIMEOUT
(Reduce timeout of the SET_ADDRESS
request from 5000 ms to 500 ms);
Example: quirks=0781:5580:bk,0a5c:5834:gij
usbhid.mousepoll=

View file

@ -20,16 +20,8 @@ Documentation/driver-api/media/index.rst
- for driver development information and Kernel APIs used by
media devices;
The media subsystem
===================
.. only:: html
.. class:: toc-title
Table of Contents
.. toctree::
:caption: Table of Contents
:maxdepth: 2
:numbered:

View file

@ -0,0 +1,72 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
================================
Starfive Camera Subsystem driver
================================
Introduction
------------
This file documents the driver for the Starfive Camera Subsystem found on
Starfive JH7110 SoC. The driver is located under drivers/staging/media/starfive/
camss.
The driver implements V4L2, Media controller and v4l2_subdev interfaces. Camera
sensor using V4L2 subdev interface in the kernel is supported.
The driver has been successfully used on the Gstreamer 1.18.5 with v4l2src
plugin.
Starfive Camera Subsystem hardware
----------------------------------
The Starfive Camera Subsystem hardware consists of::
|\ +---------------+ +-----------+
+----------+ | \ | | | |
| | | | | | | |
| MIPI |----->| |----->| ISP |----->| |
| | | | | | | |
+----------+ | | | | | Memory |
|MUX| +---------------+ | Interface |
+----------+ | | | |
| | | |---------------------------->| |
| Parallel |----->| | | |
| | | | | |
+----------+ | / | |
|/ +-----------+
- MIPI: The MIPI interface, receiving data from a MIPI CSI-2 camera sensor.
- Parallel: The parallel interface, receiving data from a parallel sensor.
- ISP: The ISP, processing raw Bayer data from an image sensor and producing
YUV frames.
Topology
--------
The media controller pipeline graph is as follows:
.. _starfive_camss_graph:
.. kernel-figure:: starfive_camss_graph.dot
:alt: starfive_camss_graph.dot
:align: center
The driver has 2 video devices:
- capture_raw: The capture device, capturing image data directly from a sensor.
- capture_yuv: The capture device, capturing YUV frame data processed by the
ISP module
The driver has 3 subdevices:
- stf_isp: is responsible for all the isp operations, outputs YUV frames.
- cdns_csi2rx: a CSI-2 bridge supporting up to 4 CSI lanes in input, and 4
different pixel streams in output.
- imx219: an image sensor, image data is sent through MIPI CSI-2.

View file

@ -0,0 +1,12 @@
digraph board {
rankdir=TB
n00000001 [label="{{<port0> 0} | stf_isp\n/dev/v4l-subdev0 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
n00000001:port1 -> n00000008 [style=dashed]
n00000004 [label="capture_raw\n/dev/video0", shape=box, style=filled, fillcolor=yellow]
n00000008 [label="capture_yuv\n/dev/video1", shape=box, style=filled, fillcolor=yellow]
n0000000e [label="{{<port0> 0} | cdns_csi2rx.19800000.csi-bridge\n | {<port1> 1 | <port2> 2 | <port3> 3 | <port4> 4}}", shape=Mrecord, style=filled, fillcolor=green]
n0000000e:port1 -> n00000001:port0 [style=dashed]
n0000000e:port1 -> n00000004 [style=dashed]
n00000018 [label="{{} | imx219 6-0010\n/dev/v4l-subdev1 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green]
n00000018:port0 -> n0000000e:port0 [style=bold]
}

View file

@ -28,6 +28,7 @@ Video4Linux (V4L) driver-specific documentation
si470x
si4713
si476x
starfive_camss
vimc
visl
vivid

View file

@ -71,6 +71,7 @@ The following codecs are supported:
- VP9
- H.264
- HEVC
- AV1
visl trace events
-----------------
@ -79,6 +80,7 @@ The trace events are defined on a per-codec basis, e.g.:
.. code-block:: bash
$ ls /sys/kernel/tracing/events/ | grep visl
visl_av1_controls
visl_fwht_controls
visl_h264_controls
visl_hevc_controls

View file

@ -59,41 +59,47 @@ Files Hierarchy
The files hierarchy of DAMON sysfs interface is shown below. In the below
figure, parents-children relations are represented with indentations, each
directory is having ``/`` suffix, and files in each directory are separated by
comma (","). ::
comma (",").
/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/avail_operations,operations
│ │ │ │ │ monitoring_attrs/
.. parsed-literal::
:ref:`/sys/kernel/mm/damon <sysfs_root>`/admin
:ref:`kdamonds <sysfs_kdamonds>`/nr_kdamonds
│ │ :ref:`0 <sysfs_kdamond>`/state,pid
│ │ │ :ref:`contexts <sysfs_contexts>`/nr_contexts
│ │ │ │ :ref:`0 <sysfs_context>`/avail_operations,operations
│ │ │ │ │ :ref:`monitoring_attrs <sysfs_monitoring_attrs>`/
│ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ :ref:`targets <sysfs_targets>`/nr_targets
│ │ │ │ │ │ :ref:`0 <sysfs_target>`/pid_target
│ │ │ │ │ │ │ :ref:`regions <sysfs_regions>`/nr_regions
│ │ │ │ │ │ │ │ :ref:`0 <sysfs_region>`/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes
│ │ │ │ │ │ 0/action,apply_interval_us
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ :ref:`schemes <sysfs_schemes>`/nr_schemes
│ │ │ │ │ │ :ref:`0 <sysfs_scheme>`/action,apply_interval_us
│ │ │ │ │ │ │ :ref:`access_pattern <sysfs_access_pattern>`/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms
│ │ │ │ │ │ │ :ref:`quotas <sysfs_quotas>`/ms,bytes,reset_interval_ms
│ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
│ │ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low
│ │ │ │ │ │ │ filters/nr_filters
│ │ │ │ │ │ │ │ :ref:`goals <sysfs_schemes_quota_goals>`/nr_goals
│ │ │ │ │ │ │ │ │ 0/target_value,current_value
│ │ │ │ │ │ │ :ref:`watermarks <sysfs_watermarks>`/metric,interval_us,high,mid,low
│ │ │ │ │ │ │ :ref:`filters <sysfs_filters>`/nr_filters
│ │ │ │ │ │ │ │ 0/type,matching,memcg_id
│ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
│ │ │ │ │ │ │ tried_regions/total_bytes
│ │ │ │ │ │ │ :ref:`stats <sysfs_schemes_stats>`/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
│ │ │ │ │ │ │ :ref:`tried_regions <sysfs_schemes_tried_regions>`/total_bytes
│ │ │ │ │ │ │ │ 0/start,end,nr_accesses,age
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
.. _sysfs_root:
Root
----
@ -102,6 +108,8 @@ has one directory named ``admin``. The directory contains the files for
privileged user space programs' control of DAMON. User space tools or daemons
having the root permission could use this directory.
.. _sysfs_kdamonds:
kdamonds/
---------
@ -113,6 +121,8 @@ details) exists. In the beginning, this directory has only one file,
child directories named ``0`` to ``N-1``. Each directory represents each
kdamond.
.. _sysfs_kdamond:
kdamonds/<N>/
-------------
@ -120,29 +130,37 @@ In each kdamond directory, two files (``state`` and ``pid``) and one directory
(``contexts``) exist.
Reading ``state`` returns ``on`` if the kdamond is currently running, or
``off`` if it is not running. Writing ``on`` or ``off`` makes the kdamond be
in the state. Writing ``commit`` to the ``state`` file makes kdamond reads the
user inputs in the sysfs files except ``state`` file again. Writing
``update_schemes_stats`` to ``state`` file updates the contents of stats files
for each DAMON-based operation scheme of the kdamond. For details of the
stats, please refer to :ref:`stats section <sysfs_schemes_stats>`.
``off`` if it is not running.
Writing ``update_schemes_tried_regions`` to ``state`` file updates the
DAMON-based operation scheme action tried regions directory for each
DAMON-based operation scheme of the kdamond. Writing
``update_schemes_tried_bytes`` to ``state`` file updates only
``.../tried_regions/total_bytes`` files. Writing
``clear_schemes_tried_regions`` to ``state`` file clears the DAMON-based
operating scheme action tried regions directory for each DAMON-based operation
scheme of the kdamond. For details of the DAMON-based operation scheme action
tried regions directory, please refer to :ref:`tried_regions section
<sysfs_schemes_tried_regions>`.
Users can write below commands for the kdamond to the ``state`` file.
- ``on``: Start running.
- ``off``: Stop running.
- ``commit``: Read the user inputs in the sysfs files except ``state`` file
again.
- ``commit_schemes_quota_goals``: Read the DAMON-based operation schemes'
:ref:`quota goals <sysfs_schemes_quota_goals>`.
- ``update_schemes_stats``: Update the contents of stats files for each
DAMON-based operation scheme of the kdamond. For details of the stats,
please refer to :ref:`stats section <sysfs_schemes_stats>`.
- ``update_schemes_tried_regions``: Update the DAMON-based operation scheme
action tried regions directory for each DAMON-based operation scheme of the
kdamond. For details of the DAMON-based operation scheme action tried
regions directory, please refer to
:ref:`tried_regions section <sysfs_schemes_tried_regions>`.
- ``update_schemes_tried_bytes``: Update only ``.../tried_regions/total_bytes``
files.
- ``clear_schemes_tried_regions``: Clear the DAMON-based operating scheme
action tried regions directory for each DAMON-based operation scheme of the
kdamond.
If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread.
``contexts`` directory contains files for controlling the monitoring contexts
that this kdamond will execute.
.. _sysfs_contexts:
kdamonds/<N>/contexts/
----------------------
@ -153,7 +171,7 @@ number (``N``) to the file creates the number of child directories named as
details). At the moment, only one context per kdamond is supported, so only
``0`` or ``1`` can be written to the file.
.. _sysfs_contexts:
.. _sysfs_context:
contexts/<N>/
-------------
@ -203,6 +221,8 @@ writing to and rading from the files.
For more details about the intervals and monitoring regions range, please refer
to the Design document (:doc:`/mm/damon/design`).
.. _sysfs_targets:
contexts/<N>/targets/
---------------------
@ -210,6 +230,8 @@ In the beginning, this directory has only one file, ``nr_targets``. Writing a
number (``N``) to the file creates the number of child directories named ``0``
to ``N-1``. Each directory represents each monitoring target.
.. _sysfs_target:
targets/<N>/
------------
@ -244,6 +266,8 @@ In the beginning, this directory has only one file, ``nr_regions``. Writing a
number (``N``) to the file creates the number of child directories named ``0``
to ``N-1``. Each directory represents each initial monitoring target region.
.. _sysfs_region:
regions/<N>/
------------
@ -254,6 +278,8 @@ region by writing to and reading from the files, respectively.
Each region should not overlap with others. ``end`` of directory ``N`` should
be equal or smaller than ``start`` of directory ``N+1``.
.. _sysfs_schemes:
contexts/<N>/schemes/
---------------------
@ -265,6 +291,8 @@ In the beginning, this directory has only one file, ``nr_schemes``. Writing a
number (``N``) to the file creates the number of child directories named ``0``
to ``N-1``. Each directory represents each DAMON-based operation scheme.
.. _sysfs_scheme:
schemes/<N>/
------------
@ -277,7 +305,7 @@ The ``action`` file is for setting and getting the scheme's :ref:`action
from the file and their meaning are as below.
Note that support of each action depends on the running DAMON operations set
:ref:`implementation <sysfs_contexts>`.
:ref:`implementation <sysfs_context>`.
- ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``.
Supported by ``vaddr`` and ``fvaddr`` operations set.
@ -299,6 +327,8 @@ Note that support of each action depends on the running DAMON operations set
The ``apply_interval_us`` file is for setting and getting the scheme's
:ref:`apply_interval <damon_design_damos>` in microseconds.
.. _sysfs_access_pattern:
schemes/<N>/access_pattern/
---------------------------
@ -312,6 +342,8 @@ to and reading from the ``min`` and ``max`` files under ``sz``,
``nr_accesses``, and ``age`` directories, respectively. Note that the ``min``
and the ``max`` form a closed interval.
.. _sysfs_quotas:
schemes/<N>/quotas/
-------------------
@ -319,8 +351,7 @@ The directory for the :ref:`quotas <damon_design_damos_quotas>` of the given
DAMON-based operation scheme.
Under ``quotas`` directory, three files (``ms``, ``bytes``,
``reset_interval_ms``) and one directory (``weights``) having three files
(``sz_permil``, ``nr_accesses_permil``, and ``age_permil``) in it exist.
``reset_interval_ms``) and two directores (``weights`` and ``goals``) exist.
You can set the ``time quota`` in milliseconds, ``size quota`` in bytes, and
``reset interval`` in milliseconds by writing the values to the three files,
@ -330,11 +361,37 @@ apply the action to only up to ``bytes`` bytes of memory regions within the
``reset_interval_ms``. Setting both ``ms`` and ``bytes`` zero disables the
quota limits.
You can also set the :ref:`prioritization weights
Under ``weights`` directory, three files (``sz_permil``,
``nr_accesses_permil``, and ``age_permil``) exist.
You can set the :ref:`prioritization weights
<damon_design_damos_quotas_prioritization>` for size, access frequency, and age
in per-thousand unit by writing the values to the three files under the
``weights`` directory.
.. _sysfs_schemes_quota_goals:
schemes/<N>/quotas/goals/
-------------------------
The directory for the :ref:`automatic quota tuning goals
<damon_design_damos_quotas_auto_tuning>` of the given DAMON-based operation
scheme.
In the beginning, this directory has only one file, ``nr_goals``. Writing a
number (``N``) to the file creates the number of child directories named ``0``
to ``N-1``. Each directory represents each goal and current achievement.
Among the multiple feedback, the best one is used.
Each goal directory contains two files, namely ``target_value`` and
``current_value``. Users can set and get any number to those files to set the
feedback. User space main workload's latency or throughput, system metrics
like free memory ratio or memory pressure stall time (PSI) could be example
metrics for the values. Note that users should write
``commit_schemes_quota_goals`` to the ``state`` file of the :ref:`kdamond
directory <sysfs_kdamond>` to pass the feedback to DAMON.
.. _sysfs_watermarks:
schemes/<N>/watermarks/
-----------------------
@ -354,6 +411,8 @@ as below.
The ``interval`` should written in microseconds unit.
.. _sysfs_filters:
schemes/<N>/filters/
--------------------
@ -394,7 +453,7 @@ pages of all memory cgroups except ``/having_care_already``.::
echo N > 1/matching
Note that ``anon`` and ``memcg`` filters are currently supported only when
``paddr`` :ref:`implementation <sysfs_contexts>` is being used.
``paddr`` :ref:`implementation <sysfs_context>` is being used.
Also, memory regions that are filtered out by ``addr`` or ``target`` filters
are not counted as the scheme has tried to those, while regions that filtered
@ -449,6 +508,8 @@ and query-like efficient data access monitoring results retrievals. For the
latter use case, in particular, users can set the ``action`` as ``stat`` and
set the ``access pattern`` as their interested pattern that they want to query.
.. _sysfs_schemes_tried_region:
tried_regions/<N>/
------------------

View file

@ -80,6 +80,9 @@ pages_to_scan
how many pages to scan before ksmd goes to sleep
e.g. ``echo 100 > /sys/kernel/mm/ksm/pages_to_scan``.
The pages_to_scan value cannot be changed if ``advisor_mode`` has
been set to scan-time.
Default: 100 (chosen for demonstration purposes)
sleep_millisecs
@ -164,6 +167,29 @@ smart_scan
optimization is enabled. The ``pages_skipped`` metric shows how
effective the setting is.
advisor_mode
The ``advisor_mode`` selects the current advisor. Two modes are
supported: none and scan-time. The default is none. By setting
``advisor_mode`` to scan-time, the scan time advisor is enabled.
The section about ``advisor`` explains in detail how the scan time
advisor works.
adivsor_max_cpu
specifies the upper limit of the cpu percent usage of the ksmd
background thread. The default is 70.
advisor_target_scan_time
specifies the target scan time in seconds to scan all the candidate
pages. The default value is 200 seconds.
advisor_min_pages_to_scan
specifies the lower limit of the ``pages_to_scan`` parameter of the
scan time advisor. The default is 500.
adivsor_max_pages_to_scan
specifies the upper limit of the ``pages_to_scan`` parameter of the
scan time advisor. The default is 30000.
The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``:
general_profit
@ -263,6 +289,35 @@ ksm_swpin_copy
note that KSM page might be copied when swapping in because do_swap_page()
cannot do all the locking needed to reconstitute a cross-anon_vma KSM page.
Advisor
=======
The number of candidate pages for KSM is dynamic. It can be often observed
that during the startup of an application more candidate pages need to be
processed. Without an advisor the ``pages_to_scan`` parameter needs to be
sized for the maximum number of candidate pages. The scan time advisor can
changes the ``pages_to_scan`` parameter based on demand.
The advisor can be enabled, so KSM can automatically adapt to changes in the
number of candidate pages to scan. Two advisors are implemented: none and
scan-time. With none, no advisor is enabled. The default is none.
The scan time advisor changes the ``pages_to_scan`` parameter based on the
observed scan times. The possible values for the ``pages_to_scan`` parameter is
limited by the ``advisor_max_cpu`` parameter. In addition there is also the
``advisor_target_scan_time`` parameter. This parameter sets the target time to
scan all the KSM candidate pages. The parameter ``advisor_target_scan_time``
decides how aggressive the scan time advisor scans candidate pages. Lower
values make the scan time advisor to scan more aggresively. This is the most
important parameter for the configuration of the scan time advisor.
The initial value and the maximum value can be changed with
``advisor_min_pages_to_scan`` and ``advisor_max_pages_to_scan``. The default
values are sufficient for most workloads and use cases.
The ``pages_to_scan`` parameter is re-calculated after a scan has been completed.
--
Izik Eidus,
Hugh Dickins, 17 Nov 2009

View file

@ -253,6 +253,7 @@ Following flags about pages are currently supported:
- ``PAGE_IS_SWAPPED`` - Page is in swapped
- ``PAGE_IS_PFNZERO`` - Page has zero PFN
- ``PAGE_IS_HUGE`` - Page is THP or Hugetlb backed
- ``PAGE_IS_SOFT_DIRTY`` - Page is soft-dirty
The ``struct pm_scan_arg`` is used as the argument of the IOCTL.

View file

@ -45,10 +45,25 @@ components:
the two is using hugepages just because of the fact the TLB miss is
going to run faster.
Modern kernels support "multi-size THP" (mTHP), which introduces the
ability to allocate memory in blocks that are bigger than a base page
but smaller than traditional PMD-size (as described above), in
increments of a power-of-2 number of pages. mTHP can back anonymous
memory (for example 16K, 32K, 64K, etc). These THPs continue to be
PTE-mapped, but in many cases can still provide similar benefits to
those outlined above: Page faults are significantly reduced (by a
factor of e.g. 4, 8, 16, etc), but latency spikes are much less
prominent because the size of each page isn't as huge as the PMD-sized
variant and there is less memory to clear in each page fault. Some
architectures also employ TLB compression mechanisms to squeeze more
entries in when a set of PTEs are virtually and physically contiguous
and approporiately aligned. In this case, TLB misses will occur less
often.
THP can be enabled system wide or restricted to certain tasks or even
memory ranges inside task's address space. Unless THP is completely
disabled, there is ``khugepaged`` daemon that scans memory and
collapses sequences of basic pages into huge pages.
collapses sequences of basic pages into PMD-sized huge pages.
The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>`
interface and using madvise(2) and prctl(2) system calls.
@ -95,12 +110,40 @@ Global THP controls
Transparent Hugepage Support for anonymous memory can be entirely disabled
(mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE
regions (to avoid the risk of consuming more memory resources) or enabled
system wide. This can be achieved with one of::
system wide. This can be achieved per-supported-THP-size with one of::
echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
where <size> is the hugepage size being addressed, the available sizes
for which vary by system.
For example::
echo always >/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
Alternatively it is possible to specify that a given hugepage size
will inherit the top-level "enabled" value::
echo inherit >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
For example::
echo inherit >/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
The top-level setting (for use with "inherit") can be set by issuing
one of the following commands::
echo always >/sys/kernel/mm/transparent_hugepage/enabled
echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
echo never >/sys/kernel/mm/transparent_hugepage/enabled
By default, PMD-sized hugepages have enabled="inherit" and all other
hugepage sizes have enabled="never". If enabling multiple hugepage
sizes, the kernel will select the most appropriate enabled size for a
given allocation.
It's also possible to limit defrag efforts in the VM to generate
anonymous hugepages in case they're not immediately free to madvise
regions or to never try to defrag memory and simply fallback to regular
@ -146,25 +189,34 @@ madvise
never
should be self-explanatory.
By default kernel tries to use huge zero page on read page fault to
anonymous mapping. It's possible to disable huge zero page by writing 0
or enable it back by writing 1::
By default kernel tries to use huge, PMD-mappable zero page on read
page fault to anonymous mapping. It's possible to disable huge zero
page by writing 0 or enable it back by writing 1::
echo 0 >/sys/kernel/mm/transparent_hugepage/use_zero_page
echo 1 >/sys/kernel/mm/transparent_hugepage/use_zero_page
Some userspace (such as a test program, or an optimized memory allocation
library) may want to know the size (in bytes) of a transparent hugepage::
Some userspace (such as a test program, or an optimized memory
allocation library) may want to know the size (in bytes) of a
PMD-mappable transparent hugepage::
cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
khugepaged will be automatically started when
transparent_hugepage/enabled is set to "always" or "madvise, and it'll
be automatically shutdown if it's set to "never".
khugepaged will be automatically started when one or more hugepage
sizes are enabled (either by directly setting "always" or "madvise",
or by setting "inherit" while the top-level enabled is set to "always"
or "madvise"), and it'll be automatically shutdown when the last
hugepage size is disabled (either by directly setting "never", or by
setting "inherit" while the top-level enabled is set to "never").
Khugepaged controls
-------------------
.. note::
khugepaged currently only searches for opportunities to collapse to
PMD-sized THP and no attempt is made to collapse to other THP
sizes.
khugepaged runs usually at low frequency so while one may not want to
invoke defrag algorithms synchronously during the page faults, it
should be worth invoking defrag at least in khugepaged. However it's
@ -282,19 +334,26 @@ force
Need of application restart
===========================
The transparent_hugepage/enabled values and tmpfs mount option only affect
future behavior. So to make them effective you need to restart any
application that could have been using hugepages. This also applies to the
regions registered in khugepaged.
The transparent_hugepage/enabled and
transparent_hugepage/hugepages-<size>kB/enabled values and tmpfs mount
option only affect future behavior. So to make them effective you need
to restart any application that could have been using hugepages. This
also applies to the regions registered in khugepaged.
Monitoring usage
================
The number of anonymous transparent huge pages currently used by the
.. note::
Currently the below counters only record events relating to
PMD-sized THP. Events relating to other THP sizes are not included.
The number of PMD-sized anonymous transparent huge pages currently used by the
system is available by reading the AnonHugePages field in ``/proc/meminfo``.
To identify what applications are using anonymous transparent huge pages,
it is necessary to read ``/proc/PID/smaps`` and count the AnonHugePages fields
for each mapping.
To identify what applications are using PMD-sized anonymous transparent huge
pages, it is necessary to read ``/proc/PID/smaps`` and count the AnonHugePages
fields for each mapping. (Note that AnonHugePages only applies to traditional
PMD-sized THP for historical reasons and should have been called
AnonHugePmdMapped).
The number of file transparent huge pages mapped to userspace is available
by reading ShmemPmdMapped and ShmemHugePages fields in ``/proc/meminfo``.
@ -413,7 +472,7 @@ for huge pages.
Optimizing the applications
===========================
To be guaranteed that the kernel will map a 2M page immediately in any
To be guaranteed that the kernel will map a THP immediately in any
memory region, the mmap region has to be hugepage naturally
aligned. posix_memalign() can provide that guarantee.

View file

@ -113,6 +113,9 @@ events, except page fault notifications, may be generated:
areas. ``UFFD_FEATURE_MINOR_SHMEM`` is the analogous feature indicating
support for shmem virtual memory areas.
- ``UFFD_FEATURE_MOVE`` indicates that the kernel supports moving an
existing page contents from userspace.
The userland application should set the feature flags it intends to use
when invoking the ``UFFDIO_API`` ioctl, to request that those features be
enabled if supported.

View file

@ -153,6 +153,26 @@ attribute, e. g.::
Setting this parameter to 100 will disable the hysteresis.
Some users cannot tolerate the swapping that comes with zswap store failures
and zswap writebacks. Swapping can be disabled entirely (without disabling
zswap itself) on a cgroup-basis as follows:
echo 0 > /sys/fs/cgroup/<cgroup-name>/memory.zswap.writeback
Note that if the store failures are recurring (for e.g if the pages are
incompressible), users can observe reclaim inefficiency after disabling
writeback (because the same pages might be rejected again and again).
When there is a sizable amount of cold memory residing in the zswap pool, it
can be advantageous to proactively write these cold pages to swap and reclaim
the memory for other use cases. By default, the zswap shrinker is disabled.
User can enable it as follows:
echo Y > /sys/module/zswap/parameters/shrinker_enabled
This can be enabled at the boot time if ``CONFIG_ZSWAP_SHRINKER_DEFAULT_ON`` is
selected.
A debugfs interface is provided for various statistic about pool size, number
of pages stored, same-value filled pages and various counters for the reasons
pages are rejected.

View file

@ -0,0 +1,94 @@
======================================================================
Synopsys DesignWare Cores (DWC) PCIe Performance Monitoring Unit (PMU)
======================================================================
DesignWare Cores (DWC) PCIe PMU
===============================
The PMU is a PCIe configuration space register block provided by each PCIe Root
Port in a Vendor-Specific Extended Capability named RAS D.E.S (Debug, Error
injection, and Statistics).
As the name indicates, the RAS DES capability supports system level
debugging, AER error injection, and collection of statistics. To facilitate
collection of statistics, Synopsys DesignWare Cores PCIe controller
provides the following two features:
- one 64-bit counter for Time Based Analysis (RX/TX data throughput and
time spent in each low-power LTSSM state) and
- one 32-bit counter for Event Counting (error and non-error events for
a specified lane)
Note: There is no interrupt for counter overflow.
Time Based Analysis
-------------------
Using this feature you can obtain information regarding RX/TX data
throughput and time spent in each low-power LTSSM state by the controller.
The PMU measures data in two categories:
- Group#0: Percentage of time the controller stays in LTSSM states.
- Group#1: Amount of data processed (Units of 16 bytes).
Lane Event counters
-------------------
Using this feature you can obtain Error and Non-Error information in
specific lane by the controller. The PMU event is selected by all of:
- Group i
- Event j within the Group i
- Lane k
Some of the events only exist for specific configurations.
DesignWare Cores (DWC) PCIe PMU Driver
=======================================
This driver adds PMU devices for each PCIe Root Port named based on the BDF of
the Root Port. For example,
30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
the PMU device name for this Root Port is dwc_rootport_3018.
The DWC PCIe PMU driver registers a perf PMU driver, which provides
description of available events and configuration options in sysfs, see
/sys/bus/event_source/devices/dwc_rootport_{bdf}.
The "format" directory describes format of the config fields of the
perf_event_attr structure. The "events" directory provides configuration
templates for all documented events. For example,
"Rx_PCIe_TLP_Data_Payload" is an equivalent of "eventid=0x22,type=0x1".
The "perf list" command shall list the available events from sysfs, e.g.::
$# perf list | grep dwc_rootport
<...>
dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/ [Kernel PMU event]
<...>
dwc_rootport_3018/rx_memory_read,lane=?/ [Kernel PMU event]
Time Based Analysis Event Usage
-------------------------------
Example usage of counting PCIe RX TLP data payload (Units of bytes)::
$# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
The average RX/TX bandwidth can be calculated using the following formula:
PCIe RX Bandwidth = Rx_PCIe_TLP_Data_Payload / Measure_Time_Window
PCIe TX Bandwidth = Tx_PCIe_TLP_Data_Payload / Measure_Time_Window
Lane Event Usage
-------------------------------
Each lane has the same event set and to avoid generating a list of hundreds
of events, the user need to specify the lane ID explicitly, e.g.::
$# perf stat -a -e dwc_rootport_3018/rx_memory_read,lane=4/
The driver does not support sampling, therefore "perf record" will not
work. Per-task (without "-a") perf sessions are not supported.

View file

@ -13,8 +13,8 @@ is one register for each counter. Counter 0 is special in that it always counts
interrupt is raised. If any other counter overflows, it continues counting, and
no interrupt is raised.
The "format" directory describes format of the config (event ID) and config1
(AXI filtering) fields of the perf_event_attr structure, see /sys/bus/event_source/
The "format" directory describes format of the config (event ID) and config1/2
(AXI filter setting) fields of the perf_event_attr structure, see /sys/bus/event_source/
devices/imx8_ddr0/format/. The "events" directory describes the events types
hardware supported that can be used with perf tool, see /sys/bus/event_source/
devices/imx8_ddr0/events/. The "caps" directory describes filter features implemented
@ -28,12 +28,11 @@ in DDR PMU, see /sys/bus/events_source/devices/imx8_ddr0/caps/.
AXI filtering is only used by CSV modes 0x41 (axid-read) and 0x42 (axid-write)
to count reading or writing matches filter setting. Filter setting is various
from different DRAM controller implementations, which is distinguished by quirks
in the driver. You also can dump info from userspace, filter in "caps" directory
indicates whether PMU supports AXI ID filter or not; enhanced_filter indicates
whether PMU supports enhanced AXI ID filter or not. Value 0 for un-supported, and
value 1 for supported.
in the driver. You also can dump info from userspace, "caps" directory show the
type of AXI filter (filter, enhanced_filter and super_filter). Value 0 for
un-supported, and value 1 for supported.
* With DDR_CAP_AXI_ID_FILTER quirk(filter: 1, enhanced_filter: 0).
* With DDR_CAP_AXI_ID_FILTER quirk(filter: 1, enhanced_filter: 0, super_filter: 0).
Filter is defined with two configuration parts:
--AXI_ID defines AxID matching value.
--AXI_MASKING defines which bits of AxID are meaningful for the matching.
@ -65,7 +64,37 @@ value 1 for supported.
perf stat -a -e imx8_ddr0/axid-read,axi_id=0x12/ cmd, which will monitor ARID=0x12
* With DDR_CAP_AXI_ID_FILTER_ENHANCED quirk(filter: 1, enhanced_filter: 1).
* With DDR_CAP_AXI_ID_FILTER_ENHANCED quirk(filter: 1, enhanced_filter: 1, super_filter: 0).
This is an extension to the DDR_CAP_AXI_ID_FILTER quirk which permits
counting the number of bytes (as opposed to the number of bursts) from DDR
read and write transactions concurrently with another set of data counters.
* With DDR_CAP_AXI_ID_PORT_CHANNEL_FILTER quirk(filter: 0, enhanced_filter: 0, super_filter: 1).
There is a limitation in previous AXI filter, it cannot filter different IDs
at the same time as the filter is shared between counters. This quirk is the
extension of AXI ID filter. One improvement is that counter 1-3 has their own
filter, means that it supports concurrently filter various IDs. Another
improvement is that counter 1-3 supports AXI PORT and CHANNEL selection. Support
selecting address channel or data channel.
Filter is defined with 2 configuration registers per counter 1-3.
--Counter N MASK COMP register - including AXI_ID and AXI_MASKING.
--Counter N MUX CNTL register - including AXI CHANNEL and AXI PORT.
- 0: address channel
- 1: data channel
PMU in DDR subsystem, only one single port0 exists, so axi_port is reserved
which should be 0.
.. code-block:: bash
perf stat -a -e imx8_ddr0/axid-read,axi_mask=0xMMMM,axi_id=0xDDDD,axi_channel=0xH/ cmd
perf stat -a -e imx8_ddr0/axid-write,axi_mask=0xMMMM,axi_id=0xDDDD,axi_channel=0xH/ cmd
.. note::
axi_channel is inverted in userspace, and it will be reverted in driver
automatically. So that users do not need specify axi_channel if want to
monitor data channel from DDR transactions, since data channel is more
meaningful.

View file

@ -19,6 +19,7 @@ Performance monitor support
arm_dsu_pmu
thunderx2-pmu
alibaba_pmu
dwc_pcie_pmu
nvidia-pmu
meson-ddr-pmu
cxl

View file

@ -361,7 +361,7 @@ Global Attributes
``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to
control its functionality at the system level. They are located in the
``/sys/devices/system/cpu/amd-pstate/`` directory and affect all CPUs.
``/sys/devices/system/cpu/amd_pstate/`` directory and affect all CPUs.
``status``
Operation mode of the driver: "active", "passive" or "disable".

View file

@ -0,0 +1,24 @@
.. SPDX-License-Identifier: GPL-2.0
Set udev rules for PMF Smart PC Builder
---------------------------------------
AMD PMF(Platform Management Framework) Smart PC Solution builder has to set the system states
like S0i3, Screen lock, hibernate etc, based on the output actions provided by the PMF
TA (Trusted Application).
In order for this to work the PMF driver generates a uevent for userspace to react to. Below are
sample udev rules that can facilitate this experience when a machine has PMF Smart PC solution builder
enabled.
Please add the following line(s) to
``/etc/udev/rules.d/99-local.rules``::
DRIVERS=="amd-pmf", ACTION=="change", ENV{EVENT_ID}=="0", RUN+="/usr/bin/systemctl suspend"
DRIVERS=="amd-pmf", ACTION=="change", ENV{EVENT_ID}=="1", RUN+="/usr/bin/systemctl hibernate"
DRIVERS=="amd-pmf", ACTION=="change", ENV{EVENT_ID}=="2", RUN+="/bin/loginctl lock-sessions"
EVENT_ID values:
0= Put the system to S0i3/S2Idle
1= Put the system to hibernate
2= Lock the screen

View file

@ -345,7 +345,10 @@ optmem_max
----------
Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
of struct cmsghdr structures with appended data.
of struct cmsghdr structures with appended data. TCP tx zerocopy also uses
optmem_max as a limit for its internal structures.
Default : 128 KB
fb_tunnels_only_for_init_net
----------------------------

View file

@ -75,10 +75,19 @@ On other
submit a patch to be included in this section.
On all
Write a character to /proc/sysrq-trigger. e.g.::
Write a single character to /proc/sysrq-trigger.
Only the first character is processed, the rest of the string is
ignored. However, it is not recommended to write any extra characters
as the behavior is undefined and might change in the future versions.
E.g.::
echo t > /proc/sysrq-trigger
Alternatively, write multiple characters prepended by underscore.
This way, all characters will be processed. E.g.::
echo _reisub > /proc/sysrq-trigger
The :kbd:`<command key>` is case sensitive.
What are the 'command' keys?

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features arc
.. kernel-feat:: features arc

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features arm
.. kernel-feat:: features arm

View file

@ -130,7 +130,7 @@ When an Arm system boots, it can either have DT information, ACPI tables,
or in some very unusual cases, both. If no command line parameters are used,
the kernel will try to use DT for device enumeration; if there is no DT
present, the kernel will try to use ACPI tables, but only if they are present.
In neither is available, the kernel will not boot. If acpi=force is used
If neither is available, the kernel will not boot. If acpi=force is used
on the command line, the kernel will attempt to use ACPI tables first, but
fall back to DT if there are no ACPI tables present. The basic idea is that
the kernel will not fail to boot unless it absolutely has no other choice.

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features arm64
.. kernel-feat:: features arm64

View file

@ -164,3 +164,75 @@ and should be used to mask the upper bits as needed.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c
.. _tools/lib/perf/tests/test-evsel.c:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c
Event Counting Threshold
==========================================
Overview
--------
FEAT_PMUv3_TH (Armv8.8) permits a PMU counter to increment only on
events whose count meets a specified threshold condition. For example if
threshold_compare is set to 2 ('Greater than or equal'), and the
threshold is set to 2, then the PMU counter will now only increment by
when an event would have previously incremented the PMU counter by 2 or
more on a single processor cycle.
To increment by 1 after passing the threshold condition instead of the
number of events on that cycle, add the 'threshold_count' option to the
commandline.
How-to
------
These are the parameters for controlling the feature:
.. list-table::
:header-rows: 1
* - Parameter
- Description
* - threshold
- Value to threshold the event by. A value of 0 means that
thresholding is disabled and the other parameters have no effect.
* - threshold_compare
- | Comparison function to use, with the following values supported:
|
| 0: Not-equal
| 1: Equals
| 2: Greater-than-or-equal
| 3: Less-than
* - threshold_count
- If this is set, count by 1 after passing the threshold condition
instead of the value of the event on this cycle.
The threshold, threshold_compare and threshold_count values can be
provided per event, for example:
.. code-block:: sh
perf stat -e stall_slot/threshold=2,threshold_compare=2/ \
-e dtlb_walk/threshold=10,threshold_compare=3,threshold_count/
In this example the stall_slot event will count by 2 or more on every
cycle where 2 or more stalls happen. And dtlb_walk will count by 1 on
every cycle where the number of dtlb walks were less than 10.
The maximum supported threshold value can be read from the caps of each
PMU, for example:
.. code-block:: sh
cat /sys/bus/event_source/devices/armv8_pmuv3/caps/threshold_max
0x000000ff
If a value higher than this is given, then opening the event will result
in an error. The highest possible maximum is 4095, as the config field
for threshold is limited to 12 bits, and the Perf tool will refuse to
parse higher values.
If the PMU doesn't support FEAT_PMUv3_TH, then threshold_max will read
0, and attempting to set a threshold value will also result in an error.
threshold_max will also read as 0 on aarch32 guests, even if the host
is running on hardware with the feature.

View file

@ -71,6 +71,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A510 | #2658417 | ARM64_ERRATUM_2658417 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A510 | #3117295 | ARM64_ERRATUM_3117295 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A520 | #2966298 | ARM64_ERRATUM_2966298 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 |
@ -117,6 +119,10 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A76 | #1463225 | ARM64_ERRATUM_1463225 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A76 | #1490853 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A77 | #1491015 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A77 | #1508412 | ARM64_ERRATUM_1508412 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A710 | #2119858 | ARM64_ERRATUM_2119858 |
@ -127,6 +133,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A715 | #2645198 | ARM64_ERRATUM_2645198 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-X1 | #1502854 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-X2 | #2119858 | ARM64_ERRATUM_2119858 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-X2 | #2224489 | ARM64_ERRATUM_2224489 |
@ -135,6 +143,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N1 | #1349291 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N1 | #1490853 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N1 | #1542419 | ARM64_ERRATUM_1542419 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N2 | #2139208 | ARM64_ERRATUM_2139208 |
@ -143,6 +153,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N2 | #2253138 | ARM64_ERRATUM_2253138 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-V1 | #1619801 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | MMU-500 | #841119,826419 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | MMU-600 | #1076982,1209401| N/A |
@ -225,11 +237,9 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| Rockchip | RK3588 | #3588001 | ROCKCHIP_ERRATUM_3588001 |
+----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
| Fujitsu | A64FX | E#010001 | FUJITSU_ERRATUM_010001 |
+----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
| ASR | ASR8601 | #8601001 | N/A |
+----------------+-----------------+-----------------+-----------------------------+

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features loongarch
.. kernel-feat:: features loongarch

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features m68k
.. kernel-feat:: features m68k

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features mips
.. kernel-feat:: features mips

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features nios2
.. kernel-feat:: features nios2

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features openrisc
.. kernel-feat:: features openrisc

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features parisc
.. kernel-feat:: features parisc

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features powerpc
.. kernel-feat:: features powerpc

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features riscv
.. kernel-feat:: features riscv

View file

@ -12,7 +12,7 @@ is defined in <asm/hwprobe.h>::
};
long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
size_t cpu_count, cpu_set_t *cpus,
size_t cpusetsize, cpu_set_t *cpus,
unsigned int flags);
The arguments are split into three groups: an array of key-value pairs, a CPU
@ -20,12 +20,26 @@ set, and some flags. The key-value pairs are supplied with a count. Userspace
must prepopulate the key field for each element, and the kernel will fill in the
value if the key is recognized. If a key is unknown to the kernel, its key field
will be cleared to -1, and its value set to 0. The CPU set is defined by
CPU_SET(3). For value-like keys (eg. vendor/arch/impl), the returned value will
be only be valid if all CPUs in the given set have the same value. Otherwise -1
will be returned. For boolean-like keys, the value returned will be a logical
AND of the values for the specified CPUs. Usermode can supply NULL for cpus and
0 for cpu_count as a shortcut for all online CPUs. There are currently no flags,
this value must be zero for future compatibility.
CPU_SET(3) with size ``cpusetsize`` bytes. For value-like keys (eg. vendor,
arch, impl), the returned value will only be valid if all CPUs in the given set
have the same value. Otherwise -1 will be returned. For boolean-like keys, the
value returned will be a logical AND of the values for the specified CPUs.
Usermode can supply NULL for ``cpus`` and 0 for ``cpusetsize`` as a shortcut for
all online CPUs. The currently supported flags are:
* :c:macro:`RISCV_HWPROBE_WHICH_CPUS`: This flag basically reverses the behavior
of sys_riscv_hwprobe(). Instead of populating the values of keys for a given
set of CPUs, the values of each key are given and the set of CPUs is reduced
by sys_riscv_hwprobe() to only those which match each of the key-value pairs.
How matching is done depends on the key type. For value-like keys, matching
means to be the exact same as the value. For boolean-like keys, matching
means the result of a logical AND of the pair's value with the CPU's value is
exactly the same as the pair's value. Additionally, when ``cpus`` is an empty
set, then it is initialized to all online CPUs which fit within it, i.e. the
CPU set returned is the reduction of all the online CPUs which can be
represented with a CPU set of size ``cpusetsize``.
All other flags are reserved for future compatibility and must be zero.
On success 0 is returned, on failure a negative error code is returned.
@ -80,6 +94,100 @@ The following keys are defined:
* :c:macro:`RISCV_HWPROBE_EXT_ZICBOZ`: The Zicboz extension is supported, as
ratified in commit 3dd606f ("Create cmobase-v1.0.pdf") of riscv-CMOs.
* :c:macro:`RISCV_HWPROBE_EXT_ZBC` The Zbc extension is supported, as defined
in version 1.0 of the Bit-Manipulation ISA extensions.
* :c:macro:`RISCV_HWPROBE_EXT_ZBKB` The Zbkb extension is supported, as
defined in version 1.0 of the Scalar Crypto ISA extensions.
* :c:macro:`RISCV_HWPROBE_EXT_ZBKC` The Zbkc extension is supported, as
defined in version 1.0 of the Scalar Crypto ISA extensions.
* :c:macro:`RISCV_HWPROBE_EXT_ZBKX` The Zbkx extension is supported, as
defined in version 1.0 of the Scalar Crypto ISA extensions.
* :c:macro:`RISCV_HWPROBE_EXT_ZKND` The Zknd extension is supported, as
defined in version 1.0 of the Scalar Crypto ISA extensions.
* :c:macro:`RISCV_HWPROBE_EXT_ZKNE` The Zkne extension is supported, as
defined in version 1.0 of the Scalar Crypto ISA extensions.
* :c:macro:`RISCV_HWPROBE_EXT_ZKNH` The Zknh extension is supported, as
defined in version 1.0 of the Scalar Crypto ISA extensions.
* :c:macro:`RISCV_HWPROBE_EXT_ZKSED` The Zksed extension is supported, as
defined in version 1.0 of the Scalar Crypto ISA extensions.
* :c:macro:`RISCV_HWPROBE_EXT_ZKSH` The Zksh extension is supported, as
defined in version 1.0 of the Scalar Crypto ISA extensions.
* :c:macro:`RISCV_HWPROBE_EXT_ZKT` The Zkt extension is supported, as defined
in version 1.0 of the Scalar Crypto ISA extensions.
* :c:macro:`RISCV_HWPROBE_EXT_ZVBB`: The Zvbb extension is supported as
defined in version 1.0 of the RISC-V Cryptography Extensions Volume II.
* :c:macro:`RISCV_HWPROBE_EXT_ZVBC`: The Zvbc extension is supported as
defined in version 1.0 of the RISC-V Cryptography Extensions Volume II.
* :c:macro:`RISCV_HWPROBE_EXT_ZVKB`: The Zvkb extension is supported as
defined in version 1.0 of the RISC-V Cryptography Extensions Volume II.
* :c:macro:`RISCV_HWPROBE_EXT_ZVKG`: The Zvkg extension is supported as
defined in version 1.0 of the RISC-V Cryptography Extensions Volume II.
* :c:macro:`RISCV_HWPROBE_EXT_ZVKNED`: The Zvkned extension is supported as
defined in version 1.0 of the RISC-V Cryptography Extensions Volume II.
* :c:macro:`RISCV_HWPROBE_EXT_ZVKNHA`: The Zvknha extension is supported as
defined in version 1.0 of the RISC-V Cryptography Extensions Volume II.
* :c:macro:`RISCV_HWPROBE_EXT_ZVKNHB`: The Zvknhb extension is supported as
defined in version 1.0 of the RISC-V Cryptography Extensions Volume II.
* :c:macro:`RISCV_HWPROBE_EXT_ZVKSED`: The Zvksed extension is supported as
defined in version 1.0 of the RISC-V Cryptography Extensions Volume II.
* :c:macro:`RISCV_HWPROBE_EXT_ZVKSH`: The Zvksh extension is supported as
defined in version 1.0 of the RISC-V Cryptography Extensions Volume II.
* :c:macro:`RISCV_HWPROBE_EXT_ZVKT`: The Zvkt extension is supported as
defined in version 1.0 of the RISC-V Cryptography Extensions Volume II.
* :c:macro:`RISCV_HWPROBE_EXT_ZFH`: The Zfh extension version 1.0 is supported
as defined in the RISC-V ISA manual.
* :c:macro:`RISCV_HWPROBE_EXT_ZFHMIN`: The Zfhmin extension version 1.0 is
supported as defined in the RISC-V ISA manual.
* :c:macro:`RISCV_HWPROBE_EXT_ZIHINTNTL`: The Zihintntl extension version 1.0
is supported as defined in the RISC-V ISA manual.
* :c:macro:`RISCV_HWPROBE_EXT_ZVFH`: The Zvfh extension is supported as
defined in the RISC-V Vector manual starting from commit e2ccd0548d6c
("Remove draft warnings from Zvfh[min]").
* :c:macro:`RISCV_HWPROBE_EXT_ZVFHMIN`: The Zvfhmin extension is supported as
defined in the RISC-V Vector manual starting from commit e2ccd0548d6c
("Remove draft warnings from Zvfh[min]").
* :c:macro:`RISCV_HWPROBE_EXT_ZFA`: The Zfa extension is supported as
defined in the RISC-V ISA manual starting from commit 056b6ff467c7
("Zfa is ratified").
* :c:macro:`RISCV_HWPROBE_EXT_ZTSO`: The Ztso extension is supported as
defined in the RISC-V ISA manual starting from commit 5618fb5a216b
("Ztso is now ratified.")
* :c:macro:`RISCV_HWPROBE_EXT_ZACAS`: The Zacas extension is supported as
defined in the Atomic Compare-and-Swap (CAS) instructions manual starting
from commit 5059e0ca641c ("update to ratified").
* :c:macro:`RISCV_HWPROBE_EXT_ZICOND`: The Zicond extension is supported as
defined in the RISC-V Integer Conditional (Zicond) operations extension
manual starting from commit 95cf1f9 ("Add changes requested by Ved
during signoff")
* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: A bitmask that contains performance
information about the selected set of processors.

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features s390
.. kernel-feat:: features s390

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features sh
.. kernel-feat:: features sh

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features sparc
.. kernel-feat:: features sparc

View file

@ -71,7 +71,7 @@ Protocol 2.13 (Kernel 3.14) Support 32- and 64-bit flags being set in
Protocol 2.14 BURNT BY INCORRECT COMMIT
ae7e1238e68f2a472a125673ab506d49158c1889
(x86/boot: Add ACPI RSDP address to setup_header)
("x86/boot: Add ACPI RSDP address to setup_header")
DO NOT USE!!! ASSUME SAME AS 2.13.
Protocol 2.15 (Kernel 5.5) Added the kernel_info and kernel_info.setup_type_max.

View file

@ -7,27 +7,74 @@ x86 Feature Flags
Introduction
============
On x86, flags appearing in /proc/cpuinfo have an X86_FEATURE definition
in arch/x86/include/asm/cpufeatures.h. If the kernel cares about a feature
or KVM want to expose the feature to a KVM guest, it can and should have
an X86_FEATURE_* defined. These flags represent hardware features as
well as software features.
The list of feature flags in /proc/cpuinfo is not complete and
represents an ill-fated attempt from long time ago to put feature flags
in an easy to find place for userspace.
If users want to know if a feature is available on a given system, they
try to find the flag in /proc/cpuinfo. If a given flag is present, it
means that the kernel supports it and is currently making it available.
If such flag represents a hardware feature, it also means that the
hardware supports it.
However, the amount of feature flags is growing by the CPU generation,
leading to unparseable and unwieldy /proc/cpuinfo.
What is more, those feature flags do not even need to be in that file
because userspace doesn't care about them - glibc et al already use
CPUID to find out what the target machine supports and what not.
And even if it doesn't show a particular feature flag - although the CPU
still does have support for the respective hardware functionality and
said CPU supports CPUID faulting - userspace can simply probe for the
feature and figure out if it is supported or not, regardless of whether
it is being advertised somewhere.
Furthermore, those flag strings become an ABI the moment they appear
there and maintaining them forever when nothing even uses them is a lot
of wasted effort.
So, the current use of /proc/cpuinfo is to show features which the
kernel has *enabled* and *supports*. As in: the CPUID feature flag is
there, there's an additional setup which the kernel has done while
booting and the functionality is ready to use. A perfect example for
that is "user_shstk" where additional code enablement is present in the
kernel to support shadow stack for user programs.
So, if users want to know if a feature is available on a given system,
they try to find the flag in /proc/cpuinfo. If a given flag is present,
it means that
* the kernel knows about the feature enough to have an X86_FEATURE bit
* the kernel supports it and is currently making it available either to
userspace or some other part of the kernel
* if the flag represents a hardware feature the hardware supports it.
The absence of a flag in /proc/cpuinfo by itself means almost nothing to
an end user.
On the one hand, a feature like "vaes" might be fully available to user
applications on a kernel that has not defined X86_FEATURE_VAES and thus
there is no "vaes" in /proc/cpuinfo.
On the other hand, a new kernel running on non-VAES hardware would also
have no "vaes" in /proc/cpuinfo. There's no way for an application or
user to tell the difference.
The end result is that the flags field in /proc/cpuinfo is marginally
useful for kernel debugging, but not really for anything else.
Applications should instead use things like the glibc facilities for
querying CPU support. Users should rely on tools like
tools/arch/x86/kcpuid and cpuid(1).
Regarding implementation, flags appearing in /proc/cpuinfo have an
X86_FEATURE definition in arch/x86/include/asm/cpufeatures.h. These flags
represent hardware features as well as software features.
If the kernel cares about a feature or KVM want to expose the feature to
a KVM guest, it should only then expose it to the guest when the guest
needs to parse /proc/cpuinfo. Which, as mentioned above, is highly
unlikely. KVM can synthesize the CPUID bit and the KVM guest can simply
query CPUID and figure out what the hypervisor supports and what not. As
already stated, /proc/cpuinfo is not a dumping ground for useless
feature flags.
If the expected flag does not appear in /proc/cpuinfo, things are murkier.
Users need to find out the reason why the flag is missing and find the way
how to enable it, which is not always easy. There are several factors that
can explain missing flags: the expected feature failed to enable, the feature
is missing in hardware, platform firmware did not enable it, the feature is
disabled at build or run time, an old kernel is in use, or the kernel does
not support the feature and thus has not enabled it. In general, /proc/cpuinfo
shows features which the kernel supports. For a full list of CPUID flags
which the CPU supports, use tools/arch/x86/kcpuid.
How are feature flags created?
==============================

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features x86
.. kernel-feat:: features x86

View file

@ -81,11 +81,9 @@ this protection comes at a cost:
and exit (it can be skipped when the kernel is interrupted,
though.) Moves to CR3 are on the order of a hundred
cycles, and are required at every entry and exit.
b. A "trampoline" must be used for SYSCALL entry. This
trampoline depends on a smaller set of resources than the
non-PTI SYSCALL entry code, so requires mapping fewer
things into the userspace page tables. The downside is
that stacks must be switched at entry time.
b. Percpu TSS is mapped into the user page tables to allow SYSCALL64 path
to work under PTI. This doesn't have a direct runtime cost but it can
be argued it opens certain timing attack scenarios.
c. Global pages are disabled for all kernel structures not
mapped into both kernel and userspace page tables. This
feature of the MMU allows different processes to share TLB
@ -167,7 +165,7 @@ that are worth noting here.
* Failures of the selftests/x86 code. Usually a bug in one of the
more obscure corners of entry_64.S
* Crashes in early boot, especially around CPU bringup. Bugs
in the trampoline code or mappings cause these.
in the mappings cause these.
* Crashes at the first interrupt. Caused by bugs in entry_64.S,
like screwing up a page table switch. Also caused by
incorrectly mapping the IRQ handler entry code.

View file

@ -10,6 +10,191 @@ encrypting the guest memory. In TDX, a special module running in a special
mode sits between the host and the guest and manages the guest/host
separation.
TDX Host Kernel Support
=======================
TDX introduces a new CPU mode called Secure Arbitration Mode (SEAM) and
a new isolated range pointed by the SEAM Ranger Register (SEAMRR). A
CPU-attested software module called 'the TDX module' runs inside the new
isolated range to provide the functionalities to manage and run protected
VMs.
TDX also leverages Intel Multi-Key Total Memory Encryption (MKTME) to
provide crypto-protection to the VMs. TDX reserves part of MKTME KeyIDs
as TDX private KeyIDs, which are only accessible within the SEAM mode.
BIOS is responsible for partitioning legacy MKTME KeyIDs and TDX KeyIDs.
Before the TDX module can be used to create and run protected VMs, it
must be loaded into the isolated range and properly initialized. The TDX
architecture doesn't require the BIOS to load the TDX module, but the
kernel assumes it is loaded by the BIOS.
TDX boot-time detection
-----------------------
The kernel detects TDX by detecting TDX private KeyIDs during kernel
boot. Below dmesg shows when TDX is enabled by BIOS::
[..] virt/tdx: BIOS enabled: private KeyID range: [16, 64)
TDX module initialization
---------------------------------------
The kernel talks to the TDX module via the new SEAMCALL instruction. The
TDX module implements SEAMCALL leaf functions to allow the kernel to
initialize it.
If the TDX module isn't loaded, the SEAMCALL instruction fails with a
special error. In this case the kernel fails the module initialization
and reports the module isn't loaded::
[..] virt/tdx: module not loaded
Initializing the TDX module consumes roughly ~1/256th system RAM size to
use it as 'metadata' for the TDX memory. It also takes additional CPU
time to initialize those metadata along with the TDX module itself. Both
are not trivial. The kernel initializes the TDX module at runtime on
demand.
Besides initializing the TDX module, a per-cpu initialization SEAMCALL
must be done on one cpu before any other SEAMCALLs can be made on that
cpu.
The kernel provides two functions, tdx_enable() and tdx_cpu_enable() to
allow the user of TDX to enable the TDX module and enable TDX on local
cpu respectively.
Making SEAMCALL requires VMXON has been done on that CPU. Currently only
KVM implements VMXON. For now both tdx_enable() and tdx_cpu_enable()
don't do VMXON internally (not trivial), but depends on the caller to
guarantee that.
To enable TDX, the caller of TDX should: 1) temporarily disable CPU
hotplug; 2) do VMXON and tdx_enable_cpu() on all online cpus; 3) call
tdx_enable(). For example::
cpus_read_lock();
on_each_cpu(vmxon_and_tdx_cpu_enable());
ret = tdx_enable();
cpus_read_unlock();
if (ret)
goto no_tdx;
// TDX is ready to use
And the caller of TDX must guarantee the tdx_cpu_enable() has been
successfully done on any cpu before it wants to run any other SEAMCALL.
A typical usage is do both VMXON and tdx_cpu_enable() in CPU hotplug
online callback, and refuse to online if tdx_cpu_enable() fails.
User can consult dmesg to see whether the TDX module has been initialized.
If the TDX module is initialized successfully, dmesg shows something
like below::
[..] virt/tdx: 262668 KBs allocated for PAMT
[..] virt/tdx: module initialized
If the TDX module failed to initialize, dmesg also shows it failed to
initialize::
[..] virt/tdx: module initialization failed ...
TDX Interaction to Other Kernel Components
------------------------------------------
TDX Memory Policy
~~~~~~~~~~~~~~~~~
TDX reports a list of "Convertible Memory Region" (CMR) to tell the
kernel which memory is TDX compatible. The kernel needs to build a list
of memory regions (out of CMRs) as "TDX-usable" memory and pass those
regions to the TDX module. Once this is done, those "TDX-usable" memory
regions are fixed during module's lifetime.
To keep things simple, currently the kernel simply guarantees all pages
in the page allocator are TDX memory. Specifically, the kernel uses all
system memory in the core-mm "at the time of TDX module initialization"
as TDX memory, and in the meantime, refuses to online any non-TDX-memory
in the memory hotplug.
Physical Memory Hotplug
~~~~~~~~~~~~~~~~~~~~~~~
Note TDX assumes convertible memory is always physically present during
machine's runtime. A non-buggy BIOS should never support hot-removal of
any convertible memory. This implementation doesn't handle ACPI memory
removal but depends on the BIOS to behave correctly.
CPU Hotplug
~~~~~~~~~~~
TDX module requires the per-cpu initialization SEAMCALL must be done on
one cpu before any other SEAMCALLs can be made on that cpu. The kernel
provides tdx_cpu_enable() to let the user of TDX to do it when the user
wants to use a new cpu for TDX task.
TDX doesn't support physical (ACPI) CPU hotplug. During machine boot,
TDX verifies all boot-time present logical CPUs are TDX compatible before
enabling TDX. A non-buggy BIOS should never support hot-add/removal of
physical CPU. Currently the kernel doesn't handle physical CPU hotplug,
but depends on the BIOS to behave correctly.
Note TDX works with CPU logical online/offline, thus the kernel still
allows to offline logical CPU and online it again.
Kexec()
~~~~~~~
TDX host support currently lacks the ability to handle kexec. For
simplicity only one of them can be enabled in the Kconfig. This will be
fixed in the future.
Erratum
~~~~~~~
The first few generations of TDX hardware have an erratum. A partial
write to a TDX private memory cacheline will silently "poison" the
line. Subsequent reads will consume the poison and generate a machine
check.
A partial write is a memory write where a write transaction of less than
cacheline lands at the memory controller. The CPU does these via
non-temporal write instructions (like MOVNTI), or through UC/WC memory
mappings. Devices can also do partial writes via DMA.
Theoretically, a kernel bug could do partial write to TDX private memory
and trigger unexpected machine check. What's more, the machine check
code will present these as "Hardware error" when they were, in fact, a
software-triggered issue. But in the end, this issue is hard to trigger.
If the platform has such erratum, the kernel prints additional message in
machine check handler to tell user the machine check may be caused by
kernel bug on TDX private memory.
Interaction vs S3 and deeper states
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TDX cannot survive from S3 and deeper states. The hardware resets and
disables TDX completely when platform goes to S3 and deeper. Both TDX
guests and the TDX module get destroyed permanently.
The kernel uses S3 for suspend-to-ram, and use S4 and deeper states for
hibernation. Currently, for simplicity, the kernel chooses to make TDX
mutually exclusive with S3 and hibernation.
The kernel disables TDX during early boot when hibernation support is
available::
[..] virt/tdx: initialization failed: Hibernation support is enabled
Add 'nohibernate' kernel command line to disable hibernation in order to
use TDX.
ACPI S3 is disabled during kernel early boot if TDX is enabled. The user
needs to turn off TDX in the BIOS in order to use S3.
TDX Guest Support
=================
Since the host cannot directly access guest registers or memory, much
normal functionality of a hypervisor must be moved into the guest. This is
implemented using a Virtualization Exception (#VE) that is handled by the
@ -20,7 +205,7 @@ TDX includes new hypercall-like mechanisms for communicating from the
guest to the hypervisor or the TDX module.
New TDX Exceptions
==================
------------------
TDX guests behave differently from bare-metal and traditional VMX guests.
In TDX guests, otherwise normal instructions or memory accesses can cause
@ -30,7 +215,7 @@ Instructions marked with an '*' conditionally cause exceptions. The
details for these instructions are discussed below.
Instruction-based #VE
---------------------
~~~~~~~~~~~~~~~~~~~~~
- Port I/O (INS, OUTS, IN, OUT)
- HLT
@ -41,7 +226,7 @@ Instruction-based #VE
- CPUID*
Instruction-based #GP
---------------------
~~~~~~~~~~~~~~~~~~~~~
- All VMX instructions: INVEPT, INVVPID, VMCLEAR, VMFUNC, VMLAUNCH,
VMPTRLD, VMPTRST, VMREAD, VMRESUME, VMWRITE, VMXOFF, VMXON
@ -52,7 +237,7 @@ Instruction-based #GP
- RDMSR*,WRMSR*
RDMSR/WRMSR Behavior
--------------------
~~~~~~~~~~~~~~~~~~~~
MSR access behavior falls into three categories:
@ -73,7 +258,7 @@ trapping and handling in the TDX module. Other than possibly being slow,
these MSRs appear to function just as they would on bare metal.
CPUID Behavior
--------------
~~~~~~~~~~~~~~
For some CPUID leaves and sub-leaves, the virtualized bit fields of CPUID
return values (in guest EAX/EBX/ECX/EDX) are configurable by the
@ -93,7 +278,7 @@ not know how to handle. The guest kernel may ask the hypervisor for the
value with a hypercall.
#VE on Memory Accesses
======================
----------------------
There are essentially two classes of TDX memory: private and shared.
Private memory receives full TDX protections. Its content is protected
@ -107,7 +292,7 @@ entries. This helps ensure that a guest does not place sensitive
information in shared memory, exposing it to the untrusted hypervisor.
#VE on Shared Memory
--------------------
~~~~~~~~~~~~~~~~~~~~
Access to shared mappings can cause a #VE. The hypervisor ultimately
controls whether a shared memory access causes a #VE, so the guest must be
@ -127,7 +312,7 @@ be careful not to access device MMIO regions unless it is also prepared to
handle a #VE.
#VE on Private Pages
--------------------
~~~~~~~~~~~~~~~~~~~~
An access to private mappings can also cause a #VE. Since all kernel
memory is also private memory, the kernel might theoretically need to
@ -145,7 +330,7 @@ The hypervisor is permitted to unilaterally move accepted pages to a
to handle the exception.
Linux #VE handler
=================
-----------------
Just like page faults or #GP's, #VE exceptions can be either handled or be
fatal. Typically, an unhandled userspace #VE results in a SIGSEGV.
@ -167,7 +352,7 @@ While the block is in place, any #VE is elevated to a double fault (#DF)
which is not recoverable.
MMIO handling
=============
-------------
In non-TDX VMs, MMIO is usually implemented by giving a guest access to a
mapping which will cause a VMEXIT on access, and then the hypervisor
@ -189,7 +374,7 @@ MMIO access via other means (like structure overlays) may result in an
oops.
Shared Memory Conversions
=========================
-------------------------
All TDX guest memory starts out as private at boot. This memory can not
be accessed by the hypervisor. However, some kernel users like device

View file

@ -1,3 +1,3 @@
.. SPDX-License-Identifier: GPL-2.0
.. kernel-feat:: $srctree/Documentation/features xtensa
.. kernel-feat:: features xtensa

View file

@ -6,17 +6,16 @@ Block io priorities
Intro
-----
With the introduction of cfq v3 (aka cfq-ts or time sliced cfq), basic io
priorities are supported for reads on files. This enables users to io nice
processes or process groups, similar to what has been possible with cpu
scheduling for ages. This document mainly details the current possibilities
with cfq; other io schedulers do not support io priorities thus far.
The io priority feature enables users to io nice processes or process groups,
similar to what has been possible with cpu scheduling for ages. Support for io
priorities is io scheduler dependent and currently supported by bfq and
mq-deadline.
Scheduling classes
------------------
CFQ implements three generic scheduling classes that determine how io is
served for a process.
Three generic scheduling classes are implemented for io priorities that
determine how io is served for a process.
IOPRIO_CLASS_RT: This is the realtime io class. This scheduling class is given
higher priority than any other in the system, processes from this class are

Some files were not shown because too many files have changed in this diff Show more