@shekharsorot

Pull Request Summary: Azure Files NFSv4 Support

Overview

This PR adds comprehensive NFSv4.1 protocol support for Azure Files testing in LISA, including:

  • A new unified AzureFileShare class that handles both SMB and NFS protocols
  • A fully implemented xfstests test case (verify_azure_file_share_nfsv4) with parallel worker execution
  • A basic NFS mount smoke test (verify_nfsv4_basic) for quick validation
  • Preservation of the legacy Nfs class for backward compatibility (to be removed in a future PR)

Key Changes

1. Unified AzureFileShare Class with Protocol Support

File: lisa/sut_orchestrator/azure/features.py

The AzureFileShare class now supports both SMB and NFS protocols via set_protocol():

# SMB (default)
azure_file_share.set_protocol(FileShareProtocol.SMB, FileShareConnectivity.PRIVATE_ENDPOINT)

# NFS
azure_file_share.set_protocol(FileShareProtocol.NFS, FileShareConnectivity.PRIVATE_ENDPOINT)

New Enums Added:

| Enum | Values | Description |
| --- | --- | --- |
| FileShareProtocol | SMB, NFS | Protocol selection |
| FileShareAuthMode | SHARED_KEY, MANAGED_IDENTITY, KERBEROS, NETWORK | Authentication mode |
| NfsSecurityMode | SYS | NFS security modes (future: KRB5, KRB5I, KRB5P) |
| FileShareConnectivity | PUBLIC, PRIVATE_ENDPOINT, SERVICE_ENDPOINT | Connectivity options |
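
As a rough illustration of the enums listed above, the following sketch declares them with Python's standard `enum` module. The member names come from the table; the string values are assumptions made for this example and may differ from what features.py actually uses.

```python
from enum import Enum


class FileShareProtocol(Enum):
    # Member names are from the PR description; values are assumed.
    SMB = "SMB"
    NFS = "NFS"


class FileShareAuthMode(Enum):
    SHARED_KEY = "shared_key"
    MANAGED_IDENTITY = "managed_identity"
    KERBEROS = "kerberos"
    NETWORK = "network"


class NfsSecurityMode(Enum):
    # Only SYS is listed today; KRB5, KRB5I and KRB5P are future work.
    SYS = "sys"


class FileShareConnectivity(Enum):
    PUBLIC = "public"
    PRIVATE_ENDPOINT = "private_endpoint"
    SERVICE_ENDPOINT = "service_endpoint"
```

Using enums rather than raw strings lets a call such as set_protocol() reject a misspelled protocol at the call site instead of at mount time.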

Storage Account Naming:

  • SMB: lisasmb<random10chars>
  • NFS: lisanfs<random10chars>

2. NFS xfstests Test Case: verify_azure_file_share_nfsv4

File: lisa/microsoft/testsuites/xfstests/xfstesting.py

Fully implemented xfstests validation with parallel worker execution:

| Feature | Description |
| --- | --- |
| Parallel Workers | Default 4 workers, each with own xfstests copy |
| Separate Shares | Each worker gets dedicated test/scratch NFS shares |
| Test Cases | 73 generic tests validated for Azure Files NFS |
| Excluded Tests | 16+ tests excluded (unsupported features documented) |
| Mount Options | vers=4,minorversion=1,sec=sys |

NFS Test Configuration:

_default_nfs_mount_opts = "vers=4,minorversion=1,sec=sys"
_default_nfs_excluded_tests = "generic/013 generic/014 ..."  # Hard links, sparse files, etc.
_default_nfs_testcases = "generic/001 generic/005 ..."       # 73 tests

3. NFS Smoke Test: verify_nfsv4_basic

File: lisa/microsoft/testsuites/core/storage.py

Renamed from verify_azure_file_share_nfs to verify_nfsv4_basic. Simple NFS mount validation:

# Uses unified AzureFileShare class with NFS protocol
azure_file_share.set_protocol(FileShareProtocol.NFS, FileShareConnectivity.PRIVATE_ENDPOINT)
azure_file_share.create_share()
# Mount and verify...

4. Legacy Nfs Class Preserved

File: lisa/sut_orchestrator/azure/features.py

The original Nfs class is unchanged and preserved for backward compatibility:

  • Inherits from AzureFeatureMixin, features.Nfs
  • Used by any existing tests that haven't migrated
  • Will be removed in a future PR once all consumers migrate

5. NFS Worker Setup Helper

File: lisa/microsoft/testsuites/xfstests/xfstesting.py

New helper method _setup_azure_nfs_workers() that:

  • Configures AzureFileShare for NFS protocol
  • Creates per-worker NFS shares using _deploy_azure_file_share()
  • Mounts shares using NFSClient tool
  • Configures xfstests local.config for each worker
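
The last step above, writing a local.config per worker, might look roughly like this sketch. The function names, share names (`testw1`, `scratchw1`) and mount directories are hypothetical, and the variable names follow common xfstests conventions as I understand them; check them against the README of the xfstests version in use. How `_setup_azure_nfs_workers()` actually writes the file is not shown in this PR text.

```python
from typing import Dict


def render_local_config(
    test_dev: str,
    scratch_dev: str,
    test_dir: str = "/mnt/test",
    scratch_mnt: str = "/mnt/scratch",
    mount_opts: str = "vers=4,minorversion=1,sec=sys",
) -> str:
    """Render an xfstests local.config for an NFS test/scratch pair."""
    lines = [
        "FSTYP=nfs",
        f"TEST_DEV={test_dev}",
        f"TEST_DIR={test_dir}",
        f"SCRATCH_DEV={scratch_dev}",
        f"SCRATCH_MNT={scratch_mnt}",
        f'NFS_MOUNT_OPTIONS="-o {mount_opts}"',
    ]
    return "\n".join(lines) + "\n"


def worker_configs(account: str, workers: int = 4) -> Dict[int, str]:
    """Build one config per worker, each with its own pair of shares."""
    host = f"{account}.file.core.windows.net"
    configs = {}
    for n in range(1, workers + 1):
        configs[n] = render_local_config(
            test_dev=f"{host}:/{account}/testw{n}",
            scratch_dev=f"{host}:/{account}/scratchw{n}",
            test_dir=f"/mnt/w{n}/test",
            scratch_mnt=f"/mnt/w{n}/scratch",
        )
    return configs
```

Giving each worker its own devices and mount points keeps the parallel runs from interfering with each other's scratch data.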

Technical Details

Azure Files NFS Requirements

| Requirement | Value | Reason |
| --- | --- | --- |
| Storage SKU | Premium_LRS or PremiumV2_LRS (Default) | NFS requires Premium tier |
| Account Kind | FileStorage | NFS requires FileStorage |
| HTTPS | Disabled | NFS doesn't use HTTPS |
| Private Endpoint | Required | NFS has no public endpoint access |
| Shared Key Access | Disabled | NFS uses network-based auth |

NFS Mount Options

vers=4,minorversion=1,sec=sys
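
For orientation, here is a minimal sketch of how these options could be turned into a mount command. `build_nfs_mount_command` is a hypothetical helper, not the NFSClient tool's API; the export path format `<account>.file.core.windows.net:/<account>/<share>` is the one Azure Files uses for NFS shares.

```python
import shlex
from typing import List

DEFAULT_NFS_MOUNT_OPTS = "vers=4,minorversion=1,sec=sys"


def build_nfs_mount_command(
    account: str,
    share: str,
    mount_point: str,
    options: str = DEFAULT_NFS_MOUNT_OPTS,
) -> List[str]:
    """Return the argv for mounting an Azure Files NFS share."""
    source = f"{account}.file.core.windows.net:/{account}/{share}"
    return ["mount", "-t", "nfs", "-o", options, source, mount_point]


if __name__ == "__main__":
    # Account and share names here are placeholders.
    cmd = build_nfs_mount_command("lisanfsexample0001", "share1", "/mnt/nfs")
    print(shlex.join(cmd))
```

On current nfs-utils, `vers=4,minorversion=1` is equivalent to `vers=4.1`. With a private endpoint the same hostname is used; it resolves to the private IP through the private DNS zone.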

Resource Naming Conventions

| Resource | Pattern |
| --- | --- |
| Storage Account (SMB) | lisasmb<random10chars> |
| Storage Account (NFS) | lisanfs<random10chars> |
| Private Endpoint | <storageaccount>-file-pe |
| File Shares | Caller-specified |

Architecture

Protocol Selection Flow

AzureFileShare.set_protocol(protocol, connectivity)
        │
        ├── SMB: Uses CIFS mount with credential file
        │        Storage: lisasmb*, HTTPS enabled
        │
        └── NFS: Uses NFSClient mount (no credentials)
                 Storage: lisanfs*, HTTPS disabled, Premium required
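
The branch shown in the diagram can be summarized as a lookup from protocol to storage settings. This is only a sketch: `StorageSettings` and `settings_for` are hypothetical names, while the SKU, kind and HTTPS values are the ones passed to check_or_create_storage_account() in the create_share() code quoted in the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageSettings:
    name_prefix: str
    sku: str
    kind: str
    https_only: bool
    mount_type: str  # filesystem type passed to mount


_SETTINGS = {
    "SMB": StorageSettings("lisasmb", "Standard_LRS", "StorageV2", True, "cifs"),
    "NFS": StorageSettings("lisanfs", "Premium_LRS", "FileStorage", False, "nfs"),
}


def settings_for(protocol: str) -> StorageSettings:
    """Return the storage account settings implied by the protocol."""
    try:
        return _SETTINGS[protocol.upper()]
    except KeyError:
        raise ValueError(f"unsupported protocol: {protocol!r}") from None
```

Keeping the per-protocol values in one table makes it harder for the account prefix and the share protocol to drift apart, which is the mismatch the first review comment worries about.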

Parallel Worker Architecture (xfstests)

XfstestsParallelRunner
        │
        ├── Worker 1: xfstests copy + test_share_w1 + scratch_share_w1
        ├── Worker 2: xfstests copy + test_share_w2 + scratch_share_w2
        ├── Worker 3: xfstests copy + test_share_w3 + scratch_share_w3
        └── Worker 4: xfstests copy + test_share_w4 + scratch_share_w4
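
The PR text does not say how test cases are divided among the workers. One simple possibility, shown purely as an assumption, is a round-robin split; `split_round_robin` is a hypothetical helper and the test IDs in the usage example are placeholders.

```python
from typing import List, Sequence


def split_round_robin(tests: Sequence[str], workers: int = 4) -> List[List[str]]:
    """Distribute tests across workers, one at a time in turn."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    buckets: List[List[str]] = [[] for _ in range(workers)]
    for index, test in enumerate(tests):
        buckets[index % workers].append(test)
    return buckets


if __name__ == "__main__":
    placeholder_tests = [f"generic/{n:03d}" for n in range(1, 7)]
    for worker, bucket in enumerate(split_round_robin(placeholder_tests), start=1):
        print(f"worker {worker}: {' '.join(bucket)}")
```

Round-robin keeps bucket sizes within one test of each other, though it ignores per-test runtime, so a real runner may balance differently.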

Files Changed

| File | Change Type | Description |
| --- | --- | --- |
| lisa/sut_orchestrator/azure/features.py | Modified | Added protocol enums, updated AzureFileShare with set_protocol(), preserved Nfs class |
| lisa/microsoft/testsuites/xfstests/xfstesting.py | Modified | Added verify_azure_file_share_nfsv4, _setup_azure_nfs_workers(), NFS test config |
| lisa/microsoft/testsuites/core/storage.py | Modified | Renamed verify_azure_file_share_nfs → verify_nfsv4_basic, uses unified AzureFileShare |

Testing

Test Cases

| Test Case | File | Purpose |
| --- | --- | --- |
| verify_azure_file_share_nfsv4 | xfstesting.py | Comprehensive xfstests with parallel workers (73 tests) |
| verify_nfsv4_basic | storage.py | Simple NFS mount smoke test |
| verify_azure_file_share | xfstesting.py | Existing SMB xfstests (unchanged) |

Run Commands

# Run NFS xfstests (comprehensive)
lisa -r azure.yml -v testcase:verify_azure_file_share_nfsv4

# Run NFS smoke test (quick)
lisa -r azure.yml -v testcase:verify_nfsv4_basic

Recommended Test Images

| Image | Purpose |
| --- | --- |
| canonical 0001-com-ubuntu-server-jammy 22_04-lts-gen2 latest | Ubuntu 22.04 |
| canonical ubuntu-24_04-lts server latest | Ubuntu 24.04 |
| redhat rhel 9_5 latest | RHEL 9.5 |
| microsoftcblmariner azure-linux-3 azure-linux-3-gen2 latest | Azure Linux 3 |

NFS Test Exclusions

Tests excluded due to Azure Files NFS limitations:

| Category | Tests | Reason |
| --- | --- | --- |
| Hard links | generic/013 | link() syscall not supported |
| Sparse files | generic/014, 129, 130, 239, 240, 469, 567 | Hole punching not supported |
| Device nodes | generic/184, 306 | mknod/mkfifo not supported |
| fallocate | generic/071, 086, 214, 228, 286, 315, 391, 422, 568, 590 | Not implemented |
| NFS-specific | generic/024, 117, 465, 528, 599, 635 | Protocol limitations |
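
The table abbreviates test IDs (for example "generic/014, 129"). The sketch below expands that shorthand into the space-separated form used by _default_nfs_excluded_tests. The `expand` helper and the dictionary layout are illustrative only; the real string in xfstesting.py is elided in this PR description and may contain a different set of tests.

```python
from typing import Dict, List

# Entries copied from the exclusion table above.
EXCLUSIONS: Dict[str, str] = {
    "Hard links": "generic/013",
    "Sparse files": "generic/014, 129, 130, 239, 240, 469, 567",
    "Device nodes": "generic/184, 306",
    "fallocate": "generic/071, 086, 214, 228, 286, 315, 391, 422, 568, 590",
    "NFS-specific": "generic/024, 117, 465, 528, 599, 635",
}


def expand(entry: str) -> List[str]:
    """Expand 'generic/014, 129' into ['generic/014', 'generic/129']."""
    group = ""
    expanded = []
    for token in entry.split(","):
        token = token.strip()
        if "/" in token:
            # A full ID sets the group for the bare numbers that follow.
            group, number = token.split("/", 1)
        else:
            number = token
        expanded.append(f"{group}/{number}")
    return expanded


def excluded_tests_string(exclusions: Dict[str, str] = EXCLUSIONS) -> str:
    """Join every excluded test ID with spaces."""
    return " ".join(t for entry in exclusions.values() for t in expand(entry))
```

Expanded this way, the table lists 26 distinct test IDs.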

Future Work

  • Remove legacy Nfs class once all consumers migrate to AzureFileShare
  • Managed Identity support for SMB (using FileShareAuthMode.MANAGED_IDENTITY)
  • Encryption-in-transit for NFS (Kerberos modes: KRB5, KRB5I, KRB5P)
  • Service endpoint connectivity option for NFS

Breaking Changes

| Change | Migration Path |
| --- | --- |
| verify_azure_file_share_nfs renamed | Use verify_nfsv4_basic |

Backward Compatibility

| Component | Status |
| --- | --- |
| verify_azure_file_share (SMB xfstests) | ✅ Unchanged |
| AzureFileShare class | ✅ Backward compatible (default = SMB) |
| Nfs class (lisa.sut_orchestrator.azure.features) | ✅ Preserved for compatibility |
| Nfs base class (lisa.features.nfs) | ✅ Unchanged |

References


Key Test Cases: verify_azure_file_share_nfsv4 | verify_nfsv4_basic

Impacted LISA Features: AzureFileShare, Nfs

Tested Azure Marketplace Images:

  • canonical 0001-com-ubuntu-server-jammy 22_04-lts-gen2 latest
  • canonical ubuntu-24_04-lts server latest
  • redhat rhel 9_5 latest
  • microsoftcblmariner azure-linux-3 azure-linux-3-gen2 latest


Copilot AI left a comment


Pull request overview

This pull request adds comprehensive Azure Files NFSv4.1 protocol support to LISA's testing framework. The changes introduce a unified AzureFileShare class that handles both SMB and NFS protocols through a configurable set_protocol() method, while preserving the legacy Nfs class for backward compatibility.

Changes:

  • Adds protocol-aware Azure File Share management with NFS and SMB support via new enums and set_protocol() configuration
  • Implements parallel NFSv4.1 xfstests validation (verify_azure_file_share_nfsv4) with 73 validated test cases across 4 workers
  • Refactors storage test to use unified AzureFileShare class (verify_nfsv4_basic, renamed from verify_azure_file_share_nfs)

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| lisa/sut_orchestrator/azure/features.py | Adds FileShareProtocol/AuthMode/Connectivity enums, unified AzureFileShare with set_protocol() method, NFS-specific storage account configuration, preserves legacy Nfs class |
| lisa/sut_orchestrator/azure/common.py | Implements protocol-aware file share creation/deletion using ARM API for NFS (shared key disabled) and data plane API for SMB |
| lisa/microsoft/testsuites/xfstests/xfstests.py | Adds deferred notification support for parallel workers, file existence timeout handling, random delay to prevent SSH connection pool contention |
| lisa/microsoft/testsuites/xfstests/xfstesting.py | Implements verify_azure_file_share_nfsv4 with parallel workers, NFS worker setup helper, unified cleanup method for SMB/NFS protocols |
| lisa/microsoft/testsuites/core/storage.py | Renames verify_azure_file_share_nfs to verify_nfsv4_basic, migrates from legacy Nfs class to unified AzureFileShare with NFS protocol |

Comment on lines +4051 to +4125
    def create_share(
        self,
        quota_in_gb: int = 100,
    ) -> None:
        """
        Create a file share using the configured protocol.
        For NFS: Creates Premium storage account with private endpoint.
        For SMB: Creates Standard storage account (public or private endpoint).
        Must call set_protocol() before this method to configure NFS.
        """
        platform: AzurePlatform = self._platform  # type: ignore
        node_context = self._node.capability.get_extended_runbook(AzureNodeSchema)
        location = node_context.location
        resource_group_name = self._resource_group_name

        random_str = generate_random_chars(string.ascii_lowercase + string.digits, 10)
        self._file_share_names = [f"lisa{random_str}fs"]
        self._private_endpoint_name = f"{self._storage_account_name}-file-pe"

        if self._protocol == FileShareProtocol.NFS:
            # NFS requires Premium_LRS, FileStorage, HTTPS disabled
            check_or_create_storage_account(
                credential=platform.credential,
                subscription_id=platform.subscription_id,
                cloud=platform.cloud,
                account_name=self._storage_account_name,
                resource_group_name=resource_group_name,
                location=location,
                log=self._log,
                sku="Premium_LRS",
                kind="FileStorage",
                enable_https_traffic_only=False,
            )
            get_or_create_file_share(
                credential=platform.credential,
                subscription_id=platform.subscription_id,
                cloud=platform.cloud,
                account_name=self._storage_account_name,
                file_share_name=self._file_share_names[0],
                resource_group_name=resource_group_name,
                protocols="NFS",
                log=self._log,
                quota_in_gb=quota_in_gb,
            )
        else:
            # SMB uses Standard storage
            check_or_create_storage_account(
                credential=platform.credential,
                subscription_id=platform.subscription_id,
                cloud=platform.cloud,
                account_name=self._storage_account_name,
                resource_group_name=resource_group_name,
                location=location,
                log=self._log,
                sku="Standard_LRS",
                kind="StorageV2",
                enable_https_traffic_only=True,
                allow_shared_key_access=False,
            )
            get_or_create_file_share(
                credential=platform.credential,
                subscription_id=platform.subscription_id,
                cloud=platform.cloud,
                account_name=self._storage_account_name,
                file_share_name=self._file_share_names[0],
                resource_group_name=resource_group_name,
                log=self._log,
                quota_in_gb=quota_in_gb,
            )

        # Create private endpoint if required
        if self._connectivity == FileShareConnectivity.PRIVATE_ENDPOINT:
            self._setup_private_endpoint(location)

Copilot AI Jan 28, 2026


The create_share() method has a subtle bug when called for NFS shares without calling set_protocol() first. If a caller uses create_share() directly for NFS (as shown in the docstring "Must call set_protocol() before this method"), but forgets to call set_protocol(), the method will:

  1. Use the default SMB protocol from _initialize() (line 4183)
  2. Use the SMB storage account prefix from _initialize_fileshare_information() (line 4193)
  3. Create an NFS share with the wrong storage account naming convention

This creates a mismatch where the storage account has the "lisasmb" prefix but contains NFS shares.

Consider either:

  • Add validation at the start of create_share() to check protocol compatibility with the requested share type
  • Or remove create_share() and require callers to use set_protocol() + create_file_share() instead for explicit protocol handling

Copilot uses AI. Check for mistakes.
sku="Standard_LRS",
kind="StorageV2",
enable_https_traffic_only=True,
allow_shared_key_access=False,

Copilot AI Jan 28, 2026


The create_share() method for SMB protocol hardcodes allow_shared_key_access=False (line 4110), but the default authentication mode is FileShareAuthMode.SHARED_KEY (set in _initialize() at line 4186). This creates an inconsistency:

  • The auth mode says "use shared key"
  • But the storage account disables shared key access

This will cause authentication failures when attempting to mount SMB shares created via create_share(). The method should either:

  1. Set allow_shared_key_access=True for SMB when _auth_mode == FileShareAuthMode.SHARED_KEY
  2. Or explicitly document that create_share() only supports managed identity/Kerberos authentication for SMB

Note: This issue doesn't affect existing create_file_share() callers since they explicitly pass allow_shared_key_access as a parameter.

Suggested change
- allow_shared_key_access=False,
+ allow_shared_key_access=(
+     self._auth_mode == FileShareAuthMode.SHARED_KEY
+ ),

Comment on lines +4137 to +4152
        if self._connectivity == FileShareConnectivity.PRIVATE_ENDPOINT:
            delete_private_dns_zone_groups(
                platform,
                resource_group_name,
                self._log,
                private_endpoint_name=self._private_endpoint_name,
            )
            delete_virtual_network_links(platform, resource_group_name, self._log)
            delete_record_sets(platform, resource_group_name, self._log)
            delete_private_zones(platform, resource_group_name, self._log)
            delete_private_endpoints(
                platform,
                resource_group_name,
                self._log,
                private_endpoint_name=self._private_endpoint_name,
            )

Copilot AI Jan 28, 2026


The delete_share() method unconditionally deletes private endpoint resources when _connectivity == PRIVATE_ENDPOINT (lines 4137-4152), without checking if the private endpoint was created by LISA (_private_endpoint_created_by_lisa flag).

This could delete pre-existing private endpoints that LISA reused rather than created. The cleanup logic should respect the _private_endpoint_created_by_lisa flag, similar to how delete_azure_fileshare() does at lines 4374-4421.

The fix should be:

if self._connectivity == FileShareConnectivity.PRIVATE_ENDPOINT and self._private_endpoint_created_by_lisa:
    # delete private endpoint resources

This ensures pre-existing private endpoints are preserved when LISA reuses existing infrastructure.

Comment on lines +2276 to +2300
    # Check if share exists using ARM API
    try:
        storage_client.file_shares.get(
            resource_group_name=resource_group_name,
            account_name=account_name,
            share_name=file_share_name,
        )
        log.debug(f"file share {file_share_name} already exists")
    except Exception:
        # Share doesn't exist, create it
        log.debug(
            f"creating file share {file_share_name} with protocols {protocols}"
        )
        from azure.mgmt.storage.models import FileShare

        file_share = FileShare(
            enabled_protocols=protocols,
            share_quota=quota_in_gb,
        )
        storage_client.file_shares.create(
            resource_group_name=resource_group_name,
            account_name=account_name,
            share_name=file_share_name,
            file_share=file_share,
        )

Copilot AI Jan 28, 2026


The exception handling at line 2284 catches all exceptions with a broad except Exception:, which will mask actual errors (such as authentication failures, network errors, or a missing resource group).

This should be refined to only catch the specific exception for "resource not found" (typically ResourceNotFoundError from azure.core.exceptions). Other exceptions should propagate to help diagnose configuration or permission issues.

Recommended fix:

from azure.core.exceptions import ResourceNotFoundError

try:
    storage_client.file_shares.get(...)
    log.debug(f"file share {file_share_name} already exists")
except ResourceNotFoundError:
    # Share doesn't exist, create it
    ...

This makes error handling more precise and helps debugging when real issues occur.
