Skip to content

Conversation

@lucasbazan
Copy link

Description

Refactored examples/orderbook_data/create_dataset.py to replace os.system calls (which invoked Linux-specific shell commands like cp, tar, and chmod) with native Python libraries (shutil, tarfile, os).

Key changes:

  1. Replaced cp with shutil.copy.
  2. Replaced tar shell command with the tarfile module.
  3. Replaced chmod 777 with os.chmod using safer permissions (0o755), ensuring the script doesn't grant excessive world-writable permissions.
  4. Introduced a lambda helper and a loop to handle the "SH" and "SZ" subfolders, removing code duplication.

Motivation and Context

  1. Cross-Platform Compatibility: The previous implementation relied on Linux shell commands (cp, tar, chmod) via os.system, causing the script to fail on Windows environments. Native Python libraries ensure the script runs correctly on Windows, Linux, and macOS.
  2. Security: Using os.system with shell commands can lead to command injection risks if variables are not sanitized. Additionally, chmod 777 is a security risk. The new implementation follows the principle of least privilege.
  3. Code Quality: The refactoring removes redundant code blocks by iterating over the target subfolders.

How Has This Been Tested?

  • Pass the test by running: pytest qlib/tests/test_all_pipeline.py under upper directory of qlib.
  • If you are adding a new feature, test on your own test scripts.

Test Description:
I manually ran the create_dataset.py script locally to verify that:

  1. The .tar.gz files are correctly copied to the destination.
  2. The archives are correctly extracted into the SH and SZ folders.
  3. No shell errors are raised.

Types of changes

  • Fix bugs (fixes script failure on non-Linux OS)
  • Add new feature
  • Update documentation

NOTE: I know that this is a demo script, but is good to demonstrate with a secure code.

@lucasbazan
Copy link
Author

@microsoft-github-policy-service agree

@lucasbazan
Copy link
Author

@lucasbazan please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant