Data Science Virtual Machine (DSVM) is a virtual machine image with popular data analytics and machine learning libraries pre-installed. You can use a DSVM as an environment for training models and experimenting with data.
The image is based on Ubuntu and includes pre-installed software:
- Conda, a package manager with Python 2.7 (environment py27) and Python 3.10 (py310).
- Jupyter Notebook and JupyterLab, tools for interactive and reproducible computations.
- Data analysis, scientific computing and data visualisation libraries: Pandas, NumPy, SciPy, Matplotlib.
- Machine Learning libraries: PyTorch, CatBoost, TensorFlow, scikit-learn, Keras.
- PySpark, a library for interacting with Apache Spark™ and building distributed data processing pipelines.
- NLTK, a suite of natural language processing libraries and data.
- Docker®, a container management system.
- Git, a version control system.
- NVIDIA® Data Center Driver, CUDA® Toolkit 12, and Container Toolkit for accelerating machine learning and other compute-intensive applications on NVIDIA GPUs available in Nebius Israel.
- Optimised libraries and tools for working with images: scikit-image, opencv-python, Pillow.
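Since the image bundles the NVIDIA driver, CUDA Toolkit and PyTorch, a quick way to confirm that a GPU is actually visible is to ask PyTorch. The following is a minimal sketch, not part of the image itself: the function name describe_gpu_support is hypothetical, and it assumes you run it inside a Conda environment where PyTorch is installed.

```python
import importlib.util


def describe_gpu_support():
    """Return a short, human-readable summary of GPU visibility from PyTorch."""
    # PyTorch may be missing outside the DSVM, so probe for it before importing.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch

    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA device is visible"

    # Report the first device only; multi-GPU VMs expose further indices.
    return "CUDA device 0: " + torch.cuda.get_device_name(0)


if __name__ == "__main__":
    print(describe_gpu_support())
```

On a VM without a GPU the function reports that no CUDA device is visible, which is expected and does not indicate a broken installation.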
- Click the button in this card to go to VM creation. The image will be automatically selected under Image/boot disk selection.
- Under Disks, follow these recommendations for a boot disk:
  - Type: SSD.
  - Size: 30 GB or more.
- Under Computing resources, follow these recommendations:
  - vCPUs: 2 or more.
  - RAM: 2 GB or more.
- Paste the public key from the pair into the SSH key field.
- Create the VM.
- Activate a Conda environment with the Python version of your choice: run conda activate py27 for Python 2.7, or conda activate py310 for Python 3.10.
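After activating an environment, it can help to confirm which interpreter is actually running. The sketch below is illustrative only: the function name active_environment is hypothetical, and it relies on the CONDA_DEFAULT_ENV variable that Conda sets on activation. It is written to run under both Python 2.7 and Python 3.10.

```python
import os
import sys


def active_environment():
    """Return (conda_env_name, python_version) for the running interpreter."""
    # Conda sets CONDA_DEFAULT_ENV on activation; it is absent otherwise.
    env_name = os.environ.get("CONDA_DEFAULT_ENV")
    version = "%d.%d" % (sys.version_info[0], sys.version_info[1])
    return env_name, version


if __name__ == "__main__":
    name, version = active_environment()
    print("Conda environment: %s, Python %s" % (name or "none active", version))
```

If the reported Python version does not match the environment you meant to use, run conda activate again in the same shell session before starting your work.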
- Analysis and prediction of user behavior.
- Analysis of system operation and prediction of failures.
- Customer segmentation.
- Classification of images, documents, and any types of data.
- Recommendation systems.
- Speech synthesis and recognition services.
- Dialog engines.
Nebius Israel technical support responds to requests 24 hours a day, 7 days a week. The available request types and their response times depend on your pricing plan. You can activate paid support in the Management console. Learn more about requesting technical support.
Software | Version |
---|---|
Ubuntu | 20.04 LTS |
CatBoost | 1.2 |
Conda | 23.5.0 |
Docker | 24.0.2 |
Git | 2.25.1 |
JupyterLab | 3.6.3 |
Keras | 2.11.0 |
Matplotlib | 3.7.1 |
NLTK | 3.7 |
NVIDIA CUDA Toolkit | 12.0.1 |
NVIDIA Container Toolkit | 1.13.2 |
NVIDIA Data Center Driver | 535.54.03 |
NumPy | 1.22.3 |
Pandas | 1.4.2 |
Pillow | 9.4.0 |
PySpark | 3.2.1 |
PyTorch | 1.13.1 |
SciPy | 1.8.1 |
TensorFlow | 2.11.0 |
opencv-python | 4.6.0 |
scikit-image | 0.19.2 |
scikit-learn | 1.1.1 |
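To check whether a running VM still matches the versions listed in the table, you can compare them with what is installed in the active environment. This is a rough sketch for the py310 environment, assuming Python 3.8 or later for importlib.metadata. The function name compare_versions is hypothetical, and the dictionary keys are assumed PyPI-style distribution names, which may not match how every package is registered in a Conda environment.

```python
from importlib import metadata

# Expected versions copied from the table above; the keys are assumed
# PyPI-style distribution names and may differ for conda-built packages.
EXPECTED = {
    "numpy": "1.22.3",
    "pandas": "1.4.2",
    "scipy": "1.8.1",
    "scikit-learn": "1.1.1",
    "matplotlib": "3.7.1",
}


def compare_versions(expected):
    """Map each distribution name to (expected, installed or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not registered in this environment.
            installed = None
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in sorted(compare_versions(EXPECTED).items()):
        status = "ok" if installed == wanted else "differs"
        print(f"{name}: expected {wanted}, found {installed} ({status})")
```

A result of None for a package means only that no metadata was found under that name. The library may still be importable if it was installed under a different distribution name.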