What’s Better? Anaconda or Python Virtualenv


Building deep learning or machine learning models on a local system is usually straightforward. Things get complicated when we try to replicate the same project setup in the cloud. The two popular options the data science community has for managing project environments are the Anaconda (conda) environment and the Python virtualenv.

Which one do you use?

Or, to put the question directly: which project environment is best for deploying data science projects in the cloud?

Anaconda, or a Python virtualenv environment?

If you are not sure how to answer, don’t worry. This article walks through the reasons and features to keep in mind while selecting the environment for your next data science project.



Why do we need the environment?


Let’s start with why we need to maintain environments at all.

In data science, we usually work on several projects at the same time.

For example, we are building a model to detect fraudulent credit card activity, and at the same time, we are testing the performance of an email spam classifier we have already built.

The data science packages and project setup are completely different for these two projects. The ideal approach would be a separate machine for each project, so that the two setups never interfere.

But this approach is neither feasible nor cost-effective.

The alternative is to use the same machine for both projects. From a cost point of view, this is perfect.

But we will face issues with this approach as well.

Let’s say a package we use gets a few updates, and we need to upgrade it. After the upgrade, some functions that our existing code relies on may throw errors.

For example

Let’s say we are using pandas version x. Suppose that in this version, we get the frequency of categorical values with a function called get_frequency, whereas in the new version it has been renamed to frequency_values. (Both names are made up for illustration; they are not real pandas functions.)

Suppose we upgrade the package and run our script without making this change in the code. Our main script will fail. So it’s always recommended to pin the same library versions for a project.
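One way to catch this kind of version drift early is to compare the installed version of a package with the version the project expects. The snippet below is a minimal sketch using the standard library’s importlib.metadata (Python 3.8 or later). The helper names installed_version and matches_pin, and the pandas version shown, are assumptions made for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(package_name, pinned_version):
    """True only when the installed version equals the pinned one exactly."""
    return installed_version(package_name) == pinned_version


if __name__ == "__main__":
    # "pandas" and "1.5.3" are placeholder values for illustration only.
    print(installed_version("pandas"))
    print(matches_pin("pandas", "1.5.3"))
```

A script could call matches_pin at startup and stop with a clear message, instead of failing later on a renamed function.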

That’s the reason you will find a requirements.txt file in GitHub projects. Don’t worry about the details of this file for now; we discuss it in the next section.

What is the requirements.txt file?

For now, just remember that the requirements.txt file keeps track of each package or library we used in the project, along with its specific version.

A sample requirements.txt file lists one package per line, each pinned to a version in the form package_name==version.

Figure: Sample requirements file

In the requirements.txt file, we store which packages we have used in our project, and we also keep track of the versions we are using.
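To make the file format concrete, here is a small sketch that reads pinned entries of the form package_name==version into a dictionary. The parse_requirements helper is hypothetical, the sample packages and versions are illustrative only, and the parser deliberately ignores anything that is not an exact pin.

```python
def parse_requirements(text):
    """Map each pinned package name to its version string."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue  # skip blanks and lines that are not exact pins
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


# Hypothetical file contents; the packages and versions are illustrative only.
SAMPLE = """\
numpy==1.24.4
pandas==1.5.3
scikit-learn==1.3.2  # pinned for the spam classifier
"""

pins = parse_requirements(SAMPLE)
print(pins)
```

In practice pip reads this file for us, so a parser like this is only useful for quick checks or reports.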

To recreate the project setup, all we need to do is install these packages on the target system. It could be a local laptop or desktop, or it could be a cloud setup such as Google App Engine or an AWS EC2 instance.

Now let’s go back to our actual question: why do we need environments?

If we maintain a requirements.txt file for each project, we can create a separate environment per project, and each project runs in its own environment.

In this way, we can create multiple environments on the same system and work on numerous projects.

In technical terms, this is called managing Python packages, and the tool that does it is called a package manager.

What is Pip?

To manage various packages, we need a system that keeps track of each package and its versions. That’s where pip, the Python package manager, comes in. pip installs packages into whichever environment is active, which is what lets each environment hold its own set of packages and versions.

If you remember, in the gender-wise face detection project, we used pip to install various computer vision Python libraries.

pip downloads packages from the Python Package Index (PyPI), which hosts the released versions of each package. When a package and its version are listed in requirements.txt, pip fetches and installs that specific version.

pip is the standard tool for managing packages that do not ship with the Python installation. It can install almost any third-party package, and the pip community is active and responsive to issues.

If you know any other programming language, the concept of the package manager is similar.

  • Ruby has gem
  • JavaScript has npm

We learned why working with separate environments is useful, and how pip helps us achieve this. Now let’s see how to create virtual environments using Python virtualenv and conda. Then we will take a deep dive to understand which one is better for data science projects.

Python Virtualenv

To create a virtual environment, we can use the Python virtualenv package, which we install with the help of pip. Inside this environment, we can install popular machine learning Python packages.

pip install virtualenv

To install any package using pip, we use the below command with the name of the package we would like to install.

pip install package_name

Once we successfully install the virtualenv package, we can create the environment.

Creating an environment with virtualenv

Now that virtualenv is installed, we can create the environment using the below command.

virtualenv name_of_the_folder

In your case, replace name_of_the_folder with the name of your project or any relevant name. We generally add env at the end of the folder name.

Example: opinion_extractor_env
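The same step can also be scripted. The sketch below uses Python’s built-in venv module, which is a different tool from the virtualenv package used in this article but produces a comparable environment folder. The create_environment helper is a hypothetical name, and the environment is built in a temporary directory so the example cleans up after itself.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(parent_dir, env_name):
    """Create a virtual environment folder and return its path."""
    env_path = Path(parent_dir) / env_name
    # with_pip=False keeps the sketch fast; pass with_pip=True to bootstrap pip.
    venv.create(env_path, with_pip=False)
    return env_path


# Build the article's example environment inside a throwaway directory.
with tempfile.TemporaryDirectory() as scratch:
    env_path = create_environment(scratch, "opinion_extractor_env")
    has_config = (env_path / "pyvenv.cfg").is_file()
    print(env_path.name, has_config)
```

For a real project, we would pass the project folder instead of a temporary directory so the environment persists.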

Activating and deactivating virtualenv environment

Once we have created the environment, we need to activate it before we can install packages into it and use it.

We use the below commands to activate the environment.

Mac or Ubuntu command

source name_of_the_folder/bin/activate

Windows command

name_of_the_folder\Scripts\activate

With the above command, we activate the environment we created. Note that we first need to go to the folder location where we created the virtual environment folder.
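The activation script lives in a different subfolder depending on the operating system: bin on Mac and Ubuntu, Scripts on Windows. As a rough sketch, the hypothetical activate_script helper below builds the right path from a platform name in the style of sys.platform.

```python
from pathlib import PurePosixPath, PureWindowsPath


def activate_script(env_folder, platform_name):
    """Return the activation script path for the given platform.

    platform_name follows sys.platform values such as "linux", "darwin", "win32".
    """
    if platform_name.startswith("win"):
        return str(PureWindowsPath(env_folder) / "Scripts" / "activate")
    return str(PurePosixPath(env_folder) / "bin" / "activate")


for name in ("linux", "darwin", "win32"):
    print(name, activate_script("opinion_extractor_env", name))
```

Activation itself still has to happen in the shell, because it changes the current shell session; the helper only tells us which file to run.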

Installing packages

To install a specific Python package, replace package_name in the below command with the name of the package.

pip install package_name

This installs the specified package into the virtualenv environment we created. To record the packages we use in the project, we don’t need to list the packages and their versions manually. We use the below command to write them out in the requirements.txt file format.

pip freeze > requirements.txt

This command writes the list of installed packages and the versions we have used in our project. To install them on any new system, such as a cloud setup, all we need to do is run the below command.

pip install -r requirements.txt

This installs all the packages with the specified versions. To deactivate the environment, we just need to run the command deactivate. Please note that deactivate won’t delete the environment.
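If we want to run these two pip steps from a setup script rather than by hand, one possible approach is to call pip through the current interpreter with subprocess. This is only a sketch: pip_command, write_requirements and install_requirements are hypothetical helper names, and the code assumes pip is available in the active environment.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip invocation bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", *args]


def write_requirements(path="requirements.txt"):
    """Save the output of `pip freeze` to the given file."""
    result = subprocess.run(
        pip_command("freeze"), capture_output=True, text=True, check=True
    )
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(result.stdout)


def install_requirements(path="requirements.txt"):
    """Install every package listed in the requirements file."""
    subprocess.run(pip_command("install", "-r", path), check=True)


if __name__ == "__main__":
    # Only show the command here; call write_requirements() to actually run it.
    print(pip_command("freeze"))
```

Using sys.executable -m pip instead of a bare pip makes sure the packages go into the environment the script is running in.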


Conda environment

Using Anaconda’s conda tool, we can create environments in the same way we created a virtual environment using Python virtualenv.

The functionality is the same, but the commands will change a bit. Let’s have a look at these.

Creating Conda environment

By default, all the environments we create are stored in the envs directory inside the local conda directory. To create a new environment, use the below command.

conda create --name name_of_the_folder

For example, to create the opinion_extractor_env environment, you can run the below command.

conda create --name opinion_extractor_env

We can create environments with a specific Python version too. For example, check the below command.

conda create -n name_of_the_folder python=3.6

Activating and deactivating conda environment

Once we have created the environment, we activate it by running the below command.

conda activate name_of_the_folder

For deactivating the environment, we just need to run the below command.

conda deactivate

Just as virtualenv projects use a requirements.txt file, conda environments use environment.yml files.

For example, to create an environment with all the packages and versions listed in the file, we use the below command.

conda env create -f environment.yml
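To give an idea of what an environment.yml file contains, the sketch below renders a minimal one as text: an environment name, a list of conda dependencies, and an optional nested list of packages to install with pip. The render_environment_yml helper and the package choices are assumptions made for illustration; real files often also include a channels section.

```python
def render_environment_yml(env_name, conda_packages, pip_packages=()):
    """Render a minimal environment.yml as text."""
    lines = [f"name: {env_name}", "dependencies:"]
    lines += [f"  - {spec}" for spec in conda_packages]
    if pip_packages:
        # Packages installed with pip are nested under a "pip" entry.
        lines.append("  - pip")
        lines.append("  - pip:")
        lines += [f"      - {spec}" for spec in pip_packages]
    return "\n".join(lines) + "\n"


# Hypothetical package choices, shown only to illustrate the layout.
text = render_environment_yml(
    "opinion_extractor_env",
    ["python=3.6", "pandas"],
    pip_packages=["requests"],
)
print(text)
```

Saving this text as environment.yml gives a file that the conda env create command above can read.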

To list the available environments, we use the below command.

conda env list

To list the packages installed in the active environment, we use the command conda list.

By now, we have learned how to create an environment using Python virtualenv and conda. Now let’s discuss which one is best for building various machine learning models.

Which one is better, python virtualenv or Anaconda?


Now comes the real question: which environment are we supposed to use?

It depends.

I know what you are thinking: are you serious?

Yes, it depends on your project.

Let’s say we are building a core data science project, where we don’t need any packages other than data science packages. Then we can use Anaconda; there is no need to create a separate Python virtualenv.

One key issue with Anaconda is that installing it brings along all the major data science packages. This is heavy on our system, as we end up installing many packages we never use.

If the project requires both a frontend and a machine learning or data science pipeline, then it’s good to have the Python virtualenv setup.

How to select the environment for data science projects?

From now on, when we need to select the project environment setup, remember the below comparison.

Figure: Anaconda vs Python virtualenv comparison

As you can see, if we are integrating both a frontend and a machine learning setup, we should use Python virtualenv. If we are going to use only the data science or machine learning setup, it’s good to use Anaconda itself.

Conclusion

We learned why we need to create environments. In the process, we learned how Python manages packages through pip and keeps track of each package with its specific version.

We learned how to create virtual environments using Python virtualenv and also with conda. In the end, we discussed which one to use.

To summarise,

If your project needs both a front end (web app) and data science modeling, use Python virtualenv.

If your project needs only data science packages, you can use the conda environment.

What next

There is another tool worth knowing about, pyenv. It installs and switches between multiple Python versions, and with its pyenv-virtualenv plugin it can also manage both virtualenv and conda environments, which gives more flexibility when creating environments. You can have a look at it.


I hope you like this post. If you have any questions or want me to write an article on a specific topic, feel free to comment below.
