Getting Started with Docker!

In this post, let's get hands-on with Docker containers, look at how Docker differs from virtual machines, and see why Docker is often the more lightweight choice. The internal working of Docker is explained in simple terms for easy understanding.

So, let's get started!

What is Docker?

Let's break it down into simple words: Docker is a platform for developing and deploying applications by isolating them in containers.

What is a Virtual Machine?

A virtual machine is like an emulator: it lets you run an operating system in an app window on your desktop that behaves like a full, separate computer, allowing developers to develop and deploy applications on it.

Difference between Virtual Machines and Docker

As you can see, a virtual machine isolates the entire system, whereas a Docker container isolates only the application.

Virtual Machine Architecture

From the above architecture:

  1. Infrastructure – the physical hardware (laptops, servers)
  2. Host OS – the operating system running on the hardware (Linux, Windows, macOS)
  3. Hypervisor – acts like a managing director: it manages and allocates resources and provides the guest systems access to the hardware
  4. Guest OS – the guest operating system the developer wishes to run (various flavours of Linux, for example)
  5. Bins/Libs – the binaries and libraries associated with each guest operating system, which occupy considerable space
  6. App1, App2, App3 – the applications running on the different guest operating systems

Docker Architecture

  1. Infrastructure, Host OS, Bins/Libs and Apps – the same as in the virtual machine architecture.
  2. Docker Daemon – similar to the hypervisor; it provides the interface and isolates the applications from the host operating system.

With this, you should have a clear picture of the difference between virtual machines and Docker containers.

Let's make it clearer by diving into a simple “hello-world” example:

Running Docker “hello-world”:

First, install Docker Desktop for the operating system you are using.

The basic workflow of Docker is shown in the diagram above.

After installing, try running the command below from your favourite terminal:

$ docker run hello-world

“hello-world” is an official Docker image available on Docker Hub. Running it is the container equivalent of running a “hello-world” program.

When you run this command, Docker first searches for the image locally. On a fresh install the image won't be available on your system, so the daemon pulls it from Docker Hub, runs it, and streams the output to your terminal as follows:

Hello from Docker!
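The flow above can be sketched as a toy model. The dictionaries below stand in for the local image cache and Docker Hub; this illustrates the "check locally, else pull" logic only, and is not Docker's actual implementation:

```python
# Toy model of what `docker run <image>` does internally.
# LOCAL_IMAGES stands in for the local image cache, REGISTRY for Docker Hub.
LOCAL_IMAGES = {}
REGISTRY = {"hello-world": "Hello from Docker!"}

def docker_run(image_name):
    """Illustrative sketch, not Docker's real code."""
    if image_name not in LOCAL_IMAGES:
        print(f"Unable to find image '{image_name}' locally")
        # The daemon pulls the image from the registry (Docker Hub)
        LOCAL_IMAGES[image_name] = REGISTRY[image_name]
    # The daemon creates a container from the image and streams its output
    return LOCAL_IMAGES[image_name]

print(docker_run("hello-world"))  # first run: pulls, then prints the message
print(docker_run("hello-world"))  # second run: served from the local cache
```

On the second call the image is already cached locally, which is exactly why a repeated `docker run hello-world` starts noticeably faster than the first one.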

With this, you now know what Docker is and how it works internally. In the next tutorial, we can explore the terminology of the Docker world in more detail!

Cheers 🙂

Data Mining

What is Data Mining?

Data mining is the process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories, using pattern-recognition technologies as well as statistical and mathematical techniques.
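As a tiny concrete example of "discovering correlations", the sketch below computes the Pearson correlation between two columns of a dataset; the column names and numbers are invented purely for illustration:

```python
# Minimal illustration of "discovering correlations" in data:
# Pearson correlation between two columns of a tiny, made-up dataset.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

ad_spend = [10, 20, 30, 40, 50]  # hypothetical monthly ad spend
sales    = [12, 24, 33, 39, 52]  # hypothetical monthly sales

r = pearson(ad_spend, sales)
print(round(r, 3))  # close to 1.0 -> a strong positive correlation
```

A value of r near +1 (as here) is the kind of "meaningful correlation" a data mining step would surface for further analysis.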

Data Mining Process:

  1. Develop an understanding of the application and its goals
  2. Create a dataset for the study (often from a data warehouse)
  3. Data cleaning and preprocessing
  4. Data reduction and projection
  5. Choose the data mining task
  6. Choose the data mining algorithms
  7. Use the algorithms to perform the task
  8. Interpret the results, and iterate through steps 1–7 if necessary
  9. Deploy: integrate into operational systems
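Steps 3–8 above can be sketched on a made-up dataset. The "model" here is a simple least-squares line fit, standing in for any data mining algorithm:

```python
# Toy walk-through of steps 3-8 of the process above, on invented data.
raw = [("1", "2.1"), ("2", "3.9"), ("bad", "row"), ("3", "6.2"), ("4", "7.8")]

# Step 3: data cleaning - drop rows that are not numeric
clean = []
for a, b in raw:
    try:
        clean.append((float(a), float(b)))
    except ValueError:
        pass

# Step 4: reduction/projection - keep only the two columns of interest
xs = [x for x, _ in clean]
ys = [y for _, y in clean]

# Steps 5-7: task = regression, algorithm = least squares, perform it
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

# Step 8: interpret - roughly "y grows by about 2 for every unit of x"
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

If the interpretation in step 8 is unsatisfactory, you would loop back to the earlier steps (better data, a different task or algorithm) before deploying, exactly as step 8 of the process says.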

As you can see, the core steps of data mining are steps 4–8. Discussing the data mining process leads naturally to an important data mining methodology called “CRISP-DM”.

CRISP-DM:

“Cross Industry Standard Process for Data Mining” – a 6-phase model of the entire data mining process, from start to finish, that is broadly applicable across industries for a wide array of data mining projects.

As there are 6 phases, here is a short description of each:

  1. Business Understanding – Identify the project objectives
  2. Data Understanding – Collect and review the data
  3. Data Preparation – Select and clean the data
  4. Modelling – Manipulate the data and draw conclusions
  5. Evaluation – Evaluate the model
  6. Deployment – Apply the conclusions to the business
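The key point about these phases is that they are iterative: if Evaluation shows the model is not yet good enough, you loop back to an earlier phase instead of deploying. That flow can be sketched as a toy loop; the phase names come from the list above, but the quality score and the pass threshold are placeholders, not real logic:

```python
# Toy sketch of CRISP-DM's iterative flow: run the phases in order and
# loop back from Evaluation until the model quality is acceptable.
def run_crisp_dm(max_rounds=3):
    history = []
    quality = 0
    for round_no in range(1, max_rounds + 1):
        history += ["Business Understanding", "Data Understanding",
                    "Data Preparation", "Modelling"]
        quality += 1                 # pretend each round improves the model
        history.append("Evaluation")
        if quality >= 2:             # placeholder evaluation gate
            history.append("Deployment")
            return round_no, history
    return max_rounds, history

rounds, phases = run_crisp_dm()
print(rounds)      # 2 -> the second iteration passed evaluation
print(phases[-1])  # Deployment
```

Deployment only appears in the history once Evaluation passes, mirroring how CRISP-DM projects cycle through the earlier phases before going live.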