AutoKeras – a tough competitor for Google’s AutoML

Auto-Keras is an open source software library for automated machine learning (AutoML). It is developed by DATA Lab at Texas A&M University and community contributors. The ultimate goal of AutoML is to provide easily accessible deep learning tools to domain experts with limited data science or machine learning background. Auto-Keras provides functions to automatically search for architecture and hyperparameters of deep learning models.

We can create a deep learning model in just four lines of code:

import autokeras as ak 
clf = ak.ImageClassifier() 
clf.fit(x_train, y_train) 
results = clf.predict(x_test)

Simple, right? A preview version has been released, and the final official release is still awaited.

I’m damn sure this will help newcomers to deep learning get their hands wet by building complex deep learning models with ease.

TensorFlow 2.0 !!!!

TensorFlow has become the world’s most widely adopted machine learning framework, catering to a broad spectrum of users and use-cases. In this time, TensorFlow has evolved along with rapid developments in computing hardware, machine learning research, and commercial deployment.

The latest we hear from Martin Wicke is that:

TensorFlow 2.0 is coming with major updates!

The main features of TensorFlow 2.0 include:

  • Eager Execution – which makes TensorFlow easier to learn and apply.
  • Support for more platforms and languages.
  • Removal of deprecated APIs.

Another major change concerns “tf.contrib”, which will no longer be distributed as part of TensorFlow 2.0.

A preview version is expected later this year (2018).

Introduction to Pandas

In this post, you will learn how the pandas library works in Python, with hands-on examples.

Pandas is one of the most powerful toolkits for data manipulation and analysis, built on top of NumPy.

Pandas has two main data structures:

1. Series

2. DataFrame

Series:

A Series is a one-dimensional (1-D) labeled array.

Example:
import pandas as pd
obj = pd.Series([1, 2, 3, 4, 5])
print(obj)

Output:
0    1
1    2
2    3
3    4
4    5
dtype: int64

As you can see, “obj” holds the values in the right column, labeled by a default integer index in the left column, and its dtype is “int64”. Creating a Series object is as simple as that.
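A Series does not have to use the default 0..n-1 index. Here is a minimal sketch, with arbitrary labels picked only for illustration, of attaching your own labels and inspecting the pieces of a Series:

```python
import pandas as pd

# The labels 'a'..'e' are arbitrary; any hashable labels would do.
obj = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(obj.dtype)        # element type of the values
print(list(obj.index))  # the labels
print(obj.values)       # the underlying NumPy array
print(obj['c'])         # label-based lookup of a single value
```

Looking values up by label rather than by position is what sets a Series apart from a plain array.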

Now we can do some basic arithmetic operations, like:

Adding two series objects:

x = pd.Series([2, 4, 6, 8, 10])
y = pd.Series([1, 3, 5, 7, 9])
add = x + y
print("Add:")
print(add)

Output:
Add:
0     3
1     7
2    11
3    15
4    19
dtype: int64

In the same way, we can perform other arithmetic operations such as subtraction, multiplication, division, and modulo.

Another handy feature of Series is that you can easily convert a Python dictionary (dict) into a Series object, as below:
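For completeness, here is a small sketch of those other operations on the same two series. Each one is applied element by element, matching values by index label; the result variable names are just for illustration.

```python
import pandas as pd

x = pd.Series([2, 4, 6, 8, 10])
y = pd.Series([1, 3, 5, 7, 9])

diff = x - y   # element-wise subtraction
prod = x * y   # element-wise multiplication
quot = x / y   # true division, so the result is float64
mod = x % y    # element-wise remainder

print("Subtract:", diff.tolist())  # [1, 1, 1, 1, 1]
print("Multiply:", prod.tolist())  # [2, 12, 30, 56, 90]
print("Divide:", quot.round(2).tolist())
print("Modulo:", mod.tolist())     # [0, 1, 1, 1, 1]
```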

data = {'India': 5000, 'America': 2500, 'Europe': 1000}
seriesobj = pd.Series(data)
print(seriesobj)

Output:

India      5000
America    2500
Europe     1000
dtype: int64

We can also check whether any values in the Series object are missing (null) using the isnull() method:

seriesobj.isnull()

Output:
India      False
America    False
Europe     False
dtype: bool

As you can see, the result is a Series of Boolean values. Series is super easy and flexible to use.
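Every value above comes back False because nothing is missing. To see isnull() actually flag something, here is a sketch that asks for a label the dictionary does not contain ('Japan' is a made-up label, used only for illustration):

```python
import pandas as pd

data = {'India': 5000, 'America': 2500, 'Europe': 1000}

# 'Japan' is not a key in `data`, so pandas fills that slot with NaN.
seriesobj = pd.Series(data, index=['India', 'America', 'Europe', 'Japan'])

print(seriesobj.isnull())
print(seriesobj.isnull().sum())  # number of missing entries
```

Because NaN is a floating-point value, the dtype of this series becomes float64 instead of int64.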

DataFrame:

A DataFrame, on the other hand, is a 2-dimensional structure with rows and columns that represents tabular, spreadsheet-like data.

Creating a data frame is as simple as below:

import numpy as np

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

# Column order is given explicitly so the result matches the output below on any pandas version
f = pd.DataFrame(exam_data, index=labels, columns=['attempts', 'name', 'qualify', 'score'])

print(f)

Output:
   attempts       name qualify  score
a         1  Anastasia     yes   12.5
b         3       Dima      no    9.0
c         2  Katherine     yes   16.5
d         3      James      no    NaN
e         2      Emily      no    9.0
f         3    Michael     yes   20.0
g         1    Matthew     yes   14.5
h         1      Laura      no    NaN
i         2      Kevin      no    8.0
j         1      Jonas     yes   19.0

We can play with data frames using different functions and methods. For example, to get basic information about a data frame, we can use the “info()” method.

f.info()

Output:

<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a to j
Data columns (total 4 columns):
attempts    10 non-null int64
name        10 non-null object
qualify     10 non-null object
score       8 non-null float64
dtypes: float64(1), int64(1), object(2)
memory usage: 400.0+ bytes
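The info() output shows only 8 non-null values in the 'score' column, so two scores are missing. Here is a minimal sketch of counting those gaps and filling them. Filling with the column mean is just one possible choice, picked here for illustration, not a recommendation.

```python
import numpy as np
import pandas as pd

# Only the 'score' column from the example above, to keep the sketch short.
scores = pd.DataFrame(
    {'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19]},
    index=list('abcdefghij'),
)

missing = scores['score'].isnull().sum()     # how many values are NaN
mean_score = scores['score'].mean()          # NaN values are skipped by default
filled = scores['score'].fillna(mean_score)  # returns a new Series

print("Missing:", missing)
print("Mean of available scores:", mean_score)
print(filled)
```

Note that fillna() returns a new object, so the original column still contains its NaN values until you assign the result back.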

Now that you are familiar with creating a data frame, we can move on to sub-setting / slicing it.

Subsetting:

It is a powerful indexing feature with which we can select or exclude rows and columns (variables / features) from the data frame. We can subset / slice a data frame in various ways, such as:

a. Sub-setting by specifying number of rows

First 3 rows of the dataframe

f[:3]

Output:
   attempts       name qualify  score
a         1  Anastasia     yes   12.5
b         3       Dima      no    9.0
c         2  Katherine     yes   16.5

b. Sub-setting using the column names

f_new = f[['name','score']]
f_new

Output:
        name  score
a  Anastasia   12.5
b       Dima    9.0
c  Katherine   16.5
d      James    NaN
e      Emily    9.0
f    Michael   20.0
g    Matthew   14.5
h      Laura    NaN
i      Kevin    8.0
j      Jonas   19.0

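c. Sub-setting only the rows at positions 1, 3, 5 and 6 for specific columns of the data frame.

# .ix was removed in pandas 1.0; pick rows by position through the index, columns by name
f.loc[f.index[[1, 3, 5, 6]], ['name', 'score']]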

Output:
      name  score
b     Dima    9.0
d    James    NaN
f  Michael   20.0
g  Matthew   14.5
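Current pandas versions offer two explicit indexers for this kind of selection: loc works with labels and iloc works with integer positions. Here is a small sketch on a cut-down frame (four rows only, to keep it short) showing that both routes can reach the same cells:

```python
import pandas as pd

f = pd.DataFrame(
    {'name': ['Anastasia', 'Dima', 'Katherine', 'James'],
     'score': [12.5, 9.0, 16.5, float('nan')]},
    index=['a', 'b', 'c', 'd'],
)

# Label-based: rows 'b' and 'd', columns 'name' and 'score'.
by_label = f.loc[['b', 'd'], ['name', 'score']]

# Position-based: rows 1 and 3, columns 0 and 1.
by_position = f.iloc[[1, 3], [0, 1]]

print(by_label)
print(by_label.equals(by_position))
```

Which one to use depends on whether you know the labels or the positions of what you want.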

d. Sub-setting based on some Logical Conditions

Selecting the rows with 'score' values between 15 and 20 (both inclusive)
Example:
f[f['score'].between(15,20)]

Output:
   attempts       name qualify  score
c         2  Katherine     yes   16.5
f         3    Michael     yes   20.0
j         1      Jonas     yes   19.0

Selecting the rows with 'attempts' < 2 and 'score' > 15
Example:

f[(f['score']>15) & (f['attempts']<2)]

Output:
   attempts   name qualify  score
j         1  Jonas     yes   19.0
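The same kind of filter can be written in a couple of other ways. Here is a sketch on a four-row subset of the data (chosen only to keep it short) using the query() method and the ~ operator to exclude rows instead of selecting them:

```python
import numpy as np
import pandas as pd

f = pd.DataFrame(
    {'name': ['Katherine', 'James', 'Michael', 'Jonas'],
     'score': [16.5, np.nan, 20.0, 19.0],
     'attempts': [2, 3, 3, 1]},
    index=['c', 'd', 'f', 'j'],
)

# Boolean mask, as in the example above.
mask_result = f[(f['score'] > 15) & (f['attempts'] < 2)]

# The same condition written as a query string.
query_result = f.query('score > 15 and attempts < 2')

# ~ negates a mask: rows whose score is NOT between 15 and 20.
# between() gives False for NaN, so the NaN row is kept here.
not_result = f[~f['score'].between(15, 20)]

print(query_result)
print(not_result)
```

The parentheses around each condition in the mask version are required, because & binds more tightly than the comparison operators.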

As you can see, a data frame is powerful and flexible for working with structured data. We can also explore more of its features, such as adding and dropping rows and columns.

a. Adding a new row to the data frame:

f.loc['k'] = [1,"Suresh",'yes',15.5]
f

Output:
   attempts       name qualify  score
a         1  Anastasia     yes   12.5
b         3       Dima      no    9.0
c         2  Katherine     yes   16.5
d         3      James      no    NaN
e         2      Emily      no    9.0
f         3    Michael     yes   20.0
g         1    Matthew     yes   14.5
h         1      Laura      no    NaN
i         2      Kevin      no    8.0
j         1      Jonas     yes   19.0
k         1     Suresh     yes   15.5

b. Dropping the newly added row in the data frame

f = f.drop('k')
f

Output:
   attempts       name qualify  score
a         1  Anastasia     yes   12.5
b         3       Dima      no    9.0
c         2  Katherine     yes   16.5
d         3      James      no    NaN
e         2      Emily      no    9.0
f         3    Michael     yes   20.0
g         1    Matthew     yes   14.5
h         1      Laura      no    NaN
i         2      Kevin      no    8.0
j         1      Jonas     yes   19.0

c. Dropping the columns from the data frame.

f = f.drop('attempts', axis=1)
f

Output:
        name qualify  score
a  Anastasia     yes   12.5
b       Dima      no    9.0
c  Katherine     yes   16.5
d      James      no    NaN
e      Emily      no    9.0
f    Michael     yes   20.0
g    Matthew     yes   14.5
h      Laura      no    NaN
i      Kevin      no    8.0
j      Jonas     yes   19.0

d. Adding a new column to the data frame.

color = ['Red','Blue','Orange','Red','White','White','Blue','Green','Green','Red']
f['color'] = color
f

Output:
        name qualify  score   color
a  Anastasia     yes   12.5     Red
b       Dima      no    9.0    Blue
c  Katherine     yes   16.5  Orange
d      James      no    NaN     Red
e      Emily      no    9.0   White
f    Michael     yes   20.0   White
g    Matthew     yes   14.5    Blue
h      Laura      no    NaN   Green
i      Kevin      no    8.0   Green
j      Jonas     yes   19.0     Red
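A new column does not have to come from a hand-written list; it can also be computed from existing columns. Here is a sketch on a three-row subset. The pass mark of 10 and the maximum score of 20 are assumptions made up for this illustration.

```python
import numpy as np
import pandas as pd

f = pd.DataFrame(
    {'name': ['Anastasia', 'Dima', 'James'],
     'score': [12.5, 9.0, np.nan]},
    index=['a', 'b', 'd'],
)

# Hypothetical pass mark of 10; comparing NaN yields False.
f['passed'] = f['score'] >= 10

# assign() returns a new frame and leaves `f` itself unchanged.
# The maximum score of 20 is an assumption for illustration.
g = f.assign(score_pct=f['score'] / 20 * 100)

print(f)
print(g)
```

Assigning with f['passed'] changes the frame in place, while assign() is handy when you want to keep the original untouched.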

I hope all of this has given you a feel for the Pandas library and how it helps data analysts with data manipulation and analysis. This is just the beginning, with lots more to come. In the meantime, you can get your hands dirty with the official documentation for Series (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html) and DataFrame (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).