Continued: DataFrame & Matplotlib

Identifying Values: Columns, Rows

Column Names, Values: `.columns`

list(file1.columns) # checks all the columns, puts column names in list

['ID',
 'LIMIT_BAL',
 'SEX',
 'EDUCATION',
 'MARRIAGE',
 'AGE',
 'PAY_0',
 'PAY_2',
 'PAY_3',
 'PAY_4',
 'PAY_5',
 'PAY_6',
 'BILL_AMT1',
 'BILL_AMT2',
 'BILL_AMT3',
 'BILL_AMT4',
 'BILL_AMT5',
 'BILL_AMT6',
 'PAY_AMT1',
 'PAY_AMT2',
 'PAY_AMT3',
 'PAY_AMT4',
 'PAY_AMT5',
 'PAY_AMT6',
 'default payment next month']

file1['ID'].head() # checks for column: 'ID'

0    1
1    2
2    3
3    4
4    5
Name: ID, dtype: int64

file1['EDUCATION'].head() # checks for column: "EDUCATION"

0    2
1    2
2    2
3    2
4    2
Name: EDUCATION, dtype: int64
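
As a small aside to the column lookups above: a single label returns a Series, while a list of labels returns a DataFrame. The sketch below uses a hypothetical three-row stand-in called `demo` (not the real `file1`; the second and third `AGE` values are made up) just to show the difference.

```python
import pandas as pd

# Hypothetical stand-in for file1; only the column names mirror the post.
demo = pd.DataFrame({
    "ID": [1, 2, 3],
    "EDUCATION": [2, 2, 2],
    "AGE": [24, 26, 34],  # 26 and 34 are made-up values
})

one_col = demo["EDUCATION"]            # single label -> Series
two_cols = demo[["ID", "EDUCATION"]]   # list of labels -> DataFrame

print(type(one_col).__name__)    # Series
print(type(two_cols).__name__)   # DataFrame
print(list(two_cols.columns))    # ['ID', 'EDUCATION']
```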

Row Names, Values: `.index` and `.iloc[]`

list(file1.index)[0:6] # puts the row index numbers into a list, displays first six elements

[0, 1, 2, 3, 4, 5]

file1.iloc[0] # prints the first row with index zero. 

ID                                1
LIMIT_BAL                     20000
SEX                               2
EDUCATION                         2
MARRIAGE                          1
AGE                              24
PAY_0                             2
PAY_2                             2
PAY_3                            -1
PAY_4                            -1
PAY_5                            -2
PAY_6                            -2
BILL_AMT1                      3913
BILL_AMT2                      3102
BILL_AMT3                       689
BILL_AMT4                         0
BILL_AMT5                         0
BILL_AMT6                         0
PAY_AMT1                          0
PAY_AMT2                        689
PAY_AMT3                          0
PAY_AMT4                          0
PAY_AMT5                          0
PAY_AMT6                          0
default payment next month        1
Name: 0, dtype: int64
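
With the default index 0, 1, 2, ... the position of a row and its label coincide, so `.iloc[0]` and `.loc[0]` look identical. A minimal sketch, assuming a made-up frame `demo` with a deliberately different index, shows where they part ways:

```python
import pandas as pd

# Hypothetical frame; the index labels are chosen so they differ from positions.
demo = pd.DataFrame(
    {"LIMIT_BAL": [20000, 120000, 90000]},
    index=[10, 20, 30],
)

first_by_position = demo.iloc[0]   # row at position 0
first_by_label = demo.loc[10]      # row whose index label is 10
last_two = demo.iloc[-2:]          # slicing by position, like a list

print(first_by_position["LIMIT_BAL"])  # 20000
print(list(last_two.index))            # [20, 30]
```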

Value At Row X, Column Y: `.at[]`

file1.at[0, 'EDUCATION'] # prints value at index zero, column "EDUCATION"
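
For reference, `.at[]` looks up one scalar by label and can also assign to it; `.iat[]` is the position-based counterpart. The frame `demo` below is a hypothetical stand-in with made-up values, not the real data set.

```python
import pandas as pd

# Hypothetical stand-in for file1.
demo = pd.DataFrame({"EDUCATION": [2, 2, 1], "AGE": [24, 26, 34]})

value = demo.at[0, "EDUCATION"]   # label-based: row label 0, column "EDUCATION"
same = demo.iat[0, 0]             # position-based: row 0, column 0
demo.at[2, "EDUCATION"] = 3       # .at[] also works for assignment

print(value, same, demo.at[2, "EDUCATION"])  # 2 2 3
```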

Checking Data Type: `.dtypes`

file1.dtypes # checks data types of each column

ID                            int64
LIMIT_BAL                     int64
SEX                           int64
EDUCATION                     int64
MARRIAGE                      int64
AGE                           int64
PAY_0                         int64
PAY_2                         int64
PAY_3                         int64
PAY_4                         int64
PAY_5                         int64
PAY_6                         int64
BILL_AMT1                     int64
BILL_AMT2                     int64
BILL_AMT3                     int64
BILL_AMT4                     int64
BILL_AMT5                     int64
BILL_AMT6                     int64
PAY_AMT1                      int64
PAY_AMT2                      int64
PAY_AMT3                      int64
PAY_AMT4                      int64
PAY_AMT5                      int64
PAY_AMT6                      int64
default payment next month    int64
dtype: object
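
Every column above is stored as `int64`, even coded ones such as `SEX`. One possible follow-up, sketched here on a hypothetical two-column frame `demo` with made-up values, is to convert a coded column with `.astype()` and then pick out the numeric columns with `.select_dtypes()`.

```python
import pandas as pd

# Hypothetical stand-in; values are illustrative only.
demo = pd.DataFrame({"SEX": [2, 2, 1], "LIMIT_BAL": [20000, 120000, 90000]})

demo["SEX"] = demo["SEX"].astype("category")   # treat the codes as categories
numeric_cols = demo.select_dtypes(include="number").columns.tolist()

print(demo.dtypes)
print(numeric_cols)  # ['LIMIT_BAL']
```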

Checking Data Values: `.unique()`

file1['EDUCATION'].unique() # returns unique values in column 'EDUCATION'

array([2, 1, 3, 5, 4, 6, 0], dtype=int64)
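
`.unique()` lists the distinct values in order of first appearance but says nothing about how often each occurs. A short sketch on a made-up Series (the name `education` and its values are assumptions) pairs it with `.nunique()` and `.value_counts()`:

```python
import pandas as pd

# Made-up values standing in for file1['EDUCATION'].
education = pd.Series([2, 1, 2, 3, 2, 1], name="EDUCATION")

uniques = education.unique()         # distinct values, in order of first appearance
n_unique = education.nunique()       # number of distinct values
counts = education.value_counts()    # occurrences per value, most frequent first

print(list(uniques))   # [2, 1, 3]
print(n_unique)        # 3
print(counts.loc[2])   # 3
```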

Location of Specific Values: `.loc[]`

file1.loc[file1['AGE'] == 25].head() # selects the rows where AGE == 25; .head() shows the first five of them

	ID	LIMIT_BAL	SEX	EDUCATION	MARRIAGE	AGE	PAY_0	PAY_2	PAY_3	PAY_4	...	BILL_AMT4	BILL_AMT5	BILL_AMT6	PAY_AMT1	PAY_AMT2	PAY_AMT3	PAY_AMT4	PAY_AMT5	PAY_AMT6	default payment next month
38	39	50000	1	1	2	25	1	-1	-1	-2	...	0	0	0	780	0	0	0	0	0	1
41	42	70000	2	1	2	25	0	0	0	0	...	63699	64718	65970	3000	4500	4042	2500	2800	2500	0
53	54	180000	2	1	2	25	1	2	0	0	...	43510	44420	45319	1300	2010	1762	1762	1790	1622	0
76	77	50000	1	3	2	25	-1	0	0	0	...	9636	9590	10030	1759	1779	320	500	1000	1000	0
151	152	80000	1	1	2	25	0	0	0	0	...	41087	41951	31826	30000	3000	6000	8000	2000	14000	0

5 rows × 25 columns
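
The same `.loc[]` pattern extends to several conditions and to a column selection. The sketch below uses a hypothetical five-row frame `demo` whose values are copied from rows shown in this post; the threshold of 60000 is an arbitrary choice for illustration.

```python
import pandas as pd

# Small stand-in built from a few rows shown in this post.
demo = pd.DataFrame({
    "ID": [1, 39, 42, 54, 77],
    "LIMIT_BAL": [20000, 50000, 70000, 180000, 50000],
    "AGE": [24, 25, 25, 25, 25],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
mask = (demo["AGE"] == 25) & (demo["LIMIT_BAL"] > 60000)

# The second argument to .loc[] restricts the columns that come back.
subset = demo.loc[mask, ["ID", "LIMIT_BAL"]]

print(list(subset["ID"]))  # [42, 54]
```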

Matplotlib

Plotting Lists: `.plot()`

import matplotlib.pyplot as plt
import numpy as np
data = [10, 15, 10, 15, 5, 20, 30, 15, 10]
plt.plot(data)
plt.show()

Plotting on Same Graph: `.plot(x,y1,x,y2 ...)`

import numpy as np
X = np.linspace(-10, 10, num=30, endpoint=True)
y1=X**2
y2=X+15
y3=-X**2 + 50
plt.plot(X,y1, X,y2, X,y3)
plt.grid(True)
plt.show()

Subplot: `.subplot()`

X = np.linspace(-np.pi,np.pi, num=50, endpoint=True)
y1 = np.sin(X)
y2 = np.cos(X)
y3 = np.tan(X)
y4 = X
plt.subplot(2,2,1)
plt.plot(X,y1)
plt.grid(True)
plt.subplot(2,2,2)
plt.plot(X,y2)
plt.grid(True)
plt.subplot(2,2,3)
plt.plot(X,y3)
plt.grid(True)
plt.subplot(2,2,4)
plt.plot(X,y4)
plt.grid(True)
plt.show()
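
The four near-identical blocks above can also be written as a loop over the axes returned by `plt.subplots()`. This is only one possible variant, not a replacement for the code above; the dictionary `curves` and the panel titles are names introduced here for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(-np.pi, np.pi, num=50)
curves = {"sin": np.sin(X), "cos": np.cos(X), "tan": np.tan(X), "identity": X}

# plt.subplots(2, 2) returns the figure and a 2x2 array of Axes objects.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, (name, y) in zip(axes.flat, curves.items()):
    ax.plot(X, y)
    ax.set_title(name)
    ax.grid(True)
fig.tight_layout()  # keeps titles and tick labels from overlapping
```

Calling `plt.show()` afterwards displays the figure, as in the other examples.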

Adding Legend: `.legend()`

import numpy as np
import matplotlib.pyplot as plt
X = np.linspace(-4*np.pi,4*np.pi, num=320, endpoint=True)
y5 = np.sin(X) * 15
y6 = np.sin(X) * -15
plt.plot(X, y5, 'c', X, y6, 'y') # color codes: b: blue, g: green, c: cyan, y: yellow, r: red, etc
plt.grid(True)
plt.legend(['data1', 'data2'], loc = 'lower right')
plt.title('Trigonometric Functions: 15*sin(x) and -15*sin(x)')
plt.show()
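
Passing a list to `plt.legend()` relies on the order in which the lines were drawn. An alternative sketch, assuming the same two curves, attaches a `label=` to each line so the legend picks the names up on its own; the label strings are my own choice.

```python
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(-4 * np.pi, 4 * np.pi, num=320)

fig, ax = plt.subplots()
ax.plot(X, 15 * np.sin(X), "c", label="15*sin(x)")
ax.plot(X, -15 * np.sin(X), "y", label="-15*sin(x)")
ax.grid(True)

# With labels attached to the lines, legend() needs no list of names.
legend = ax.legend(loc="lower right")
legend_texts = [t.get_text() for t in legend.get_texts()]
print(legend_texts)  # ['15*sin(x)', '-15*sin(x)']
```

Calling `plt.show()` afterwards displays the figure.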

Style of Lines: `(Use parameters)`

X = np.linspace(-10,10, num=20)
y1 = 1.5*X
y2 = 1.5*X + 6
y3 = 1.5*X + 12
y4 = 1.5*X + 18
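# '--' dashed, ':' dotted, '-.' dash-dot, 'd' diamond markers with no connecting line; r, m, b, g set the color
plt.plot(X, y1, '--r', X, y2, ':m', X, y3, '-.b', X, y4, 'dg')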
plt.legend(['data1', 'data2','data3', 'data4'], loc = 'lower right')
plt.title('Lines With Differing Styles')
plt.show()
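
The short format strings above pack line style and color into one argument. As a sketch of the longer spelling, the same style can be set with the `linestyle` and `color` keywords; the two lines below should end up with identical styling.

```python
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(-10, 10, num=20)

fig, ax = plt.subplots()
(short_form,) = ax.plot(X, 1.5 * X, "--r")                        # format string
(long_form,) = ax.plot(X, 1.5 * X + 6, linestyle="--", color="r")  # keywords

print(short_form.get_linestyle(), long_form.get_linestyle())  # -- --
print(short_form.get_color(), long_form.get_color())          # r r
```

The keyword form is more verbose but also accepts colors that have no one-letter code, such as named or hex colors.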

Radar Charts (Polar Bar Chart)

import numpy as np
import matplotlib.pyplot as plt
# Fixing random state for reproducibility
np.random.seed(19680801)
# Compute pie slices
N = 20
theta = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
radii = 10 * np.random.rand(N) # radii fall between 0 and 10; change the 10 to rescale them
width = np.pi / 4 * np.random.rand(N) # widths fall between 0 and pi/4; change the 4 to adjust them
colors = plt.cm.viridis(radii / 10.) # adjust color: viridis, magma, plasma, inferno
ax = plt.subplot(projection='polar')
ax.bar(theta, radii, width=width, bottom=0.0, color=colors, alpha=0.5)
plt.show()

Heatmap

import pandas as pd
import numpy as np
vegetables = ["cucumber", "tomato", "lettuce", "asparagus",
              "potato", "wheat", "barley"]
farmers = ["Farmer Joe", "Upland Bros.", "Smith Gardening",
           "Agrifun", "Organiculture", "BioGoods Ltd.", "Cornylee Corp."]
harvest = [[0.8, 2.4, 2.5, 3.9, 0.0, 4.0, 0.0],
                    [2.4, 0.0, 4.0, 1.0, 2.7, 0.0, 0.0],
                    [1.1, 2.4, 0.8, 4.3, 1.9, 4.4, 0.0],
                    [0.6, 0.0, 0.3, 0.0, 3.1, 0.0, 0.0],
                    [0.7, 1.7, 0.6, 2.6, 2.2, 6.2, 0.0],
                    [1.3, 1.2, 0.0, 0.0, 0.0, 3.2, 5.1],
                    [0.1, 2.0, 0.0, 1.4, 0.0, 1.9, 6.3]]
df1 = pd.DataFrame(harvest, columns=farmers, index=vegetables)
df1

	Farmer Joe	Upland Bros.	Smith Gardening	Agrifun	Organiculture	BioGoods Ltd.	Cornylee Corp.
cucumber	0.8	2.4	2.5	3.9	0.0	4.0	0.0
tomato	2.4	0.0	4.0	1.0	2.7	0.0	0.0
lettuce	1.1	2.4	0.8	4.3	1.9	4.4	0.0
asparagus	0.6	0.0	0.3	0.0	3.1	0.0	0.0
potato	0.7	1.7	0.6	2.6	2.2	6.2	0.0
wheat	1.3	1.2	0.0	0.0	0.0	3.2	5.1
barley	0.1	2.0	0.0	1.4	0.0	1.9	6.3

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
vegetables = ["cucumber", "tomato", "lettuce", "asparagus",
              "potato", "wheat", "barley"]
farmers = ["Farmer Joe", "Upland Bros.", "Smith Gardening",
           "Agrifun", "Organiculture", "BioGoods Ltd.", "Cornylee Corp."]
harvest = np.array([[0.8, 2.4, 2.5, 3.9, 0.0, 4.0, 0.0],
                    [2.4, 0.0, 4.0, 1.0, 2.7, 0.0, 0.0],
                    [1.1, 2.4, 0.8, 4.3, 1.9, 4.4, 0.0],
                    [0.6, 0.0, 0.3, 0.0, 3.1, 0.0, 0.0],
                    [0.7, 1.7, 0.6, 2.6, 2.2, 6.2, 0.0],
                    [1.3, 1.2, 0.0, 0.0, 0.0, 3.2, 5.1],
                    [0.1, 2.0, 0.0, 1.4, 0.0, 1.9, 6.3]])
fig, ax = plt.subplots()
im = ax.imshow(harvest)
# We want to show all ticks...
ax.set_xticks(np.arange(len(farmers)))
ax.set_yticks(np.arange(len(vegetables)))
# ... and label them with the respective list entries
ax.set_xticklabels(farmers)
ax.set_yticklabels(vegetables)
# Rotate the tick labels and set their alignment.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
         rotation_mode="anchor")
# Loop over data dimensions and create text annotations.
for i in range(len(vegetables)):
    for j in range(len(farmers)):
        text = ax.text(j, i, harvest[i, j],
                       ha="center", va="center", color="w")
ax.set_title("Harvest of local farmers (in tons/year)")
fig.tight_layout()
plt.show()
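
The heatmap above annotates every cell with its value. When the grid is too dense for annotations, a colorbar is a common alternative. This is a minimal sketch on a 2x2 corner of the harvest data; the label text is an assumption based on the plot title.

```python
import matplotlib.pyplot as plt
import numpy as np

# Top-left 2x2 corner of the harvest array, to keep the sketch short.
data = np.array([[0.8, 2.4],
                 [2.4, 0.0]])

fig, ax = plt.subplots()
im = ax.imshow(data, cmap="viridis")

# fig.colorbar() takes the image returned by imshow and the axes to attach to.
cbar = fig.colorbar(im, ax=ax)
cbar.set_label("harvest (tons/year)")
```

Calling `plt.show()` afterwards displays the figure.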

Spidermap

import matplotlib.pyplot as plt
import numpy as np
vegetables = ["cucumber", "tomato", "lettuce", "asparagus",
              "potato", "wheat", "barley"]
vegetables = [*vegetables, vegetables[0]]
FarmerJoe = [3, 2.4, 2.5, 3.9, 3, 4.0, 3]
UplandBros = [2.4, 3.0, 4.0, 2.0, 2.7, 3.0, 2.4]
SmithG = [1.1, 2.4, 0.8, 4.3, 1.9, 4.4, 1.1]
FarmerJoe = [*FarmerJoe, FarmerJoe[0]]
UplandBros = [*UplandBros, UplandBros[0]]
SmithG = [*SmithG, SmithG[0]]
label_loc = np.linspace(start=0, stop=2 * np.pi, num=len(FarmerJoe))
plt.figure(figsize=(8, 8))
plt.subplot(polar=True)
plt.plot(label_loc, FarmerJoe, label='FarmerJoe')
plt.plot(label_loc, UplandBros, label='UplandBros')
plt.plot(label_loc, SmithG, label='SmithG')
plt.title('Comparison of Output by Farmers', size=20, y=1.05)
lines, labels = plt.thetagrids(np.degrees(label_loc), labels=vegetables)
plt.legend()
plt.show()
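
The step that makes the spider chart work is repeating the first value at the end of every list so that each outline closes on itself. The sketch below isolates that step in a small helper; `close_loop` is a name introduced here, and the three-category data is a shortened version of the lists above.

```python
import matplotlib.pyplot as plt
import numpy as np

def close_loop(values):
    """Return a new list with the first element repeated at the end."""
    return [*values, values[0]]

categories = ["cucumber", "tomato", "lettuce"]
scores = [3.0, 2.4, 2.5]

# One angle per category, evenly spaced around the circle (2*pi itself excluded).
angles = np.linspace(0, 2 * np.pi, num=len(categories), endpoint=False).tolist()

closed_angles = close_loop(angles)   # last angle equals the first one
closed_scores = close_loop(scores)   # last score equals the first one

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(closed_angles, closed_scores)
ax.set_xticks(angles)                # one tick per original category
ax.set_xticklabels(categories)
```

Here the closing angle is 0 rather than 2π, which points in the same direction, so the outline is the same as in the code above.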

Conclusion

As the saying goes, one never knows something until one tries it. Countless tutorials and other informative resources on Python modules exist on the web, but until one actually runs the code and inspects the meaning behind each line, one cannot be sure that one really knows what is going on. Also, if one does not keep a record of what one has learned, one will inevitably end up forgetting it. In this post, I decided to create my own personal reference guide for the pandas and numpy library functions. Even though there are better, more informative guides on the web, I believe that reviewing past resources to re-learn, reuse, or reconfigure code is better than borrowing random code from Stack Overflow that one does not understand. In light of that claim, I should probably remind myself occasionally to organize systematically what I have learned about Python modules. This is the end of the post, and thank you for reading.

Junhyung Park
