In this article I cover data structures in Python. Python offers quite a few of them, so let's start with the built-in ones.
List: A list holds an ordered collection of items. Let's create a list of fruits in a basket and access its elements.
# Create a list
basket = ['apple', 'banana', 'mango']
# Access its elements
for fruit in basket:
    print(fruit)
# Output
apple
banana
mango
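Besides looping, list elements can be read directly by position. A minimal sketch reusing the same basket (the names `first`, `last` and `count` are illustrative): indexes start at 0, and negative indexes count from the end.

```python
basket = ['apple', 'banana', 'mango']

first = basket[0]    # indexes start at 0: 'apple'
last = basket[-1]    # negative indexes count from the end: 'mango'
count = len(basket)  # number of elements: 3

print(first, last, count)  # apple mango 3
```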
Some of the common operations we can perform on a list are:
append(): adds an element to the end of the list
basket.append('guava')
print(basket)
['apple', 'banana', 'mango', 'guava']
extend(): adds all elements of another list (or any iterable) to the end of the list
another_basket = ['watermelon', 'papaya', 'pineapple']
basket.extend(another_basket)
print(basket)
['apple', 'banana', 'mango', 'guava', 'watermelon', 'papaya', 'pineapple']
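The difference between append() and extend() is easiest to see side by side: append() adds its argument as one single element (so passing a list creates a nested list), while extend() adds each item of the argument separately. A small sketch with hypothetical lists `a`, `b` and `more`:

```python
a = ['apple', 'banana']
b = ['apple', 'banana']
more = ['papaya', 'pineapple']

a.append(more)   # the whole list becomes one element
b.extend(more)   # each item is added individually

print(a)               # ['apple', 'banana', ['papaya', 'pineapple']]
print(b)               # ['apple', 'banana', 'papaya', 'pineapple']
print(len(a), len(b))  # 3 4
```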
pop(): removes and returns the last element of the list
lastfruit = basket.pop()
print(lastfruit)
Output: pineapple
remove(): removes the first occurrence of a given value from the list
basket.remove('mango')
print(basket)
['apple', 'banana', 'guava', 'watermelon', 'papaya']  # mango removed from the list
index(): returns the index of the first occurrence of an element in the list
print(basket)
print(basket.index('watermelon'))
['apple', 'banana', 'guava', 'watermelon', 'papaya']
3  # indexing starts at 0, so watermelon is at index 3
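A few details of these methods are worth knowing: pop() also accepts an index, remove() deletes only the first matching value, and both remove() and index() raise ValueError when the value is absent. A short sketch on a fresh, hypothetical `fruits` list:

```python
fruits = ['apple', 'banana', 'apple', 'guava']

second = fruits.pop(1)   # removes and returns the element at index 1: 'banana'
fruits.remove('apple')   # removes only the first 'apple'
print(fruits)            # ['apple', 'guava']

try:
    fruits.index('mango')  # 'mango' is not in the list
except ValueError as err:
    print('not found:', err)
```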
Tuple: Another data structure is the tuple, which is also an ordered collection of elements, with one difference: a tuple is immutable. Once created, it cannot be modified, i.e. we cannot add or remove elements.
# Create a tuple
base_colors = ('red', 'green', 'blue')
print(base_colors)
('red', 'green', 'blue')
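Immutability means any attempt to change a tuple in place fails. A minimal sketch: item assignment raises TypeError, tuples have no append() method at all, and "changing" a tuple really means building a new one (the name `extended` is illustrative).

```python
base_colors = ('red', 'green', 'blue')

try:
    base_colors[0] = 'yellow'   # item assignment is not allowed
except TypeError as err:
    print('TypeError:', err)

print(hasattr(base_colors, 'append'))  # False: tuples have no append()

# Concatenation creates a new tuple; the original is untouched
extended = base_colors + ('yellow',)
print(extended)  # ('red', 'green', 'blue', 'yellow')
```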
Common tuple operations:
count(): counts the number of times an element appears in the tuple
(5, 5, 6, 7, 7, 5, 9).count(5)
Output: 3  # element 5 appears 3 times in the tuple
index(): returns the index of the first occurrence of an element in the tuple
(7, 5, 3, 4, 5, 3, 2).index(5)
Output: 1  # element 5 first appears at index 1
Tuple unpacking: unpacking extracts the elements of a tuple and assigns them to variables
x, y = (0, 1)
print(x)
0
print(y)
1
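Tuple unpacking is what makes Python's one-line variable swap work: the right-hand side builds a tuple, which is then unpacked into the names on the left. The number of names must match the number of values, otherwise ValueError is raised. A minimal sketch (`a` and `b` are illustrative names):

```python
x, y = (0, 1)
x, y = y, x   # right side packs (1, 0), left side unpacks it
print(x, y)   # 1 0

try:
    a, b = (1, 2, 3)   # three values cannot unpack into two names
except ValueError as err:
    print('ValueError:', err)
```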
Dictionary: My favorite. A dictionary stores data as key-value pairs. Keys need to be unique (that's why they are keys).
# Create a dictionary
favorites = {'day': 'Sunday',
             'number': 9,
             'season': 'spring'}
print(favorites)
{'day': 'Sunday', 'number': 9, 'season': 'spring'}
# Extract the value of a key
print(favorites['day'])
Sunday
# Add an element to the dictionary
favorites['movie'] = 'Sholay'
print(favorites)
{'day': 'Sunday', 'number': 9, 'season': 'spring', 'movie': 'Sholay'}
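Because keys are unique, assigning to an existing key replaces its value rather than adding a second entry. Reading a missing key with square brackets raises KeyError, while the built-in get() method returns a default instead. A sketch continuing the favorites example (the key 'color' and the default 'unknown' are illustrative):

```python
favorites = {'day': 'Sunday', 'number': 9, 'season': 'spring'}

favorites['number'] = 7   # existing key: the value is overwritten
print(len(favorites))     # still 3 entries

print(favorites.get('color', 'unknown'))  # missing key -> default 'unknown'

for key, value in favorites.items():      # iterate over key-value pairs
    print(key, value)
```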
Set: A set is an unordered collection of elements with no duplicates.
# Create a set
BRIC = {'Brazil', 'Russia'}
print(BRIC)
{'Brazil', 'Russia'}  # sets are unordered, so the printed order may vary
# Add elements to a set
BRIC.add('India')
BRIC.add('China')
print(BRIC)
{'Brazil', 'Russia', 'India', 'China'}  # order may vary
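The no-duplicates rule makes sets a quick way to de-duplicate data: adding an element that is already present has no effect, and converting a list to a set drops repeats. A minimal sketch (the `visits` list is hypothetical; sorted() is used only to get a stable printed order):

```python
BRIC = {'Brazil', 'Russia', 'India', 'China'}
BRIC.add('India')              # already present: the set is unchanged
print(len(BRIC))               # 4

visits = ['India', 'Brazil', 'India', 'China', 'Brazil']
unique_visits = set(visits)    # duplicates are dropped
print(sorted(unique_visits))   # ['Brazil', 'China', 'India']
```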
Set Operations:
Union:
{2,3,4,5}.union({4,5,6})
Output: {2, 3, 4, 5, 6}
Intersection:
{2,3,4,5}.intersection({4,5,6})
Output: {4, 5}
Difference:
{2,3,4,5}.difference({4,5,6})
Output: {2, 3}
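The same operations are also available as operators: | for union, & for intersection and - for difference. One difference worth noting: the methods accept any iterable as the argument, while the operators require both operands to be sets. A sketch showing that both forms agree (variable names are illustrative):

```python
a = {2, 3, 4, 5}
b = {4, 5, 6}

union = a | b       # same as a.union(b)
common = a & b      # same as a.intersection(b)
only_in_a = a - b   # same as a.difference(b)

print(union, common, only_in_a)
```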
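NumPy Array:
A NumPy array is a collection of homogeneous elements, i.e. all elements share the same data type. NumPy is a third-party library, conventionally imported as np.
# Create a NumPy array
import numpy as np
a = np.array([1, 2, 3])
print(a)
[1 2 3]
# Number of dimensions
print(a.ndim)
1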
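"Homogeneous" means every element shares one data type, recorded in the array's dtype attribute. When the input mixes types, NumPy picks a common type for all elements. A sketch, assuming NumPy is installed and imported as np (the names `ints` and `mixed` are illustrative):

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # one float forces a float array

print(ints.dtype)    # an integer dtype (int64 on most platforms)
print(mixed.dtype)   # float64
print(mixed)
```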
# Create a NumPy array using np.arange() (the stop value 23 is excluded)
a = np.arange(11, 23)
print(a)
[11 12 13 14 15 16 17 18 19 20 21 22]
# Convert the 1-D array to a 2-D array (3 rows, 4 columns) using reshape()
a = a.reshape((3, 4))
print(a)
[[11 12 13 14]
[15 16 17 18]
[19 20 21 22]]
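reshape() only works when the new shape holds exactly the same number of elements: 12 elements fit 3 x 4 or 2 x 6, but not 5 x 3. A sketch, again assuming NumPy imported as np (`grid` is an illustrative name):

```python
import numpy as np

a = np.arange(11, 23)      # 12 elements: 11 .. 22
grid = a.reshape((3, 4))
print(grid.shape, grid.ndim, grid.size)   # (3, 4) 2 12

try:
    a.reshape((5, 3))      # 15 != 12, so this fails
except ValueError as err:
    print('ValueError:', err)
```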
Pandas Series: A pandas Series is a one-dimensional labeled array; its axis labels are called the index. A Series is like a single column in an Excel sheet.
# Create a Series
import pandas as pd
s = pd.Series(['Ind', 'Aus', 'NZ'])
print(s)
0 Ind
1 Aus
2 NZ
dtype: object
# The default index is numeric, starting at 0
# Let's create a Series with custom index values
s = pd.Series(data=['Rohit', 'Dhoni'], index=['Mumbai Indians', 'Chennai Super Kings'])
print(s)
Mumbai Indians Rohit
Chennai Super Kings Dhoni
dtype: object
# Access an element by its index label
print(s['Mumbai Indians'])
Rohit
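With a custom index, pandas offers two explicit accessors: .loc selects by label and .iloc selects by integer position. A sketch reusing the series above, assuming pandas is installed and imported as pd:

```python
import pandas as pd

s = pd.Series(data=['Rohit', 'Dhoni'],
              index=['Mumbai Indians', 'Chennai Super Kings'])

print(s.loc['Chennai Super Kings'])   # by label: Dhoni
print(s.iloc[0])                      # by position: Rohit
print(list(s.index))                  # ['Mumbai Indians', 'Chennai Super Kings']
```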
Pandas DataFrame: A pandas DataFrame is a two-dimensional tabular data structure, analogous to a table in a database.
# Let's create a DataFrame from a dictionary
df = pd.DataFrame({'Team': ['MI', 'CSK', 'DD', 'KKR'],
                   'Captain': ['Rohit', 'Mahendra', 'Ravindra', 'Gautam']})
print(df)
Team Captain
0 MI Rohit
1 CSK Mahendra
2 DD Ravindra
3 KKR Gautam
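Each column of a DataFrame is itself a Series, which ties back to the "column in an Excel sheet" analogy for Series. A minimal sketch, assuming pandas imported as pd (`teams` is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'Team': ['MI', 'CSK', 'DD', 'KKR'],
                   'Captain': ['Rohit', 'Mahendra', 'Ravindra', 'Gautam']})

teams = df['Team']               # selecting one column returns a Series
print(type(teams).__name__)      # Series
print(df.shape)                  # (4, 2): 4 rows, 2 columns
print(df.loc[1, 'Captain'])      # row label 1, column 'Captain': Mahendra
```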
The DataFrame deserves a full-length post of its own, to be covered later.