Differentiation of Python Iterators

What Are Python Iterators?

An iterator is an object that can be iterated upon, meaning that you can traverse through all of its values one at a time. Technically, in Python, an iterator is an object that implements the iterator protocol, which consists of the methods __iter__() and __next__(). An iterator also carries state, so it remembers where it is during iteration.
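To make the protocol concrete, here is a minimal sketch that inspects an iterator obtained from a built-in list. The variable names are illustrative only and are not part of the original tutorial.

```python
# Illustrative names; any built-in list behaves the same way.
numbers = [10, 20, 30]
it = iter(numbers)

# An iterator exposes both methods of the iterator protocol.
has_iter = hasattr(it, "__iter__")
has_next = hasattr(it, "__next__")
print(has_iter, has_next)  # True True

# Calling iter() on an iterator returns the very same object.
same_object = iter(it) is it
print(same_object)  # True

# The iterator keeps state: each next() call resumes where the last one stopped.
first = next(it)
second = next(it)
print(first, second)  # 10 20
```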

Iterator vs Iterable

Lists, tuples, dictionaries, and sets are all iterable objects. They are iterable containers from which you can get an iterator. All of these objects have an __iter__() method, which the built-in iter() function calls to get an iterator:

Example

Return an iterator from a tuple, and print each value:

mytuple = ("apple", "banana", "cherry")
myit = iter(mytuple)
print(next(myit))
print(next(myit))
print(next(myit))

Even strings are iterable objects, and can return an iterator:

Example

Strings are also iterable objects, containing a sequence of characters:

mystr = "banana"
myit = iter(mystr)
print(next(myit))
print(next(myit))
print(next(myit))
print(next(myit))
print(next(myit))
print(next(myit))
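The difference between an iterable and an iterator can also be checked directly. The sketch below uses the abstract base classes from the standard library's collections.abc module; the variable names are made up for illustration.

```python
from collections.abc import Iterable, Iterator

fruits = ("apple", "banana", "cherry")
fruit_iter = iter(fruits)

# A tuple is iterable, but it is not itself an iterator.
tuple_is_iterable = isinstance(fruits, Iterable)
tuple_is_iterator = isinstance(fruits, Iterator)
iter_is_iterator = isinstance(fruit_iter, Iterator)
print(tuple_is_iterable, tuple_is_iterator, iter_is_iterator)  # True False True

# Each call to iter() on the tuple gives a fresh, independent iterator.
first_pass = iter(fruits)
second_pass = iter(fruits)
next(first_pass)                 # advances only first_pass
from_first = next(first_pass)    # "banana"
from_second = next(second_pass)  # "apple"
print(from_first, from_second)
```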

Looping Through an Iterator

We can also use a for loop to iterate through an iterable object:

Example

Iterate the values of a tuple:

mytuple = ("apple", "banana", "cherry")
for x in mytuple:
  print(x)

Example

Iterate the characters of a string:

mystr = "banana"
for x in mystr:
  print(x)

The for loop actually creates an iterator object and calls next() on it for each iteration of the loop.
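Roughly speaking, a for loop can be spelled out by hand as a while loop. The following is a sketch of the equivalent steps, not the interpreter's actual implementation; the list named collected is added only so the result can be inspected.

```python
mytuple = ("apple", "banana", "cherry")
collected = []

it = iter(mytuple)           # the for loop first asks for an iterator
while True:
    try:
        x = next(it)         # then it requests the next value
    except StopIteration:    # the loop ends when the iterator is exhausted
        break
    collected.append(x)      # loop body
    print(x)
```

This prints the same three values as the plain for loop over mytuple.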

Create an Iterator

To make an object or class behave as an iterator, you have to implement the methods __iter__() and __next__() on it. As you have learned in the Python Classes/Objects chapter, all classes have a function called __init__(), which allows you to do some initializing when the object is being created. The __iter__() method acts similarly: you can do operations (initializing etc.), but it must always return the iterator object itself. The __next__() method also allows you to do operations, and must return the next item in the sequence.

Example

Create an iterator that returns numbers, starting with 1, with each value increasing by one (returning 1, 2, 3, 4, 5 etc.):

class MyNumbers:
  def __iter__(self):
    self.a = 1
    return self

  def __next__(self):
    # return the current number, then advance by one
    x = self.a
    self.a += 1
    return x

myclass = MyNumbers()
myiter = iter(myclass)
print(next(myiter))
print(next(myiter))
print(next(myiter))
print(next(myiter))
print(next(myiter))

StopIteration

The example above would continue forever if you had enough next() calls, or if it were used in a for loop. To prevent the iteration from going on forever, we can raise the StopIteration exception. In the __next__() method, we can add a terminating condition that raises StopIteration once the iteration has run a specified number of times:

Example

Stop after 20 iterations:

class MyNumbers:
  def __iter__(self):
    self.a = 1
    return self

  def __next__(self):
    if self.a <= 20:
      x = self.a
      self.a += 1
      return x
    else:
      raise StopIteration

myclass = MyNumbers()
myiter = iter(myclass)
for x in myiter:
  print(x)
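When next() is called by hand rather than through a for loop, the StopIteration exception has to be handled explicitly, or a default value can be passed to next(). The sketch below uses a hypothetical CountToThree class, a shortened variant of the class above, to keep the output small.

```python
class CountToThree:
    """Hypothetical, shortened variant of MyNumbers that stops after 3."""

    def __iter__(self):
        self.a = 1
        return self

    def __next__(self):
        if self.a <= 3:
            x = self.a
            self.a += 1
            return x
        raise StopIteration


it = iter(CountToThree())
values = [next(it), next(it), next(it)]
print(values)  # [1, 2, 3]

# A fourth call raises StopIteration, which we catch ourselves here.
try:
    next(it)
    stopped = False
except StopIteration:
    stopped = True
print(stopped)  # True

# Passing a default to next() returns it instead of raising.
fallback = next(it, "done")
print(fallback)  # done
```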

Basics of Python iteration

As a data scientist working in Python, you are guaranteed to come across various data types stored in a range of containers such as lists, tuples and dictionaries. In Python, these containers are all iterable objects, meaning we can obtain an iterator from them (as in C++, Python strings are also iterable objects). A Python for loop over one of these containers calls iter() to create an iterator and then calls next() repeatedly to advance it [1]. These built-in functions are shown below, where iter(name) returns an iterator from a list and next(it) advances through the iterator so that each element can be printed.

name = ["Ciaran", "Cooney"]
it = iter(name)
print(next(it))
print(next(it))
# output
# Ciaran
# Cooney

Python comes with several built-in functions such as zip and map which facilitate iteration over data containers. These are very useful and time-saving tools once you have developed an intuition for when and how to use them. The zip function effectively works by calling iter() on each of its input arguments and then using next() to advance through them in step, returning an iterator that yields tuples containing the input elements with common indices.

a = zip([1,2,3], ['a','b','c'])
print(list(a))
# output
# [(1, 'a'), (2, 'b'), (3, 'c')]
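The way zip leans on iter() and next() can be sketched with a small class. SimpleZip below is a hypothetical, simplified stand-in written for illustration; it is not how the built-in is actually implemented, and it ignores options such as strict.

```python
class SimpleZip:
    """Hypothetical, simplified stand-in for the built-in zip()."""

    def __init__(self, *iterables):
        # Call iter() once on every input argument.
        self.iterators = [iter(obj) for obj in iterables]

    def __iter__(self):
        return self

    def __next__(self):
        if not self.iterators:
            raise StopIteration
        items = []
        for it in self.iterators:
            # next() raises StopIteration as soon as any input runs out,
            # which also ends the iteration of SimpleZip itself.
            items.append(next(it))
        return tuple(items)


pairs = list(SimpleZip([1, 2, 3], ['a', 'b', 'c']))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# Like zip, the sketch stops at the shortest input.
shorter = list(SimpleZip([1, 2, 3], "ab"))
print(shorter)  # [(1, 'a'), (2, 'b')]
```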

Map applies a function to each element in an iterable before advancing to the next. Here, iter() is called on the second argument and the input function is applied to each element in turn. next() is then called until the iterator is exhausted.

b = map(len, ['hello', 'world'])
print(list(b))
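Because map returns an iterator, its results are produced lazily and can only be consumed once. A short sketch, with illustrative variable names:

```python
words = ['hello', 'world']
lengths = map(len, words)

# The map object is its own iterator.
is_own_iterator = iter(lengths) is lengths
print(is_own_iterator)  # True

# len is applied only when a value is requested.
first = next(lengths)
print(first)  # 5

# list() consumes whatever is left ...
rest = list(lengths)
print(rest)  # [5]

# ... after which the iterator is exhausted.
again = list(lengths)
print(again)  # []
```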

Building your own iterators

At some point in the future you may wish to make a class or object of your own into an iterator, perhaps to enhance the performance of a data processing pipeline. To do this, you will need to implement the __iter__() and __next__() methods. The __iter__() method returns the iterator object and the __next__() method performs the operations for each iteration (here simply returning the element). It is important to be careful not to create an iterator that will continue to advance infinitely, so we use an if-else statement and raise StopIteration when the iterator has been exhausted.

class MyClass():

    def __init__(self, container):
        self.container = container

    def __iter__(self):
        self.count = 0
        return self

    def __next__(self):
        if self.count < len(self.container):
            x = self.container[self.count]
            self.count += 1
            return x
        else:
            raise StopIteration

myclass = MyClass(["Hello", "my", "name", "is", "Ciaran"])
myiter = iter(myclass)
for x in myiter:
    print(x)
# output
# Hello
# my
# name
# is
# Ciaran
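One benefit of implementing the protocol is that the object then works with anything that consumes iterables, not just for loops. The sketch below repeats the same pattern under the hypothetical name ContainerIterator so that it is self-contained.

```python
class ContainerIterator:
    """Hypothetical class following the same pattern as MyClass above."""

    def __init__(self, container):
        self.container = container

    def __iter__(self):
        self.count = 0  # every new iteration starts from the beginning
        return self

    def __next__(self):
        if self.count < len(self.container):
            x = self.container[self.count]
            self.count += 1
            return x
        raise StopIteration


words = ContainerIterator(["Hello", "my", "name", "is", "Ciaran"])

# Each of these calls iter() on the object and then next() until StopIteration.
as_list = list(words)
joined = " ".join(words)
longest = max(words, key=len)

print(as_list)  # ['Hello', 'my', 'name', 'is', 'Ciaran']
print(joined)   # Hello my name is Ciaran
print(longest)  # Ciaran
```

Note that because __iter__() resets the counter on the object itself, two loops running over the same instance at the same time would share that counter.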

Itertools

Although it’s always good to know what’s going on under the hood, the truth is that more often than not your interaction with iterators will be through the built-in functions and the itertools module. Itertools has many great iterator tools, so it is well worth your time to have a rummage through the documentation to see what catches your eye.

One function I like is dropwhile(), which makes an iterator that drops elements from an iterable for as long as the predicate is true, after which it returns every remaining element. groupby() is a common iterator algorithm which returns consecutive keys and groups from the iterable. Another useful function in itertools is permutations(). As you might have guessed, this one returns permutations of the elements contained within the input iterable. The length of the permutations can be constrained by a second argument, r (see code below); otherwise each permutation will be the length of the input iterable. I have coded up some examples of using these functions:

from itertools import dropwhile, groupby, permutations

print(list(dropwhile(lambda x: x <= 3, [1,2,3,4,5,6,7,8,9,3])))
# output: [4, 5, 6, 7, 8, 9, 3]

print([[list(g), k] for k, g in groupby([1,2,2,2,2,3,4,4,4,4,5,5,2,1,1,1,1])])
# output: [[[1], 1], [[2, 2, 2, 2], 2], [[3], 3], [[4, 4, 4, 4], 4], [[5, 5], 5], [[2], 2], [[1, 1, 1, 1], 1]]

print(list(permutations([1,2,3])))
# output: [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]

print(list(permutations([1,2,3], 2)))
# output: [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
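Since groupby() only groups consecutive equal elements, the same key can show up more than once, as with 2 and 1 in the output above. If one group per distinct value is wanted, a common approach is to sort the input first. A small sketch with made-up data:

```python
from itertools import groupby

data = [1, 2, 2, 3, 2, 1, 1]

# Unsorted input: a key reappears whenever its run is interrupted.
consecutive = [(k, len(list(g))) for k, g in groupby(data)]
print(consecutive)  # [(1, 1), (2, 2), (3, 1), (2, 1), (1, 2)]

# Sorted input: every distinct value forms exactly one group.
merged = [(k, len(list(g))) for k, g in groupby(sorted(data))]
print(merged)  # [(1, 3), (2, 3), (3, 1)]
```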

Iterator Algebra

The term “iterator algebra” is used in the itertools documentation to describe the general concept of combining iterator tools to improve overall code efficiency and performance [2]. Combining itertools functions can take a bit of thought at first and can get quite advanced pretty quickly, but for this post, I am going to show you one simple example of how using itertools can speed up processing time.

Let’s consider a simple example where we want to take two lists containing positive integers, determine all possible combinations of elements across lists (not within lists) and return the sum of each combination. Below, I have implemented a typical function with a couple of for loops to run over the lists and perform the summing operations.

a = [1,2,3]
b = [4,5,6]

def sum_combinations(a, b):
    combinations, results = [], []
    for i in a:
        for j in b:
            combinations.append((i, j))
            results.append(sum((i, j)))
    return combinations, results

combs, res = sum_combinations(a, b)
print(combs, res)
# output
# [(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]
# [5, 6, 7, 6, 7, 8, 7, 8, 9]

This is fine for the 3-element lists I used in this example. But what happens if we expand the inputs to contain 10000 integers each? To test this I imported the time module to see how long the function would run for on my admittedly less than special laptop:

import time
import numpy as np

a = np.random.randint(5, size=10000)
b = np.random.randint(5, size=10000)

start = time.time()
combs, res = sum_combinations(a, b)
stop = time.time()
print(f"time: {stop-start}")
# output:
# time: 108.07000184059143

Okay, 108s seems like a fairly long time to have to wait for some basic operations. Fortunately, we have an alternative: iterator algebra!

Here I use the itertools function product() along with the map function mentioned above. product() gives us the Cartesian product of the input iterables, much like nested for loops would. We then use map to apply the sum function to each pair as we iterate through them.

import itertools

start = time.time()
res_1 = list(map(sum, itertools.product(a, b, repeat=1)))
stop = time.time()
print(f"time: {stop-start}")
# output: time: 34.44488835334778
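As a sanity check, the two approaches can be compared on the small lists from earlier. This sketch only confirms that both produce the same sums in the same order; it makes no claim about timing, which will vary between machines.

```python
import itertools

a = [1, 2, 3]
b = [4, 5, 6]

# Nested-loop version, collecting only the sums.
loop_results = []
for i in a:
    for j in b:
        loop_results.append(i + j)

# product() yields the same (i, j) pairs in the same order as the nested loops.
pairs = list(itertools.product(a, b))
product_results = list(map(sum, itertools.product(a, b)))

print(pairs[:3])        # [(1, 4), (1, 5), (1, 6)]
print(product_results)  # [5, 6, 7, 6, 7, 8, 7, 8, 9]
print(loop_results == product_results)  # True
```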
