Traversing Galaxies through Enumerable, Enumerator and Yield Return (C#)

Introduction

What’s going on, you may be wondering after reading the title of this post.

Well, it may be that I am a little excited by this morning’s news of the James Webb Space Telescope reaching its orbit around L2. Or it may be that I am binge-watching the “Lost in Space” series on Netflix. But don’t worry, we will see a coding example involving galaxies anyway.

Another reason to bring galaxies into the conversation is to talk about large amounts of data. Based on current estimates on the NASA website, there are more than 200 billion galaxies in the observable universe. If we just want to capture their names (once we have named all of them), it’s a big list.

In this post, we will see how to choose data structures for working with large amounts of data.

All of us have used IEnumerable-based data structures in our daily C# code. Collections like arrays and lists implement this interface to standardize how we loop over them and perform actions such as filtering and selecting with LINQ.

List<T> is one such common data structure.

List Example Code

Let’s see a very simple example of working with a List in C#.

This example is slightly modified from the Microsoft docs at this link. After reading this post, I encourage you to check the mentioned page on the Microsoft docs for more insights.

Back to our example code. This is a very common type of code in many C# applications involving lists. List-type collections are perfectly fine when dealing with small amounts of data. However, such collections can bring memory overhead when dealing with large amounts of data, because every element has to be held in memory at the same time.
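As a rough illustration of the kind of list code meant here, the following is a minimal sketch. The Galaxy class, its Name and MegaLightYears properties, the GetGalaxies method and the sample values are all assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type, used only for this illustration.
public class Galaxy
{
    public string Name { get; set; } = "";
    public int MegaLightYears { get; set; }
}

public static class ListDemo
{
    // Builds the complete list in memory before handing it to the caller.
    // The names and distances are illustrative sample values.
    public static List<Galaxy> GetGalaxies()
    {
        return new List<Galaxy>
        {
            new Galaxy { Name = "Tadpole", MegaLightYears = 400 },
            new Galaxy { Name = "Pinwheel", MegaLightYears = 25 },
            new Galaxy { Name = "Milky Way", MegaLightYears = 0 },
            new Galaxy { Name = "Andromeda", MegaLightYears = 3 },
        };
    }

    public static void Main()
    {
        // foreach works because List<T> implements IEnumerable<T>.
        foreach (Galaxy galaxy in GetGalaxies())
        {
            Console.WriteLine($"{galaxy.Name} {galaxy.MegaLightYears}");
        }
    }
}
```

With four galaxies this is harmless; the concern only starts when the list holds millions of entries.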

Later in the post, we’ll see how to process large amounts of data in our application by using enumerables and an efficient iteration mechanism.

Enumerable and Enumerators

The term “Enumerable” defines an object that is meant to be iterated over, passing over each element once, in order.

All the IEnumerable interface requires is a method, GetEnumerator(), which returns an IEnumerator. So what do Enumerators do?

An enumerator is simply a handle that is used while we are enumerating. This handle points at the current element and can be moved to point to the next one.

Iterators are called enumerators in C#, and they implement the IEnumerator interface. This interface provides a MoveNext() method, which advances to the next element and returns false once the end of the collection has been reached.

It also has a Current property that gives access to the element the enumerator currently points at.

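The Reset() method moves the enumerator back to its initial position. Implementations are not required to support it; enumerators generated from iterator methods throw NotSupportedException.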

Before the first element can be retrieved, MoveNext() must be called, because the enumerator initially points before the first element.
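To make the protocol concrete, here is a small sketch that walks a sequence by hand using GetEnumerator(), MoveNext() and Current, which is roughly what a foreach loop does for us. The EnumeratorDemo class and the Walk method are names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorDemo
{
    // Walks a sequence manually and collects the visited elements.
    public static List<string> Walk(IEnumerable<string> source)
    {
        var visited = new List<string>();
        using (IEnumerator<string> enumerator = source.GetEnumerator())
        {
            // The enumerator starts before the first element,
            // so MoveNext() has to be called before reading Current.
            while (enumerator.MoveNext())
            {
                visited.Add(enumerator.Current);
            }
        }
        return visited;
    }

    public static void Main()
    {
        var names = new List<string> { "Milky Way", "Andromeda", "Pinwheel" };
        foreach (string name in Walk(names))
        {
            Console.WriteLine(name);
        }
    }
}
```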

An enumerable is simply a class or data type that implements this basic protocol, which allows an enumerator to traverse its items.
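A hand-written enumerable could look like the sketch below. CatalogNumbers is a hypothetical type invented for this illustration: it produces the numbers 1 to count without storing them anywhere, and its nested enumerator implements MoveNext(), Current and Reset() explicitly.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sequence of catalog numbers from 1 to count.
public class CatalogNumbers : IEnumerable<int>
{
    private readonly int _count;

    public CatalogNumbers(int count) { _count = count; }

    public IEnumerator<int> GetEnumerator() => new CatalogEnumerator(_count);

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class CatalogEnumerator : IEnumerator<int>
    {
        private readonly int _count;
        private int _current; // 0 means "before the first element"

        public CatalogEnumerator(int count) { _count = count; }

        public int Current => _current;

        object IEnumerator.Current => Current;

        // Advances to the next number; returns false once the end is reached.
        public bool MoveNext()
        {
            if (_current >= _count) return false;
            _current++;
            return true;
        }

        public void Reset() { _current = 0; }

        public void Dispose() { }
    }
}

public static class CustomEnumerableDemo
{
    public static void Main()
    {
        // foreach only needs GetEnumerator(), MoveNext() and Current.
        foreach (int number in new CatalogNumbers(3))
        {
            Console.WriteLine(number);
        }
    }
}
```

Because the type implements IEnumerable<int>, it works with foreach and with LINQ just like a List<int> would.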

Simple Return Statement

Let’s see another example involving a List.

This code is simple as well: we are building and returning a list from a function. It seems fine at first. However, the problem is that if we increase the max argument, the list can grow very big, because the function has to hold all max even numbers in memory and return the whole list at once.
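A function of this shape might look like the sketch below. The name GetEvenNumbers, its signature and the EvenNumbersList class are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class EvenNumbersList
{
    // Builds the first `max` even numbers and returns them as one list.
    // Nothing reaches the caller until the whole list has been filled.
    public static List<long> GetEvenNumbers(int max)
    {
        var result = new List<long>();
        for (long i = 0; i < max; i++)
        {
            result.Add(i * 2);
        }
        return result;
    }

    public static void Main()
    {
        foreach (long number in GetEvenNumbers(5))
        {
            Console.WriteLine(number);
        }
    }
}
```

With a small max this is fine. With a very large max, the entire list has to exist in memory before the first number is printed.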

Let’s increase the number.

If you run this code, you will see a delay before the first output reaches the console, as well as a memory spike. The following picture shows the memory before and during the test (this can take up to 3 GB of memory).

So, how can we optimize this processing? Enter yield return.

The Yield Return

Instead of building up the whole list and returning it at once, we can use a yield return statement to return each element one at a time.

The yield return statement tells the compiler to turn the function into an iterator: it hands back one element at a time and is paused until the next element is requested. If the caller stops asking for elements, the function is simply left where it paused and never runs to its end (i.e. it is never exhausted).
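The pausing behaviour is easiest to see when producer and consumer both write into a shared log. The sketch below uses made-up names (LazyDemo, Numbers, ConsumeUntil) and records the order in which things happen.

```csharp
using System;
using System.Collections.Generic;

public static class LazyDemo
{
    // Iterator method: each step runs only when the next element is requested.
    public static IEnumerable<int> Numbers(List<string> log)
    {
        log.Add("producing 1");
        yield return 1;
        log.Add("producing 2");
        yield return 2;
        log.Add("producing 3");
        yield return 3;
        log.Add("end of method");
    }

    // Consumes elements and stops as soon as `stopAt` has been seen.
    public static List<string> ConsumeUntil(int stopAt)
    {
        var log = new List<string>();
        foreach (int number in Numbers(log))
        {
            log.Add($"consumed {number}");
            if (number == stopAt) break;
        }
        return log;
    }

    public static void Main()
    {
        // Stopping at 2 means "producing 3" and "end of method" never run.
        Console.WriteLine(string.Join(Environment.NewLine, ConsumeUntil(2)));
    }
}
```

The log alternates between "producing" and "consumed" entries, which shows that the iterator and the loop body take turns instead of the iterator running to completion first.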

Check this link on the Microsoft docs, which provides more details about this statement.

Now, let’s see an example that uses yield return:
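A minimal sketch of such an iterator is shown here. The names EvenNumbersLazy and GetEvenNumbers and the signature are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class EvenNumbersLazy
{
    // Yields the first `max` even numbers one at a time.
    // No list is built; only the loop counter is kept between elements.
    public static IEnumerable<long> GetEvenNumbers(long max)
    {
        for (long i = 0; i < max; i++)
        {
            yield return i * 2;
        }
    }

    public static void Main()
    {
        // Even with a huge max, output starts immediately,
        // and breaking out of the loop stops the generation.
        foreach (long number in GetEvenNumbers(1_000_000_000))
        {
            if (number > 8) break;
            Console.WriteLine(number);
        }
    }
}
```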

If you execute this code, you will notice that the console output is immediately responsive and that memory consumption is very low (approx. 10 MB), because we are not building a big in-memory list to output.

Enumerables give you the means to build apps that process large amounts of data while keeping memory use at bay, so the data does not eat up all of your system resources.

Take / Skip and other LINQ Operators

The following code shows the usage of standard LINQ operators with our sample code:
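As a sketch of how this can look, the snippet below combines Skip, Take and Where with a lazy iterator. The LinqDemo class, the GetEvenNumbers iterator and the Page helper are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqDemo
{
    // Lazy source: yields the first `max` even numbers one at a time.
    public static IEnumerable<long> GetEvenNumbers(long max)
    {
        for (long i = 0; i < max; i++)
        {
            yield return i * 2;
        }
    }

    // Returns one "page" of results; only skip + take elements are generated.
    public static List<long> Page(int skip, int take) =>
        GetEvenNumbers(1_000_000_000).Skip(skip).Take(take).ToList();

    public static void Main()
    {
        // Skips the first 10 even numbers, then takes the next 5.
        Console.WriteLine(string.Join(", ", Page(10, 5))); // 20, 22, 24, 26, 28

        // Where is lazy as well; Take(3) stops the source after the third match.
        IEnumerable<long> multiplesOfTen =
            GetEvenNumbers(1_000_000_000).Where(n => n % 10 == 0).Take(3);
        Console.WriteLine(string.Join(", ", multiplesOfTen)); // 0, 10, 20
    }
}
```

Because these operators are themselves lazy, the billion-element source is never fully enumerated; each query pulls only as many elements as it needs.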

Summary

In this post, we learned about enumerable collections and how they are traversed in general. We looked at some issues with list-style collections when working with large amounts of data. We then saw the usage of the yield return statement, which is very helpful for working with large amounts of data without exhausting system resources.

The source code for the demos is available on this git repository.

Let me know if you have some questions or comments. Till next time, Happy Coding.
