Data Structures, Memory Management and Algorithms in C #

Vachagan Mirzoian
14 min readDec 14, 2021

--

The work of any program presupposes a change with the data stored in memory, that is, in the case of specific input data, receiving specific output data. Of course, the contents of the memory must always be changed as expected. Although the output value should always be the same as a result of changes in memory, there is still an academic debate about the sequence of actions to take with different parts of memory. This is otherwise called the program’s deterministic work. It turns out that memory, data գործող actions with them are of fundamental importance. Mathematics և programming languages ​​define the appropriate toolkit for all this. As a result, in order to work with information, we first conditionally separate 2 groups, which follow each other. They are:

1 ․ The interface or theoretical definitions that indicate the types of data to be stored, the possible actions to be taken with them և problems.

2 ․ The data structure or the practical representation of the interface that defines the way data is stored, represented, and problem-solving algorithms.

Interfaces, in their turn, have 2 main groups (page 86). They are:

1 ․ Set, whose elements must be unique, different from each other, for example {a, b, a}, in which there are only 2 types of elements, so there is no need to have repetitions. The series {a, b, c } and {a, c, b} are the same, that is, the arrangement of the elements does not matter և the series are unordered. In C#, sets are generally sorted, that is, sorted in ascending order. Computer science defined the Hash tree for this group. A more advanced version is the Hash table or Hash map structure.

2․ «Sequence» or list, a sequence in which the elements can be repeated, but at the same time we will have different collections, for example: (a, b, c) and (a, c, b) sequences are different and have 3 elements, that is, ordered.

In Static Sequence, the number of elements is constant, but the value of a particular element can change. A typical example is an array, the value of an arbitrary member of which can be read or changed at any time և randomly. The RAM, “RAM”, works on this principle. In the case of collections using indexes, the items are stored in memory directly next to each other.

In dynamic sequence you can get the value of the elements, and add a new element in any part as well, that is, change the length. A typical example is the Linked List. In the case of collections using reference, the elements are stored to different pieces of memory. This has a negative effect on speed of later readings of values.

The newly added item can be physically registered after the others, and only the pointers will change their value to show another member. Deleting an element shortens the list on the same principle. In the case of a dynamic set, it is difficult to apply an arbitrary element, because before reaching that element, you have to apply the first element and read all the references one by one.

It must be understood that these are just pure concepts at first, so each language sets its own rules of the game. Hence, any collection has its advantages and disadvantages. It is not possible to change the length of the array but its content can be obtained quickly in Θ (1) time. In the case of the list, the opposite is true: the length can be changed, but it is difficult to get the content requiring Θ (n) time. There is also the concept of Dynamic Array, implemented as ArrayList in C#, which is in the range of these two, any element can be read in Θ (1), but can only be added from the end. The table shows the speed of possible operations with structures.

With the practical implementation of the data structure, it becomes clear that a pointer can be used in explicit or implicit manner, especially when we needs to be able to change its length. A pointer was explicitly used in the following code.

Each time different parts of the memory are allocated, after finishing the work program have to clear the memory with the help of Garbage Collector. This is a non-deterministic process, which is especially evident when having a Weak reference. It assumes that the object can be used, but can also be erased from memory, which, simply put, implies the existence of coincidences. Some languages, including C # and CLR, manage memory automatically. We will get acquainted with such an example in the next part. The result is as follows:

At the physical level, the data can be recorded sequentially, but according to the graphical representation, the data structures can be Linear and Non-Linear. Array, Stack, Queue, List, Linked List concepts and their modifications are linear. List is based on Array, Linked List with its variation Doubly Linked List work using link references. Graph and Tree are data structures closely related with the ideology of Hash Table, Dictionary and other structures whose elements are characterized by being different from zero and having a unique value, i.e. Sets. The Binary Tree is made up of cells called “Nodes”, each of which has a maximum 2 sub-cells. In the case of Binary Search Tree left sub-node is smaller than the right one. Binary AVL Search Tree has also a balance factor, which is the height difference of left and right subtypes. It is clear that in the case of trees, duplication of the element is not encouraged, as by definition the cells can be arranged in ascending order, as in the case of the Binary Search Tree.

It is necessary to distinguish between the concepts of “Array” and “Data Structure”. To better understand, let’s look at the work of the following code.

With “Person[]” and “int[]” commands, we declare an array of Person and Int types whose elements are accessed by indexing when we type “ arrayOfPerson[0] = new Person() { };” and “arrayOfInt[1] = 10;”.

In the same manner we create “LinkedList<Person>” and “LinkedList<int>” collections, whose “listOfPerson.AddFirst(new Person { });;” and “listOfInt.AddFirst(9);” lines add a new element.

Using the “LinkedList<Person>[]” or “LinkedList<int>[]” command we create an array of collections whose elements are linked list collections that are initially null referenced so before adding an item to the list, “arrayOfLinkedListOfPerson[0] = new LinkedList<Person>();” command must be used. The instruction is to allocate space from memory. Then writing the “arrayOfLinkedListOfPerson[0].AddFirst(new Person() { });” command we add a value to the first element of the array, to the list.

With the “LinkedList<Person[]>” and “LinkedList<int[]>” command, we announce a list containing Person[] և int[] arrays. The “linkedListOfArrayOfPerson.AddLast(new Person[]{new Person { }, new Person { } });” and “linkedListOfArrayOfInt.AddLast(new int[] { 1,1});” commands add arrays to the lists. Each element of the array can be either of Person [] type containing name, surname and e-mail address, or be of type of int [] and contain integers.

In C#, the System.Collections namespace contains definitions for ArrayList, BitArray, Hashtable, Queue, SortedList and Stack, the System.Collections.Specialized namespace is for HybridDictionary, ListDictionary, StringCollection and BitVector32, and the System.Collections.Generic namespace is for Dictionary<TKey ,TValue>, LinkedList <T>, List <T>, Queue <T>, SortedDictionary <TKey, TValue>, SortedSet <T> and Stack <T> types (Pro C # 9 with .NET 5, Foundational Principles and Practices in Programming, Page 371).

All ArrayList elements can be accessed equally and arbitrarily with Random Access. It can contains different types of data at the same time. Because it is an array, it allows its contents to be indexed at time Θ (1) by indexing. A new item is added from the end, and all the contents are stored in memory sequentially. After consuming the allocated memory volume, all the elements of the array are copied to another, larger section. This process overloads the memory.

An example of a data structure using a pointer is LinkedList. Each Node can be physically positioned away from others. The cell contains 2 pieces, one for the value and the other for the next cell. It takes time Θ (n) to get the value, that is, 198 steps are taken to get the 198th cell. Deleting or adding the first and last element takes time Θ (1), and deleting the remaining elements takes time Θ (n) + Θ (1). The Doubly linked list cell contains 3 pieces for the original value, to refer to the next and previous cells, so it requires more memory. The first and last elements are available immediately, at time O (1). As cells do not have to be stored contiguously . For this reason, registering a new cell is quick, but getting a specific cell is relatively slow, as program has to read all the cells that existed before. Having a duplicate link solves security issues, because if a singly LinkedList cell is damaged, the connection with the next one is lost, and in case of duplicate links, we have a “Plan B”. In the case of Unrolled Linked List, each cell can have more than one original value. Let’s write the following code.

In the intLinkedList chain, the number 40 is entered, followed by 41, up to 50. From 40, which is the end of the chain, 30, 31, etc. are added. As a result of the operation of this code, the following is printed:

List resembles an array, so it allows its specific elements to be indexed. It is implemented based on LinkedList or Dynamic Array.

A new number is added to the list of randomly generated numbers with Insert (), deleted with RemoveAt (), and inverted with Reverse (). The result is as follows:

The Stack is like a pile of stones, so it’s clear that the top is available first. It works on the principle of “LIFO” or “Last in, First Out”, the action at a specific moment refers to the last added element.

The Push and Pop commands apply only to the top. Stack’s underlying type is LinkedList or array. The programmer seems to be dealing with a list or an array, but he is limited in his actions and can only add value on one side. The stack may have a limited volume if implemented on an array. As an abstraction, Stack refers to the principles of allocating memory on a hardware level as well. A memory location is allocated for each thread. This memory is used to store fixed length variables — “Value types”. The code is as follows:

The numbers are added in increasing order, but since the Stack operates on a LIFO or Last in, First Out principle, the action performed at a particular moment refers to the last element added.

Queue is a collection in which data is added from one side and operations are performed on the opposite side. It is like a common queue of people and works on the principle of “FIFO” or “First In, First Out”.

The Doubly linked list is the underlying type, because in that case the adding and removing the value at the endpoints is done in O(1) time, which is very effective. Double-ended queue is a more general abstraction, which allows to perform the same operation both at the end and at the beginning. That is, Queue and Stack are special cases of Double-ended queue. Another case is the Priority queue, which in its concept is similar to List and can be implemented with an unsorted array. Double-ended priority queue allows to easily delete the largest or smallest elements, as it has pre-set quality settings. Implemented on the basis of a Self-balancing binary search tree

As a result of code work we have the following result.

There are data structures whose elements cannot be duplicated or null. These are the collections operating on the principle of “Key — value”. The main abstraction of this group is the Associative array, which is performed with a hash table or binary trees. Hashing is the logical method for dealing with a structure, transforming arbitrary sized keys into an array of fixed size indices. That is, the method converts the key field to a “Hash code”, which in its turn is used as an index. In the case of hash structures, when adding a new element, the existing elements are not copied from one part of memory to another. Being in a contiguous memory locations is not required, unlike arrays, which do defragmentation when needed. Combining Hashing and encryption a Blockchain is achieved. The practical implementations of the Associative array are Dictionary, SortedDictionary, and SortedList. SortedSet contains unique values ​​as well but does not have a key. Works as follows:

In the declared Dictionary, Add () inserts integers and Person type information. The Remove(2) command deletes the line whose key is 2. As a result, we have the following.

SortedDictionary is another type of data structure that operates on a key-value basis. But in this case the elements are arranged in ascending order of the key, although the elements are added irregularly. In SortedSet, items are also sorted in ascending order, regardless of the value entering order.

As a result, we have rows of keys in ascending order.

SortedList and SortedDictionary differ only in a performance. In both cases, receiving an element takes time O (log n). To add or remove a value in SortedDictionary is O (log n), in SortedList it is usually O (n). There is a performance code comparison for performance of adding, removing, and finding.

As code shows the performance of the same instructions depends on the hardware architecture of the computer and the operating system. For this reason, determining the speed of the algorithm by time or by the number of processor instructions and ticks is not a wise option. Therefore, the efficiency of algorithms is measured by the number of steps performed for output value. The following table shows the speed grades for operations performed on data structures․

For example, finding the largest or smallest element in a sorted array is done in one step, as the values are already sorted in ascending or descending order. In the case of unsorted arrays, all values must be checked, i.e. 500 comparisons will be performed on a set containing n = 500 elements. It is denoted as O(n)

The most commonly used form of nonlinear data structures is the Self-balancing binary search tree. The smaller element is kept on the left, the biger one is kept on the right. In this example we add the number 1 to the original structure. We compare the new value with the root, which is 25, then with 20 and so on. Then added number 4 replaces with 1. 8 injects between 9 և 4. The numbers 45 and 42 cause a replacement of nodes.

In this example, the height of the tree is 4 points. The maximum number of elements of binary tree having a height of 7 is 72 or 128. 7 steps are required to find the position of 1,, which corresponds to the equation “Log2 (128) = 7”.

Graph is also a collection of interconnected cells, the vertices. It is very convenient for modeling computer networks and logistics.

An example of a graph are the objects used in the program, which are stored in memory, linked to each other irregularly. When the process starts, the CLR allocates space for all objects in the Managed heap each time when it encounters the “new” keyword. When the virtual memory is exhausted, memory must be freed. During this time, the CLR freezes all the threads in the process and starts scanning objects.

Object generations are used to improve Garbage Collector work. The newly created object is of generation 0 and is more likely to be cleaned. If not cleaned, all objects are defragmented and copied in memory contiguously, becoming generation 1. Surviving Garbage Collector one more time, the objects become generation 2. Objects with a volume of more than 85,000 bytes can be only generation 2, but they are called generation 3 sometimes.

Object have 3 statuses during the process:

1 ․ Roots that are directly accessible to the application, are the head of references for objects, such as global variables, although not supported by C #, local variables, static fields and Finalization queue, stored in Managed heap.

2. Live object, active objects still are accessible by root;

3. Dead objects, inaccessible objects, not used by the program.

In the following example, we have objects containing links to each other, for each of which the number of inbound links is stored.

One of the objects is not used at some point. As a result, the variable indicating the number of links becomes 0.

GarbageCollector clears the memory locations that are occupied by unused objects.

But this approach does not work for objects with cyclic references, as each object will always be called by someone else. One of the algorithms that overcome this problem is called “Mark and Sweep”. In these cases, a Queue is formed for each vertex, which indicates its neighbors, that is, the elements to which it refers.

The neighbors of R are B ն C. B has 1 neighbor, C, which is already registered. C’s neighbors are D և E. D does not have neighbors. Adjacent to E is F, and to F is G. G no longer has links, so GC can delete the objects that do not contain links from the root. This algorithm does not perform fragmentation, objects registered in different parts of memory continue to be scattered. In the case of the “Stop and Copy” algorithm, it is the objects that are registered in Queue, not the links. This algorithm may work slowly, but it ensures that the later data acquisition from collection is done quickly.

In high level, every data element of all kinds of each data structures is directly or indirectly unique. Even in the case of arrays, there is a unique value, the index. The main note is that the index is implicitly managed by programming language and its compiler, and in the case of explicitly defined keys, it is up to the programmer to prevent duplication.

The final version of code can be found on GitHub

AdvancedConcepts and DataStructures

--

--

Vachagan Mirzoian
Vachagan Mirzoian

Written by Vachagan Mirzoian

Acumatica ERP Developer, Biz-Tech Services, Inc.

No responses yet