data_structures

#define data_structures: \ I--------------------------------------------------------------------------------------------------------------------------\ I--------------------------------------------------------------------------------------------------------------------------\ I--------------------------------------------------------------------------------------------------------------------------\ I /$$$$$$$ /$$ /$$$$$$ /$$ /$$ \ I | $$__ $$ | $$ /$$__ $$ | $$ | $$ \ I | $$ \ $$ /$$$$$$ /$$$$$$ /$$$$$$ | $$ \__//$$$$$$ /$$$$$$ /$$ /$$ /$$$$$$$ /$$$$$$ /$$$$$$$ \ I | $$ | $$ |____ $$|_ $$_/ |____ $$ | $$$$$$|_ $$_/ /$$__ $$| $$ | $$ /$$_____/|_ $$_/ /$$_____/ \ I | $$ | $$ /$$$$$$$ | $$ /$$$$$$$ \____ $$ | $$ | $$ \__/| $$ | $$| $$ | $$ | $$$$$$ \ I | $$ | $$ /$$__ $$ | $$ /$$ /$$__ $$ /$$ \ $$ | $$ /$$| $$ | $$ | $$| $$ | $$ /$$\____ $$ \ I | $$$$$$$/| $$$$$$$ | $$$$/| $$$$$$$ | $$$$$$/ | $$$$/| $$ | $$$$$$/| $$$$$$$ | $$$$//$$$$$$$/ \ I |_______/ \_______/ \___/ \_______/ \______/ \___/ |__/ \______/ \_______/ \___/ |_______/ \ I--------------------------------------------------------------------------------------------------------------------------\ I--------------------------------------------------------------------------------------------------------------------------\ I--------------------------------------------------------------------------------------------------------------------------I • or "containers" — from the perspective of size a container can be static or dynamic: • awfully loosely used with the generic idea that static means fixed sized and dynamic means resizable • while this distinction makes sense from an implementation standpoint (ie. as is, can this structure be resized?) it starts bleeding when looked at as generic attribute "../Array" "../Vector" • it is generally agreed that an array is static-ly sized; a vector is a special type of array which can reallocate itself; therefor some arrays are dynamically sized which contradicts the first statement • the reallocatable subcategory array/vector trick can be played with anything Array: • a continuous block of memory holding homogeneous data • doesnt know its own size • by knowing the address of the first element all other elements become accessible by adding <int> * elem_size to it; therefor the first element equals: array head + 0*sizeof(element); this is the main reason for 0 indexing in computer science • the address of the first element is sometimes referred to as the array head ○ components • array head • allocated memory • element size • const element access time • fixed size • can be wasteful on space {name field allocated to 128 bytes, while most users will use less then 10} • easy to segfault • while many languages call their versions of lists "arrays", they often implement features defying the ABOVE description (bound checking; dynamic resizing), there by trashing the mentioned pros and cons { // example in C // @COMPILECMD gcc $@ -O0 -nostartfiles -o $* int my_array[4] = {0xd, 0xe, 0xa, 0xd}; // notice how i am deliberatly spelling "dead", so you can regocnize the pattern in the object dump _start(){ return my_array[2]; } // disassembly $ objdump -s -j .data -d -j .text e e: file format elf64-x86-64 Contents of section .text: 1000 f30f1efa 554889e5 8b05fa2f 00005dc3 ....UH...../..]. Contents of section .data: 4000 0d000000 0e000000 0a000000 0d000000 ................ Disassembly of section .text: 0000000000001000 <_start>: 1000: f3 0f 1e fa endbr64 1004: 55 push %rbp 1005: 48 89 e5 mov %rsp,%rbp 1008: 8b 05 fa 2f 00 00 mov 0x2ffa(%rip),%eax # 4008 <my_array+0x8> // +8, which is 2*sizeof(int) 100e: 5d pop %rbp 100f: c3 ret Disassembly of section .data: 0000000000004000 <my_array>: 4000: 0d 00 00 00 0e 00 00 00 0a 00 00 00 0d 00 00 00 ................ } +-----------------------------------------------------------------------------------------------------------------------------------------------+ | _MEMORY_ | +-----------------------------------+-----------------------------------+-----------------------------------+-----------------------------------+ | 0xnnnn0000 | 0xnnnn0004 | 0xnnnn0008 | 0xnnnn000c | |:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:| | Byte-0 | Byte-1 | Byte-2 | Byte-3 | Byte-0 | Byte-1 | Byte-2 | Byte-3 | Byte-0 | Byte-1 | Byte-2 | Byte-3 | Byte-0 | Byte-1 | Byte-2 | Byte-3 | |:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:| | /* my_array[0] */ | /* my_array[1] */ | /* my_array[2] */ | /* my_array[3] */ | | 13 | 13 | 10 | 13 | +-----------------------------------+-----------------------------------+-----------------------------------+-----------------------------------+ Λ _ __ ___ __ | ┃ / \ ┃ / | ┃ /_ | ┃ | \ │ ━━╋━━ | \ | ━━╋━━ ^| | ━━╋━━ / /. ━━╋━━ .> | Array head ┃ \_/ ┃ |_| ┃ |___| ┃ |__/ Terminated_array: • an array where the end of data is marked with a special value • the terminating value is most often NULL ('\00') • the length doesnt have to be stored, it can always be calculated by reading sequentially and counting the distance to the termination (at the price of extra compute, of course) • the termination may lay before the last last element allocated for, there by signaling the rest of the elements are not assigned, ie. invalid • the most famous example is the C string; this is relevant because most UNIX like operating systems expect such strings for system-calls // C string example // implicitly NULL terminated string char my_c_string[4] = "gnu"; // explicitly NULL terminated string char my_c_string[4]; my_c_string[0] = 'g'; my_c_string[1] = 'n'; my_c_string[2] = 'u'; my_c_string[3] = '\00'; ○ components: • array head • allocated memory • element size • terminator • saving updating and passing around a length is not required for traversals • botching the termination will cause overruns • not knowing the lenght without traversal can range from annoying to being a performace concern +-------------------------------------------------------------------------------------------+ | _MEMORY_ | +----------------------+----------------------+----------------------+----------------------+ | 0xnnnn0000 | 0xnnnn0001 | 0xnnnn0002 | 0xnnnn0003 | |:--------------------:|:--------------------:|:--------------------:|:--------------------:| | Byte-0 | Byte-1 | Byte-2 | Byte-3 | |:--------------------:|:--------------------:|:--------------------:|:--------------------:| | /* my_c_string[0] */ | /* my_c_string[1] */ | /* my_c_string[2] */ | /* my_c_string[3] */ | | 'g' | 'n' | 'u' | '\00' | +----------------------+----------------------+----------------------+----------------------+ Λ _ __ ___ 1 TERM __ 1 | ┃ / \ ┃ / | ┃ /_ |1 ┃ | \1 │ ━━╋━━ | \ | ━━╋━━ ^| | ━━╋━━ / /.1 ━━╋━━ .> |1 Array head ┃ \_/ ┃ |_| ┃ |___|1 ┃ |__/1 this diagramm dissmisses that on a hardware level the bytes might actually be reversed Parallel_Arrays:"SoA" (Struct of Arrays) • when 2 or more arrays store related data at the same index • most often objects are used for tasks parallel arrays could be • strangely, not a single language has explicit support for them +-----------------+-----------------+-----------------+-----------------+ string array Animal_name | "Quoka" | "Okapi" | "Glyphoglossus" | "Tsuchinoko" | +-----------------+-----------------+-----------------+-----------------+ int array Strangeness_score | 1 | 1 | 3 | 10 | +-----------------+-----------------+-----------------+-----------------+ + 0 + 1 + 2 + 3 • ideal data alignment • iterating over only one "field" is better for caching • without language support, its error prone Vector: not very well defined as a general struct; the given definition is C++ based • "dynamic array" • an array capable of reallocating itself with a different size • stores how much memory is allocated and how many elements are assigned — when a new element should be stored but there is insufficient memory allocated: 1. the vector allocates a new, larger array 2. copies its data 3. appends the new element \_ swapable or can be done concurrently 3. frees its old array / • the growth on each reallocation is arbitrary • for performance its wise to grow with multiple element slots on each reallocation • the growth size might not even be const ○ components • array head • allocated memory • element size • size • allocator Stack: # a stack oriented programming language "Forth""verem"^HU • FILO (first in last out) container • dynamic sized • the most recently inserted element of the stack is called the top • only the top of the stack can be accessed • some stack implementations allow for all elements to be read however this is not required • removing the top is called pop-ing • adding an element and there by making it the new top is called push-ing Visualization: // Empty stack ┐ ┌ // We have a hole drawn from the side │ │ │ │ │ │ │ │ │ │ │ │ │ │ └───────┘ // Insertion +-----+ +--- | 3 | | +-----+ | V ┐ ┌ ┐ ┌ // Now the element with the value of 3 is on top │ │ │ │ // One can read it or take it out │ │ │ │ │ │ │ │ │ │ │ │ │ │ ... │ │ │ │ │+-----+│ │ │ │| 3 |│ │ │ │+-----+│ └───────┘ └───────┘ // ------- +-----+ +--- | 5 | | +-----+ | V ┐ ┌ ┐ ┌ // Now the element with the value of 5 is on top │ │ │ │ // One can read it or pop it (take it out), │ │ │ │ // but not cannot remove 3 unless 5 is │ │ │+-----+│ // removed too beforehand │ │ │| 5 |│ │ │ ... │+-----+│ │+-----+│ │+-----+│ │| 3 |│ │| 3 |│ │+-----+│ │+-----+│ └───────┘ └───────┘ // ------- +-----+ +--- | 1 | | +-----+ | V ┐ ┌ ┐+-----+┌ // The element with the value of 1 is the new top. │ │ │| 1 |│ │ │ │+-----+│ │+-----+│ │+-----+│ │| 5 |│ │| 5 |│ │+-----+│ ... │+-----+│ │+-----+│ │+-----+│ │| 3 |│ │| 3 |│ │+-----+│ │+-----+│ └───────┘ └───────┘ // Removal A | | | ┐+-----+┌ ┐ ┌ // The element with the value of 1 is popped. │| 1 |│ │ │ // 5 becomes the top again │+-----+│ │ │ // NOTE: no other element could have been possibly removed. │+-----+│ │+-----+│ │| 5 |│ │| 5 |│ │+-----+│ ... │+-----+│ │+-----+│ │+-----+│ │| 3 |│ │| 3 |│ │+-----+│ │+-----+│ └───────┘ └───────┘ // ------ A | | | ┐ | ┌ ┐ ┌ // The element with the value of 5 is popped. │ | │ │ │ // 3 becomes the top again │ | │ │ │ // NOTE: no other element could have been possibly removed. │+-----+│ │ │ │| 5 |│ │ │ │+-----+│ ... │ │ │+-----+│ │+-----+│ │| 3 |│ │| 3 |│ │+-----+│ │+-----+│ └───────┘ └───────┘ // Reordering // Swap 1. The top value {1} is copied out and popped Variable-1 Variable-2 Variable-1 Variable-2 ┏━━━━━━━━━┓ ┏━━━━━━━━━┓ ┏━━━━━━━━━┓ ┏━━━━━━━━━┓ ┃ ┃ ┃ ┃ ┃ +-----+ ┃ ┃ ┃ ┃ ┃<----+ ┃ ┃ ┃ | 1 | ┃ ┃ ┃ ┃ ┃ | ┃ ┃ ┃ +-----+ ┃ ┃ ┃ ┗━━━━━━━━━┛ | ┗━━━━━━━━━┛ ┗━━━━━━━━━┛ ┗━━━━━━━━━┛ | ┐+-----+┌ ┐ ┌ │| 1 |│ │ │ │+-----+│ │ │ │+-----+│ │+-----+│ │| 5 |│ │| 5 |│ │+-----+│ ... │+-----+│ │+-----+│ │+-----+│ │| 3 |│ │| 3 |│ │+-----+│ │+-----+│ └───────┘ └───────┘ 2. The top value {5} (the previous second) is copied out and popped Variable-1 Variable-2 Variable-1 Variable-2 ┏━━━━━━━━━┓ ┏━━━━━━━━━┓ ┏━━━━━━━━━┓ ┏━━━━━━━━━┓ ┃ +-----+ ┃ ┃ +-----+ ┃ ┃ +-----+ ┃ ┃ +-----+ ┃ ┃ | 1 | ┃ +---->┃ | 5 | ┃ ┃ | 1 | ┃ ┃ | 5 | ┃ ┃ +-----+ ┃ | ┃ +-----+ ┃ ┃ +-----+ ┃ ┃ +-----+ ┃ ┗━━━━━━━━━┛ | ┗━━━━━━━━━┛ ┗━━━━━━━━━┛ ┗━━━━━━━━━┛ | ┐ | ┌ ┐ ┌ │ | │ │ │ │ | │ │ │ │+-----+│ │ │ │| 5 |│ │ │ │+-----+│ ... │ │ │+-----+│ │+-----+│ │| 3 |│ │| 3 |│ │+-----+│ │+-----+│ └───────┘ └───────┘ 3. The old top {1} is inserted back. Variable-1 Variable-2 Variable-1 Variable-2 ┏━━━━━━━━━┓ ┏━━━━━━━━━┓ ┏━━━━━━━━━┓ ┏━━━━━━━━━┓ ┃ +-----+ ┃ ┃ +-----+ ┃ ┃ ┃ ┃ +-----+ ┃ ┃ | 1 | ┃-----+ ┃ | 5 | ┃ ┃ ┃ ┃ | 5 | ┃ ┃ +-----+ ┃ | ┃ +-----+ ┃ ┃ ┃ ┃ +-----+ ┃ ┗━━━━━━━━━┛ | ┗━━━━━━━━━┛ ┗━━━━━━━━━┛ ┗━━━━━━━━━┛ V ┐ ┌ ┐ ┌ │ │ │ │ │ │ │ │ │ │ │+-----+│ │ │ │| 1 |│ │ │ ... │+-----+│ │+-----+│ │+-----+│ │| 3 |│ │| 3 |│ │+-----+│ │+-----+│ └───────┘ └───────┘ 4. The desired top {5} is inserted back. Variable-1 Variable-2 Variable-1 Variable-2 ┏━━━━━━━━━┓ ┏━━━━━━━━━┓ ┏━━━━━━━━━┓ ┏━━━━━━━━━┓ ┃ ┃ ┃ +-----+ ┃ ┃ ┃ ┃ ┃ ┃ ┃ +-----┃ | 5 | ┃ ┃ ┃ ┃ ┃ ┃ ┃ | ┃ +-----+ ┃ ┃ ┃ ┃ ┃ ┗━━━━━━━━━┛ | ┗━━━━━━━━━┛ ┗━━━━━━━━━┛ ┗━━━━━━━━━┛ V ┐ ┌ ┐+-----+┌ │ │ │| 5 |│ │ │ │+-----+│ │+-----+│ │+-----+│ │| 1 |│ │| 1 |│ │+-----+│ ... │+-----+│ │+-----+│ │+-----+│ │| 3 |│ │| 3 |│ │+-----+│ │+-----+│ └───────┘ └───────┘ // The process of swapping <int> elements is similar if you think these drawings look phallic you are both childish and correct // Towers of Hanoi • ancient puzzle Rules: • there are 3 rods • the first contains <int> amount of disks • each disk is smaller in diameter than the one bellow it — Goal: • moving all disks to the last rod in the same order as they initially are — moving: • only 1, the top disk of any rod may be lifted and placed to another rod • a disk can only be placed (even temporarily) to a larger disk • solvable with arbitrary num of disks // Initial state of Towers of Hanoi with 4 disks ┌┐ ┌┐ ┌┐ ││ ││ ││ XXXX ││ ││ 888888 ││ ││ @@@@@@@@ ││ ││ ########## ││ ││ ━━━━━━┷┷━━━━━━ ━━━━━━┷┷━━━━━━ ━━━━━━┷┷━━━━━━ • the rods can be interpreted as stacks • each stack must be ordered at all times • no outer (swap) memory // ?!; solution Object: • soydevs have the irritating tendency to call every language having syntax support for objects and inheritance "object oriented" between 2 dilations; which could be an acceptable definition if only they would bother to ever apply any other similar bump sticker {"parallelization oriented"; "event oriented"} • a collection of data treated as a single entity • the definition of an object (or its blueprint if you will) is called a group • a class is a subtype of a group, but is often used interchangeably or in place of "group" out of convenience • data complying to the definition of a specific type of object is refered to as an instance • a function creating an instance of a struct is called a constructor • a function deleting an instance of a struct is called a destructor { // Defining an object type which can // be used to describe quokas // (which is the worlds cutest animal, // just so you know) // Using C++. struct Quoka { // specifying that im creating a group which is named Qouka string name; // Textual data unsigned int age; // Positive whole number bool is_male; // Binary value describing its sex }; // Now i have defined how a quoka instance must look like, // however, nothing has been brought to existence. // I must call a constructor to create a quoka. Quoka my_quoka(); // creating a quoka named my_quoka; // just C++ syntax stuff get over it // From the machines point of view, this is (roughly) equivalent to // creating the variables by hand, ie: string my_quoka_name; unsigned int my_quoka_age; bool my_quoka_is_male; // In practice, class Quoka specifies how much space to allocate // for something called a quoka and what (variable) name means what offset // A bit more visually: Quoka() { sub $20, %rsp // ((assuming)) sizeof(Quoka) == 20 } main() { -- %rbp Quoka () # { string ~ padding unsigned ~ padding bool ~ padding # } -- %rsp } // It's important to note that usually all variables are allocated // with the appropriate padding (see processor addressing) to maximize // performance, however explicit grouping of variables allow for // convenient ways to selectively optimize storage } — a named piece of data of an object is called a member variable — member variables can (usually be) referred to by the following syntax: <object><delimiter><variable> { Quoka.name // Examples of this syntax: // C // C++ // C# // Java // Javascript Quoka->name // Examples of this syntax: // C pointers // C++ pointers // PHP Quoka["name"] // Examples of this syntax: // Bash // Javascript } • a function which cannot be defined without a group is called a member function; the naming comes from the fact that most languages recommend or force such functions to be declared along with the contents of the group; this definition could be controversial, but the only one which is free from the limitations of any single syntax { // This is valid and universally accepted member function in C++ struct Quoka { /* code repetition */ string name; unsigned int age; bool is_male; /**/ void pet(){} // dummy function that doesn't actually do anything }; // C does not allow for functions to be defined inside struct definitions, // but the following is logically equivalent: /* code repetition */ struct Quoka { string name; unsigned int age; bool is_male; }; /**/ void pet(Quoka *q){} // NOTE: you must have noticed that we are passing a Quoka explicitly in this version, // however languages (including C++) tend allow implicit passing of // the object being referred to to member functions; // so while the syntax has changed they are perfectly equivalent in practice // Therefore, even tho no C programmer would ever call pet() a member function, // looking at from the greater scheme of things: it is a member function. } • the constructor and the destructor are member functions • functions which take an object as their first argument (implicitly or explicit-ly) are called methods • the destructor is a methodmember variables and member functions are collectively referred to as members +-----------------------------------------------+ | MEMBERS | | {is_male} | | {name} | | {age} | | +-------------------------------------+ | | | FUNCTIONS | | | | {constructor} | | | | +---------------------------+ | | | | | METHODS | | | | | | {destructor} | | | | | +---------------------------+ | | | | | | | +-------------------------------------+ | | | +-----------------------------------------------+ Linked_list: • a web of homogeneous elements where each element holds information regarding the location of other elements • elements are called nodes • nodes can be arbitrary scattered in memory • the location of the first node is called the list head • whether the head is stored implicitly or explicit-ly, doesnt make a difference • the location of the last node is called the list tail • the list tail is rarely stored explicit-ly — to signal the end of the list a value with this special meaning is defined: • usually const — an element pointers value can be used for the task: • sentinel value • if an element points to the sentinel value, it signals that the list is over • most often the const memory address 0x0 (almost always given a special alias {NULL, nullptr, nill}) • all traversal operations will have to compare against this value //?!; reorder? • the number of times other nodes refer to a node is called its in-degree • the number of valid node references a node holds is called its out-degree • const insertion time \_ traversal of element position not calculated in • const removal time / • fast over all const insertion at the beginning • linearly increasing access time for nodes Forward: • each node knows the location of the next one ○ definition by degree • there is exactly one node with an in-degree of 0 (the first) • there is exactly one node with an out-degree of 0 (the last) // except if cyclical termination is used; ?! • all other degrees are 1 • for the access the <int>th node, first <int>-1 pointers must be traversed { // forward linked list of numbers in C struct fil { int value; // actual data fil *next = NULL; // pointer to the memory address of the next element; // takes up the terminating signal as a value by default }; // method for reading an arbitrary element int access(fil *e, unsigned int n){ for(int i = 0; i < n; i++){ if(this->next == NULL){ // the list was over indexed /* Arbitrary error handling */ } e = e->next; } return e; } // recursive method for reading an arbitrary element int access(fil *e, unsigned int n){ // ?!; NOTE: this is probably the best example ever for recursion; reuse it! if(n == 0){ return this->value; } if(this->next == NULL){ // the list was over indexed /* Arbitrary error handling */ } return access(this->next, n-1); } // abstract graph 0──────0 │ Head │ -------------+ +---------+ +---------+ +---------+ +-------------+ 0──────0 | | | | | | | | | V | V | V | V | V /* ADDRESS */ +------------+ | +------------+ | +------------+ | +------------+ | 0────────0 /* */ | 0xnnnn38b6 | | | 0xnnnn5c02 | | | 0xnnnndf84 | | | 0xnnnn2a76 | | │ 0x0000 │ /* */ +------------+ | +------------+ | +------------+ | +------------+ | 0────────0 | int value: | | | int value: | | | int value: | | | int value: | | | 3 | | | 5 | | | 1 | | | 7 | | | | | | | | | | | | | | | fil *next: | | | fil *next: | | | fil *next: | | | fil *next: | | | 0xnnnn5c02 |---+ | 0xnnnndf84 |---+ | 0xnnnn2a76 |---+ | NULL |---+ | | | | | | | | +------------+ +------------+ +------------+ +------------+ // layout in memory ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃┌──────┐ ┌───────────┐┃ ┃│ Head │-------+ │ my_fil[3] │┃ ┃└──────┘ | └───────────┘┃ ┃ V A | ┃ ┃ ┌───────────┐ | | ┃ ┃ ? │ my_fil[0] │ | | ┃ ┃ └───────────┘ | | ┃ ┃ | | | ┃ ┃┌───────────┐ | ? | | ┃ ┃│ my_fil[1] │<-+ | ++┃ ┃└───────────┘ | |┃ ┃ | | ? |┃ ┃ | +-+ |┃ ┃ | ? | |┃ ┃ | | |┃ ┃ | ┌───────────┐|┃ ┃ ? +---------->│ my_fil[2] │|┃ ┃ └───────────┘|┃ ┃ |┃ ┃ ? |┃ ┃ ? V┃ ┃ @@@@@@@@┃ ┃ @ NULL @┃ ┃ @@@@@@@@┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ } Doubly: • each node knows the location of the next and previous one • makes backtracking possible {for reverse search} ○ definition by degree • there is exactly one node with an in-degree of 1 (the first) • there is exactly one node with an out-degree of 1 (the last) // except if cyclical termination is used; ?! • all other degrees are 2 { // doubly linked list of numbers in C struct fil { int value; // actual data fil *next = NULL; // pointer to the memory address of the next element; // takes up the terminating signal as a value by default fil *prev = NULL; // pointer to the memory address of the previous element; // takes up the terminating signal as a value by default }; /* abstract graph */ +----------+ +----------+ +----------+ 0──────0 | | | | | | │ Head │ ------------+ | +----------+ | +----------+ | +----------+ +-------------+ 0──────0 | | | | | | | | | | | | | | | V V | | V V | | V V | | V | V /* ADDRESS */ +------------+ | | +------------+ | | +------------+ | | +------------+ | 0────────0 /* */ | 0xnnnn38b6 | | | | 0xnnnn5c02 | | | | 0xnnnndf84 | | | | 0xnnnn2a76 | | │ 0x0000 │ /* */ +------------+ | | +------------+ | | +------------+ | | +------------+ | 0────────0 | int value: | | | | int value: | | | | int value: | | | | int value: | | A | 3 | | | | 5 | | | | 1 | | | | 7 | | | | | | | | | | | | | | | | | | | | fil *next: | | | | fil *next: | | | | fil *next: | | | | fil *next: | | | | 0xnnnn5c02 |---+ | | 0xnnnndf84 |---+ | | 0xnnnn2a76 |---+ | | NULL |---+ | | | | | | | | | | | | | +---| fil *prev: | +---| fil *prev: | +---| fil *prev: | +---| fil *prev: | | | | NULL | | 0xnnnn5c02 | | 0xnnnn5c02 | | 0xnnnndf84 | | | +------------+ +------------+ +------------+ +------------+ | | | +-------------------------------------------------------------------------------------------------------+ } Chunk_list: • each node consist of arrays of the data wished to be stored Tree: • the head node is called the root • terminating node, which points to no valid elements is called a leaf • the distance of a node from the root (the number of nodes that must be traversed to access it) is called its level or dept; the numbering of leaves can be 0 or 1 based; 1 based makes more sense if the head is stored explicitly, but both are valid either way; ill use 1 based • the distance of a node to the deepest node that can be accessed through it is called its heigh • leaves have a height of 0 • the height of the root is the level of the deepest node (or that -1 if counted from 1) • the height of the root is the height of the tree ○ formal definition • there is exactly one node with in-degree of 0 (the root) • all other nodes have in-degree of 1 • there is exactly one path from the root to any leaf — the tree data structure has many variations, some interesting ones are: • trie (for prefix search) • octree (for graphics) • rope (for text editors) { // graph of generic tree LEVEL O ROOT 1. /|\ / | \ o o o 2. / / \ \ 0 o 0 o 3. | / \ 0 0 0 4. 0s mark leaves // generic tree with heights LEVEL O "HEIGHT: max(level)-1=3" 1. / \ / \ / \ "HEIGHT: 1" o o "HEIGHT: 2" 2. | / \ 0 0 o "HEIGHT: 1" 3. "HEIGHTS: 0" | 0 4. } • the nodes a node points to are called its children • the node that points to a node is called its parentparents point to their children • nodes which have the same parent are called siblings Binary: • each node can have 2 children max • children are named left child and right child • the tree is strictly binary if each node has either 2 or 0 children { struct Node{ int value; Node* left = NULL; Node* right = NULL; } } { O O O / \ / \ / \ 0 0 0 o / \ / \ o o 0 0 / \ / \ 0 0 0 0 } — the tree is full binary of <int> if: • each node on level <int> is a leaf and • each node on level <int>-1 has left and right children — the tree is complete binary if: • each node on level height-2 has 0 or 2 children and • every node on level height-1 has 2 children or only a left child — balanced tree: • every nodes subtrees height differ by at most 1 { struct Node{ int value; Node* left = NULL; Node* right = NULL; } } traversal: — the order of the following operations are arbitrarily mixable; they all have an associated char as a short-hand: • continue-ing with the left child 'L' • continue-ing with the right child 'R' • readint the current node 'N' +===================+ I Order I Name I +===================+ | LRN | postorder | +-------------------+ | LNR | inorder | +-------------------+ | RLN | N/A | +-------------------+ | RNL | N/A | +-------------------+ | NLR | preorder | +-------------------+ | NRL | N/A | +-------------------+ // Implementation of inorder traverse void traverse(Node* root){ // NOTE: root, as in root the subtree currently traversed if(root = NULL){ return; } traverse(root->left); printf("%d\n", root->value); // print the value to a new line; traverse(root->right); // NOTE: printing is an example; } // a function pointer could have been passed to perform any operation reusing traverse() — the tree is full binary of <int> if: • each node on level <int> is a leaf • each node on level <int>-1 and below has left and right children {// full binary tree of 2 | full binary tree of 3 | full binary tree of 3 | O O O / \ / \ / \ 0 0 / \ / \ o o / \ / \ / \ / \ 0 0 0 0 o o / \ / \ / \ / \ 0 0 0 0 } — the tree is complete binary if: • each node on level height-2 has 0 or 2 children and • every node on level height-1 has 2 children or only a left child • the tree is ordered or other wise called a binary search tree if each nodes, left child holds a smaller value than itself and its right child holds a larger value than itself { // Here the numbers represent the nodes value 8 / \ / \ / \ / \ 3 10 / \ \ / \ \ 1 6 14 / \ / 4 7 13 } Balanced_tree: • every nodes subtrees height differ by at most 1 — balance factor: • each node has one • calculated so that the tree can be kept balanced • height(right_subtree) - height(left_subtree) • -1, 0 or 1 { // NOTE: <num>f is the nodes balance factor O 1f / 0 0f O -1f / \ / \ / \ / \ 0f o o -1f / \ / / \ / 0f o 0f o 0 0f / \ / \ 0 0 0 0 0f 0f 0f 0f // unbalanced tree O \ o \ o / 0 } Hash_table: • often used interchangeable with the terms map or dictionary, tho technically these later define the behaviour while, hash table refers to the implementation Components: • bucket array • bucket containers • hash function Hash_function: <function> : <set-1> -> range(0, len(<bucket array>)) • a function which will a valid index inside the bounds of the bucket array to arbitrary input • the more evenly it distributes the better // Using a numbers with the bucket array length is // a very simplistic example of hashing (not practical) // NOTE: the example presumes a table with a bucket array length of 5 int rem10(int i) { return i % 5; // Modulo operator in C } • a perfect hash function maps exactly 1 hash to every element of a set of inputs ( this set ought to be a subset of all inputs) • a minimal perfect hash function is a perfect hash function that maps the inputs to a dense range of integers of [0,...|S|-1] Rehashing: • the larger ${elements num}/${bucket num} is, the slower operating in the table gets • operation speed can be increased by reallocating all data to a new table • a new hash function is chosen and every element is hashed again • the bucket array size usually increased to 2 times its original size /* Keys */ | /* Values */ ============+==================================================================================================================== ┏━━━┓ | 0────────0 ┌─────────────┐ ┌─────────────┐ ┃ 0.┃ | │ Head-0 │ -----> │ Element-0-0 │ -----> │ Element-0-1 │ -----> NULL ┗━━━┛ | 0────────0 └─────────────┘ └─────────────┘ | ┏━━━┓ | 0────────0 ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┃ 1.┃ | │ Head-1 │ -----> │ Element-1-0 │ -----> │ Element-1-1 │ -----> │ Element-1-2 │ -----> │ Element-1-3 │ -----> NULL ┗━━━┛ | 0────────0 └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ | ┏━━━┓ | 0────────0 ┃ 2.┃ | │ Head-2 │ -----> NULL ┗━━━┛ | 0────────0 | ┏━━━┓ | 0────────0 ┌─────────────┐ ┃ 3.┃ | │ Head-3 │ -----> │ Element-3-0 │ -----> NULL ┗━━━┛ | 0────────0 └─────────────┘ | ┏━━━┓ | 0────────0 ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┃ 4.┃ | │ Head-4 │ -----> │ Element-4-0 │ -----> │ Element-4-1 │ -----> │ Element-4-2 │ -----> NULL ┗━━━┛ | 0────────0 └─────────────┘ └─────────────┘ └─────────────┘ |