machine_learning

#define machine_learning: \
I------------------------------------------\
I------------------------------------------\
I                                          \
I              /$$$$$$  /$$$$$$            \
I             /$$__  $$|_  $$_/            \
I            | $$  \ $$  | $$              \
I            | $$$$$$$$  | $$              \
I            | $$__  $$  | $$              \
I            | $$  | $$  | $$              \
I            | $$  | $$ /$$$$$$            \
I            |__/  |__/|______/            \
I------------------------------------------\
I------------------------------------------I

# accurate cartoon depiction https://xkcd.com/1838/
# introductory resource http://neuralnetworksanddeeplearning.com
# "Machine Learning in C" (from scratch) https://www.youtube.com/watch?v=PGSba51aRYU&list=PLpM-Dvs8t0VZPZKggcql-MmjaBdZKeDMw
# Machine Learning from scratch in Python https://github.com/agvxov/neural_network_from_scratch
# music theme for the chapter https://www.youtube.com/watch?v=j8wHeVbPQI8

" \
    Breakthroughs that lead the Singularity could be reached by anyone. \
    It could happen literally anywhere. If a few geniuses set up shop \
    in Theodore Kaczynski’s former log cabin in Montana, \
    the Singularity could begin there. \
" - Mitchell Heisman, Suicide Note, 1708.

• "But bro, when will I ever use matrix operations and calculus? \
   Those are such a waste of time!"
  Here. At the same time, in fact. Enjoy hell, stalker child.
• think about it this way:
  we are trying to construct a function based on given inputs and outputs;
  most of the time these input-output pairs are partial too,
  meaning there are inputs whose output we do not know;
  mathematically speaking, we have no bloody clue what we are doing;
  if such a function already exists, we do not know about it;
  for this reason, we wish to deploy some method that approximates
  our desired function as closely as possible;
  we construct a model based on our known input-output pairs,
  start with a random function and a way to measure how well it performs
  by comparing its outputs to the desired outputs,
  then, using derivatives, we find what direction to tweak the values of
  our approximation function in to get closer to our desired function

NEURONS:
    • roughly imitate biological (human) neurons
    Perceptron: https://www.youtube.com/watch?v=4Gac5I64LM4
        also refers to "single layer neural network"
        ○ components
            1. Inputs
            2. Weights
            3. Bias
            4. Threshold
            5. Output
        • the original virtual neuron
        • each input is binary, only the weights are fractions
        • every input is multiplied by its corresponding weight;
          the products are summed; the bias is added;
          this sum is judged against a threshold value
        • a minor change in the weights either does nothing or flips the output entirely;
          there is no gradual response
        • one could set up the weights by hand or bruteforce them, but both are tedious

            input-1 \
                     ‾‾\ __
            input-2 ----|  D>---> output
                       / ‾‾
            input-3 /‾‾

                      / if Σⱼxⱼwⱼ <= threshold then 0
            output   {
                      \ if Σⱼxⱼwⱼ >  threshold then 1

    Logical_neuron: https://en.wikipedia.org/wiki/Activation_function
        ○ components
            1. Inputs
            2. Weights
            3. Bias
            4. Activation function
            5. Deactivation function
            6. Output
        • the output is a fraction; with most activation functions, between 0 and 1

            I₁ ____
                   \  * W₁
                    ‾‾‾‾‾\__ .───┬───.
                      * W₂  \│   │   '─.
            I₂ --------------│ ∑ │ f[]  >----
                          __/│   │   .─'
                     * W₃/   '───┴───'
            I₃ ____/‾‾‾‾‾

        — the original activation function is the Sigmoid function:
                     1
               ─────────────
                1 + exp(-x)
        • a minor change in a weight only results in a minor change in the output
        • the deciding property of an activation function's fitness is its shape
        {
            // Shape of the Sigmoid function
            1.0 ├─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
                │                           ___...|
                │                     _..--``
                │                  ,'             │
                │                 ,‾
                │                -                │
                │               .
                │              -                  │
                │            _'
                │         _-'                     │
                │___...--""
            0.0 ┼─────────────────┼─────────────────┴
               -5                 0                 5
        }

NEURAL_NETWORKS:"NN"
    • this new and revolutionary technology that will give us AGI within 2 more weeks
      without a paradigm shift is from 1957
    ○ components:
        1. Neurons
        2. Architecture
        3. Loss function
        4. Learning algorithm
    ○ typical visual representation of the neural network
        /*  Neuron-1             Neuron-2
             _____                _____
            /     \    Weight    /     \
           |       |____________|       |
           |       |            |       |
            \_____/              \_____/
        */
        Physical architecture of single perceptron:
                      __
            input ---|  D>---> output
                      ‾‾
        Virtualized architecture of single perceptron:
             _      _      _
            { }--->{ }--->{ }
             ‾      ‾      ‾

            ┃Input┃Hidden┃Output┃
            ┃layer┃layers┃      ┃
        /*
               _  │  _   │
              ( )-│-( )  │
               ‾\\_//‾ \_│
               _ /|\ _   \ _
              ( )-|-( )--│( )
               ‾ \|/ ‾  _/ ‾
               _//‾\\_ /  │
              ( )-│-( )  │
               ‾  │  ‾   │
        */
        Feedforward Network
        /*
               _     _
              ( )---( )
               ‾\\_//‾ \_
               _ /|\ _   \ _
              ( )-|-( )---( )
               ‾ \|/ ‾  _/ ‾
               _//‾\\_ /
              ( )---( )
               ‾     ‾
        */
            ━━━━━━━━━━━━━━━━━━━━━▶
                   Dataflow
    Architecture:
        • the method by which neurons are logically ordered;
          in practice this means a web formed by piping the outputs of neurons into the inputs of others
    Layers:
        • a layer is a group of neurons which do not communicate with each other,
          but do share their input and output neurons
        • the input layer is a virtual layer that corresponds to the input
        • the output layer is a virtual layer that corresponds to the output values of the last physical layer
        • a hidden layer is a layer between the input and the output layers
        {
            /* "The activation function is applied once per layer" - said my prof to my greatest surprise.
               I'm no expert at all, but even the most simplistic article will tell you
               that in a neuron, after the weighted inputs are added together,
               the activation function is applied.
               So clearly, the activation function is applied in a layer
               as many times as the number of neurons, right?
               Well, kinda, however a (fully connected) layer can be expressed as a matrix:
                    let n := the number of neurons
                    let m := the number of inputs (ie. the number of weights per neuron)
                                __                     __
                               |                         |
                               |  W₁₁   W₁₂   ...   W₁ₙ  |
                               |                         |
                               |  W₂₁   W₂₂   ...   W₂ₙ  |
                    myLayer := |                         |
                               |  ...   ...   '-.   ...  |
                               |                         |
                               |  Wₘ₁   Wₘ₂   ...   Wₘₙ  |
                               |__                     __|
               Now, if we make the activation function a matrix operation
               -and we should too, because GPUs are outstanding at that-,
               it is in fact applied once.
            */
        }
    • a network where each layer's outputs are fed to the next layer as input,
      but not elsewhere and always in one direction, is called a feedforward network
    • a non-feedforward network, where feedback loops are implemented, is called a recurrent network
    • recurrent networks are more similar to the human brain than feedforward networks
    • feedforward networks are easier to work with, so they enjoy the privilege of being more researched
    Network_of_Perceptron_neurons:
        XOR_problem:
            • cause of the first AI winter
            • it is said that a perceptron is unable to learn to calculate the logical operation exclusive or
            • Σⱼxⱼwⱼ is a linear equation; meaning in the plane it creates a line
            • only linearly separable problems are solvable
            {
                // Boolean values projected to the plane
                    1 │ x           x
                      │
                      │
                      │
                      │
                    0 │ x           x
                      ┼──────────────
                        0           1
                // Our perceptron is able to place a single line on this plane
                //  and label by it
                    1 │ x           x
                      │        A
                      │--..
                      │    ``--..
                      │   B      ``--
                    0 │ x           x
                      ┼──────────────
                        0           1
                // As represented like this, OR, AND and NAND are classifiable
                //          OR                     AND                    NAND
                    1 │ x           x      1 │ x     '.    x      1 │ x     '.    x
                      │'.                    │         '.  true     │         '.  false
                      │  '.    true          │           '.         │           '.
                      │    '.                │  false      '.       │  true       '.
                      │false '.              │               '.     │               '.
                    0 │ x      '.   x      0 │ x           x      0 │ x           x
                      ┼──────────────        ┼──────────────        ┼──────────────
                        0           1          0           1          0           1
                // NOTE: notice how the angle of the line is arbitrary,
                //        there are multiple configurations that work just as well
                // Now, if we wanted to do the same with XOR, we would be in trouble.
                // At least 2 lines would be required:
                    1 │ x /         x   /
                      │? /   ???       /
                      │ /             /
                      │/             /
                      │    ??       /  ??
                    0 │ x          /x
                      ┼──────────────
                        0           1
                // Now, I wish to show you a trick.
                // Assume the below data:
                                   ▲
                            X   X  │    X
                                   │       X
                         X      A  │A
                              A    │ AA       X
                          ◀────────┼────────▶
                                A A│ A
                         X   X     │      X
                                   │   X
                                   │X
                                   ▼
                // Clearly, there is no single line to separate 'A's from 'X's,
                //  yet the border seems very trivial to us, if only we could draw a circle...
                // However, we could represent our data in an alternative way,
                //  say in a polar coordinate system, where horizontally we
                //  represent the distance from the origin, and vertically the
                //  angle enclosed with the (positive) X axis.
                       ▲ π
                       │          X
                       │  A        X
                       │         X
                       │ A         X
                       │   A      X
                       │ AAA       X
                       ┼─────────────▶ R
                       │  A       X
                       │A          X
                       │
                       │   A      X
                       │         X
                       │           X
                       ▼ -π
                // With this transformed data set, we could do a linear separation and
                //  thereby teach it to a perceptron.
                // You may ask, why can't we do something similar with the XOR problem?
                // Well, we could. For example, if we order by the absolute difference of the
                //  input values, we would get something that is linearly separable.
                // So, you can trick a perceptron into solving the logical operation XOR.
                // However, the takeaway is that that's not the point.
                // We could only trick the perceptron because we knew the function perfectly
                //  and applied some smart transformation.
                // In the real world we hardly know the function for differentiating cats from
                //  dogs, and perhaps there exists no representation where that's linear.
                // The XOR problem is about the difficulty it shows, not about how we were missing
                //  an AI logic gate that we would have needed for something.
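                // The trick above can be sketched in code. This is only a minimal sketch,
                //  assuming the classic perceptron learning rule and |x₁ - x₂| as the
                //  hand-picked transformation; the names (predict, train, accuracy)
                //  are hypothetical and not taken from any library.

```python
# Sketch: one perceptron (step activation) trained with the perceptron rule.
# All function and variable names are made up for this illustration.

def predict(weights, bias, xs):
    """Threshold unit: 1 if the weighted sum plus bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, xs)) + bias
    return 1 if total > 0 else 0


def train(samples, n_inputs, epochs=50, rate=0.1):
    """Perceptron rule: nudge each weight by rate * error * input."""
    weights, bias = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for xs, target in samples:
            error = target - predict(weights, bias, xs)
            weights = [w + rate * error * x for w, x in zip(weights, xs)]
            bias += rate * error
    return weights, bias


def accuracy(samples, weights, bias):
    """Fraction of samples the perceptron labels correctly."""
    hits = sum(predict(weights, bias, xs) == t for xs, t in samples)
    return hits / len(samples)


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# Raw inputs: not linearly separable, so some point always stays wrong.
raw_acc = accuracy(XOR, *train(XOR, n_inputs=2))

# Hand-made transformation: feed |x1 - x2| as the only input.
XOR_T = [((abs(a - b),), t) for (a, b), t in XOR]
transformed_acc = accuracy(XOR_T, *train(XOR_T, n_inputs=1))

print(raw_acc, transformed_acc)
```

                // On the raw inputs at least one of the four points stays misclassified
                //  no matter how long the training runs; on the transformed input the
                //  same perceptron gets all four right.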
            }
    Training:"learning"/"fitting"
        • the process of reassigning weights with the intent of gaining better outputs
        • overfitting is the phenomenon when a model has adapted to the learning data so well
          that it is unable to perform well on other data
        • the more complex the model, the more probable overfitting is
        Supervision:
            Supervised_learning:
                • learning data is labeled
            Unsupervised_learning:
                • learning data is not labeled
                • the model forms its own concepts in the form of clusters
                • requires significantly more data for effective training
                • the resulting models tend to be more creative
                  {more reliable on data which was not in the learning set; creates better AI art}
        Learning_rate:
            • once the direction to converge in has been calculated, the value of the learning rate
              determines the amount of change that should take effect
            • the learning rate doesn't actually "know" how much to change, it's an (educated) guess
            • a badly chosen learning rate can cause the model to continuously overshoot
              the optimal values or to converge way too slowly
            — typical value interpretations: // why do I feel as if I were making notes of zodiac signs?
                 ... - 0.01   // small
                0.01 - 0.1    // medium
                 0.1 - ...    // large
            optimizer:
                • an object or function which is in charge of applying the weight updates;
                  most of them also change the learning rate dynamically
                • consults the loss
                — common optimizers:
                    • SGD   "Stochastic Gradient Descent"
                    • Adam  "ADAptive Moment estimation"
                    • Nadam "Nesterov-accelerated ADAM"
        Weight_updating:
            • traditionally weights are updated once in every epoch
            • when weights are updated after each data point, that is called online learning;
              it's often the simplest approach when the dataset of an epoch cannot fit into memory at once
        Random:
            • the brute forcing of weights
            • can work ok-ish on very small networks
            • basically useless, mostly for demonstration purposes or to serve as a baseline
        Finite_difference:
                △f(x) = f(x + b) − f(x + a)
            • derivative approximation; for a small (b − a), f'(x) ≈ △f(x) / (b − a)
        Backpropagation:
            https://towardsdatascience.com/understanding-backpropagation-algorithm-7bb3aa2f95fd
            https://neptune.ai/blog/backpropagation-algorithm-in-neural-networks-guide
            https://pyimagesearch.com/2021/05/06/backpropagation-from-scratch-with-python/
            • learning algorithm based on gradient descent and utilizing the Leibniz chain rule
        Reinforcement:
            • used when arriving at the right conclusion requires a number of steps
            • the desired result is either unknown or very hard to create a dataset from
            • an agent monitors the environment and guesses the next best action to take
            — a reward function is used to determine when the agent performed well:
                • akin to loss calculation, but no gradient descent is necessary
                • when only the end result is rewarded, it's possible that
                  the agent fails to ever arrive there by pure chance,
                  making it unable to optimize its solution
                • when small actions are rewarded, it's possible that
                  it will overfit to partial solutions, coming up with the most absurd of practices
                • it's hard to come up with a good reward function for complex problems
            • models are NOT portable between environments
            • more hyperparameter sensitive than supervised learning
            {
                 observation ┏━━━━━━━┓ action
                ┌───────────▶┃ Agent ┃────────┐
                │            ┗━━━━━━━┛        │
                │                ▲            │
                │                │reward      │
                │                │            │
                │         ┏━━━━━━━━━━━━━┓     │
                └─────────┃ Environment ┃◀────┘
                          ┗━━━━━━━━━━━━━┛
            }
        Fine_tuning:"transfer learning"
            • common technique
            • an already trained model being adapted to a more specific task
            • being given a pretrained model and fine-tuning it is
              significantly cheaper and faster than training from scratch
            • the fine-tuning can be done on proprietary or obscure data
            • full fine-tuning is fine-tuning that uses a process identical to the initial training
            • partial fine-tuning is fine-tuning where only a select subset of the weights is updated,
              the rest are kept intact;
              usually the outer layers are updated and the intuition of the deep layers is reused
            • additive fine-tuning is fine-tuning where new parameters are inserted;
              sometimes entire layers are added;
              this helps the model retain its intelligence similarly to full fine-tuning,
              but is significantly cheaper
            • prompt tuning involves preprocessing the user prompt;
              the preprocessing is usually done by another, significantly faster model
              that appends keywords and examples regarding the desired output format, tone or bias
            • RAG (Retrieval-Augmented Generation) involves vector searching a document
              based on the user input and further prompt tuning with this additional context;
              traditionally not considered fine-tuning
    Dataset:
        • the data available at development time to train/test on
        — the data set is usually split:
            • training data; fed to the machine while it learns
            • testing data; allocated for testing after learning is finished;
              useful for finding out how well the model does on data that it has never seen before,
              but which is equal in quality to the training data
        Augmentation:
            • the process of generating more training data from the initial training data
            • used for avoiding overfitting
            — usually done by applying basic transformations to the dataset
                • rotation
                • zoom
                • flipping
        PCA:"Principal Component Analysis"
            • in datasets, oftentimes the same variable is encoded multiple times
            • finding and removing redundancy in data
            • "reducing dimension while preserving the
               variance present"
            • in the context of NNs, it refers to optimizing the input to shorten training times
            {
                downsizing images to the edge of recognizability;
                removing noise and color from images;
                stripping one of the heights in cms/inches of a horse when both are available
            }
        Batching:
            • packing the dataset into smaller collections
            • each batch is used independently to adjust weights
            • smaller memory footprint
            • the training data can be arbitrarily large and still processable
            • more frequent weight adjustments (might have a positive effect on model performance)
            • each adjustment is based on a less accurate estimation
        Tokenization:
            • encoding for NNs
            • neural networks can only understand arrays of numbers;
              yet for their usefulness feeding something more complex would be ideal
            • any data to be fed to a NN must be encoded
            • in the case of average NNs, this means the data points are resized to fit the bounds
              of the activation function and are flattened so that every data point has
              its own input neuron in a 1 dimensional manner
            {
                // Assume an image (BELOW is trying to be the classic Windows XP hills background)
                    ┌───────┐
                    │   # # │
                    │/''--__│
                    │_---'''│
                    └───────┘
                // We could encode pixel data as grayscale,
                //  leaving us with pixel values 0-255.
                // Assume our activation function is the Sigmoid.
                // The Sigmoid's value span is 0-1.
                // Every pixel will be representable as:
                     ${GRAYSCALE_VALUE}
                    ────────────────────
                            255
                // Or, sticking with our ascii art, we could take the ascii values (box omitted).
                // Using 7 bit ascii, we will only need 128 values.
                      ${ASCII_VALUE}
                    ──────────────────
                           127
                // Let's write out each ascii value:
                    ' ' (#32); ' ' (#32); ' ' (#32); '#' (#35); ' ' (#32); '#' (#35); ' ' (#32);
                    '/' (#47); ''' (#39); ''' (#39); '-' (#45); '-' (#45); '_' (#95); '_' (#95);
                    '_' (#95); '-' (#45); '-' (#45); '-' (#45); ''' (#39); ''' (#39); ''' (#39);
                // Let's force these values between bounds (with rounding here):
                    ' ' (#0.25); ' ' (#0.25); ' ' (#0.25); '#' (#0.28); ' ' (#0.25); '#' (#0.28); ' ' (#0.25);
                    '/' (#0.37); ''' (#0.31); ''' (#0.31); '-' (#0.35); '-' (#0.35); '_' (#0.75); '_' (#0.75);
                    '_' (#0.75); '-' (#0.35); '-' (#0.35); '-' (#0.35); ''' (#0.31); ''' (#0.31); ''' (#0.31);
                // Now flatten it into 1 dimension:
                    ' ' (#0.25); ' ' (#0.25); ' ' (#0.25); '#' (#0.28); ' ' (#0.25); '#' (#0.28); ' ' (#0.25); '/' (#0.37); ''' (#0.31); ''' (#0.31); '-' (#0.35); '-' (#0.35); '_' (#0.75); '_' (#0.75); '_' (#0.75); '-' (#0.35); '-' (#0.35); '-' (#0.35); ''' (#0.31); ''' (#0.31); ''' (#0.31);
                // Omitting the visualization meta data for humans, we are left with the following array,
                //  which could easily be fed to a neural network:
                    [0.25, 0.25, 0.25, 0.28, 0.25, 0.28, 0.25,
                     0.37, 0.31, 0.31, 0.35, 0.35, 0.75, 0.75,
                     0.75, 0.35, 0.35, 0.35, 0.31, 0.31, 0.31]
            }
            Natural_language:
                • when tokenizing natural language, per character encoding is usually not the best idea
                • tokenizing by words / word segments can yield better results and require smaller networks
                {
                    > # Word tokenization in Tensorflow (legacy Keras preprocessing API)
                    > import tensorflow as tf
                    > from tensorflow import keras
                    > from tensorflow.keras.preprocessing.text import Tokenizer
                    > examples = ['Heyo world!', 'Goodbye cruel world']
                    > t = Tokenizer()
                    > t.fit_on_texts(examples)  # returns None, so the index is read in a separate step
                    > print(t.word_index)
                    {'world': 1, 'heyo': 2, 'goodbye': 3, 'cruel': 4}
                }
    Autoencoder: //?!; organize
        • encodes/decodes
        • translates data to a more efficient representation then attempts to reconstruct it

               Encoding   Decoding
             _               _
            |i|'           '|o|
            |n|  '   _   '  |u|
            |p|    '|#|'    |t|
            |u|    .|#|.    |p|
            |t|  .   ‾   .  |u|
            | |.     A     .|t|
             ‾       |       ‾
               dense representation

                  input ~ output

        • it teaches the AI to do its own PCA (see AT ?!; "principal component analysis")
        • can be used to remove noise {damaged images} from input
        • often used as a component for larger systems (?!)
    FCN:"Fully Connected neural Network"
    Convolution: "?!/Image recognition/kernel"
        • kernel operation
        • ${N} dimensional (usually 2)
        • a CNN or "Convolutional Neural Network" contains at least one convolutional layer
        • retains spatial information
        • generally good at computer vision tasks
        • smaller kernels generally perform better
        • a stride other than 1 results in an output of a different size
        ○ hyperparameters
            • kernel size
            • strides (kernel shift amount) (>1 further reduces the output size)
            • activation
            • padding (dummy border for the input to modify {preserve} output size)

            // with a stride of 1 and no padding:
            sizeof(output) := sizeof(input) - sizeof(kernel) + 1

                        ₘ₋₁ ₘ₋₁
            y₍ᵢ,ₕ₎ :=    ∑   ∑  f₍ₖ,ₗ₎ * x₍ᵢ₊ₖ,ₕ₊ₗ₎
                        ᵏ⁼⁰ ˡ⁼⁰

                               +--+--+
                               | 1| 2|
                    ┌──────────+--+--+────────┐
                    │          | 2| 1|        │
                    │          +--+--+        │
                    │          Kernel         │
                    │                         │
              #=====#--+--+                #==#--+--+
              I 1| 2I 2| 1|                I13I11| 6|
              I--+--I--+--+                #==#--+--+
              I 3| 2I 1| 0|                |11|11|11|
              #=====#--+--+                +--+--+--+
              | 1| 2| 3| 4|                |12|11|16|
              +--+--+--+--+                +--+--+--+
              | 3| 1| 1| 3|
              +--+--+--+--+
                  Input                      Output

        — max pooling:
            • simply outputs the max value inside the kernel
    RNN:"Recurrent Neural Network"
        • neurons are laid out in a self feeding architecture
        • the information flow is recursive
        • traditionally used to solve sequence-to-sequence (seq2seq) problems {translation}
    Transformers:
        https://www.youtube.com/watch?v=iDulhoQ2pro
        arXiv:1706.03762
        • modified feedforward networks
        • have the advantages of RNNs
        • unlike RNNs they can be easily parallelized on a large scale
        Multihead_attention:
                         ┌────┴────┐
                         │ Concat  │
                         └─────────┘
                              ▲
                              │
                             ┌┼┐
                             │││
                     ┌───────││┴──────────┐
                    ┌────────│┴──────────┐│
                   ┌─────────┴──────────┐││
                   │ Scaled Dot-Product ││┘
                   │     Attention      │┘
                   └─┬───────┬────────┬─┘
                 ┌───┘││     │││      ││└──┐
                 │┌───┘│     │││      │└──┐│
                 ││┌───┘     │││      └──┐││
              ┌──││┴───┐  ┌──││┴───┐  ┌──││┴───┐
             ┌───│┴───┐│ ┌───│┴───┐│ ┌───│┴───┐│
            ┌────┴───┐│┘┌────┴───┐│┘┌────┴───┐│┘
            │ Linear │┘ │ Linear │┘ │ Linear │┘
            └────────┘  └────────┘  └────────┘
                 ▲           ▲           ▲
                 │           │           │
               Query        Key        Value
        Architecture:
            // [diagram flattened beyond repair; the listing below keeps its labels
            //  and follows Figure 1 of arXiv:1706.03762, read bottom-up]
            encoder {stacked}:
                1. Inputs
                2. Input Embedding (+) Positional Encodings
                3. Multi-Head Attention; Add & Norm
                4. Feed Forward; Add & Norm
            decoder {stacked}:
                1. Outputs shifted right
                2. Output Embedding (+) Positional Encodings
                3. Masked Multi-Head Attention; Add & Norm
                4. Multi-Head Attention {also fed the encoder output}; Add & Norm
                5. Feed Forward; Add & Norm
            head:
                1. Linear
                2. Softmax
                3. Output Probabilities
    LLM:"Large Language Models"
        https://thqihve5.bearblog.dev/the-modern-library-of-babel/
        — the context window is the largest input a model can take;
          since they have no other "mental" storage, this is practically their memory span;
          measured in tokens
    Hyperparameter_optimization:
        • a hyperparameter is a configurable setting of the model that is not adjusted
          during training {architecture; activation function}
        • the problem with hyperparameter optimization in the field of AI is that
          we have no mathematical way of knowing how different hyperparameters will perform,
          except for eval-ing them of course, but that's very expensive
        • educated guessing while eval-ing a few different setups is the best we can do

Coding_with_AI:
    — vibe coding:
        • the act of coding with an LLM and forgetting about the existence of the code
        • meant to convey a sense of relief and programming without obstacles
        • the term is often used by otherwise tech-illiterate people
        • one-way-ticket to introducing security vulnerabilities into your code
          which have otherwise been dead since the late 2000's
        • the coining of the term marks the official beginning of the competency crisis
    • in high purity, stable environments LLMs excel at fetching documentation
    • can be used as a linter
    • can be used for spotting obvious bugs which the developer misses
      due to hyper-fixation on other parts { calling "chroot" instead of "chdir" }
    • the smaller the language, the more confident you can be in the quality of the AI's suggestion
    — incredible as a (very expensive) transpiler: // ?!; example
        • you provide pseudo code
        • you can allow the AI to implement smaller functions based on their signatures
        • works especially well for scripting
        • the code will come out as high quality as if it was hand written,
          but the annoyances of scripting were skipped
        • all typos will be fixed
        • all libraries are resolved
        • all library function names are guessed based on the description
        • all argument orders are corrected
    Tips:
        • do not ask the model to omit comments;
          even though its extensive usage of comments is annoying, it helps with its accuracy
        • indentation and proper usage of
          newlines is enough for the model to judge the span of code blocks
    {
        /* Very simple example:
         *  This is a real need that came up while
         *  writing the CDD to HTML pipeline.
         */
        // Prompt:
            complete the following pseudo code:
            perl script
            strict mode
            function usage
                print name "<file>"
                die
            if no arguments
                print usage
            end if
            lines = open argv[1]
            if error usage
            while last line matches regex "#define .*"
                delete last line
            print lines
        // Output (comments stripped):
            #!/usr/bin/perl
            use strict;
            use warnings;
            sub usage {
                print "$0 <file>\n";
                die;
            }
            @ARGV == 1 or usage();
            open my $fh, '<', $ARGV[0] or usage();
            my @lines = <$fh>;
            close $fh;
            while (@lines && $lines[-1] =~ /#define .*/) {
                pop @lines;
            }
            print @lines;
        /* The output is literally perfect.
         * It looks exactly like what I wanted.
         * In fact, -I'm currently not great with Perl-
         *  I had no idea you can call just `pop`.
         * Obviously, this is a trivial script.
         * If your concern is "how often does something
         *  like this come up?", my answer is that
         *  maybe you should create more such situations.
         * UNIX-ize your work process and you will be able to
         *  continuously harness the power of AI without
         *  ever creating spaghetti.
         * If your concern is "well, how much time did it really save?",
         *  then what is important to realize is that my knowledge
         *  of the language did not bottleneck my development speed.
         * There were no seconds wasted remembering symbol names
         *  or going back to add missing semi-colons.
         * My typing speed and auto-complete also became irrelevant.
         */
    }