filetypes
#define filetypes: \
I-------------------------------------------------------------------------------------------------------\
I-------------------------------------------------------------------------------------------------------\
I-------------------------------------------------------------------------------------------------------\
I /$$$$$$$$ /$$ /$$ /$$$$$$$$ \
I | $$_____/|__/| $$ |__ $$__/ \
I | $$ /$$| $$ /$$$$$$ | $$ /$$ /$$ /$$$$$$ /$$$$$$ /$$$$$$$ \
I | $$$$$ | $$| $$ /$$__ $$ | $$| $$ | $$ /$$__ $$ /$$__ $$ /$$_____/ \
I | $$__/ | $$| $$| $$$$$$$$ | $$| $$ | $$| $$ \ $$| $$$$$$$$| $$$$$$ \
I | $$ | $$| $$| $$_____/ | $$| $$ | $$| $$ | $$| $$_____/ \____ $$ \
I | $$ | $$| $$| $$$$$$$ | $$| $$$$$$$| $$$$$$$/| $$$$$$$ /$$$$$$$/ \
I |__/ |__/|__/ \_______/ |__/ \____ $$| $$____/ \_______/|_______/ \
I /$$ | $$| $$ \
I | $$$$$$/| $$ \
I \______/ |__/ \
I-------------------------------------------------------------------------------------------------------\
I-------------------------------------------------------------------------------------------------------\
I-------------------------------------------------------------------------------------------------------I
• "file formats"
EXTENSIONS: EXTENSIONS:
name.extension
• the traditional way to identify filetypes
• in a file name, everything after the last dot is conventionally the extension
• extensions are usually abbreviations of file types
{ script.sh -> SHell script }
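A minimal sketch of pulling the extension out of a file name, assuming "the extension" means the text after the last dot. The helper name extension_of is made up for this note.

```python
from pathlib import PurePath

def extension_of(file_name: str) -> str:
    """Return the text after the last dot, or "" when there is none."""
    return PurePath(file_name).suffix.lstrip(".")

print(extension_of("script.sh"))       # sh
print(extension_of("archive.tar.gz"))  # gz
print(extension_of("Makefile"))        # (empty string)
```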
MIMETYPE: MIMETYPE:
type/subtype
• "media type"
• IANA defined file type classification system
• binary formats usually have magic bytes associated with them, which go at the
start of the file for identification purposes
{
text/plain
video/mp4
application/octet-stream
}
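Magic byte identification can be sketched roughly like this. The table is a tiny hand-picked subset for illustration (real tools such as file(1) ship a far larger database), and sniff_mimetype is a hypothetical name.

```python
# Illustrative subset only: (magic bytes at offset 0, media type).
MAGIC_BYTES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]

def sniff_mimetype(data: bytes) -> str:
    """Guess a media type from the first bytes of a file."""
    for magic, mimetype in MAGIC_BYTES:
        if data.startswith(magic):
            return mimetype
    # Unknown binary data falls back to the generic type.
    return "application/octet-stream"

print(sniff_mimetype(b"GIF89a\x01\x00\x01\x00"))  # image/gif
print(sniff_mimetype(b"\x00\x01\x02"))            # application/octet-stream
```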
Binary: Binary:
pass
Executable: Executable:
ELF:
#include <elf.h>
• "Executable and Linkable Format"
• de facto standard on *nix
• cross architecture
— header:
0x7F 'E' 'L' 'F'
word size byte
○ extensions
.elf
.o
.out
.prx
.puff
.ko
.mod
.so
Tools:
readelf <flag>+ <file>+
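The ELF header bytes listed above can be checked by hand. This is a sketch, not a replacement for readelf: it only looks at the first six bytes, and the byte order field (offset 5) is taken from the ELF specification rather than from these notes. read_elf_ident is a made-up name.

```python
def read_elf_ident(data: bytes) -> dict:
    """Decode the magic, word size byte and byte order byte of an ELF file."""
    if len(data) < 6 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Offset 4 (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
    word_size = {1: 32, 2: 64}.get(data[4])
    # Offset 5 (EI_DATA): 1 = little endian, 2 = big endian.
    byte_order = {1: "little", 2: "big"}.get(data[5])
    return {"word_size": word_size, "byte_order": byte_order}

# Hand-built header prefix standing in for a real binary.
sample = b"\x7fELF\x02\x01" + bytes(10)
print(read_elf_ident(sample))  # {'word_size': 64, 'byte_order': 'little'}
```

For a real file, pass the result of open(path, "rb").read(16).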
Fat_binary:
• contains the machine code for multiple architectures
• the header or an all compatible jmp is responsible for selecting
the right section for the current architecture
• used for convenience of distribution
Plain_text: Plain_text:
Value: Value:
{
<value>
}
• the file contains only a single value
• the "key" (identifier) it belongs to is deduced from the file name
• UNIX-like systems heavily use this format for their virtual files
{
───────┬──────────────────────────────────────────
│ File: /proc/sys/kernel/panic_on_oops
───────┼──────────────────────────────────────────
1 │ 0
───────┴──────────────────────────────────────────
}
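Reading such a file is about as simple as parsing gets. A sketch, with a temporary file standing in for /proc/sys/kernel/panic_on_oops so it also runs on systems without /proc. read_value_file is a hypothetical helper.

```python
import tempfile
from pathlib import Path

def read_value_file(path: Path) -> tuple[str, str]:
    """Return (key, value): the key is the file name, the value its contents."""
    return path.name, path.read_text().strip()

with tempfile.TemporaryDirectory() as directory:
    demo = Path(directory) / "panic_on_oops"
    demo.write_text("0\n")
    key, value = read_value_file(demo)

print(key, value)  # panic_on_oops 0
```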
List: List:
{
<value>(<separator><value>)+
}
• some number of values separated by a special token
• <separator> is most commonly a new line ('\n')
{
numpy
pandas
matplotlib
requests
scikit-learn
tensorflow
beautifulsoup4
flask
django
pytorch
}
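A sketch of parsing a list file, assuming empty entries (such as the one after a trailing new line) should be dropped. parse_list is a made-up name.

```python
def parse_list(text: str, separator: str = "\n") -> list[str]:
    """Split text on the separator and drop empty entries."""
    return [item for item in text.split(separator) if item]

requirements = "numpy\npandas\nmatplotlib\n"
print(parse_list(requirements))                    # ['numpy', 'pandas', 'matplotlib']
print(parse_list("/usr/bin:/bin", separator=":"))  # ['/usr/bin', '/bin']
```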
CSV: CSV:
• "Comma Separated Values"
• used for storing and exchanging tabular data (spreadsheets, database dumps)
• a 2 dimensional list: one record per line
• all fields are separated by commas
• the first line may be a header storing the name of each column
{ [Field_name1](,[...])
[Field_value1](,[...])
}
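Python ships a csv module, so a header line followed by value lines can be read roughly like this. The sample data is invented for the example.

```python
import csv
import io

text = "name,extension\nShell script,sh\nPython script,py\n"

# DictReader takes the first line as the column names.
rows = list(csv.DictReader(io.StringIO(text)))

print(len(rows))             # 2
print(rows[0]["extension"])  # sh
```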
cfg: cfg:
{
(
<key>=<value>
)
}
• may or may not be whitespace sensitive around the '='
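A sketch of a cfg reader. It assumes the lenient variant, where white space around the '=' is ignored, and it skips lines without an '='; a strict parser would behave differently. parse_cfg is a hypothetical name.

```python
def parse_cfg(text: str) -> dict[str, str]:
    """Parse <key>=<value> lines into a dictionary."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # blank or malformed line
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

print(parse_cfg("width=640\nheight = 480\n"))  # {'width': '640', 'height': '480'}
```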
ini: ini:
{
(
<label>
<assignment>+
)+
}
○ <label>
[<string>]
• signals in what context the assignments BELOW should be interpreted
• serves as a sort of namespace
○ <assignment>
<string-key> = <string-value>
• eliminates redundancy which would be caused by key (name) prefixing
• statements are context dependent -> harder and less flexible to parse
# Qterminal ini file (partial)
[General]
AskOnExit=false
fontSize=10
[MainWindow]
ApplicationTransparency=5
pos=@Point(200 100)
size=@Size(640 480)
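The Qterminal fragment above can be read with Python's configparser, as a rough illustration of labels acting as namespaces. Note that configparser lowercases keys internally by default, so lookups are case insensitive.

```python
import configparser

INI_TEXT = """\
# Qterminal ini file (partial)
[General]
AskOnExit=false
fontSize=10

[MainWindow]
ApplicationTransparency=5
pos=@Point(200 100)
size=@Size(640 480)
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

print(parser.sections())                          # ['General', 'MainWindow']
print(parser.getboolean("General", "AskOnExit"))  # False
print(parser.getint("General", "fontSize"))       # 10
print(parser["MainWindow"]["pos"])                # @Point(200 100)
```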
JSON: JSON:
• "JavaScript Object Notation"
• serialization format designed for storing Javascript objects as strings
• has no standard way to handle methods/inheritance
• has no hard ties to Javascript
• extremely versatile and flexible
• adopted for many languages
○ extensions
.json
Values:
• a value may be preceded and followed by white space { 3 }
— string
• "[...]"
— number
• int
• float
• scientific representation
— bool
• true
• false
— null
— array
• [[white_space][value](,[white_space][value](...))[white_space]]
— object
• key-value pairs
• {[white_space]<string>[white_space]:[value](,[white_space]<string>[white_space]:[value](,[...]))}
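The value types above map onto Python types through the standard json module. A short sketch with an invented document:

```python
import json

text = ' {"name": "script.sh", "size": 1.5e3, "tags": ["shell", null], "exec": true} '

value = json.loads(text)  # surrounding white space is accepted
print(value["size"])      # 1500.0
print(value["tags"])      # ['shell', None]
print(value["exec"])      # True

# Serializing and parsing again gives back an equal object.
round_trip = json.loads(json.dumps(value))
```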
yaml:
https://web.archive.org/web/20240309094004/https:
• "Yet Another Markup Language"/"Yaml Ain't Markup Language"
• absolute cancer, TOTAL YAML DEATH (see AT the link ABOVE)
○ extensions
.yml
.yaml
• superset of JSON
○ special symbols
# : comment
--- : document start
... : document end
○ supports 2 syntax styles
— block:
• indentation determines the document structure
• tabs are not allowed
• a single space of difference in indentation is enough to start a new level
• list items are prefixed by "- "
— flow:
• guarantees JSON compatibility
Extension_reference_table: Extension_reference_table:
c : C/C++ file (see AT "/C++/Files")
cc : C/C++ file (see AT "/C++/Files")
C : C/C++ file (see AT "/C++/Files")
cs : C# File (see AT "/C#/Files")
cp : C/C++ file (see AT "/C++/Files")
cpp : C/C++ file (see AT "/C++/Files")
cxx : C/C++ file (see AT "/C++/Files")
c++ : C/C++ file (see AT "/C++/Files")
i : C/C++ file (see AT "/C++/Files")
ii : C/C++ file (see AT "/C++/Files")
o : C/C++ file (see AT "/C++/Files")
h : C/C++ file (see AT "/C++/Files")
hpp : C/C++ file (see AT "/C++/Files")
py : Python script file (see AT "/Python")
pyc : Compiled Python script (see AT "/Python")
sh : Shell script file (see AT "/Bash")
md : Markdown file (see AT "/Documentation/Markdown")
tex : LaTeX document (see AT "/Latex")
html : HTML file (see AT "/HTML")
css : CSS file (see AT "/CSS")
js : JavaScript script file (see AT "/JavaScript")
raw : Raw 3D Mesh; ascii plain text
obj : Wavefront 3D object; ascii plain text
ply : Stanford University polygon object file
metadata
#define metadata:: \
__ __ _ _ _ \
| \/ | | | | | | | \
| \ / | ___| |_ __ _ __| | __ _| |_ __ _ \
| |\/| |/ _ \ __/ _` |/ _` |/ _` | __/ _` | \
| | | | __/ || (_| | (_| | (_| | || (_| | \
|_| |_|\___|\__\__,_|\__,_|\__,_|\__\__,_| I
— get fucked buddy:
• the tooling is terrible
• arbitrary limitations that do not even fit the task
— seriously now, it should be this easy:
• magic-byte + plaintext dictionary
• reserve single chars of 7-bit ascii for special (encoded/tokenized) keys
• non-alpha order is undefined
• (perhaps) align pairs
The above should be reasonably fast and storage efficient, and should take
an afternoon to implement, but big corpo chose sloppy self-deprecating headers
instead for some reason? I don't get it.
id3
#define id3::: \
___ ___ ____ \
|_ _| \__ / \
| || || |_ \ \
|___|___/___/ I
• the support is shit
○ fields
+---------------------+--------------------+
| Field               | Bytes              |
+---------------------+--------------------+
| header              | 3                  |
| title               | 30                 |
| artist              | 30                 |
| album               | 30                 |
| year                | 4                  |
| comment             | 30 (28 in ID3v1.1) |
| zero-byte (ID3v1.1) | 1                  |
| track (ID3v1.1)     | 1                  |
| genre               | 1                  |
+---------------------+--------------------+
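The fixed field widths above add up to 128 bytes, so the block can be unpacked in one go. A sketch under a few assumptions: the header is the literal "TAG", text fields are padded with zero bytes and decoded as Latin-1 (conventional, but not enforced by the format), and a zero byte followed by a non-zero byte at the end of the comment means ID3v1.1. parse_id3v1 is a made-up name.

```python
import struct

def parse_id3v1(tag: bytes) -> dict:
    """Decode a 128-byte ID3v1 / ID3v1.1 block."""
    if len(tag) != 128 or tag[:3] != b"TAG":
        raise ValueError("no ID3v1 tag")
    _, title, artist, album, year, comment, genre = struct.unpack(
        "3s30s30s30s4s30sB", tag
    )

    def text(raw: bytes) -> str:
        # Cut at the first zero byte, then drop space padding.
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    fields = {"title": text(title), "artist": text(artist),
              "album": text(album), "year": text(year), "genre": genre}
    # ID3v1.1: byte 28 of the comment is zero, byte 29 holds the track.
    if comment[28] == 0 and comment[29] != 0:
        fields["track"] = comment[29]
        comment = comment[:28]
    fields["comment"] = text(comment)
    return fields

# Hand-built tag standing in for the last 128 bytes of an mp3 file.
demo_tag = struct.pack("3s30s30s30s4s28sBBB", b"TAG", b"Song", b"Artist",
                       b"Album", b"1999", b"hello", 0, 7, 17)
print(parse_id3v1(demo_tag))
```

With a real file, the tag would be read by seeking to 128 bytes before the end.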
○ genre
• stored as a single byte, used as an index into the table below
+---------+------------------------+
| Number | Genre |
+---------+------------------------+
| 00 | Blues |
| 01 | Classic rock |
| 02 | Country |
| 03 | Dance |
| 04 | Disco |
| 05 | Funk |
| 06 | Grunge |
| 07 | Hip-Hop |
| 08 | Jazz |
| 09 | Metal |
| 10 | New Age |
| 11 | Oldies |
| 12 | Other |
| 13 | Pop |
| 14 | Rhythm and Blues |
| 15 | Rap |
| 16 | Reggae |
| 17 | Rock |
| 18 | Techno |
| 19 | Industrial |
| 20 | Alternative |
| 21 | Ska |
| 22 | Death metal |
| 23 | Pranks |
| 24 | Soundtrack |
| 25 | Euro-Techno |
| 26 | Ambient |
| 27 | Trip-Hop |
| 28 | Vocal |
| 29 | Jazz & Funk |
| 30 | Fusion |
| 31 | Trance |
| 32 | Classical |
| 33 | Instrumental |
| 34 | Acid |
| 35 | House |
| 36 | Game |
| 37 | Sound clip |
| 38 | Gospel |
| 39 | Noise |
| 40 | Alternative Rock |
| 41 | Bass |
| 42 | Soul |
| 43 | Punk |
| 44 | Space |
| 45 | Meditative |
| 46 | Instrumental Pop |
| 47 | Instrumental Rock |
| 48 | Ethnic |
| 49 | Gothic |
| 50 | Darkwave |
| 51 | Techno-Industrial |
| 52 | Electronic |
| 53 | Pop-Folk |
| 54 | Eurodance |
| 55 | Dream |
| 56 | Southern Rock |
| 57 | Comedy |
| 58 | Cult |
| 59 | Gangsta |
| 60 | Top 40 |
| 61 | Christian Rap |
| 62 | Pop/Funk |
| 63 | Jungle |
| 64 | Native US |
| 65 | Cabaret |
| 66 | New Wave |
| 67 | Psychedelic |
| 68 | Rave |
| 69 | Show tunes |
| 70 | Trailer |
| 71 | Lo-Fi |
| 72 | Tribal |
| 73 | Acid Punk |
| 74 | Acid Jazz |
| 75 | Polka |
| 76 | Retro |
| 77 | Musical |
| 78 | Rock ’n’ Roll |
| 79 | Hard rock |
| 80 | Folk |
| 81 | Folk-Rock |
| 82 | National Folk |
| 83 | Swing |
| 84 | Fast Fusion |
| 85 | Bebop |
| 86 | Latin |
| 87 | Revival |
| 88 | Celtic |
| 89 | Bluegrass |
| 90 | Avantgarde |
| 91 | Gothic Rock |
| 92 | Progressive Rock |
| 93 | Psychedelic Rock |
| 94 | Symphonic Rock |
| 95 | Slow rock |
| 96 | Big Band |
| 97 | Chorus |
| 98 | Easy Listening |
| 99 | Acoustic |
| 100 | Humour |
| 101 | Speech |
| 102 | Chanson |
| 103 | Opera |
| 104 | Chamber music |
| 105 | Sonata |
| 106 | Symphony |
| 107 | Booty bass |
| 108 | Primus |
| 109 | Porn groove |
| 110 | Satire |
| 111 | Slow jam |
| 112 | Club |
| 113 | Tango |
| 114 | Samba |
| 115 | Folklore |
| 116 | Ballad |
| 117 | Power ballad |
| 118 | Rhythmic Soul |
| 119 | Freestyle |
| 120 | Duet |
| 121 | Punk Rock |
| 122 | Drum solo |
| 123 | A cappella |
| 124 | Euro-House |
| 125 | Dancehall |
| 126 | Goa |
| 127 | Drum & Bass |
| 128 | Club-House |
| 129 | Hardcore Techno |
| 130 | Terror |
| 131 | Indie |
| 132 | BritPop |
| 133 | Negerpunk |
| 134 | Polsk Punk |
| 135 | Beat |
| 136 | Christian Gangsta Rap |
| 137 | Heavy Metal |
| 138 | Black Metal |
| 139 | Crossover |
| 140 | Contemporary Christian |
| 141 | Christian rock |
| 142 | Merengue |
| 143 | Salsa |
| 144 | Thrash Metal |
| 145 | Anime |
| 146 | Jpop |
| 147 | Synthpop |
| 148 | Abstract |
| 149 | Art Rock |
| 150 | Baroque |
| 151 | Bhangra |
| 152 | Big beat |
| 153 | Breakbeat |
| 154 | Chillout |
| 155 | Downtempo |
| 156 | Dub |
| 157 | EBM |
| 158 | Eclectic |
| 159 | Electro |
| 160 | Electroclash |
| 161 | Emo |
| 162 | Experimental |
| 163 | Garage |
| 164 | Global |
| 165 | IDM |
| 166 | Illbient |
| 167 | Industro-Goth |
| 168 | Jam Band |
| 169 | Krautrock |
| 170 | Leftfield |
| 171 | Lounge |
| 172 | Math Rock |
| 173 | New Romantic |
| 174 | Nu-Breakz |
| 175 | Post-Punk |
| 176 | Post-Rock |
| 177 | Psytrance |
| 178 | Shoegaze |
| 179 | Space Rock |
| 180 | Trop Rock |
| 181 | World Music |
| 182 | Neoclassical |
| 183 | Audiobook |
| 184 | Audio theatre |
| 185 | Neue Deutsche Welle |
| 186 | Podcast |
| 187 | Indie-Rock |
| 188 | G-Funk |
| 189 | Dubstep |
| 190 | Garage Rock |
| 191 | Psybient |
+---------+------------------------+
exif
#define exif::: \
I _____ _____ ___ \
I | __\ \/ /_ _| __| \
I | _| > < | || _| \
I |___/_/\_\___|_| I
• "EXchangeable Image File format"
• widely used
— used for storing:
• camera information and settings
• date
• location
• thumbnail
• notes
• copyright
— only defined to be applicable to a handful of formats:
• jpeg
• png
• webp
• tiff
• wav (and variations)
• tools (usually) refuse to operate on file types not covered by the standard
Programs:
exif
exiftool
ImageMagick
audio
#define audio:: \
I---------------------------------\
I _ _ \
I /\ | (_) \
I / \ _ _ __| |_ ___ \
I / /\ \| | | |/ _` | |/ _ \ \
I / ____ \ |_| | (_| | | (_) | \
I /_/ \_\__,_|\__,_|_|\___/ \
I---------------------------------I
mp3
#define mp3::: \
__ __ ___ ____ \
| \/ | _ \__ / \
| |\/| | _/|_ \ \
|_| |_|_| |___/ I
• compressed (lossy)
• quality is largely dependent on bit rate
— common bit rates (kbit/s):
128
160
192
256
• metadata: ID3
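Bit rate translates directly into file size. A back-of-the-envelope sketch, assuming a constant bit rate in kbit/s (1 kbit = 1000 bits) and ignoring metadata. mp3_size_bytes is a hypothetical helper.

```python
def mp3_size_bytes(bit_rate_kbps: int, duration_seconds: float) -> int:
    """Approximate audio payload size: bits per second * seconds / 8."""
    return int(bit_rate_kbps * 1000 * duration_seconds / 8)

# A 3 minute track at the bit rates listed above.
for rate in (128, 160, 192, 256):
    print(rate, mp3_size_bytes(rate, 180))
# 128 2880000
# 160 3600000
# 192 4320000
# 256 5760000
```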
flac
#define flac\
___ _ _ ___ \
| __| | /_\ / __| \
| _|| |__ / _ \ (__ \
|_| |____/_/ \_\___| I
• "Free Lossless Audio Codec"
• lossless
m4a
#define m4a\
__ __ _ _ _ \
| \/ | | | /_\ \
| |\/| |_ _/ _ \ \
|_| |_| |_/_/ \_\ I
wav
#define wav\
__ _____ __ \
\ \ / /_\ \ / / \
\ \/\/ / _ \ V / \
\_/\_/_/ \_\_/ I
• "WAVeform audio file"
wma
#define wma\
__ ____ __ _ \
\ \ / / \/ | /_\ \
\ \/\/ /| |\/| |/ _ \ \
\_/\_/ |_| |_/_/ \_\ I
• "Windows Media Audio"
aac
#define aac\
_ _ ___ \
/_\ /_\ / __| \
/ _ \ / _ \ (__ \
/_/ \_\/_/ \_\___| I
• "Advanced Audio Coding"
image
#define image:: \
I--------------------------------------\
I _____ \
I |_ _| \
I | | _ __ ___ __ _ __ _ ___ \
I | | | '_ ` _ \ / _` |/ _` |/ _ \ \
I _| |_| | | | | | (_| | (_| | __/ \
I |_____|_| |_| |_|\__,_|\__, |\___| \
I __/ | \
I |___/ \
I--------------------------------------I
bmp
#define bmp::: \
___ __ __ ___ \
| _ ) \/ | _ \ \
| _ \ |\/| | _/ \
|___/_| |_|_| I
• "BitMaP"
• stores images as an uncompressed array of pixel values
• alpha capable
• lossless
• very widely supported
• implementations are simple
• no decompression -> speed
• ideal for performance sensitive tasks (games)
(exception: ultra HD graphics where even the compressed textures
will take 100s of GiBs to store)
• very large file sizes
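How large is easy to estimate, since the pixel array is uncompressed. A sketch that counts only the pixel data (file headers excluded) and assumes the usual BMP rule that every row is padded to a multiple of 4 bytes. bmp_pixel_bytes is a made-up name.

```python
def bmp_pixel_bytes(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Size of an uncompressed BMP pixel array, rows padded to 4 bytes."""
    row_bytes = (bits_per_pixel * width + 31) // 32 * 4
    return row_bytes * height

print(bmp_pixel_bytes(1920, 1080))      # 6220800
print(bmp_pixel_bytes(1920, 1080, 32))  # 8294400 (with alpha)
print(bmp_pixel_bytes(1, 1))            # 4 (3 bytes of pixel + 1 of padding)
```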
gif
#define gif::: \
___ ___ ___ \
/ __|_ _| __| \
| (_ || || _| \
\___|___|_| I
• "Graphics Interchange Format"
• each pixel value is stored as a reference (index value) to a color table
• supports animation
• known as THE animated image format
• only 256 different colors are allowed per frame
• it's technically lossless, but squeezing an image into a 256 color table fucks up the colors
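○ processing (this is the JPEG pipeline (see AT "jpeg" BELOW); GIF itself LZW compresses the color table indices)
convert to YCbCr ->
split into 8x8 blocks ->
subtract 128 ->
DCT ->
quantization ->
huffman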
• widely supported
• reasonable implementation complexity
• reasonable sizes
• reasonable decoding times
• ideal for transferring unimportant images (memes) over slow connections (early internet)
for slow machines (early PCs)
• mangled colors ruin the image quality
• not ideal for long term storage
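The color table idea behind GIF can be sketched in a few lines. This toy version only builds the table and the index list; it skips the compression step and simply refuses images with more than 256 distinct colors, where a real encoder would reduce the colors instead. to_indexed is a hypothetical name.

```python
def to_indexed(pixels):
    """Replace each (r, g, b) pixel with an index into a color table."""
    palette, indices = [], []
    for color in pixels:
        if color not in palette:
            if len(palette) == 256:
                raise ValueError("more than 256 distinct colors")
            palette.append(color)
        indices.append(palette.index(color))
    return palette, indices

pixels = [(255, 0, 0), (0, 0, 255), (255, 0, 0), (255, 0, 0)]
palette, indices = to_indexed(pixels)
print(palette)  # [(255, 0, 0), (0, 0, 255)]
print(indices)  # [0, 1, 0, 0]
```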
png
#define png::: \
___ _ _ ___ \
| _ \ \| |/ __| \
| _/ .` | (_ | \
|_| |_|\_|\___| I
• "Portable Network Graphics"/"PNG is Not GIF"
• RGB(A), grayscale or palette color only (no CMYK)
• lossless
• mildly difficult implementation
• mildly expensive encoding
jpg
#define jpg\
#define jpeg::: \
_ ___ ___ ___ \
_ | | _ \ __/ __| \
| || | _/ _| (_ | \
\__/|_| |___\___| I
• "Joint (P)hotographic Experts Group"
• 8x8 blocks
• 3 channels
• YUV
• compresses well
• lossy (a lossless variant is available)
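What happens to one 8x8 block can be sketched as follows, assuming baseline JPEG: samples are shifted by 128, then run through a 2D DCT. This naive version is slow and stops before quantization and entropy coding; it is only meant to show the transform. dct_8x8 is a made-up name.

```python
import math

N = 8

def dct_8x8(block):
    """Level shift (subtract 128) and 2D DCT-II of one 8x8 block."""
    shifted = [[sample - 128 for sample in row] for row in block]

    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (shifted[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * scale(u) * scale(v) * total
    return out

# A flat block: all the energy ends up in the first (DC) coefficient.
flat = [[138] * N for _ in range(N)]
coefficients = dct_8x8(flat)
print(round(coefficients[0][0], 6))  # 80.0
```

Smooth blocks like this one leave nearly every other coefficient at zero, which is why the format compresses well.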
JP2:
• "JPeg 2000"
• no blocks
• exif metadata swapped out for XML
• 256 channels
• EBCOT compression
• it is possible to store different parts of the same picture using different quality
• decodable multiple resolutions (saving computational time {for thumbnails})
• high implementation complexity
• high decoding times
• terrible support (I don't think I have seen one in my life)