What is what? · bei.pm
This section contains articles about file formats and reverse engineering.
However, the situation is as follows:
There are many programming languages out there, and many people know certain things by completely different names - or have no idea about the existence of fundamental concepts at all because their programming language abstracts them away.
tl;dr:
My notation is loosely based on C99 <stdint.h>
. Anyone familiar with this notation will certainly manage with mine.
Integer
Integers are simply whole numbers, that is, numbers without a decimal component.
In data formats, integers are generally defined within a fixed range of numbers, essentially a resolution. I specify this in bits, as a "byte" and related types (word, qword, etc.) are usually platform-dependent.
Furthermore, there is a distinction between natural numbers (ℕ, meaning unsigned - Unsigned) and whole numbers (ℤ, meaning signed - Signed).
This information is indicated by a sign in the identifier (u
or s
).
It is possible for signed integers to be represented either as ones' complement or as twos' complement.
Unless otherwise specified, twos' complement is used, as it is the preferred representation in modern computing.
I denote unsigned numbers in my documentation as uint
, followed by the specification of precision in bits.
I denote signed numbers in my documentation as sint
, also followed by the specification of precision in bits.
I refrain from using the data type "char" for characters, as strings typically represent only chains of integer values with a special interpretation.
These are therefore represented as uint(8)[].
Examples:
Notation | C99 stdint.h Equivalent |
Description | Number Range |
---|---|---|---|
uint(16) | uint16_t | Unsigned Integer, 16 Bit Length | 0 - 65,535 |
sint(8) | int8_t | Signed Integer, 8 Bit Length, Two's Complement | -126 - 127 |
uint(24) | uint32_t:24 | Unsigned Integer, 24 Bit Length | 0 - 16,777,216 |
Fixed-point values
Fixed-point values are numerical values from the spectrum of Rational Numbers (Q), which thus have a decimal point and decimal places.
For fixed-point values, the position of the decimal point is fixed by the data type - hence the name.
This also results in a fixed numerical range for numbers of this data type; mathematically expressed, the number space is finite.
In practice, this data type is primarily used on platforms without sufficiently fast floating-point hardware units, as the computation of fixed-point values can be performed using integer units.
The data type is also employed, for instance, by database management systems when fixed requirements need to be met.
One might think of systems for the permanent storage of financial data; most currencies are limited to 2 decimal places.
(However, it is not wise to use fixed-point values for this; it is smarter to store the smallest currency unit directly as an integer and leave the rest of the representation level to the system.)
Similar to integer specifications, I specify the resolution of the number before and after the decimal point for fixed-point values:
ufixed(9,7)
refers to a data type that reserves 9 bits for the integer part before the decimal point and 7 bits for the fractional part after the decimal point; in total, it is 16 bits wide and can thus cover a range from (0,0) to (511,127) as a vector of two independent integers.
However, this interpretation would waste 28 numbers in its decimal representation, as one would likely limit oneself to a maximum of (511,99) in practice.
Instead of a direct interpretation of the fixed-point value as a vector of 2 separate integers - which almost always results in an unused data area when converting to decimal numbers and a manual carry - the fractional part can also be interpreted as a fraction of its entire resolution.
Taking the aforementioned ufixed(9,7)
as an example, this results in a fraction with a denominator of 27 - the number range then goes from 0.00 to 511 + 126⁄127.
To convert to a decimal representation, the fractional part would therefore need to be divided by 128.
This variant allows for simpler arithmetic operations, as the carry occurs automatically, making this variant generally preferred.
However, this variant has the disadvantage that the decimal places in the decimal representation no longer have a guaranteed resolution, meaning a single decimal place no longer represents the value 0.01
, but 0.007874
, which will lead to corresponding rounding errors.
The interpretation used will be documented accordingly at the point of use.
Floating-point values
Floating-point values are mathematically more complex expressions, where an integer with fixed resolution is represented via a mathematical term such that the fractional part is effectively formed by shifting - thereby directly aligning with scientific notation.
The most common way to implement this was standardised with IEEE 754 and has since been internationally recognised.
A floating-point value typically consists of the following components:
Sign (0 or 1 ) |
Exponent | Mantissa |
While the sign can be easily derived as a binary yes/no information, the actual number is formed through the equation
Mantissa * 2Exponent
In addition, there are a number of constant values that cover special cases of rational numbers - including ±∞
and NaN
("not a valid number").
Floating-point values are particularly useful when the precision is not so important, as this type of value inevitably leads to rounding errors and thus loss of accuracy. Typically, floating-point values are used, for example, to define coordinates, such as vertex vectors in 3D models or Bézier/Spline curves for visual representation purposes.
In data formats, floating-point values are specified as float(Mantissa, Exponent)
.
If a format deviating from IEEE 754 is used, this is indicated accordingly.