https://github.com/torvalds/linux/blob/master/include/math-e...
I wouldn't want to lose the Linux humor tho!
I found this snapshot of it, though it's not on the real Dilbert site: https://www.reddit.com/r/linux/comments/73in9/computer_holy_...
fuckin bravo
And I was tasked with reading a tape with binary data in 8-bit format. Hilarity ensued.
10-bit C, however, ..........
9-bit bytes are pretty common in block RAM though, with the extra bit used either for ECC or for user storage.
1. bytes are 8 bits
2. shorts are 16 bits
3. ints are 32 bits
4. longs are 64 bits
5. arithmetic is 2's complement
6. IEEE floating point
and a big chunk of wasted time trying to abstract these away and getting it wrong anyway was saved. Millions of people cried out in relief!
Oh, and Unicode was the character set. Not EBCDIC, RADIX-50, etc.
1. u8 and i8 are 8 bits.
2. u16 and i16 are 16 bits.
3. u32 and i32 are 32 bits.
4. u64 and i64 are 64 bits.
5. Arithmetic is an explicit choice. '+' overflowing is illegal behavior (it will crash in Debug and ReleaseSafe), '+%' is 2's complement wrapping, and '+|' is saturating arithmetic (a rough C++ analogue is sketched below).
6. f16, f32, f64, f80, and f128 are IEEE floating point types of the respective bit lengths.
The question of the length of a byte doesn't even matter. If someone wants to compile to a machine whose bytes are 12 bits, just use u12 and i12.
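A rough C++ analogue of that wrapping-vs-saturating distinction, as a sketch (the helper names here are made up, not standard library functions):

    #include <cstdint>
    #include <limits>

    // '+%'-style wrapping add: go through the unsigned counterpart, which wraps by
    // definition, then convert back (a well-defined modular conversion since C++20).
    constexpr std::int8_t wrapping_add(std::int8_t a, std::int8_t b) {
        return static_cast<std::int8_t>(
            static_cast<std::uint8_t>(a) + static_cast<std::uint8_t>(b));
    }

    // '+|'-style saturating add: widen so the addition itself cannot overflow,
    // then clamp to the 8-bit range.
    constexpr std::int8_t saturating_add(std::int8_t a, std::int8_t b) {
        const int sum = int{a} + int{b};
        if (sum > std::numeric_limits<std::int8_t>::max()) return std::numeric_limits<std::int8_t>::max();
        if (sum < std::numeric_limits<std::int8_t>::min()) return std::numeric_limits<std::int8_t>::min();
        return static_cast<std::int8_t>(sum);
    }

    static_assert(wrapping_add(127, 1) == -128);   // wraps around
    static_assert(saturating_add(127, 1) == 127);  // sticks at the maximum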
How big is a bit?
The latter value is clearly less common than 0 and 1, but how much less? I don't know, but we have to conclude that the true size of a bit is probably something more like 1.00000000000000001 bits rather than 1 bit.
A quarter nybble.
ARM is similar: ARM processors define a word as 32 bits, even on 64-bit ARM processors, but they are also byte-addressable.
As best I can tell, a word is whatever the size of the arithmetic or general-purpose register was at the time the processor was introduced, and even if a later processor is introduced with larger registers, the size of a word stays the same for backwards compatibility.
Lots of CISC architectures allow memory accesses in various units even if they call general-purpose-register-sized quantities "word".
IIRC the C standard specifies that all memory can be accessed via char*.
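In C++ the blessed types for that kind of aliasing are char, unsigned char, and std::byte; a minimal sketch of inspecting an object's representation that way:

    #include <cstddef>
    #include <cstdio>

    int main() {
        int value = 0x01020304;
        // Any object's bytes can be examined through a pointer to unsigned char.
        const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
        for (std::size_t i = 0; i < sizeof value; ++i)
            std::printf("%02x ", bytes[i]);  // prints "04 03 02 01" on a little-endian machine
        std::printf("\n");
    }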
byte = 8 bits
short = 16
int = 32
long = 64
float = 32 bit IEEE
double = 64 bit IEEE
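If you want to pin those choices down on a given toolchain today, a handful of static_asserts will do it; a sketch (note the long line already fails on 64-bit Windows, where long is 32 bits, which is rather the point):

    #include <climits>
    #include <limits>

    static_assert(CHAR_BIT == 8, "byte is 8 bits");
    static_assert(sizeof(short) * CHAR_BIT == 16, "short is 16 bits");
    static_assert(sizeof(int) * CHAR_BIT == 32, "int is 32 bits");
    static_assert(sizeof(long) * CHAR_BIT == 64, "long is 64 bits");  // fails on LLP64 platforms
    static_assert(sizeof(float) == 4 && std::numeric_limits<float>::is_iec559, "float is 32-bit IEEE");
    static_assert(sizeof(double) == 8 && std::numeric_limits<double>::is_iec559, "double is 64-bit IEEE");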
On the C++ side, I sometimes use an alias that contains the word "short" for 32-bit integers. When I use them, I'm explicitly assuming that the numbers are small enough to fit in a smaller than usual integer type, and that it's critical enough to performance that the assumption is worth making.
It's unbelievably ugly. Every piece of code working with any kind of integer screams "I am hardware dependent in some way".
E.g. in a structure representing an automobile, the number of wheels has to be some i8 or i16, which looks ridiculous.
Why would you take a language in which you can write functional pipelines over collections of objects, and make it look like assembler?
If you do care, then isn't it better to specify it explicitly than trying to guess it and having different compilers disagreeing on the size?
But it’s not alone in that mistake. All the languages invented in that era made the same mistake. (C#, JavaScript, etc).
When D was first implemented, circa 2000, it wasn't clear whether UTF-8, UTF-16, or UTF-32 was going to be the winner. So D supported all three.
https://thephd.dev/conformance-should-mean-something-fputc-a...
Me? I just dabble with documenting an unimplemented "50% more bits per byte than the competition!" 12-bit fantasy console of my own invention - replete with inventions such as "UTF-12" - for shits and giggles.
Writing geophysical | military signal and image processing applications on custom DSP clusters is surprisingly straightforward and doesn't need C++.
It's a RISC architecture optimised for DSP | FFT | Array processing with the basic simplification that char text is for hosts, integers and floats are at least 32 bits, and 32 bits (or 64) is the smallest addressable unit.
Fantastic architecture to work with for numerics and deep computational pipelines: once "primed", you push in raw acquisition samples in chunks every clock cycle and extract processed moving-window data chunks every clock cycle.
A single ASM instruction in a cycle can accumulate totals from vector multiplication and modulo-update indexes on three vectors (two inputs and one out).
Not your mama's brainfuck.
Honest question, haven't followed closely. rand() is broken, I'm told unfixably so, and last I heard it still wasn't deprecated.
Is this proposal a test? "Can we even drop support for a solution to a problem literally nobody has?"
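For reference, the usual modern answer is not to fix rand() but to ignore it in favour of <random>; a minimal sketch:

    #include <iostream>
    #include <random>

    int main() {
        // A seeded engine plus a distribution, instead of the biased rand() % N idiom.
        std::mt19937 gen{std::random_device{}()};
        std::uniform_int_distribution<int> dist(1, 6);  // a fair six-sided die
        for (int i = 0; i < 5; ++i)
            std::cout << dist(gen) << ' ';
        std::cout << '\n';
    }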
* p2809 Trivial infinite loops are not Undefined Behavior
* p1152 Deprecating volatile
* p0907 Signed Integers are Two's Complement
* p2723 Zero-initialize objects of automatic storage duration
* p2186 Removing Garbage Collection Support
So it is possible to change things!
Don’t break perfection!! Just accumulate more perfection.
What we need is a new C++ symbol that reliably references eight bit bytes, without breaking compatibility, or wasting annnnnny opportunity to expand the kitchen sink once again.
I propose “unsigned byte8” and (2’s complement) “signed byte8”. And “byte8” with undefined sign behavior because we can always use some more spice.
“unsigned decimal byte8” and “signed decimal byte8”, would limit legal values to 0 to 10 and -10 to +10.
For the damn accountants.
“unsigned centimal byte8” and “signed centimal byte8”, would limit legal values to 0 to 100 and -100 to +100.
For the damn accountants who care about the cost of bytes.
Also for a statistically almost valid, good enough for your customer’s alpha, data type for “age” fields in databases.
And “float byte8” obviously.
Finally! A language that can calculate my S3 bill
https://lscs-software.com/LsCs-Manifesto.html
https://news.ycombinator.com/item?id=41614949
Edit: Fixed typo pointed out by child.
I do believe you meant to write "cardinal sin," good sir. Unless Qt has not only become sentient but also corporeal when I wasn't looking and gotten close and personal with the C++ standard...
> It's a desktop on a Linux distro meant to create devices to better/save lives.
If you are creating life-critical medical devices, you should not be using Linux.
https://theminimumyouneedtoknow.com/
https://lscs-software.com/LsCs-Roadmap.html
"Many of us got our first exposure to Qt on OS/2 in or around 1987."
Uh huh.
> someone always has a use case;
No he doesn't. He's just unhinged. The machines this dude bitches about don't even have a modern C++ compiler nor do they support any kind of display system relevant to Qt. They're never going to be a target for Qt. Further irony is this dude proudly proclaims this fork will support nothing but Wayland and Vulkan on Linux.
"the smaller processors like those in sensors, are 1's complement for a reason."
The "reason" is never explained.
"Why? Because nothing is faster when it comes to straight addition and subtraction of financial values in scaled integers. (Possibly packed decimal too, but uncertain on that.)"
Is this a justification for using Unisys mainframes, or is the implication that they are fastest because of 1's complement? That isn't even close to being true: as the dinosaurs are decommissioned they're fucking replaced with x86 Xeon hardware running emulation, and I don't think Unisys makes any non-x86 hardware anymore. Anyway, may need to refresh that CS education.
There's some rambling about the justification being data conversion, but what serialization protocols mandate 1's complement anyway? And if those exist, someone has already implemented 2's-complement-aware libraries at some point in the past 50 years, since that has been the overwhelming status quo. We somehow manage to deal with endianness and decimal conversions as well.
"Passing 2's complement data to backend systems or front end sensors expecting 1's complement causes catastrophes."
99.999% of systems (MIPS, ARM, x86, Power, etc.) for the last 40 years use 2's complement, so this has been the normal state of the world since forever.
Also Java, the enterpriseist of languages, has somehow survived mandating 2's complement.
This is all very unhinged.
I'm not holding my breath to see this ancient Qt fork fully converted to "modified" Barr spec but that will be a hoot.
> A byte is 8 bits, which is at least large enough to contain the ordinary literal encoding of any element of the basic character set literal character set and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is bits in a byte.
But instead of the "and is composed" ending, it feels like you'd change the intro to say that "A byte is 8 contiguous bits, which is".
We can also remove the "at least", since that was there to imply a requirement on the number of bits being large enough for UTF-8.
Personally, I'd make "A byte is 8 contiguous bits." a standalone sentence. Then explain as a follow-up that "A byte is large enough to contain...".
Less than a decade ago I worked with something like that: the TeakLite III DSP from CEVA.
At the same time, a `byte` is already an "alias" for `char` since C++17 anyway[1].
std::cout << (int8_t)32 << std::endl; //should print 32 dang it
std::cout << (std::byte)32 << std::endl;
because there is no default operator<< defined.

- CHAR_BIT cannot go away; reams of code references it.
- You still need the constant 8. It's better if it has a name.
- Neither the C nor C++ standard will be simplified if CHAR_BIT is declared to be 8. Only a few passages will change; certain possible implementations will simply be rendered nonconforming.
- There are specialized platforms with C compilers, such as DSP chips, that are not byte addressable machines. They are in current use; they are not museum pieces.
Honestly kind of surprised it was relevant as late as 2004. I thought the era of non-8-bit bytes was the 1970s or earlier.
Yes, indexing strings of 6-bit FIELDATA characters was a huge headache. UNIVAC had the unfortunate problem of having to settle on a character code in the early 1960s, before ASCII was standardized. At the time, a military 6-bit character set looked like the next big thing. It was better than IBM's code, which mapped to punch card holes and the letters weren't all in one block.
[1] https://www.unisys.com/siteassets/collateral/info-sheets/inf...
So delegating such by-now-very-rare edge cases to non-standard C seems fine; i.e., IMHO it wouldn't change much at all in practice.
And C/C++ compilers are full of non-standard extensions anyway, and it's not that CHAR_BIT would go away; a compiler could still, as a non-standard extension, let it be something other than 8.
Which is the real reason why 8 bits should be adopted as the standard byte size.
I didn't even realize that the byte was defined as anything other than 8-bits until recently. I have known, for decades, that there were non-8-bit character encodings (including ASCII) and word sizes were all over the map (including some where word size % 8 != 0). Enough thought about that last point should have helped me realize that there were machines where the byte was not 8-bits, yet the rarity of encountering such systems left me with the incorrect notion that a byte was defined as 8-bits.
Now if someone with enough background to figure it out doesn't figure it out, how can someone without that background figure it out? Someone who has only experienced systems with 8-bit bytes. Someone who has only read books that make the explicit assumption of 8-bit bytes (which virtually every book does). Anything they write has the potential of breaking on systems with a different byte size. The idea of writing portable code because the compiler itself is "standards compliant" breaks down. You probably should modify the standard to ensure the code remains portable, either by forcing compilers for non-8-bit systems to handle the exceptions, or by simply admitting that the compiler does not produce portable code for non-8-bit systems.
Given that Wikipedia says UNIVAC was discontinued in 1986 I’m pretty sure the answer is no and no!
Its OS, OS 2200, does have a C compiler. Not sure if there ever was a C++ compiler; if there once was, it is no longer around. But that C compiler is not being kept up to date with the latest standards, it only officially supports C89/C90 - this is a deeply legacy system, most application software is written in COBOL and the OS itself is mainly written in assembler and a proprietary Pascal-like language called "PLUS". They might add some features from newer standards if particularly valuable, but formal compliance with C99/C11/C17/C23/etc is not a goal.
The OS does contain components written in C++, most notably the HotSpot JVM. However, from what I understand, the JVM actually runs in x86-64 Linux processes on the host system, outside of the emulated mainframe environment, but the mainframe emulator is integrated with those Linux processes so they can access mainframe files/data/apps.
To my naive eye, it seems like moving to 10 bits per byte would both be logical and make learning the trade just a little bit easier?
Another part of it is the fact that it's a lot easier to represent stuff with hex if the bytes line up.
I can represent "255" with "0xFF", which fits nice and neat in 1 byte. However, if a byte is 10 bits, that hex no longer really works. You have 1024 values to represent. The max value would be 0x3FF, which just looks funky.
Coming up with an alphanumeric system to represent 2^10 cleanly just ends up weird and unintuitive.
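A quick worked example of the mismatch, using nothing beyond printf:

    #include <cstdio>

    int main() {
        // One hex digit is exactly 4 bits, so an 8-bit byte is always two full hex digits.
        std::printf("%X\n", 255);   // FF  : 8 bits, two hex digits
        // A 10-bit byte needs 2.5 hex digits; the leading digit never gets past 3.
        std::printf("%X\n", 1023);  // 3FF : 10 bits, the top digit is capped at 3
    }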
On the other hand, if computing settled on a three-valued logic (e.g. 0/1/«something» where «something» has been proposed as -1, «undefined»/«unknown»/«undecided» or a «shade of grey»), we would have had 9 bit bytes (a power of three).
10 was tried numerous times at the dawn of computing and… it was found too unwieldy in the circuit design.
Is this true? 4 ternary bits give you really convenient base 12 which has a lot of desirable properties for things like multiplication and fixed point. Though I have no idea what ternary building blocks would look like so it’s hard to visualize potential hardware.
I have certainly heard an argument that ternary logic would have been a better choice, if it won over, but it is history now, and we are left with the vestiges of the ternary logic in SQL (NULL values which are semantically «no value» / «undefined» values).
Nybble - 4 bits
Byte - 8 bits
Snyack - 16 bits
Lyunch - 32 bits
Dynner - 64 bits
(OK, I guess there's a difference between bits and hob-bits.)
The land of x86 goes to great pains to eliminate the concept of a word at a silicon cost.
Or are you suggesting to increase the size of a byte until it's the same size as a word, and merge both concepts ?
const char word[] = {'w', 'o', 'r', 'd'};
assert(sizeof word == 4);
Specifically, has there even been a C++ compiler on a system where bytes weren't 8 bits? If so, when was it last updated?
#define SCHAR_MIN -127
#define SCHAR_MAX 128
Is this two typos or am I missing the joke?

Jean-Luc Picard
I would be amazed if there's any even remotely relevant code that deals meaningfully with CHAR_BIT != 8 these days.
(... and yes, it's about time.)
This is so old it predates ANSI C; it's in K&R C. I've seen copies of it on various academic sites over the years, but it's now obsolete enough to have finally scrolled off Google.
I think we can dispense with non 8-bit bytes at this point.
Edit: I see TFA mentions them but questions how relevant C++ is in that sort of embedded environment.
For some DSP-ish sort of processors I think it doesn't make sense to have addressability at char level, and the gates to support it would be better spent on better 16 and 32 bit multipliers. ::shrugs::
I feel kind of ambivalent about the standards proposal. We already have fixed size types. If you want/need an exact type, that already exists. The non-fixed size types set minimums and allow platforms to set larger sizes for performance reasons.
Having no fast 8-bit level access is a perfectly reasonable decision for a small DSP.
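That's essentially what the least/fast variants in <cstdint> already express: the exact-width typedefs are optional, the minimum-width ones are mandatory, so a DSP with 16-bit chars can still provide them (typically as 16-bit types). A small sketch:

    #include <climits>
    #include <cstdint>
    #include <cstdio>

    int main() {
        // int8_t must be exactly 8 bits and may be absent; int_least8_t / int_fast8_t
        // only promise "at least 8 bits", so they exist even where int8_t can't.
        std::printf("int_least8_t: %zu bits\n", sizeof(std::int_least8_t) * CHAR_BIT);
        std::printf("int_fast8_t:  %zu bits\n", sizeof(std::int_fast8_t) * CHAR_BIT);
    }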
Might it be better instead to migrate many users of char to (u)int8_t?
The proposed alternative of CHAR_BIT congruent to 0 mod 8 also sounds pretty reasonable, in that it captures the existing non-8-bit char platforms and also the justification for non-8-bit char platforms (that if you're not doing much string processing but instead doing all math processing, the additional hardware for efficient 8 bit access is a total waste).
It’s tricycle and tripod, not ticycle.
One fun fact I found the other day: ASCII is 7 bits, but when it was used with punch cards there was an 8th bit to make sure you didn't punch the wrong number of holes. https://rabbit.eng.miami.edu/info/ascii.html
Parity is for paper tape, not punched cards. Paper tape parity was never standardized. Nor was parity for 8-bit ASCII communications. Which is why there were devices with settings for EVEN, ODD, ZERO, and ONE for the 8th bit.
Punched cards have their very own encodings, only of historical interest.
I've only programmed in high level programming languages in 8-bit-byte machines. I can't understand what you mean by this sentence.
So in a 36-bit CPU a word is 36 bits. And a byte isn't a word. But what is a word and how does it differ from a byte?
If you asked me what 32-bit/64-bit means in a CPU, I'd say it's how large memory addresses can be. Is that true for 36-bit CPUs or does it mean something else? If it's something else, then that means 64-bit isn't the "word" of a 64-bit CPU, so what would the word be?
This is all very confusing.
https://man7.org/linux/man-pages/man3/fgetc.3.html
fgetc(3) and its companions always return character-by-character input as an int, and the reason is that EOF is represented as -1. An unsigned char is unable to represent EOF. If you store the return value in the wrong type, you'll never detect this condition.
However, if you don't receive an EOF, then it should be perfectly fine to cast the value to unsigned char without loss of precision.
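A sketch of the usual pattern, using only what the man page describes:

    #include <cstdio>

    int main() {
        // Keep the result in an int so EOF (-1) stays distinguishable from every valid
        // byte value, which fgetc returns as an unsigned char widened to int.
        int c;
        while ((c = std::fgetc(stdin)) != EOF) {
            const unsigned char byte = static_cast<unsigned char>(c);  // safe once EOF is ruled out
            std::fputc(byte, stdout);
        }
    }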