luke.leighton at gmail.com
Thu Sep 12 04:21:14 BST 2019
On Wednesday, September 11, 2019 at 9:18:29 PM UTC+1, Bruce Hoult wrote:
> With regard to Luke's suggestion, it might be reasonable to add a mode
> saying you're prepared to accept worse than 0.5 ULP accuracy, perhaps
> with a few options: 1, 2, 4, 16 or something like that.
will add that to https://libre-riscv.org/zfpacc_proposal/
> implementation would always be free to calculate to a higher ULP than
> you asked for.
added, with the proviso that the speed/latency benefits are lost by doing so.
> We do intend to have half precision FP in the scalar instruction set,
> as it doesn't make sense to have it in the vector instruction set but
> not scalar. The problem is there are both FP16 (scaled down IEEE with
> 5 bit exponent and 10 bit fraction) and bfloat16 which is FP32 with
> the lower 16 bits truncated (i.e. the same 8 exponent bits as FP32,
> but 23-16 = 7 fraction bits).
well... looking at the IEEE754-2019 standard, bfloat16 isn't listed.
off-topic for this thread, but i'd be inclined to suggest bfloat16 would
best be its own extension. to save on opcode duplication, a separate FCSR
bit could be used (yes, that means setting it, performing the op, then
setting it back, which is a pain).
also, although initially i thought it would be impossible to perform
conversions from FP32 to bfloat16 because there's no opcode for it, it
could be done with an integer LD (or FMV.X.W), followed by masking off
(AND-clearing) the low 16 bits, then FMV.W.X to get it back into an FP
register.
is that a sequence worth adding an entire new opcode for?
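for illustration, that three-step sequence (bit-move to integer, mask,
bit-move back) can be sketched in C; fp32_to_bf16_trunc is a made-up
name, and note the result is truncated rather than rounded-to-nearest:

```c
#include <stdint.h>
#include <string.h>

/* hypothetical helper (not part of any proposal): reduce an FP32 value
 * to bfloat16 precision by clearing the low 16 bits, mirroring the
 * FMV.X.W / mask / FMV.W.X sequence described above.  this is pure
 * truncation of the fraction, not IEEE round-to-nearest-even. */
static inline float fp32_to_bf16_trunc(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* FMV.X.W: FP bits -> integer reg */
    bits &= 0xFFFF0000u;             /* clear the low 16 fraction bits */
    memcpy(&f, &bits, sizeof f);     /* FMV.W.X: integer reg -> FP bits */
    return f;
}
```

values already representable in bfloat16 (such as 1.0) pass through
unchanged; everything else loses its bottom 16 fraction bits.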
(FP64 to bfloat16 and FP16 to bfloat16 would - hypothetically - be covered
by setting the FCSR bit, which would treat FP32 as if it were bfloat16
within the ALU; consequently FCVT.D.S, instead of converting FP64 to FP32,
would convert FP64 to bfloat16, and so on).
all a bit awkward, but there's just not enough bits in the fmt field to
specify bfloat16 as well.
More information about the libre-riscv-dev mailing list