[Libre-soc-dev] validating utf-8 using vector processing

Jacob Lifshay programmerjake at gmail.com
Tue Oct 20 02:37:09 BST 2020


On Mon, Oct 19, 2020 at 4:40 PM Luke Kenneth Casson Leighton
<lkcl at lkcl.net> wrote:
>
> On Tue, Oct 20, 2020 at 12:13 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
> >
> > https://lemire.me/blog/2018/05/16/validating-utf-8-strings-using-as-little-as-0-7-cycles-per-byte/
>
> neat trick.  spot the 8th bit in batches, process it afterwards.

Turns out that is a somewhat different link than the one I thought I
linked. The one I intended to link covers full vectorized validation,
including non-ASCII UTF-8, rather than just vectorizing a
check-for-ASCII.

I think this is it:
https://arxiv.org/pdf/2010.03090.pdf
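
To make the distinction concrete, here is a minimal sketch of the
check-for-ASCII trick (OR the bytes together in batches, test the 8th
bit once at the end). It is not code from either link: the function
name is_ascii_batched, the 8-byte batch size, and the use of plain
Python integers in place of real vector registers are all assumptions
made for illustration.

```python
# Hypothetical illustration: batch size and names are assumptions,
# and Python ints stand in for SIMD/vector registers.
BATCH = 8
HIGH_BITS = 0x8080808080808080  # bit 7 (the "8th bit") of each of 8 bytes


def is_ascii_batched(data: bytes) -> bool:
    """Return True if no byte in data has its 8th bit (0x80) set."""
    acc = 0
    full = len(data) - len(data) % BATCH
    for i in range(0, full, BATCH):
        # OR a whole 8-byte word into the accumulator: no per-byte branch.
        acc |= int.from_bytes(data[i:i + BATCH], "little")
    for b in data[full:]:
        # Leftover tail bytes land in the low byte of the accumulator.
        acc |= b
    # Inspect the high bits only once, after all batches are merged.
    return (acc & HIGH_BITS) == 0
```

Note that this only answers "is it pure ASCII?". Valid non-ASCII
UTF-8 such as "é" makes it return False, which is why full validation
needs more than this check.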

Jacob


