Bug#927362: ITP: blingfire -- lightning fast Finite State machine and REgular expression manipulation library




Package: wnpp
Severity: wishlist
Owner: Mo Zhou <lumin@xxxxxxxxxx>

* Package name    : blingfire
  Version         : git-HEAD
  Upstream Author : Microsoft
* URL             : https://github.com/Microsoft/BlingFire
* License         : MIT
  Programming Lang: C++, Python, Perl, Batch, etc
  Description     : lightning fast Finite State machine and REgular expression manipulation library

Blingfire provides more than a fast natural language tokenizer. Judging
from the benchmark data, its tokenizing speed seems to be much faster
than that of spaCy or NLTK. Unlike NLTK or spaCy, Blingfire seemingly
works without downloading any blobs. This tool might be useful to
Enrico[1] as well, and would possibly make him happy[2].

I'll first give it a try and upload it to DUPR, then decide after code
inspection whether it should really enter the archive.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=925294
[2] If we don't think too much about the upstream name.