Re: Microsoft Does It Again
- Date: Tue, 21 Aug 2018 21:16:44 +0300
- From: Reco <recoverym4n@xxxxxxxxx>
- Subject: Re: Microsoft Does It Again
On Tue, Aug 21, 2018 at 06:28:57PM +0200, tomas@xxxxxxxxxx wrote:
> On Tue, Aug 21, 2018 at 07:02:32PM +0300, Reco wrote:
> > On Tue, Aug 21, 2018 at 05:48:31PM +0200, tomas@xxxxxxxxxx wrote:
> > > tomas@trotzki:~$ apt search ooxml
> > > Sorting... Done
> > > Full Text Search... Done
> > > docx2txt/stable,stable,stable 1.4-0.1 all
> > > Convert Microsoft OOXML files to plain text
> > Not relevant. Input is xlsx.
> Well, xlsx *is* OOXML (I like to call it "MOOXML" as in
> "Microsoft's..." -- you get the idea :)
That's like saying that apples and oranges are both fruits.
I.e. that's true, but one does not usually compare apples to oranges.
Both docx and xlsx are zip archives with xml inside. Their parsing is
different, and applying parsing rules from one to another yields no
useful result.
Parsing docx is easy, even I can do it (and did it, actually).
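
To illustrate the "zip archive with xml inside" point, here is a minimal
Python sketch of docx text extraction. It is not the code referred to
above, only an illustration under stated assumptions: the file uses the
transitional WordprocessingML namespace, the body sits in
word/document.xml, and tables, footnotes and nested paragraphs can be
ignored. The names docx_to_text and make_sample_docx are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace (assumption: transitional OOXML, not "strict").
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the plain text of a .docx, one line per paragraph.

    `source` is a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Text lives in <w:t> elements; join the runs of each paragraph.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def make_sample_docx():
    """Build a tiny in-memory archive containing only word/document.xml.

    Hypothetical fixture for the demo; it is not a complete document
    that a word processor would open.
    """
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main">'
        "<w:body>"
        "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(docx_to_text(make_sample_docx()))
```

The same function is of no use on an xlsx: there the archive holds
xl/workbook.xml, xl/worksheets/sheetN.xml and xl/sharedStrings.xml
instead, and cell values are references that have to be resolved across
those parts.
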
Parsing xlsx with all its gross formulas (sp?), macros and arcane date
formats is the definition of pain. I gave it up and became a happy