Bashcpio - pure bash (almost) cpio archive extraction
Sun 05 June 2016 by Fred Clift
Ok - let's get this out of the way. Is this important? No. Is it groundbreaking? No. Can I even explain how cool it is to non-technical friends? No. Could it ever possibly be useful? Maybe! See below.
I just spent a few evenings writing a pure-bash cpio extraction implementation. My target was centos/fedora/redhat rpms, which means gzip/xz compression, written with the so-called 'New-Ascii' header format. The other formats wouldn't be that hard to support but I would guess that 98% of any use of cpio in the year 2016 is related to some kind of rpm package.
And so is born bashcpio - check it out on github:
Also, it's not quite 'pure' bash - but no bash script ever is. I worked hard to ensure that the external dependencies were kept to a bare minimum.
In addition to an actual bash binary, you need a few external dependencies:
dd, mkdir, dirname, chown, chmod, date, touch, ln
There are a couple that might be considered optional. I could probably write some tortured complicated code to replace dirname that has no dependencies on external binaries, though I have not (yet?) done so. If you don't care about time-stamps on files and directories that you extract, date and touch could both go away. The rest though, are pretty difficult to avoid using if you want it to work at all.
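For the common cases, a dirname replacement does not have to be all that tortured. The following is only a sketch built on parameter expansion; the function name `dirname_pure` is made up for illustration and is not part of bashcpio, and it has not been checked against every corner case that the real dirname handles.

```shell
#!/usr/bin/env bash
# dirname_pure PATH: print the directory part of PATH without any external binary.
# (Hypothetical helper, not taken from bashcpio.)
dirname_pure() {
    local p=$1
    # Drop trailing slashes, but leave a lone "/" alone.
    while [ "$p" != "/" ] && [ "${p%/}" != "$p" ]; do p=${p%/}; done
    case $p in
        /)   printf '/\n' ;;
        */*) p=${p%/*}                      # cut off the last component
             # Clean up slashes left behind by inputs like "a//b".
             while [ "$p" != "/" ] && [ "${p%/}" != "$p" ]; do p=${p%/}; done
             printf '%s\n' "${p:-/}" ;;     # empty means the parent was "/"
        *)   printf '.\n' ;;                # no slash at all
    esac
}

dirname_pure usr/lib/foo.so    # usr/lib
dirname_pure /etc              # /
dirname_pure README            # .
```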
Bash (and some other shell utilities that you might consider, like awk and sed) all have this nasty habit of "eating" null bytes in anything they read. So, for instance, you can't place the content of a binary file in a bash variable. Depending on the bash version and how you read the data, you'll get either everything up to the first null value or the data with the nulls silently dropped. This makes dealing with arbitrary binary data with bash problematic at best. If, however, you can be reasonably sure that some portions of your file will be ascii, then you can carefully operate on those parts with bash, using the assistance of dd.
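A quick way to see the problem for yourself is to write a few bytes with a null in the middle and compare what is on disk with what lands in a variable. This is just a demonstration sketch; the variable names are arbitrary, and the exact result (truncated or nulls dropped) can differ between bash versions, so the only thing it relies on is that the variable ends up shorter than the file.

```shell
#!/usr/bin/env bash
tmp=$(mktemp)
printf 'abc\0def' > "$tmp"            # 7 bytes on disk, one of them a null

on_disk=$(( $(wc -c < "$tmp") ))      # arithmetic strips any padding from wc

# Newer bash versions warn about the ignored null byte on stderr; silence that.
{ in_var=$(cat "$tmp"); } 2>/dev/null

echo "on disk: $on_disk bytes, in the variable: ${#in_var} characters"
rm -f "$tmp"
```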
In this case, the header I'm trying to read is guaranteed to be 110 bytes of ascii (a 6-byte magic number followed by thirteen 8-digit hex fields), followed by the file name as a C-style null-terminated string, some padding to a 4-byte boundary, a bunch of arbitrary data, and some more padding to a 4-byte boundary. The saving grace is that you know where the ascii header starts and how long it is, and inside it there is data about how long the file name is and how long the file is. With that, you can read a header and then safely read the filename, the sizes, and all the other header info.
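The padding is where the arithmetic gets fiddly, so here is a small sketch of it. It assumes the 'New-Ascii' (newc) layout: a 110-byte header (6-byte magic plus thirteen 8-digit hex fields), a name whose recorded size includes the null terminator, and both the name and the file data padded out to a multiple of 4 bytes. The helper names `pad4` and `entry_layout` are invented for this example.

```shell
#!/usr/bin/env bash
HEADER_LEN=110   # 6-byte magic + 13 fields * 8 hex digits (newc format)

# pad4 N: round N up to the next multiple of 4.
pad4() { echo $(( ($1 + 3) & ~3 )); }

# entry_layout OFFSET NAMESIZE FILESIZE
# Prints three offsets: where the name starts, where the data starts,
# and where the next header starts. Assumes OFFSET is itself 4-byte aligned.
entry_layout() {
    local off=$1 namesize=$2 filesize=$3
    local name_off data_off next_off
    name_off=$(( off + HEADER_LEN ))
    data_off=$(pad4 $(( name_off + namesize )))
    next_off=$(pad4 $(( data_off + filesize )))
    echo "$name_off $data_off $next_off"
}

# A 6-character name (7 bytes with its null) and 6 bytes of data:
entry_layout 0 7 6    # prints "110 120 128"
```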
How? I can pick apart the file with dd, carefully using the count= and skip= flags.
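To make the dd trick concrete, here is a self-contained sketch that builds a tiny one-entry archive by hand and then picks it apart again. It is not the bashcpio code; the helper `read_bytes`, the sample file name, and the field values are all made up for illustration, and the field offsets assume the newc header layout (110 bytes, fields of 8 hex digits).

```shell
#!/usr/bin/env bash
# Build a one-entry newc archive so the example needs no outside files.
archive=$(mktemp)
{
    printf '070701'                      # magic
    # ino mode uid gid nlink mtime filesize devmaj devmin rdevmaj rdevmin namesize check
    printf '%08X' 1 33188 0 0 1 1465084800 6 0 0 0 0 7 0
    printf 'hi.txt\0\0\0\0'              # 7-byte name (with null) + 3 pad bytes
    printf 'hello\n\0\0'                 # 6 data bytes + 2 pad bytes
} > "$archive"

# read_bytes FILE SKIP COUNT: emit COUNT bytes starting SKIP bytes into FILE.
# bs=1 makes skip= and count= count single bytes.
read_bytes() { dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null; }

off=0
hdr=$(read_bytes "$archive" "$off" 110)  # pure ascii, so safe in a variable
[[ ${hdr:0:6} == 070701 ]] || { echo "not a newc header" >&2; exit 1; }

mode=$((     16#${hdr:14:8} ))
filesize=$(( 16#${hdr:54:8} ))
namesize=$(( 16#${hdr:94:8} ))

# Read namesize-1 bytes so the trailing null never reaches bash.
name=$(read_bytes "$archive" $(( off + 110 )) $(( namesize - 1 )))

# The data starts at the next 4-byte boundary; send it straight to a file
# so that its bytes never pass through a bash variable.
data_off=$(( (off + 110 + namesize + 3) & ~3 ))
out=$(mktemp -d)
read_bytes "$archive" "$data_off" "$filesize" > "$out/$name"

printf 'name=%s mode=%o size=%d\n' "$name" "$mode" "$filesize"
```

Reading one byte at a time with bs=1 is slow on large files, so a real extractor would likely want bigger blocks wherever the offsets allow it.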
All the rest of the external dependencies are for use in creating and modifying the extracted files, setting ownership, group, permissions, mtime, etc.
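As a rough sketch of that last step, here is how permissions and mtime might be applied once the header fields have been parsed. The values are hypothetical stand-ins for parsed header fields, and the `date -d @EPOCH` form is a GNU date feature, so this assumes GNU coreutils. chown is left out because it generally needs root.

```shell
#!/usr/bin/env bash
# Hypothetical values, standing in for fields parsed out of a header.
mode=$(( 16#000081A4 ))      # regular file with permissions 0644
mtime=1465084800             # seconds since the epoch

target=$(mktemp)

# The permission bits are the low 12 bits of the mode field; chmod wants octal.
chmod "$(printf '%o' $(( mode & 07777 )))" "$target"

# touch -t expects [[CC]YY]MMDDhhmm[.SS]; GNU date converts the epoch value.
# Both commands use the local timezone, so the round trip is consistent.
touch -t "$(date -d "@$mtime" +%Y%m%d%H%M.%S)" "$target"

ls -l "$target"
```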
I'm toying with the idea of adding a centos target to crouton, created/maintained by David Schneider, with many many other contributors.
Crouton is a tool for making (debian-derived) linux chroots that will run under chromeos in dev mode. (I'll talk more about this in some later blog post). Crouton is an awesome tool and it's quite useful, but I use a lot of yum rpm and not a lot of apt and deb.
On the chromeos devices there is a very minimal linux install that does little besides running chrome browser full time. Many common things you'd expect to be at your fingertips at a bash prompt are missing. Crouton fixes this. And it bootstraps itself using pretty much only bash and an 'ar' extractor written in bash, since that is close to all that is available to start with.
Seeing the clever ar.sh implementation, I got thinking about how one would do the same with rpm based distros... and bashcpio was born. I started with a careful inventory of what shell binaries I had access to, and bash + dependencies does the trick. This of course does not give you a centos chroot - it is only one of several necessary tools. But I had fun making it work, except when I was trying to get the padding right. Ugh, how irritating. I'm no expert bash programmer though I know a lot more now than I used to.
My initial implementation only took about 80x as long as the x86_64 debian trusty binary that I got out of a crouton chroot. The current implementation is much faster and now takes only approximately 19 times as long as the C binary. To get a feel for its performance, I grabbed the 350ish packages that make up a current centos 7 minimal install. It takes less than 4 minutes to unpack them.
Please feel free to send me feedback on how crazy you think I am, or on what glaring deficiency my code has. I'm eager to talk about the ship I built in a bottle.
Apple mail app TLS deficiencies revisited
A quick update:
Mac OS X El Capitan has hit the streets so I retested. The mail app still can not make any TLS connections using TLSv1.1 or 1.2.
Also, /u/Tulsagrammer on Reddit pointed out to me that the mystery of why this is so ...read more
Frustration with Apple mail app on IOS and Yosemite
At work I recently have been irritated by a problem that was exposed with Apple's mail app, both on IOS and Yosemite. Among other things, I maintain an imap server (using dovecot) for our office email.
First some background
Dovecot makes it easy to enable TLS, and disallow unencrypted ...read more
What do you get from a $200 multimeter?
I was recently discussing with friends how I like Apple computers (e.g. the MacBook Pro I use for work) but have a hard time justifying the extra cost when in many cases I can get something 'good enough' for half the price.
And sometime around that conversation, the fact ...read more
Not Sharing python command history between python2 and python3
I'm finally getting around to switching most-of-the-time to python3.
One minor annoyance was that I had to fix my simple .pythonrc.py file that I use to turn on command history for python2.
In python 3, it appears that readline, tab-completion, and command history between sessions are automatically enabled ...read more
Chromecast - This is why we can't have nice things
There was a recent chromecast firmware update from google. This wasn't the long awaited new functionality promised at IO. It ostensibly has better tab-casting performance and some bug fixes (e.g. change in some apis related to subtitles, among others).
There has been some speculation that the reason for ...read more
Chromecast - steps closer to a python native api
So, after seeing this: https://gist.github.com/TheCrazyT/11263599 I got more interested in being able to speak the native chromecast api from python.
That led me to this presentation by Sebastian Mauer: http://www.slideshare.net/mauimauer/chrome-cast-and-android-tv-add14 especially slide 20, which led to a bit more google ...read more
Chromecast - Displaying arbitrary URLs using pychromecast
Continuing on with my Chromecast experiments... I have been playing with a python library on github by Paulus Schoutsen called pychromecast. At various points in its life it has been able to interact with Chromecast devices to do a variety of things. With the official SDK release and firmware ...read more
Chromecast - both cool and frustrating
I recently purchased a Chromecast device. For the price it's a great media streamer. You can control it from your Android or IOS smartphone from many different apps. For chrome browser, you need to install a browser plugin. There are many websites that, when viewed in Chrome browser give ...read more
Notes on chrome remote debugging
These are mostly notes for me, but you might find them useful also.
On my laptop, I run chrome and usually have many, many tabs open across a few windows. Google searching for me usually ends up with me open-in-new-tabbing the first 5 or 10 links concurrently... It bugs me ...read more
Why the Nook won my dollars over Kindle
Summer Vacation Time.
So I vacation on the beach every summer. I read a lot. This year I decided I would try out an ebook reader. Because the plan is to be outdoors on the beach, E-Ink readers seem to be the way to go. There are two big names ...read more
Managing Lots Of Pregenerated HTML And Other Files With Pelican
For one of the Pelican-managed websites I maintain, I have a lot of files that I don't really want to manage with Pelican, and in some cases, I can't easily, without lots of gymnastics.
In this case, I have about 30k html files that are a dump of ...read more
Why I wont fix your computer, part 3
Why I wont fix your computer, part 2
Embedding PHP in Pelican-generated Static pages
So, I wanted to make a simple website with pelican. But I had a little legacy php code that I still wanted to function.
It sure would be nice, I thought, if I could take my php app and with only small tweaks make it work as PART of pelican ...read more
Trying out Pelican static site generator.
I'm running a couple of websites with pelican static website generator because it's easy to maintain, and lightweight, and kind of future-proof.
So I've tried a bunch of tools over the years to make personal web pages. I hand-rolled html (for a class) in the ...read more
Why I wont fix your computer, part 1
How not to upgrade a server
I was working as a unix admin at a private university. A research lab wanted an OS upgrade on their lab NFS and web server, which was indirectly my responsibility. User data, webserver, web content, all on an external (scsi) drive. I showed up ...read more