Formatting Gemtext for Gopher [II]
==================================

Recently I have been contemplating mirroring my capsule on Gopher. To do
that I would need to recreate the landing page as a Gophermap, but what
about the posts? Since they are plain text I initially thought that I
could perhaps just present the .gmi files directly. Sadly not.

* Gopher posts should be hard line wrapped (ideally less than 80
  characters), while Gemtext is not wrapped.
* I have included a bunch of UTF-8 characters in my posts. For
  widespread compatibility in Gopher, 'pure' ASCII would be better.
* While the raw formatting of Gemtext is quite neat, I feel it could
  still be improved when read directly.

In summary, I need to reformat Gemtext before serving the posts on
Gopher. On the plus side, since one of the big benefits of Gemtext is
its simplicity, I quickly realised I could do that with just a few lines
of shell script, using 'sed' and 'fmt'.

TL;DR

~ Sample Gemtext file: /files/gmi2txt-sample.gmi
~ Sample Gemtext file - reformatted: /files/gmi2txt-sample.txt
~ Shell script filter to reformat .gmi files [EDIT 2022-01-05 12:27: tweaked]: https://gist.githubusercontent.com/ruario/3bd570d265ca5a42cb039092ed4f1299/raw/5323b3880c55d0f679eb24400b053865dfbb413c/gmi2txt.sh

[EDIT 2022-01-08] To use, make it executable, then pipe or redirect the
Gemtext in.

$ ./gmi2txt.sh < yourfile.gmi

Handling non-ASCII
------------------

Some Gopher servers and clients can actually handle UTF-8, but support
is by no means universal and is likely fairly rare, at least on the
(reader's) client side, which I cannot control. I did recently notice
that if I look at recent posts by Alex Schroeder on his Gopher site in
either VF-1 or Lagrange, I sometimes see characters like "EUR" and even
the odd emoji. Interestingly, if I browse the same site using a client
like Lynx, the characters get replaced.

[EDIT 2022-01-05] Lynx can support UTF-8: S. Comments - 2022-01-05 03:32.
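
Since client support varies, it can help to know whether a given post
contains any non-ASCII bytes at all. A minimal sketch, assuming a POSIX
shell with 'tr' and 'wc'; the function name count_non_ascii is made up
for illustration and is not part of gmi2txt.sh:

```shell
#!/bin/sh
# Hypothetical helper (not part of gmi2txt.sh): count the bytes on
# standard input that fall outside the 7-bit ASCII range.
# A result of 0 means the text is already 'pure' ASCII.
count_non_ascii() {
    # Work on raw bytes, delete every ASCII byte (octal 000-177),
    # count what is left, and strip the padding some 'wc' versions add.
    LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}
```

It would be used the same way as the filter, for example
'count_non_ascii < yourfile.gmi'. Note that it counts bytes rather than
characters, so a single accented letter or emoji adds more than one to
the total.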
No doubt there is some 'magic' going on on the server side, working out
what the client is capable of and then doing automatic replacements as
needed.

~ 2021-12-25 Donations - Alex Schroeder: gopher://alexschroeder.ch:70/0page/2021-12-25%20Donations
~ 2021-12-26 The confusing world of Reddit - Alex Schroeder: gopher://alexschroeder.ch:70/0page/2021-12-26%20The%20confusing%20world%20of%20Reddit

So there are two ways I could handle UTF-8 characters.

* Find out more about Alex's setup
* Roll my own conversion

The latter is perhaps not as daunting as it sounds, since I would be
making this for my own personal usage and thus only need to handle the
characters that I regularly use. The other nice thing with doing this
myself is that I can decide exactly what characters are replaced with
and I can create a uniform experience across all Gopher clients.

Simply piping through sed would allow me to convert a bunch of
characters, e.g. '-e s/[:D:D:D]/:D/g'. Yes, some of the 'subtlety' of
those different emojis is lost but... 'does it matter?'. I could try to
think of a clever (ASCII only) emoticon for something like '[shrugs]' or
I could just do 's/[shrugs]/[shrug]/g'. Alternatively, I may decide that
my usage of emojis is largely for decoration and wipe them out
altogether (s/[:D:D:D[shrugs]]//g). If I do it myself, I can also update
and tweak these replacements and deletions going forward as my usage and
opinions on the matter change.

Handling Gemtext
----------------

Gemtext is designed to be parsed at line level. [EDIT 14:58: clarified]
Seven of the eight line types (roughly equivalent to:

,

,

, [