Formatting Gemtext for Gopher ♊


Recently I have been contemplating mirroring my capsule on Gopher. To do that I would need to recreate the landing page as a Gophermap but what about the posts? Since they are plain text I initially thought that I could perhaps just present the .gmi files directly. Sadly not.



In summary, I need to reformat Gemtext before serving the posts on Gopher. On the plus side, since one of the big benefits of Gemtext is its simplicity, I quickly realised I could do that with just a few lines of shell script, using 'sed' and 'fmt'.


TL;DR


Sample Gemtext file

Sample Gemtext file - reformatted

Shell script filter to reformat .gmi files [✍ 2022-01-05 12:27: tweaked]


[✍ 2022-01-08] To use it, make it executable, then pipe or redirect the Gemtext in.


$ ./gmi2txt.sh < yourfile.gmi

Handling non-ASCII


Some Gopher servers and clients can actually handle UTF-8, but support is by no means universal and is likely fairly rare, at least on the (reader's) client side, which I cannot control. I did recently note that if I look at recent posts by Alex Schroeder on his Gopher site in either VF-1 or Lagrange, I sometimes see characters like "€" and even the odd emoji. Interestingly, if I browse the same site using a client like Lynx, the characters get replaced. [✍ 2022-01-05: Lynx can support UTF-8, see § Comments - 2022-01-05 03:32.] No doubt there is some 'magic' going on on the server side, to work out what the client is capable of and then do automatic replacements as needed.


2021-12-25 Donations - Alex Schroeder

2021-12-26 The confusing world of Reddit - Alex Schroeder


So there are two ways I could handle UTF-8 characters.



The latter is perhaps not as daunting as it sounds, since I would be making this for my own personal usage and thus only need to handle the characters that I regularly use. The other nice thing with doing this myself is that I can decide exactly what characters are replaced with and I can create a uniform experience across all Gopher clients.


Simply piping through sed would allow me to convert a bunch of characters, e.g. '-e s/[😀😃😁]/:D/g'. Yes, some of the 'subtlety' of those different emojis is lost but… 'does it matter?'. I could try to think of a clever (ASCII only) emoticon for something like '🤷' or I could just do 's/🤷/[shrug]/g'. Alternatively, I may decide that my usage of emojis is largely for decoration and wipe them out altogether (s/[😀😃😁🤷]//g). If I do it myself, I can also update and tweak these replacements and deletions going forward as my usage and opinions on the matter change.
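
A minimal sketch of such a replacement filter follows. The function name 'ascii_emoji' and the particular replacements are only illustrative. It uses one expression per emoji rather than a bracket expression, since a bracket expression containing multi-byte characters only behaves as intended when sed runs in a UTF-8 locale.

```shell
# Replace a few emoji with ASCII stand-ins (illustrative choices only).
# One expression per character, so the result does not depend on the locale.
ascii_emoji() {
    sed -e 's/😀/:D/g' \
        -e 's/😃/:D/g' \
        -e 's/😁/:D/g' \
        -e 's/🤷/[shrug]/g'
}
```

It is used like any other filter, e.g. 'ascii_emoji < yourfile.gmi'. Changing a replacement to an empty string deletes that emoji instead.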


Handling Gemtext


Gemtext is designed to be parsed at line level. [✍ 14:58: clarified] Seven of the eight line types (roughly equivalent to: <h1>, <h2>, <h3>, [<ul>]<li>, <a>, <blockquote>, <pre>) start with a recognisable pattern, so it is easy to detect and apply different formatting to each of them. The last one (similar to: <p>) can start with any character but this is detected by virtue of not being one of the others.
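
As a rough illustration of this line-level detection (not the filter linked above), a POSIX shell 'case' statement can label each line by its prefix. The function name 'classify_gemtext' and the labels are made up for this sketch. Longer heading prefixes are tested before shorter ones, and lines between the fence markers are labelled as preformatted without further inspection.

```shell
# Print "label<TAB>line" for every Gemtext line read from stdin.
classify_gemtext() {
    fence=$(printf '\140\140\140')   # three grave characters, as octal escapes
    in_pre=0
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            "$fence"*)
                # A fence line toggles preformatted mode on or off.
                in_pre=$((1 - in_pre))
                kind=fence ;;
            *)
                if [ "$in_pre" -eq 1 ]; then
                    kind=pre
                else
                    case $line in
                        '###'*) kind=h3 ;;
                        '##'*)  kind=h2 ;;
                        '#'*)   kind=h1 ;;
                        '* '*)  kind=list ;;
                        '=>'*)  kind=link ;;
                        '>'*)   kind=quote ;;
                        *)      kind=text ;;
                    esac
                fi ;;
        esac
        printf '%s\t%s\n' "$kind" "$line"
    done
}
```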


Heading: Level 1


These lines start with a single '#'.


# My Post

I feel it looks clearer to remove the '#' and underline the heading with '='.


My Post
=======

Heading: Level 2


These lines start with two '#' characters.


## Subsection

Again, a simple underline with '-' looks very clean and is arguably more in keeping with Gopher.


Subsection
----------

Heading: Level 3


These lines start with three '#' characters.


### Sub Subsection

Starting and ending these lines with '-' retains the clean feel of the other two heading types, while keeping them less prominent.


-Sub Subsection-
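
The three heading conversions could be sketched with sed's hold space as below. This is an illustrative filter (the name 'format_headings' is hypothetical), and for brevity it does not skip preformatted blocks, so a line starting with '#' inside one would also be converted.

```shell
# Level 3: wrap the text in '-'. Levels 2 and 1: copy the text to the hold
# space, turn the copy into an underline, then print text and underline.
# The longest prefix is handled first; 'b' skips the remaining rules.
format_headings() {
    sed '
/^###/{
    s/^###[[:space:]]*//
    s/.*/-&-/
    b
}
/^##/{
    s/^##[[:space:]]*//
    h
    s/./-/g
    H
    x
    b
}
/^#/{
    s/^#[[:space:]]*//
    h
    s/./=/g
    H
    x
}
'
}
```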

Lists


Lists begin with '* ' (including the space).


* Item 1: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
* Item 2

I indent any wrapping that takes place. In addition, I add newline spacing between each bullet. This makes multiline, wrapped bullet points more readable (IMHO).


* Item 1: Lorem ipsum dolor sit amet, consectetur adipiscing elit,
  sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

* Item 2
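
One possible way to get this hanging indent is sketched below, with 'fold' standing in for 'fmt' because its break points are easier to predict: wrap to two columns less than the target width, then indent every line after the first. The function name 'wrap_list_item' and the 70 column width are assumptions.

```shell
# Wrap one list line (passed as $1) with a two space hanging indent,
# followed by an empty line to separate it from the next bullet.
wrap_list_item() {
    printf '%s\n' "$1" |
        fold -s -w 68 |
        sed -e 's/[[:space:]]*$//' -e '2,$s/^/  /'
    echo
}
```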


Links


Links start with '=>' followed by the URL and then the title/description. However, this can make them long and can 'bury' the URL slightly.


=> gopher://example.com This is a cool site

I would like the URLs to stand out by being on their own line. This is particularly important on Gopher where most clients do not extract links embedded in pure text or make them directly 'clickable'. By having them on their own line you get the next best thing, as you can quickly select a complete line via a triple click, thus making them far easier to copy and paste [✍ 2022-01-05: clarified the benefit].


~ This is a cool site:
  gopher://example.com
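
A single sed substitution is enough to sketch this swap. The function name 'format_links' is hypothetical, and the handling of links without a title (just the indented URL) is an assumption, since that case is not covered above.

```shell
# "=> URL title" becomes "~ title:" with the URL indented on the next line.
# "=> URL" (no title) becomes just the indented URL (assumed behaviour).
format_links() {
    sed -e 's/^=>[[:space:]]*\([^[:space:]][^[:space:]]*\)[[:space:]][[:space:]]*\(.*\)$/~ \2:\
  \1/' \
        -e 's/^=>[[:space:]]*\([^[:space:]][^[:space:]]*\)[[:space:]]*$/  \1/'
}
```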

Quotes


Quotes start with '>'.


 > Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Strictly speaking it would probably be 'most correct' to keep these largely as they are, since this concept of quoting is widely understood. Then I would only need to handle adding extra '>' characters when wrapping. However, I am rather taken with the way Lagrange (and several websites) display quotes, with a single opening quote character and indented lines. This is actually not too hard to replicate, using two grave characters '``' to simulate an opening 'curly' quote character and an extra newline after the quote to give a bit of space before regular text continues.


 ``
  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
  tempor incididunt ut labore et dolore magna aliqua.
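
A sketch of this quote layout, under the same assumptions as before ('fold' rather than 'fmt', a 70 column target, and a made-up function name 'format_quote'), so the exact break points may differ from the sample above:

```shell
# Print an opening quote marker, then the quote text (passed as $1) with the
# leading '>' removed, wrapped and indented by two spaces, then an empty line.
format_quote() {
    printf ' ``\n'
    printf '%s\n' "$1" |
        sed -e 's/^>[[:space:]]*//' |
        fold -s -w 68 |
        sed -e 's/[[:space:]]*$//' -e 's/^/  /'
    echo
}
```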


Preformatted [✍ 14:52: mistakenly skipped initially]


Preformatted lines are slightly different from the others in that they begin AFTER a line that starts with '```' and end BEFORE the next such line. Here I will remove the three grave characters and just indent the rest of the lines by two, so that they do not align with regular text and thus visually stand out.


They will then align with lists, links and quotes (which makes things look neat and tidy) but can still be differentiated because they have no leading characters ('*' or '~') and no ' ``' on the preceding line (in the case of my proposed quote display).


I will not wrap them or even attempt to filter non-ASCII characters, leaving them mostly pure. The only downside is that the two leading spaces may need to be removed after copying and pasting, but on the flip side this is relatively trivial to deal with in any decent editor. Additionally, for many use cases (e.g. certain code types) that could even be skipped (as the spaces would be treated as non-functional indentation).
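
The preformatted handling can be sketched with a sed address range that runs from one fence line to the next. The function name 'format_pre' is hypothetical. The fence is written as a grave character repeated three times via an interval expression, which matches the same lines as the literal marker.

```shell
# Inside each fenced block: delete the fence lines themselves and indent
# everything else by two spaces. Lines outside a block pass through untouched.
format_pre() {
    sed -e '/^`\{3\}/,/^`\{3\}/{
        /^`\{3\}/d
        s/^/  /
    }'
}
```

Nothing inside the block is wrapped or otherwise altered, so this step would need to run before any wrapping of regular lines, or the wrapping step would need to skip the indented block.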


Regular lines


Anything that does not match one of the above starting sequences is a regular line and needs only simple wrapping.


Other changes


In addition to making the text better suited for display, I also need to rewrite all internal (capsule specific) URLs and remove the navigation links I add to the bottom of my posts. I think it makes more sense to do that in an additional script that I can pipe the results of the first one through. 🤔


Am I missing anything? Thoughts and comments are welcome!


P.S. As a bonus for those that made it this far.


This post converted (how meta is that?)



An extra, even more basic example


[✍ 2022-01-05: I rewrote this whole section again with more clarity and a warning]


Here is a more basic version of this script that does a bit of 'fancy' wrapping to Gemtext (with indentation for links and lists, and extra '>' characters for the additional newlines within quotes). Again the idea would be for potential display on Gopher. It does not impose any other significant formatting changes.


It is worth noting that due to wrapping and indentation, after this conversion you no longer have valid Gemtext. Just lightly formatted plaintext that superficially looks like Gemtext. You cannot permanently alter your files like this with the intention of then serving the exact same source over both Gemini and Gopher. Such files, served over Gemini with a .gmi extension, would likely have issues with unexpected wrapping, and longer link lines and lists would display incorrectly.


Since there are no character replacements, it is assumed that a person who might want to use something like this would avoid using large amounts of non-ASCII, expect their readers to have a UTF-8 capable client, or use some other server side character translation system like Alex's [§ Handling non-ASCII].


Sample Gemtext file - wrapped

Shell script filter to wrap .gmi files [✍ 2022-01-05 13:00: simplified]



Comments


2022-01-04 16:32 (UTC+1)


Omar Polo (yumh):


Convertire text/gemini in testo semplice (Converting text/gemini to plain text) 🇮🇹


ℹ Takes some of the concepts above and creates a new version.


2022-01-05 03:32 (UTC+1)


James Tomasino:


https://github.com/jamestomasino/dotfiles-minimal/blob/master/.profile#L135


You can tell lynx to use [utf-8]. :) It works with gopher sites too.

Nice [conversion] script, though. Good work


2022-01-05 22:21 (UTC+1)


Sandra Snan (Idiomdrottning):


How about turning them into Gopher maps? So hyperlinks could work.


Yes, I had considered this and I know that others, like Tomasino, always do this. I also recall reading Solderpunk's "The true spirit of gopher" post where he talks about how according to RFC1436 this is "semantic abuse of gopher" and yet he also says, "building type 1 only gopher holes pretty much just works, and it *does* offer a nicer user experience […]" (i.e. he is not actually critical of it).


The true spirit of gopher: ¶ 12 → 13 - Solderpunk


In the end it is clear that this is what led him to Gemini and I already have that in my Gemini blog. So for Gopher I think I would like it to be intentionally different, as part of the reason for even having a Gopher version, i.e. the Gemini version is true to Gemini and the Gopher one more "faithful" to Gopher.


That said, whilst reading about these topics I did note that some modern Gopher clients like VF-1 and GemiNaut make links in plain text directly usable by pattern matching for obvious URLs.


Linkification in Gopher clients

Linkification in Gopher clients (Part 2) – OK, I am just stupid


Thus I decided that plain text for the Gopher articles works well enough. Those who are more old school get what they expect and those who are more modern are more likely to run a modern client that will handle links for them anyway.


In addition, I did carefully think about how I would display links in posts, '§ Handling Gemtext - Links' includes, "By having them on their own line you get the next best thing, as you can quickly select a complete line via a triple click, thus making them far easier to copy and paste."


In summary, is this the right way? I don't know but this is how I *currently* think I want to do it. That said I did look at gophermap generation briefly, so perhaps I will change my mind. 😉


2022-01-06 13:57 (UTC+1)


Sandra Snan (Idiomdrottning):


Please note that I did see "By having them on their own line you get the next best thing, as you can quickly select a complete line via a triple click, thus making them far easier to copy and paste." I might be a sloppy reader, but not to that extent! ♥

I just didn't think this was a counterargument to the gophermaps thing (to the extent that the other stuff you bring up is). I don't have a gopherhole of my own so it's not that I have that much of a say about Gopher! ♥


Fair enough and I hope I did not offend you. To be honest I am amazed that anyone got through my boring post! 😉


I will add that I had some thoughts about going the "Gophermaps all the way down" route and I am a fickle creature, so who knows, perhaps I will change my mind and do just that. 😆


2022-01-08 14:46 (UTC+1)


Szczeżuja:


I have one lame question for your script:

[…]

How it should be used? There are no information in the source, I made some tries without success.


Oh sorry, it is a filter […] pipe or redirect the Gemtext in:


$ ./gmi2txt.sh < yourfile.gmi

(would print the converted text to screen)


or to save to a file:


$ ./gmi2txt.sh < yourfile.gmi > yourfile.txt

2022-01-10 11:38 (UTC+1)


Formatting Gemtext for Gopher - further tweaks ♊


