I explain how using Large Language Models (a form of AI) can boost your shell productivity. The basic idea is to have LLMs generate shell commands or scripts, rather than researching them with traditional approaches (such as a search engine). I discuss the advantages and disadvantages of LLMs over classic research approaches. I conclude by discussing CLI tools that integrate LLMs into your shell, avoiding the context switch between shell and web browser.
Introduction
For developers and DevOps engineers, the shell (or on macOS, the Terminal) is a frequently used tool. The biggest problem of a shell is its lack of usability. For instance, the syntax of Bash scripting takes some getting used to. Also, CLI-based tools are hard to use, because their (often long) lists of short arguments (e.g. -s, -t, -p, …) put a high mental load on the user, who is expected to know each argument's existence and meaning by heart.
Just as an example, a CLI tool like ffmpeg (used to re-encode video or audio files) has dozens of options, and constructing the right command is quite difficult. It is impossible to remember all the commands, tools and arguments. Also, the provided manuals typically read like reference documentation, offering little practical guidance such as real-world examples.
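To illustrate the mental load caused by short arguments, compare tar's terse flags with their long-form equivalents (a generic illustration of mine, not one of the examples discussed below):
tar -czf archive.tar.gz mydir/  # short flags: -c (create), -z (gzip), -f (file)
tar --create --gzip --file=archive.tar.gz mydir/  # the same command, spelled out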
Consequently, you need a good ad-hoc method to research shell commands that solve your specific problem at hand.
The classic approach
In the past, I exclusively used the following two approaches:
- Search for the problem using a search engine
- Use a format such as “<name of shell or tool> <short problem description>”, e.g. “bash find string in file”
- Helps discover new CLI tools whose name you did not know before
- Use Bash-specific “cheat sheet” tools like tldr or navi, which are large, community-driven databases of useful Bash commands for various CLI tools, solving typical problems. See the following two screenshots, and the example lookup below.
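As a quick sketch of such a lookup (assuming the Node.js-based tldr client; several other clients exist):
npm install -g tldr  # install the tldr client (assumes Node.js/npm are available)
tldr tar  # prints a handful of practical, example-driven tar invocations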
These techniques are tried and true, but there is a new kid on the block: artificial intelligence (AI). More specifically: Large Language Models (LLMs).
A new approach, using AI
As you certainly already know, LLMs are a variant of AI that became “famous” in November 2022, when OpenAI released ChatGPT to the public. The typical interaction method is to “chat” with the LLM, getting answers in a natural conversation, using a web browser (or mobile app) as the interface.
As LLMs have been trained on a wide variety of topics, you can also research shell productivity tricks with them. This has pros and cons over the classic approaches discussed above:
- Pros
- LLMs provide answers to your specific problem. This is a real time-saver compared to the classic approaches discussed above, where you first need to abstract your concrete problem into a generic problem description, sift through the results, and then adapt them back to your concrete use case.
- With LLMs, you can often forgo learning tricks that you need only rarely, because you can simply ask the LLM for a solution the next time you encounter the problem. Learning the tricks you need very often by heart still makes sense, though, because knowing them by heart is the fastest method of retrieval.
- Cons
- The solutions the LLM gives you are not always correct (“hallucinations”), so you need to fall back to the classic approaches if the LLM’s solution does not work.
- You need to be mindful about the data you send to the LLM. For instance, your company policies may prohibit sending customer data, because the LLM provider may use your data as future training material, from which it could leak to unauthorized third parties.
Examples of AI-based research
Example 1: research productivity tips
Here is a chat record using ChatGPT to research productivity tips for the shell, asking the following two questions:
- Please provide ten productivity tips for editing a command in the Bash shell. An example would be the shortcut “Ctrl+w” used to cut the word that is to the left of the cursor.
- Do you have other tips in that area?
The results are astonishingly good, covering many of the commands I presented in my other article about shell productivity. I was not even aware of some of them at first, such as fc.
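In case you are similarly unfamiliar with fc: it is a Bash built-in for editing and re-running commands from your history. A quick sketch of its common uses (to the best of my knowledge):
fc  # opens the previous command in $EDITOR; saving and quitting re-executes it
fc -l -5  # lists the last five history entries without editing anything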
Example 2: research command with specific goal
Next, let’s look for a shell command that achieves a specific goal. Let’s ask ChatGPT for a video transcoder command with the following prompt: “Using a CLI tool such as ffmpeg in Bash, how can I reencode a video? Assume that I have two input files, a.mov and b.mov, which I want to encode to an output file named c.mp4 (which should use the high quality H265 codec). From a.mov I only want the audio track, from b.mov I only want the video track.”
Result: ffmpeg -i a.mov -i b.mov -c:v copy -map 1:v -c:a copy -map 0:a -c:v libx265 -crf 23 c.mp4
(along with a brief explanation of each parameter)
The command works as intended, although it is not perfect: the -c:v copy argument is not necessary (because it is overridden by the later -c:v libx265 argument), but it does not hurt either. In any case, manually researching that command with the official ffmpeg manual (or using an Internet search) would definitely have taken longer than crafting the prompt.
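For reference, the same command without the redundant argument might look as follows (my own cleanup based on the explanation above, not re-generated by the LLM):
ffmpeg -i a.mov -i b.mov -map 1:v -map 0:a -c:v libx265 -crf 23 -c:a copy c.mp4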
Example 3: build shell scripts
Building shell scripts is a very common task for DevOps engineers. While I prefer easier-to-use and more powerful programming languages for scripting, such as Python, the respective interpreter binary may not be available in the target environment where the script needs to run. A shell, on the other hand, is ubiquitously available.
Fortunately, you can use LLMs to generate shell scripts. As input, you can either use plain English, or you can write code in another language (like Python) and have the LLM translate it to Bash. Here is an example:
Please write a Bash script that parses a plain text file that contains a table whose columns are separated by
semicolons. For each row in the input file, the script should output a line that contains the value of the last
column. Expect that in the input file, the number of columns per row is variable. The input file may also
contain rows that are entirely empty, which should be skipped (that is, do not print an empty line as output).
The input file may also contain lines that start with "#" - these lines should be ignored. In the output, trim
any leading or trailing white spaces.
Example input file:
here;are;values
which
have; differently
many;columns;so;we need;adaptability
Expected output:
values
which
differently
adaptability
The generated code looks as follows:
awk -F';' '!/^#/ && NF{print $NF}' input_file | awk '{$1=$1};1'
This looks like magic. Fortunately, you can just ask the LLM to explain what each awk command does.
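If you prefer to skip that extra round-trip, here is the same pipeline with my own explanatory comments:
# First awk: use ';' as the field separator (-F';'), skip comment lines (!/^#/)
# and empty lines (NF is zero), then print the last field of each row ($NF).
# Second awk: reassigning $1 forces awk to rebuild the record, which trims
# leading/trailing whitespace; the trailing "1" then prints the resulting line.
awk -F';' '!/^#/ && NF{print $NF}' input_file | awk '{$1=$1};1'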
Using LLMs right from the shell
Having to switch back and forth between the shell and the browser (to access the LLM interface) quickly becomes tiring. Fortunately, there are CLI tools that access an LLM right from your shell, eliminating this context switch. One of the most popular ones is ShellGPT, but there are others, like aichat or llm.
The advantages of CLI tools are:
- No need to switch between browser and shell
- Pre-configured “roles” that prime the LLM to provide the answer in a specific form. For instance, ShellGPT’s code role (which is invoked when you run something like sgpt --code "<your prompt>") tells the LLM to only provide code, stripping any Markdown formatting or explanations. A role is essentially just a string that is sent to the LLM first (your actual prompt is sent as the second string). See this video for details.
- Configurable choice of which LLM model to use (in case the provider offers several), e.g. GPT-4 vs. GPT-3.5
- Control of the “temperature” (i.e., the creativity level) of the LLM
- Ability to pipe in documents without having to copy & paste them
- Example from ShellGPT:
docker logs -n 20 container_name | sgpt "check logs, find errors, provide possible solutions"
- This is also great to let an LLM summarize webpages for you. However, you should first strip HTML tags and extract specific sections of the page via CSS selectors, see here for details.
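To give you a feel for the code role, here is a hypothetical session (the prompt is mine, and the shown output is only illustrative; actual output depends on the model):
sgpt --code "find all .log files larger than 100 MB under /var/log"
# might print something like: find /var/log -name "*.log" -size +100M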
The main disadvantage is that CLI tools require API access, which is typically not free (unlike the browser-based products, such as ChatGPT). I set up an OpenAI account and configured a spending limit to avoid being surprised by high charges to my credit card.
Conclusion
With Large Language Models on the rise, I expect real gains in work capacity and throughput. Generating commands or shell scripts already works quite well, saving a lot of time. It will be interesting to see by how much error rates will be reduced over time, so that dealing with an LLM’s output turns from “trust but verify” into plain “trust”.