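I want to save just the parts of a man page I actually want to remember into a .txt file, but the default output is so long that trimming it by hand is a pain.
Note: this is on macOS.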
Idea
man xxx | strip backspaces | convert tabs to spaces | match yyy with a regex > xxx_yyy.txt
I want to use col to strip the backspaces, expand to convert tabs to spaces, and pcre2grep to match yyy with a regex.
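To see what the backspace-stripping step is for, here is a small sketch that fakes man-style bold text (each letter printed, backspaced, and printed again) and runs it through col -b. The sample string is made up for illustration, and the check for col is only there because it may not be installed everywhere.

```shell
# Simulated overstruck text: "N", backspace, "N" again, and so on for each letter.
raw=$(printf 'N\bNA\bAM\bME\bE')

if command -v col >/dev/null 2>&1; then
    # col -b keeps only the last character written to each column.
    plain=$(printf '%s\n' "$raw" | col -b)
    printf '%s\n' "$plain"
fi
```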
In particular, this time I want to detect a section and pull out several lines at once, so I use pcre2grep rather than plain grep.
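The reason plain grep falls short can be sketched with a two-line sample (the sample text and the patterns are made up here): grep tests each line on its own, so a pattern that needs both the heading and the body never matches, while pcre2grep with -M lets one match span lines.

```shell
sample=$(printf '   Title\n     body line\n')

# grep looks at one line at a time, so this counts zero matching lines.
single_line_hits=$(printf '%s\n' "$sample" | grep -c 'Title.*body' || true)
echo "$single_line_hits"

# With -M, a single pcre2grep match may cross the newline (skipped if not installed).
if command -v pcre2grep >/dev/null 2>&1; then
    printf '%s\n' "$sample" | pcre2grep -M 'Title\n +body'
fi
```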
why pcre2grep
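- Apparently grep can do multi-line matching too, but not by default: you have to enable PCRE (Perl Compatible Regular Expressions) with the -P option.
    - The matching is then handled by libpcre, the library from the PCRE project (it is not GNU code).
- The -P option is implemented only in GNU grep.
    - And the grep that ships with macOS is not GNU grep.
        - So grep -P is not available on macOS by default.
- I could install GNU grep from homebrew, but
    - why not just install pcre2, published by PCRE.org, with homebrew?
        - It is the official library from the project that defines PCRE, so it should be trustworthy.
        - If you installed git with homebrew, pcre2 is one of its dependencies, so it may already be there.
        - pcre is old and deprecated (though things like ffmpeg still depend on it), so if you have neither, install pcre2.
    - The library comes with the pcre2grep command-line tool (and apparently a few other things).
- ...so pcre2grep it is.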
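The reasoning above can be checked on your own machine with a quick probe. This is only a sketch: the variable names are made up, and the install hint assumes Homebrew is what you use.

```shell
# Probe whether the grep on PATH accepts -P (GNU grep does; the stock macOS grep is expected not to).
if printf 'a\n' | grep -P 'a' >/dev/null 2>&1; then
    grep_has_P=yes
else
    grep_has_P=no
fi
echo "grep -P available: $grep_has_P"

# Report whether pcre2grep is already installed (for example as a dependency of Homebrew's git).
if command -v pcre2grep >/dev/null 2>&1; then
    pcre2grep_status=installed
else
    pcre2grep_status="missing (with Homebrew: brew install pcre2)"
fi
echo "pcre2grep: $pcre2grep_status"
```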
why expand
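tr only maps one character to another, so turning a tab into the right number of spaces with it looks awkward; since expand exists, that is the easier route.
I guess it is called expand because it expands tabs.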
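A tiny comparison shows the difference (the input string is made up): expand pads out to the next tab stop, while tr can only swap the tab for a single character.

```shell
# expand -t 4 pads to the next multiple of 4 columns: "a" plus three spaces, then "b".
with_expand=$(printf 'a\tb\n' | expand -t 4)

# tr replaces the tab with exactly one space, so column alignment is lost.
with_tr=$(printf 'a\tb\n' | tr '\t' ' ')

printf '[%s]\n[%s]\n' "$with_expand" "$with_tr"
```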
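Template
man [command] | col -b | expand -t [spaces per tab] | pcre2grep -Moe ' {[heading indent (number of spaces)]}([heading]\n)( {[body indent],}(\S* *)*\S*\n*)*'
I will skip the explanation of each command-line option, but this set is exactly what is needed, no more and no less.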
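If you use the template often, the pcre2grep part could be wrapped in a small function. This is a rough sketch, not part of any tool: extract_section is a hypothetical name, the defaults of 3 and 5 spaces are taken from the ls example in this note, and wrapping the heading in \Q...\E (so regex metacharacters in it are taken literally) is my addition.

```shell
# extract_section HEADING [HEADING_INDENT] [BODY_INDENT]
# Reads already-cleaned man output on stdin and prints the section under HEADING.
# Hypothetical helper; requires pcre2grep.
extract_section() {
    # -M: let a match span lines, -o: print only the matched text, -e: the pattern follows.
    pcre2grep -Moe " {${2:-3}}(\\Q${1}\\E\\n)( {${3:-5},}(\\S* *)*\\S*\\n*)*"
}
```

It would then be used as: man ls | col -b | expand -t 4 | extract_section 'The Long Format'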
Writing to a .txt file
man xxx | col -b | expand -t 4 | pcre2grep -Moe ' {3}(yyy\n)( {5,}(\S* *)*\S*\n*)*' > xxx_yyy.txt
Of the redirection operators, >> appends to the file while > overwrites it.
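The difference between the two operators is easy to confirm on a throwaway file. A minimal sketch, using a temporary file from mktemp and made-up contents:

```shell
tmp=$(mktemp)

echo first > "$tmp"     # > creates the file, or truncates it if it already exists
echo second >> "$tmp"   # >> appends, so the file now has two lines
lines_after_append=$(wc -l < "$tmp" | tr -d ' ')

echo third > "$tmp"     # > again: the earlier two lines are gone
content_after_overwrite=$(cat "$tmp")

rm -f "$tmp"
echo "$lines_after_append / $content_after_overwrite"
```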
Usage example
% man ls | col -b | expand -t 4 | pcre2grep -Moe ' {3}(The Long Format\n)( {5,}(\S* *)*\S*\n*)*'
   The Long Format
     If the -l option is given, the following information is displayed for each file: file mode, number of links, owner name, group name, number of bytes in the file, abbreviated month, day-of-month file was last modified, hour file last modified, minute file last modified, and the pathname. If the file or directory has extended attributes, the permissions field printed by the -l option is followed by a '@' character. Otherwise, if the file or directory has extended security information (such as an access control list), the permissions field printed by the -l option is followed by a '+' character. If the -% option is given, a '%' character follows the permissions field for dataless files and directories, possibly replacing the '@' or '+' character.

     If the modification time of the file is more than 6 months in the past or future, and the -D or -T are not specified, then the year of the last modification is displayed in place of the hour and minute fields.

     If the owner or group names are not a known user or group name, or the -n option is given, the numeric ID's are displayed.

     If the file is a character special or block special file, the device number for the file is displayed in the size field. If the file is a symbolic link the pathname of the linked-to file is preceded by “->”.

     The listing of a directory's contents is preceded by a labeled total number of blocks used in the file system by the files which are listed as the directory's contents (which may or may not include . and .. and other files which start with a dot, depending on other options).

     The default block size is 512 bytes. The block size may be set with option -k or environment variable BLOCKSIZE. Numbers of blocks in the output will have been rounded up so the numbers of bytes is at least as many as used by the corresponding file system blocks (which might have a different size).

     The file mode printed under the -l option consists of the entry type and the permissions. The entry type character describes the type of file, as follows:

           -     Regular file.
           b     Block special file.
           c     Character special file.
           d     Directory.
           l     Symbolic link.
           p     FIFO.
           s     Socket.
           w     Whiteout.

     The next three fields are three characters each: owner permissions, group permissions, and other permissions. Each field has three character positions:

           1.   If r, the file is readable; if -, it is not readable.
           2.   If w, the file is writable; if -, it is not writable.
           3.   The first of the following that applies:
                      S     If in the owner permissions, the file is not executable and set-user-ID mode is set. If in the group permissions, the file is not executable and set-group-ID mode is set.
                      s     If in the owner permissions, the file is executable and set-user-ID mode is set. If in the group permissions, the file is executable and set-group-ID mode is set.
                      x     The file is executable or the directory is searchable.
                      -     The file is neither readable, writable, executable, nor set-user-ID nor set-group-ID mode, nor sticky. (See below.)
                These next two apply only to the third character in the last group (other permissions).
                      T     The sticky bit is set (mode 1000), but not execute or search permission. (See chmod(1) or sticky(7).)
                      t     The sticky bit is set (mode 1000), and is searchable or executable. (See chmod(1) or sticky(7).)

     The next field contains a plus (‘+’) character if the file has an ACL, or a space (‘ ’) if it does not. The ls utility does not show the actual ACL; use getfacl(1) to do this.
Bonus
If you need a regex checker that supports PCRE2, this one looks like a good choice.