Want to read just one specific subsection of a man page… you do, right?

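I'd like to save only the parts of a man page I want to remember into a neat .txt file, but the default output is so long that editing it is a pain.
This is on macOS.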

The idea

man xxx | strip backspaces | convert tab->space | match yyy with a regex > xxx_yyy.txt

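I want to use col for stripping backspaces, expand for converting tabs to spaces, and pcre2grep… for matching yyy with a regex.
In particular, this time I want to detect a section and pull out several lines at once, so I'm using pcre2grep rather than plain grep.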
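
Why col -b is in there at all: man traditionally marks up bold text by overstriking, i.e. a character, a backspace, then the same character again (underline is _, backspace, character). The input below is a made-up bold "NAME" built with printf, just to show what those bytes look like and what col -b leaves behind.

```shell
#!/bin/sh
# Made-up input: a bold "NAME" written the overstrike way,
# i.e. each letter, a backspace (\b), then the same letter again.

# cat -v makes the backspaces visible as ^H:  N^HNA^HAM^HME^HE
printf 'N\bNA\bAM\bME\bE\n' | cat -v

# col -b drops the backspaces and keeps only the last character
# written to each column:  NAME
printf 'N\bNA\bAM\bME\bE\n' | col -b
```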

why pcre2grep

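grep can apparently do multi-line matching too, but of course not by default. For that:

  • you need to enable PCRE (Perl Compatible Regular Expressions) with the -P option
    • the matching itself is done by the PCRE library (libpcre, or libpcre2 in newer GNU grep), which GNU grep links against
  • the -P option exists only in GNU grep
  • and macOS's grep is not the GNU one
    • in other words, grep -P doesn't work out of the box on macOS
  • you could install GNU grep from Homebrew, but
    • I don't want to call it by a name like ggrep
    • zip, whose development has ended, would be one thing, but grep is actively maintained on both the macOS and GNU sides, and I'm too nervous to run the two side by side
  • so why not just install pcre2, published by PCRE.org, from Homebrew?
    • it's the official library from the people who define PCRE itself, so it's trustworthy
    • if you installed git via Homebrew, pcre2 is one of its dependencies, so it may already be there
      • pcre is old and deprecated (though things like ffmpeg still depend on it), so if you have neither, install pcre2
    • the package ships the pcre2grep command alongside the library (plus various other tools, it seems)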

…so pcre2grep it is.
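
A minimal check that pcre2grep is available and that -M really lets a pattern run across lines. The sample text and pattern are made up for illustration; brew install pcre2 is only needed if the first command prints nothing.

```shell
#!/bin/sh
# Prints the path if pcre2grep is already installed
# (e.g. pulled in as a dependency of Homebrew's git).
# If it prints nothing:  brew install pcre2
command -v pcre2grep

# A tiny made-up "man page": a heading, two indented lines, then the next heading.
# -M lets the pattern match across newlines; -o prints only the matched text,
# so this should print "Heading" and the two indented lines, but not "Next".
printf 'Heading\n  line one\n  line two\nNext\n' |
    pcre2grep -Mo 'Heading\n( {2}.*\n)*'
```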

why expand

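tr looked a bit awkward for this (it only swaps one character for one character, so a tab becomes a single space no matter where it sits), and since expand exists, it's the easier choice.
I guess it's called that because it expands tabs.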
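
The difference shows up on a made-up line: tr '\t' ' ' swaps each tab for exactly one space, while expand pads to the next tab stop, so columns stay lined up. As far as I can tell, col itself writes runs of spaces back out as tabs on 8-column stops unless given -x, which would explain why a tab-to-space step is needed at all. If so, plain expand (default 8) or col -bx should reproduce man's indentation exactly, while -t 4 squeezes anything indented 8 or more columns, so the space counts in the regex are safest measured on the same pipeline output that feeds pcre2grep.

```shell
#!/bin/sh
# "ab<TAB>c": the tab should push "c" to the next tab stop.
printf 'ab\tc\n' | tr '\t' ' '     # one tab -> one space:   "ab c"
printf 'ab\tc\n' | expand -t 4     # pads to column 4:        "ab  c"

# Without -t, expand uses the traditional 8-column tab stops.
printf '\tx\n' | expand            # 8 spaces, then x
printf '\tx\n' | expand -t 4       # 4 spaces, then x
```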

Template

 man [command name] | col -b | expand -t [spaces per tab] | pcre2grep -Moe ' {[heading indent (number of spaces)]}([heading]\n)( {[minimum body indent],}(\S* *)*\S*\n*)*'

I'll skip explaining the command-line options, but this covers exactly what's needed, nothing more and nothing less.
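
Since the template is a fixed shape with three knobs, it can be wrapped in a shell function. The name extract_section and its argument order are my own invention, not anything standard. It reads man output on stdin, and because the heading text goes into the regex as-is, characters like ( ) . in a heading would need escaping. The regex is the template's, with a comment per part.

```shell
#!/bin/sh
# Hypothetical helper wrapping the template.
#   $1: heading indent (number of spaces before the heading)
#   $2: heading text (used as a regex, escape metacharacters yourself)
#   $3: minimum indent of the body lines
# Regex parts:
#    {$1}            the heading's indent
#   ($2\n)           the heading line itself
#   ( {$3,}...\n*)*  following lines indented at least $3 spaces,
#                    plus any blank lines (\n*) between them
extract_section() {
    col -b |        # strip overstrike backspaces
    expand -t 4 |   # tabs -> spaces, 4 columns per tab as in the template
    pcre2grep -Mo " {$1}($2\n)( {$3,}(\S* *)*\S*\n*)*"
}

# Usage, same as the ls example:
#   man ls | extract_section 3 'The Long Format' 5 > ls_long_format.txt
```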

Output to .txt

 man xxx | col -b | expand -t 4 | pcre2grep -Moe ' {3}(yyy\n)( {5,}(\S* *)*\S*\n*)*' > xxx_yyy.txt

Among the redirection operators, >> appends to the file and > overwrites it.
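
A throwaway demo of the two operators on a temp file. In practice this means the first section can go into a file with > and further sections can be added to the same file with >>.

```shell
#!/bin/sh
# Scratch file for the demo (mktemp picks a unique name).
f="$(mktemp)"

echo 'first'  >  "$f"   # > truncates the file, then writes
echo 'second' >> "$f"   # >> appends to the end
cat "$f"                # prints "first" then "second"

echo 'third'  >  "$f"   # > again: the previous contents are gone
cat "$f"                # prints only "third"

rm -f "$f"
```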

Example

% man ls | col -b | expand -t 4 | pcre2grep -Moe ' {3}(The Long Format\n)( {5,}(\S* *)*\S*\n*)*'
   The Long Format
     If the -l option is given, the following information is displayed for each
     file: file mode, number of links, owner name, group name, number of bytes
     in the file, abbreviated month, day-of-month file was last modified, hour
     file last modified, minute file last modified, and the pathname.  If the
     file or directory has extended attributes, the permissions field printed by
     the -l option is followed by a '@' character.  Otherwise, if the file or
     directory has extended security information (such as an access control
     list), the permissions field printed by the -l option is followed by a '+'
     character.  If the -% option is given, a '%' character follows the
     permissions field for dataless files and directories, possibly replacing
     the '@' or '+' character.

     If the modification time of the file is more than 6 months in the past or
     future, and the -D or -T are not specified, then the year of the last
     modification is displayed in place of the hour and minute fields.

     If the owner or group names are not a known user or group name, or the -n
     option is given, the numeric ID's are displayed.

     If the file is a character special or block special file, the device number
     for the file is displayed in the size field.  If the file is a symbolic
     link the pathname of the linked-to file is preceded by “->”.

     The listing of a directory's contents is preceded by a labeled total number
     of blocks used in the file system by the files which are listed as the
     directory's contents (which may or may not include . and .. and other files
     which start with a dot, depending on other options).

     The default block size is 512 bytes.  The block size may be set with option
     -k or environment variable BLOCKSIZE.  Numbers of blocks in the output will
     have been rounded up so the numbers of bytes is at least as many as used by
     the corresponding file system blocks (which might have a different size).

     The file mode printed under the -l option consists of the entry type and
     the permissions.  The entry type character describes the type of file, as
     follows:

       -     Regular file.
       b     Block special file.
       c     Character special file.
       d     Directory.
       l     Symbolic link.
       p     FIFO.
       s     Socket.
       w     Whiteout.

     The next three fields are three characters each: owner permissions, group
     permissions, and other permissions.  Each field has three character
     positions:

       1.   If r, the file is readable; if -, it is not readable.

       2.   If w, the file is writable; if -, it is not writable.

       3.   The first of the following that applies:

              S     If in the owner permissions, the file is not
                executable and set-user-ID mode is set.  If in the
                group permissions, the file is not executable and
                set-group-ID mode is set.

              s     If in the owner permissions, the file is executable
                and set-user-ID mode is set.  If in the group
                permissions, the file is executable and setgroup-ID
                mode is set.

              x     The file is executable or the directory is
                searchable.

              -     The file is neither readable, writable, executable,
                nor set-user-ID nor set-group-ID mode, nor sticky.
                (See below.)

        These next two apply only to the third character in the last
        group (other permissions).

              T     The sticky bit is set (mode 1000), but not execute
                or search permission.  (See chmod(1) or sticky(7).)

              t     The sticky bit is set (mode 1000), and is searchable
                or executable.  (See chmod(1) or sticky(7).)

     The next field contains a plus (‘+’) character if the file has an ACL, or a
     space (‘ ’) if it does not.  The ls utility does not show the actual ACL;
     use getfacl(1) to do this.


Bonus

For a regex tester that supports PCRE2, this one looks good:

regex101.com