repo.mind.cth451 – On the boundary between byte streams and atoms. Language: zh_CN, en

On Telegram crashing under webview

Problem Statement

The Telegram desktop client crashes with SIGABRT whenever it tries to render a webpage with the Qt webview. This seems to be unique to Wayland-based systems running the proprietary NVIDIA userspace GL/EGL drivers. Telegram upstream is aware of the issue but isn’t going to spend their time finding the root cause.

At the time of writing, this is reproducible with the following versions:

Telegram desktop 6.8.2
NVIDIA egl-wayland 1.1.21
NVIDIA drivers 610.43.02
QT6 6.11.1

OriginCode opened a ticket with egl-wayland over a month ago, but so far it doesn’t seem to have received any replies. No choice but to take a stab myself.

Debug working notes

First look at a previous core dump

The systemd core dumper managed to capture Telegram’s dead memory space. Loading the core in GDB revealed an assertion failure underneath wlEglAcquireDisplay.

... [telegram aborting itself after an assertion failure]
#5 0x00007ffff066e4e5 in __assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, function=0x55555d94a1a0 "`\263w]UU") at assert.c:127
#6  0x00007fffcd421e1d in wlExternalApiLock () at ../src/wayland-thread.c:87
#7  0x00007fffcd4226c1 in wlEglAcquireDisplay (dpy=dpy@entry=0x55555d94a1a0) at ../src/wayland-egldisplay.c:1469
#8  0x00007fffcd4234dc in wlEglGetInternalHandleExport (dpy=<optimized out>, type=<optimized out>, handle=0x55555d94a1a0) at ../src/wayland-eglhandle.c:186
#9  0x00007fffc8331081 in ??? () at /usr/lib/libEGL_nvidia.so.0
#10 0x00007fffc82d62ce in ??? () at /usr/lib/libEGL_nvidia.so.0
#11 0x00007fffcd4295e3 in wlEglCreateStreamAttribHook (dpy=0x55555d94a1a0, attribs=0x7fffffffd150) at ../src/wayland-eglstream.c:200
#12 0x00007fffc83359e3 in ??? () at /usr/lib/libEGL_nvidia.so.0
#13 0x00007fffc82d6321 in ??? () at /usr/lib/libEGL_nvidia.so.0
#14 0x00007fffcd45d1ab in WaylandEglClientBuffer::setCommitted(QRegion&) () at /usr/lib/libQt6WaylandEglCompositorHwIntegration.so.6
#15 0x00007ffff30dc178 in QWaylandSurfacePrivate::surface_commit(QtWaylandServer::wl_surface::Resource*) () at /usr/lib/libQt6WaylandCompositor.so.6
... [A whole bunch of normal looking QT stuff. Presumed unrelated.]

Looking at the offending code in wlExternalApiLock, which wlEglAcquireDisplay calls:

int wlExternalApiLock(void)
{
    if (pthread_once(&wlMutexOnceControl, wlExternalApiInitializeLock)) {
        assert(!"pthread once failed");
        return -1;
    }

    if (!wlMutexInitialized || pthread_mutex_lock(&wlMutex)) {
        assert(!"failed to lock pthread mutex");
        return -1;
    }

    return 0;
}

The whether-mutex-is-initialized flag shows no weird signs, but the very mutex it’s trying to acquire is already locked.

(gdb) print wlMutex
$1 = {__data = {__lock = 1, __count = 0, __owner = 530834, __nusers = 1, __kind = 2, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, 
  __size = "\001\000\000\000\000\000\000\000\222\031\b\000\001\000\000\000\002", '\000' <repeats 22 times>, __align = 1}
(gdb) pipe info threads | grep 530834
* 1    Thread 0x7fffe86d3ac0 (LWP 530834) __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44

The current thread already holds this mutex, but is somehow attempting to lock it again. NVIDIA egl-wayland initializes its mutex with PTHREAD_MUTEX_ERRORCHECK, so a relock from the owning thread fails with EDEADLK instead of hanging, which triggers the assert().
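To make the failure mode concrete, here is a minimal Python sketch of error-checking mutex semantics. Python's standard library has no direct equivalent of PTHREAD_MUTEX_ERRORCHECK, so the ErrorCheckMutex class below is a hypothetical stand-in written for illustration, not code from egl-wayland or glibc. It records the owning thread and rejects a second acquire from that thread with EDEADLK rather than blocking.

```python
import errno
import threading


class ErrorCheckMutex:
    """Hypothetical stand-in for a PTHREAD_MUTEX_ERRORCHECK mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None  # ident of the thread currently holding the lock

    def lock(self) -> int:
        # A relock by the owner is reported as an error instead of hanging.
        if self._owner == threading.get_ident():
            return errno.EDEADLK
        self._lock.acquire()
        self._owner = threading.get_ident()
        return 0

    def unlock(self) -> int:
        # Only the owner may unlock; anything else is reported as EPERM.
        if self._owner != threading.get_ident():
            return errno.EPERM
        self._owner = None
        self._lock.release()
        return 0


mutex = ErrorCheckMutex()
first = mutex.lock()    # succeeds
second = mutex.lock()   # same thread again: rejected
print(first, second == errno.EDEADLK)
mutex.unlock()
```

A nonzero return from pthread_mutex_lock() is what wlExternalApiLock() turns into the assert() seen in the backtrace.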

Now the remaining question becomes where and why the fuck egl-wayland attempts to lock the same lock twice.

Tracking lock usage in GDB with overengineered python

In hindsight it should have been rather obvious that the earlier lock must have been taken somewhere in the upper stack frames, but my monkey brain is too comfortable writing Python plugins for GDB, both at my day job and when I struggle with Hollow Knight.

Loaded the thing into GDB, initialized the breakpoints tracking where locks are taken and released, and launched a new Telegram process.

INFO:__main__:Mutex 00007fffcd432380 unlocked by thread 530834
INFO:__main__:Mutex 00007fffcd432380 locked by thread 530834
INFO:__main__:Mutex 00007fffcd432380 unlocked by thread 530834
INFO:__main__:Mutex 00007fffcd432380 locked by thread 530834
INFO:__main__:Mutex 00007fffcd432380 locked by thread 530834
Telegram: ../src/wayland-thread.c:87: wlExternalApiLock: Assertion !"failed to lock pthread mutex" failed.

Thread 1 "Telegram" received signal SIGABRT, Aborted.

Two successive lock acquisitions on the same thread. My script saved a backtrace for each of them, which should reveal more details.

[BEGIN lock attempt -2]
#0  wlExternalApiLock () at ../src/wayland-thread.c:79
#1  0x00007fffcd4292b9 in wlEglCreateStreamAttribHook (dpy=0x55555d94a1a0, attribs=0x7fffffffd140) at ../src/wayland-eglstream.c:82
#2  0x00007fffc83359e3 in ??? () at /usr/lib/libEGL_nvidia.so.0
#3  0x00007fffc82d6321 in ??? () at /usr/lib/libEGL_nvidia.so.0
#4  0x00007fffcd45d1ab in WaylandEglClientBuffer::setCommitted(QRegion&) () at /usr/lib/libQt6WaylandEglCompositorHwIntegration.so.
...

[BEGIN lock attempt -1]
#0  wlExternalApiLock () at ../src/wayland-thread.c:79
#1  0x00007fffcd4226c1 in wlEglAcquireDisplay (dpy=dpy@entry=0x55555d94a1a0) at ../src/wayland-egldisplay.c:1469
#2  0x00007fffcd4234dc in wlEglGetInternalHandleExport (dpy=<optimized out>, type=<optimized out>, handle=0x55555d94a1a0) at ../src/wayland-eglhandle.c:186
#3  0x00007fffc8331081 in ??? () at /usr/lib/libEGL_nvidia.so.0
#4  0x00007fffc82d62ce in ??? () at /usr/lib/libEGL_nvidia.so.0
#5  0x00007fffcd4295e3 in wlEglCreateStreamAttribHook (dpy=0x55555d94a1a0, attribs=0x7fffffffd150) at ../src/wayland-eglstream.c:200
#6  0x00007fffc83359e3 in ??? () at /usr/lib/libEGL_nvidia.so.0
#7  0x00007fffc82d6321 in ??? () at /usr/lib/libEGL_nvidia.so.0
#8  0x00007fffcd45d1ab in WaylandEglClientBuffer::setCommitted(QRegion&) () at /usr/lib/libQt6WaylandEglCompositorHwIntegration.so.6
...
[Lock-acquire attempt fails here and causes SIGABRT]

An unknown piece of code in the proprietary NVIDIA libEGL_nvidia.so issued a call to wlEglCreateStreamAttribHook, which somehow deadlocked against itself. I guess it’s time to read a bit more about egl-wayland and its purpose. A random Google search landed me on an NVIDIA presentation about libEGL from XDC2016. Looks like we have some wild pointer chasing and circular calling ahead of us.

`egl-wayland` API trampoline madness

The commentary inside the code base isn’t very enlightening, but together with those slides from NVIDIA I can kinda make an educated guess on the execution flow (after installing QT6 symbols).

[Qt6] WaylandEglClientBufferIntegrationPrivate::initEglStream
         |
         | 0. QT makes EGL create_stream_attrib_nv() call with an "external"
         |    display.
         v
[libEGL_nvidia.so]
         |
         | 1. Device EGL calls "external platform API".
         v
[libnvidia-egl-wayland.so.1.1.21]
wlEglCreateStreamAttribHook(display, attribute)
         |
======== | ===== wlExternalApiLock() held.
H        |
H        | 2. With the external API lock held, it fetches some display
H        |    metadata through the following API.
H        |    - wl_eglstream_display_get()
H        |    - wl_eglstream_display_get_stream()
H        |    With this information, it invokes the device EGL again
H        |    to actually create the stream.
H        v 
H data->egl.createStreamAttrib(display, modified_attrib)
H [libEGL_nvidia.so] Function pointer set during driver init.
H        |
H        | 3. Calls wlEglGetInternalHandleExport through external API.
H        v
H [libnvidia-egl-wayland.so.1.1.21]
H wlEglGetInternalHandleExport(display, EGL_OBJECT_DISPLAY_KHR, display)
H        |
H        v
H [libnvidia-egl-wayland.so.1.1.21]
H wlEglAcquireDisplay(display)
H        |
H        | 4. Attempts to lock the external API lock again with
H        |    wlExternalApiLock().
H        |
H <========== PTHREAD_MUTEX_ERRORCHECK deadlock assertion tripped.

The wlExternalApiLock() logically protects the global linked list of displays against data races, so that a display or an associated stream cannot just disappear mid-use. Functions that validate streams/displays or change the list take this lock internally. But does it make sense to keep holding it across the whole “device-platform-device-platform” trampoline from step 2 through step 4?

The Fix

Whatever downstream “platform” external EGL API it calls seems to do a fairly good job of making sure incoming pointers remain valid, regardless of who (the application or the platform side) calls these APIs. Looks like the way out is to release the lock immediately once step 2 in the chart is done, so that the remaining device EGL calls can take the lock themselves as needed.

diff --git a/src/wayland-eglstream.c b/src/wayland-eglstream.c
index 3c40a0d..611e773 100644
--- a/src/wayland-eglstream.c
+++ b/src/wayland-eglstream.c
@@ -89,15 +89,17 @@ EGLStreamKHR wlEglCreateStreamAttribHook(EGLDisplay dpy,
     }
 
     if (err != EGL_SUCCESS) {
-        goto fail;
+        goto fail_unlock;
     }
 
     wlStream = wl_eglstream_display_get_stream(wlStreamDpy, resource);
     if (wlStream == NULL) {
         err = EGL_BAD_ACCESS;
-        goto fail;
+        goto fail_unlock;
     }
 
+    wlExternalApiUnlock();
+
     if (wlStream->eglStream != EGL_NO_STREAM_KHR ||
         wlStream->handle == -1) {
         err = EGL_BAD_STREAM_KHR;
@@ -237,12 +239,11 @@ EGLStreamKHR wlEglCreateStreamAttribHook(EGLDisplay dpy,
     wlStream->eglStream = stream;
     wlStream->handle = -1;
 
-    wlExternalApiUnlock();
-
     return stream;
 
-fail:
+fail_unlock:
     wlExternalApiUnlock();
+fail:
     wlEglSetError(data, err);
     return EGL_NO_STREAM_KHR;
 }

Submitted upstream as https://github.com/NVIDIA/egl-wayland/pull/194
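The shape of the fix can also be sketched in Python. Everything here is a simplified model with hypothetical names (api_lock, get_internal_handle, driver_create_stream, hook_before, hook_after); the real change is the C diff above. A non-reentrant lock stands in for the external API lock, and the driver callback re-enters the platform library the way step 3 does.

```python
import threading

api_lock = threading.Lock()   # stand-in for the external API lock
_owner = None                 # ident of the thread holding the lock, if any


def external_api_lock() -> bool:
    """Take the lock; return False on a same-thread relock (the EDEADLK case)."""
    global _owner
    if _owner == threading.get_ident():
        return False
    api_lock.acquire()
    _owner = threading.get_ident()
    return True


def external_api_unlock() -> None:
    global _owner
    _owner = None
    api_lock.release()


def get_internal_handle() -> str:
    """Models a platform entry point that takes the lock on its own."""
    if not external_api_lock():
        raise RuntimeError("failed to lock pthread mutex")
    try:
        return "handle"
    finally:
        external_api_unlock()


def driver_create_stream() -> str:
    """Models the driver calling back into the platform library (step 3)."""
    return "stream:" + get_internal_handle()


def hook_before() -> str:
    """Pre-fix shape: the lock is held across the driver call."""
    external_api_lock()
    try:
        return driver_create_stream()
    finally:
        external_api_unlock()


def hook_after() -> str:
    """Post-fix shape: look things up under the lock, release, then call out."""
    external_api_lock()
    # ... display and stream lookups would happen here ...
    external_api_unlock()
    return driver_create_stream()


try:
    hook_before()
except RuntimeError as exc:
    print("before:", exc)
print("after:", hook_after())
```

The trade-off is the same as in the patch: once the lock is dropped, anything touched afterwards has to be safe without it, which is why the fix leans on the downstream calls validating their own pointers.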

Appendix

Crude python script used to make GDB dump where and who took and released the mutex.

Mashed up GDB python plugin



#!/usr/bin/env python3
import json
import logging
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Literal

import gdb

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@dataclass
class Trace:
    kind: Literal["lock", "unlock"]
    thread_id: int
    stacktrace: List[str]


# Mutex address -> trace info
_trace_db: Dict[int, List[Trace]] = defaultdict(list)
_breakpoints: List[gdb.Breakpoint] = []


class MutexLockWatchpoint(gdb.Breakpoint):
    def __init__(self, trace_db: Dict[int, List[Trace]], track_lock: bool):
        if track_lock:
            super().__init__("wlExternalApiLock", gdb.BP_BREAKPOINT)
        else:
            super().__init__("wlExternalApiUnlock", gdb.BP_BREAKPOINT)
        self.trace_db = trace_db
        self.track_lock = track_lock

    def stop(self) -> bool:
        stacktrace = gdb.execute("bt", to_string=True)
        mutex_address = int(gdb.parse_and_eval("&wlMutex"))
        # ptid is (pid, lwpid, tid); the LWP id is what "info threads" shows.
        thread_id = gdb.selected_thread().ptid[1]
        kind = "lock" if self.track_lock else "unlock"
        self.trace_db[mutex_address].append(
            Trace(kind=kind, thread_id=thread_id,
                  stacktrace=stacktrace.splitlines()))
        logger.info(
            f"Mutex {mutex_address:016x} {kind}ed by thread {thread_id}")
        # Record only; never halt the inferior at these breakpoints.
        return False


def start_watch():
    if len(_breakpoints) == 0:
        _breakpoints.append(MutexLockWatchpoint(_trace_db, True))
        _breakpoints.append(MutexLockWatchpoint(_trace_db, False))
    else:
        logger.warning("Watchpoints already set up. Skipping.")


def end_watch():
    global _breakpoints
    for bp in _breakpoints:
        bp.delete()
    _breakpoints = []


def dump_traces(out_file: str):
    # Dataclass instances are not JSON serializable; flatten them to dicts.
    mapped = {
        k: [{"kind": t.kind, "thread_id": t.thread_id,
             "stacktrace": t.stacktrace} for t in v]
        for k, v in _trace_db.items()
    }
    with open(out_file, "w") as f:
        json.dump(mapped, f, indent=2)
    logger.info(f"Traces dumped to {out_file}")


class MutexWatchCmd(gdb.Command):
    def __init__(self):
        super().__init__("mutex_watch", gdb.COMMAND_DATA)

    def invoke(self, arg, from_tty):
        argv = gdb.string_to_argv(arg)
        if len(argv) == 0:
            logger.info("Usage: mutex_watch start | end | dump <file>")
            return
        if argv[0] == "start":
            start_watch()
            return
        if argv[0] == "end":
            end_watch()
            return
        if argv[0] == "dump":
            if len(argv) < 2:
                logger.info("Where to dump???")
                return
            dump_traces(argv[1])
            return


if __name__ == "__main__":
    MutexWatchCmd()

Notes on Linux 5.16 RC1 Deployment for Apple M1 Mac Mini

What works / doesn’t work?

10G Ethernet: Works.
WiFi: Can’t find wireless card.
Audio: Neither HDMI nor onboard audio works.
USB Audio: Works. If you use Apple USB-C to 3.5mm dongle, connect your headphones / speakers to the dongle first before plugging the dongle into Mac Mini.
USB3: Degraded speed on type C ports – Operates at 300Mbit/s. Type A ports are fully functional.
Graphics: fbcon / kmscon works without issue. Can start GUI with llvmpipe and is in fact fast enough to run gnome.
Built-in NVMe drive: Block device nodes exist. My system is not deployed on the internal SSD. Actual usability and performance remain untested.

Devices and Software

A Linux host where you build the kernel and the bootloader.
An M1 Mac Mini. Mine has 10Gbit Ethernet.
A USB drive to serve as a makeshift root device.
A USB-something to USB-C 3.0 cable for sideloading kernel and initrd.
A USB keyboard hooked up to the Mac Mini.

GCC toolchain targeting at least arm64-noneabi. If your distro doesn’t have a cross compiler toolchain, grab one from Linaro. At the time of writing I’m using GCC 12 unstable snapshot 2021.11-1.
M1N1 bootloader. Grab from Asahi GitHub. At the time of writing, m1n1 is at commit 660d7482b9f8d9e99fc6414bc366623d092c3229. To use 10G ethernet, also apply patch in PR125.
Linux kernel 5.16 RC1. Clone or download a tarball snapshot.
- Apply patches in Janne Grunau’s PKGBUILD to the kernel source.
- Apply the patch that resets PCIe devices for proper bring up at startup.
- Apply the patch that sets the PCIe link speed.
- Apply the patch that nudges the 10G Ethernet card to use the right MAC address.
- My kernel configuration (add whatever you need)
AOSC OS arm64 root filesystem tarball. Grab from repo.aosc.io and dump onto a USB thumb drive formatted to ext4. Other distros should technically work, though initrd / boot command line options may vary.

Compiling

Build the kernel first. Make sure the following environment variables are set:

ARCH=arm64
CROSS_COMPILE= the path of the cross compiler tool chain.
INSTALL_MOD_PATH= /usr in the thumbdrive root file system.

Verify you have arch/arm64/boot/Image.gz and arch/arm64/boot/dts/apple/t8103-j274.dtb. Run make modules_install and make sure the kernel modules are correctly stored at <thumbdrive>/usr/lib/modules/<version-localversion>.
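As a rough sketch, the build steps above can be assembled from Python. The toolchain prefix aarch64-linux-gnu-, the mount point /mnt/thumbdrive and the helper name kernel_build_commands are assumptions for illustration; Image.gz, dtbs, modules and modules_install are the usual kernel make targets. The function only builds the command lines and does not run them.

```python
import os
import shlex


def kernel_build_commands(cross_compile: str, thumbdrive_root: str, jobs: int = 0):
    """Return the make invocations as argument lists (nothing is executed)."""
    jobs = jobs or os.cpu_count() or 1
    common = [
        "make",
        f"-j{jobs}",
        "ARCH=arm64",
        f"CROSS_COMPILE={cross_compile}",
    ]
    return [
        # Build the compressed kernel image, the device trees and the modules.
        common + ["Image.gz", "dtbs", "modules"],
        # Install modules under <thumbdrive>/usr/lib/modules/<version>.
        common + [f"INSTALL_MOD_PATH={thumbdrive_root}/usr", "modules_install"],
    ]


# Hypothetical toolchain prefix and mount point; substitute your own.
for cmd in kernel_build_commands("aarch64-linux-gnu-", "/mnt/thumbdrive", jobs=8):
    print(shlex.join(cmd))
```

Run the printed commands from the top of the kernel source tree, or pass each list to subprocess.run(cmd, check=True, cwd=kernel_source).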

There’s no good way to generate an initrd without messing around with QEMU userspace emulation or going to another arm64 Linux machine. Either way, the initrd is generated from inside a chroot into the thumb drive root filesystem. Since we are already in a chroot at that point, you may configure root passwords, hostname / timezone and other stuff in there as well. See a random Stack Exchange post on how to do this.

Building m1n1 is very straightforward. Modify the first line in the Makefile and set ARCH to ${CROSS_COMPILE}, then follow the README at https://github.com/AsahiLinux/m1n1. Find the compiled m1n1.macho in build/.

Follow the AsahiLinux Dev Quick Start to provision disk space from Mac OS, transfer m1n1.macho to 1TR, and use kmutil to install m1n1 as a custom boot object. Then connect the Mac Mini to the build host with a USB cable.

Sideloading Kernel and Initrd

m1n1 has a bunch of python scripts to interact with a Mac running m1n1. When the Mac mini is connected to the Linux PC it should appear as a USB serial device, something like /dev/ttyACM0.

Set environment variable M1N1DEVICE to /dev/ttyACM0. Invoke proxyclient/tools/linux.py to sideload a kernel like this:

python3 linux.py \
    -b 'root=/dev/sda1 rw earlycon console=tty0' \
    ${kernel_source}/arch/arm64/boot/Image.gz \
    ${kernel_source}/arch/arm64/boot/dts/apple/t8103-j274.dtb \
    /path/to/initramfs-5.16.0-rc1-cth-m1+.img

Bootable m1n1 + Kernel Payload

Use the following command to concatenate m1n1 and other stuff used to boot:

cat m1n1.macho \
    <(echo 'boot-args=root=/dev/sda1 rw earlycon console=tty0') \
    ${kernel_source}/arch/arm64/boot/Image.gz \
    ${kernel_source}/arch/arm64/boot/dts/apple/t8103-j274.dtb \
    /path/to/initramfs-5.16.0-rc1-cth-m1+.img \
> /tmp/m1n1+kernel.macho

The combined boot object can be used to boot directly into Linux without another PC.

Notes

Booting from iSCSI

At boot pass parameters root=UUID=XXXX ip=dhcp rd.iscsi.initiator=iqn.<host-iqn:identifier> netroot=iscsi:<server-ip>::::iqn.<server-iqn:target> instead of a normal root=/dev/XXX. Also make sure your dracut.conf includes appropriate modules:

add_dracutmodules+=" iscsi lvm "
# I am using LVM on iSCSI. Ignore if just using normal GPT
install_items+=" /usr/bin/lsusb /usr/bin/lspci /usr/bin/iperf3 "
# Add additional files as you wish

I have effectively included as many core system utilities as possible in case I need to operate on the root device before leaving initrd.
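The boot parameters above are easy to mistype, so here is a small sketch that assembles them from their parts. The function name and all example values (UUID, IQNs, the 192.0.2.10 documentation address) are made up for illustration; only the parameter layout comes from the text above.

```python
def iscsi_boot_args(root_uuid: str, initiator_iqn: str,
                    server_ip: str, target_iqn: str) -> str:
    """Assemble the kernel command line fragment for an iSCSI root."""
    return " ".join([
        f"root=UUID={root_uuid}",
        "ip=dhcp",
        f"rd.iscsi.initiator={initiator_iqn}",
        # The fields between the server and the target are left empty,
        # exactly as in the netroot= template above.
        f"netroot=iscsi:{server_ip}::::{target_iqn}",
    ])


# Hypothetical values; substitute your own UUID, IQNs and server address.
print(iscsi_boot_args(
    "0a1b2c3d-0000-4000-8000-000000000001",
    "iqn.2021-11.me.example:macmini",
    "192.0.2.10",
    "iqn.2021-11.me.example:rootfs",
))
```

The resulting string goes wherever root=/dev/sda1 appeared in the boot arguments earlier.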

HDCP everywhere under Mac OS / 1TR

Due to lack of desk space and monitors, my Mac Mini is hooked up to an HDMI capture card. If your capture card can’t terminate HDCP encryption, anything the Mac Mini displays when running Mac OS / 1TR will be unviewable over the capture device, even when no “copyrighted” content is being displayed at all. (Dammit. Apple. Why?)

Once m1n1 loads, the capture card should report non-HDCP-encrypted content. After the Linux kernel takes over the framebuffer, HDMI capture remains unencrypted and functional.

Domain expiry

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

This message is to certify that access to my old domain name cth451.tk will
terminate on Jan 5, 2021 due to registration expiry. If you haven't updated
your references to the current cth451.me, please do so ASAP.
-----BEGIN PGP SIGNATURE-----

iHUEARYKAB0WIQSEpw/kxI2CHoPWNplrwXv80PB4YgUCX+uymgAKCRBrwXv80PB4
Yg9DAP0WegtNQoOi+vAYviayE+IY3t+tuaE2EnsIz+wj20aSQAD+PeDsSNs/mfi9
Adp3jPW/1PWvKKrWn8WyA97AI3YW1Ag=
=Z3pM
-----END PGP SIGNATURE-----

New GPG key for new smartcard

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

This message is to certify that my old RSA key

    3EA4556BFDA919931884606919F85E55F96644FC

is still valid, but won't be used as often. A new ed25519 key pair with the
following fingerprint has been generated for my new Yubikey. This new key has
been signed with the original one.

    84A70FE4C48D821E83D636996BC17BFCD0F07862

The old key might be revoked in the future if I decide to repurpose my old
Yubikey. Please use the new key for future correspondence.
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEPqRVa/2pGZMYhGBpGfheVflmRPwFAl/QYmwACgkQGfheVflm
RPxlCw//fIiSgPvIGEg0UHJ2Wm10mrfCsL3lkrJo0nk4DgQvPgG0ssu//KDSTk7+
ZKN07pcvVAfsbhEOwewZblYzeHMwAj/X2kiUejqSbh+OUlge/Nq3rfLlyAAAbfjg
Y9cvP3toZgro7nmvMjZNNAuHqrMg56YHL5oFlCb3frBaN901yTOnRa3Asvd0u7Ad
YBrxNBjIffLtI/2ns2rp6KlZNKaxNzwaFos936BLDazbMrfehsclKHqo9WBDp/KQ
j4Jzrtmpiq/fAcqot7rcl81KUnQHw2CmvLyhn8vnsCBRQhzlqB7V4sla0ihZyem6
4HIQb+6KqxCcerWUZs0nz+RylB9wBBenrBaRzyGJXXKxKjhFGJG84vEwkFlonm7+
GH99qKdnX65ADQKYHCtH3VAo8HEG0+oEjqnh7wM4G5bdv+I8K+QjFnK+P+RFTTe6
r9SNQD0LxKb4B1nRtvSkvCbMDwGipHqakQpVfWHejLjTvjgF0r1GYWL3ZmKy+PqL
j/BDPFfl2YTjlx96QZA1izGKagRG/QlWsAsmaZa47FilcNxVb6Q2FhoF3y2oyiRZ
FAp6LkiN/kaesQzvZJBXSt0dtXq7PSNnxtiOwTev2gjxa5frAoJsOOc13vuyPAx7
yXG/ZAPSpgtQcPGLC9gYcy71qVzagQxLe33X5FrBJPezsWlvyXQ=
=FH/6
-----END PGP SIGNATURE-----

锈湖站 Rusty Lake Station

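“The train is about to arrive at the station. This train terminates at Mushroom Island. Please wait until the platform screen doors have fully opened, let passengers alight first, then board in order. Passengers traveling with children, please look after them to avoid danger.”
Shanghai Minecraft Metro

The platform signage is modeled on Shanghai Metro Line 1 (the version without stickers patched all over it). The train livery mostly follows the old 05C01 stock of Line 5, plus inter-car gangways that the real trains do not have. The station in this picture is not the same as the Rusty Lake Station on my server. Someday when I’m in the mood I might dig out my long-abandoned metro mod and rebuild it to look like this (lol).

This render was originally posted on Pixiv, but as Homan pointed out, Pixiv now pushes people to register or log in by not letting guests view works. I will no longer submit future works to Pixiv; instead I will provide (sufficiently) high-resolution renders on this site.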

Station signage and other visual elements are heavily based on the current design of the Shanghai Metro system. The trains themselves are a reference to the 05C01 trains (now demoted to servicing the branch line only). Note that the Rusty Lake Station shown here is entirely different from the one currently on my Minecraft server (by Homan).

This work was originally published on Pixiv a while ago. However, Pixiv is marking my work as “sensitive” for no particular reason. According to Yahoo! Chiebukuro, Pixiv is forcing users to sign in before they can view the work in the name of “sensitive content”. As a result, my future works will be posted on this personal website instead of Pixiv.

GPG Key Renewing – 2020

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Old Key Fingerprint: 336E9D9A9B80E1BEFFE4CBB994333B156BD49DC0
New Key Fingerprint: 3EA4556BFDA919931884606919F85E55F96644FC

This message is to certify that my old RSA key is expiring on Sep 18 2019. A new key pair with fingerprint above has been generated and will replace the expiring key soon. The new key will carry a signature from the original one.

Chai, Sept 5 2019
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEM26dmpuA4b7/5Mu5lDM7FWvUncAFAl1xJ98ACgkQlDM7FWvU
ncA9GhAAm5aJzHDKgmAVqqB9ipK1+UbYJqmY4NiBTDowOo5hb3ueN0Rbxu7dS2zD
UQKbov8YkHjNkIUWCacx7tQ+NAJpf6Carbr2AwoXE2cvdLPWxmVwtqe4CZkwkWPQ
Y7alq7B399Ewtc5T2cdNrfY+QjdQTVwBH8If6SbyOox8fg+rES4LPXAuQVNwJ6li
ZUNTHUbOJdOzQf0RDKJB0SHlVB1kH9hvYczEQSbsDkfJwqnSWkKBPTT19WDA8lMN
WaPNS2b2Q+gtVvUjH6pa7g77CCALgEmmSzdMQrHvudzOYg4oJGUcQG9C3bzAn6UO
iZlWaKi1plI2GbGRpYnQki9PVkH1g+K9FOokuLUUuiP/kcmIN84CiplnXCxqYrfN
ammag1dXRqWQnfpIXOFBWWbNeAa7/DI2nnJkVVriwBcPebeldvDEXqHgnx5azTQa
7tCa+PsU8Yqs2sykYYxSMcCEbcsuc9HYa3gXdNi/GDPDc+d6Ebgcf5vd4S9Gcu1i
zC9mV2gzd/j8F0dSg9OXs7ABSiwrmnj84QvR2C/j7KXYr29pHNaV/J227VrBr/H6
7WH1VGGqWegFj9tHixUXcPdzWUXNUvYyglLCeIyEGK5uGHFrU1KmhEOacLUImgOT
FGi01Suo6030ZEl2cQjilO8U81cb7qDtUdNGFx2ajPi2Ldjfyeg=
=MqyP
-----END PGP SIGNATURE-----

Photo Roll from Hokkaido

Former Ministry From Edo Era in Goryokaku
Lat N 41 47.813 Long E 140 45.430

Fireworks “Hanabi” at Toya Lake
Lat N 42 33.970 Long E 140 49.228
f/16 30” ISO 640

Sapporo Subway O-dori Station, Exit to West-3 Intersection
Lat N 43 03.648 Long E 141 21.118

Hakodate Bay Night View from Hakodate Mountain Observatory
Lat N 41 45.562 Long E 140 42.278
f/10 10” ISO 100

A Note on SIT Tunnel at Home

This post records how I set up an IPv6 tunnel to Hurricane Electric at home, where my ISP has no plan in sight to provide prefix-delegated IPv6 access over PPPoE.

Shorthands and Assumptions in This Note

eth0 connects to the Internet via IPv4. This note shall also apply to encapsulated interfaces, e.g. vlan15@eth0, lte0 or pppoe0.
eth1 connects to the local LAN. As above, the process is the same when the LAN side is a VLAN or bridge (or both).
tun0 denotes the sit tunnel interface created in this note.
A line beginning with # denotes comments in the configuration notes.

Setting up the Tunnel Interface on ER-X

Once a tunnel is registered correctly on TunnelBroker, it should provide the following information:

Logical address at local endpoint, e.g. 2001:444:111:222::2/64
Logical address at remote endpoint, e.g. 2001:444:111:222::1/64
IPv4 address at remote endpoint where encapsulated traffic is sent, e.g. 66.220.18.42, the HE tunneling endpoint in Paris.
A routable prefix for client side delegation, e.g. 2001:444:112:222::/64. This is usually different from the v6 addresses for the endpoint, and HE will show segments of the prefix in bold.
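Before typing these values into the router, a quick sanity check with Python's ipaddress module can catch a swapped or mistyped prefix. This is only a sketch using the example addresses from the list above; the function name check_tunnel_plan is made up for illustration.

```python
import ipaddress


def check_tunnel_plan(local: str, remote: str, routed: str) -> dict:
    """Check that the endpoint addresses and the routed prefix are consistent."""
    local_if = ipaddress.ip_interface(local)
    remote_if = ipaddress.ip_interface(remote)
    routed_net = ipaddress.ip_network(routed)
    return {
        # Both tunnel endpoints must sit in the same /64.
        "endpoints_share_subnet": local_if.network == remote_if.network,
        # The two endpoints must not be given the same address.
        "endpoints_differ": local_if.ip != remote_if.ip,
        # The routed prefix for the LAN is separate from the tunnel subnet.
        "routed_is_separate": not routed_net.overlaps(local_if.network),
    }


result = check_tunnel_plan(
    "2001:444:111:222::2/64",
    "2001:444:111:222::1/64",
    "2001:444:112:222::/64",
)
print(result)
```

All three values should be True; a False on routed_is_separate usually means the tunnel subnet was pasted where the routed prefix belongs.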

Now fill ER-X configuration nodes with corresponding information and default routing for IPv6:

interfaces:
    tunnel:
        tun0:
            address: [Fill logical v6 address in CIDR at endpoint]
            description: [Give a name to this tun]
            encapsulation: sit
            local-ip: [Fill in IPv4 address at eth0]
            remote-ip: [Fill in IPv4 address at tunneling endpoint]
protocols:
    static:
        interface-route6:
            ::/0:
                next-hop-interface: tun0
            # This creates a default IPv6 routing table entry that
            # routes all non-link-local address to the tunnel.

At this point, one should be able to ping any IPv6 address from the ER-X. If this works, continue by instructing the LAN interface to advertise the routed prefix:

interfaces:
    <path-to-interface-config-node>:
        ipv6:
            dup-addr-detect-transmits: 1
            # Stateless (SLAAC) configuration might produce identical
            # IP addresses. This allows the network to detect whether
            # an address already exists before it is used.
            address:
                autoconf
                # Set autoconf to allow stateless configuration by SLAAC
            router-advert:
                prefix:
                    [Fill routable delegated prefix here]:
                        autonomous-flag: true
                        # Tells computers on this network that they may
                        # form their own addresses from this prefix (SLAAC)
                        on-link-flag: true
                        # Indicates that this prefix exists on the
                        # same Ethernet link, i.e. these addresses
                        # do not require routing

IPv6-enabled devices on the LAN should now configure globally unique IPv6 addresses for themselves via SLAAC, using the advertised routed prefix.
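For a feel of what SLAAC does with the advertised prefix, here is a sketch of the classic modified EUI-64 scheme, where the host builds the lower 64 bits from its MAC address. The MAC address is made up and the prefix is the example one from earlier in this note. Many current systems use privacy or stable-privacy interface identifiers instead, so the address a real host picks will often not match this derivation.

```python
import ipaddress


def slaac_eui64(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Derive a modified EUI-64 SLAAC address from a /64 prefix and a MAC."""
    net = ipaddress.ip_network(prefix)
    if net.version != 6 or net.prefixlen != 64:
        raise ValueError("SLAAC needs a /64 IPv6 prefix")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe in the middle to stretch 48 bits to 64.
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return net[int.from_bytes(iid, "big")]


# Hypothetical MAC address on the LAN side.
print(slaac_eui64("2001:444:112:222::/64", "00:11:22:33:44:55"))
# -> 2001:444:112:222:211:22ff:fe33:4455
```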

Subsequent Steps

Confirm IPv6 assignment on LAN devices

$ ip addr
<------ MORE INTERFACES REDACTED ------>
2: eno1:  mtu 1480 qdisc fq_codel state UP group default qlen 1000
     link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
     inet 192.168.5.4/24 brd 192.168.5.255 scope global dynamic noprefixroute eno1
        valid_lft 80444sec preferred_lft 80444sec
     inet6 2001:470:d:XXXX:XXXX:XXXX:XXXX:dfd3/64 scope global dynamic noprefixroute 
        valid_lft 2591976sec preferred_lft 86376sec
     inet6 fe80::be40:XXXX:XXXX:XXXX/64 scope link noprefixroute 
        valid_lft forever preferred_lft forever
<------ MORE INTERFACES REDACTED ------>

Trace IPv6 connections to an IPv6 enabled website

$ traceroute -6 ac.cth451.me -n
 traceroute to ac.cth451.me (2606:4700:30::681c:1b16), 30 hops max, 80 byte packets
  1  2001:470:d:XXXX:XXXX:XXXX:XXXX:XXXX  0.415 ms  0.533 ms  0.624 ms
  2  2001:470:c:XXXX::1  185.353 ms *  203.802 ms
  3  2001:470:0:9d::1  178.603 ms  167.001 ms  189.255 ms
  4  2001:504:0:3:0:1:3335:1  196.520 ms  179.737 ms  196.117 ms
  5  2400:cb00:12:1024::6ca2:d61d  185.036 ms 2400:cb00:12:1024::6ca2:d614  175.573 ms 2400:cb00:12:1024::6ca2:d608  185.263 ms

It is advisable to set up a network-wide firewall on the router, since these addresses are globally routable and can be reached by any other IPv6-connected device on the Internet.

Further Notes

The sit tunnel should also work if set up correctly on any other router, or even on a personal computer with a public IPv4 address. I am unable to replicate the settings on a Linux router via raw commands, as I do not own a Linux machine with a public IPv4 address.
I am not sure if the method would work if the local endpoint is behind NAT. This scenario will be experimented on after I return to campus.

New GitLab instance

TL;DR

I’ve set up a GitLab instance at git.cth451.me for now. Read the sign-in page on the instance to request read-write access.

“Certified” Androids?

The story begins the moment I bought Cytus II from Google Play on the very first day of its release, expecting some real music gaming on my Surface Pro 3 running homebrew Android 7.1 (yes, that’s totally possible, and it runs surprisingly well with all hardware buttons and the touchscreen working). However, the game quits immediately and no logs show up through the adb interface. I contacted Rayark for support and got a reply like this:

Cytus II might only be compatible for native Android devices at the moment. Please also check if you have installed Xposed, firewalls, block ads, or any rooting software. If yes, these may effect the performance of the game. We’d like to suggest you to remove these software to ensure the game runs smoothly and properly.

(Probably) 3 weeks after the game’s release, I ended up playing Cytus II on my cramped 5″ mobile phone, and not by choice. Someone asked me why I couldn’t just buy a regular Android tablet or an iPad to do the same job. Well, that could be an option if those people would chip in for a new one, and I really don’t think reusing an existing piece of hardware should cause any trouble for commercial devs and companies (Google: really?).

After some searching I came to the conclusion that Cytus II has integrated a “compliance check” called SafetyNet, which builds on the Compatibility Test Suite (CTS), a framework introduced by Google to verify whether an Android device falls into the category of “compatible”. Since the Surface Pro 3 has never had an official Android release (it’s a Microsoft thing, of course) and the base Android-x86 ships with root and development mode on, there is probably no way that any CTS test on the Surface Pro 3 would pass any time soon.

Just earlier today, I saw the news that Google is attempting to block Gapps from running on “uncertified” devices, where modding the Android device or unlocking the bootloader voids the “certified” status. There’s even a webpage letting people “register” their Androids with their device identifier, which is absolutely not working: my attempt to register my Surface only got an unknown error.

I would not blame Rayark for trying to place a layer of piracy protection on such a nice game, even though it hurts the ones who modded their devices properly exactly to play these games. My question is: if such a lockdown is so important that this system has been deployed to thousands of Android apps by now, why make Android an open standard? Why not switch to the Apple production model if centralized control seems so vital to the whole Android community? Such blockage won’t be easily tolerated if the so-called “register uncertified device” page is just a lie, and I believe there will eventually be a solution to circumvent such unreasonable restrictions. Until then, the vast population of “uncertified” Androids and modders won’t be so comfortable, and I might really need to ask myself: why should I spend 3 months porting an open source OS to a new device just to find that nothing works, by design?