Last Update 6:44 AM May 20, 2024 (UTC)

Identity Blog Catcher

Brought to you by Identity Woman and Infominer.
Support this collaboration on Patreon!!!

Monday, 20. May 2024

John Philpin : Lifestream

Responsive Video Miss Ken. So spot on.

Responsive Video Miss Ken. So spot on. Your browser does not support the video tag.
Responsive Video

Miss Ken. So spot on.

Your browser does not support the video tag.

Is Scott Galloway the American Tech version of Clarkson?

Is Scott Galloway the American Tech version of Clarkson?

Is Scott Galloway the American Tech version of Clarkson?


📸 🌊 🚧 Another in the ongoing series that isn’t yet a clear

📸 🌊 🚧 Another in the ongoing series that isn’t yet a clear series.

📸 🌊 🚧

Another in the ongoing series that isn’t yet a clear series.


Ben Werdmüller

ShareOpenly is now on Tedium

I adore the way ShareOpenly has been added to Tedium: You can see it for yourself on all its posts, including this great one about the decline of the ball mouse. Its founder, Ernie Smith, told me: “figured I had to have fun with it”.

I adore the way ShareOpenly has been added to Tedium:

You can see it for yourself on all its posts, including this great one about the decline of the ball mouse. Its founder, Ernie Smith, told me: “figured I had to have fun with it”.

Sunday, 19. May 2024

John Philpin : Lifestream

I am no fan of the orange douche .. but the Biden campaign’s

I am no fan of the orange douche .. but the Biden campaign’s tweet on the crappy podium is misguided and not right. The clip they circulated was actually quite funny .. demonstrated fast thinking .. the ‘leaning left’ pun particularly clever and the words in their headline are misleading .. his dig was not at this event workers but the contractor .. who #HadOneJob

I am no fan of the orange douche .. but the Biden campaign’s tweet on the crappy podium is misguided and not right. The clip they circulated was actually quite funny .. demonstrated fast thinking .. the ‘leaning left’ pun particularly clever and the words in their headline are misleading .. his dig was not at this event workers but the contractor .. who #HadOneJob


Simon Willison

Spam, junk … slop? The latest wave of AI behind the ‘zombie internet’

Spam, junk … slop? The latest wave of AI behind the ‘zombie internet’ I'm quoted in this piece in the Guardian about slop: I think having a name for this is really important, because it gives people a concise way to talk about the problem. Before the term ‘spam’ entered general use it wasn’t necessarily clear to everyone that unwanted marketing messages were a bad way to behave. I’m hoping

Spam, junk … slop? The latest wave of AI behind the ‘zombie internet’

I'm quoted in this piece in the Guardian about slop:

I think having a name for this is really important, because it gives people a concise way to talk about the problem.

Before the term ‘spam’ entered general use it wasn’t necessarily clear to everyone that unwanted marketing messages were a bad way to behave. I’m hoping ‘slop’ has the same impact – it can make it clear to people that generating and publishing unreviewed AI-generated content is bad behaviour.


NumFOCUS DISCOVER Cookbook: Minimal Measures

NumFOCUS DISCOVER Cookbook: Minimal Measures NumFOCUS publish a guide "for organizers of conferences and events to support and encourage diversity and inclusion at those events." It includes this useful collection of the easiest and most impactful measures that events can put in place, covering topics such as accessibility, speaker selection, catering and provision of gender-neutral restrooms.

NumFOCUS DISCOVER Cookbook: Minimal Measures

NumFOCUS publish a guide "for organizers of conferences and events to support and encourage diversity and inclusion at those events."

It includes this useful collection of the easiest and most impactful measures that events can put in place, covering topics such as accessibility, speaker selection, catering and provision of gender-neutral restrooms.


Fast groq-hosted LLMs vs browser jank

Fast groq-hosted LLMs vs browser jank Groq is now serving LLMs such as Llama 3 so quickly that JavaScript which attempts to render Markdown strings on every new token can cause performance issues in browsers. Taras Glek's solution was to move the rendering to a requestAnimationFrame() callback, effectively buffering the rendering to the fastest rate the browser can support. Via lobste.rs

Fast groq-hosted LLMs vs browser jank

Groq is now serving LLMs such as Llama 3 so quickly that JavaScript which attempts to render Markdown strings on every new token can cause performance issues in browsers.

Taras Glek's solution was to move the rendering to a requestAnimationFrame() callback, effectively buffering the rendering to the fastest rate the browser can support.

Via lobste.rs


John Philpin : Lifestream

📸 For nearly winter it’s pretty nice… More tenders on the

📸 For nearly winter it’s pretty nice… More tenders on the beach than I have seen for a while. See what I mean? And some larger boats parked outside the usual ‘lot’.

📸 For nearly winter it’s pretty nice…

More tenders on the beach than I have seen for a while.

See what I mean?

And some larger boats parked outside the usual ‘lot’.


Simon Willison

A Plea for Sober AI

A Plea for Sober AI Great piece by Drew Breunig: “Imagine having products THIS GOOD and still over-selling them.”

A Plea for Sober AI

Great piece by Drew Breunig: “Imagine having products THIS GOOD and still over-selling them.”

Saturday, 18. May 2024

John Philpin : Lifestream

Sometimes Gallagher just nails it … 🔗 Read The Whole

Sometimes Gallagher just nails it … 🔗 Read The Whole Thing

Sometimes Gallagher just nails it …

🔗 Read The Whole Thing


You know the problem with great new features being added to

You know the problem with great new features being added to Micro Blog? You have to keep on learning more CSS to keep it ‘pretty’ Not complaining now - but seriously the last adjustments were only last weekend! GREAT job @manton

You know the problem with great new features being added to Micro Blog?

You have to keep on learning more CSS to keep it ‘pretty’

Not complaining now - but seriously the last adjustments were only last weekend!

GREAT job @manton


📸 🌊 🚧 Why the WIP? Because I am toying with a short mini s

📸 🌊 🚧 Why the WIP? Because I am toying with a short mini series - and if I decide to do it - I will be back.

📸 🌊 🚧

Why the WIP? Because I am toying with a short mini series - and if I decide to do it - I will be back.


📺 Sugar Stunning .. great job from Colin F .. but the homa

📺 Sugar Stunning .. great job from Colin F .. but the homage to ‘noir’ yet modern, the direction, the cinematography, the colors, the lights, the nuance … everything. Just so good. I have read that people loved the mid season ‘twist’ …if we are talking last scene in episode 6 .. yeah no .. I had that scene in my head for a couple of episodes prior, so no twist for me. In fact, it happened to c

📺 Sugar

Stunning .. great job from Colin F .. but the homage to ‘noir’ yet modern, the direction, the cinematography, the colors, the lights, the nuance … everything. Just so good.

I have read that people loved the mid season ‘twist’ …if we are talking last scene in episode 6 .. yeah no .. I had that scene in my head for a couple of episodes prior, so no twist for me. In fact, it happened to coincide with when I thought the show had finished .. and I 🖇️ even posted that I was very disappointed with the ending (EEJIT I am).

Anyway .. all this to say … loved it. If you have 🍎 📺, whatever else you are watching stop - and go take a look.

Sugar on 🔗 Reelgood

’All’ My TV Show Reviews

 


Michael Ruminer

Trust Over IP in Decentralized Information and the Role of Delegates

As I read the Issuer Requirements Guide from Trust Over IP Foundation it only reinforces my long-standing belief that well-known trusted proxy/witness/delegate issuers will serve an important role. Every organization or even many organizations will not want to maintain the governance needs to be a trusted issuer and instead will outsource that activity to a delegate that can ensure integrity and g

As I read the Issuer Requirements Guide from Trust Over IP Foundation it only reinforces my long-standing belief that well-known trusted proxy/witness/delegate issuers will serve an important role. Every organization or even many organizations will not want to maintain the governance needs to be a trusted issuer and instead will outsource that activity to a delegate that can ensure integrity and governance over their issuing activities.

For example, a gig platform such as Uber will not issue proof of work and proof of income credentials but will instead have a delegate do this on their behalf. The overhead for them in proper governance would be too steep. Instead the proxy/witness/delegate will collect the information from the source and issue a credential on their behalf. The delegate would be the trusted party in Trust Over IP (TOIP).

We see this in today’s world with service bureaus for different activities. The future will not be quite as decentralized as the potential for it to be because orgs will not wish to maintain the needed structures to ensure it. There is a place for well-known delegates.


Simon Willison

AI counter app from my PyCon US keynote

AI counter app from my PyCon US keynote In my keynote at PyCon US this morning I ran a counter at the top of my screen that automatically incremented every time I said the words "AI" or "artificial intelligence", using vosk, pyaudio and Tkinter. I wrote it in a few minutes with the help of GPT-4o - here's the code I ran as a GitHub repository. I'll publish full detailed notes from my talk once

AI counter app from my PyCon US keynote

In my keynote at PyCon US this morning I ran a counter at the top of my screen that automatically incremented every time I said the words "AI" or "artificial intelligence", using vosk, pyaudio and Tkinter. I wrote it in a few minutes with the help of GPT-4o - here's the code I ran as a GitHub repository.

I'll publish full detailed notes from my talk once the video is available on YouTube.


John Philpin : Lifestream

I opened Chat4o and my first words (audio input) after ‘hell

I opened Chat4o and my first words (audio input) after ‘hello’ were something like write me a quick post about Markdown. What is it. Why should I use it. Response back - as written. No formatting. No editing. Introduction In the world of technology and software development, it is crucial to have proper documentation to ensure that information is organized and easily accessible. Markdown i

I opened Chat4o and my first words (audio input) after ‘hello’ were something like write me a quick post about Markdown. What is it. Why should I use it.

Response back - as written. No formatting. No editing.

Introduction

In the world of technology and software development, it is crucial to have proper documentation to ensure that information is organized and easily accessible. Markdown is lightweight markup language with plain-text formatting syntax, designed to be easily converted into HTML and other formats. It provides a simple way to structure content without the need for complex coding.

Why Use Markdown?

Markdown is widely used by developers, writers, and content creators for various reasons:

Simplicity: Markdown uses plain text and a simple formatting syntax that is easy to learn. Portability: Markdown files can be opened on any text editor and easily converted to different formats. Flexibility: It supports various elements such as headers, lists, images, and links, making it versatile for different types of content. Readability: Markdown files are clean and easy to read in their raw form. Compatibility: It can be used on different platforms and is supported by many applications websites. Conclusion

In conclusion, Markdown is a powerful tool for creating well-structured documents quickly and efficiently. Whether you are writing a blog post, documenting code, or creating content for the web, Markdown offers a simple and effective way to format your text. By incorporating Markdown into your workflow, you can streamline the documentation process and focus on creating great content.

I have worked with Macs for nearly (more than?) 20 years - and never once said hello to it - or please - or thank you ….

If I don’t say those words when I work with some AI model - I feel rude.


Shed a Tear🔗 - William Gallagher The DECORATOR stands b

Shed a Tear🔗 - William Gallagher The DECORATOR stands back to appreciate his day’s work. The entire wall is now painted. Satisfied, he collects his snap-tin, paint, brushes, ladder, etc., lights a cigarette stub and walks off. While he’s been achieving that painted wall, the film crew cast have failed to do anything at all. So much effort, so much energy, so much time, and nothing achieved

Shed a Tear🔗 - William Gallagher

The DECORATOR stands back to appreciate his day’s work. The entire wall is now painted. Satisfied, he collects his snap-tin, paint, brushes, ladder, etc., lights a cigarette stub and walks off.

While he’s been achieving that painted wall, the film crew cast have failed to do anything at all. So much effort, so much energy, so much time, and nothing achieved whatsoever.

Completely unrelated, I’ve been sitting and typing in this chair for the past few days while four men put up a shed in our garden.

😂😂😂


Thrutopian Futures anyone? 🔗 📚 Any Human Power - Manda Sco

Thrutopian Futures anyone? 🔗 📚 Any Human Power - Manda Scott talks about her new Thrutopian Mytho-Political thriller - Accidental Gods

Simon Willison

Quoting Patrick Reynolds

I rewrote it [the Oracle of Bacon] in Rust in January 2023 when I switched over to TMDB as a data source. The new data source was a deep change, and I didn’t want the headache of building it in the original 1990s-era C codebase. — Patrick Reynolds

I rewrote it [the Oracle of Bacon] in Rust in January 2023 when I switched over to TMDB as a data source. The new data source was a deep change, and I didn’t want the headache of building it in the original 1990s-era C codebase.

Patrick Reynolds


John Philpin : Lifestream

📸 🌊 🚧 This morning .. I had an earlier one that looked go

📸 🌊 🚧 This morning .. I had an earlier one that looked good on a phone - but once it was here .. not so much. No - it wasn’t the upload - I checked the source. It just #sucked

📸 🌊 🚧 This morning ..

I had an earlier one that looked good on a phone - but once it was here .. not so much. No - it wasn’t the upload - I checked the source. It just #sucked


📺 Safe House Meh. Look carefully you will see I didn’t dis

📺 Safe House Meh. Look carefully you will see I didn’t dislike it. Equally, I didn’t like. I liked the premise - each series an entirely different cast, set of characters - even the Safe House itself was a different house (and yes - I did watch both) … but still. MEH. SafeHouse on 🔗 Reelgood ’All’ My TV Show Reviews  

📺 Safe House

Meh. Look carefully you will see I didn’t dislike it. Equally, I didn’t like. I liked the premise - each series an entirely different cast, set of characters - even the Safe House itself was a different house (and yes - I did watch both) … but still. MEH.

SafeHouse on 🔗 Reelgood

’All’ My TV Show Reviews

 


What Will The Cathedrals Of The Future Be ... Let Alone Look Like.

Cathedrals are places designed to show their creator’s opulence and power, but they’re also designed to alter our senses, to throw us into alternate worlds, so that they can open us our minds to change. If that’s the case, what will these places be in the 21st century? Good question that was running through my mind when I stumble across this quote … “In the day-to-day trenches of adult l

Cathedrals are places designed to show their creator’s opulence and power, but they’re also designed to alter our senses, to throw us into alternate worlds, so that they can open us our minds to change. If that’s the case, what will these places be in the 21st century?

Good question that was running through my mind when I stumble across this quote …

“In the day-to-day trenches of adult life, there is no such thing as atheism. There is no such thing as not worshipping. Everybody worships. The only choice we get is what to worship.”

💬 David Foster Wallace

Worlds Colliding. The piece I was reading was 🔗 Cathedrals of the 21st Century from Unchartered Territories, that ends …

You can see similar processes at play today in places like courtrooms or parliaments, albeit more subdued: They are big, solemn, sound is controlled and there are ceremonies, serious uniforms…

… and then immediately …

But you can also apply the same principles and end up with a completely different setting and experience: nightclubs.

If that observation makes you sit up you should go read the whole thing. Also, some great images throughout as you read Tomas Pueyo’s observations across geography and time.

Friday, 17. May 2024

Simon Willison

Understand errors and warnings better with Gemini

Understand errors and warnings better with Gemini As part of Google's Gemini-in-everything strategy, Chrome DevTools now includes an opt-in feature for passing error messages in the JavaScript console to Gemini for an explanation, via a lightbulb icon. Amusingly, this documentation page includes a warning about prompt injection: Many of LLM applications are susceptible to a form of abuse kn

Understand errors and warnings better with Gemini

As part of Google's Gemini-in-everything strategy, Chrome DevTools now includes an opt-in feature for passing error messages in the JavaScript console to Gemini for an explanation, via a lightbulb icon.

Amusingly, this documentation page includes a warning about prompt injection:

Many of LLM applications are susceptible to a form of abuse known as prompt injection. This feature is no different. It is possible to trick the LLM into accepting instructions that are not intended by the developers.

They include a screenshot of a harmless example, but I'd be interested in hearing if anyone has a theoretical attack that could actually cause real damage here.

Via Hacker News


John Philpin : Lifestream

Ami Dar - Founder of Idealist

Responsive Video A simple, lovely idea from Ami Dar (on LinkedIN) - founder of Idealist. Enjoy. Your browser does not support the video tag.
Responsive Video

A simple, lovely idea from Ami Dar (on LinkedIN) - founder of Idealist. Enjoy.

Your browser does not support the video tag.

Simon Willison

Commit: Add a shared credentials relationship from twitter.com to x.com

Commit: Add a shared credentials relationship from twitter.com to x.com A commit to shared-credentials.json in Apple's password-manager-resources repository. Commit message: "Pour one out." Via @rmondello@hachyderm.io

Commit: Add a shared credentials relationship from twitter.com to x.com

A commit to shared-credentials.json in Apple's password-manager-resources repository. Commit message: "Pour one out."

Via @rmondello@hachyderm.io


Quoting Kelsey Piper

I have seen the extremely restrictive off-boarding agreement that contains nondisclosure and non-disparagement provisions former OpenAI employees are subject to. It forbids them, for the rest of their lives, from criticizing their former employer. Even acknowledging that the NDA exists is a violation of it. If a departing employee declines to sign the document, or if they violate it, they can lo

I have seen the extremely restrictive off-boarding agreement that contains nondisclosure and non-disparagement provisions former OpenAI employees are subject to. It forbids them, for the rest of their lives, from criticizing their former employer. Even acknowledging that the NDA exists is a violation of it.

If a departing employee declines to sign the document, or if they violate it, they can lose all vested equity they earned during their time at the company, which is likely worth millions of dollars.

Kelsey Piper


Ben Werdmüller

Share Openly: A simple icon for a new social sharing service

A lovely blog post by Jon Hicks on his process for creating the ShareOpenly icon. Characteristically, lots of care and attention went into this. I'm really glad you get to see the open hand icons, which we eventually decided against, but feel really warm and human. Jon's amazing, lovely to work with, and has a really impressive body of work. I'm grateful he was able to c

A lovely blog post by Jon Hicks on his process for creating the ShareOpenly icon. Characteristically, lots of care and attention went into this.

I'm really glad you get to see the open hand icons, which we eventually decided against, but feel really warm and human.

Jon's amazing, lovely to work with, and has a really impressive body of work. I'm grateful he was able to contribute such an important part of this personal project. #Technology

[Link]


A new Ani DiFranco album is something to celebrate

I’ve been following Ani DiFranco for decades. I’ve seen her play live around twenty times: she always brings a kind of joyful, progressive energy that leaves me motivated and buzzing. She has a new album out, and it feels like a return to visceral, honest form. It’s not quite the acoustic punk from the late nineties / early aughts — seriously, go check out Living in Clip, Not a Pretty Girl or D

I’ve been following Ani DiFranco for decades. I’ve seen her play live around twenty times: she always brings a kind of joyful, progressive energy that leaves me motivated and buzzing.

She has a new album out, and it feels like a return to visceral, honest form. It’s not quite the acoustic punk from the late nineties / early aughts — seriously, go check out Living in Clip, Not a Pretty Girl or Dilate — and it goes to some really experimental places, but I’m into it. This time, rather than making it on her own, she’s worked with producer BJ Burton, who’s also worked with Bon Iver and Taylor Swift.

We need progressive, momentum-bringing, energetic music more than ever. Ani delivers. And even the name of the album itself — Unprecedented Sh!t — feels very apt for the era.

From the liner notes:

The title Unprecedented Sh!t is not only representative of how much of a sonic departure the 11-track album is from Ani’s other work, but also a political and social commentary on the current state of the world. “We find ourselves in unprecedented times in many ways, faced with unprecedented challenges. So, our responses to them and our discourse around them, need to rise to that level.”

Amen.


Apple, SpaceX, Microsoft return-to-office mandates drove senior talent away

"Taken together, our findings imply that return to office mandates can imply significant human capital costs in terms of output, productivity, innovation, and competitiveness for the companies that implement them." There's no doubt that there's a lot of value in being in the same physical room together; I'm writing this on the day after a work summit that brought my team to

"Taken together, our findings imply that return to office mandates can imply significant human capital costs in terms of output, productivity, innovation, and competitiveness for the companies that implement them."

There's no doubt that there's a lot of value in being in the same physical room together; I'm writing this on the day after a work summit that brought my team together from across the country, and I'm still buzzing from the energy. But I think anyone in tech that proposes a full-time return to office policy needs to rethink.

It comes down to this: "it's easier to manage a team that's happy". People want their lives and contexts to be respected; everyone's relationship with their employers has been reset over the last few years. This goes hand in hand with the resurgence of unions, too: the contract between workers and employers is being renegotiated, and particularly for parents and carers, but really for everyone, working from home yields a kind of freedom that's hard to replace. And asking people to come back reads as a lack of trust and autonomy that erodes relationships and decimates morale. #Business

[Link]


Simon Willison

PSF announces a new five year commitment from Fastly

PSF announces a new five year commitment from Fastly Fastly have been donating CDN resources to Python—most notably to the PyPI package index—for ten years now. The PSF just announced at PyCon US that Fastly have agreed to a new five year commitment. This is a really big deal, because it addresses the strategic risk of having a key sponsor like this who might change their support policy based

PSF announces a new five year commitment from Fastly

Fastly have been donating CDN resources to Python—most notably to the PyPI package index—for ten years now.

The PSF just announced at PyCon US that Fastly have agreed to a new five year commitment. This is a really big deal, because it addresses the strategic risk of having a key sponsor like this who might change their support policy based on unexpected future conditions.

Thanks, Fastly. Very much appreciated!


Talking Identity

Talking Ethics in Identity at Identiverse 2024

Identiverse being the best identity conference around means that it’s always a challenge coming up with a talk proposal, as Andrew Hindle and team raise the bar each year. The process acts as a forcing function for me to think about the topics I want to bring to the community for discussion, topics that I […]

Identiverse being the best identity conference around means that it’s always a challenge coming up with a talk proposal, as Andrew Hindle and team raise the bar each year. The process acts as a forcing function for me to think about the topics I want to bring to the community for discussion, topics that I encounter all the time in #DayJob, but seldom have time to dig into.

This year I wanted to build on my talk from last year, but realized I couldn’t tackle it alone. Thankfully, I managed to con(vince) two absolute rock stars in Michelle Dennedy and Eve Maler to join me on stage for a fireside chat about the ethics imperative facing our industry.

Screenshot

As the abstract says, “What happens when you throw a privacy wonk, a standards guru, and a product architect that have been working in identity far too long on to a conference stage together?” Join us in Vegas and find out.


Simon Willison

Programming mantras are proverbs

Programming mantras are proverbs I like this idea from Luke Plant that the best way to think about mantras like "Don’t Repeat Yourself" is to think of them as proverbs that can be accompanied by an equal and opposite proverb. DRY, "Don't Repeat Yourself" matches with WET, "Write Everything Twice". Proverbs as tools for thinking, not laws to be followed. Via lobste.rs

Programming mantras are proverbs

I like this idea from Luke Plant that the best way to think about mantras like "Don’t Repeat Yourself" is to think of them as proverbs that can be accompanied by an equal and opposite proverb.

DRY, "Don't Repeat Yourself" matches with WET, "Write Everything Twice".

Proverbs as tools for thinking, not laws to be followed.

Via lobste.rs


Patrick Breyer

EU-Abgeordnete fordern die britische Regierung auf, Julian Assanges Auslieferung zu stoppen

Heute haben sich auf Initiative des EU-Abgeordneten Dr. Patrick Breyer (Piratenpartei) 31 Europaabgeordnete verschiedener Fraktionen in einem offenen Brief an den britischen Innenminister James Cleverly gewandt und …

Heute haben sich auf Initiative des EU-Abgeordneten Dr. Patrick Breyer (Piratenpartei) 31 Europaabgeordnete verschiedener Fraktionen in einem offenen Brief an den britischen Innenminister James Cleverly gewandt und dringlich appelliert, die Auslieferung von Wikileaks-Gründer Julian Assange an die Vereinigten Staaten zu stoppen. Dies geschieht im Vorfeld der bevorstehenden Gerichtsentscheidung am 20. Mai 2024, die voraussichtlich final über Assanges Schicksal entscheiden wird.

In dem Brief wird die britische Regierung aufgefordert, ihre Verantwortung im Hinblick auf Menschenrechte und Pressefreiheit wahrzunehmen. „Genau wie die EU-Kommission verbreitet die britische Regierung die Mär, über die Auslieferung hätten ausschließlich die Gerichte zu entscheiden. § 70 (2) des britischen Auslieferungsgesetzes gibt dem Innenminister die Befugnis, die Auslieferung zu verweigern, wenn sie gegen das Recht auf Leben oder das Verbot der Folter und unmenschlicher oder erniedrigender Behandlung der Europäischen Menschenrechtskonvention verstoßen würde“, erklärt Breyer. „Die psychische Gesundheit von Julian Assange, die potenziellen Haftbedingungen in den USA und das reale Risiko eines Suizids bei Auslieferung führen dazu, dass die Auslieferung eine solche unmenschliche oder erniedrigende Behandlung darstellen würde.“

Die Abgeordneten betonen in ihrem Schreiben außerdem, dass die Verfolung von Julian Assange politisch motiviert sei. Die Bestimmungen des britisch-amerikanischen Auslieferungsvertrags verbieten zu Recht die Auslieferung wegen politischer Straftaten. Die eindeutig politische Natur dieses Falls wird durch zahlreiche und hochgradig voreingenommene Aussagen führender Persönlichkeiten der US-Politik deutlich, die seit mindestens 2011 die extralegale Bestrafung oder Ermordung von Herrn Assange fordern.

Ein weiterer Kritikpunkt ist das Fehlen einer Garantie der US-Regierung, dass Assange vor Gericht dieselben Rechte wie ein US-Bürger erhalten würde. „Ein Verfahren gegen jemanden, der Dokumente veröffentlicht hat, in einem Land, das möglicherweise grundlegende Rechte auf Meinungs- und Pressefreiheit nicht anerkennt oder anwendet, ist inakzeptabel“, so Breyer.

Breyer und die anderen unterzeichnenden EU-Abgeordneten fordern die britische Regierung auf, die Pressefreiheit und das Recht auf freie Meinungsäußerung zu schützen und die Auslieferung von Julian Assange zu stoppen.

Die Unterzeichner des Briefes schließen sich den Forderungen von großen Organisationen wie Amnesty International und Reporter ohne Grenzen an, die die sofortige Freilassung von Julian Assange verlangen. Sie betonen, dass seine anhaltende Inhaftierung das Recht auf freie Meinungsäußerung gefährdet.


Doc Searls Weblog

Blog + Newsletter

Newsletters are all the rage now. In recognition of that, I blogged here two years ago about the idea of writing a solo newsletter. Since then I’ve been co-producing this one with Katherine Druckman at Reality 2.o. It’s a Substack one, so I know how that game works on the production as well as the […]

Newsletters are all the rage now. In recognition of that, I blogged here two years ago about the idea of writing a solo newsletter. Since then I’ve been co-producing this one with Katherine Druckman at Reality 2.o. It’s a Substack one, so I know how that game works on the production as well as the consumption ends.

Recently I also learned that WordPress makes it easy to turn blogs into newsletters, which is why you see “Get New Posts by Email” in the right column here. So, wanting an image to go with the news that this blog can now be a newsletter if you like, I said to ChatGPT, “Draw me Doc Searls blogging on the left side of the image, and producing a newsletter on the right side.” It gave me this, including the caption:

Here’s the illustration depicting Doc Searls engaging in two different activities. On the left, he is blogging, and on the right, he is producing a newsletter. This split scene captures his multitasking abilities in digital content creation.

Except for the slouch, I never looked like that. But what the hell. AI is wacky shit, so there ya go.

Feel free to subscribe.


John Philpin : Lifestream

📸 A Thought For Your Day. Ok …TWO Thoughts.

📸 A Thought For Your Day. Ok …TWO Thoughts.

📸 A Thought For Your Day.

Ok …TWO Thoughts.

Thursday, 16. May 2024

John Philpin : Lifestream

If Craft is the Word/Doc online equivalent to what Google Do

If Craft is the Word/Doc online equivalent to what Google Docs had the opportunity to be. And Airtable is the Excel/Numbers online equivalent to what Google Sheets had the opportunity to be. Then What is the Powerpoint/Keynote online equivalent to what Google Slides had the opportunity to be? Figma, Miro and Canva don’t seem to be it?

If Craft is the Word/Doc online equivalent to what Google Docs had the opportunity to be.

And Airtable is the Excel/Numbers online equivalent to what Google Sheets had the opportunity to be.

Then

What is the Powerpoint/Keynote online equivalent to what Google Slides had the opportunity to be?

Figma, Miro and Canva don’t seem to be it?


The Pragmatic Engineer

The Pulse #93: OpenAI makes Google dance

Google’s core search business has never been under as much threat as it is today – and the threat comes from OpenAI. Also: Google Cloud deletes a customer’s infra, and Tesla in trouble.

The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.

Today, we cover:

Industry pulse. NetBSD and Gentoo ban AI-generated commits, StackOverflow to sell data to OpenAI, AWS losing market share, jail time for for Tornado Cash developer, and more. 

OpenAI makes Google dance. OpenAI is setting the pace for Google with AI phone assistant capabilities, and is probably the reason that Google started shipping AI answers on top of search results.

Google Cloud deletes Australian trading fund’s infra. A $124B fund in Australia would have lost all data stored with Google Cloud, had they not relied on a third-party backup. A rare blunder from GCP, where regional replication did not stop the deletion – and a just as rare statement from Google Cloud’s CEO taking the blame.

Tesla trouble accompanies poorly executed layoffs. The market leader for electric vehicles in the US is seeing demand for cars drop, and is responding with mass layoffs. This included firing the 500-person Supercharger team: only to start to rehire them a week later.

1. Industry pulse Open source projects ban AI-generated code commits

Read more


Ben Werdmüller

Google’s broken link to the web

"A quarter-century into its existence, a company that once proudly served as an entry point to a web that it nourished with traffic and advertising revenue has begun to abstract that all away into an input for its large language models." This has the potential to be a disaster for the web and everyone who depends on it: for journalism, for bloggers, for communities, for every

"A quarter-century into its existence, a company that once proudly served as an entry point to a web that it nourished with traffic and advertising revenue has begun to abstract that all away into an input for its large language models."

This has the potential to be a disaster for the web and everyone who depends on it: for journalism, for bloggers, for communities, for every voice that couldn't be heard without an open, egalitarian platform.

The answer for all of those stakeholders has to be depending on forging real, direct relationships with real people. It doesn't scale; it doesn't fit well with a unidirectional broadcast model for publishing; it's now how most people who make content think about what they do. But it's how all of them are going to survive and continue to find each other.

I've been urging publishers to stop using the word "audience" and to replace it with "community", and to think about what verb might replace "publish" in a multi-directional web that is more about relationships than it is reaching mass eyeballs.

Of course, it might go in a direction we haven't predicted. We'll find out very soon; the only real certainty is that things are changing, and the bedrock that many people have depended on for two decades is shifting. #Technology

[Link]


Mozilla Foundation Welcomes Nabiha Syed as Executive Director

This is great news for Mozilla, for everyone who uses the internet, and for everyone who cares about ethics, privacy, and human rights. We need a well-functioning Mozilla more than ever - and that much-needed presence has been absent for years. The spirit in the following quote gives me a lot of hope - I think this is how all technology should be built, and how all techn

This is great news for Mozilla, for everyone who uses the internet, and for everyone who cares about ethics, privacy, and human rights.

We need a well-functioning Mozilla more than ever - and that much-needed presence has been absent for years.

The spirit in the following quote gives me a lot of hope - I think this is how all technology should be built, and how all technologists should approach their work, but it's rarely true:

“After all, the technology we have now was once just someone’s imagination. We can dream, build, and demand technology that serves all of us, not just the powerful few.”

I hope - and believe - that she can make it happen. #Technology

[Link]


Patrick Breyer

Piraten zur Verurteilung eines Tornado Cash-Programmierers: Privatsphäre ist kein Verbrechen!

Ein niederländisches Gericht hat gestern den Programmierer Alexey Pertsev zu fünf Jahren Haft verurteilt. Er sei deshalb der Geldwäsche schuldig, weil die von ihm entwickelte Software „Tornado Cash“ auch Straftätern vollständig …

Ein niederländisches Gericht hat gestern den Programmierer Alexey Pertsev zu fünf Jahren Haft verurteilt. Er sei deshalb der Geldwäsche schuldig, weil die von ihm entwickelte Software „Tornado Cash“ auch Straftätern vollständig anonyme und nicht rückverfolgbare Krypto-Transaktionen ermögliche (sog. „crypto mixer“). „Die Wahrung der Anonymität des Nutzers und die Verschleierung des Transaktionsverlaufs standen dabei im Mittelpunkt,“ so der Vorwurf des Gerichts. Aufgrund ihrer Funktionsweise sei die Software „speziell für Straftäter bestimmt“. In 36 Fällen seien gestohlene Kryptowährungen mithilfe der Software anonym in Umlauf gebracht worden, wofür der Programmierer verantwortlich sei. Das Gericht warf dem Angeklagten in seinem Urteil eine „Ideologie maximaler Privatsphäre“ vor.

Der Europaabgeordnete der Piratenpartei und Jurist Dr. Patrick Breyer warnt vor den Konsequenzen der Verurteilung: „Dieses Urteil kriminalisiert legitime Anonymität und alle, die sie als Programmierer ermöglichen. Die bei Bargeld selbstverständliche Anonymität, die unsere finanzielle Freiheit schützt, darf bei Digitalwährungen nicht kriminalisiert werden.

Die Konsequenzen dieser Denkweise könnten weit über Kryptowährungen hinaus auch Programmierer von Messengersoftware oder Anonymisierungsnetzwerke treffen. Aus diesem Geist heraus hat die EU zuletzt anonyme Barzahlungen eng begrenzt und mit der Chatkontrolle die Zerstörung des digitalen Briefgeheimnisses vorgeschlagen.

In jeder Freiheit nur den Missbrauch durch Einzelne zu sehen, schafft Unfreiheit und ist eine autoritäre Denkweise. Wir Piraten sind digitale Freiheitskämpfer, weil Freiheit Grundlage unserer Gesellschaft ist und ihr weit mehr nützt als der befürchtete Schaden durch ihren kriminellen Missbrauch.“

Die Spitzenkandidatin der Piratenpartei zur Europawahl und Informatikerin Anja Hirschel erklärt: „Wir Piraten sind klar pro Anonymisierung. Die Programmierung, der Betrieb und die Verwendung von Privacy-Tools muss diskriminierungsfrei möglich sein. Aus ihrer Verwendung reflexartig strafbare Handlungen unterstellen zu wollen ist grundlegend falsch.”


Simon Willison

Quoting Richard Schneeman

[...] by default Heroku will spin up multiple dynos in different availability zones. It also has multiple routers in different zones so if one zone should go completely offline, having a second dyno will mean that your app can still serve traffic. — Richard Schneeman

[...] by default Heroku will spin up multiple dynos in different availability zones. It also has multiple routers in different zones so if one zone should go completely offline, having a second dyno will mean that your app can still serve traffic.

Richard Schneeman


Webistemology - John Wunderlich

Bill 194, Strengthening Cyber Security and Building Trust in the Public Sector Act, 2024

While Bill 194 introduces some welcome upgrades to Ontario's cybersecurity and privacy legislation, it falls short of delivering in several key areas, particularly protecting employees' privacy.

This post intends to contribute to the public debate on what could have been a significant legislation - Bill 194. This post is not a summary of Bill 194. I am not a lawyer, and this is not a legal analysis. The post below draws on my experience as a privacy and data protection expert and understanding of current standards and legislation. I will provide an overview of the bill's scope, goals, and provisions and assess its potential to enhance Ontario’s cybersecurity and respect the privacy of Ontarians. While Bill 194 introduces some welcome upgrades to Ontario's cybersecurity and privacy legislation, it falls short of delivering in several key areas, particularly protecting employees' privacy.

Overview

Bill 194, Strengthening Cyber Security and Building Trust in the Public Sector Act, 2024 (the Bill), was introduced in the Ontario Legislature for first reading and passed on May 13, 2024. It has been ordered for its Second Reading. Bill 194 has been introduced in the current context of the ongoing evolution of cybersecurity and privacy threats and the explosive growth of artificial intelligence. The Bill is, therefore, not surprising in what it is intended to address:

The Act addresses cyber security and artificial intelligence systems at public sector entities. Public sector entities are institutions within the meaning of the Freedom of Information and Protection of Privacy Act and the Municipal Freedom of Information and Protection of Privacy Act, children’s aid societies and school boards. (See explanatory note.)

It is worth noting that the Bill does not make amendments to the Municipal Freedom of Information and Protection of Privacy - MFIPPA (the sister act to the Freedom of Information and Protection of Privacy Act - FIPPA). Hopefully, this can be addressed as the Bill goes through the legislative process.

It must be said that if one of the government's objectives in the Bill were to improve cyber security and privacy protections for Ontarians, this would have been a golden opportunity to introduce private sector legislation to provide a made-in-Ontario solution that could supplement and extend the protections offered by Federal legislation and ensure that Ontarians have robust and equivalent protection in both the public and private sectors. In particular, the government of Ontario's failure to protect employees' privacy is a long-standing issue highlighted by the gaps in this legislation. I note that the current Federal private-sector privacy law is due to be superseded by the contents of Bill C-27, but that is not part of this post.

Employees in Ontario do not have legislation that protects their privacy in either the public or the public sector. Public sector privacy protections were removed in 1995, making Ontario unique among Canadian provinces in that it does not protect the privacy of civil servants at work. It is also the case that, due to employment being in provincial jurisdiction, Federal private-sector privacy legislation does not protect employees in Ontario.

Ontario-based employees in the federal public sector or employed under the federal labour code (entities like banks, for example) have privacy protection under federal legislation. Still, those are estimated to be less than 500,000 of Ontario's nearly 8 million employees or slightly more than 6%. In the private sector, employees under collective agreements, based on arbitral jurisprudence and the specifics of their contract, will have privacy protection, but that accounts for less than 14% of private sector works. I derived these numbers mainly from searching for available Statistics Canada and other online sources.

TL;DR — employees in Ontario are the least likely to have privacy protection at work compared to other provinces or territories.

The Bill

The Bill has two significant elements. Schedule 1, “Enhancing Digital Security and Trust Act,” addresses cyber security issues, the use of artificial intelligence systems, the impact of digital technology on children, and some general provisions, all of which will be addressed below. Schedule 2, “Freedom of Information and Protection of Privacy Act,” amends the Freedom of Information and Protection of Privacy Act, RSO 1990, c F.31. Bill 194 is 51 pages long. From a content perspective, that is about 17 pages in English, with a matching section in French. If you think, "This seems a bit perfunctory, given the complicated nature of cyber security, digital protection of children, and privacy," you would be right. It seems to me that the entire bill could be summarized by saying that the government recognizes the importance of issues and will, therefore, write and implement regulations sometime in the future to deal with them. "Just trust us and pass the bill." When you compare this to the 4 years of discussion that went into creating the 458-page EU Artificial Intelligence Act, it comes up short, literally and figuratively. Closer to home, Bill C-27, which includes the Artificial Intelligence Data Act, is 148 pages (or 74 pages in English) but is accompanied by more than 100 civil society, industry, and expert submissions on the provisions and issues of the bill.

Schedule 1, Enhancing Digital Security and Trust Act

The following describes some of the more significant elements of this part of the Act. This includes Definitions (s. 1), Cyber Security (s. 2 - 4), Use of Artificial Intelligence Systems (s. 5 - 8), and Digital Technology Affecting Individuals Under Age 18 (s. 9 - 11), and some concluding general sections.

Definitions

The Bill adds a definition of artificial intelligence that appears to be derived, at least in part, from the definition of an AI system in Article 3 of the EU Artificial Intelligence Act. (An easier-to-use reference than the official text can be found in the AI Act Explorer prepared by The Future of Life Institute). It may be summarized as any system that infers from input to generate outputs to accomplish explicit or implicit objectives. Using an AI chatbot is an example that is fulfilled by this definition. A sample of definitions that are included in the AI Act but not this act include:

reasonably foreseeable misuse safety component training data input data

It is good that the Bill includes procured services and systems as a "use" of artificial intelligence systems. Still, much of the success of this approach will be determined by the nature of the due diligence in Ontario Public Service (OPS) procurement requirements for AI and machine learning systems. Another positive inclusion is that digital information includes collection, use, retention or disclosure by a third party. This will help ensure that accountability remains with the originating government institution.

Cyber Security

This part of Bill 194 boils down to a requirement for the government to make regulations governing cyber security, including s. 2 (1):

requiring public sector entities to develop and implement programs for ensuring cyber security; governing programs mentioned in clause (1), which may include prescribing elements to be included in the programs; requiring public sector entities to submit reports to the Minister or a specified individual in respect of incidents relating to cyber security, which may include different requirements in respect of different types of incidents; prescribing the form and frequency of reports.

In the absence of a public consultation on the content and purpose of the governing regulations, there is no assurance that the regulations that will be promulgated will meet diverse stakeholder needs nor that they will be effective in proving the desired effect of protecting security. While section 3 allows the government to make regulations setting technical standards, the devil will be in the details here. Noting that there are boatloads of security standards to choose from. There needs to be governance to ensure that the standards chosen are enforced. For example, I have been a consultant on several projects inside various Ministries, and it sometimes surprises information architects and project managers that there are Government of Ontario Information and Technology Standards (GO-ITS) to which their projects should abide. There is nothing to suggest in the Bill that even if good standards are adopted, they will be enforced with any rigour.

Use of Artificial Intelligence Systems

This part of Bill 194, similar to the prior section, mainly sets out the authority for the government to make regulations to govern the use of AI systems without creating content that could be publicly reviewed or publicly debated. I will note two particular gaps I feel should be addressed.

Developing an accountability framework

Section 5. (3) of the Bill states that each entity using artificial intelligence systems will develop and implement an accountability framework following the yet-to-be-published regulations. I will highlight what I believe to be two flaws with this approach.

There are no assurances in the Bill that marginalized or disadvantaged communities will provide input or be engaged in developing an Accountability Framework for an artificial intelligence system that may significantly impact their lives. Secondly, it appears that the approach in this Bill could lead to a proliferation of entity-specific Accountability Frameworks. This burdens both citizens whose data may be processed in multiple artificial intelligence systems with different frameworks and entities without the appropriate accountability expertise being asked to develop and implement their frameworks.

Rather than a proliferation of frameworks, creating a single Accountability Framework based on transparent, inclusive, and robust stakeholder engagement would be better.

Creating a risk framework

All that Bill 194 says on managing the risk of using artificial intelligence systems is, "A public sector entity to which this section applies shall take such steps as may be prescribed to manage risks associated with the use of the artificial intelligence system." This is woefully inadequate. The high-level risks and harms that can be created using artificial intelligence need to be articulated so that systems that may create high risks to individuals or Ontario as a whole can be identified, and those risks and harms can be identified and either avoided or mitigated. There is no identification of what might be termed unacceptable uses of AI systems or a way to identify whether a high-risk AI system - such as a system that collects biometric information about Ontarians and uses that as a basis for determining access to systems - is acceptable. (In my mind, such a system is inherently unacceptable.)

Digital Technology Affecting Individuals Under Age 18

This section replicates the section above; it essentially boils down to allowing the government to make regulations that

set out how children's information may be collected, used, or disclose require reports about how children's information may be collected, used, or disclosed may prohibit some processing of children's information

I have two broad comments here. The first is that I am somewhat relieved that the government is not trying to introduce broad systems of digital control or censorship in the name of protecting children. Such legislation is usually both overly broad and ineffective in its intended purpose. That isn't to say that there aren't real risks to students that could have been articulated, not least of which is using easily available tools to enable students to create deep fake photos and videos of other students - creating real trauma and having real-world consequences.

My second comment is that many digital risks to students are also digital risks for their parents, including misinformation and other social harms. This legislation would have been a great opportunity, for example, to create a requirement for school boards to develop and provide curricula and training to support students in identifying misinformation through critical digital media training.

General

The last section of Bill 194 includes section 12, which states that nothing in the Act establishes a private law duty of care owed to any person. I'm not a lawyer, but when I looked up the phrase, it said, "A duty recognized by law to take reasonable care to avoid conduct that poses an unreasonable risk of harm to others." My only comment here is to note that despite the title of the bill, the writers of the bill have taken care to ensure that the heads of government institutions do not have a duty to ensure that they take reasonable care to avoid the risk of harm (aside from the requirement of privacy safeguards addition Schedule 2, which doesn't appear to me to be the same thing). It seems that where an individual's information, especially sensitive information, is collected under a legislative authority, the institution or head should have a duty of care for that individual's information. It may be that this is standard language in this kind of legislation, but it still leaves me a little perplexed. 🤷‍♂️

Schedule 2, Freedom of Information and Protection of Privacy Act

This schedule is, in some ways, simpler in that it provides amendments to an existing Act (FIPPA) and doesn't endlessly defer to yet-to-be-determined regulations. Schedule 2 defines "information practices" to FIPPA, which will help those responsible for building systems comply with FIPPA. Some worthwhile elements for reporting have been added. I will take particular note of two significant changes: requirements for privacy impact assessments (PIAs) as well as breach reporting and notification requirement

Privacy Impact Assessments

This is a welcome addition to FIPPA. PIAs are a standard tool for identifying the risks to privacy in a system and recommending steps for their remediation. By standardizing the information required in a PIA, this legislation goes some distance to raising the floor for privacy protection and providing the ability to develop consistent expertise across all of government. I look forward to any prescribed requirements. This is followed by a section on risk mitigation that directs government institutions to implement the recommendations of the PIA

I would be remiss if I didn't point out the obvious gap between this and Schedule 1. There is no directive in Schedule 1 concerning impact assessments for AI systems nor is there a direction to heads to mitigate identified risks.

A copy of PIAs is required to be provided to the Information and Privacy Commissioner if asked. This could be improved by changing this to a mandatory filing with the Commissioner. This doesn’t require the IPC to approve the PIA but does make it available to the Commissioner promptly in case of a complaint or breach related to a system with a PIA.

Breach Reporting and Notice

Schedule 2 adds a Privacy Safeguards section to FIPPA. Specifically, the requirement is that "The head of an institution shall take steps that are reasonable in the circumstances to ensure that personal information in the custody or under the control of the institution is protected against theft, loss and unauthorized use or disclosure and to ensure that the records containing the personal information are protected against unauthorized copying, modification or disposal." This begs the question of why this requirement for privacy safeguards is only being added now, but suffice to applaud it.

The requirement for privacy safeguards provides the underpinning for defining a breach as "any theft, loss or unauthorized use or disclosure of personal information in the custody or under the control of the institution if it is reasonable in the circumstances to believe that there is a real risk that a significant harm to an individual would result...". Such breaches will be reported to the Commissioner, whose budget will hopefully reflect this new obligation. The factors identified as determining whether there is a real risk of significant harm include:

the sensitivity of the personal information; the probability of misuse; the availability of steps that a person could take to reduce the risk of harm mitigate the risk of harm directions or guidance from the Commissioner

With safeguards, breaches, and risks of harm defined, the last piece is the addition of a requirement to notify individuals if there has been a breach of their information. This is welcome but has consequences. In some circumstances, such a notification can be traumatic or require expenditures by the individual to compensate. Where is the requirement to compensate the individual or help them mitigate the impact?

Order Making Power

It is worth noting that the amended FIPPA will provide the Commissioner concerning privacy breaches a new power for the Commissioner and, I suspect, a welcome one to bring the Commissioner's powers for privacy in FIPPA in alignment with her order-making powers for Freedom of Information issues.

Wrapping Up

This post was created within a day or two of Bill 194's First Reading. I look forward to other and deeper contributions to the debate in the days to come. In the meantime, I have these takeaways:

It is past time for Ontario to stop being a laggard in the protection of employee privacy and the government should, at the very least, amend Bill 194 to give public sector employees the privacy protection and respect they deserve. A private sector privacy bill could address employment privacy issues, putting it under the authority of the Commissioner with private sector order-making powers. Alternatively, elements of privacy protection for employees could also be addressed by adding to Ontario's Employment Standards Act. The government should use Bill 194's second reading and committee review to ensure that there is a clear legislative articulation of: What are the acceptable and unacceptable uses of artificial intelligence How to identify, categorize, and mitigate individual and social risks associated with the use of artificial intelligence If the government wants to ensure that digital technology doesn't harm children, it should start with digital media training and take steps to prevent children from using technology to bully other children. Consider recognizing that the government has a duty of care when it processes sensitive personal information under a legislative authority that deprives individuals of the ability to refuse that processing. Adding PIA requirements with breach notifications will raise the bar for institutions processing Ontarians' personal information. This may lead to some interesting changes or headlines in the short term, but the longer-term consequences should be good.

At the end of the day, the government appears to want to be able to take steps to address cybersecurity, children's data processing, and artificial intelligence through regulations. It will be interesting to see how, or if, the consultation process will significantly alter this approach. The public consultation is open until June 11th and can be found at https://www.ontariocanada.com/registry/view.do?postingId=47433&language=en

Wednesday, 15. May 2024

Simon Willison

Quoting Casey Newton

But where the company once limited itself to gathering low-hanging fruit along the lines of “what time is the super bowl,” on Tuesday executives showcased generative AI tools that will someday plan an entire anniversary dinner, or cross-country-move, or trip abroad. A quarter-century into its existence, a company that once proudly served as an entry point to a web that it nourished with traffic a

But where the company once limited itself to gathering low-hanging fruit along the lines of “what time is the super bowl,” on Tuesday executives showcased generative AI tools that will someday plan an entire anniversary dinner, or cross-country-move, or trip abroad. A quarter-century into its existence, a company that once proudly served as an entry point to a web that it nourished with traffic and advertising revenue has begun to abstract that all away into an input for its large language models.

Casey Newton


PaliGemma model README

PaliGemma model README One of the more over-looked announcements from Google I/O yesterday was PaliGemma, an openly licensed VLM (Vision Language Model) in the Gemma family of models. The model accepts an image and a text prompt. It outputs text, but that text can include special tokens representing regions on the image. This means it can return both bounding boxes and fuzzier segment outlines

PaliGemma model README

One of the more over-looked announcements from Google I/O yesterday was PaliGemma, an openly licensed VLM (Vision Language Model) in the Gemma family of models.

The model accepts an image and a text prompt. It outputs text, but that text can include special tokens representing regions on the image. This means it can return both bounding boxes and fuzzier segment outlines of detected objects, behavior that can be triggered using a prompt such as "segment puffins".

You can try it out on Hugging Face.

It's a 3B model, making it feasible to run on consumer hardware.

Via Roboflow: PaliGemma: Open Source Multimodal Model by Google


John Philpin : Lifestream

📸 Mmmm … Whichever way you look at it .. and tasted ra

📸 Mmmm … Whichever way you look at it .. and tasted rather good.

📸 Mmmm …

Whichever way you look at it

.. and tasted rather good.


Simon Willison

OpenAI: Managing your work in the API platform with Projects

OpenAI: Managing your work in the API platform with Projects New OpenAI API feature: you can now create API keys for "projects" that can have a monthly spending cap. The UI for that limit says: If the project's usage exceeds this amount in a given calendar month (UTC), subsequent API requests will be rejected You can also set custom token-per-minute and request-per-minute rate limits for

OpenAI: Managing your work in the API platform with Projects

New OpenAI API feature: you can now create API keys for "projects" that can have a monthly spending cap. The UI for that limit says:

If the project's usage exceeds this amount in a given calendar month (UTC), subsequent API requests will be rejected

You can also set custom token-per-minute and request-per-minute rate limits for individual models.

I've been wanting this for ages: this means it's finally safe to ship a weird public demo on top of their various APIs without risk of accidental bankruptcy if the demo goes viral!

Via @romainhuet


ChatGPT in "4o" mode is not running the new features yet

Monday's OpenAI announcement of their new GPT-4o model included some intriguing new features: Creepily good improvements to the ability to both understand and produce voice (Sam Altman simply tweeted "her"), and to be interrupted mid-sentence New image output capabilities that appear to leave existing models like DALL-E 3 in the dust - take a look at the examples, they seem to have solved co

Monday's OpenAI announcement of their new GPT-4o model included some intriguing new features:

Creepily good improvements to the ability to both understand and produce voice (Sam Altman simply tweeted "her"), and to be interrupted mid-sentence New image output capabilities that appear to leave existing models like DALL-E 3 in the dust - take a look at the examples, they seem to have solved consistent character representation AND reliable text output!

They also made the new 4o model available to paying ChatGPT Plus users, on the web and in their apps.

But, crucially, those big new features were not part of that release.

Here's the relevant section from the announcement post:

We recognize that GPT-4o’s audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities.

This is catching out a lot of people. The ChatGPT iPhone app already has image output, and it already has a voice mode. These worked with the previous GPT-4 mode and they still work with the new GPT-4o mode... but they are not using the new model's capabilities.

Lots of people are discovering the voice mode for the first time - it's the headphone icon in the bottom right of the interface.

They try it and it's impressive (it was impressive before) but it's nothing like as good as the voice mode in Monday's demos.

Honestly, it's not at all surprising that people are confused. They're seeing the "4o" option and, understandably, are assuming that this is the set of features that were announced earlier this week.

Most people don't distinguish models from features

Think about what you need to know in order to understand what's going on here:

GPT-4o is a brand new multi-modal Large Language Model. It can handle text, image and audio input and produce text, image and audio output.

But... the version of GPT-4o that has been made available so far - both via the API and via the OpenAI apps - is only able to handle text and image input and produce text output. The other features are not yet available outside of OpenAI (and a select group of partners).

And yet in the apps it can still handle audio input and output and generate images. That's because the app version of the model is wrapped with additional tools.

The audio input is handled by a separate model called Whisper, which converts speech to text. That text is then fed into the LLM, which generates a text response.

The response is passed to OpenAI's boringly-named tts-1 (or maybe tts-1-hd) model (described here), which converts that text to speech.

While nowhere near as good as the audio in Monday's demo, tts-1 is still a really impressive model. I've been using it via my ospeak CLI tool since it was released back in November.

As for images? Those are generated using DALL-E 3, through a process where ChatGPT directly prompts that model. I wrote about how that works back in October.

So what's going on with ChatGPT's GPT-4o mode is completely obvious, provided you already understand:

GPT-4 v.s. GPT-4o Whisper tts-1 DALL-E 3 Why OpenAI would demonstrate these features and then release a version of the model that doesn't include them

I'm reminded of the kerfluffle back in March when the Google Gemini image creator was found to generate images of Black Nazis. I saw a whole bunch of people refer to that in conversations about the Google Gemini Pro 1.5 LLM, released at the same time, despite the quality of that model being entirely unrelated to Google's policy decisions about how one of the interfaces to that model should make use of the image creator tool.

What can we learn from this?

If you're fully immersed in this world, it's easy to lose track of how incredibly complicated these systems have become. The amount you have to know in order to even understand what that "4o" mode in the ChatGPT app does is very easy to underestimate.

Fundamentally these are challenges in user experience design. You can't just write documentation about them, because no-one reads documentation.

A good starting here is to acknowledge the problem. LLM systems are extremely difficult to understand and use. We need to design the tools we build on top of them accordingly.

Update: a UI workaround

On May 16th around 1PM PT OpenAI released a new iPhone app update which adds the following warning message the first time you try to access that headphones icon:

New Voice Mode coming soon

We plan to launch a new Voice Mode with new GPT-4o capabilities in an alpha within ChatGPT Plus in the coming weeks. We'll let you know when you have access.


Quoting Arvind Narayanan

If we want LLMs to be less hype and more of a building block for creating useful everyday tools for people, AI companies' shift away from scaling and AGI dreams to acting like regular product companies that focus on cost and customer value proposition is a welcome development. — Arvind Narayanan

If we want LLMs to be less hype and more of a building block for creating useful everyday tools for people, AI companies' shift away from scaling and AGI dreams to acting like regular product companies that focus on cost and customer value proposition is a welcome development.

Arvind Narayanan


How to PyCon

How to PyCon Glyph’s tips on making the most out of PyCon. I particularly like his suggestion that “dinners are for old friends, but lunches are for new ones”. I’m heading out to Pittsburgh tonight, and giving a keynote (!) on Saturday. If you see me there please come and say hi! Via Lobste.rs

How to PyCon

Glyph’s tips on making the most out of PyCon. I particularly like his suggestion that “dinners are for old friends, but lunches are for new ones”.

I’m heading out to Pittsburgh tonight, and giving a keynote (!) on Saturday. If you see me there please come and say hi!

Via Lobste.rs


Ben Werdmüller

Former Far-Right Hard-Liner Says Billionaires Are Targeting Texas Public Education

"When Courtney Gore ran for a seat on her local school board in 2021, she warned about a movement to indoctrinate children with “leftist” ideology. After 2 1/2 years on the board, Gore said she believes a much different scheme is unfolding: an effort by wealthy conservative donors to undermine public education in Texas and install a voucher system in which public money flows t

"When Courtney Gore ran for a seat on her local school board in 2021, she warned about a movement to indoctrinate children with “leftist” ideology. After 2 1/2 years on the board, Gore said she believes a much different scheme is unfolding: an effort by wealthy conservative donors to undermine public education in Texas and install a voucher system in which public money flows to private and religious schools."

An interesting ProPublica story about the motivation behind some of the money that's funded these bizarre right-wing school board elections. It's not so much about the ideology as it is about undermining trust in public education itself, so that it can be replaced with a voucher system that would benefit the underwriters.

This quote says it all:

“It’s all about destroying the trust with the citizens to the point where they would tolerate something like doing away with public schools.” #Education

[Link]


Simon Willison

Quoting Bruce Schneier

But unlike the phone system, we can’t separate an LLM’s data from its commands. One of the enormously powerful features of an LLM is that the data affects the code. We want the system to modify its operation when it gets new training data. We want it to change the way it works based on the commands we give it. The fact that LLMs self-modify based on their input data is a feature, not a bug. And i

But unlike the phone system, we can’t separate an LLM’s data from its commands. One of the enormously powerful features of an LLM is that the data affects the code. We want the system to modify its operation when it gets new training data. We want it to change the way it works based on the commands we give it. The fact that LLMs self-modify based on their input data is a feature, not a bug. And it’s the very thing that enables prompt injection.

Bruce Schneier


@_Nat Zone

プロ責法あらため情プラ法が成立しました(5月10日)

わたしも構成員であった総務省プラットフォームに関する研究会での検討を受けて、この3月に国会に提出された改正「プロバイダ責任制限法」改め「情プラ法」が、木村花さんの4回目の命日(5月23日)を前に、5月10日、国会で成立しました。法律名が変更されたのは「これまでの投稿の発信者情報の…

わたしも構成員であった総務省プラットフォームに関する研究会での検討を受けて、この3月に国会に提出された改正「プロバイダ責任制限法」1改め「情プラ法」2が、木村花さんの4回目の命日(5月23日)を前に、5月10日、国会で成立しました。法律名が変更されたのは「これまでの投稿の発信者情報の開示等にとどまらない内容となったため」3です。

法案、その他の関係資料は以下のとおりです4

令和6年3月1日特定電気通信役務提供者の損害賠償責任の制限及び発信者情報の開示に関する法律の一部を改正する法律案概要【325 KB】
要綱【97 KB】
法律案・理由【160 KB】
新旧対照条文【254 KB】
参照条文【310 KB】
 (所管課室名)
総合通信基盤局電気通信事業部利用環境課(出所)総務省 <https://www.soumu.go.jp/menu_hourei/k_houan.html> (2024-05-15取得)

この改正は、近年のSNS等での権利侵害情報の流通による被害の増加を受け、プラットフォーム事業者の責任を強化し、被害者の保護を図ることを目的としています。大規模プラットフォーム事業者には削除等の義務が課され、違反には罰則が科されることになりました。

主な改正点は以下の通りです:

法律の題名と通称名が変更された。新しい題名は上記の通りで、通称名は「情報流通プラットフォーム対処法(情プラ法)」となった。 第5章として「大規模特定電気通信役務提供者の義務」が新設された。これにより、一定規模以上のプラットフォーム事業者に対し、権利侵害情報の流通防止措置等の義務が課されることになった。 第6章として「罰則」が新設された。義務に違反した場合の罰則規定が設けられた。 条文数が従来の19条から38条へと倍増した。 総務省プラットフォームに関する研究会について

総務省プラットフォームに関する研究会は平成30年(2018年)10月18日から足掛け6年、全52回にわたって開催されてきた研究会です。令和2年2月に最終報告書を出し、一旦は終了したはずだったのですが、木村花さんの誹謗中傷と自殺事件を受け、緊急再招集され、以後、言論の自由を含め様々座な角度から検討を進めてきました。その経緯は、プラットフォームに関する研究会の概要ページにある議事概要や以下のような文書から見ていただけると思います。

インターネット上の誹謗中傷への対応の在り方についての意見募集(令和2年7月3日) インターネット上の誹謗中傷への対応の在り方に関する緊急提言及び意見募集の結果の公表 中間とりまとめ(案)についての意見募集に関する報道発表(令和3年7月16日) 中間とりまとめ(案)についての意見募集の結果及び中間とりまとめの公表に関する報道発表(令和3年9月15日) 第二次とりまとめ(案)についての意見募集に関する報道発表(令和4年7月4日) 第二次とりまとめ(案)についての意見募集の結果及び第二次とりまとめの公表に関する報道発表(令和4年8月25日) 第三次とりまとめ(案)についての意見募集に関する報道発表(令和5年12月13日) 第三次とりまとめ(案)についての意見募集の結果及び第三次とりまとめの公表に関する報道発表(令和6年2月2日)

そして、本年、第52回(令和6年1月31日)をもって最終回となり、第三次とりまとめを発表し、クローズいたしました。今次法改正は、この第三次とりまとめがベースとなっています。

この間、総務省および事務局の方々には困難な議論を取りまとめていただき、法改正にまでつなげていただけたことに本当に感謝しております。また、微力ながらわたくしも本件に貢献できたことをとても嬉しく思っております。


Simon Willison

Quoting John Gruber

The MacBook Airs are Apple’s best-selling laptops; the iPad Pros are Apple’s least-selling iPads. I think it’s as simple as this: the current MacBook Airs have the M3, not the M4, because there isn’t yet sufficient supply of M4 chips to satisfy demand for MacBook Airs. — John Gruber

The MacBook Airs are Apple’s best-selling laptops; the iPad Pros are Apple’s least-selling iPads. I think it’s as simple as this: the current MacBook Airs have the M3, not the M4, because there isn’t yet sufficient supply of M4 chips to satisfy demand for MacBook Airs.

John Gruber


John Philpin : Lifestream

🔗 When the city told him to hide his boat, he complied — sor

🔗 When the city told him to hide his boat, he complied — sort of - The Times of London - sorry 🍎 📰 - just couldn’t resist. A fine example of creativity besting bureaucracy.

🔗 When the city told him to hide his boat, he complied — sort of - The Times of London - sorry 🍎 📰 - just couldn’t resist. A fine example of creativity besting bureaucracy.


Translating A Transcript

German Language Podcast. In 🍎 Podcasts - you can see the German language transcription. BUT - you can’t copy the whole thing. Looks like you can. But you can’t. I use software to record meetings. Ooh I thought. Play the podcast - and record - at end of podcast one german transcription. Run that through a translator and voila … or so I thought. I did an experiment - thankfully. What

German Language Podcast.
In 🍎 Podcasts - you can see the German language transcription.
BUT - you can’t copy the whole thing.
Looks like you can.
But you can’t.

I use software to record meetings.

Ooh I thought.

Play the podcast - and record - at end of podcast one german transcription. Run that through a translator and voila … or so I thought.

I did an experiment - thankfully.

What the software actually seems to have done is taken the german audio and decided as best it could as to what the german audio was saying - as if it was English.

This was the transcription ..

00:00 Here That’s an idea. 02:05 Like the fruits for real the presidential Here for learn have much of the jobs about completed here in the middle is also provided, you know. 02:50 Silicon Valley can manage to our design I am gonna Valley There are not only isn’t as operating system for startups on invert one in a ropa want to talk about he knows it happens startups of middle stunt or of course team. 03:15 What are all in this one entertain? 03:18 I was eating product on his product and from about yeah. 03:23 I know what linear ball over with design what is supply? 03:28 Put on my mobile phones and Seed talk follow-up disabled and antonym of familial torch and lived in solution in California and Silicon Valley and San Francisco on her off my shoulder and startup boat about under startups of cardboard and get a normal about Cosmopolis this year noises Venture on Cosmopolis is some Lisa has been a community have another Community developers and against Trump his own and hell for some spring on this fits here on this is another Simple so that’s called.

Back to the drawing board.

Tuesday, 14. May 2024

John Philpin : Lifestream

Elon Musk’s startup, xAI, is nearing a multiyear deal to s

Elon Musk’s startup, xAI, is nearing a multiyear deal to spend $10 billion to rent Oracle cloud servers 💬 The Information 10 BILLION. To RENT.

Elon Musk’s startup, xAI, is nearing a multiyear deal to spend $10 billion to rent Oracle cloud servers

💬 The Information

10 BILLION. To RENT.


Simon Willison

Context caching for Google Gemini

Context caching for Google Gemini Another new Gemini feature announced today. Long context models enable answering questions against large chunks of text, but the price of those long prompts can be prohibitive—$3.50/million for Gemini Pro 1.5 up to 128,000 tokens and $7/million beyond that. Context caching offers a price optimization, where the long prefix prompt can be reused between requests

Context caching for Google Gemini

Another new Gemini feature announced today. Long context models enable answering questions against large chunks of text, but the price of those long prompts can be prohibitive—$3.50/million for Gemini Pro 1.5 up to 128,000 tokens and $7/million beyond that.

Context caching offers a price optimization, where the long prefix prompt can be reused between requests, halving the cost per prompt but at an additional cost of $4.50 / 1 million tokens per hour to keep that context cache warm.

Given that hourly extra charge this isn’t a default optimization for all cases, but certain high traffic applications might be able to save quite a bit on their longer prompt systems.

It will be interesting to see if other vendors such as OpenAI and Anthropic offer a similar optimization in the future.

Via @officiallogank


llm-gemini 0.1a4

llm-gemini 0.1a4 A new release of my llm-gemini plugin adding support for the Gemini 1.5 Flash model that was revealed this morning at Google I/O. I'm excited about this new model because of its low price. Flash is $0.35 per 1 million tokens for prompts up to 128K token and $0.70 per 1 million tokens for longer prompts - up to a million tokens now and potentially two million at some point in th

llm-gemini 0.1a4

A new release of my llm-gemini plugin adding support for the Gemini 1.5 Flash model that was revealed this morning at Google I/O.

I'm excited about this new model because of its low price. Flash is $0.35 per 1 million tokens for prompts up to 128K token and $0.70 per 1 million tokens for longer prompts - up to a million tokens now and potentially two million at some point in the future. That's 1/10th of the price of Gemini Pro 1.5, cheaper than GPT 3.5 ($0.50/million) and only a little more expensive than Claude 3 Haiku ($0.25/million).


How developers are using Gemini 1.5 Pro’s 1 million token context window

How developers are using Gemini 1.5 Pro’s 1 million token context window I got to be a talking head for a few seconds in an intro video for today's Google I/O keynote, talking about how I used Gemini Pro 1.5 to index my bookshelf (and with a cameo from my squirrel nutcracker). I'm at 1m25s. (Or at 10m6s in the full video of the keynote)

How developers are using Gemini 1.5 Pro’s 1 million token context window

I got to be a talking head for a few seconds in an intro video for today's Google I/O keynote, talking about how I used Gemini Pro 1.5 to index my bookshelf (and with a cameo from my squirrel nutcracker). I'm at 1m25s.

(Or at 10m6s in the full video of the keynote)


Why your voice assistant might be sexist

Why your voice assistant might be sexist Given OpenAI's demo yesterday of a vocal chat assistant with a flirty, giggly female voice - and the new ability to be interrupted! - it's worth revisiting this piece by Chris Baraniuk from June 2022 about gender dynamics in voice assistants. Includes a link to this example of a synthesized non-binary voice. Via MetaFilter comment

Why your voice assistant might be sexist

Given OpenAI's demo yesterday of a vocal chat assistant with a flirty, giggly female voice - and the new ability to be interrupted! - it's worth revisiting this piece by Chris Baraniuk from June 2022 about gender dynamics in voice assistants. Includes a link to this example of a synthesized non-binary voice.

Via MetaFilter comment


The Pragmatic Engineer

Applied AI Software Engineering: RAG

Retrieval-Augmented Generation (RAG) is a common building block of AI software engineering. A deep dive into what it is, its limitations, and some alternative use cases. By Ross McNairn.

👋 Hi, this is Gergely with a subscriber-only issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. To get articles like this in your inbox, every week, subscribe:

Subscribe now

I recently spoke with Karthik Hariharan, who heads up engineering at VC firm Goodwater Capital, and he highlighted a trend he’d spotted:

“There’s an engineering project I’m seeing almost every startup building a Large Language Model (LLM) put in place: building their own Retrieval Augmentation Generation (RAG) pipelines.

RAGs are a common pattern for anyone building an LLM application. This is because it provides a layer of ‘clean prompts’ and fine-tuning. There are some existing open-source solutions, but almost everyone just builds their own, anyway.”

I asked a few Artificial Intelligence (AI) startups about this, and sure enough, all do build their own RAG. So, I reached out to a startup I know is doing the same: Wordsmith AI. It’s an AI startup for in-house legal teams that’s making heavy use of RAG, and was co-founded by Ross McNairn. He and I worked for years together at Skyscanner and he offered to share Wordsmith AI’s approach for building RAG pipelines, and some learnings. Declaration of interest: I’m an investor in Wordsmith, and the company has recently launched out of stealth.

Today, we cover:

Providing an LLM with additional context

The simplest RAGs

What is a RAG pipeline?

Preparing the RAG pipeline data store

Bringing it all together

RAG limitations

Real-world learnings building RAG pipelines

Today’s article includes a “code-along,” so you can build your own RAG. View the code used in this article at this GitHub repository: hello-wordsmith. To keep up with Ross, subscribe to his blog or follow him on LinkedIn.

With that, it’s over to Ross:

Introduction

Hi there! This post is designed to help you get familiar with one of the most fundamental patterns of AI software engineering: RAG, aka Retrieval Augmented Generation.

I co-founded a legal tech startup called Wordsmith, where we are building a platform for running a modern in-house legal team. Our founding team previously worked at Meta, Skyscanner, Travelperk and KPMG.

We are working in a targeted domain – legal texts – and building AI agents to give in-house legal teams a suite of AI tools to remove bottlenecks and improve how they work with the rest of the business. Performance and accuracy are key characteristics for us, so we’ve invested a lot of time and effort in how to best enrich and “turbo charge” these agents with custom data and objectives.

We ended up building our RAG pipeline, and I will now walk you through how we did it and why. We’ll go into our learnings, and how we benchmark our solution. I hope that the lessons we learned are useful for all budding AI engineers.

1. Providing an LLM with additional context

Have you ever asked ChatGPT a question it does not know how to answer, or its answer is too high level? We’ve all been there, and all too often, interacting with a GPT feels like talking to someone who speaks really well, but doesn’t know the facts. Even worse, they can make up the information in their responses!

Here is one example. On 1 February 2024, during an earnings call, Mark Zuckerberg laid out the strategic benefits of Meta’s AI strategy. But when we ask ChatGPT a question about this topic, this model will make up an answer that is high-level, but is not really what we want:

ChatGPT 3.5’s answer to a question about Meta’s AI strategy. The answer is generalized, and misses a critical source which answers the question

This makes sense, as the model’s training cutoff date was before Mark Zuckerberg made the comments. If the model had access to that information, it would have likely been able to summarize the facts of that meeting, which are:

“So I thought it might be useful to lay out the strategic benefits [of Meta’s open source strategy) here. (...)

The short version is that open sourcing improves our models. (...)

First, open-source software is typically safer and more secure as well as more compute-efficient to operate due to all the ongoing feedback, scrutiny and development from the community. (...)

Second, open-source software often becomes an industry standard. (...)

Third, open source is hugely popular with developers and researchers. (...)

The next part of our playbook is just taking a long-term approach towards the development.”

LLMs’ understanding of the world is limited to the data they’re trained on. If you’ve been using ChatGPT for some time, you might remember this constraint in the earlier version of ChatGPT, when the bot responded: “I have no knowledge after April 2021,” in several cases. 

Providing an LLM with additional information

There is a bunch of additional information you want an LLM to use. In the above example, I might have the transcripts of all of Meta’s shareholders meetings that I want the LLM to use. But how can we provide this additional information to an existing model?

Option 1: input via a prompt

The most obvious solution is to input the additional information via a prompt; for example, by prompting “Using the following information: [input a bunch of data] please answer the question of [ask your question].”

This is a pretty good approach. The biggest problem is that this may not scale because of these reasons:

The input tokens limit. Every model has an input prompt token limit. At the time of publication this is 4.069 tokens for GPT-3, 16,385 for GPT-3.5, 8,192 for GPT-4, 128,000 for GPT-4 Turbo, 200.000 for Anthropic models. Google’s Gemini model allows for an impressive one million token limit. While a million-token limit greatly increases the possibilities, it might still be too low for use cases with a lot of additional text to input.

Performance. The performance of LLMs substantially decreases with longer input prompts; in particular, you get degradation of context in the middle of your prompt. Even when creating long input prompts is a possibility, the performance tradeoff might make it impractical.

Option 2: fine-tune the model

We know LLMs are based on a massive weights matrix. Read more on how ChatGPT works in this Pragmatic Engineer issue. All LLMs use the same principles.

An option is to update these weight matrices based on additional information we’d like our model to know. This can be a good option, but it is a much higher upfront cost in terms of time, money, and computing resources. Also, it can only be done with access to the model’s weightings, which is not the case when you use models like ChatGPT, Anthropic, and other “closed source” models.

Option 3: RAG

The term ‘RAG’ originated in a 2020 paper led by Patrick Lewis. One thing many people notice is that “Retrieval Augmented Generation” sounds a bit ungrammatical. Patrick agrees, and has said this:

“We always planned to have a nicer-sounding name, but when it came time to write the paper, no one had a better idea.”

RAG is a collection of techniques which help to modify a LLM, so it can fill in the gaps and speak with authority, and some RAG implementations even let you cite sources. The biggest benefits of the RAG approach:

Give a LLM domain-specific knowledge You can pick what data you want your LLM to draw from, and even turn it into a specialist on any topic there is data about. 

This flexibility means you can also extend your LLMs’ awareness far beyond the model’s training cutoff dates, and even expose it to near-real time data, if available.

Optimal cost and speed. For all but a handful of companies, it's impractical to even consider training their own foundational model as a way to personalize the output of an LLM, due to the very high cost and skill thresholds. 

In contrast, deploying a RAG pipeline will get you up-and-running relatively quickly for minimal cost. The tooling available means a single developer can have something very basic functional in a few hours.

Reduce hallucinations. “Hallucination” is the term for when LLMs “make up” responses. A well-designed RAG pipeline that presents relevant data will all but eliminate this frustrating side effect, and your LLM will speak with much greater authority and relevance on the domain about which you have provided data.

For example, in the legal sector it’s often necessary to ensure an LLM draws its insight from a specific jurisdiction. Take the example of asking a model a seemingly simple question, like:

How do I hire someone?

Your LLM will offer context based on the training data. However, you do not want the model to extract hiring practices from a US state like California, and combine this with British visa requirements! 

With RAG, you control the underlying data source, meaning you can scope the LLM to only have access to a single jurisdiction’s data, which ensures responses are consistent.

Better transparency and observability. Tracing inputs and answers through LLMs is very hard. The LLM can often feel like a “black box,” where you have no idea where some answers come from. With RAG, you see the additional source information injected, and debug your responses.

2. The simplest RAGs

The best way to understand new technology is often just to play with it. Getting a basic implementation up and running is relatively simple, and can be done with just a few lines of code. To help, Wordsmith has created a wrapper around the LlamaIndex open source project to help abstract away some complexity. You can get up and running, easily. It has a README file in place that will get you set up with a local RAG pipeline on your machine, and which chunks and embeds a copy of the US Constitution, and lets you search away with your command line.

This is as simple as RAGs get; you can “swap out” the additional context provided in this example by simply changing the source text documents!

This article is designed as a code-along, so I'm going to link you to sections of this repo, so you can see where specific concepts manifest in code.

To follow along with the example, the following is needed:

An active OpenAI subscription with API usage. Set one up here if needed. Note: running a query will cost in the realm of $0.25-$0.50 per run.

Follow the instructions to set up a virtual Python environment, configure your OpenAI key, and start the virtual assistant.

This example will load the text of the US constitution from this text file, as a RAG input. However, the application can be extended to load your own data from a text file, and to “chat” with this data.

Here’s an example of how the application works when set up, and when the OpenAI API key is configured:

The example RAG pipeline application answering questions using the US Constitution supplied as additional context

If you’ve followed along and have run this application: congratulations! You have just executed a RAG pipeline. Now, let’s get into explaining how it works.

3. What is a RAG pipeline?

A RAG pipeline is a collection of technologies needed to enable the capability of answering using provided context. In our example, this context is the US Constitution and our LLM model is enriched with additional data extracted from the US Constitution document. 

Here are the steps to building a RAG pipeline:

 
Step 1: Take an inbound query and deconstruct it into relevant concepts
Step 2: Collect similar concepts from your data store
Step 3: Recombine these concepts with your original query to build a more relevant, authoritative answer.

Weaving this together:

A RAG pipeline at work. It extends the context an LLM has access to, by fetching similar concepts from the data store to answer a question

While this process appears simple, there is quite a bit of nuance in how to approach each step. A number of decisions are required to tailor to your use case, starting with how to prepare the data for use in your pipeline.

4. Preparing the RAG pipeline data store

Read more


John Philpin : Lifestream

📸 Sad

📸 Sad

📸 Sad


Hers what I’m thinking.Trump announced the Barron story .. a

Hers what I’m thinking.Trump announced the Barron story .. and Melania killed it.

Hers what I’m thinking.Trump announced the Barron story .. and Melania killed it.


📸 Early this morning. And then a bit later …. You got

📸 Early this morning. And then a bit later …. You got to stop taking them eventually. Right?

📸 Early this morning.

And then a bit later ….

You got to stop taking them eventually. Right?

Monday, 13. May 2024

Simon Willison

LLM 0.14, with support for GPT-4o

LLM 0.14, with support for GPT-4o It's been a while since the last LLM release. This one adds support for OpenAI's new model: llm -m gpt-4o "fascinate me" Also a new llm logs -r (or --response) option for getting back just the response from your last prompt, without wrapping it in Markdown that includes the prompt. Plus nine new plugins since 0.13!

LLM 0.14, with support for GPT-4o

It's been a while since the last LLM release. This one adds support for OpenAI's new model:

llm -m gpt-4o "fascinate me"

Also a new llm logs -r (or --response) option for getting back just the response from your last prompt, without wrapping it in Markdown that includes the prompt.

Plus nine new plugins since 0.13!


Just a Theory

Extension Summit Topic Review

Final Postgres Extension Mini-Summit! On Wednesday May 15 at noon Eastern / 16:00 UTC, we’ll review some of the topics that have come up in previous Mini-Summits with the goal to winnow down and select issues to address at PGConf.dev.

Boy howdy that went fast.

This Wednesday, May 15, the final Postgres extension ecosystem mini-summit will review topics covered in previous Mini-Summits, various Planet PostgreSQL posts, the #extensions channel on the Postgres Slack and the Postgres Discord. Following a brief description of each, we’ll determine how to reduce the list to the most important topics to take on at the Extension Ecosystem Summit at PGConf.dev in Vancouver on May 28. I’ll post a summary later this week along with details for how to participate in the selection process.

In the meantime, here’s the list as of today:

Metadata: Third-party dependencies Types of extensions Taxonomies System requirements (OS, version, CPU, etc.) Categorization Versioning Registry: Identity, namespacing, and uniqueness Distributed vs. centralized publishing Binary packaging and distribution patterns Federated distribution Services and tools to improve or build Stats, Reports, Badging: (stars, reviews, comments, build & test matrices, etc.) Packaging: Formats (e.g., tarball, OCI, RPM, wheel, etc.) Include dynamic libs in binary packaging format? (precedent: Python wheel) Build farming Platforms, architectures, and OSes Security, trust, and verification Developer: Extension developer tools Improving the release process Build pipelines: Supporting PGXS, prgx, Rust, Go, Python, Ruby, Perl, and more Community: Community integration: identity, infrastructure, and support How-Tos, tutorials, documentation for creating, maintaining, and distributing extensions Docs/references for different types of extensions: CREATE EXTENSION, hooks, background workers, CLI apps/services, web apps, native apps, etc. Core: Second extension directory (a.k.a. variable installation location, search path) Keeping all files in a single directory Documentation standard Inline extensions: UNITs, PACKAGEs, TLEs, etc. Minimizing restarts Namespacing Sandboxing, code signing, security, trust Dynamic module loading (e.g., use Thing in PL/Perl could try to load Thing.pm from a table of acceptable libraries maintained by the DBA) Binary compatibility of minor releases and/or /ABI stability

Is your favorite topic missing? Join us at the mini-summit or drop suggestions into the #extensions channel on the Postgres Slack.

More about… Postgres Yum PGConf Summit

Simon Willison

Hello GPT-4o

Hello GPT-4o OpenAI announced a new model today: GPT-4o, where the o stands for "omni". It looks like this is the gpt2-chatbot we've been seeing in the Chat Arena the past few weeks. GPT-4o doesn't seem to be a huge leap ahead of GPT-4 in terms of "intelligence" - whatever that might mean - but it has a bunch of interesting new characteristics. First, it's multi-modal across text, images an

Hello GPT-4o

OpenAI announced a new model today: GPT-4o, where the o stands for "omni".

It looks like this is the gpt2-chatbot we've been seeing in the Chat Arena the past few weeks.

GPT-4o doesn't seem to be a huge leap ahead of GPT-4 in terms of "intelligence" - whatever that might mean - but it has a bunch of interesting new characteristics.

First, it's multi-modal across text, images and audio as well. The audio demos from this morning's launch were extremely impressive.

ChatGPT's previous voice mode worked by passing audio through a speech-to-text model, then an LLM, then a text-to-speech for the output. GPT-4o does everything with the one model, reducing latency to the point where it can act as a live interpreter between people speaking in two different languages. It also has the ability to interpret tone of voice, and has much more control over the voice and intonation it uses in response.

It's very science fiction, and has hints of uncanny valley. I can't wait to try it out - it should be rolling out to the various OpenAI apps "in the coming weeks".

Meanwhile the new model itself is already available for text and image inputs via the API and in the Playground interface, as model ID "gpt-4o" or "gpt-4o-2024-05-13". My first impressions are that it feels notably faster than gpt-4-turbo.

This announcement post also includes examples of image output from the new model. It looks like they may have taken big steps forward in two key areas of image generation: output of text (the "Poetic typography" examples) and maintaining consistent characters across multiple prompts (the "Character design - Geary the robot" example).

The size of the vocabulary of the tokenizer - effectively the number of unique integers used to represent text - has increased to ~200,000 from ~100,000 for GPT-4 and GPT-3:5. Inputs in Gujarati use 4.4x fewer tokens, Japanese uses 1.4x fewer, Spanish uses 1.1x fewer. Previously languages other than English paid a material penalty in terms of how much text could fit into a prompt, it's good to see that effect being reduced.

Also notable: the price. OpenAI claim a 50% price reduction compared to GPT-4 Turbo. Conveniently, gpt-4o costs exactly 10x gpt-3.5: 4o is $5/million input tokens and $15/million output tokens. 3.5 is $0.50/million input tokens and $1.50/million output tokens.

(I was a little surprised not to see a price decrease there to better compete with the less expensive Claude 3 Haiku.)

The price drop is particularly notable because OpenAI are promising to make this model available to free ChatGPT users as well - the first time they've directly name their "best" model available to non-paying customers.

Tucked away right at the end of the post:

We plan to launch support for GPT-4o's new audio and video capabilities to a small group of trusted partners in the API in the coming weeks.

I'm looking forward to learning more about these video capabilities, which were hinted at by some of the live demos in this morning's presentation.


Ben Werdmüller

Facebook news referrals: no sign of the slow-down stopping

"Aggregate Facebook traffic to a group of 792 news and media sites that have been tracked by Chartbeat since 2018 shows that referrals to the sites have plunged by 58%." I'll bang this drum forever: establish direct relationships with your audience. Do not trust social media companies to be your distribution. That means through your website. That means through email.

"Aggregate Facebook traffic to a group of 792 news and media sites that have been tracked by Chartbeat since 2018 shows that referrals to the sites have plunged by 58%."

I'll bang this drum forever: establish direct relationships with your audience. Do not trust social media companies to be your distribution.

That means through your website.

That means through email.

That means through direct social like the fediverse.

It's long past time that media learned this and internalized it forever. #Media

[Link]


The SF Bay Area Has Become The Undisputed Leader In AI Tech And Funding Dollars

"Last year, more than 50% of all global venture funding for AI-related startups went to companies headquartered in the Bay Area, Crunchbase data shows, as a cluster of talent congregates in the region." In other news, water is wet. There was a moment during the pandemic when it looked like everyone was going to work remotely and there was an opportunity for startups to

"Last year, more than 50% of all global venture funding for AI-related startups went to companies headquartered in the Bay Area, Crunchbase data shows, as a cluster of talent congregates in the region."

In other news, water is wet.

There was a moment during the pandemic when it looked like everyone was going to work remotely and there was an opportunity for startups to be founded anywhere. I think that time has gone: the San Francisco Bay Area is once again the place to found any kind of technology startup.

Yes, there are always exceptions, but the confluence of community density, living conditions, universities, and mindset make for a perfect storm. NYC and London - and maybe Boston / Cambridge - are pretty good too, for what it's worth, but the sheer volume of startup activity in the area gives San Francisco the edge.

This is something I fought earlier in my career: my first startup was proudly founded in Scotland and largely run from England. I wish we'd just moved to San Francisco.

This isn't to completely sing the praises of the city: the cost of living is now astronomical, and there's a contingent of right-wing activists that seem to want to paint it as some doom spiraling hellhole, as if its progressive past isn't something to be proud of. But there is still beauty, there is still that can-do sense of adventure, and if I was founding something new, that's probably where I'd be. #Technology

[Link]


Simon Willison

Quoting Tim Paul

I’m no developer, but I got the AI part working in about an hour. What took longer was the other stuff: identifying the problem, designing and building the UI, setting up the templating, routes and data architecture. It reminded me that, in order to capitalise on the potential of AI technologies, we need to really invest in the other stuff too, especially data infrastructure. It would be ironi

I’m no developer, but I got the AI part working in about an hour.

What took longer was the other stuff: identifying the problem, designing and building the UI, setting up the templating, routes and data architecture.

It reminded me that, in order to capitalise on the potential of AI technologies, we need to really invest in the other stuff too, especially data infrastructure.

It would be ironic, and a huge shame, if AI hype sucked all the investment out of those things.

Tim Paul


Ben Werdmüller

The Philadelphia Inquirer is here to fight

SEPTA - the South Eastern Pennsylvania Transportation Authority — trains are covered with these ads for the Philadelphia Inquirer: I’m curious to know if they actually work. They feel very negative to me: a pot-shot at the New York Times rather than an argument for why the Inquirer is great in its own right. There’s an underlying assumption here that newspaper subscriptions are zero-sum: tha

SEPTA - the South Eastern Pennsylvania Transportation Authority — trains are covered with these ads for the Philadelphia Inquirer:

I’m curious to know if they actually work. They feel very negative to me: a pot-shot at the New York Times rather than an argument for why the Inquirer is great in its own right.

There’s an underlying assumption here that newspaper subscriptions are zero-sum: that each household will only receive one. Of course, most households aren’t even that: it’s increasingly rare for anyone to subscribe to a paper newspaper. But for digital subscriptions, I’d have assumed that it would be additional: households might subscribe to both the Inquirer and the Times (as well as a few other publications; maybe the New Yorker and Philadelphia Magazine).

Is their assumption right, or is mine? I don’t know. What I do know is that the ad feels combative and what I’m left with is the conflict rather than anything about the Inquirer’s own coverage. While there is definitely some anti-New York feeling among multi-generational Philadelphians, it feels like an odd choice.


Damien Bod

Using SonarCloud with ASP.NET Core, Angular and github actions

This article demonstrates how to implement code analysis and Static Application Security Testing (SAST) using SonarCloud and GitHub Actions. The solution involves building a secure web application with ASP.NET Core for the backend and an Angular UI for the frontend, following a backend-for-frontend security architecture. Both the ASP.NET Core (C#) codebase and the Angular (TypeScript […]

This article demonstrates how to implement code analysis and Static Application Security Testing (SAST) using SonarCloud and GitHub Actions. The solution involves building a secure web application with ASP.NET Core for the backend and an Angular UI for the frontend, following a backend-for-frontend security architecture. Both the ASP.NET Core (C#) codebase and the Angular (TypeScript and JavaScript) files undergo analysis during the quality build process with SonarCloud.

Code: https://github.com/damienbod/EndToEndSecurity

Solution Setup

The application uses an Angular UI implemented with NX tools and services. During deployment builds, the UI is integrated into the ASP.NET Core backend, while in development, it operates as a standalone component. Microsoft YARP facilitates the connection between Angular development and local ASP.NET Core APIs. The Angular application is an integral part of the backend system when deployed. Both technical stacks require code analysis and Static Application Security Testing (SAST). Additionally, the solution is secured as a single OpenID Connect confidential client, utilizing the code flow with Proof Key for Code Exchange (PKCE).

SonarCloud setup

I set up SonarCloud testing and integrated it with GitHub Actions using the repository from Marc Rufer.

https://github.com/rufer7/github-sonarcloud-integration

This references the docs from SonarCloud and all the steps required for setting up a build and analysis of the different technical stacks are documented.

ASP.NET Core project setup

To enable SonarCloud to analyze both the ASP.NET Core project and the Angular projects, you’ll need to make adjustments in the .NET Core csproj file settings. Specifically, the Angular components should be added as hidden elements so that SonarCloud can properly detect and analyze them.

<ItemGroup> <!-- This is required to include ts and js files in SonarCloud analysis --> <!-- Add to the sonar cloud build: EndToEndSecurity == github repo --> <!-- /d:sonar.projectBaseDir="D:\a\EndToEndSecurity\EndToEndSecurity" /d:sonar.exclusions=**/node_modules/** --> <!-- See https://docs.sonarsource.com/sonarqube/9.8/analyzing-source-code/scanners/sonarscanner-for-dotnet/#advanced-topics --> <Content Include="..\ui\**\*.ts" Visible="false"> <CopyToOutputDirectory>Never</CopyToOutputDirectory> </Content> <Content Include="..\ui\**\*.js" Visible="false"> <CopyToOutputDirectory>Never</CopyToOutputDirectory> </Content> </ItemGroup> Quality build

The SonarCloud github action YAML file implements the quality build. Normally this would be integrated with the default build, PRs and feature branches would run this. The dotnet testing tools are added but not active. The build uses a windows-latest image and java. When testing the code of the two technical stacks, you should ignore folders like node_modules and so on. This can be excluded in the YAML file. For this to work, the SonarCloud project must match the YAML file definitions. This is well documented in the Sonar documentation.

name: SonarCloud on: push: branches: - develop - main pull_request: types: [opened, synchronize, reopened] jobs: build: name: Analyze dotnet and Augular projects runs-on: windows-latest steps: - name: Set up JDK 17 uses: actions/setup-java@v4 with: java-version: 17 distribution: 'zulu' # Alternative distribution options are available. - uses: actions/checkout@v4 with: fetch-depth: 0 # Shallow clones should be disabled for a better relevancy of analysis - name: Cache SonarCloud packages uses: actions/cache@v4 with: path: ~\sonar\cache key: ${{ runner.os }}-sonar restore-keys: ${{ runner.os }}-sonar - name: Cache SonarCloud scanner id: cache-sonar-scanner uses: actions/cache@v4 with: path: .\.sonar\scanner key: ${{ runner.os }}-sonar-scanner restore-keys: ${{ runner.os }}-sonar-scanner - name: Install SonarCloud scanner if: steps.cache-sonar-scanner.outputs.cache-hit != 'true' shell: powershell run: | New-Item -Path .\.sonar\scanner -ItemType Directory dotnet tool update dotnet-sonarscanner --tool-path .\.sonar\scanner - name: Install dotnet-coverage # not used as not tests exist in backend shell: powershell run: | dotnet tool install --global dotnet-coverage - name: Build and analyze env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # Needed to get PR information, if any SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }} shell: powershell run: | .\.sonar\scanner\dotnet-sonarscanner begin /k:"damienbod_EndToEndSecurity" /o:"damienbod" /d:sonar.token="${{ secrets.SONAR_TOKEN }}" /d:sonar.host.url="https://sonarcloud.io" /d:sonar.projectBaseDir="D:\a\EndToEndSecurity\EndToEndSecurity" /d:sonar.exclusions=**/node_modules/** dotnet build .\Bff.sln --configuration Release # dotnet-coverage collect 'dotnet test .\src\--testproj--.Tests\--testproj--.Tests.csproj' -f xml -o 'coverage.xml' .\.sonar\scanner\dotnet-sonarscanner end /d:sonar.token="${{ secrets.SONAR_TOKEN }}" Badges

Badges from SonarCloud can be added directly in the readme file of the github repository. The badges are created in SonarCloud and I switched them to the overall results and not just the last delta. By clicking the badges in the readme, you are redirected to the SonarCloud test results.

SonarCloud is a great service for code quality analysis and has a good SAST implementation with very good access into the github UI and tools. Security alerts can be directly viewed in github.

Links

https://docs.sonarsource.com/sonarcloud/getting-started/github

https://github.com/rufer7/github-sonarcloud-integration

[HOWTO] Integrate SonarCloud analysis in an Azure DevOps YAML pipeline

https://community.sonarsource.com/t/code-coverage-report-for-net-not-working-on-linux-agent/62087

https://docs.sonarsource.com/sonarcloud/advanced-setup/ci-based-analysis/sonarscanner-for-net/#analyzing-languages-other-than-c-and-vb

https://andreiepure.ro/2023/08/20/analyze-web-files-with-s4net.html


Simon Willison

GPUs Go Brrr

GPUs Go Brrr Fascinating, detailed low-level notes on how to get the most out of NVIDIA's H100 GPUs (currently selling for around $40,000 a piece) from the research team at Stanford who created FlashAttention, among other things. The swizzled memory layouts are flat-out incorrectly documented, which took considerable time for us to figure out. Via Hacker News

GPUs Go Brrr

Fascinating, detailed low-level notes on how to get the most out of NVIDIA's H100 GPUs (currently selling for around $40,000 a piece) from the research team at Stanford who created FlashAttention, among other things.

The swizzled memory layouts are flat-out incorrectly documented, which took considerable time for us to figure out.

Via Hacker News

Sunday, 12. May 2024

Wrench in the Gears

Blessings

It’s mother’s day. Some get to celebrate with loved ones, while others navigate the bittersweet reality of bereavement or estrangement. My heart is with you all. If there’s one lesson I’ve learned these past few years it is the importance of love, even across distance, even when it may appear to be unreciprocated. The system [...]

It’s mother’s day. Some get to celebrate with loved ones, while others navigate the bittersweet reality of bereavement or estrangement. My heart is with you all. If there’s one lesson I’ve learned these past few years it is the importance of love, even across distance, even when it may appear to be unreciprocated. The system seeks to harness the power of mothers’ hearts through token engineering, but our hearts are wild and unruly. Our torus fields will not be tamed. 

Today I enjoyed many blessings. I am working to move forward from a place of gratitude, so I want to acknowledge them in this short post with a few pictures from Shrine Mont, Virginia. It is a resort maintained by the Anglican Church that includes several mineral springs that were sacred to the original people’s of the Shenandoah. At some point in the past year, I looked up who founded MIT. It turns out it was a Virginia geologist William Barton Rogers. Having studied the role of water as a solvent on minerals and overseen the state geological survey in 1835, Rogers would have been familiar with these springs. His survey was made three years after Andrew Jackson put the hot springs of Arkansas under federal ownership. In addition to the springs, the area also features a subtle energy vortex that you can read about here.

This is my second visit. I made a video of my first trip in the summer of 2022. 

Things for which I am grateful today: 

Friends of like mind who offer good conversation, a soft bed, hot shower, and laundry services to a wayward traveler.

Unconditional love from an affectionate dog.

A swift recovery from a debilitating migraine.

A sky of clear blue, full of natural, puffy clouds. 

Stands of purple and white woodland phlox.

A burbling brook with a nicely-spaced stone crossing. 

A labyrinth to walk. 

An abundance of tulip poplar buds.

Varied shades of pink in myrtle flowers and lady slipper orchids.

The sweet fragrance of locust tree blossoms.

A quiet moment to enjoy the view of white wooden buildings with dark bottle green trim beyond a pond ringed by last season’s cattails and trilling red-winged blackbirds.

 


Simon Willison

Parsing PNG images in Mojo

Parsing PNG images in Mojo It’s still very early days for Mojo, the new systems programming language from Chris Lattner that imitates large portions of Python and can execute Python code directly via a compatibility layer. Ferdinand Schenck reports here on building a PNG decoding routine in Mojo, with a detailed dive into both the PNG spec and the current state of the Mojo language. Via

Parsing PNG images in Mojo

It’s still very early days for Mojo, the new systems programming language from Chris Lattner that imitates large portions of Python and can execute Python code directly via a compatibility layer.

Ferdinand Schenck reports here on building a PNG decoding routine in Mojo, with a detailed dive into both the PNG spec and the current state of the Mojo language.

Via Hacker News


Heres Tom with the Weather


Ben Werdmüller

British newspaper groups warn Apple over ad-blocking plans, FT reports

"British newspaper groups have warned Apple that any move to impose a so-called "web eraser" tool to block advertisements would put the financial sustainability of journalism at risk, the Financial Times reported on Sunday." Counterpoint: block the ads. The web is designed to be a flexible platform that can be mixed and remixed however you need. One of the points of CSS

"British newspaper groups have warned Apple that any move to impose a so-called "web eraser" tool to block advertisements would put the financial sustainability of journalism at risk, the Financial Times reported on Sunday."

Counterpoint: block the ads.

The web is designed to be a flexible platform that can be mixed and remixed however you need. One of the points of CSS was that you could have your own styles for a site and they would supersede the interface that came out of the box.

Relying on ads is a race to the bottom. There are plenty of other ways to make money and build deeper relationships with your audience - many of which don't require paywalls or any invasive technology at all.

Ad technology profiles and tracks users; slows down websites; wastes energy; obliterates the user experience; and isn't even all that profitable. It's hard to square an organization that claims to be acting in the public interest advocating for them. #Media

[Link]


Simon Willison

About ARDC (Amateur Radio Digital Communications)

About ARDC (Amateur Radio Digital Communications) In ham radio adjacent news, here's a foundation that it's worth knowing about: ARDC makes grants to projects and organizations that are experimenting with new ways to advance both amateur radio and digital communication science. In 1981 they were issued the entire 44.x.x.x block of IP addresses - 16 million in total. In 2019 they sold a qu

About ARDC (Amateur Radio Digital Communications)

In ham radio adjacent news, here's a foundation that it's worth knowing about:

ARDC makes grants to projects and organizations that are experimenting with new ways to advance both amateur radio and digital communication science.

In 1981 they were issued the entire 44.x.x.x block of IP addresses - 16 million in total. In 2019 they sold a quarter of those IPs to Amazon for about $100 million, providing them with a very healthy endowment from which they can run their grants program!


Ben Werdmüller

Abortion bans drive away young talent: New CNBC/Generation Lab survey

"The youngest generation of American workers is prepared to move away from states that pass abortion bans and to turn down job offers in states where bans are already in place, a new survey from CNBC/Generation Lab finds." This stands to reason: why would you move to a place where government wants to control what you do with your body? Whether you have a uterus or not, cari

"The youngest generation of American workers is prepared to move away from states that pass abortion bans and to turn down job offers in states where bans are already in place, a new survey from CNBC/Generation Lab finds."

This stands to reason: why would you move to a place where government wants to control what you do with your body? Whether you have a uterus or not, caring for the well-being of people who do is obvious. And all the societal overreach and Handmaid's Tale overtones affect everybody.

I'm interested to see how this affects those locations over time. Of course, there are other implications of this legislation, too: it's likely to be one of the major drivers for voters in November. #Business

[Link]


Aaron Parecki

FedCM for IndieAuth

IndieWebCamp Düsseldorf took place this weekend, and I was inspired to work on a quick hack for demo day to show off a new feature I've been working on for IndieAuth.

IndieWebCamp Düsseldorf took place this weekend, and I was inspired to work on a quick hack for demo day to show off a new feature I've been working on for IndieAuth.

Since I do actually use my website to log in to different websites on a regular basis, I am often presented with the login screen asking for my domain name, which is admittedly an annoying part of the process. I don't even like having to enter my email address when I log in to a site, and entering my domain isn't any better.

So instead, I'd like to get rid of this prompt, and let the browser handle it for you! Here's a quick video of logging in to a website using my domain with the new browser API:

So how does this work?

For the last couple of years, there has been an ongoing effort at the Federated Identity Community Group at the W3C to build a new API in browsers that can sit in the middle of login flows. It's primarily being driven by Google for their use case of letting websites show a Google login popup dialog without needing 3rd party cookies and doing so in a privacy-preserving way. There's a lot to unpack here, more than I want to go into in this blog post. You can check out Tim Cappalli's slides from the OAuth Security Workshop for a good explainer on the background and how it works.

However, there are a few experimental features that are being considered for the API to accommodate use cases beyond the "Sign in with Google" case. The one that's particularly interesting to the IndieAuth use case is the IdP Registration API. This API allows any website to register itself as an identity provider that can appear in the account chooser popup, so that a relying party website doesn't have to list out all the IdPs it supports, it can just say it supports "any" IdP. This maps to how IndieAuth is already used today, where a website can accept any user's IndieAuth server without any prior relationship with the user. For more background, check out my previous blog post "OAuth for the Open Web".

So now, with the IdP Registration API in FedCM, your website can tell your browser that it is an IdP, then when a website wants to log you in, it asks your browser to prompt you. You choose your account from the list, the negotiation happens behind the scenes, and you're logged in!

One of the nice things about combining FedCM with IndieAuth is it lends itself nicely to running the FedCM IdP as a separate service from your actual website. I could run an IndieAuth IdP service that you could sign up for and link your website to. Since your identity is your website, your website would be the thing ultimately sent to the relying party that you're signing in to, even though it was brokered through the IdP service. Ultimately this means much faster adoption is possible, since all it takes to turn your website into a FedCM-supported site is adding a single <link> tag to your home page.

So if this sounds interesting to you, leave a comment below! The IdP registration API is currently an early experiment, and Google needs to see actual interest in it in order to keep it around! In particular, they are looking for Relying Parties who would be interested in actually using this to log users in. I am planning on launching this on webmention.io as an experiment. If you have a website where users can sign in with IndieAuth, feel free to get in touch and I'd be happy to help you set up FedCM support as well!


Kent Bull

KERI Internals Part 1: Concurrency and Async Programming with the HIO Async Framework and I/O Library

Welcome to the inner workings of the Python implementation of KERI! HIO stands for Hierarchical IO. Disclaimer: this post is for a technical audience who have a need to read and understand the WebOfTrust Python implementation of the KERI, ACDC, and CESR Trust over IP (ToIP) specifications. Have you ever […]

Welcome to the inner workings of the Python implementation of KERI! HIO stands for Hierarchical IO.

Disclaimer: this post is for a technical audience who have a need to read and understand the WebOfTrust Python implementation of the KERI, ACDC, and CESR Trust over IP (ToIP) specifications.

Have you ever wanted to contribute to the KERI ecosystem and been curious about the way the Python implementations are written? Or have you wanted to build on top of or modify the Python KERI codebase?

Not to worry, this technical series on KERI internals has your back. You will be ready to read through, understand, and build on top of the Python KERI code base once you understand the HIO async runtime, the focus of this article.

You are wanted as a contributor to the KERI ecosystem! The goal of this article is to assist you in becoming either a contributor to the Python implementation of KERI & ACDC or an advanced user of said implementation.

HIO Introduction

HIO is an asynchronous runtime and input/output (IO) framework written by Dr. Samuel Smith that supports cooperative multitasking. It is used throughout the Python implementation of the KERI suite of protocols.

This article serves as an introduction to the three primary classes composing the basis for HIO’s asynchronous runtime and as the lifecycle context functions for the main task class, the Doer. Additionally, you will have an idea of how these concepts relate to similar concepts in Python’s AsyncIO runtime. The three HIO classes include:

the Doist, the root scheduler, the DoDoer, the heirarchical container of Doer and DoDoer instances Doer, the core task concept in HIO.

Due to its nature as the asynchronous runtime engine, HIO is found at the heart of the core Python libraries in the WebOfTrust ecosystem including the core library KERIpy, the agent server KERIA, and the SignifyPy client companion to KERIA.

In order to understand the purpose of three classes mentioned above and how they compare to Python’s AsyncIO it is important to clarify terminology around concurrent and asynchronous programming in Python. As Python’s async/await is much more common and familiar than HIO this article starts there to introduce the concepts.

Why is HIO used in KERIpy, KERIA, and SignifyPy?

Performance, control, and features, at a high level, are the reason why HIO was used for KERIpy. HIO’s use of what are called “classic coroutines” and asynchronous buffers for I/O provide a level of control and performance that is difficult to achieve with Python’s AsyncIO implementation. An API into the timing system used for the event loop and scheduler provide tight, deterministic control over scheduling order of tasks.

A future article will go deeper than this short overview into the argument for using HIO and what specifically sets it apart from other async frameworks like AsyncIO, Curio, and Trio.

Async Framework Short Comparison

An asynchronous framework typically consists of a number of major abstractions including an event loop, task or coroutine, scheduler, queues for communicating between tasks, futures, callbacks, non-blocking I/O, synchronization primitives (locks, semaphores), timeouts and cancellation, and some notion of lifecycle for tasks. This article focuses specifically on the event loop, scheduler, and task abstractions in HIO and Python’s AsyncIO.

Cooperative Multitiasking

Both HIO and AsyncIO allow you to accomplish what is called “cooperative multitasking” which is where each coroutine yields control to a central scheduler so that other coroutines can be activated for their next execution. In AsyncIO the scheduler is the asyncio event loop and a coroutine is any function declared with the async def syntax. In HIO the scheduler is the Doist class and the coroutine is the Doer class.

Concurrency and parallelism in Python

When discussing concurrency or asynchronous programming it is important to distinguish between what is typically meant by concurrency and parallelism.

Concurrency is about dealing with lots of things at once.

Parallelism is about doing lots of things at once.

Not the same, but related.
One is about structure, one is about execution.

Concurrency provides a way to structure a solution to solve a problem that may (but not necessarily) be parallelizable.

— Rob Pike, co-inventor of the Go language

Parallelism is a special case of concurrency. In Python threading, multiprocessing, and asyncio are the core packages for concurrent programming. In this post we only address the asyncio package, which supports what are called native coroutines.

Python’s AsyncIO package Native coroutines – async/await

A native coroutine is any function that, as mentioned earlier, uses the async def syntax to define a function, introduced with PEP-492 in Python 3.5 (2015). Calling an async def function does not automatically execute the code in the function. To execute the code the await keyword must be used when calling the function. This instructs the asyncio event loop to schedule execution of the function.

import asyncio # Native coroutine - uses the "async def" syntax to define a function async def print_after(delay, what): await asyncio.sleep(delay) print(what) # An asyncio-compatible main function async def main(): print(f"started at {time.strftime('%X')}") await print_after(1, 'hello') await print_after(2, 'world') print(f"finished at {time.strftime('%X')}") # The asyncio task scheduler - uses the default asyncio event loop asyncio.run(main())

In Python the asyncio package provides the run function where you can run the default event loop and task scheduler with asyncio.run(my_main_coroutine()).

The image below illustrates how coroutines, the core task concept in asyncio, are run in the AsyncIO framework.

You have your program, the Python process, that sends tasks to the asyncio event loop with either an explicit call to asyncio.create_task() or use the await keyword to schedule a task in the asyncio event loop and wait for its completion within the body of the function that used the await keyword.

AsyncIO can be difficult to use correctly though it is usually easy to recognize due to most library authors targeting asyncio mark their async functions with async def. There is also the @types.coroutine annotation to make an existing generator function compatible with native coroutines. In order to use AsyncIO and get the performance benefits of using asyncio your whole program has to embrace the AsyncIO paradigm, meaning you use async def native coroutines for anything that does I/O or long-running tasks and you use await to schedule all coroutines.

Summary of AsyncIO

AsyncIO has a few main concepts for the async runtime, the asyncio event loop and an async def function as a coroutine. These basic concepts map nicely onto the HIO concepts of a Doist, the root scheduler in HIO, and the Doer, the coroutine or task concept in HIO. The main point where AsyncIO and HIO differ are that HIO has an explicit construct for creating hierarchies of tasks, the DoDoer. While there is no explicit construct in AsyncIO for a DoDoer any async/await coroutine could be considered to be a container for other coroutines.

Combining AsyncIO and HIO

Though asyncio native coroutines are not used at all in HIO the two approaches are compatible and composable. You can schedule AsyncIO tasks from a HIO task (a Doer) and you can also schedule a HIO task from an asyncio coroutine.

Yet first we must describe what HIO is. The subject of combining AsyncIO and HIO will be covered in a future article. This article is a short introduction to the three main classes of HIO’s async runtime implementation.

What is HIO?

HIO stands for Hierarchical IO. The README [1] describes it as weightless, hierarchical asynchronous coroutines and I/O in Python. This means that the task abstractions in HIO allow for nesting subtasks within tasks. HIO has three primary classes that make up its asynchronous runtime:
1. the Doist, or the root scheduler,
2. the DoDoer, a container holding either other DoDoer instances or Doer instances allowing you to create task hierarchies,
3. the Doer, the basic task or coroutine construct in HIO.

HIO makes heavy use of what are now known as “classic coroutines“[2] where the scheduler uses the my_coro.send(data) function to send data into a Python generator function. This generator function is the classic coroutine.

A few keywords distinguish classical coroutines including:

yield: used to pause execution of a coroutine (generator function), send a value out of a coroutine, and receive a value into a coroutine. yield from: used when nesting generators (inner generators) to pause execution of the outer generator and pass, or delegate, conrol to a sub-generator. Once the sub-generator completes then control is passed back to the outer generator. The yield from keyword is very similar to the await keyword from AsyncIO. Both drive sub-generators and both allow consumption of values returned by sub-generators. await does not completely replace yield from because await must be used inside a native coroutine and must be used with an awaitable object. yield from can be used in any function and with any iterable.

The yield keyword used in the body of a Python generator function allows it to receive values from the my_coro.send() function, similar to how Erlang/Elixir use the OTP to pass messages between processes with send and receive. The Python my_coro.send(data) is the “send” and the myvar = yield from invocation is the “receive.” And the yield from keyword used in the body of a classic coroutine allows delegating to, or transferring execution to, a nested or sub-generator.

This classic coroutine approach HIO uses is grounded in structured concurrency where there are clear entry and exit points to tasks, errors in concurrently executing tasks propagate up the task chain, and clear expression of control flow within the structure of source code despite the presence of concurrency. The context methods of a HIO Doer task provide the clear entry and exit points as well as a clear exception handling mechanism.

Overview

The root scheduler, the Doist, processes an array of Doer and DoDoer tasks. The DoDoer is the hierarchical task concept, and the Doer is the core task concept as shown below in the diagram.

Your program, the Python process, runs the Doist and the Doist runs the list of tasks until they finish or the program is terminated.

# from github.com/WebOfTrust/keripy/src/keri/app/cli/directing.py # module: keri.app.cli.directing # receives a list of tasks for the scheduler to run def runController(doers, expire=0.0): """ Utility Function to create doist to run doers """ tock = 0.03125 # creates the Doist, the root scheduler doist = doing.Doist(limit=expire, tock=tock, real=True) # adds tasks to the Doist to run. Calling "do" runs the Doist doist.do(doers=doers)

Here is a code example of creating an array of doers to pass to the root scheduler, the Doist, from KERIpy. This runWitness function shows the set of tasks that must be created in order to run a KERIpy witness.

# from github.com/WebOfTrust/keripy/src/keri/app/cli/commands/witness/start.py # module: keri.app.cli.commands.witness # Function used by the CLI to run a single basic witness def runWitness(name="witness", base="", alias="witness", bran="", tcp=5631, http=5632, expire=0.0): """ Setup and run one witness """ ks = keeping.Keeper(name=name, base=base, temp=False, reopen=True) aeid = ks.gbls.get('aeid') if aeid is None: hby = habbing.Habery(name=name, base=base, bran=bran) else: hby = existing.setupHby(name=name, base=base, bran=bran) hbyDoer = habbing.HaberyDoer(habery=hby) # setup doer doers = [hbyDoer] # list of tasks # extends the task list with the tasks from indirecting.setupWitness doers.extend(indirecting.setupWitness(alias=alias, hby=hby, tcpPort=tcp, httpPort=http)) # calls the Doist root scheduler with a list of tasks directing.runController(doers=doers, expire=expire)

This function creates a few tasks to be run and hands them off to the Doist scheduler with directing.runController. The scheduler then runs the tasks to completion, or infinitely, depending on the contents of the recur function shown below in the Doer.

HIO Task – a Doer

The core task concept in HIO is expressed as the Doer class shown in the UML diagram below. The HIO scheduler, a Doist, runs the Doer task until the .done attribute becomes True. There are six context functions five of which are executed over the lifecycle of the task including enter, recur, clean, close, and exit. The abort function is only called when a task is cancelled or an exception is raised.

HIO Scheduler – the Doist

At the top of the execution hierarchy in the HIO library you find the Doist class, the root scheduler of all task instances, or Doer instances. The generator returned from invoking a Doer is called a “deed” and is handed over to the Doist function. The Doist shown below has a list of deeds that are these generator functions, classic coroutines, that it runs when the Doist is executed.

To run a Doist you invoke the .do function on the Doist as shown below in a test adapted from HIO.

def test_doist_doers(): """ Test doist.do with .close of deeds """ tock = 0.03125 doist = doing.Doist(tock=tock) # creates a Doer, an example doer doer0 = doing.ExDoer(tock=tock, tymth=doist.tymen()) # creates a Doer, an example doer doer2 = doing.ExDoer(tock=tock, tymth=doist.tymen()) doers = [doer0, doer1] doist.do(doers=doers) # run the Doist assert doer0.done == True Context Functions

The six context functions in the Doer are run by the enter and exit functions of the Doist as well as the do function of the Doer. Each of these functions serve as a lifecycle hook for a different time in the execution of the Doer. The .do function reproduced below shows where each context function is executed after calling Doer.do. Take special notice of the while loop inside of the try/except block. This is the loop that continues to run the body of the Doer, the function or generator that does the work of the Doer.

# from github.com/ioflo/hio/src/hio/base.doing.py class Doer(tyming.Tymee): ... def do(self, tymth, *, tock=0.0, **opts): """ Generator method to run this doer. Calling this method returns generator. Interface matches generator function for compatibility. To customize create subclasses and override the lifecycle methods: .enter, .recur, .exit, .close, .abort Parameters: tymth is injected function wrapper closure returned by .tymen() of Tymist instance. Calling tymth() returns associated Tymist .tyme. tock is injected initial tock value args is dict of injected optional additional parameters """ try: # enter context self.wind(tymth) # update tymist dependencies self.tock = tock # set tock to parameter self.done = False # allows enter to override completion state self.enter() # (1) first context function, enter #recur context if isgeneratorfunction(self.recur): # .recur is generator method self.done = yield from self.recur() # (2) recur context delegated, second context function else: # .recur is standard method so iterate in while loop while (not self.done): # recur context tyme = (yield (self.tock)) # yields .tock then waits for next send self.done = self.recur(tyme=tyme) # (2) second context function, recur except GeneratorExit: # close context, forced exit due to .close self.close() # (3) third context function, close except Exception as ex: # abort context, forced exit due to uncaught exception self.abort(ex=ex) # (4) fourth context function, abort raise else: # clean context self.clean() # (5) fifth context function, clean finally: # exit context, exit, unforced if normal exit of try, forced otherwise self.exit() # (6) sixth context function, exit # return value of yield from or StopIteration.value indicates completion return self.done # Only returns done state if normal return not close or abort raise

In the normal execution of a Doer the .do() function calls, in this order, enter, recur, clean, and then exit. The close context function is only executed when it is explicitly called by some higher level construct such as a DoDoer or the Doist scheduler itself.

In an error case, or abnormal execution of a Doer, the abort context function is called. This can also be called as a part of normal execution of a program to catch a shutdown signal to instruct a DoDoer or a Doist to perform a graceful shutdown.

HIO DoDoer – where task hierarchies are defined

This post touches lightly on DoDoers to say that the DoDoer provides hierarchical task management which means you can nest tasks for a clear hierarchy of task execution for groups of tasks. A future article will detail the definition and usage of the DoDoer.

AsyncIO vs HIO – How do they compare?

Classic coroutines are very powerful constructs that provide a richer control flow construct as compared to AsyncIO’s async def coroutine construc. This is because you can use any number of yield or yield from statements in the body of a classic coroutine, which provides you with the ability to custom-fit the execution of a generator-based coroutine to your specific use case. The async/await syntax does a similar thing for you, yet with a standard syntax that you cannot customize.

With HIO you can also repeatedly accept information into a classic coroutine instance through the yield from syntax. The fact that classic coroutines are just generator functions means you have full control over iteration of that generator, and all of it’s contained state including any state it has closed over, from an async context with all the power of Python iterators.

For example, you could run a classic coroutine any arbitrary number of times within a custom scheduler depending on special rules and have fine-grained access to what is sent into the coroutine with the .send() function.

Yet with this additional power comes the potential to have complicated and hard to understand control flow. It is understandable why there would be so much support in the Python community for a simpler, less powerful syntax, which is what async/await is. The linked article[2] from Luciano Ramalho goes in depth on the features of both classic coroutines and Python’s AsyncIO.

Wrap up and Next Steps

This article focused on the “what” of the async framework side of HIO, specifically the three primary classes at the core of the async runtime in HIO, the Doist scheduler, DoDoer hierarchical task container, and the Doer task class. The raw power of classic coroutines significantly influenced the decision to use them in HIO as well as in KERIpy, KERIA, and SignifyPy. Yet, this is not an either-or, all-or-nothing situation. You can use HIO and AsyncIO together.

Major topics not covered in this article that are important to understand HIO include details of the DoDoer and the network and file I/O capabilities of the HIO package.

Future articles will delve deeper into the “why” of HIO, the rationale behind HIO, how and when to use it properly, as well as how to use HIO and AsyncIO together. To gain a deeper understanding of HIO one of your next steps would be to read some of the tests in the HIO source code repository, specifically the test_doist_once and test_nested_doers tests.

References

[1] S. Smith, “hio/README.md at main · ioflo/hio,” GitHub, Aug. 21, 2021. https://github.com/ioflo/hio/blob/main/README.md (accessed May 09, 2024).

[2] L. Ramalho, “Classic Coroutines,” Fluent Python, the lizard book, Apr. 2022. https://www.fluentpython.com/extra/classic-coroutines (accessed May 11, 2024).

[3] Real Python, “Async IO in python: A complete walkthrough,” Real Python, https://realpython.com/async-io-python (accessed May 9, 2024).

Saturday, 11. May 2024

Ben Werdmüller

The itch

I’m really itching to build something new again. Not a new widget or open source project, but a new service. Something that makes peoples’ lives better. I love startups. And the ideas are brewing.

I’m really itching to build something new again.

Not a new widget or open source project, but a new service. Something that makes peoples’ lives better.

I love startups. And the ideas are brewing.


Werdmüller on Medium

Think twice before exercising your stock options

Startup options and when they’re worth it Continue reading on Medium »

Startup options and when they’re worth it

Continue reading on Medium »


Ben Werdmüller

Think twice before exercising your stock options

I recently wrote a short aside about stock options: But in general, for regular employees, I think options are rarely worth it. They typically require an up-front investment that many employees simply can’t make, so it’s a bit of a fake benefit to begin with, and their future value is little more certain than a lottery ticket. Hunter Walk kindly reshared it on a few networks with some

I recently wrote a short aside about stock options:

But in general, for regular employees, I think options are rarely worth it. They typically require an up-front investment that many employees simply can’t make, so it’s a bit of a fake benefit to begin with, and their future value is little more certain than a lottery ticket.

Hunter Walk kindly reshared it on a few networks with some of his own thoughts; a conversation with Tony Stubblebine arose in the comments that Hunter wrote up as its own post. In particular, he says it helped him articulate the ups and downs of private stock to the average person:

For much of a startup’s life new FUNDING VALUATIONS are LEADING indications of POTENTIAL. They are what someone is willing to pay for shares today based on what they believe the company CAN DO in the FUTURE.

DOWN ROUNDS and RECAPS are LAGGING indications of PERFORMANCE. They are what someone is willing to pay for shares today based upon what the company HAS DONE in the PAST.

It’s a great post, and the comments from Tony were thoughtful. Which led me to feeling a bit bad about how flippant and imprecise my original post had been.

So, on that note, I’d love to define options, make some corrections, and dive a little deeper into my core argument.

The ins and outs of options

First, let’s define options and explain why they’re so common as a factor of startup compensation.

An option is the right to buy a specified number of shares in a company at a specific price. That price is typically defined by an external auditor. It’s good practice for this to happen once a year, but it’ll also be triggered when the company raises a round of equity funding (i.e., when it sells shares to outside investors in order to raise significant capital).

If a startup were simply to grant stock directly to employees, it would be taxable as compensation. Options are almost always non-taxable at the point where they are issued, so they’re a favorite way to give employees the ability to see some of the potential upside in a venture.

Typically in a startup you’ll receive an option grant as part of your compensation package. So, for example, you might receive the right to buy (“exercise”) 40,000 shares at 50 cents a share (the “strike price”). This is almost always on what’s called a vesting schedule: you won’t be able to buy any shares in the first year, but then when you cross that threshold (the “cliff”), you’ll be able to buy 25% of your allocation (the first 10,000 shares in my example). Over the next three years, the amount of your allocation that you can exercise will increase proportionally, until you can buy them all at the end of four years.

If you leave the company, you usually only have 90 days to exercise whichever options have vested. Some particularly progressive companies extend that exercise window — sometimes to a couple of years. But for 80-90% of startups, it’s 90 days.

If the startup is excited about keeping you, you may find that they’ll grant you more options periodically, each with their own vesting schedules. This, they hope, will keep you at the company.

In my example above, you might have done the math to realize: 40,000 shares at 50 cents a share is $20,000. You would need to lay out that amount of money to acquire the shares — and you need to hope that the company’s shares increase in value in order to see any upside.

If the company’s share price has increased in the time between the options were granted and when the employee exercises them, the difference is taxable. In the above example, recall that my options are for 40,000 shares at 50 cents a share. Let’s say I choose to exercise them all at the end of my four year vesting period: as we’ve discussed, I pay $20,000. But let’s say that the real fair market value has risen to 75 cents a share. The difference between 40,000 shares at 50 cents and 40,000 shares at the market value of 75 cents is $10,000 is usually taxed as income. So I’m actually paying $20K + income tax on another $10K. (This isn’t by any means the full extent of potential tax implications; I’m not going to touch ISOs and AMT in this post, for example.)

Early employees, who join before most funding rounds have taken place, will receive options with a very low exercise price. Later employees will usually receive options with a higher price, because more growth and fundraising has taken place in the interim. (Down rounds and recaps are certainly possible, though: many startups go through tough times where their valuation decreases. Not every graph always goes up and to the right.)

In both cases, any stock they buy is largely illiquid. Because the startup is likely a private company rather than a publicly traded one, their shares are not liquid. They will need to wait for the company to go public or hope that management will allow them to trade their stock on the secondary market.

Some corrections

So the first thing to say is: no, options are not really like a lottery ticket. They are a sort of gamble, but it’s one where (depending on your position, seniority, and what size the company was when you joined) you have a say in the outcome.

The second, which I’ve already corrected in the original post is: as Hunter pointed out in his post, a recap is not the thing that actually lowers the stock price. It’s a trailing signal of what the company has already done. A change in stock price is an effect of what has already happened.

And a clarification: options don’t require an up-front investment at the time that they’re granted. You invest at the time when you exercise them, which may still be as a lump sum.

Why I think exercising options isn’t worth it for many employees

If you’re on a rocket ship startup, exercising your options is almost certainly worth it (depending on the strike price of your particular options grant). The problem is: how do you know you’re on a rocket ship? Or, given that most startup employees won’t be part of a startup with hockey-stick growth, how can you be reasonably sure that your company will grow in such a way that exercising your options is worth it?

90% of startups fail. That doesn’t mean that every startup has an equal 1 in 10 chance of success: a lot depends on a range of factors that include internal culture, management expertise, execution quality, and market conditions. Still, there is not a small amount of luck involved. Most startups won’t make it.

You should never make an investment that you can’t afford to lose. As Hunter says in his post:

Don’t behave as if they’re worth anything until they actually are

Don’t over-extend yourself to exercise [options] in scenarios which put your financial well-being at risk.

If you’re obviously, unquestionably on a rocket ship: by all means, buy the options. (Yes, sometimes it really is obvious.)

If it’s not clear that you’re on a rocket ship, but you’re feeling good about the startup, and you can definitely afford to spend the money it would take to exercise your options: knock yourself out. Honestly, I don’t really care what people with wealth do in this scenario. My worries do not relate to you.

If it’s not clear that you’re on a rocket ship and spending the money to exercise your options would be a stretch: I would suggest you think twice before doing so. I also would warn you to never take out debt (which many startup employees do!) in order to exercise your options.

And that’s really the crux of my argument.

Startup employees without significant independent spending power who work for a venture with an uncertain future and who did not join their ventures at a very early stage — which I would argue describes most startup employees — should think long and hard before exercising their options.

It’s more than a little bit unfair that the people who can most easily realize upside from the startups they work for are people who already have wealth. Granting the ability for employees to buy shares directly at their fair market value is limited, too: this would make them investors, who the SEC says mostly need to be accredited. The definition of accreditation is either being a licensed investor, earning over $200,000 a year for the last two years, or having a net worth of over a million dollars excluding the value of their home. So the door is effectively closed to people from regular backgrounds.

I wish more equitable systems were commonly in use. Some different tactics are in use, which include:

Restricted Stock Units. Here, stock is granted directly as part of an employee’s compensation. Upside: the employee has the shares. Downside: they’re taxed on them as soon as they vest, and selling them is restricted. So the employee effectively receives an additional tax bill with no way of recouping the lost funds until much later (if they’re lucky). RSUs are common in later-stage companies but very uncommon in riskier, early-stage companies for this reason. Phantom stock. Really this is a bonus plan tied to stock performance, income tax and all. Profit sharing. Which is only useful if the startup makes a profit (most don’t).

While some have value in their own right in particular contexts, I see them as compensation strategies that might sit alongside stock options, rather than replacing them.

I would love it to be less risky for the employees who are actually doing the work of making a startup valuable to see more of the upside of that work. But, at least for now, my advice remains to take those inflated Silicon Valley salaries and bank them in more traditional investments.


Timothy Ruff

KERI 101: Witnesses, Watchers, & Detectability

The “oil lights” of cryptographic key management. TL;DR KERI is an open, decentralized protocol for discovery, recovery, and management of cryptographic keys. KERI enables anyone to digitally sign digital things and to verify what others have signed, without the need for shared networks, blockchains, registries, platforms, or identity providers. KERI solves seven hard problems of key managemen
The “oil lights” of cryptographic key management. TL;DR KERI is an open, decentralized protocol for discovery, recovery, and management of cryptographic keys. KERI enables anyone to digitally sign digital things and to verify what others have signed, without the need for shared networks, blockchains, registries, platforms, or identity providers. KERI solves seven hard problems of key management: Rotation, Recovery, Detectability, Discovery, Delegability, Revocability, and Multi-Signature. KERI has three Detectability functions — Witnesses, Watchers, and Anchored Issuances — that are like the oil light of a car: they detect key compromise and malicious behaviors of Signers and service providers.

With KERI Detectability functions…

Signers can detect:

Their own compromised keys when used by hackers for signing; Their own compromised keys when used by hackers to rotate, delegate, or revoke their signing keys.

And Verifiers can detect:

Compromised Signers; Malicious Signers; Their own malicious or compromised verification service provider.

Without KERI Anchored Issuances in particular, if a Signer suspects key compromise and performs a rotation, it cannot determine which issuances were made fraudulently, and must revoke and reissue all previously issued credentials. (!)

With KERI Anchored Issuances, fraudulently issued credentials will not successfully verify before or after key rotations, whereas authentically issued credentials will continue to successfully verify, even after an unlimited number of key rotations.

What is KERI?

KERI — Key Event Receipt Infrastructure — is a new (2019), open, decentralized protocol for discovering, recovering, and managing cryptographic keys. KERI enables people, organizations, and things to sign digital things and verify what others have signed, without the need for shared networks, blockchains, registries, platforms, or identity providers.

With KERI, Signers and Verifiers worldwide maintain their own infrastructure without anything in common, but they do share one game-changing thing: a protocol for discovering the Signer’s current key state. This enables Verifiers to securely confirm that a piece of data has been signed by a particular Signer, and that it hasn’t been tampered with, revoked, or expired.

KERI is “end verifiable”; the means through which data travels needn’t be secure for it to still be verifiably authentic when it arrives at its destination. KERI is zero trust, enabling objective verifiability without subjective trust, and it creates no centralized troves of valuable data. KERI is quantum-resistant and uses existing cryptography and cloud infrastructure.

In short, KERI brings the ability to distinguish fake from real in the digital realm, across networks, borders, and boundaries of all kinds.

Seven Hard Problems of Key Management

KERI address seven hard problems of key management, listed below. Most were previously not solvable in a decentralized manner, and some not in a centralized one.

Seven hard problems (with the KERI solution):

Rotation (change current keys for new, without re-issuing past issuances) Recovery (from private key loss or compromise, without re-issuing past issuances) Detectability (of key compromise or malicious Signer behavior) Discovery (just-in-time discovery of key state) Delegability (directly from one Signer/key controller to another) Revocability (decentralized, instant, with privacy) Multi-signature (weighted m-of-n, verifiable) Detectability: Witnesses, Watchers, & Anchored Issuances Detectability Is the ‘Oil Light’ of Key Management

Together, KERI Witnesses, Watchers, and Anchored Issuances, all described below, enable Detectability. Detectability is like the oil light of a car: ignore it, disconnect it, or omit it at your peril, because it is the most critical indicator on your dashboard.

Removing Witnesses, Watchers, and Anchored Issuances from a system removes Detectability.

Without Detectability, Signers cannot detect:

Their own compromised keys when used by hackers for signing; Their own compromised keys when used by hackers to rotate, delegate, or revoke the Signer’s keys.

And Verifiers cannot detect:

A Signer’s compromised keys; Malicious Signers; Their own malicious or compromised verification service provider. What Are Witnesses?

KERI Witnesses are secure, distributed replicas of a Signer’s key event log. They are established (built or bought) and controlled by a Signer (also a “key controller” or “issuer”). A Signer can be anyone or anything, anywhere.

Witnesses make separate, additional replicated points of verification available to Verifiers, enabling Verifiers to detect both compromised and malicious behaviors by Signers. Witnesses can be web servers, distributed databases, blockchains, or any other addressable data source.

To compromise a KERI identifier, each of its Witnesses must also be compromised.

There are no limits or requirements as to whether a Signer uses Witnesses at all or how many; the KERI protocol simply enables any number of Witnesses to be established, deployed however the Signer chooses, and for Verifiers to verify that all Witnesses are acting in unanimity about the Signer’s key state.

What Are Watchers?

KERI Watchers are services established (built or bought) by a Verifier — who can be anyone or anything anywhere — that enable the Verifier to confidently rely on what their own service providers are telling them about a Signer’s key state.

Watchers help Verifiers avoid the single-oracle problem: if you have only one source of truth, you cannot know if that source has been compromised or is acting maliciously (deliberately lying to you). When multiple distinct Watchers all report the same key state of a Signer, it can be more safely relied upon. Having Watchers removes a tempting single point of vulnerability for hackers.

As with Witnesses, there are no limits or requirements as to whether Watchers are used at all or how many; the KERI protocol simply enables any number of Watchers to be established and deployed however desired.

Anchored Issuances: A Breakthrough in Detecting and Preventing the Use of Stolen Keys

Another critical element of Detectability, separate from Witnesses and Watchers, is “Anchored Issuances” (referred to as “seals” in the KERI white paper).

When issued credentials are properly anchored in one’s Key Event Log (KEL) and properly witnessed and watched, it becomes impossible for a hacker of your private keys to use them without your detection; the unauthorized issuances will not successfully verify. With un-anchored / “unbound” issuances, hackers can use stolen private keys as they please with zero detectability and, unfortunately, every issuance they make will successfully verify.

Akin to a check register, Anchored Issuances provide a secure record of issued credentials that can be instantly and cryptographically confirmed by Verifiers. No other system we are aware of provides detectability or prevention of the use of stolen private keys.

No Detectability? Rotate, Revoke, and Re-Issue Everything. (Really)

Some alternative key management systems borrow KERI’s primary functions — pre-rotation, SCIDS, etc. — but forgo its Detectability functions. For them and for all non-KERI key management systems, a big future headache awaits: if a Signer suspects key compromise and performs a rotation, there is no way to distinguish valid issuances from invalid ones — they will all continue to verify successfully — so a Signer must revoke and reissue all previously issued credentials. (!)

That is, if a Signer wishes to maintain a positive reputation for its credentials. Revoking and reissuing everything would likely be a user experience nightmare, which alone defeats the purpose of having pre-rotation. Pre-rotation is arguably KERI’s greatest innovation, but does not achieve its full potential without Detectability.

Can’t We Just Rotate More Often?

Yes, but it won’t help much. KERI pre-rotation can be done repeatedly without downstream costs or consequences for KERI identifiers, a breathtaking breakthrough in key management. But… while a Signer’s more-frequent rotation might attempt to repair one vulnerability — compromise of the Signer’s keys — it does not address other vulnerabilities listed in the “KERI Detectability” section, and it still leaves the ugly problem of revoking and re-issuing everything if a compromise is suspected. In other words, removing Detectability leaves significant holes in the security model no matter how often a Signer rotates their keys.

Such a tradeoff may be acceptable for less critical use cases or at smaller scale, but is likely unacceptable for most organizations.

In Conclusion

KERI Witnesses, Watchers, and Anchored Issuances are the ‘oil lights’ of cryptographic key management, enabling robust detectability of malicious or compromised digital signing behaviors for both Signers and Verifiers. KERI Anchored Issuances goes beyond detection to full-on prevention, actively blocking the use of stolen private keys for credential signing.

KERI Detectability brings critical advances in key management specifically and cybersecurity generally, and should be seriously considered for inclusion within any comprehensive cybersecurity, identity, or key management system.

Friday, 10. May 2024

Just a Theory

PGXN Language Poll Result

Results of the Postgres community poll for building PGXN v2 in Go, Rust, or both.

Back on March 28, I asked the Postgres community whether new services for PGXN v2 should be written in Go, Rust, or “some of each”. I went so far as to create a poll, which ran through April 12. A month later you might reasonably be wondering what became of it. Has David been refusing to face reality and accept the results?

The answer is “no”. Or at least I don’t think so. Who among us really knows ourselves. Since it closed, the poll has provided the results since it closed, but I suspect few have looked. So here they are:

Candidate Votes % All Votes 🦀 Rust 102 60.4% 🐿️ Go 53 31.4% 🐿️ + 🦀 Some of each 13 7.7%

🦀 Rust is the clear winner.

I don’t know whether some Rust brigade descended upon the poll, but the truth is that the outcome was blindingly apparent within a day of posting the poll. So much so that I decided to get ahead of things and try writing a pgrx extension. I released jsonschema on PGXN on April 30. Turned out to be kind of fun, and the pgrx developers kindly answered all my questions and even made a new release to simplify integration testing, now included in the pgrx-build-test utility in the pgxn-tools Docker image.

But I digress. As a result of this poll and chatting with various holders of stakes at work and haunting the #extensions Slack channel, I plan to use Rust for all new PGXN projects — unless there is an overwhelmingly compelling reason to use something else for a specific use case.

Want to help? Rustaceans welcome! Check out the project plan plan or join us in the #extensions channel on the Postgres Slack.

More about… Postgres PGXN Go Rust Perl

Doc Searls Weblog

Personal vs. Personalized AI

There is a war going on. Humanity and nature are on one side and Big Tech is on the other. The two sides are not opposed. They are orthogonal. The human side is horizontal and the Big Tech side is vertical.* The human side is personal, social, self-governed, heterarchical, open, and grounded in the physical […]

There is a war going on. Humanity and nature are on one side and Big Tech is on the other. The two sides are not opposed. They are orthogonal. The human side is horizontal and the Big Tech side is vertical.*

The human side is personal, social, self-governed, heterarchical, open, and grounded in the physical world. Its model is nature, and the cooperative contexts in which competition, creation, and destruction happen in the natural world.

The Big Tech side is corporate, industrial, hierarchical, competitive, mechanistic, extractive, and closed, even though it produces many positive-sum products and services that are good for people and good for nature. It is also, being competitive and rewarding toward winner-take-most outcomes, dominated by giants.

This war has been fought over many other things in the past, especially in tech. But AI is the big one right now—and perhaps the biggest one of all time.

Over the long run, both sides will win, because we need the best of what both bring to the world’s big round table. In the past, this has happened in countless markets, countries, polities, societies, and other contexts. In tech it happened with the conflict between PCs and mainframes, between the open and public Internet and closed private networks, and between open operating systems and closed ones. None of those conflicts have final outcomes, but they have arrived at stable and cooperative states. They arrived at those states because everyone recognized that people can do more with their tools than those big entities can do on their own, or by using people as extracted resources.

This conflict is perhaps best represented by the ironies that have been at work within OpenAI and Sam Altman, its prior and current CEO. We can also see it with sparkling clarity in something amazing that Sam said near the close of his keynote at the first OpenAI Developer Conference, in November of last year:

Lets unpack that.

Individual empowerment means exactly what it says. Agency is the ability to take action or to choose what action to take (Cambridge Dictionary). Scale means growth toward ubiquity.

You have individual empowerment and agency when you ride your bike, drive your car, mow your lawn, or button your shirt. You also have it when you work on your laptop, phone, and other smart rectangles. Scale means your power and agency will grow to the max while the same will happen for nearly everyone else. Scale for you alone won’t cut it. When most of us have it, that’s scale.

Technologies extend us. They enlarge our capacities in the world. Intelligence is one of those capacities. So is memory. Your rectangles help a lot with both. That’s why those have already scaled to ubiquity, or close enough.

AI will do the same, but only if it’s personal. Not if it’s  just “personalized.”

ChatGPT4 doesn’t know that yet. I can tell because I just asked:

Me:
What is “personal AI”?

ChatGPT4:
“Personal AI” refers to artificial intelligence systems that are designed to serve individual users through personalized interactions and services. These AI systems learn from the specific preferences, behaviors, and needs of their users to offer more tailored assistance, recommendations, or information. The goal of personal AI is to enhance user experience by adapting to unique personal profiles.

There are various applications of personal AI, including:

Virtual Assistants: Like Siri, Google Assistant, and others, these AIs help with daily tasks such as scheduling, reminders, searching information, and controlling smart home devices, all customized to the user’s habits and preferences. Recommendation Systems: Used in streaming services like Netflix or shopping platforms like Amazon, these AI systems analyze your past behavior to recommend movies, shows, or products you might like. Healthcare: Personal AI can track individual health metrics and provide personalized advice, monitor conditions, or even alert medical professionals if needed. Learning and Development: In education, personal AI can adapt learning materials to the pace and learning style of the student, offering a customized educational experience. Gaming: AI in gaming can adapt to provide a challenge tailored to the player’s skill level or alter the game narrative based on choices made by the player.

The development of personal AI focuses on enhancing user engagement by creating more intuitive and responsive interactions that feel uniquely suited to each individual.

For a second opinion, I asked Google’s Gemini the same question.

Me:
What is “personal AI”?

Gemini:
Personal AI refers to artificial intelligence designed to assist and interact with a specific individual. It’s essentially your own AI companion that learns about you and tailors its services to your needs. Here are some key aspects of personal AI:

Personalization: Unlike general-purpose AI assistants, personal AI focuses on you. It learns from your interactions, preferences, and habits to become more helpful over time. Focus on Memory and Knowledge: Some personal AI systems aim to become an extension of your mind. They can store your memories, knowledge, and experiences, making them readily accessible and even usable to help with tasks like writing or communication. Varied Applications: Personal AI can have a wide range of applications. Imagine an AI that helps you: Organize your life: Manage schedules, appointments, and reminders. Boost productivity: Assist with tasks, research, and communication. Enhance creativity: Help you brainstorm ideas or generate creative content. Improve learning: Personalize learning experiences and provide targeted information.

Privacy Concerns: Storing personal data on an AI raises concerns about privacy and security.

Overall, personal AI holds promise for a future where technology seamlessly integrates with our lives, offering personalized assistance and support.

To both companies, personal AI is a personalized service—from them. It’s not something you own and control. It’s not about individual empowerment and agency. It doesn’t extend you. It’s hired prosthesis: an extension of them rather than of you.

But hats off. OpenAI and Gemini do an amazing job. So do lots of other AI services by wannabe giants. I use AI every day to improve what I write, to generate and fix images, to answer questions that old-fashioned search engines can’t answer or suck at. I even pay extra for some of it (such as ChatGPT4 and Adobe’s Creative Suite). And it seems they all get better, all the time, at everything. Won’t personalization be good enough, eventually?

No. Because they aren’t you. They also aren’t yours, so they can’t extend you. They can seem to. But they don’t. They also don’t have access to most of your private life. Nor should they.

But your private life could use some AI help. For example—

Schedules, including on your calendars, past and future Health data, including all your medical reports, prescriptions, appointments, insurance information, past and present providers, plus what your watch, phone, and other devices record about you Financial records, including bills, receipts, taxes, and anything called an account that involves money Travel, including all the movements your phone (and phone company), watch, and car record about where you go and where you’ve been Work—past and present, including whatever HR systems know or knew about you Contacts—all the people, businesses, and other entities you know Business relationships, with brokers, retailers, service providers, whatever Subscriptions, including all those “just $1 for the first four weeks” offers you’ve accepted, plus other forms of screwage that are stock-in-trade for companies selling subscription systems to businesses. Property, including all the stuff on your shelves, floors, closets, garages, and storage spaces—plus your stocks and real estate.

It’s not easy to visualize what a personal AI might do for those, but let’s try. Here’s how Microsoft’s Copilot (or whatever it’s called this week) did it for me before I got rid of all its misspellings and added my own hunks of text:

All that stuff is data. But most of it is scattered between apps and clouds belonging to Apple, Google, Microsoft, Amazon, Meta, phone companies, cable companies, car makers, health care systems, insurance companies, banks, credit card companies, retailers, and other systems that are not yours. And most of them think that data is theirs and not yours.

To collect and manage all that stuff, you need tools that don’t yet exist: tools that are yours and not theirs. We could hardly begin to imagine those tools before AI came along. Now we can.

For example, you should be able to take a picture of the books on your shelves and have a complete record of what those books are and where you got them. You’ll know where you got them because you have a complete history of what you bought, where and from whom. You should be able to point your camera in your closets, at the rugs on your floors, at your furniture, at the VIN number of your car that’s visible under your windshield, at your appliances and plumbing fixtures, and have your AI tell you what those are, or at least make far more educated guesses than you can make on your own.

Yes, your AI should be able to tap into external databases and AI systems for help, but without divulging identity information or other private data. Those services should be dependent variables, not independent ones. For full individual empowerment and agency, you need to be independent. So does everyone else with personal AI.

Now imagine having a scanner that you can feed every bill, every receipt, every subscription renewal notice, and have AI software that tells you what’s what with each of them, and sorts records into the places they belong.

Ever notice that the Amazon line items on your credit card bill not only aren’t itemized, but don’t match Amazon’s online record of what you ordered? Your personal AI can sort that out. It can help say which are business and personal expenses, which are suspicious in some way, what doesn’t add up, and much more.

Your personal AI should be able to answer questions like, How many times have I had lunch at this place? Who was I with? When was it we drove to see so-and-so in Wisconsin? What route did we take? What was that one car we rented that we actually liked?

Way back in 1995, when our family first got on the Internet over dial-up, using the first graphical browsers on our PC, and e-commerce began to take off with Amazon, eBay, and other online retailers, my wife asked an essential question: Why can’t I have my own shopping cart that I take from site to site?

Twenty-nine years later, we still don’t have the answer, because every retailer wants you to use its own. And we’re stuck in that system. It’s the same system that has us separately consenting to what sites ironically call “your privacy choices.” And aren’t.

There are countless nice things we can’t have in the digital world today because we aren’t people there. We are accounts. And we are reduced to accounts by every entity that requires a login and password.

This system is a legacy of client-server, a euphemism for slave-master. We might also call it calf-cow, because that’s how we relate to businesses with which we have accounts. And that model is leveraged on the Web like this:

We go to sites for the milk of content and free cookies, whether we want them or not. We are also just “users.”

In the client-server world, servers get scale. Clients have no more scale than what each account—each cow—separately allows. Sure, users get lots of benefits, but scale across many cows is not one of them. And no, “login with Google” and “login with Facebook” are just passes that let calves of ruling cows wander into vassal pastures.

For individual empowerment and scale to happen, we need to be self-sovereign and independent. Personal AI can give that to us. It can do that by solving problems such as the ones listed above, and by working as agents that represent us as human beings—rather than mere users—when we engage with Big Tech’s cows.

This will be a fight at first, because the cows think they run all of nature and not just their own farms. And $trillions are being invested in the same old cattle industry, with AI painted all over the new barns. Comparatively speaking, close to nothing is going toward giving independent and self-sovereign individuals the kind of power and scale Sam Altman says he wants to give us but can’t because he’s on the big cow side of this thing.

So where do we start?

First, with open source code and open standards. We have some already. Llama 3, from Meta AI, is “your own intelligent assistant,” and positions Meta as a more open and user-friendly cow than OpenAI. Meta is still on the top-down Big Tech side of the war we’re in. But hell, we can use what they’ve got. So let’s play with it.

Here on the ground there are all these (with quotage lifted from their sites or reviews such as this one)—

MindsDB: “an open-source AutoML framework” Alt.ai: “It’s an A.I. which aims to digitize users’ intentions and place it on the cloud to let our clones deal with all digital operations.” Keras: “a multi-backend deep learning framework, with support for JAX, TensorFlow, and PyTorch” PyTorch: “Python package that provides two high-level features: Tensor computation (like NumPy) with strong GPU acceleration, and Deep neural networks built on a tape-based autograd system Tensor Flow: “open-source framework for machine learning” CoreNet: a deep neural network toolkit for small and large-scale models, from Apple Haystack: an “open source Python framework by deepset for building custom apps with large language models (LLMs).” Image Super-Resolution (ISR): “(an) open source tool employs a machine learning model that you can train to guess at the details in a low-resolution image: Blender: “A rich interface and numerous plugins make it possible to create complex motion graphics or cinematic vistas” DeepFaceLab: “open source deepfake technology that runs on Python” tflearn: “an advanced deep learning library” PYTensor: “a Python library that allows you to define, optimize/rewrite, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.” (Was Theano) LM Studio: “Discover, download, and run local LLMs” HuggingFace Transformers: “a popular open-source library for Natural Language Processing (NLP) tasks” Fast.ai: “a library for working with deep learning tasks” OpenCV: “a popular Computer Vision and Image Processing library developed by Intel” Detectron2: “a next-generation library that provides advanced detection and segmentation algorithm” and “a PyTorch-based modular object detection library” Ivy.ai: “an open-source deep learning library in Python focusing on research and development” OpenAssistant: “a project aimed at giving everyone access to a great chat-based large language model” PaddleNLP: “a popular open source NLP library that you can use to glean search sentiment and flag important entities” Delphi.AI: “Clone yourself. Build the digital version of you to scale your expertise and availability, infinitely.” Fauxpilot: “This is an attempt to build a locally hosted alternative to GitHub Copilot. It uses the SalesForce CodeGen models inside NVIDIA’s Triton Inference Server with the FasterTransformer backend.” Ray: “An open source framework to build and scale your ML and Python applications easily” Solid: “Solid is a specification that lets individuals and groups store their data securely in decentralized data stores called Pods. Pods are like secure web servers for data. When data is stored in a Pod, its owners control which people and applications can access it.” Sagen.ai: “Your very own AI Personal Assistant to manage your digital life.” YOLOv7: “is one of the fastest and most accurate open source object detection tools. Just provide the tool with a collection of images full of objects and see what happens next.”

—and lots of others that readers can tell me about. Do that and I will add links later. This is a work in progress.

Below all of those we still need something Linux-like that will become the open base on which lots of other stuff runs. The closest I’ve seen so far is pAI-OS, by Kwaai.ai, a nonprofit I now serve as Chief Intention Officer. I got recruited by Reza Rassool, Kwaai’s founder and chair, because he believes personal AI is required to make The Intention Economy finally happen. So that was a hard offer to refuse. Kwaai also has a large, growing, and active community, which I believe is necessary, cool, and very encouraging.

As with most (maybe all) of the projects listed above, Kwaai is a grass-roots effort by human beings on the natural, human, and horizontal side of a battle with giants who would rather give us personalized AI than have us meet them in a middle to which we will bring personal AI powers of our own. In the long run, we will meet in that middle, because personal AI will be better for everyone than personalized AI alone.

Watch us prove it. Better yet, join the effort.

*I am indebted to Lavonne Reimer for introducing and co-thinking the horizontal vs. vertical frame, and look forward eagerly to her own writings and lecturings on the topic.


Wrench in the Gears

Unwinding – Smart City to Country Roads

I recognize going on and on about my personal situation may be rather tedious. I really don’t mean to make it all about “me,” because I feel like each of our journeys are part of this unfolding multi-dimensional tapestry. Since this is my blog, it seems like an appropriate place to tell the story from [...]

I recognize going on and on about my personal situation may be rather tedious. I really don’t mean to make it all about “me,” because I feel like each of our journeys are part of this unfolding multi-dimensional tapestry.

Since this is my blog, it seems like an appropriate place to tell the story from my particular point of view. For now, I’m giving myself leeway to share these loose, stream of consciousness “notes” – observations, pictures, mullings over – a work in progress. If this upsets or bores you, feel free to take a break from my blog and check back later in the summer or fall. Hopefully by then, I’ll have found my new footing.

On the way down to the Shenandoah yesterday, I was listening again to the biography of Elizebeth Friedman, a skilled codebreaker, who with her husband William laid the groundwork for cryptanalysis in the twentieth century. They were initially based at Riverbank Labs on the Fox River outside of Chicago, a stone’s throw from the future site of Fermi Lab.

https://www.cabinetmagazine.org/issues/40/sherman.php

The secretive Colonel Fabyan founded one of the first Defense Department think tanks in the 1910s (pre-Alfred Loomis), cultivating ties with the University of Chicago. He used his fortune to underwrite private research into plant and fruit fly genetics , bone structure, x-rays, and acoustics, the latter in partnership with Wallace Sabine. You can still purchase precision tuning forks from Riverbank Labs.

http://riverbanklabs.com/about/

What took up much of the Colonel’s attention was research into the Baconian cipher, supposedly encrypted messages about Elizabethan England hidden in Shakespeare’s first folio. It was work on this cipher that eventually led to Elizebeth and William’s future careers in codebreaking.

https://www.goodreads.com/en/book/show/32025298

https://www.cabinetmagazine.org/issues/40/sherman.php

https://sirbacon.org/bacon-forum/index.php?/topic/107-friedman-cipher-study/

Consider William’s involvement in plant genetics and codebreaking later in this post in relation to the Matter CBD sticker and the push to blockchain cannabis genetics…

https://www.cabinetmagazine.org/issues/40/sherman.php

https://medicinalgenomics.com/blockchained-cannabis-dna/#:~:text=Cannabis%20breeders%20and%20cultivators%20can,file%20for%20their%20own%20patent.

William went on to found the NSA. As a team, he and Elizebeth solved complex puzzles – uniting as an extended complementary mind. That, I think, is what is intended with AR LARPy civic gaming. Maybe we can get out ahead of Niantic, Nvidia, the CIA, the Vatican…and untangle some interdimensional knots on our own?

I invite you to to consider my oversharing in that context. It’s an invitation into a collective puzzle-solving game. The one trick is we don’t have the box top image, and we don’t know what we are supposed to be making with all of these seemingly random loose parts.

Anyway, I’m going to make these notes for myself at the very least. I know looking back months or years later with more context, makes all the difference.

https://en.m.wikipedia.org/wiki/Qualia

Qualia – it’s a topic Stephers has been digging into – personalization, biophysics, digital ledgers, umwelt, creating a big picture – the BIGGEST picture. I hope to talk more about that after I get settled.

https://www.conducttr.com

Settlement was yesterday. I guess shouldn’t have been surprised that the office where we were to sign the documents to finalize the sale of the family home was in the Penn Mutual Life Insurance Building, founded in 1847. Think of the probability and stochastic modeling upon which the industry was built, as well as its ties to web3 through parametric deals that run on ubiquitous sensor networks.

The massive complex takes up an entire block and sits on the south side of Walnut (tree, choice theory) opposite Independence Hall (sovereign agents). It was in a Berkshire Hathaway Real Estate office, evidently they are under The Trident Group. As the notary quipped, “Warren Buffett owns us all.”

A close friend of my husband’s grandfather, Phil Carret, was a role model for Buffett. Carret, then in his 90s, came to our wedding. He helped create the world of mutual funds and was a solar eclipse chaser all his life. I continue to think automated financial markets, linked to collectivity and mutualism, are part of the superorganism emergence trajectory. Gates, Buffett, and their “Giving Pledge” is, in my opinion less about philanthropy, and more about using open source software, sensors, web3, and social impact data, to catalyze new forms of informational “life.”

https://youtu.be/KOqr17GNTJ8?feature=shared

A section of the Penn Mutual complex incorporates the Egyptian Revival facade of another nineteenth-century insurance company. Remember insurance was the brain child of inventor, publisher, electricity explorer, diplomat Ben Franklin, a supreme consciousness engineer.

The facade of the Pennsylvania Fire Insurance Company features palm columns and lintels of winged suns, both linked to longevity and the soul.

Out in front is a random sign for “The Bagel Factory,” but there was no retail store anywhere on the block that I could see. It reminded me of the symbolic everything bagel in the Academy Award winning film “Everything Everywhere All At Once.”

https://www.sportskeeda.com/comics/everything-everywhere-once-what-the-bagel-actually-represent#:~:text=The%20Everything%20Bagel%20becomes%20more,be%20felt%20throughout%20the%20universe.

Nope, we can’t control everything.

After it was all over, my not-quite ex husband, and I parted ways surrounded by the bronze reliefs that comprised a massive narrative portal. An auspicious place, I suppose, to step forward into what comes next.

I couldn’t not cross the street to take one last look at the iconic symbol of freeDOM with its clock and its bell giving form to how we perceive “reality.” After taking the photo below, I turned to walk back to the car when I spied a piece of Bitcoin sticker art stuck to a pseudo-colonial lamp post, above it (upside down) was another sticker for Matter CBD, an organic cannabis outfit based southern Oregon where the growers, Cleve Backster-style, sing and talk to their plants.

https://legacy.iftf.org/fileadmin/user_upload/downloads/blockchain/IFTF_BlockchainFutures_Map.pdf

https://legacy.iftf.org/fileadmin/user_upload/downloads/blockchain/IFTF_BlockchainFutures_Map.pdf

Upside down matter in an era of prescribed plant medicine for altered consciousness where tokens masquerade as money, but are perhaps actually cells of some as yet indescribable life form jumping off bibliometry (Eugene Garfield) and maybe geoengineering? How to account for this sticker – a slightly blurry Bitcoin stencil on a scan of an old library check-out card (also upside down with multiple dates from the height of altered state / human potential 60s and 70s) with a jet plane flying below it all? Folks who know more about numerology, do the featured dates have any significance to you?

I kept walking, on past the American Philosophical Society, an off-shoot of the Royal Society laying out what is proper/acceptable science, while keeping esoteric objects like Kelpius’s Horologium of Achaz away from prying eyes. There you have good old Ben in the alcove, Philadelphia’s business, civic, and intellectual patriarch.

https://www.jstor.org/stable/pdf/983156.pdf?refreqid=fastly-default%3Acb0b6221286adcbe5ea9d638716a01e8&ab_segments=&origin=&initiator=&acceptTC=1

And in the next block, flanking Independence Hall to the east is the Bourse, once a commodities exchange, now an upscale tourist food court. The xoxo sculpture out front reminds me of Galia’s digital heart tokens – part of “the city that loves you back” PR campaign I suppose. Affection as tradable commodity. Boy, I got shorted this round for sure. As if to further sour my entanglement with the city, I came back to a parking ticket on my windshield. I’ve maybe gotten one other in my thirty plus years here, and I still had a half hour paid up. Stupid “smart” parking tech. I guess their sham will be the final dues I pay for my “freeDOM.”

After packing the last of my things into the car, and giving my honey locust street tree one more big hug, I headed to I-95 south. There’s part of me that hopes the children who will grow up in its lacy shade will decorate the sidewalk with chalk art as my child once did. For a time, 24th Street was a wonderful place to be.

I ended up pulling off at the University City exit to say good-bye to Bartram’s Garden where I worked for seventeen years until the lockdowns started stealing things from my life. I wanted to get a jar of local honey to sweeten the parting and see my favorite trees. The tide was in, so no mudlarking this time around. I gathered a bit of compost from inside the ancient yellow wood and rubbed it into my hands, I found a walnut with a hole that looks like a twinned portal and put it in my backpack as a keepsake. When I found my favorite, a majestic London Plane, there was the trickster energy, again. It was hosting a piece of outdoor art featuring a fox with a quizzical expression. Well played Philadelphia, well played.

For the past two nights I have been camping at Bear Den off the Appalachian Trail outside Blumont, VA. The site was the former summer estate of a wealthy DC OB-GYN and his opera singer wife. It sits on veins of milky white quartz. A few miles south is Weather Mountain, a Cold War bunker for the federal government. According to a local history write up in the lodge, Mahlon Loomis, a rather visionary nineteenth century Boston dentist conducted experiments transmitting wireless communications between two hills nearby, decades before Marconi, using gilded kites!

It’s been cold for May, and rainy. But my tent stayed dry and between the showers I explored the grounds and went out to the rocky overlook and gathered up plants to make a heart. I spied a fire pit with lots of charcoal. So, I decided to make one with that in honor of my friend Sean and his family who use biochar to feed the garden beds on their small farm in Washington State.

The lodge’s volunteer caretaker found it and was excitedly telling me about it. She said she took photos to send to her boyfriend, and was pointing it out to the incoming hikers, including a couple who said they’d recently relocated to Harpers Ferry from Washington State, which closes the circle. This heart has rose petals, wood sorrel, bramble blossoms, sassafras, puffballs, hemlock cones, fern, and quartz.

I need to pack up my tent and head a bit farther down the road to Bayse. I hear conversations of families across the clearing hauling dripping blankets and debating the merits of camping after a wet, cold night. I remember those days. I’m not sure we ever had a camping trip with good weather. Now I am a solo traveler with my stuffed bear and my house plants. I will see what it feels like to be on my own reconnaissance, finding my own grounding and my own views.


Ben Werdmüller

Monetizing ShareOpenly

I was asked if I’m planning to monetize ShareOpenly. Short answer: I have no plans to do so. This is a personal project. If it’s wildly successful and the infrastructure costs skyrocket, I may look for donations or sponsorship of some kind in order to cover those costs. I’m not looking for it to be profitable or for it to be my job. It’s intentionally very very lightweight, so I don’t expect

I was asked if I’m planning to monetize ShareOpenly.

Short answer: I have no plans to do so. This is a personal project.

If it’s wildly successful and the infrastructure costs skyrocket, I may look for donations or sponsorship of some kind in order to cover those costs. I’m not looking for it to be profitable or for it to be my job.

It’s intentionally very very lightweight, so I don’t expect that to happen for a long time to come.


Some ShareOpenly updates

It’s been a little over a month since I launched ShareOpenly, my simple tool that lets you add a “share to social media” button to your website which is compatible with the fediverse, Bluesky, Threads, and all of today’s crop of social media sites. You might recall that I built it in order to help people move away from their “share to Twitter” buttons that they’ve been hosting for years. T

It’s been a little over a month since I launched ShareOpenly, my simple tool that lets you add a “share to social media” button to your website which is compatible with the fediverse, Bluesky, Threads, and all of today’s crop of social media sites.

You might recall that I built it in order to help people move away from their “share to Twitter” buttons that they’ve been hosting for years. Those buttons made sense from 2006-2022 — but not so much in a world where engagement on Twitter/X is falling, and a new world of social media is emerging.

People have been using it, and I’ve had lots of great feedback.

So, today, I’m pleased to announce releases for two of the biggest requests people have made for the tool.

A share icon

A share button needs an icon. That was clear from the very beginning. It needs to be something distinctive — this is a different kind of social media share tool — but also immediately recognizable as a share icon.

I reached out to one of the best designers in the field: Jon Hicks, whose excellent work includes the new Thunderbird logo, Disney’s SpellStruck, Spotify’s icon set, and Truck, an excellent record store in my hometown. I was delighted when he agreed to create a share icon for ShareOpenly.

This icon works really well at small and large sizes: in sidebars, in footers, and wherever you need to help people share. Click the version embedded here to share this very post:

A WordPress plugin

Lots of people have asked me for an easy way to embed a ShareOpenly link into WordPress.

David Artiss, a support lead at Automattic’s excellent WordPress VIP service, has written a WordPress plugin that is now available in the official WordPress plugin directory. He writes more about it in an announcement blog post on his site:

Simply download the plugin, activate it and you’ll find a link added to the bottom of every WordPress post or page. A simple settings page allows you to change the sharing text, as well as whether it appears on posts and/or page content.

Boom! It couldn’t be easier.

I really hope that the new icon and the WordPress plugin make it easier to include more open sharing to your website. ShareOpenly is suitable for everything from small blogs to large publishers.

Manually creating a share link

Of course, you don’t need to use the WordPress plugin. You can embed a share icon onto any web page using this code:

<a href="#" id="shareopenly"><img src="https://shareopenly.org/images/logo.svg" alt="Share to social media"></a> <script> document.querySelector('#shareopenly').addEventListener('click', (e) => { e.preventDefault(); let href = 'https://' + 'shareopenly' + '.org/share/?url='; href += `${encodeURIComponent(window.location.href)}&text=${encodeURIComponent(document.title)}`; window.location.href = href; }); </script>

Or you can construct the URL yourself by following the instructions on this page.

Have fun, and please keep the feedback coming! You can always email me at ben@werd.io.


An Interview With Jack Dorsey

This interview is as interesting for what it doesn't mention - fediverse, for example - as for what it does. This helps explain why he distanced himself from Bluesky after he'd previously established it and ensured it had funding: "This tool was designed such that it had, you know, it was a base level protocol. It had a reference app on top. It was designed to be controlled b

This interview is as interesting for what it doesn't mention - fediverse, for example - as for what it does.

This helps explain why he distanced himself from Bluesky after he'd previously established it and ensured it had funding:

"This tool was designed such that it had, you know, it was a base level protocol. It had a reference app on top. It was designed to be controlled by the people. I think the greatest idea — which we need — is an algorithm store, where you choose how you see all the conversations. But little by little, they started asking Jay and the team for moderation tools, and to kick people off. And unfortunately they followed through with it."

That's not actually how Bluesky works - the people who were banned were banned from the reference implementation, not the protocol. And, often, they were banned from the reference community for heinous content that would have prevented other people from being able to make use of that space. Any open social platform that doesn't support moderation will be dead in the water: moderation is a key part of running any community.

I think Jack knows this, so I don't buy it.

Meanwhile, the interviewer is a Partner at Founders Fund who once blocked me on Twitter for being too left-wing, which I think sort of puts the comments about moderation and freedom of speech in context. #Technology

[Link]

Thursday, 09. May 2024

Ben Werdmüller

Slop is the new name for unwanted AI-generated content

Simon Willison has a perfect name for unreviewed content that is shared with other people: "slop". He goes on: "I’m happy to use LLMs for all sorts of purposes, but I’m not going to use them to produce slop. I attach my name and stake my credibility on the things that I publish." I think that's right. I'm less worried about using LLMs internally - as long as you unde

Simon Willison has a perfect name for unreviewed content that is shared with other people: "slop".

He goes on:

"I’m happy to use LLMs for all sorts of purposes, but I’m not going to use them to produce slop. I attach my name and stake my credibility on the things that I publish."

I think that's right. I'm less worried about using LLMs internally - as long as you understand that they're not impartial or perfectly factual sources, and as long as you take into account the methods used to generate the datasets that were used to train them. (Those are some big "if"s.)

But don't just take that output and share it with the public. And *certainly* don't do it so that you can publish content at scale without having to hire real writers. Not only is that not a good look, but you're going to harm your brand and your reputation in the process. #AI

[Link]


A letter to Russell T Davies

Here’s what I would say to Russell T Davies if I could: One of my very first television memories is sitting watching Peter Davison’s Doctor (and reruns of Tom Baker’s) on a tiny 12” TV set, my face probably too close to the screen. My imagination ran wild. There was a large horse chestnut tree set in the playground of my primary school, and it became the console of my own time machine: first by

Here’s what I would say to Russell T Davies if I could:

One of my very first television memories is sitting watching Peter Davison’s Doctor (and reruns of Tom Baker’s) on a tiny 12” TV set, my face probably too close to the screen. My imagination ran wild. There was a large horse chestnut tree set in the playground of my primary school, and it became the console of my own time machine: first by myself, as a lonely, weird little kid, and then more as other children decided to see what on earth I was doing.

When Sylvester McCoy’s era rolled around, we would fold out the sofabed every Wednesday after Wogan and watch the next installment. I remember being particularly drawn in by the continuing story around Ace, the hints about something bigger in the Doctor’s past, and his plans for her.

When it was canceled, I devoured the New Adventures books, starting with the Timewyrm and Cat’s Cradle series.

And then, in 2005, when it all started up again, I would gather up the episodes and watch them over Christmas with my mother, once again. When she became terminally ill and I moved to be closer to her, we watched them all together in real time. We loved the reboot, the reinvigorated ethos and the joy of it, and the continuation of stories that had been in progress since before I was born.

Russell: it wasn’t just a TV show that you resurrected. (Although it was that, too, of course, and a really good one.) It was those times sitting together, the shared family space, the love and togetherness and fun of it all.

She would have loved the bi-generation and Ncuti Gatwa’s sparkling take on the character. She would have been excited for this new season as much as I am.

I can’t wait to watch. I’m excited for all these new stories, new ideas, new provocations. I won’t be alone. Through all those adventures in time and space, I’ll have a companion with me, invisibly sitting close, the sofa bed unfolded, laughing and hiding behind the cushions alongside me.

Thank you for this. Thank you for all of it.

Wednesday, 08. May 2024

Heres Tom with the Weather

60+ Journalism Profs Demand Investigation into Controversial NYT Article

As the “paper of record” this NYT article was extremely influential in shaping public opinion. 60+ Journalism Profs Demand Investigation into Controversial NYT Article Alleging Mass Rape on Oct. 7

As the “paper of record” this NYT article was extremely influential in shaping public opinion.

60+ Journalism Profs Demand Investigation into Controversial NYT Article Alleging Mass Rape on Oct. 7


Ben Werdmüller

Palantir's earnings call rhetoric is terrifying

Mark Nottingham highlighted this alarming quote by CEO Alex Karp from the latest Palantir earnings call: I think the central risk to Palantir and America and the world is a regressive way of thinking that is corrupting and corroding our institutions that calls itself progressive, but actually -- and is called woke, but is actually a form of a thin pagan religion. That is a real danger to our

Mark Nottingham highlighted this alarming quote by CEO Alex Karp from the latest Palantir earnings call:

I think the central risk to Palantir and America and the world is a regressive way of thinking that is corrupting and corroding our institutions that calls itself progressive, but actually -- and is called woke, but is actually a form of a thin pagan religion.

That is a real danger to our society. And it is a real danger to Palantir if we allow -- if we don't discuss these things. The reason we have by far the best product offering in the world is because we have by far the best alignment around how to build software, what it means to build software, full alignment with our customers, a view that some -- the Western way of living is superior and, therefore, it should be supported by the best products.

[…]We believe we are fighting for a stronger, better, less discriminatory, wealthier, more open, and better society by providing the friends of the West, U.S. industry, U.S. government, our allies, with by far superior products.

I find this so alarming. I’m so opposed to this way of thinking that I don’t exactly know where to start. “Woke is paganism” smacks of a deeply regressive way of thinking; not least because “paganism” is bad smacks of a very narrow way of thinking where some religions are better than others. I hate it on every level — and that’s before we get to the US-centric nationalism.

Palantir, of course, is the company whose products and services routinely power systemic human rights abuses. So perhaps I shouldn’t be surprised. But it’s still very striking to see these kinds of words expressed during an earnings call.


Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT

"Users who disagree with having their content scraped by ChatGPT are particularly outraged by Stack Overflow's rapid flip-flop on its policy concerning generative AI. For years, the site had a standing policy that prevented the use of generative AI in writing or rewording any questions or answers posted. Moderators were allowed and encouraged to use AI-detection software when

"Users who disagree with having their content scraped by ChatGPT are particularly outraged by Stack Overflow's rapid flip-flop on its policy concerning generative AI. For years, the site had a standing policy that prevented the use of generative AI in writing or rewording any questions or answers posted. Moderators were allowed and encouraged to use AI-detection software when reviewing posts."

This is all about money: "partnering" with OpenAI clearly means a significant sum has changed hands. The same thing may have happened at Valve, which also unblocked AI-generated art from its marketplace.

This feels like short-term thinking to me: while Stack will clearly make some near-term revenue through the deal, it comes at a cost to the health of its community, which is ultimately what drives the company's value. If motivated contributors drop off, the only thing left will be the AI-generated content - and there's no way that this will be as valuable over time.

I'd love to have been a fly on the wall of the boardroom where this deal was undoubtedly decided. What are they measuring that made this seem like a good idea - and what are they not measuring that means they're blind to the community dynamics that drive their actual sustainability? It's all fascinating to me. #AI

[Link]


Meet AdVon, the AI-Powered Content Monster Infecting the Media Industry

"We found the company's phony authors and their work everywhere from celebrity gossip outlets like Hollywood Life and Us Weekly to venerable newspapers like the Los Angeles Times, the latter of which also told us that it had broken off its relationship with AdVon after finding its work unsatisfactory." Even if the LA Times broke off its relationship because the work was uns

"We found the company's phony authors and their work everywhere from celebrity gossip outlets like Hollywood Life and Us Weekly to venerable newspapers like the Los Angeles Times, the latter of which also told us that it had broken off its relationship with AdVon after finding its work unsatisfactory."

Even if the LA Times broke off its relationship because the work was unsatisfactory, the fact that this was attempted in the first place is unsettling. What if the work hadn't been "unsatisfactory"? What if it had been "good enough"?

It's not so much the technology itself as the intention behind it: to produce content at scale without employing human journalists, largely to generate pageviews in order to sell ads. There's no public service mission here, or even a mission to provide something that people might really want to read. It's all about arbitrage. #AI

[Link]


Bookending

"Here’s a small trick that worked for me over the dozen years I led remote teams: at the end of your working day, shut down every app on your machine. Yes, all of them. Stash your tabs somewhere if you must, but close them all down." I do this, including closing all of my tabs. Who really needs to keep hundreds of tabs? You? Why? Let them go! The note-taking aspect of t

"Here’s a small trick that worked for me over the dozen years I led remote teams: at the end of your working day, shut down every app on your machine. Yes, all of them. Stash your tabs somewhere if you must, but close them all down."

I do this, including closing all of my tabs. Who really needs to keep hundreds of tabs? You? Why? Let them go!

The note-taking aspect of this has been my actual use for Obsidian: I take daily notes that plug together my thoughts for the day and some ideas about what I might need to do next, as well as things I'm worried about (I'm always worried about a lot of things).

Not that long ago, I would have turned my computer off at the end of every day. This is kind of a modern version of that. Although, of course, there's something to just switching the computer off, too. #Productivity

[Link]


Wrench in the Gears

Creative Placemaking As Cryptographic Cultural Computation

I’ll be leaving Philadelphia tomorrow after the house closing paperwork is signed. The cleaners are getting it ready for its new family. A chapter closes, a new one opens. I’ll slowly make my way down to Little Rock, camping along the way. After three back-to-back, three-day, 1,300-mile u-haul drives in as many weeks, I’m sick [...]

I’ll be leaving Philadelphia tomorrow after the house closing paperwork is signed. The cleaners are getting it ready for its new family. A chapter closes, a new one opens. I’ll slowly make my way down to Little Rock, camping along the way. After three back-to-back, three-day, 1,300-mile u-haul drives in as many weeks, I’m sick of bland hotels with oddly coded decor and ready to sleep under the stars without worrying if someone broke into the truck while I was sleeping.

I built in a few extra days to poke around Oak Ridge outside Knoxville and see the colossal Athena in Nashville’s 1931 Parthenon replica with the snakey Erichthonius (born of Hephasteus’s semen falling to the soul). Interesting allusions to biogeochemistry and the origins of life in a city growing by leaps and bound on economies of biotech and blockchain health systems.

Yesterday I hit REI and got some new Tevas a year after my 20-year old pair gave up the ghost. I need shower shoes for the campgrounds, and Arkansas lakes are clear with sharp rocky bottoms. I also grabbed a tiny USB-rechargeable lantern. It’s been raining a lot. If I get stuck in the tent, I have “Godel, Escher, Bach” to keep me busy – all strange loops, and recursion, self-referencing refrains that call to mind for me reincarnation or many worlds depending on your ideas about spacetime. Using digital light to process print information seems par for the course.

I’d hoped to be able to tour the Barnes Foundation today, but it was closed to the general public. In reading how Godel assigned symbols to numbers to generate a coded language that generated insights into the limitations of mathematical proofs, I started to think more about Steganography, secretly encoding objects with alternate meanings. It is a form of information exchange that can exist out in the open, but only able to be translated by those who know the coding. Stephers brought it to my attention a few years back.

The Barnes Foundation has an unusual, and highly politicized history. Albert Barnes, a Philadelphian, got his MD at the University of Pennsylvania, then trained as a corporate chemist in Heidelberg, Germany, and later made a fortune selling, Argyrol, silver nitrate drops used in infants eyes after delivery.

He used his fortune to acquire a vast, eclectic collection of Impressionist, Post-Impressionist, and Modern art. These pieces are displayed in methodical arrangements, or ensembles, that incorporate pieces of decorative arts and furniture as well as ritual objects from a range of world cultures. Barnes created sets, and the interplay of the objects in each set generates additional meaning, context, through juxtaposition.

Barnes focused on the application of scientific principles to art, and established education programs around the collection. His wife Laura managed an arboretum on the property, also focused on continuing adult education. They considered the estate in Lower Merion to be a laboratory.

Albert, who had grown up poor, was often at odds with Philadelphia’s social elite. When he died, management of the collection and its programs was entrusted to a board run by Lincoln University, a historically Black college in Chester County. For decades there was rancor with the Barnes neighbors in Merion, and the elite wanted to break the trust, push out the Lincoln trustees and bring Barnes’s art lab, in a rather more neutered format to the Ben Franklin Parkway, which is where I’m now sitting killing time under the fancy blooming buckeyes until my car is ready.

The “new” Barnes on the Ben Franklin Parkway

Now I am factoring in artifacts and Godel’s numbering system that to me has digital ID overtones, as well as agents, civic gaming, tokens, signals, and computation within layers of meaning where some (maybe many) interconnected information fields are obscured or subject to the Pitt/Ober Athenian “knowledge management” protocols.

What is AR crypto “play to earn,” really? If I listen to my gut, I would say it is about massive parallel post-Moore’s law information processing through networked biophysics and social physics and dynamic game mechanics linked to the sensors and actuators of spatial computing.

Are we meant to be Godel Numbers, dancing, harmonizing, particles in a cosmic conversation only our subconscious can access?

Maybe notes in a musical performance? Orchestral music? Jazz? Sacred? Profane? Remember, Tik Tok started out as ByteDance, and a Conductrr crisis simulation modelling system that promotes the use of xAPI to build fluid reality has been running in the background gor a decade or more. I was intrigued to read that Ada Lovelace imagined Babbage’s computing machine might be used to realize complex scientific forms of music. Was it music of the spheres, cymatics, manifestation?

If Galia of Bancor seeks to harness the power of mothers’ hearts through tokens, what does it mean that Joseph Lubin, founder of Consensys and a lead builder of smart contract worlds got his start in Princeton in robotics and automated music?

Jamie Wheal’s book “Stealing Fire” has a whole chapter on the tech behind the electronic dance music scene as a catalyst for ekstasis. Is the point to use token engineering combined with nano and frequency to network human embodied intelligence into a global composition so that we collectively unlock access to subconscious fields of information and serve as some kind of vast liquid crystal manifesting system that uses encoded cultural artifacts as a kind of creative language? These are all crazy possibilities I am mulling over.

I just can’t help but wonder about Barnes and Heidelberg and baby eyes and arranged artifacts and encoded (perhaps interdimensional?) communication and botany and arborization and many worlds and the Montecarlo method and what jokes this gritty, esoteric city is playing on me…

Two imposing sculptures flank the median of the Parkway, The Soldiers and Sailors Civil War Monument, erected by the city during World War I. I can’t help but think of the intergenerational polarity generated by the Civil War, still reverberating through our shared consciousness as the culture wars, now with memes, hashtags, social graphs, and digital stigmergy. When I read the statements about freeing slaves and freedom under a unified Constitution, one destiny, I picture a future with Nicole Shanahan’s equitable smart contract laws, logic melded culture that has been codified, standardized, and made accessible to a global outside-in robot that has no soul.

“In Giving Freedom To The Slave, We Assure Freedom To The Free”

I think of sovereign agents in a never-ending series of loosely programmed scenarios, degrees of freedom set in advance, where AI life coaches log choices made on permanent ledgers, and assess our relative worth in the current fitness landscape. Not slaves, but “free” agents responding to a digital constitution reimagined as a dynamic, responsive gameboard, cells in the collective Athenian smart-city superorganism.

“One Country, One Constitution, One Destiny”

As I mull all of this over – my city speaking to me in steganography – you are an agent who is failing to advance the noetic biohybrid computer. Therefore you are a cancer to be shunned, excluded, expelled. That’s just how the program works. I see it, and my bags are packed. Rafael says the fluids are all topped off and my old dumb Subaru is ready to head South into unknown territory.

I made some site visits over the last few days, left a few hearts behind – shocker, right? The first was under an ancient oak behind Jeffrey Yass’s Susquehanna International Group headquarters in Bala Cynwyd on the Main Line. I think the fact that it’s labelled “revolutionary” is significant.

I used pollen and fallen oak leaves with pine cones and dandelions – so cheerful and humble. In the center I placed a rose nicked from the parking lot of my Jungian analyst whom I’m sad to leave. Even though he didn’t follow a lot of what I said, he valued my authenticity and my journey, as odd as it is turning out to be. I told him the heart would be a tribute to entelechy, as represented by the acorn – the tiny spark of powerful potential that is our soul’s quest. I had to look really hard to find three tiny acorns, but I did and placed them on some mugwort (good for dreams).

I hugged the massive girth of the oak, a symbol of arborization, anima mundi, branching choices, montecarlo, and asked for guidance in what comes next – to be of service with the time I have left.

The sculpture below is right out in front of the building. It is white metal and consists of three overlapping circles topped by linear waves. To me this connected to Hofstadter’s discussion of recursion, cannon, and fugues. Remember SIG was the esrly lead investor in Tik Tok, pendulum, time, entrainment.

Yesterday I visited the Kelpius cave, and to my surprise it seems like some Pythagoreans had gotten there first. When I arrived there were sizeable sticks, along with rocks and rubble, arranged in a large triangle.

Inside was a base and from it extended several smaller sticks that created a simple “tree” with three branches. There was a symbol with paired triangles, like the Star of David etched in the damp earth near the doorway. Scattered around seemed to be broken remnants of a popsicle stick project, which to me symbolized configurations of crystal lattice.

Upon that I placed my heart of purple Pawlonia blossoms, dandelions, fern, may apple, pine cones, and tulip poplar petals. Scattered around the edge were Star of Bethlehem flowers.

When I finished I chatted with a white-haired gentleman, a birder named Paul (which factors into my cathedral visit) who’d been watching migrating warblers around the cave. He told me he had a friend who he used to walk with who had been sort of an informal caretaker of the cave. I pointed out the fading Nephele graffiti and showed him the heart and branched sticks, and then went into avianmagnetoreception as a study case for quantum biology and how that tied into US Navy funded anthropology research into Micronesian open water navigation. He told me a bit about local garnets, and I shared some mica flakes I found. It was a gentle way to close my time in the Wissahickon.

So then on my drive to take my car to the mechanic, I stopped at a light on Broad Street at MARCONI Plaza and in the median was a woman in a sun hat digging dandelions!

Then heading home I walked through Love Park, noticing it was now sponsored by Bank of America. There were some guys with big cameras set up to take pictures of people taking pictures posing in front of the sculpture. I wondered if they were working on a documentary, so I tried to chat them up about gap junctions, group mind, the energy of the heart torus field, and Lev “Heart” being the name of Galia Benarzi’s first digital community currency – structured around an Israeli mothers’ babysitting co-op. They were not very interested.

The next person to not be interested was the gal staffing the open budget pop-up cargo container civic education junket sponsored in part by Mural Arts. I tried to explain web3 token voting linked to AI twins and social impact betting. Then a guy showed up with flyers about advice on home ownership and career services, and I had to point out that when Philadelphia’s City Council had anti-poverty hearings five years ago, the region’s job growth report said the average wage was going to be $15 an hour. You can’t buy a house in Philadelphia with that income. I pointed to the office buildings around us and said – those will be redone as affordable rental housing, only it will be like a dorm and your bedroom will have no windows, and you will pay your rent with UBI and data tied to sustainability and wellness behaviors. If you want sunlight the common area by the windows will be full of AI vision cameras and sensors like at WeWork (now Flow) and if you play the game well you can earn a small bit of equity in the ant computer. This will be set up by strange bedfellows partnerships between private equity (Blackstone/Ancestry.com DNA), community banks, high net individuals, and religious groups.

They barely registered what I was saying. In the end I told her to look up Cesar Hidslgo’s TED talk on radical participatory voting. They are training voters for the old game, just as they are about to be taken for a ride by the new web3 tokenomics game.

And why is Mural Arts behind this? Think cultural artifacts, symbols, civic gaming. This is a useful paper on social impact finance linked to public art. The city remade as a digital museum for collective manipulation and “creative placemaking.”

https://www.sciencedirect.com/science/article/abs/pii/S0304422X18303747

My final stop was at the Catholic Cathedral of Saints Peter and Paul overlooking Logan Circle. I hadn’t actually been inside before, but yesterday was the day. I entered the foyer with plaques commemorating masses given by Pope Francis and Pope John Paul II, then I went inside. What struck me was all the different ways sacred light was communicated, the many references to paired keys (encryption), and the book and the sword (Paul) that evoked for me Philip Pullman’s subtle knife that can cut through dimensions. or perhaps information fields as represented by the good book.

There was a huge painting of the Magi under a beaming star of Bethlehem. I’d used Star of Bethlehem blossoms in my Kelpius heart.

The final synchronicity was the strange appearance of a new white ball cap with a Northeastern University Huskies logo on it. I’d been going back and forth the day before, texting with a friend about the oversize role it seemed the Boston university was playing in the AI, biotech, ed-tech transformation of Maine under the guidance of former Jackson Labs spokesperson David Roux. And now here was a hat for that very school, the first thing I saw upon entering the sanctuary.

So my last day in the city of brotherly love is winding down. I read this passage on the subway today – how to step outside the system.

I don’t think it’s truly possible to exit, but for now I’m unplugging from Philly, a city that taught me much and broke my heart. With time I hope it will heal, and I can become a wise old crone.


Ben Werdmüller

40 years later, a game for the ZX Spectrum will be once again broadcast over FM radio

"There were times when Sinclair ZX Spectrum games were copied over the radio waves across Slovenia. Radio Študent broadcast screeching, beeping and whining, which we recorded on tape and played a game a few hours later." I love this! I never had a ZX Spectrum, but I did have a ZX81, one of its precursors, and have fond memories of loading games from tape. The idea that you

"There were times when Sinclair ZX Spectrum games were copied over the radio waves across Slovenia. Radio Študent broadcast screeching, beeping and whining, which we recorded on tape and played a game a few hours later."

I love this! I never had a ZX Spectrum, but I did have a ZX81, one of its precursors, and have fond memories of loading games from tape. The idea that you could broadcast a game over FM radio is delicious - just start recording via tape and then you're good to go. A great way to spread free software and free culture before the advent of the commercial internet.

And I love that they're going to do it again! I wonder who still has a ZX Spectrum ready to go? #Technology

[Link]


Options are a lottery ticket

Update: I wrote a longer post that explains this argument less flippantly and in more detail. This post is anecdotal and should not be considered to be investment advice. A company I used to be associated with sent out an email yesterday that essentially explained that the effective share price was lower than some people had bought options at, and that preferred shares were now common stock. I

Update: I wrote a longer post that explains this argument less flippantly and in more detail.

This post is anecdotal and should not be considered to be investment advice.

A company I used to be associated with sent out an email yesterday that essentially explained that the effective share price was lower than some people had bought options at, and that preferred shares were now common stock. I’m not mad about it: in fact, I think the restructuring was a good thing, and the cap table is now optimized for employees of the current phase of the company, which is how it should be. (The company, which will remain nameless, used to be troubled but is now doing really well under a new CEO. I like both the old and new CEOs very much, and there seems to be alignment between them on what needs to happen, which helps.)

I did not exercise my options at that company, so I have lost exactly nothing. In fact, I’ve never exercised options at any company I’ve been a part of.

This is maybe a bit of a self-own: that implies I’ve never been a part of a company that I felt strongly enough about that I wanted to own part of it. That’s actually not true. I own a significant chunk of Latakoo, the company that powers video delivery for news networks around the world — but I bought those shares as a direct investment at a low price while I was a very early employee, rather than as options. I also own shares in a few other companies that I’ve either advised or been a part of. (I’m also always interested in advisory roles in other companies in exchange for equity.)

But in general, for regular employees, I think options are rarely worth it. They typically require an up-front investment that many employees simply can’t make, so it’s a bit of a fake benefit to begin with, and their future value is little more certain than a lottery ticket. It’s a nice sign for founders when you can buy in, but those employees tend to be already-wealthy. Unless you’re very early at a company, the options are very cheap, and the prospects look amazing, I think it’s usually better investment to optimize for cashflow and save a portion of your money in traditional funds. Perhaps that’s a boring idea, but there it is. The promise of getting rich quick through options is what every get rich quick scheme is: too good to be true. Take the salary and bank it.


Patrick Breyer

Gutachten zum Computerspiele-Sterben: Piraten wollen klare EU-Regeln gegen willkürliches Lahmlegen von Spielen

Vor dem Hintergrund der Spielerinitiative „Stop killing games“, die sich gegen das Lahmlegen von Computerspielen durch ihre Hersteller wendet, hat der Wissenschaftliche Dienst des Europäischen Parlaments …

Vor dem Hintergrund der Spielerinitiative „Stop killing games“, die sich gegen das Lahmlegen von Computerspielen durch ihre Hersteller wendet, hat der Wissenschaftliche Dienst des Europäischen Parlaments im Auftrag der Europaabgeordneten der Piratenpartei Dr. Patrick Breyer die einschlägigen EU-Gesetze untersucht und sieben potenziell relevante Regelungen aufgeführt. Breyer sieht Lücken im Rechtsrahmen:

„Es fehlen klare EU-Verbraucherschutzregelungen gegen die branchenübliche Praxis, dass Spielehersteller beliebte Spiele gewinnbringend verkaufen, sich dabei aber die willkürliche, jederzeitige Unbrauchbarmachung vorbehalten und nicht einmal eine Rückerstattung des Kaufpreises anbieten. Es gibt zwar eine allgemeine EU-Richtlinie gegen ‚missbräuchliche Klauseln in Verbraucherverträgen‘, aber deren Auslegung muss langwierig von Gerichten entschieden werden. Es gibt außerdem eine EU-Richtlinie, derzufolge digitale Inhalte während der Vertragsdauer bereit gestellt werden müssen, aber gegen die kurzfristige Kündigung und Einstellung eines gerade erst gekauften Spiels schützt sie nicht. Die EU-Regeln werden der kulturellen Bedeutung von Games nicht gerecht, die gewachsene Gemeinschaften von Millionen von Spielern zusammen bringen können. Wir Piraten fordern, dass die EU Games als Kulturgut schützt, ein Lahmlegen nach Gutdünken verbietet und, wenn der Hersteller Games aufgibt, ein Weiterführen durch die Community ermöglicht.“

Tuesday, 07. May 2024

Just a Theory

Mini Summit Five

Links, notes, and commentary from Yurri Rashkovskii’s presentation, “Universally Buildable Extensions: Dev to Prod,” at the fifth Postgres Extension Ecosystem Mini-Summit on May 1, 2024.

The video for Yurri Rashkovskii’s presentation at the fifth Postgres Extension Ecosystem Mini-Summit last week is up. Links:

Video PDF Slides

Here’s my interpolation of YouTube’s auto-generated transcript, interspersed with chat activity.

Introduction I opened the meeting and introduced Omnigres’s Yurri Rashkovskii. Presentation

Yurri: Today I’m going to be talking about universally buildable extensions. This is going to be a shorter presentation, but the point of it is to create some ideas, perhaps some takeaways, and actually provoke a conversation during the call. It would be really amazing to explore what others think, so without further ado…

I’m with Omnigres, where we’re building a lot of extensions. Often they push the envelope of what extensions are supposed to do. For example, one of our first extensions is an HTTP server that embeds a web server inside of Postgres. We had to do a lot of unconventional things. We have other extensions uniquely positioned to work both on developer machines and production machines — because we serve the the developers and devops market.

The point of Omnigres is turning Postgres into an application runtime — or an application server — so we really care how extensions get adopted. When we think about application developers, they need to be able to use extensions while they’re developing, not just in production or on some remote server. They need extensions to work on their machine.

The thing is, not everybody is using Linux Other people use macOS and Windows and we have to account for that. There are many interesting problems associated with things like dependencies.

So there’s a very common approach used by those who who try to orchestrate such setups and by some package managers: operating out of container. The idea is that with a can create a stable environment where you bring all the dependencies that your extension would need, and you don’t have to deal with the physical reality of the host machine. Whether it’s a developer machine, CI machine, production machine, you always have the same environment. That’s definitely a very nice property.

However, there are some interesting concerns that we have to be aware when we operate out of a container. One is specifically mapping resources. When you have a container you have to map how many cores are going there, memory, how do we map our volumes (especially on Docker Desktop), how we connect networking, how we pass environment variables.

That means whenever you’re running your application — especially locally, especially in development — you’re always interacting with that environment and you have to set it up. This is particularly problematic with Docker Desktop on macOS and Windows because these are not the same machines. You’re operating out of a virtual machine machine instead of your host machine, and obviously containers are Linux-specific, so it’s always Linux.

What we found is that often times it really makes a lot of sense to test extensions, especially those written in C, on multiple platforms. Because in certain cases bugs, especially critical memory-related bugs, don’t show up on one platform but show up on another. That’s a good way to catch pretty severe bugs.

There are also other interesting, more rare concerns. For example, you cannot access the host GPU through Docker Desktop on macOS or through Colima. If you’re building something that could have use the host GPU that would work on that machine it’s just not accessible. If you’re working something ML-related, that can be an impediment

This also makes me wonder: what are other reasons why we’re using containers. One reason that struck out very prominently was that Postgres always has paths embedded during compile time. That makes it very difficult to ship extensions universally across different installations, different distributions. I wonder if that is one of the bigger reasons why we want to ship Postgres as a Docker container: so that we always have the same path regardless of where where it’s running.

Any questions so far about Docker containers? Also if there’s anybody who is operating a Docker container setup — especially in their development environment — if you have any thoughts, anything to share: what are the primary reasons for you to use a Docker container in your development environment?

Jeremy S in chat: When you say it’s important to test on multiple platforms, do you mean in containers on multiple platforms, or directly on them?

Jeremy S in chat: That is - I’m curious if you’ve found issues, for example, with a container on Mac/windows that you wouldn’t have found with just container on linux

Daniele: Probably similarity with the production deployment environments. That’s one. Being free from whatever is installed on your laptop, because maybe I don’t feel like upgrading the system Python version and potentially breaking the entire Ubuntu, whereas in a Docker container you can have whatever version of Python, whatever version of NodeJS or whatever other invasive type of service. I guess these are these are good reasons. These were the motivation that brought me to start developing directly in Docker instead of using the desktop.

Yurri: Especially when you go all the way to to production, do you find container isolation useful to you?

Daniele: Yeah I would say so; I think the problem is more to break isolation when you’re are developing. So just use your editor on your desktop, reload the code, and have a direct feedback in the container. So I guess you have to break one barrier or two to get there. At least from the privilege points of having a Linux on desktop there is a smoother path, because it’s not so radically different being in the container. Maybe for Windows and macOS developers it would be a different experience

Yurri: Yeah, I actually wanted to drill down a little bit on this In my experience, I build a lot on macOS where you have to break through the isolation layers with the container itself and obviously the VM. I’ve found there are often subtle problems that make the experience way less straightforward.

One example I found it that, in certain cases, you’re trying to map a certain port into the container and you already have something running [on that port] on your host machine. Depending on how you map the port — whether you specify or don’t specify the address to bind on — you might not get Docker to complain that this port is actually overridden.

So it can be very frustrating to find the port, I’m trying to connect to it but it’s not connecting to to the right port. There’s just very small intricate details like this, and sometimes I’ve experienced problems like files not perfectly synchronizing into the VM — although that has gotten a little better in the past 2–3 years — but there there were definitely some issues. That’s particularly important for the workflows that we’re doing at Omnigres, where you’re running this entire system — not just the database but your back end. To be able to connect to what’s running inside of the container is paramount to the experience.

Daniele: Can I ask a question about the setup you describe? When you go towards production, are those containers designed to be orchestrated by Kubernetes? Or is there a different environments where you have your Docker containers in a local network, I assume, so different Dockers microservices talking to each other. Are you agnostic from what you run in it, or do you run it on Kubernetes or on Docker Compose or some other form of glue that you you set up yourself, or your company has set up?

Steven Miller in chat: … container on Mac/windows [versus linux] Steven Miller in chat: Have seen with chip specific optimizations like avx512

Yurri: Some of our users are using Docker Compose to run everything together. However, I personally don’t use Docker containers. This is part of the reason why the topic of this presentation is about universally buildable extensions. I try to make sure that all the extensions are easily compilable and easily distributable on any given supported platform. But users do use Docker Compose, it’s quite common.

Does anyone else here have a preference for how to move Docker containers into production or a CI environment?

Nobody? I’ll move on then.

Steven Miller in chat: Since in docker will run under emulation, but on linux will run with real hardware, so the environment has different instruction set support even though the docker —platform config is the same

Jeremy S in chat: That makes sense

Yurri: I wanted to show just a little bit of a proof of concept tool that we’ve been working on, on and off for the last year—

David Wheeler (he/him): Yurri, there are a couple comments and questions in chat, I don’t know if saw that

Yurri: I didn’t see that sorry.

Jeremy is saying, “when you say it’s important to test on multiple platforms do you mean in containers on multiple platforms or directly on them?” In that particular instance I meant on multiple platforms, directly.

The other message from Jeremy was, “I’m curious if you found issues for example with a container on Mac or Windows that you wouldn’t have found with just container on Linux?” Yeah I did see some issues depending on the type of memory-related bug. Depending on the system allocator, I was either hitting a problem or not. I was not hitting it on Linux, I believe and it was hidden macOS. I don’t remember the details right now, unfortunately, but that difference was indicative of a bug.

Steven wrote, trying to connect this… “Have * seen chip-specific optimizations for containers?” And, “Docker will run under emulation but on Linux will run with real Hardware.” Yeah that’s an interesting one about ax512. I suppose this relates to the commentary about about GPU support, but this is obviously the other part of supporting specific hardware, chip-specific optimizations That’s an interesting thing to learn; I was not aware of that! Thank you Steven.

Let’s move on. postgres.pm is a pro of concept that I was working on for some time. The idea behind it was both ambitious but also kind of simple: Can we try describing Postgres extensions in such a way that they will be almost magically built on any supported platform?

The idea was to build an expert system of how to build things from a higher level definition. Here’s an example for pgvector:

:- package(vector(Version), imports([git_tagged_revision_package(Version)])). git_repo("https://github.com/pgvector/pgvector"). :- end_package.

It’s really tiny! There are only two important things there: the Git tagged revision package and Git repo. There’s nothing else to describe the package.

The way this works is by inferring as much information as possible from what’s available. Because it’s specified as a Git-tagged revision package, it knows that it can download the list of version-shaped revisions — the versions — and it can checkout the code and do further inferences. It infers metadata from META.json if it’s available, so it will know the name of the package, the description, authors, license, and everything else included there.

David G. Johnston in chat: PG itself has install-check to verify that an installed instance is functioning. What are the conventions/methods that extension authors are using so that a deployed container can be tested at a low level of operation for the installed extensions prior to releasing the image to production?

It automatically infers the build system. For example for C extensions, if it sees that there’s a Makefile and C files, it infers that you need make and a C compiler and it tries to find those on the system: it will try to find cc, gcc, Clang — basically all kinds of things.

*David Wheeler (he/him)() in chat: Feel free to raise hands with questions

Here’s a slightly more involved example for pg_curl. Ah, there was a question from David Johnson. David says, “PG has install-check to verify that installed instance is functioning. What are the conventions methods that extension authors are using so the deployed container can be tested at a low level of operation for the installed extension prior to releasing the image to production?”

I guess the question is about general conventions for how extension authors ensure that the extensions work, but I suppose maybe part of this question is whether that’s also testable in a production environment. David, are you talking about the development environment alone or both?

David G. Johnston: Basically, the pre-release to production. You go in there in development and you cut up an extension and source and then you build your image where you compile it — you compile PG, you compile it, or you deploy packages. But now you have an image, but you’ve never actually tested that image. I can run installcheck on an installed instance of Postgres and know that it’s functioning, but it won’t test my extension. So if I install PostGIS, how do I test that it has been properly installed into my database prior to releasing that image into production?

Tobias Bussmann in chat: shouldn’t have the extension a make installcheck as well?

Yurri: To my knowledge there’s no absolutely universal method. Of course the PGXS methods are the most standard ones — like installcheck — to to run the tests. In our [Omnigres’s] case, we replaced pg_regress with pg_yregress, another tool that we’ve developed. It allows for more structural tests and tests that test certain things that pg_regress cannot test because of the way it operates.

David Wheeler (he/him) in chat: https://docs.omnigres.org/pg_yregress/intro/

I can share more about this later if that’s of interest to anybody. So we basically always run pg_yregress on our extensions; it creates a new instance of Postgres — unless told to use a pre-existing instance — and it runs all the tests there as a client. It basically deploys the the extension and runs the set of tests on it.

David G. Johnston: Okay.

Yurri: I guess you know it depends on how you ship it. For example, if you look at the pgrx camp, they have their own tooling for that, as well. I’ve also seen open-source extensions where they could be written in, say, Rust, but still using pg_regress tests to test their behavior. That would often depend on how their build system is integrated in those tests. I guess the really short answer is there’s probably no absolutely Universal method.

David thank you for pasting the link to pg_yregress. If there are ny questions about it, feel free to ask me. Any other thoughts or questions before I finish this slide? Alright will carry on then.

:- package(pg_curl(Version), imports(git_explicit_revision_package(Version))). :- inherit(requires/1). git_repo("https://github.com/RekGRpth/pg_curl"). git_revisions([ '502217c': '2.1.1', % ... older versions omitted for now ... ]). requires(when(D := external_dependency(libcurl), version::match(D, '^7'))). :- end_package.

The difference between this example and the previous one is that here it specifies that there will be an explicit revision map because that project does not happen to have version tags, so they have to be done manually. You can see that in the Git revision specification. But what’s more interesting about this is that it specifies what kind of dependency it needs. In this particular instance it’s libcurl, and the version has to match version 7 — any version 7.

These kinds of requirements, as well as compiler dependencies, make dependencies, and others are always solved by pluggable satisfiers. They look at what’s available depending on the platform — Linux, a particular flavor of Linux, macOS, etc — and picks the right tools to see what’s available. In the future there’s a plan to add features like building these dependencies automatically, but right now it depends on the host system, but in a multi-platform way.

David Wheeler (he/him) in chat: How does it detect that libcurl is required?

The general idea behind this proof of concept is that we want to specify high level requirements and not how exactly to satisfy them. If you compare this to a Docker file, the Docker file generally tells you exactly what to do step by step: let’s install this package and that package, let copy files, etc. so it becomes a very specific set of instructions.

Jeremy S in chat: And how does it handle something with different names in different places?

There was a question: “how does it detect that libcurl is required?” There there is this line at the bottom says “requires external dependency libcurl, so that was the definition.”

The other question was “how does it handle something with different names in different places?” I’m not sure I understand this question.

Jeremy S: I can be more spe specific. A dependency like libc is called libc on Debian platforms and it’s called glibc on Enterprise Linux. You talked about available satisfiers like Homebrew, Apt and package config, but what if it has a different name in Homebrew than in Apt or something like? Does it handle that or is that just something you haven’t tackled yet?

Yurri: It doesn’t tackle this right now, but it’s part of the division where it should go. For certain known libraries there’s an easy way to add a mapping that will kick in for a distribution, and otherwise it will be a satisfier for another one. They’re completely pluggable, small satisfiers looking at all the predicates that describe the system underneath.

David G. Johnston in chat: How is the upcoming move to meson in core influencing or impacting this?

Just for point of reference, this is built on top of Prolog, so it’s like a knowledge base and rules for how to apply on this knowledge to particular requirements.

Tobias Bussmann in chat: Prolog 👍

Shaun Thomas in chat: What if there are no satisfiers for the install? If something isn’t in your distro’s repo, how do you know where to find the dependency? And how is precedence handled? If two satisfiers will fulfill a requirement, will the highest version win?

Jeremy S: I remember Devrim talking about, if you read through the [RPM] spec files, what find is all this spaghetti code with #ifdefs and logic branches and in his case is just dealing with differences between Redhat and SUSE. If this is something that we manually put in, we kind of end up in a similar position where it’s on us to create those mappings, it’s on us to maintain those mappings over time — we kind of own it — versus being able to automate some kind of automatic resolution. I don’t know if there is a good automatic way to do it. David had found something that he posted, which I looked at a little bit, but Devrim talked about how much of maintenance overhead it becomes in the long run to constantly have to maintain this which seemed less than ideal.

Yurri: It is less than ideal. For now, I do think that would have to be manual, which is less than ideal. But it could be addressed at least on on a case-by-case basis. Because we don’t really have thousands of extensions yet — in the ecosystem maybe a thousand total — I think David Wheeler would would know best from his observations, and I think he mentioned some numbers in his presentation couple of weeks ago. But basically handling this on on a case-by-case basis where we need this dependency and apparently it’s a different one on a different platform, so let’s address that. But if there can be a method that can at least get us to a certain level of unambiguous resolution automatically or semi-automatically, that would be really great.

Samay Sharma in chat: +1 on the meson question.

Jeremy S: I think there’s a few more questions in the chat.

Yurri: I’m just looking at them now. “how is the upcoming move to meson and core influencing or impacting this?” I don’t think it’s influencing this particular part in any way that I can think of right now. David, do you have thoughts how it can? I would love to learn.

David G. Johnston: No, I literally just started up a new machine yesterday and decided to build it from meson instead of make and the syntax of the meson file seems similar to this. I just curious if there are any influences there or if it’s just happenstance.

Yurri: Well from from what I can think right now, there’s just general reliance on either implicitly found PG config or explicitly specified PG config. That’s just how you discover Postgres itself. There’s no relation to how Postgres itself was built. The packaging system does not handle say building Postgres itself or providing it so it’s external to this proof of concept.

David G. Johnston: That’s a good separation of concerns, but there’s also the idea that, if core is doing something, we’re going to build extensions against PostgresSQL, if we’re doing things similar to how core is doing them, there’s less of a learning curve and less of everyone doing their own thing and you have 500 different ways of doing testing.

Yurri: That’s a good point. That’s something definitely to reflect on.

I’ll move on to the next question from Sean. “What if there are no satisfiers for the install? If something isn’t in your distro how do you know where to find the dependency?” And “if two satisfiers will fulfill a requirement, will the highest version win?” If there are no satisfiers right now it will just say it’s not solvable. So we fail to do anything. You would have to go and figure that out. It is a proof of concept, it’s not meant to be absolutely feature complete but rather an exploration of how we can describe the the packages and their requirements.

David Wheeler (he/him): I assume the idea is that, as you come upon these you would add more satisfiers.

Yurri: Right, you basically just learn. We learn about this particular need in a particular extension and develop a satisfier for it. The same applies to precedence: it’s a question of further evolution. Right now it just finds whatever is available within the specified range.

If there are no more pressing questions I’ll move to the next slide. I was just mentioning the problem of highly specific recipes versus high-level requirements. Now I want to shift attention to another topic that has been coming up in different conversations: whether to build and ship your extension against minor versions of Postgres.

Different people have different stances in this, and even package managers take different stands on it. Some say, just build against the latest major version of Postgres and others say build extensions against every single minor version. I wanted to research and see what the real answer should be: should we build against minor versions or not?

I’ve done a little bit of experimentation and my answer is “perhaps”, and maybe even “test against different minor versions.” In my exploration of version 16 (and also 15 bu Id didn’t include it) there there are multiple changes between minor versions that can potentially be dangerous. One great example is when you have a new field inserted in the middle of a structure that is available through a header file. That definitely changes the layout of the structure.

typedef struct BTScanOpaqueData { - /* these fields are set by _bt_preprocess_keys(): */ + /* all fields (except arrayStarted) are set by _bt_preprocess_keys(): */ bool qual_ok; /* false if qual can never be satisfied */ + bool arrayStarted; /* Started array keys, but have yet to "reach + * past the end" of all arrays? */ int numberOfKeys /* number of preprocessed scan keys */ }

In this particular case, for example, will not get number of keys if you’re intending to. I think that change was from 16.0 to 16.1. If you build against 16.0 and then try to run on 16.1, it might not be great.

The other concern that I found is there are new apis appearing in header files between different versions. Some of them are implemented in header files, either as macros or static and line functions. When you’re building against that particular version, you’ll get the particular implementation embedded.

Others are exports of symbols, like in this case, try index open and contain mutable functions after planning, if you’re using any of this. But this means that these symbols are not available on some minor versions and they’re available later on, or vice versa: they may theoretically disappear.

There are also changes in inline behavior. There was a change between 16.0 and 16.1 or 16.2 where an algorithm was changed. Instead of just > 0 there’s now >= 0, and that means that particular behavior will be completely different between these implementations. This is important because it’s coming from a header file, not a source file, so you’re embedding this into your extension.

David Wheeler (he/him) in chat: That looks like a bug fix

Yeah it is a bug fix. But what I’m saying is, if you build your extension against say 16.0m which did not have this bug fix, and then you deploy it on 16.1, then you still have the bug because it’s coming from the header file.

*David Wheeler (he/him): Presumably they suggest that you build from the latest minor release and that’s Backward compatible to the earlier releases.

Yurri: Right and that’s a good middle ground for this particular case. But but of course sometimes when you do a minor upgrade you have to remember that you have to rebuild your extensions against that minor version so you can just easily transfer them yeah.

Jeremy S in chat: The struct change in a minor is very interesting

*David Wheeler (he/him)Jeremy points out that struct change is pretty interesting.

Yurri: Yeah, it’s interesting because it’s super dangerous! Like if somebody is expecting a different versioned structure, then it can be pretty nasty.

Shaun Thomas in chat: Yeah. It’s a huge no-no to insert components into the middle of a struct.

Jeremy S: Is that common? I’m really surprised to see that in a minor version. On the other hand, I don’t know that Postgres makes promises about — some of this seems to come down to, when you’re coding in C and you’re coding directly against structures in Postgres, that’s really interesting. That’s — I’m surprised to see that still.

Steven Miller in chat: In the case of trunk, we would have built against minor versions in the past then upgrade the minor version of postgres without reinstalling the binary source of the extension, so this is an issue

David G. Johnston in chat: Yeah, either that isn’t a public structure and someone is violating visibility (in which case yes, you should be tracking minor builds)

Shaun Thomas in chat: I’m extremely shocked that showed up in 16.2.

Yurri: Yeah, I didn’t expect that either, because that’s just a great way to have absolutely undefined behavior. Like if somebody forgot to rebuild their extension against a new minor, then this can be pretty terrible.

But my general answer to all of this unless you’re going really deep into the guts of Postgres, unless you’re doing something very deep in terms query planning, query execution, you’re probably okay? But who knows.

Jason Petersen in chat: yeah it feels like there’s no stated ABI guarantee across minor updates

Jason Petersen in chat: other than “maybe we assume people know not to do this"

David Christensen in chat: yeah ABI break in minor versions seems nasty

Jeremy S: But it’s not just remembering to rebuild your extension. Let’s let’s suppose somebody is just downloading their extensions from the PGDG repo, because there’s a bunch of them there. They’re not compiling anything! They’re they’re downloading an RPM and the extension might be in a different RPM from Postgres and the extension RPMs — I don’t know that there have been any cases with any of the extensions in PGDG, so far, where a particular extension RPM had to have compatibility information at the level of minors.

Shaun Thomas in chat: There was actually a huge uproar about this a couple year ago because they broke the replication ABI by doing this.

David G. Johnston in chat: I see many discussions about ABI stability on -hackers so it is a goal.

Steven Miller in chat: PGDG is the same binaries for each minor version because the postgres package is only major version, right?

Yurri: Yeah, that’s definitely a concern, especially when it comes to the scenario when you rebuild your extensions but just get pre-built packages. It’s starting to leak out of the scope of this presentation, but I thought it was a very interesting topic to bring to everybody’s attention.

Jason Petersen in chat: “it’s discussed on hackers” isn’t quite the same as “there’s a COMPATIBILITY file in the repo that states a guarantee”

Jason Petersen in chat: (sorry)

My last item. Going back to how we ship extensions and why do we need complex build systems and packaging. Oftentimes you want your extensions to depend on some library, say OpenSSL or SQLite or whatever, and the default is to bring the shared dependency that would come from different packages on different systems.

What we have found at Omnigres is that it is increasingly simpler to either statically link with your dependencies — and pay the price of larger libraries — but then you have no questions about where it comes from — what what package, which version – you know exactly what which version it is and how it’s getting built. But of course you also have a problem where, if you want to change the version of the dependency it’s harder because it’s statically linked. The question is whether you should be doing that or not, depending on the authors of the extension and their promises for compatibility with particular versions of their dependencies. This one is kind of naive and simple, as in just use static. Sometimes it’s not possible or very difficult to do so, some some libraries don’t have build systems amenable to static library production.

What we found that works pretty nicely is using rpath in your dynamic libraries. You can use special variables — $ORIGIN or @loader_path on Linux or macOS, respectively, to specify that your dependency is literally in the same folder or directory where your extension is. So you can ship your extension with the dependencies alongside, and it will not try to load them immediately from your system but from the same directory. We find this pretty pretty useful.

That’s pretty much it. Just to recap I talked about the multi-platform experience, the pros and cons of containers, inferencing how you build and how you can build extensions with dependencies, static and rpath dependencies, and the problems with PG minor version differences. If anybody has thoughts, questions, or comments I think that would be a great. Thank you.

Discussion

David Wheeler (he/him): Thank you, Yurri, already some good discussion. What else do you all have?

David G. Johnston: PG doesn’t use semantic versioning. They we have a major version and a minor version. The minor versions are new releases, they do change behaviors. There are goals from the hackers to not break things to the extent possible. But they don’t guarantee that this will not change between dot-three and dot-four. When you’re releasing once a year that’s not practical if things are broken, you can’t wait nine months to fix something. Some things you need to fix them in the next update and back-patch.

Steven Miller in chat: Thank you, this is very useful info

Jeremy S in chat: Dependency management is hard 🙂 it’s been a topic here for awhile

David G. Johnston: So we don’t have a compatibility file, but we do have goals and if they get broken there’s either a reason for it or someone just missed it. From an extension standpoint, if you want to be absolutely safe but absolutely cost intensive, you want to update every minor release: compile, test, etc. Depending on what your extension is, you can trade off some of that risk for cost savings. That’s just going to be a personal call. The systems that we build should make it easy enough to do releases every “dot” and back-patching. Then the real cost is do you spend the time testing and coding against it to make sure that the stuff works. So our tool should assume releasing extensions on every minor release, not every major release, because that’s the ideal.

Shaun Thomas in chat: It’s good we’re doing all of this though. It would suck to do so much work and just become another pip spaghetti.

Yurri: That’s exactly what I wanted to bring to everybody’s attention, because there’s still a lot of conversations about this and there was not enough clarity. So that helps a lot.

Jeremy S: Did you say release or did you say build with every Miner? I think I would use the word “build”.

David G. Johnston: Every minor release, the ones that go out to the public. I mean every commit you could update your extension if you wanted. but really the ones that matter are the ones that go public. So, 16.3 or 16.4 comes out, automation would ideally would build your extension against it run your test and see if anything broke. And then deploy the new [???] of your extension against version 16.3. Plus that would be your your release.

Jeremy S: I think there are two things there: There’s rebuilding it — because you can rebuild the same version of the extension and that would pick up if they they added a field in the middle of a struct which is what happened between 16.0 and 16.1, rebuild the same version. Versus: the extension author … what would they be doing? If they they could tag a new version but they’re not actually changing any code I don’t think it is a new release of the extension, because you’re not even changing anything in the extension, you’re just running a new build. It’s just a rebuild.

David Wheeler (he/him) in chat: It’d be a new binary release of the same version. In RPM it goes from v1.0.1-1 to v1.0.1-2

It reminds me of what Alvaro did in his his OCI blog post, where he said you really have to … Many of us don’t understand how tightly coupled the extensions need to be to the database. And these C extensions that we’re we’re building have risks when we separate them don’t just build everything together.

David G. Johnston: The change there would be metadata. Version four of my extension, I know it works on 16.0 to 16.1. 16.2 broke it, so that’s where it ends and my version 4.1 is known to work on 16.2.

Jeremy S: But there is no difference between version 4 and version 4.1. There’s a difference in the build artifact that your build farm spit out, but there’s no difference in the extension, right?

Keith Fiske in chat: Still confusing if you don’t bump the release version even with only a library change

Keith Fiske in chat: How are people supposed to know what library version is running?

David G. Johnston: Right. If the extension still works, then` your metadata would just say, “not only do I work through version 16.2, I now work through 16.3.

Jeremy S: But it goes back to the question: is the version referring to a build artifact, or is the version referring to a version of the code? I typically think of versions as a user of something: a version is the thing. It would be the code of the extension. Now we’re getting all meta; I guess there are arguments to be made both ways on that.

Jason Petersen in chat: (it’s system-specific)

Jason Petersen in chat: no one talks in full version numbers, look at an actual debian apt-cache output

David Wheeler (he/him): Other questions? Anybody familiar with the rpath stuff? That seems pretty interesting to me as a potential solution for bundling all the parts of an extension in a single directory — as opposed to what we have now, where it’s scattered around four different directories.

Jason Petersen: I’ve played around with this. I think I was trying to do fault injection, but it was some dynamically loaded library at a different point on the rpath. I’m kind of familiar with the mechanics of it.

I just wanted to ask: In a bigger picture, this talks about building extensions that sort of work everywhere. But the problems being solved are just the duplication across the spec files, the Debian files, etc. You still have to build a different artifact for even the same extension on the same version of Postgres on two different versions of Ubuntu, Right? Am I missing something? It is not an extension that runs everywhere.

Yurri: No, you still have to build against the set of attributes that constitute your target, whether that’s architecture, operating system, flavor. It’s not yet something you can build and just have one binary. I would love to have that, actually! I’ve been pondering a lot about this. There’s an interesting project, not really related to plugins, but if you’ve seen A.P.E. and Cosmopolitan libc, they do portable executables. It’s a very interesting hack that allows you to run binaries on any operating system.

Jason Petersen: I expected that to be kind of “pie in the sky.”

Yurri: It’s more of a work of art.

Jason Petersen: Do you know of other prior art for the rpath? Someone on Mastodon the other day was talking about Ruby — I can’t remember the library, maybe it was ssh — and they were asking, “Do I still have to install this dynamic library?” And they said, “No, we vendor that now; whenever you install this it gets installed within the Ruby structure.” I’m not sure what they’re doing; maybe it’s just a static linking. But I was curious if you were aware of any prior art or other packaging systems where system manages its own dynamic libraries, and use rpath to override the loading of them so we don’t use the system ones and don’t have to conflict with them. Because I think that’s a really good idea! I just was wondering if there’s any sort of prior art.

Daniele: There is an example: Python Wheels binaries us rpath. A wheel is a ZIP file with the C extension and all the depending libraries the with the path modified so that they can refer to each other in the the environment where they’re bundled. There is a tool chain to obtain this packaging — this vendoring — of the system libraries. There are three, actually: one for Unix, one for macOS, one for Windows. But they all more or less achieve the same goal of having libraries where they can find each other in the same directory or in a known directory. So you could take a look at the wheel specification for Python and the implementation. That could be a guideline.

Jason Petersen: Cool.

Yurri: That’s an excellent reference, thank you.

David Wheeler (he/him): More questions?

Jeremy S: Yeah, I have one more. Yurri, the build inferencing was really interesting. A couple things stood out to me. One that you mentioned was that you look for The META.json file. That’s kind of neat, just that it’s acknowledged a useful thing; and a lot of extensions have it and we want to make use of it. I think everybody knows part of the background of this whole series of meetings is — one of the things we’re asking is, how can we improve what’s the next generation of META.json to make all of this better? Maybe I missed this, but what was your high-level takeaway from that whole experience of trying to infer the stuff that wasn’t there, or infer enough information to build something if there isn’t a META.json at all? Do you feel like it worked, that it was successful? That it was an interesting experiment but not really viable long term? How many different extensions did you try and did it work for? Once you put it together, were you ever able to point it at a brand new extension you’d never seen before and actually have it work? Or was it still where you’d try a new extension and have to add a little bit of extra logic to handle that new extension? What’s your takeaway from that experience?

Yurri: The building part is largely unrelated to META.json, that was just primarily the metadata itself. I haven’t used in a lot of extensions because I was looking for different cases — extensions that exhibit slightly different patterns — not a whole ton of them yet. I would say that, so far, this is more of a case-by-case scenario to see for a particular type of or shape of extension what we need to do. But generally, what I found so far that it works pretty nicely for C extensions: it just picks up where all the stuff is, downloads all the necessary versions, allows to discover the new versions — for example you don’t need to update the specification for a package if you have a new release, it will just automatically pick that up rom the list of tags. These these were the current findings. I think overall the direction is promising, just need to continue adjusting the results and see how much further it can be taken and how much more benefit it can bring.

Jeremy S: Thank you.

Yurri: Any other comments or thoughts?

David Wheeler (he/him): Any more questions for Yurri?

David Wheeler (he/him): I think this is a an interesting space for some research between Devrim’s presentation talking about how much effort it is to manually maintain all the extensions in the Yum repository. I’ve been doing some experiments trying to build everything from PGXN, and the success rate is much lower than I’d like. I think there are some interesting challenges to automatically figuring out how things work versus convincing authors to specify in advance.

Jeremy S: Yep. Or taking on that maintenance. Kind of like what a spec file maintainer or a Debian package maintainer is doing.

Yurri: Yeah, precisely.

Wrap Up

David Wheeler (he/him): Thanks, Yurri, for that. I wanted to remind everyone that we have our final Mini-Summit before PGConf on May 15th. That’s two weeks from today at noon Eastern or 4 pm UTC. We’re going to talk about organizing the topics for the Summit itself. I posted a long list of stuff that I’ve extracted from my own brain and lots more topics that I’ve learned in these presentations in the Slack. Please join the community Slack to participate.

The idea is to winnow down the list to a reasonable size. We already are full with about 45 attendees, and we we can maybe have a few more with standing room and some hallway track stuff. We’ll figure that out, but it’s a pretty good size, so I think we’ll be able to take on a good six or maybe eight topics. I’m going to go over them all and we’ll talk about them and try to make some decisions in advance, so when we get there we don’t have to spend the first hour figuring out what we want to, we can just dive in.

And that’s it. Thank you everybody for coming, I really appreciate. We’ll see you next time

Tobias Bussmann in chat: Thanks for the insights and discussion!

Jeremy S: Thank you!

More about… Postgres Yurri Rashkovskii PGXN Extensions PGConf Summit

The Pragmatic Engineer

Getting an Engineering Executive Job

An overview of successful, tried-and-true routes into CTO, VPE, and Head of Engineering jobs, from the new book, ‘The Engineering Executive’s Primer’ by Will Larson.

So, just how do you win that first, coveted CTO, VP of Engineering, Head of Engineering, or Director of Engineering role? I know several professionals who have achieved such promotions internally, and a couple who were successful external candidates for their first CTO role.

These jobs are usually classed as executive roles, and author and CTO, Will Larson, has just published a handy book about these upper echelons. Before this latest title, he wrote a book on engineering management (An Elegant Puzzle,) and one on staff+ engineering (Staff Engineer). Side note; we almost overlapped at Uber, with him departing the San Francisco office just as I joined the company in Amsterdam. We met in-person last year when I was in SF.

I picked up this new book, The Engineering Executive’s Primer, and can report that I’m thoroughly impressed. It’s honest in how it covers topics like navigating tricky power dynamics with CEOs, surviving “peer panic,” dealing with competition within leadership teams, and in its overall breadth and depth. There’s the usual topics: onboarding, hiring, engineering processes, performance processes, compensation, etc, and the book also goes deep into how to navigate mergers and acquisitions, cultural surveys, and how to onboard peer executives.

My copy of The Engineering Executive’s Primer

I reached out to Will to ask if he’d be open to publishing a chapter from the new book in this newsletter, and he generously agreed. In today’s issue, we cover two topics:

The book’s and Will’s backgrounds. How did the idea of the book come about, how long did it take to write, and what is Will’s advice for software engineers aiming to be engineering executives.

Chapter 1. The opening chapter is titled “Getting the Job.” It’s a thorough summary of how to get that first engineering executive position; more so than I’ve read elsewhere. An excerpt from the book.

As usual with all my recommendations, I am not paid to recommend this book, and none of the links are affiliate ones. See my ethics statement for more detail.

1. The book’s and Will’s backgrounds

Before we jump into the chapter, Will answered a few questions:

How did you get your first engineering executive job?

‘In late 2019, I was getting close to four years at Stripe, and understood the problems I was working on a bit too well. I started thinking about how to get my hands on a new set of problems. In particular, I was interested in finding ones to bring me back into product engineering, rather than continuing along the infrastructure engineering path, which I got on to somewhat accidentally at Uber.

‘Most inbound interest I received was for roles similar to the one I was in. So I asked some recently-hired executives how they’d found their jobs. Most mentioned working with executive recruiters, and I asked for referrals. This led me to Sam, the executive recruiter who helped find my CTO role at Calm, which I accepted following deep discussions with 3-4 other companies.’

What made you decide to write this book on executive roles, and can you discuss your choice to be so candid about the behind-the-scenes reality?

‘This is my third book, and I learned a lot from writing ‘An Elegant Puzzle’ and ‘Staff Engineer.’ There’s a curse in authorship, where I see so many things I could have done better in writing those books! Still, writing two intentionally different books helped me understand what I want to accomplish with my writing.

‘My goal when writing is to create something both useful and unique. Being useful is the only good reason to write a professional book, while being unique is the only way to create something durable that resonates with readers over time, and doesn’t fade away after a year or two.

‘When I first became an executive, I learned a lot by making mistakes. I wanted to scrutinize those errors, figure out frameworks that would’ve helped me avoid them, and collate these insights in a book to help others entering that role. Helping others avoid my mistakes is my best tool for advancing the technology industry, which is the professional goal I’m trying to channel my energy toward.’

How did you write the book, and how long did it take? I guess you started taking notes a long time ago, with so many observations across 24 chapters and more than 300 pages.

‘I have two concurrent writing processes:

I write down things I learn, as I learn them. This is the closest I have to a separate note-taking process. For example, I wrote the blog posts Hard to work with and Reading a Profit & Loss statement in 2022 as standalone pieces, and updated versions of each show up in The Engineering Executive’s Primer. 

I write based upon an outline of topics for a book. The chapter, Measuring an engineering organization, is a good example, and was the first piece I wrote explicitly with the new book in mind.

‘Writing this book took about 14 months. I started working on the outline in late 2022, and iterated on it three or four times, before it stabilized roughly into these chapters. I wrote about two-thirds of chapters before I started talking with the publisher, O’Reilly, about whether they’d be interested in it. 

‘After signing the contract, I reworked the topics a bit with my editor, Virginia Wilson, completed the remaining chapters, and revised individual chapters several times based on feedback from her and reviewers. Writing books really does get easier the more you do it, and this book was relatively peaceful compared to my first two.’

You’ve been a software engineer, and are now a CTO. What are the biggest differences between these roles? 

‘A friend recently emailed me asking if it’s possible to have friends at work as an executive, because he was finding that harder and harder, the more senior he’d gotten. In general, I’ve found it’s complex to have work friends as an executive, and that it only works when both individuals take the long view on outcomes. I miss my earlier roles where these things were less fraught.

‘Conversely, I’m really motivated by being able to engage with problems blocking forward progress, and being in an effective executive team is a job characterized by exclusively dealing with that kind of problem. For me, engaging with problems is the most energizing part of being an executive. At the same time, in my earlier work as an engineer I often found myself stymied by organizational constraints. To be fair, I was also a lot less effective at navigating organizational constraints back then.’

What’s your advice for software engineers ambitious to be a Head of Engineering, VP of Engineering, or CTO?

‘Spend more time understanding why people’s perspectives are “right,” rather than why they’re “wrong.” I think of this as extracting the core of what people say. 

‘Being able to learn from those who don’t communicate clearly is a super power. Many who are theoretically good communicators – maybe the executives you work with – are too busy to always communicate clearly, and getting good at understanding them despite messy formatting is invaluable.

‘Spend less time on pursuits you don’t find energizing. For example, I’ve seen so many people try to become “content creators” to further their career, despite having no interest in creating content. The vast majority of successful executives don’t write stuff online and don’t speak at conferences. Do so if it’s energizing for you, but if it isn’t, find something that is!’

Thanks Will for sharing your thoughts! With that, we dive into Chapter 1 of the book.

2. Getting an Engineering Executive Job

The below excerpt is from The Engineering Executive's Primer, by Will Larson. Copyright © 2024 Will Larson. Published by O'Reilly Media, Inc. Used with permission.

At Digg, I ended up running Engineering, but I certainly wasn’t hired to do so. It wasn’t until a decade later, when I joined Calm, that a company deliberately hired me into my first executive role. If you start researching executive career paths, you’ll find folks who nominally became Engineering executives at 21 when they found a company, and others who were more than 30 years into their career before taking an Engineering executive role.

As these anecdotes suggest, there is no “one way” to get an Engineering executive job. However, the more stories you hear about folks assuming executive roles, the more they start to sound pretty similar. I’ve condensed the many stories I’ve heard, along with my own experiences, into a repeatable process that prospective candidates typically follow.

This chapter will cover:

deciding whether to pursue an executive role

why each executive job search is unique, and how that will shape your process

finding executive roles externally and internally

navigating the often chaotic executive interview process after you’ve gotten comfortable interviewing in well-designed middle management interview processes

negotiating an executive contract, particularly the terms that rarely come up in the non-executive contracts you may have negotiated prior

deciding whether to accept an executive offer once you have it

If you’re kicking off the search for your first executive role, reading through this chapter will provide a clear roadmap through the process.

Why Pursue an Executive Role?

If you’re spinning up your first executive role search, you should have a clear answer to a question you’ll get a number of times, “Why are you looking for an executive role?” It’s important to answer this for yourself, as it will be a valuable guide throughout your search. If you’re not sure what the answer is, spend time thinking this through until you have a clear answer (maybe in the context of a career checkup).

There’s no right answer, but here are some examples from real people:

“I’m heavily motivated by learning. I’ve directly reported into an Engineering executive for my past two roles, and I’m looking to step into the role myself.”

“I’ve enjoyed working in a fast-growing company, but I also miss the direct ownership and pace of working at a small company. I’m excited to combine my previous startup experience with my recent experience at scale as an Engineering executive.”

The rationale doesn’t need to be particularly compelling, just something positive that expresses your excitement and qualification for the role. Don’t get discouraged if your statement isn’t profound—there are very few profound ways to say that it’s the next logical step in your career. Once you’ve written your rationale down, review it with a few peers or mentors who have already been in executive roles. Incorporate their feedback, and you’re done. (If you don’t have peers or mentors in executive roles, do some cold outreach to executives at companies you’ve worked at with your rationale and see if they’ll weigh in.)

The other side of this is that interviewers are also very curious about your reason for pursuing an executive role, but not necessarily for the reason you’d expect. Rather than looking for your unique story (although, yes, they’ll certainly love a memorable, unique story), they’re trying to filter out candidates with red flags: ego, jealousy, excessive status-orientation, and ambivalence.

One of One

Limited-release luxury items like fancy cars sometimes label each item with their specific production number, along with the size of the overall run. For example, you might get the fifth car in a run of 20 cars overall. The most exclusive possible production run is “one of one.” That item is truly bespoke, custom, and one of a kind.

All executive roles and processes are “one of one.”

For non-executive roles, good interviewing processes are systematized, consistent, and structured. Sometimes the interview processes for executive roles are well-structured, but more often they aren’t. If you approach these bespoke processes like your previous experiences interviewing, your instincts may mislead you through the process.

The most important thing to remember when searching for an executive role is that while there are guidelines, stories, and even statistics, there are no rules when it comes to finding executive jobs. There is a selection bias in executive hiring for confidence, which makes it relatively easy to find an executive who will tell you with complete confidence how things work but be a bit wary.

It’s not just the hiring process that is not standardized; the Engineering executive roles themselves vary greatly as well. Sometimes they’ll include managing Product Management, and sometimes they’ll exclude managing some parts of Engineering. Working with technology-oriented founders, you may provide more organizational support than technical guidance, whereas working in an older business may mean there are few other executives with a technology background. “One of one” means that anything is possible, in both the best and worst possible sense.

Finding Internal Executive Roles

Relatively few folks find their first executive job through an internal promotion. These are rare for a couple reasons. The first is that each company only has one Engineering executive, and that role is usually already filled. The second is that companies seeking a new Engineering executive generally need someone with a significantly different set of skills than the team they already have in place.

Even in cases where folks do take on an executive role at their current company, they often struggle to succeed. Their challenges mirror those of taking on tech lead manager roles, where they are stuck learning how to do their new job in addition to performing their previous role. They are often also dealing with other internal candidates who were previously their peers and who may feel slighted by not getting the role themselves. This makes their new job even more challenging, and can lead to departures that hollow out the organization’s key leaders at a particularly challenging time.

That’s not to say that you should avoid or decline an internal promotion into an executive engineering role; just that you should go into it with your eyes open. In many ways, it’s harder to transition internally than externally. Because of that, even if an internal transition into an executive role goes poorly for you, don’t assume that means you wouldn’t do well as a newly hired executive at another company.

Finding External Executive Roles

Most executive roles are never posted on the company’s jobs page. So before discussing how you should approach your executive job search, let’s dig into how companies usually find candidates for their executive roles. Let’s imagine that my defunct company Monocle Studios had been a wild success and we wanted to hire our first CTO.

How would we find candidates? Something along the lines of:

Consider any internal candidates for the role.

Reach out to the best folks in my existing network, seeing if any are interested in interviewing for the role.

Ask our internal executive recruiter to source candidates. (I’d skip this step if we didn’t have any executive recruiters internally, as generally there’s a different network and approach to running an executive search than a non-executive search; executive candidates also tend to ask different questions than non-executive candidates, which makes hiring them with non-executive recruiters even messier.)

Reach out to our existing investors for their help, relying on both their networks and their firms’ recruiting teams.

Hire an executive recruiting firm to take over the search.

Certainly not every company does every job search this way, but it does seem to be the consistent norm. This structure exposes why it’s difficult to answer the question, “How do I find my first executive role?” The quick answer is to connect with an executive recruiter—ideally one that peers have worked with before—but that approach comes with some implications on the sort of roles you’ll get exposed to. Typically, these will be roles that have been challenging to fill for some reason.

It’s important to note that the most desirable roles, and roles being hired by a well-networked and well-respected CEO, will never reach an executive recruiting firm. If you try to enter your search without an established network and rely solely on executive recruiters to find roles, you are almost certain to be selecting from second-tier opportunities.

This is, by the way, absolutely not a recommendation against using executive recruiters. Executive recruiting firms can be fantastic. A good executive recruiter will coach you through the process much more diligently than the typical company or investor’s in-house recruiter. I found my first executive role through an executive recruiter, as did the majority of my peers. (Note that the executive recruiters of tomorrow are your internal recruiting colleagues of today, so learning to partner effectively with Recruiting will pay dividends in both your current hiring and your long-term career options.) Similarly, it’s not true that all founder-led searches are for desirable jobs—almost all executive roles start as founder-led searches before working their way through the pipeline.

​​Looking at the pipeline, there are many ways to increase your odds of getting executive opportunities at each step. The basics still matter: Maintain an updated LinkedIn profile, and respond politely to recruiters who do reach out. Both have a surprising way of creating job search serendipity, and ensuring your network is aware that you’re looking. If you don’t personally know many recruiters at investors or executive recruiters, your network can be particularly helpful for making those introductions.

There are also a small number of companies that do post executive roles publicly, and there’s certainly no harm in looking through those as well. The one challenge is that you’ll have to figure out whether it’s posted publicly because the company is very principled about searching for talent outside their personal networks (often a good sign), or if the role has already passed unsuccessfully through the entire funnel described above (often not a good sign). Most companies with strong principles like to talk about them a lot, and you should be able to find public evidence to support their posting coming from a principled belief. If you can’t, then it’s likely desperation.

Finally, if you’re laying the groundwork for an executive search a few years down the road, there’s quite a bit you can do to prepare. You can join a large or high-growth company to expand your network (more on this in Chapter 12), work in a role where you get exposure to the company’s investors, create more visibility of your work (more on this in Chapter 12 and Chapter 15) to make it more likely for founders to reach out to you, or get more relevant experience growing and operating an Engineering organization.

Interview Process

The interview process for executive roles is always a bit chaotic. The most surprising thing for most candidates is that the process often feels less focused or effective than their other recent interviews. This is because your hiring manager as a director of Engineering is usually an experienced engineering leader, but your hiring manager as an Engineering executive is usually someone with no engineering experience at all. In the first case, you’re being interviewed by someone who understands your job quite well, and in the second, the interviewer usually has never worked in the role.

There are, inevitably, exceptions! Sometimes your interviewer was an Engineering executive at an earlier point in their career, but that usually isn’t the case. A relatively common scenario in startups is when a technical founder interviews you for the role, potentially with them staying as the CTO and you taking on the VPE title. But, even then, it’s worth noting that the title is a bit of a smokescreen, and they likely have limited experience as an Engineering executive.

Consequently, Engineering executive interviews depend more heavily on perceived fit, prestige, the size of the teams you’ve previously managed, being personable, and navigating the specific, concrete concerns of would-be direct reports and peers. This makes the “little things” particularly important in executive interviews: send quick and polite follow-ups, use something like the STAR method to keep your answers concise and organized, prepare questions that show you’re strengthening your mental model of how the company works, and generally show energy and excitement.

The general interview process that I’ve seen for executive roles is as follows:

Call with a recruiter to validate you meet the minimum requirements, are a decent communicator, and won’t embarrass them if you talk to the CEO. Recruiters are heavily scrutinized on the quality of candidates they bring forward and will go out of their way to help you show up well. This is also a good opportunity for you to understand whether there are obvious issues that might make this a bad role for you, such as wrong job location, wrong travel expectations, and so forth.

Call with the CEO or another executive to assess interest in the role, and very high-level potential fit for the role. You’ll be evaluated primarily on your background, your preparation for the discussion, the quality of your communication, and perceived excitement for the company.

Series of discussions with the CEO or founder, where you dig into the business and their priorities for the role. This will be a mix of you learning from them, them learning about you, and getting a mutual sense of whether you’ll work well together. The exact structure will vary depending on the CEO or founder, and it will give you an understanding of what kind of person they are to work with.

One-on-one discussions with a wide smattering of peer executives and members of the team that you would manage. These vary widely across companies, and it is surprisingly common for the interviews to be poorly coordinated—for example, the same topics may come up multiple times across different interviewers. This is somewhat frustrating. Generally, it means the company is missing someone with the right position, experience, and energy to invest into designing the loop. I’ve had these interviews turn into general chats, programming screens, architecture interviews, and anything else you can imagine. All I can say is: Roll with it to the best of your ability.

Presentation interview to the executive team, your directors, or a mix of both. Usually, you’ll be asked to run a 60-minute presentation describing your background, a point of view on what’s important for the business moving forward, your understanding of what you would focus on in the new role if hired, and your plan for your first 90 days.

Here are a few tips that I’ve found effective for these interviews:

Ask an interviewer for feedback on your presentation before the session.

Ask what other candidates have done that was particularly well received.

Make sure to follow the prompt directly.

Prioritize where you want to spend time in the presentation (on the highest-impact topics).

Make sure to leave time for questions (while also having enough bonus content to fill the time if there aren’t many).

If this sounds surprisingly vague and a bit random, then you’ve read it correctly. Gone are the days of cramming in all the right answers. Now, it’s a matter of reading each individual effectively and pairing the right response to their perspective. If that feels arbitrary, keep in mind that navigating various perspectives will be a significant part of your role as an executive going forward!

Negotiating the Contract

Once a company decides to make you an offer, you enter into the negotiation phase. While the general rules of negotiation still apply—particularly, don’t start negotiating until the company knows it wants to hire you—this is a moment when it’s important to remember that these are one of one jobs. Compensation consultants and investors will have recommended pay ranges, but each company only hires one Engineering executive at a time, and every company is unique.

Fair pay will vary greatly depending on the company, the size of its business, your location, and your own background. Your best bet will be reaching out to peers in similar roles to understand their compensation. I’ve found folks to be surprisingly willing to share compensation details. It’s also helpful to read DEF 14A filings for public companies, which explain their top executives’ base, bonus, and equity compensation (for example, here is Splunk’s DEF 14A from 2022).

There are a few aspects of this negotiation that are sufficiently different from earlier compensation negotiations:

Equity

Equity is issued in many formats: stock options, Restricted Stock Units, and so on. Equity is also issued with many conditions: vesting periods (often 4 years), vesting cliffs before vesting accrues (often 1 year), and the duration of the period after you depart when you’re able to exercise options before they expire (often 90 days).

Most of these terms are negotiable in an executive offer, but it all comes down to the particular company you’re speaking with. You may be able to negotiate away your vesting cliff, and immediately start vesting monthly rather than waiting a year; or negotiate an extended post-departure exercise window, even if that isn’t an option more widely; or have the company issue you a loan to cover your exercise costs, which combined with early exercise might allow you to exercise for “free” except for the very real tax consequences.

To determine your negotiation strategy, I highly recommend consulting with a tax advisor, as the “best” option will depend on your particular circumstances.

Equity acceleration

Equity acceleration is another negotiation point around equity. This is worth calling out as it’s common in executive offers, and extremely uncommon in other cases. Acceleration allows you to vest equity immediately if certain conditions are met. Many consider this a standard condition for a startup contract, although there are many executives who don’t have an acceleration clause.

One topic that gets perhaps undue attention is the distinction between single and double trigger acceleration. “Single trigger” acceleration has only one condition to be met (for example, your company is acquired), whereas “double trigger” acceleration will specify two conditions (for example, your company is acquired and you lose your job). My sense is that people like to talk about single and double triggers because it makes them sound knowledgeable about the topic rather than it being a particularly nuanced aspect of the discussion.

Severance packages

Severance packages can be negotiated, guaranteeing compensation after you exit the role. There is little consistency on this topic. Agreements range from executives at very small companies that have pre-negotiated a few months’ salary as a severance package, to executives leaving highly compensated roles that require their new company to make them whole on the compensation they’re leaving behind. There are also many executive contracts that don’t pre-negotiate severance at all, leaving the negotiation until the departure (when you admittedly have limited leverage).

Bonus

Bonus size and calculation can be negotiated. On average, bonus tends to be a larger component of roles outside of engineering, such as a sales executive, but like everything, this is company- and size-specific. A CTO at a public company might have their bonus be equal in size to their salary. A CTO at a Series C company might have a 20% bonus. A CTO at a 50-person company might have no bonus at all.

In addition to the size of your bonus, you may be able to negotiate the conditions for earning it. This won’t matter with companies that rely on a shared bonus goal for all executives (sometimes excluding sales), but may matter a great deal with other companies that employ bespoke, per-executive goals instead.

Parental leave

Parental leave can be negotiated. For example, some companies might only offer paid parental leave after a year of service, but you can absolutely negotiate to include that after a shorter amount of service. (It’s worth noting that this is often negotiable in less senior roles, as well.)

Start date

Start date is generally quite easy to negotiate in less senior roles but can be unexpectedly messy for executive roles. The reason it gets messy is that the hiring company often has an urgent need for the role to be filled, while also wanting to see a great deal of excitement from the candidate about joining.

The quiet part is that many recruiters and companies have seen executive candidates accept but later not join due to an opposing offer being sweetened, which makes them uncomfortable delaying, particularly for candidates who have been negotiating with other companies, including their current one.

Support

Support to perform your role successfully is another point that can be negotiated. The typical example of vain requests for support are guaranteed business- or first-class seats on business travel, but there are other dimensions of support that will genuinely impact your ability to perform your role. For example, negotiating for an executive assistant can free up hours every week for focus work, and negotiating a sufficient budget to staff your team can easily be the difference between a great and terrible first year.

The negotiation phase is the right time to ask for whatever you’ll need to succeed in the role. You’ll never have an easier time to ensure you and your organization can succeed.

Negotiate knowing that anything is possible but remember that you have to work with the people you’re negotiating with after the negotiation ends. If you push too many times, you won’t be the first candidate to have their offer pulled because the offering company has lost confidence that you really want to be there.

Deciding to Take the Job

Once you get an offer for an executive position, it can be remarkably hard to say no. The recruiters you’re working with will push you to accept. The company you’re speaking with will push you to accept. You’ll have invested a great deal of work into the process, and that will bias you toward wanting to accept as well.

It’s also challenging to evaluate an executive offer, because ultimately you’re doing two very difficult things. First, you’re trying to predict the company’s future trajectory, which is hard even for venture capitalists who do it a lot (and they’re solving for an easier problem as they get to make many concurrent investments, and you can only have one job at a time). Second, you’re trying to make a decision that balances all of your needs, which a surprising number of folks get wrong (including taking prestigious or high-paying jobs that they know they’re going to hate, but just can’t say no to).

I can’t really tell you whether to accept your offer, but there are a few steps that I would push you to take before finalizing your decision:

Spend enough time with the CEO to be sure you’ll enjoy working with them, and that you’ll trust them to lead the company. While it changes a bit as companies scale, and particularly as they go public, the CEO is the person who will be deciding company direction, determining the members of the executive team, and taking responsibility to resolve the trickiest decisions.

Speak to at least one member of their board. Admittedly, board members won’t directly tell you anything too spicy, but their willingness to meet with you is an important signal, and it’s the best opportunity to start building your relationship with the board.

Make sure you’ve spoken with every member of the executive team that you’d work with regularly. Sometimes you’ll miss someone in your interview process due to scheduling issues, and it’s important to chat with everyone and make sure they’re folks you can build an effective working relationship with.

Make sure they’ve actually answered your questions. I once interviewed to be a company’s head of Engineering, and they refused to share their current valuation with me! I pushed a few times, but ultimately they told me it was unreasonable to ask, and I decided I couldn’t move forward with a company that wouldn’t even share their valuation with an executive candidate.

Don’t assume they’ll disclose this information after you join the company if they won’t tell you when trying to convince you to accept their offer. You will never have more leverage to get questions answered than during the hiring process: If it’s important and they won’t answer, be willing to walk away.

If the company has recently had executives depart, see if you can get an understanding for why. This could be learned through mutual friends with the departed executive, or even chatting with them directly. Sometimes you’ll even have executives who interviewed you depart before, or shortly after, you join. You should absolutely reach out to them and understand the reasons for their departure.

As you work through these steps, ask yourself: Are you still excited? Have you explained your thinking about the role to at least two friends (who didn’t raise any concerns)? If the answer to these questions is yes, then take the job!

Not Getting the Job

You can’t talk about running an Engineering executive search without talking about not getting the job. Who doesn’t have a story of getting contacted by a recruiter who then ghosts them after an initial screen? A public company recently invited a friend of mine to interview in their CTO search. They got my friend very excited, and then notified them the next week that they had already tentatively filled the role. I’ve had first discussions with CEOs where we both immediately knew we wouldn’t be a good fit to work together. I’ve discussed roles where both I and the CEO wanted to move forward, but where I lacked a specific skill they felt was required to succeed (for example, deep experience in machine learning).

Although rejection isn’t fun, the perspective that I find helpful is: The goal of your search is not to find an executive job, but rather to find an executive job where you will thrive. It’s much better to realize a job isn’t the right fit for you before taking it, and each opportunity that doesn’t move forward is for the best.

Gergely again. I hope you have enjoyed this in-depth look into how to get a coveted engineering executive role, via The Engineering Executive’s Primer.

Check out the full book

Many things Will discusses above are open secrets among engineering leaders; like that each executive role and hiring process is unique, and CTO roles at different companies often have vastly different processes, expectations, and compensation ranges!

Nearer the heights, compensation also becomes more negotiable; not only equity and salary, but also equity acceleration. That’s why, when negotiating a compensation package in a new country, it’s sensible to invest in tax advice about the equity component from a local accountant or equity expert. I recently spoke with an engineering director in San Francisco who was offered a CTO role with generous equity in Germany; but equity taxation is quite different there and they were researching it, first.

The rest of this book keeps up the quality, with lots of behind-the-scenes insights. If you’ve gotten something from this excerpt, you’ll likely get even more from the rest. So, if you’re an engineering executive, or want to become one, I reckon The Engineering Executive’s Primer is a very useful volume, as the author hoped for. 

This week, there won’t be a new issue of The Pulse on Thursday, as I’m on spring break. As of next Tuesday onward, things are back to the usual schedule. Thanks for your support!


Doc Searls Weblog

Burning to Write. And Vice Versa.

Among all artists, writers alone suffer the illusion that the world needs to hear what they have to say. I thought that line, or something like it, came from Rollo May, probably in The Courage to Create. But a search within that book says no. ChatGPT and Gemini both tell me May didn’t say it […]
I shot this photo in July 2007, while sitting with friends in that same booth at the Eagle and Child in Oxford.

Among all artists, writers alone suffer the illusion that the world needs to hear what they have to say.

I thought that line, or something like it, came from Rollo May, probably in The Courage to Create. But a search within that book says no. ChatGPT and Gemini both tell me May didn’t say it anywhere.

Still, I think it’s true that writers write because they have to. They can’t not write. It’s what they do, how they live. And I’m one of them.

The need to write is for me a vivid fact this morning, a few days past a catheter ablation on my heart. There is so much I need to say, because I suffer the illusion that the world needs to hear it. Does it?

I am sure C.S. Lewis and J.R.R. Tolkein had the same affliction. I am also sure that the world is better for having read both of them, even if the world could have lived without their writing.

As for time, I have had twelve more years to write than Lewis got, so far, and five less than Tolkein.

Time to say what?

I want to say that personal AI will do far more for all of us than what we will ever get from AI as a corporate service. And to say it better than I just did.

I want to say that we will do better with rulers who care about people than with rulers who merely rule. And to say that better than I just did.

I want to complete the work of John McPhee by reporting as best I can what has happened to the great characters that anchored every one of his essays and books. But that project is not on the back burner. It’s in the fridge, where I’ve kept it for decades (while continuing to read the entire McPhee oeuvre, much of it repeatedly).

Speaking of burning, I am impelled by Dylan Thomas, who wrote “Do not go gentle into That Good Night,” before dying at just thirty-nine. The poem was for his father:

Old age should burn and rave at close of day,
Rage, rage against the dying of the light.

And so I do.


Michael Ruminer

Let Me Save You Some Time on Azure OpenAI And Copilot

I continue to develop my interest and some proficiency in AI. I am specifically focusing on Microsoft’s Copilot and Azure offerings. How did I end on that choice? I used to be a Microsoft MVP and also have a personal Visual Studio license that comes with $150 a month in Azure credits. So I have a long history in the Microsoft stack and tools plus some credits to play with each month. Those credits

I continue to develop my interest and some proficiency in AI. I am specifically focusing on Microsoft’s Copilot and Azure offerings. How did I end on that choice? I used to be a Microsoft MVP and also have a personal Visual Studio license that comes with $150 a month in Azure credits. So I have a long history in the Microsoft stack and tools plus some credits to play with each month. Those credits don’t go far when dealing with Azure AI pricing but I appreciate them. This last weekend I was going through an exercise I found on connecting up Azure AI search with Azure OpenAI layered on top, and then letting that be the foundation for a Copilot. You can find the excellent video by Lisa Crosbie here. This is a post on what I encountered that took me a while to overcome so that if you have a similar set of conditions you may spend less time to get running.

The crux of my arrangement and some heartache is the personal Visual Studio Azure subscription id; an emphasis on “personal”. This is where I have the monthly credits, but you need a business Microsoft account to use the Copilot Studio.

Here is the short end of it. If you can call this short. It’s complicated, what I did to get there, but doesn’t have to be too complicated for you. Here are the steps I went through so you don’t have to:

I needed to apply for a Microsoft OpenAI Studio preview access. I needed to use a non-personal email. No outlook.com, gmail etc. Once approved the following day when trying to kick off a Copilot creation from the deployed OpenAi instance it asked me to log in and would only accept a business or school Microsoft account — my personal account wouldn’t work. I created a new Microsoft business account by subscribing to Office 365 basic business I tried to kick off the Copilot creation from the OpenAI deployment on my personal account and when it asked me to log in for the Copilot I put in my new business Microsoft account. It complained that it couldn’t connect with the Azure OpenAi deployment. Which made total sense- it was under a different account than the Copilot I was trying to create, but I had to try it. So, I subscribed to Azure using that newly minted business account. It came with $200 credit the first month. I tried to apply for OpenAI using the “business” email address the Azure subscription gave me when subscribing to Azure- a ruminio944.onmicrosoft.com domain. It immediately denied me saying that the email address was a personal email account. I wasn’t expecting that. I had a dormant domain sitting in GoDaddy (who doesn’t) and added it to my Azure account as a domain and set up a shared email for that domain. I now had factraft@factraft.com email address. I am sure I could have set up an email box directly on GoDaddy for the domain but this was more fun and a learning experience. I now had a MS business login with an Azure subscription and $200 credit to play with and what should be considered a business email. I applied for the OpenAI Studio Preview again this time using my new Azure subscription id and new factraft email. I expected it to be denied as all the other information such as name, address, website etc. was the same as I had already submitted and been approved for under my personal Azure subscription id. Surprisingly, the next day I received a welcome message for OpenAi Studio Preview. I went through the video exercise in my new Azure account and most things worked as in the video. The Lesson

To make it work you need the OpenAI Studio Preview which requires a business email address, a website, and an approved use case to request admission to the preview, and that is no guarantee of approval. You’ll need a business Microsoft account to log into in order to kick off the Copilot studio. Personal emails and a personal Microsoft account won’t cut it. I created a business Microsoft account by subscribing to Office 365 business basic for about $8. Then added an Azure instance to this with $200 credits for the first month. Then I was off to the races- mostly. I was able to make it all work for a mere mortal for the cost of about $8, the one month Office 365 subscription, and a partial days effort . All in all, not bad. I’ll make another write up on what I discovered in the creation itself. If just the right person(s) finds this trying to play with Azure OpenAI and Copilot studio then it might save them a fair amount of time.

Monday, 06. May 2024

Patrick Breyer

Computerspiele-Sterben: PIRATEN fordern Kulturschutz für Games

In Zusammenarbeit mit der Spielerinitiative „Stop killing games“, die sich gegen das Lahmlegen von Computerspielen durch ihre Hersteller wendet, haben der Europaabgeordnete der Piratenpartei Dr. Patrick …

In Zusammenarbeit mit der Spielerinitiative „Stop killing games“, die sich gegen das Lahmlegen von Computerspielen durch ihre Hersteller wendet, haben der Europaabgeordnete der Piratenpartei Dr. Patrick Breyer und ein Mitglied des Kulturausschusses eine Dringlichkeitsanfrage an die EU-Kommission gerichtet. Hintergrund des Gamer-Protests ist die Entscheidung des französischen Computerspieleherstellers Ubisoft, das bis Dezember 2023 verkaufte Rennspiel „The Crew 1“ ab April 2024 unbenutzbar zu machen. Breyer erklärt:

„Wir wollen eine Stellungnahme der EU-Kommission dazu, ob Spielehersteller beliebte Spiele gewinnbringend verkaufen, sich dabei aber die willkürliche, jederzeitige Unbrauchbarmachung vorbehalten dürfen. Diese Praxis der Branche dürfte als ‚missbräuchliche Klausel in Verbraucherverträgen‘ illegal sein. Sie wird vor allem der kulturellen Bedeutung von Games nicht gerecht, die gewachsene Gemeinschaften von Millionen von Spielern zusammen bringen können. Wir Piraten fordern, dass die EU Games als Kulturgut schützt, ein Lahmlegen nach Gutdünken verbietet und, wenn der Hersteller Games aufgibt, ein Weiterführen durch die Community ermöglicht.“

Die Digitalexpertin und Spitzenkandidatin der Piratenpartei zur Europawahl Anja Hirschel ergänzt: „Aus Sicht der Spieler ist es geradezu unverschämt, ein Spiel anzubieten und dieses dann plötzlich vollständig ‚abzuwürgen‘. Eine gewisse Nutzungsdauer ab Kaufdatum muss garantiert werden. Daher ist dies auch eine Frage des Verbraucherschutzes.“

Breyer hatte bereits vor einem Monat die EU-Kommission zur Einstellung von Computerspielen befragt, wobei die Antwort aussteht. Die jetzt eingereichte Dringlichkeitsanfrage muss innerhalb von drei Wochen, also bis 16. Mai, beantwortet werden.

Friday, 03. May 2024

Wrench in the Gears

Could Bobby’s Tokenized Environmental Markets Really Be About Using Web3 to Game Quantum Mechanics?

This will be my last post for a few weeks. During my trips down to Arkansas, I’ve been reading Gary Zukav’s 1979, “The Dancing Wu Li Masters,” a lay person’s guide to the “new” physics that emerged out of meetings he had at Esalen, home base for the human potential movement. I recognize there continue [...]

This will be my last post for a few weeks. During my trips down to Arkansas, I’ve been reading Gary Zukav’s 1979, “The Dancing Wu Li Masters,” a lay person’s guide to the “new” physics that emerged out of meetings he had at Esalen, home base for the human potential movement. I recognize there continue to be many questions about quantum mechanics. At this point, I am leaning towards Hugh Everett’s “Many World’s Theory” and John Wheeler’s  “Participatory Universe.” I don’t claim to have any of this pinned down, and yet I wanted to share some thoughts I’ve been having on relativity, perception, spacetime, lattices, and construction of personalized “realities” using consciousness and information theory.

When Leo first started his work on Web3 tokenization of natural capital, he looked into the origins of the International Geophysical Year, use of radio isotopes to track energy transfer in ecosystems, and UNESCO’s “Man and the Biosphere” (Noosphere?) programme. I am starting to think that the focus on One Health, climate, and the UN Sustainable Development Goals could actually be about relativity – employing digital identity and Web3 to situate individual representations of consciousness within webs of relationship over time. That data could then be used not only to discern patterns but to incorporate game mechanics towards some form of networked collective manifestation.

Individual experiences of the ant computer would be shaped by each participant’s environmental context. Through gamification and spatial computing we, as agents, would emit signals that could be used to coordinate a vast sociotechnical computational system of domesticated consciousness. But first, the system would need to know where each of us is at a given time, what information flows we have consumed, what our perception of reality is, and how best to use the archetypal fields that surround us to nudge us towards pre-programmed goal-oriented behaviors, how to optimize the agents with the least amount of friction. Go with the flow….

Thinking of carbon-credit behaviors in this way – as gamified agent signaling within fractal, nested complex systems – has some legs I think. In any event, it is a much more interesting scenario than Klaus wants to turn you into a puppet, take all your things, and make you eat bugs, right?

So here is the video I made last night.

It will be interesting to watch as the RFK Jr. campaign tries to weave together the split personalities of Market-Based Environmental Solutions Bobby and Health FreeDOM Bobby. Liam Sturgess’s initial attempt was rather weak.

https://www.depauw.edu/news-media/latest-news/details/19648/

I expect a lot more information flows will need to pour forth in order to create a narrative compelling enough to get people to sign on to big data, AI for good green progress. Stay tuned.

This is the excerpt I read. Below it are screenshots of the featured maps if you’d like to explore further. It’s quite fascinating to see how the various threads spread out over many maps have come together around the Camelot candidate.

Interactive Map: https://embed.kumu.io/10176e997f243e67277e7d615ba4bbb1#untitled-map?s=bm9kZS1HYVJhVVpTNA%3D%3D

Interactive Map: https://embed.kumu.io/45f342eb350f7b4c707484423f71d6ff#untitled-map?s=bm9kZS1XWmNzcFV0Mg%3D%3D

Interactive Map: https://embed.kumu.io/c9645fc5fbabfed02dd66e0a0aea9905#untitled-map?s=bm9kZS05N3prSzlrVg%3D%3D

Interactive map: https://embed.kumu.io/ddfc18d01ccaffb4db6b8ac0711ec30a

Interactive Map: https://embed.kumu.io/a8d369aeea2eb819d4961542e0e3f91c

Interactive Map: https://embed.kumu.io/2f5210a439fe8343ae208a20c70d83fc

Interactive Map: https://embed.kumu.io/d52b8a5f6b97bfde04624f29a903ef4c

Interactive Map: https://web.archive.org/web/20210327185247/https://littlesis.org/oligrapher/6635-metronome

Interactive Map: https://embed.kumu.io/0712f3e054d2055302b00b258e24e1c0#untitled-map?s=bm9kZS1TRkFROG0xRw%3D%3D

Thursday, 02. May 2024

Aaron Parecki

OAuth for Browser-Based Apps Working Group Last Call!

The draft specification OAuth for Browser-Based Applications has just entered Working Group Last Call!

The draft specification OAuth for Browser-Based Applications has just entered Working Group Last Call!

https://datatracker.ietf.org/doc/html/draft-ietf-oauth-browser-based-apps

This begins a two-week period to collect final comments on the draft. Please review the draft and reply on the OAuth mailing list if you have any comments or concerns. And if you've reviewed the document and are happy with the current state, it is also extremely helpful if you can reply on the list to just say "looks good to me"!

If joining the mailing list is too much work, you're also welcome to comment on the Last Call issue on GitHub.

In case you were wondering, yes your comments matter! Even just a small indication of support goes a long way in these discussions!

I am extremely happy with how this draft has turned out, and would like to again give a huge thanks to Philippe De Ryck for the massive amount of work he's put in to the latest few versions to help get this over the finish line!


Wrench in the Gears

Austin’s Learn Capital Takes A Field Trip to the Vatican To Talk Entrepreneurial Social Impact

Late last April, Lynn Davenport and I recorded a conversation unpacking the visit Evan Baehr, managing partner and social impact investor with Learn Capital, made to the Vatican in the fall of 2022 for the Humanity 2.0 conference hosted in partnership with Harvard University. I think it is useful to consider Humanity 2.0 as an [...]

Late last April, Lynn Davenport and I recorded a conversation unpacking the visit Evan Baehr, managing partner and social impact investor with Learn Capital, made to the Vatican in the fall of 2022 for the Humanity 2.0 conference hosted in partnership with Harvard University. I think it is useful to consider Humanity 2.0 as an extension of the eugenics (optimization) movement that became aligned with the Human Potential Movement in the 1970s and is now merging with quantified-self “wellness management” and alt-health biohacking.

The social science division of the Catholic Church appears to be totally onboard with the idea of aligning big data impact investing with preventative care (cue Bobby’s “chronic health” epidemic plans) using their far-reaching health system infrastructure. You can explore overlaps between Social Finance’s Ronald Cohen and Rome here, here, here, and here. In particular, note this post on the role of Providence Health, a Catholic Health System, in the early days of Covid.

Lumedic = light. Note the lighting in the feature image.

Interactive Map: https://embed.kumu.io/052a2ea893a8f845488a6f5b14515d4d#untitled-map?s=bm9kZS1iMzZHYXkwaw%3D%3D

We need to understand that networked biology and social gaming will be at the core of light-based, post-Moore’s law computing. Such a system will combine non-traditional interfaces, cyber-physical systems (including nanotechnology and frequency transmission), and spatial computing. In such a system our participation, through the perpetual signaling of our “choices,” will dynamically reshape the “game,” though we may not ever truly understand what the ultimate goal of the game is due to the deployment of sophisticated “knowledge management” techniques.

In my opinion, digital education linked to biometric monitoring is an essential part of a plan that aims to harness embodied group consciousness for collective problem-solving and/or pattern identification, possibly within the context of an abstracted mathematical / metaphysical spacetime continuum the general public cannot readily comprehend.

Networked consciousness, De Chardin’s noosphere, is the goal. Widespread deployment of a “free” “open” global ed-tech protocol layer will hasten the planned evolutionary trajectory using Google’s Lamarckian “selfish ledgers.” Social Impact will be used to justify ongoing monitoring and management of key performance indicators, ubiquitous transmission parsed by machine learning within an ever evolving computational “game of life” where Earth’s beings are remade as agents in an infinitely complex cellular automata program. That’s something surely Nicole Shanahan, an alumna of Santa Clara University (lead in the social impact deployment space) and Sergey Brin’s ex would know about, right?

Shortly after we recorded it, life got complicated, and the clips got put on the back burner. This week I pulled everything off the shelf, blew off the dust, did some editing and found the content surprisingly current with respect to the manufactured school choice wars, funded by the likes of free market options traders like Jeff Yass (see maps below) as well as the Catholic undertones of Bobby’s presidential campaign. It’s the fourth installment in a series on the roll out of Open Education Resources (aka digital playlist education with competency badges for global gig labor).

Here are links to the other installments if you want context for this one: 

OER in Texas: Building Digital Twins for a Global Gig Economy: Here

VR Apprenticeships, Income Sharing Agreements and the Omega Point: Here

Learn Capital Comes to Austin: Here

 

 

OER Slide Deck – Presentation Starts At Slide 98.

Interactive Map: https://embed.kumu.io/91af355393887ce4efd9c139cf0d69d4#untitled-map?s=bm9kZS1LeHFNU2Zkcg%3D%3D

 

Interactive Map: https://embed.kumu.io/cbaaef670490292b6d28fd89a578a55c

Learn Capital, launched by Greg Mauro on Powder Mountain outside Salt Lake City, opened their Austin campus last year and has been an influential player in promoting adoption of digital education and blockchain / web3 technology in the Lone Star state. Lynn and I have a lot more to talk about with respect to Texas, classical antiquity, and archetypal morphic fields. Check out this years Garden’s of Greece Dallas Arboretum benefit! A tribute to Athena, goddess of wisdom and the smart city? 

See this post on Mondragon and the Catholic Church’s Focolare movement for a model of the freeDOM cell / intentional community collective that I suspect will be the model for the ant computer system. 

Interactive Map: https://embed.kumu.io/76cd68ced810f82da69e19f2dca42b47

 

 

 


The Pragmatic Engineer

The Pulse #92: GitHub’s AI-assisted developer workflow vision

Also: Google laying off engineers as their business is booming; a good period for startup fundraising and IPOs; and how WorkOS acquired devtools startup Warrant

Programming note: next week, I’ll be on spring break, spending time in Florida, visiting my in-laws. This will mean no The Pulse on Thursday: but there will still be a Tuesday article, as usual. Thank you for your understanding and support!

The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.

Today, we cover:

Industry pulse. Cloudflare handles an extra 12M req/sec with no issues, how does being a PM at Google or Meta compare, and are nearly 1 in 10 of all developers worldwide using GitHub Copilot? And more.

GitHub’s AI-assisted developer workflow vision: Copilot Workspace. Several startups are aiming to build an AI coding tool that goes beyond GitHub Copilot. With exceptionally fast execution, GitHub seems to have beaten them to it. It’s called GitHub Copilot Workspace, and doesn’t aim to replace devs. A review of this developer-driven, AI-assisted workflow, including insider details from the GitHub team.

Google lays off devs while business booms. Google made layoffs in its Python and Flutter platform teams, right as the company’s stock price hit an all-time high. 

Good time for tech scaleups fundraising and IPOs. It’s been a while since we’ve had so much positive news on tech IPOs, and late and early-stage fundraising. Hopefully, this trend continues!

Platform scaleup buys feature startup: WorkOS acquires Warrant. Warrant is an innovative fine-grained access control seed-stage devtools startup, built on the principles of Google’s authorization system. In what looks like a win-win outcome, WorkOS has acquired it, intending to keep intact the team, product, and all existing customers. Details on this acquisition from the founder and CEO of WorkOS.

1. Industry pulse Cloudflare doesn’t notice 12M extra req/sec

Cloudflare engineering director Benjamin Yule shared an interesting non-event:

I have so many questions as 12.5 million requests per second is a huge load! A few thousand requests per second is usually considered high, and handling a load in the millions surely requires thousands of servers (if not more!) to be spun up to handle it.

How much load a single server can handle depends on how long a request takes to handle, how much memory a request uses, and then calculating whether a request is limited in CPU or memory. Whichever resource (CPU capacity or memory) runs out first determines a server’s maximum load. You can push things pretty far with clever workarounds; it’s how the WhatsApp team served 2 million requests per second on a server with 24 cores and 100GB RAM, back in 2012!

In this case, Cloudflare soaked up the load by using Cloudflare Workers, a serverless application platform, which spun up 40,000 CPU cores (an average of 320 requests/second per CPU core.) The tweet says the Cloudflare Workers team didn’t even notice the increase, which suggests they could be handling 50-100x more load on the service! If we assume a 100x load, that could mean operating 4 million CPU cores or more simultaneously, just for Workers. By comparison, we previously covered travel booking service Agoda operating a total of 300,000 physical cores and 600,000 virtual ones. It’s likely Cloudflare is operating more than 10x this number.

Do nearly 1 in 10 developers use GitHub Copilot?

Read more


Patrick Breyer

Der Pirat-o-mat: Onlineanwendung zum Vergleich der politischen Meinung mit echten Abstimmungen im EU-Parlament startet

Mit dem Pirat-o-mat (piratomat.de) ist ab sofort ein innovatives Tool verfügbar, das Nutzern dabei hilft, ihre Standpunkte zu Themen wie Netzpolitik, digitale Rechte und Demokratie …

Mit dem Pirat-o-mat (piratomat.de) ist ab sofort ein innovatives Tool verfügbar, das Nutzern dabei hilft, ihre Standpunkte zu Themen wie Netzpolitik, digitale Rechte und Demokratie zu entwickeln und zu vergleichen. Die Besonderheit: Im Quiz sind 12 Fragen zu beantworten, die im EU-Parlament seit 2019 tatsächlich zur Abstimmung standen. In der Auswertung erfährt der Nutzer seine Übereinstimmung mit dem Abstimmungsverhalten des Europaabgeordneten der Piratenpartei und der anderen deutschen Europaabgeordneten.

Doch der Piratomat bietet mehr als nur einen einfachen Vergleich: Geordnet nach Parteien und Themengebieten ist potenziell problematisches Abstimmungsverhalten zusammengestellt. Der Schwerpunkt der ausgewählten Abstimmungen liegt auf Netzpolitik, aber auch die Themen Gesundheit, Demokratie, Umweltschutz und Transparenz sind vertreten.

Der EU-Abgeordnete der Piratenpartei Dr. Patrick Breyer, der die Website bereit stellt, erläutert:

“Als Normalbürger bekommt man in aller Regel nichts davon mit, wer in Brüssel wie abstimmt und ob das Abstimmungsverhalten den Versprechen einer Partei gerecht wird. Weil das EU-Parlament seine Abstimmungsaufzeichnungen maximal unverständlich macht, sorgen unsere Abstimmungsgrafiken für Transparenz. Der Pirat-o-mat wird viele Nutzer überraschen, manche verärgern und hoffentlich auch zu Nachfragen und Gesprächen führen. Unsere Demokratie lebt von Transparenz, Beteiligung, Glaubwürdigkeit und Rechenschaft – der Pirat-o-mat ist unser Beitrag dazu.”

Zum Pirat-o-maten

Wednesday, 01. May 2024

Phil Windleys Technometria

Internet Identity Workshop XXXVIII Report

We recently completed the 38th edition of the Internet Identity Workshop. We had 330 people from around the world who called 169 sessions. As usual there was lots of energy and thousands of side conversations. IIW is a place to get things done and it showed in the excitement in the rooms and the comments people made to me about how much they enjoyed it.

We recently completed the 38th edition of the Internet Identity Workshop. We had 330 people from around the world who called 169 sessions. As usual there was lots of energy and thousands of side conversations. IIW is a place to get things done and it showed in the excitement in the rooms and the comments people made to me about how much they enjoyed it.

Opening on Tuesday

As you can see by the pins in the map at the top of this post, there were attendees from all over the world. Not surprisingly, most of the attendees were from the US (241), followed by Canada (11). Germany, India, and Switzerland rounded out the top five with 9, 8, and 7 attendees respectively. Attendees from India (5), Thailand (3), and Korea (3) showed IIW's diversity with attendees from APAC. And there were 4 attendees from South America this time. Sadly, there were no attendees from Africa again. Please remember we offer scholarships for people from underrepresented areas, so if you'd like to come to IIW39, please let us know. If you're working on identity, we want you there.

Session on Personal AI

For states and provinces, California was first with 122. Washington (16), Utah (10), Texas (10) and New York (10) rounded out the top five. San Francisco (14) Oakland (13), San Jose (12), Seattle (11), and New York (9) were the top cities.

Demo Hour on Wednesday

In addition to sessions, we have a demo hour on Wednesday that is a little like speed dating. There were 20 different projects highlighted.

Drummond Reed hosts a session

There's always more than one session that I want to attend in any given time slot and choosing is hard. That's a common refrain. Luckily we have sessions notes that we publish in a Book of Proceedings. You can find additional photos from Doc Searls: Day 1 and Day 2.

IIW is where you will meet people to help you solve problems and move your ideas forward. Please come! IIW 39 will be held October 29-31, 2024 at the Computer History Museum. We'll have tickets available soon.

Tuesday, 30. April 2024

Patrick Breyer

Piraten zur Vorratsdatenspeicherung: Wo ein Trog ist, sammeln sich die Schweine

Der Europäische Gerichtshof hat heute seine Rechtsprechung zur Vorratsdatenspeicherung aufgeweicht und die Nutzung auf Vorrat gespeicherter Internet-Verbindungsdaten schon zur Verfolgung von Filesharing zugelassen. Der Europaabgeordnete der Piratenpartei und langjährige Gegner …

Der Europäische Gerichtshof hat heute seine Rechtsprechung zur Vorratsdatenspeicherung aufgeweicht und die Nutzung auf Vorrat gespeicherter Internet-Verbindungsdaten schon zur Verfolgung von Filesharing zugelassen. Der Europaabgeordnete der Piratenpartei und langjährige Gegner der Vorratsdatenspeicherung Dr. Patrick Breyer fordert:

„Zuerst hat der EuGH zur Verfolgung von Kindesmissbrauchsdarstellungen und anderer schwerer Straftaten eine flächendeckende Vorratsdatenspeicherung über die Internetverbindungen der gesamten Bevölkerung erlaubt. Ab heute soll unsere Internetnutzung aber schon zur Verfolgung von Filesharing und anderer Bagatelldelikte offengelegt werden. Wo ein Trog ist, sammeln sich die Schweine. Deswegen müssen wir alles daran setzen, die grenzenlose Datengier der Konservativen und Sozialdemokraten zu stoppen. Obwohl Deutschland Internetdelikte auch ohne IP-Vorratsdatenspeicherung sehr erfolgreich aufklärt, arbeitet die EU im Verborgenen bereits daran uns nach der Europawahl wieder eine Internet-Vorratsdatenspeicherung aufzuzwingen. Wir Piraten werden für unser Recht auf anonyme Internetnutzung kämpfen!

IP-Vorratsdatenspeicherung ist, wie wenn jede:r Bürger:in ein sichtbares Kennzeichen um den Hals gehängt bekäme und dieses auf Schritt und Tritt notiert würde. Niemand würde sich eine solche Totalerfassung des täglichen Lebens gefallen lassen. IP-Vorratsdatenspeicherung würde jeden Internetnutzer unter Generalverdacht stellen und die Internetnutzung der gesamten Bevölkerung, die unsere intimsten Vorlieben und Schwächen abbildet, nachvollziehbar machen. Eine so totale Erfassung würde Kriminalitätsvorbeugung durch anonyme Beratung und Seelsorge, Opferhilfe durch anonyme Selbsthilfeforen und auch die freie Presse gefährden, die auf anonyme Informanten angewiesen ist. 99,99 Prozent dieser Daten wären völlig nutzlos, da sie Bürger:innen betreffen, die nie auch nur in den Verdacht einer Straftat kommen.“

Dr. Patrick Breyer

Hintergrund: Der Vorsitzende von CDU/CSU im EU-Parlament Weber forderte am Samstag eine EU-weite IP-Vorratsdatenspeicherung. Unterdessen arbeitet die sog. #EUGoingDark-Arbeitsgruppe im Auftrag der EU-Kommission und EU-Regierung entsprechende Pläne aus.


The Pragmatic Engineer

What is Security Engineering? Part 2.

A broad overview of the ever-developing security engineering field; a domain that can feel intimidating to some software engineers. With Nielet D'Mello, security engineer at Datadog. Part 2.

👋 Hi, this is Gergely with a subscriber-only issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. To get articles like this in your inbox, every week, subscribe:

Subscribe now

Q: “As a software engineer, I’d like to learn more about security engineering. What’s a good way to understand this vast field?”

This is the second and final part of exploring this important – and, yet, often intimidating! – topic of security engineering. Giving us an overview of this field is Nielet D'Mello: a security engineer at Datadog (previously at Intel and McAfee).

In Part 1 we already covered:

Myths and misconceptions about security engineering

History of security engineering

The present

A mental model: seven core dimensions to think about application security

Towards a secure software development lifecycle (SDLC).

In today’s issue, Nielet takes us through:

Defining the criticality of a system. Security dimensions to consider as we talk about a service or systems’s criticality.

Scoring a system’s criticality. The “napkin math” approach for scoring a system’s security criticality, and a case study to bring all it to life.

Threat modeling. A criteria for threat modeling, and pre-work for this exercise.

Security paved roads. For platform teams, building pre-approved security solutions and configurations is pragmatic.

“Defense in depth,” “least privilege,” and “zero trust.” A strategy, a principle, and a security model. Use in combination to build more layered, secure systems.

The bottom of this article could be cut off in some email clients. Read the full article uninterrupted, online.

Read the full article online

With that, it’s over to Nielet.

Common security engineering terms

As a brief refresher, we use three terms frequently in this article, so let’s start by defining them:

Vulnerability: An exploitable flaw or weakness in a system’s design, implementation or deployment

Threat: The potential for a threat actor to exploit the vulnerability

Risk: Loss or damage that could occur when a threat actualizes

1. Defining the criticality of a service or system

Do all services and systems need to invest in a security design review? Not necessarily, as the need for a review depends on a service’s or system’s business risk profile. Vulnerabilities will surface as you identify security concerns in a system’s design and architecture. Code reviews and dynamic testing also surface security issues.

For critical systems, it’s worth investing in processes like security design reviews. However, how do you decide just how critical  a service or system is? Use the dimensions below for a better sense of this:

Business purpose

Public access

Custom access controls

Users of the system

Deployment environments

Data classification

Business purpose

What are the primary objectives and functions of the service or system within the context of the organization's business operations? Identify how the service contributes to achieving business goals, generating revenue, or providing stakeholder value. To figure out the risks, it’s essential to know:

The nature of business

The industry the business operates in

Regulatory requirements

Sensitivity of data involved. For example, is it restricted, or subject to PII?

Public access

Is the service accessible to external users outside of the organization's network, or the general public? Public access systems offer expanded attack surfaces.

For these, you need to assess the potential exposure to security threats and risks associated with providing services over the internet, or other public networks, as these systems are at a much higher risk of automated bot attacks, for example.

Custom access controls

All systems need custom access controls for their data, apps and resources to determine who has access to it and in what circumstances. Role-based access control (RBAC,) or attribute-based access control (ABAC,) are two examples of custom access controls. These have specific access permissions defined for users and identities, and restrictions tailored to the service’s requirements and security needs to ensure confidentiality. 

The decision to build custom access controls is usually made with the following factors in mind:

Granularity

Dynamic decisions based on real-time information and conditions

Implementation efforts

Simplicity

Flexibility

Users of the system

What different types of users interact with the service? This is key information for defining:

User roles

Authentication mechanisms

Access requirements

User activity auditing

Threat detections associated with anomalous user behavior patterns 

Adherence to regulatory compliance

The last one is especially important. Several regulatory frameworks and industry standards mandate the protection of sensitive data through user identification and access controls. Examples of such frameworks include the General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA). In these cases, putting these in place is not “just” about making the system secure; it’s about what ensures the system is compliant with privacy and security regulation.

Users include:

Internal users: employees, administrators

External users: customers, partners, third-party vendors. 

Deployment environments

Development, testing, staging, and production environments in which the service operates. Each environment may have different security requirements and configurations. These varying requirements depend on:

Level of risk tolerance

Need for data protection

Data availability requirements

Compliance with industry standards, regulations, and legal requirements. 

For example, a staging environment may have broader internal employee access, meaning it can be accessed by most (if not all) employees. However, the production environment tends to have much stricter access control: only specific employees or groups can access it, and even fewer will have the rights to deploy to it. And while the staging environment is unlikely to have data that is considered confidential customer data: the production environment will! So the production environment will have much more strict data security and monitoring measures deployed on its  infrastructure.

It’s pretty common for an environment to be a shared infrastructure for various services. When this is the case, robust security controls (like stricter isolation for applications and databases) are even more important! Multi-tenant architectures are a good example for such “shared infrastructure”  where stricter security controls are necessary.

Data classification

This refers to labeling data based on sensitivity, confidentiality, and regulatory requirements. Understanding the classification of data helps determine:

Appropriate security controls

Suitable encryption methods

Access restrictions for safeguarding sensitive information and preventing unauthorized disclosure or misuse.

2. Scoring a system’s “criticality”

It’s helpful to calculate a criticality score for services. For this, I like to assign weights to the security dimensions. Below is a sample of how these scores could look. It’s just an example; simpler than I usually use, and it doesn’t encompass all factors relevant for every system. Just treat it as inspiration:

Now we’ve established the basic factors for understanding risk and criticality, we can do some napkin math with criticality scores, based on characteristics:

Calculating criticality, simplified: define dimensions by important security factors

A simple way to think about a total risk “score” is to add together the weights for each dimension. So, in this case: Total Risk Score = BP + PA + CAC + US + DE + DC.

Scoring criticality

Let’s take the example of building a payment system for an e-commerce site. It needs to process a high volume of transactions via credit cards, debit cards, and other payment methods. It also needs to have payment gateway integration, account for fraud prevention, and is subject to PCI DSS compliance.

Let’s do the napkin math for this system’s criticality:

Scoring the criticality of this example system

We get the total risk score by adding up the dimensions. In this case, it comes to 15 out of a maximum of 18 points (3 + 1 + 2 + 3 + 3 + 3.) This score indicates we are talking about a critical system from a security standpoint.

All companies have unique risk-scoring and risk-tracking processes. As a software engineer, you need to figure out what a “high” service risk score means, and at what point you should reach out to the security team at your organization, if there is one.

3. Threat modeling

Read more


Just a Theory

Mini Summit: Universally Buildable Extensions

Tomorrow, May 1 2024, Yurii Rashkovskii of Omnigres will give the Postgres Extension Mini-Summit talk, “Universally buildable extensions: dev to prod.” at noon Eastern / 16:00 UTC, followed by community discussion.

Well that snuck up on me. Tomorrow, May 1 2024, Yurii Rashkovskii of Omnigres will be giving a presentation at the fifth installment of the Postgres extension ecosystem mini-summit. The tal, “Universally buildable extensions: dev to prod”, should provoke some interesting asymmetrical thinking and discussion. I hope to see you there.

Note! If you reserved a spot at a prior mini-summit, you will need to do so again for each subsequent event or get no reminders from Eventbrite. If it’s sold out just email david@ this domain, ping me on Mastodon or via the #extensions channel on the Postgres Slack or the Postgres Discord for a link or, if you’d like a calendar invitation.

More about… Postgres Extensions PGConf PGXN Trunk Summit

Wrench in the Gears

AI Wellness + Tokenized Democracy = Game B: Will The Health FreeDOM Community Buy Into Bobby’s BS?

In between packing and getting my third and final UHaul round ready this weekend, I managed to put together a selection of clips touching on RFK Jr.’s blockchain / token habit and automated Stanford law specialist Nicole Shanahan’s plans to data mine precision health systems, an effort that will underpin quantified self social impact bonds [...]

In between packing and getting my third and final UHaul round ready this weekend, I managed to put together a selection of clips touching on RFK Jr.’s blockchain / token habit and automated Stanford law specialist Nicole Shanahan’s plans to data mine precision health systems, an effort that will underpin quantified self social impact bonds and global ant computer operations. Last night after stopping for the night in Staunton “Game B – Eugenics – Woodrow Wilson” Virginia, I did an overview of the themes covered in the clip compilation, to make the concepts clearer. After speaking out against blockchain digital ID for four years, it makes me ill to see the manufactured “hero” that is Camelot Bobby pushing blockchain government, crypto, and AI / machine learning for social good. The health freeDOM – biohacking community IS Game B. So many are already totally invested in playing the game, even as they imagine themselves to be the rebels. Once you see it, you can’t unsee it, but it’s lonely looking at the world from this vantage point. 

PS: If you want to better understand my concerns around Bobby’s planned blockchain budget, check out some of my old blog posts. The first goes into Michael Bloomberg’s plans for “what works” data-driven government along with a public presentation I did at Wooden Shoe Books in Philadelphia in the summer of 2019. The second touches on token engineering and participatory “democracy.”

 

 

We have to start to wrap our minds around the idea that the AI automated smart contract law that Shanhan specializes in is actually about running cyberphysical / sociotechnical systems. Sergey Brin’s father Michael, was a specialist on complex dynamical systems. We as agents will knowingly or unknowingly participate in managing the game board for emergent behavior via governance tokens. THESE are the ideas more people need to start educating themselves about and discussing. 

 

Cyberphysical Systems and Choice Link

 

What Works Government Link

 

 


Tokenized Participatory Governance Link

 

 

Link to Moneyball Government Slide Deck – May 2019

 

 

 

Monday, 29. April 2024

Kent Bull

IIW 38 KERI & ACDC Session List

It’s been two weeks since the 38th bi-annual Internet Identity Workshop. The KERI community had a big showing with 19 session across three days. This included a variety of sessions on the vLEI ecosystem, the Trust Over IP (ToIP) Foundation, the Trust Spanning Protocol, KERI basics, CESR 2.0, new Witness […]

It’s been two weeks since the 38th bi-annual Internet Identity Workshop. The KERI community had a big showing with 19 session across three days.

This included a variety of sessions on the vLEI ecosystem, the Trust Over IP (ToIP) Foundation, the Trust Spanning Protocol, KERI basics, CESR 2.0, new Witness infrastructure, and business-focused sessions. A developing tradition of the “Bearer Tokens are bad, don’t use them” session was held this year as well by Sam Smith titled “I don’t sign my credentials and neither should you!”

See the list below for a complete reference of KERI sessions and their meeting notes.

IIW 38 – 19 KERI and ACDC Sessions Day 1 – Apr 16 The Hitchhiker’s Guide To KERI  / Nuttawut Kongsuwan Trust Over IP (ToIP) Foundation / Judith Fleenor Organizational Identity & Verifiable Authority / Timothy Ruff EBA Pilot Using the vLEI / Karla McKenna & Lance Byrd Simple SSI – How to make the SSI Codebase attractive for devs to use? / Jonathan Rayback The Business of SSI & Authentic Data / Timothy Ruff TSP draft Part I / Wenjing Chu Day 2 – Apr 17 KERI for Dummies / Timothy Ruff. Phil Feairheller The 5 Rights of Secure Health Data = a Proof of Concept – KERI-ACDC / Jared J and Phil Feairheller TSP Part II / Wenjing Chu (notes missing/blank) CESR 2.0 Performance features – Deep dive annotation – Comparison to JSON CBOR and more / Sam Smith Day 3 – Apr 18 Deploying Multi Tenant Secure Witnesses for KERI / Phil Feairheller TSP Draft Part III Implementation / Wenjing Chu Running Lean w/SSI The business model to go from Idea to Company / Jared J & Timothy Ruff Five Failed Blockchains – Why Trade Needs Protocols, Not Platforms / Timothy Ruff Cardano Transactions and KERI / Can a cardano Tx be affected based on AID’s current keys? / Ed Eykholt vLEI (verified Legal Entity Identifier) Demystified / Nuttawut Kongsuwan (notes missing/blank) I don’t sign my credentials and neither should you! Why unbound signatures  (tokens) are insecure and how to use KERI instead. KERI for Enterprise / Sam Smith

For more sessions come to the next Internet Identity Workshop in Fall of 2024 from OCTOBER 29, 2024 – OCTOBER 31, 2024.


Patrick Breyer

Leak: EU-Regierungen wollen datenschutzfreundliche und verschlüsselte Messaging-Dienste mit Chatkontrolle bestrafen

Ein neues Dokument zur geplanten Chatkontrolle, das zuerst von netzpolitik.org veröffentlicht und dann auch dem Nachrichtenportal Contexte zugespielt wurde, enthüllt weitere Details über die Pläne der EU-Ratspräsidentschaft: Wenn Dienste …

Ein neues Dokument zur geplanten Chatkontrolle, das zuerst von netzpolitik.org veröffentlicht und dann auch dem Nachrichtenportal Contexte zugespielt wurde, enthüllt weitere Details über die Pläne der EU-Ratspräsidentschaft: Wenn Dienste unter Pseudonymen oder ohne Nutzeraccount, über VPNs oder mit Verschlüsselung genutzt werden, schneiden sie auf einer Risikoskala schlechter ab und sollen eher zur Chatkontrolle verpflichtet werden. Die Akzeptanz von Kryptowährungen zählt ebenso als Risikofaktor. Wenn ein Dienst den „direkten Austausch von Inhalten ohne Verwendung zentraler Server“ (P2P) ermöglicht, wird dies als Risiko eingestuft. Wenn eine datenschutzfreundliche Plattform keine Daten zur Überwachung des Nutzerverhaltens sammeln kann oder will, wird sie schlechter bewertet. Dienste, bei denen die Nutzer „überwiegend öffentlich kommunizieren“ (d.h. nicht über private Chats), sollen besser abschneiden und somit weniger wahrscheinlich zur Chatkontrolle verpflichtet werden.

Der Piratenabgeordnete und digitale Freiheitskämpfer Dr. Patrick Breyer, im EU-Parlament Schattenberichterstatter des Gesetzentwurfs zur Chatkontrolle, kommentiert: „Grundlegende Dienste wie Signal, TOR, verschlüsselte E-Mail-Dienste wie ProtonMail oder Torrenting-Plattformen sollen verteufelt und per Verpflichtung zur Chatkontrolle letztlich zerstört werden. Bie bisher datenschutzfreundlichesten Kommunikationsdienste sollen künftig zu den am stärksten überwachten Diensten werden. Das durchgesickerte Papier offenbart Massenüberwachung und Unterwanderung sicherer Verschlüsselung von Diensten als wahres Ziel der meisten EU-Regierungen. Im Gegensatz dazu will das Europäische Parlament nur Chats von Personen mit Verbindung zu sexuellem Kindesmissbrauch abhören lassen, stattdessen aber viel mehr Sicherheitsmaßnahmen zum Schutz vor der Anbahnung sexueller Kontakte zu Kindern vorschreiben, die der EU-Rat in seinem Papier nur erwähnt. Dieses Leak ist ein Augenöffner. Wir Piraten werden nicht aufhören, für unser Grundrecht auf digitales Briefgeheimnis und sichere Verschlüsselung zu kämpfen!“

Im Mai sind zwei Beratungsrunden über den Vorschlag zur Chatkontrolle geplant, bevor die EU-Innenminister im Juni entscheiden sollen.

Breyers Informationspostal zur Chatkontrolle


Mike Jones: self-issued

Securing Verifiable Credentials using JOSE and COSE is now a W3C Candidate Recommendation

The Securing Verifiable Credentials using JOSE and COSE specification (a.k.a. VC-JOSE-COSE) has reached W3C Candidate Recommendation status. The Candidate Recommendation milestone is described in the W3C Process document. Please review the Candidate Recommendation of VC-JOSE-COSE. Thanks especially to Gabe Cohen, Orie Steele, and Brent Zundel for doing the hard work of getting us to this […]

The Securing Verifiable Credentials using JOSE and COSE specification (a.k.a. VC-JOSE-COSE) has reached W3C Candidate Recommendation status. The Candidate Recommendation milestone is described in the W3C Process document. Please review the Candidate Recommendation of VC-JOSE-COSE. Thanks especially to Gabe Cohen, Orie Steele, and Brent Zundel for doing the hard work of getting us to this point!

Since I last wrote about this work, the W3C Verifiable Credentials Data Model (VCDM), which is also at Candidate Recommendation stage, has been narrowed to only use JSON-LD to represent credentials. VC-JOSE-COSE secures VCDM payloads with JOSE, SD-JWT, or COSE signatures. While I’m admittedly not a fan of JSON-LD, to the extent that Verifiable Credentials using the VCDM are in use, I’m committed to finishing a solid VC-JOSE-COSE specification so there is a simple, secure, standards-based way to sign these credentials.

Of course, there are lots of Verifiable Credential formats to choose from, and more on the way. Choices already existing include ISO mDoc, IETF SD-JWT, IETF JSON Web Proof (JWP), and W3C VCDM. The IETF is also planning to create a CBOR-based selective disclosure representation in the newly formed SPICE working group. It will be interesting to see how these all shake out in the marketplace!

Saturday, 27. April 2024

Werdmüller on Medium

Where I’m coming from

Why I fight for a more inclusive, international perspective. Continue reading on Medium »

Why I fight for a more inclusive, international perspective.

Continue reading on Medium »

Friday, 26. April 2024

Patrick Breyer

Hessens GroKo will flächendeckende Internet-Vorratsdatenspeicherung durch den Bundesrat einführen

Der hessische Ministerpräsident Boris Rhein (CDU) und Justizminister Christian Heinz (CDU) haben einen „Entwurf eines Gesetzes zur Einführung einer Mindestspeicherung von IP-Adressen für die Bekämpfung schwerer Kriminalität“ (PDF) vorgelegt. Am …

Der hessische Ministerpräsident Boris Rhein (CDU) und Justizminister Christian Heinz (CDU) haben einen „Entwurf eines Gesetzes zur Einführung einer Mindestspeicherung von IP-Adressen für die Bekämpfung schwerer Kriminalität“ (PDF) vorgelegt. Am Freitag, 26. April 2024 soll das geplante Gesetz im Bundesrat behandelt werden.

Der Europaabgeordnete Dr. Patrick Breyer (Piratenpartei / Grüne/EFA) kommentiert:

„Dieser Vorstoß ist ein Rückschritt für die Kinderschutz-Debatte. Echter Kinderschutz wird seit Jahrzehnten vernachlässigt – statt für echte Lösungen zu sorgen wird das Problem als Vorwand für eine Internet-Massenüberwachung genutzt, die pauschal alle Bürgerinnen und Bürger unter Generalverdacht stellen würde. Der vorgelegte Gesetzentwurf ignoriert nicht nur dringend notwendige Maßnahmen für echten Kinderschutz, sondern auch den Fakt, dass es keinen messbaren Nutzen einer solchen Massenüberwachung gibt. Anstatt echte Lösungen auszuarbeiten, wollen Ministerpräsident Boris Rhein (CDU) und Justizminister Christian Heinz (CDU) offenbar das Rad zurückdrehen:

Vor gerade einmal zwei Wochen hat die Ampel anstelle anlassloser IP-Speicherung den Weg frei gemacht für das anlassbezogene und zielgerichtete Quick Freeze-Verfahren, wie es beispielsweise auch in Österreich eingesetzt wird. Der hessische Vorschlag hat das Potenzial die Einführung eines verhältnismäßigen Quick Freeze-Instruments aufzuhalten.

Rückschrittlich ist der Entwurf auch, weil das Bundesverwaltungsgericht erst 2023 die von CDU und SPD beschlossene Vorratsdatenspeicherung in vollem Umfang für unionsrechtswidrig und damit für nicht anwendbar erklärt hat – inklusive der IP-Vorratsdatenspeicherung.

IP-Vorratsdatenspeicherung ist, wie wenn jede:r Bürger:in ein sichtbares Kennzeichen um den Hals gehängt bekäme und dieses auf Schritt und Tritt notiert würde. Niemand würde sich eine solche Totalerfassung des täglichen Lebens gefallen lassen. IP-Vorratsdatenspeicherung würde jeden Internetnutzer unter Generalverdacht stellen und die Internetnutzung der gesamten Bevölkerung, die unsere intimsten Vorlieben und Schwächen abbildet, nachvollziehbar machen. Eine so totale Erfassung würde Kriminalitätsvorbeugung durch anonyme Beratung und Seelsorge, Opferhilfe durch anonyme Selbsthilfeforen und auch die freie Presse gefährden, die auf anonyme Informanten angewiesen ist. 99,99 Prozent dieser Daten wären völlig nutzlos, da sie Bürger:innen betreffen, die nie auch nur in den Verdacht einer Straftat kommen.“


Just a Theory

Mini Summit Four

Links, notes, and commentary from Jonathan Katz’s presentation at the fourth Postgres Extension Ecosystem Mini-Summit, “Trusted Language Extensions for PostgreSQL”.

My thanks to Jonathan Katz for his presentation, “Trusted Language Extensions for PostgreSQL”, at last week’s Postgres Extension Ecosystem Mini-Summit. As usual I’ve collected the transcript here interspersed with comments from the chat window. First, links!

Video PDF Slides [TBD]

And now, rivers of text!

Introduction I opened the meeting and introduced Jonathan Katz. Presentation

Thank you for having me. I’m very excited to talk about this, and extensions in general. I see a lot of folks here I know and recognize and some new folks or folks I’ve not met in person yet.

Borrowed from the original presentation on TLEs from November of 2022, to level set on why we built it. I know this is a slightly more advanced audience, so some stuff that might seem relatively introductory to some of you, though there is some material on the internals of extensions.

The premise is why we decided to build TLEs, what were the problems we’re trying to solve. Part of it is just understanding Postgres extensions. In general this group is very familiar with Extensions but there are two points I want to hit on.

One of the things that excites me most about Postgres is that, when you look back at Postgres as the Berkeley database project researching how to create an object relational database, an accidental or intentional features is not just that Postgres is an object-relational database, but that Postgres is an extensible database, built from the get-go to be able to add functionality without necessarily having to fork it.

Early on you’d have to Fork the database to add additional functionality, but the beauty of the Postgres design was the ability to keep adding functionality without forking.

It did require making changes to Postgres to further enhance that capability, which we’ll discuss in a bit, but that’s a really powerful concept.

The second point is that there is the large landscape of both open source and Commercial extensions. I think a lot of folks in this group are very familiar with the available open source extensions, but there are entire businesses built on, effectively, commercial extensions on top of Postgres. Again, that’s a really powerful notion!

It’s kind of like the Postgres economy: we created something that’s so special that it’s able to spawn all these different businesses whether it’s building things on top of Postgres or using Postgres as the heart of their business. Extensions have a very large role to to to play in that.

Which gets us to the history of extensions. The Syntax for Postgres extensions has been around for over a decade, since Postgres 9.1. I know there’s folks here well familiar with building extensions prior to that syntax! But we’re very grateful for the extension syntax because, as a recovering application developer, I would say it made things a lot easier.

Instead of having to futz around the file system to figure out where to install everything (wink wink nudge nudge on the topic today), you had a very simple syntax. Once an extension is installed, I can CREATE EXTENSION postgis (or whatever extension) and there you go! I have PostGIS installed.

Again, that’s really cool! Because anything we can do to make it simpler to install and use extensions further drives their adoption, and ultimately makes it even easier to develop and build applications with Postgres and continues to drive that forward.

So what can you build with Postgres, what extensions are available? It’s a whole range of things. For starters, there are extensions that I call “the ones that you take for granted”. If you’re using any monitoring tool you’re likely running and you may not even realize it. pg_stat_statements helps to aggregate statistics as queries execute and bubbles it up to whatever monitoring tool you use. It’s a great tool for performance tuning.

The example I like to give for my personal life was that, back when I was an application developer trying to manage my own databases, I had some query that was blocking my entire logical decoding system, so we weren’t keeping up with transactions. Looking in pg_stat_statements I see a recursive query where I should have had a UNION with SELECT DISTINCT instead of a gigantic query that was amassing so many rows. Fixed it: I had a 100x speed up in the query. Thank you pg_stat_statements!

Side note: I got to say, “hey, I sped up this query by 100x,” even though it was my fault it was slow to begin with.

There are utility functions to help with data types. UID OSSP is very widely used. Newer versions of Postgres have a random UUID function, but previously, anytime you needed a UUI you would CREATE EXTENSION "uuid-ossp".

The other fun thing about this extension is that developers learned about SQL identifiers that require double quotes to install the extension.

I think I saw Keith on here today. pg_partman! What’s really cool about pg_partman too is that a lot of it is PL/pgSQL. This PL/pgSQL code provides a way to manage partitions across all your tables in your entire database. Again, that’s really powerful because Postgres has added declarative partitioning in version 10, but pg_partman is still incredibly useful because there are all sorts of aspects to partition management not supported in Postgres today. This is another example where Postgres provides the core functionality and you can use the extension to package additional functionality that makes it easier for use.

Foreign data wrappers.Postgres has a whole interface to interface with other databases. It could be other Postgres databases, other relational databases, non-relational databases, file systems, etc. Postgres has a C-level interface that allows you to design the calls to optimally use all these different databases. Again, packaged up as an extension, being able to add things as we go on.

I’ll wait till the end to answer questions this will be a relatively short presentation, so we should have some time for discussion

Last but not least, a little bit on PostGIS. I think this is one of the most powerful aspects of Postgres. PostGIS itself is incredibly powerful because you have a geospatial database that happens to be Postgres underneath. A lot of heavy PostGIS users don’t even realize they’re using Postgres! They think they’re using PostGIS. That is really the power of Postgres extensibility in a nutshell: It looks like you have like a brand new, domain-specific database – and yet underneath it it’s just boring old Postgres doing all the things you expect a database to do. That is also a very powerful notion/

Tobias Bussmann in chat: Many PostGIS users don’t know they are using PostgreSQL 🤣

nils in chat: 🎉

To add a coda to it, you have pgRouting, an extension built on top of PostGIS, which is built on top of Postgres. So you have a cascading effect of extensions building on top of extensions building on top of Postgres.

So we’re supposed to talk about trusted language extensions. To really to TLEs it’s important to understand the anatomy of an extension. There are certain things that you need in order to have an extension: You need a control file, which is kind of like your packaging manifest. it tells you what’s in the extension. It goes into a directory.

You have SQL files, which effectively map out the objects that you’re going to have in your database. If you have functions that need to map to a C function or if you need to create a table access method in order to build your new your new storage layer, the SQL files are the building block.

If you have C- specific code, it goes in a library file or a shared object file that gets stored in a library directory.

It’s a very simple layout. What’s cool is if you go to create an extension, there’s a particular pattern that it forms: You need to know that when you have an extension, the information in the control file goes into the Postgres catalog. Then, if there are any functions or objects or whatever in that extension, we need to install the object itself, but we also need to make sure that there’s a dependency on the extension. That way, if we need to remove the extension or upgrade it, we know all the objects that we’ve collected.

So why this “extension building 101”? This gets at the heart of why we built TLes. Because the first thing to notice is that we install extensions directly on the file system. There are a lot of domains where that’s restricted — certainly managed service providers.

I worked a lot with containers previously, and a container is effectively an immutable file system: once you have things installed on it it’s installed. You typically don’t want to give your app developers access to your production systems, because your app developers are going to install everything under the sun on them, myself included. You certainly want to be able to restrict certain domains.

But we also don’t want to inhibit what developers want to build. We want to make it as easy as possible for them to manage their code and be able to install in different environments. That gets to another point beyond restricting the file system. Some extensions may not be universally available, depending on where you’re running them. You might be running an extension on an on-premise environment that might not work in a managed service provider. Or different managed service providers have different extensions available.

The final bit — and something that I’ve definitely personally experienced — is that, between major versions of Postgres, the API or the ABI will change. These are the interface points at the C layer. When they change it can break extensions. Ask any of the package managers how much they need to nag people to upgrade their extensions: they always want to make sure that they’re keeping it up-to-date, and ensuring that it’s compatible.

But this can also lead to other issues, because as the end user, this makes it challenging to perform major version upgrades — particularly if I’m dependent on an extension that hasn’t been updated to work with the latest version of Postgres. A subtle line of code change in Postgres could end up breaking an extension.

Quick story: that actually happened to me while I was managing pg_tle. I think it was a change in Postgres 15.1 actually broke something in the pg_tle extension. I had to to fix it. I think that’s part of the point: if you’re able to use a language that’s on top of C, and you have the C layer abstracted away, in theory it could make it easier to perform major version upgrades.

That leads into TLE.

I think there were two Notions behind trusted language extensions, or TLEs, when they were initially designed. The first is giving power to app developers to be able to build extensions. It’s actually one thing I noticed as I started making the journey from app developer to quasi-DBA to, ultimately, product manager not writing code. Part of that Journey was the power of putting some of my business logic in the database.

There’s always this tension between how much business logic to put in application code versus the database. But there are certain things that were just clear wins for me when they wer in the database. The first was a lot of search functions I wrote where filtering data down to a very small set in the database and returning to the application would save on network time, processing time on the app side, etc. There were some very clear wins by encapsulating them in functions.

But also solving things that were just much easier to solve in the database. Having specific data types that solve a particular problem — geospatial extensions keep coming to mind, pgvector dealing with Vector data, and being able to store it in a database without having delegate it out into an application certainly is a is a clear win.

The other thing was installing extensions. I think the notion of portability is very powerful. If I have a way to manage my extensions from a SQL interface, it makes it much easier to move it between different systems.

Now, we do need to be careful as soon as we start saying “SQL interface”. I don’t want to suggest that we should have a SQL interface to ship C code everywhere We know there are some challenges with C code. C is highly performant, you can effectively build anything under the sun using C, but it’s not memory-safe, and it’s very easy if you’re not familiar with what you’re doing — and even if you are familiar with what you’re doing! — you can easily make mistakes that could either lead to crashes or or possibly worse.

As we were thinking about all this with TLE, there’s three things. First, we need an interface to be able to install and manage extension code effectively regardless of environment. We need a SQL interface to do that. We also need to make sure there’s an appropriate trust boundary. Now, Postgres provides a trust boundary with the notion of a trusted language. But there are other things we need to build with trust, as well.

For example, you might not want everyone in your environment to be be able to install the a TLE, so we need to make sure there’s appropriate access controls there.

Finally, we need a way to package it up — which I think is what we’re going to talk about at the Extension Summit.

If there are any takeaways from why we built TLE (I think this is the the slide that encapsulates it), it’s that, by using by using Postgres’s built-in trusted language interface it allows you to write extension code in languages that we know are going to respect Postgres security boundaries.

Postgres has this definition of a trusted language which, if you look at for it, you have to effectively dance around the documentation to find it.

But effectively I’d summarize as, if you allow an unprivileged user to write code in a trusted language, they can’t do anything to escalate their privileges, access the file system directly, or do anything that would violate Postgres’s security boundary.

It’s a pretty good definition. Arguably, the easiest way to violate that definition is that you as the database administrator mark an untrusted language as trusted in the catalog. But I strongly advise to not do that!

What trusted languages are available. There’s a great wiki page called the “PL Matrix” on the Postgres Wiki that shows the status of all all the known PLs in Postgres and whether they’re trusted or not. I suggest looking at that.

David Wheeler (he/him) in chat: https://wiki.postgresql.org/wiki/PL_Matrix

Jeremy S in chat: Linux had kernel modules forever, but many people were rightfully hesitant because a kernel module could easily crash your entire system. One of the reasons eBPF is exploding today is because it’s verified and safe and enables code (like entire networking apps) to run directly in the linux kernel.

I see TLE similarly

A big thing regarding a trusted language is performance. There are a variety of trusted languages, and they all have different properties you know around them. The ones I’m showing today are the ones available in RDS Postgres. But the reason I want to show them is that, part of the idea of trusted language extensions is allowing app developers who may be less familiar with C to write extension code and access some of the same internals as a C extension, but from one of these languages.

Here are some of the known trusted languages today that work with TLE. If you’re using the TlE open source project, you can use any available trusted language — or you can use untrusted languages an just use the TLE packaging mechanism. In that case you lose the trusted part, but gain the extension installation aspect of TLE.

There are a few things included in TLE to make sure that TLE can be installed safely. It is an opt-in feature. We do have a shared preload library for pg_tle called “pg underscore TLE”, and you do need to have your database super user install pg_le initially. This ensures that we’re respecting your security boundary, If you’re going to use trusted language extensions, you do have an explicit opt-in to using it.

David Wheeler (he/him) in chat: https://github.com/aws/pg_tle

After that, an app developer can create their own trusted language extension.

Here’s a simple example from the TlE announcement with two functions and packaged into an extension you can install. You can give it a name like any Postgres extension; this one is called “tle_test”. The code looks like the SQL file in any extension. And it’s effectively packaged up like an extension using the pgtle.install_extension command. If you go to the pg_le GitHub project you can see the the different interface points.

Once it’s installed you can use CREATE EXTENSION like any other extension: it follows all the usual Postgres semantics: extension installation, uninstallation, software life cycle management. pg_tle has its own interface for that functionality, bu once you’ve installed it, managing the extension is just like managing any other Postgres extension, and follows those conventions.

Effectively TLE is offering, loosely, a packaging mechanism (I think packaging has a lot more connotations): it’s a grouping mechanism for your code. One of the parts that I always found most useful in pg_tle was this part, effectively versioning my store procedures.

When I talked about the example where I was putting business logic into the database, one part I would usually mess up is: what version of my stored procedures was running in a given database. Particularly if you have hundreds or thousands of databases that you’re managing, that can be a challenge.

Now I had far fewer databases I was managing, I was more dealing with our developer environments: staging and production. But I was managing the store procedures within our migration scripts — which is totally fine, because if I know what version of the migration that I ran then I would know what version of the stored procedures are on that database. Kind of. Sort of. Maybe. You know: unless someone manually modified it — in which case shame on me for giving access to the database. But there basically could be some gaps in knowing what version of a stored procedure was on a particular server.

With pg_le we can significantly reduce the risk of that problem because we have a way to version our store procedures, and be able to know exactly what we’re running at any given time, and create a consistent packaging mechanism wherever we’re running our code. And it goes beyond stored procedures because there’s far more that you can can build with your code.

What else does TLE add? We discussed was the packaging mechanism, but a lot of the power of Postgres extensions is the ability to use the underlying Postgres internals. One of these types of internals is called a “hook”.

Hooks are the Postgres feature that you’ve never heard of, that are not well documented, and yet are the foundational part of many extensions. Hooks are almost everywhere in Postgres. You particularly see a lot of them during the query execution process. For example the process utility hook which allows you to modify any utility command, anything that’s not a direct SQL statement. There are all sorts of hooks: there are password check hooks, client authentication hooks, hooks called around shared memory allocation, hooks called at each step of the the execution phase.

Florents Tselai in chat: Hooks are almost undocumented indeed

The best resources I’ve found:

https://github.com/taminomara/psql-hooks and the standard: https://wiki.postgresql.org/images/e/e3/Hooks_in_postgresql.pdf

Hooks are very powerful; particularly enabling a lot of extensions adding different semantic behavior to Postgres. We could probably do a whole series of talks just on all the different ways you can extend Postgres. I mean, that’s why David has organized the summit! But hooks are very simply a powerful mechanism to define behavior and Postgres.

Because they’re so powerful, for the hooks that we expose in tle we make sure that there is a super user opt-in. Remember, an unprivileged user can define this behavior but you do need someone with privilege to be able to enable something like a hook.

For example, a password check hook probably means that you have the ability to evaluate a plain text password that’s coming through. Now on that topic we can have a very long debate, but let’s save that for Vancouver. But with this hook, you do have the ability to do password checks, so you want to make sure that, when you enable a function that calling a password check hook that there’s a certain level of privilege to that function. Or you you know you want to make sure you do your appropriate evaluation to make sure that you trust that function.

In addition to that check, there’s an additional check from the pg_tle admin role that requires someone with administrative privileges over your TLE to register that hook. The concept of “TLE features” are the way to map hooks into the TLE. We’ve been building it up since we launched TLE by adding a few hooks. There’s both the check password hook and the client authentication hook.

There’s also the ability to register custom data types — which is pretty cool, because data types are what attracted me to Postgres when I was an app developer: “Oh! There are all these data types! I can do all these rich comparisons against an index? Cool! Oh wait, you can even add custom data types? That’s even cooler!”

TLE allows you to create the base data type, so you can really expand the data types that you’re able to add. This is what TLE features does: it enables that safe mapping between trusted language code and the Postgres C internals.

In order to create a hook, you need to match the hook function definition. The TLE documentation documents how to create it appropriately, but it doesn’t need all the parameters that you would find in the hook function.

In this check password hook — I call this the “delay check password test”, meaning you’re probably trying to avoid someone trying to guess your password repeatedly, and if they keep failing so what, because they’re not going to brute force it anyway. There are actually more practical examples of check password hooks. But what’s cool is that you can define everything around the your hook behavior from within the hook function and then it acts as if you wrote a C-based hook! You just happen to write it in a in a trusted language.

Hooks do execute with elevated privileges, particularly around authentication you want to be very careful. So there are some safeguards built into TLE to make sure that you only enable hooks when you want to.

Last but not least: choosing a trusted language. I know this group is more focused on extension building, but I do want to talk about what an app developer goes through when choosing a trusted language.

Because everything has its trade-offs to consider. The Golden Rule (I actually took this from Jim Mlodgensky) is: when in doubt use PL/pgSQL, because it does have a lot of access to context that’s already available in Postgres. What’s interesting about this is that what we see today is based on PL/SQL. PL/pgSQL was developed to try to make it simpler to migrate from Oracle, but at the same time to provide a lot of rich functionality around Postgres.

As someone much more familiar with Ruby and Python, I can tell you that PL/pgSQL can be a little bit quirky. But it is very well documented, and it can solve all the problems that you need to in Postgres. And it already has a lot of very simple ways to directly access your data from Postgres. Certainly an easy choice to go with.

But wait, there’s more!

like PL/v8, writing JavaScript in your database, this is really cool! I remember when it came out and how mind-blowing it was, in particular for JSON processing. PL/v8 is awesome. PL/v8 came out right around the same time as the document database! So you kind of had perfect storm of being able to process JSON and write it in JavaScript — both within your Postgres database and it could be quite powerful.

Another really cool feature of PL/v8 is the ability to directly call another function or another PL/v8 function from within PL/v8, and not have to go through Postgres function processing, which adds a lot of additional overhead.

And now the one that’s all abuzz right now: PL/Rust. Being able to write and execute Rust code within Postgres. This is pretty cool, because Rust is a compiled language! There’s a trusted way to run PL/Rust within Postgres. There are a few techniques to do it. First, whenever you’re running Rust on your server, to make sure that you’re guarding against breakouts.

There is a library, I believe it called postgres FTD, that effectively compiles out some of the less dressed parts of Rust, such as unsafe function calls. But you can still get everything that you want in PL/Rust today: you get the Rust standard Library, the ability to run crates — and you do want to evaluate crates to make sure that you’re comfortable running them in your environment. But then you get this compiled language that is CPU efficient, memory efficient, and memory safe. (Well, a lot of Rust is memory safe) It’s pretty cool!

Steven Miller in chat: In PL/Rust, does it run the compilation when the function is created? Then if there is a compiler issue it just shows up right there?

I wrote a blog post last year that compared some different function calls between PL/pgSQL, PL/v8, and PL/Rust. First I was doing some array processing, and you could see that the Pl/Rust calls were very comparable to the C calls. And then there’s some additional Vector processing, given that I’ve been obsessing on vectors for the past 14 months. Seeing rust actually win against PL/pgSQL and PL/v8 (I don’t remember the numbers off the top of my head I can look up that blog as soon as I switch windows). Pretty cool!

This brings us in some ways to the best of all worlds, because I can take an extension that normally I would write in C, particularly because I’m focused on performance, I can write it in PL/Rust, package it as a trusted language extension, and run it anywhere that TLE and PL/Rust are supported. Again, that is very powerful, because suddenly I have what I hope is the best of all worlds: I have this portability, I don’t have to worry as much about major version upgrades because pg_le is acting as that abstraction layer between the Postgres C code and the application code that I’m writing.

Jeremy S in chat: Versioning of stored procedures is a very interesting use case

Darren Baldwin in chat: Agreed! Moving stuff to the database layer seems to be something very foreign and “scary” to most app devs I’ve talked to

Anup Sharma in chat: Is TLE a requirement for any PostgreSQL extension, or is it dependent?

Steven Miller in chat: So during a major version upgrade, the function declaration stays the same, so that’s why your application doesn’t need to change with respect to the extensions during a major version upgrade. And at some point during the migration, you create the function again, which recompiles. So it all works the same! That’s great

Last slide, then I’m certainly looking forward to discussion. pg_tle is open source, and it’s open source for a lot of reasons. A lot of it is because we want to make sure that trusted language extension are as portable as possible. But in some ways the ideas behind TLE are not original. If you look at other databases there is this notion of, let’s call it inline extensions, or inline SQL, ou call them modules, you call them packages. But the idea is that I can take reusable chunks of code, package them together, and have them run anywhere. It doesn’t matter where the database is located or hosted.

This is something that I personally want to work with folks on figuring out how we can make this possible in Postgres. Because even in Postgres this is not an original idea. Dimitri Fontaine was talking about this as far back as 2012 in terms of his vision of where of the extension framework was going.

What I’m looking forward to about this Extension Summit — and hopefully and hopefully I’m not in conflicting meetings while it’s going on — is talking about how we can allow app developers to leverage all the great parts of Postgres around function writing, function building, and ultimately packaging these functions, and making it simple simpler for them to be able to move it wherever their applications are running.

So it is open source, open to feedback, under active development, continue to add more features to support Postgres. Iltimately we want to hear what’ll make it easier for extension writers to be able to use TLE, both as a packaging mechanism and as a as a development mechanism.

So with that uh I that is the end of my slides and happy to uh get into a discussion about this.

Discussion

David Wheeler (he/him): Awesome, thank you Jonathan. there was one question about PL/Rust in the comments. Stephen asks whether it compiles when you create the function, so if there are compiler issues they they show up there.

Jonathan Katz: Correct It compiles when you create the function and that’s where you’ll get compile errors. I have definitely received my fair share of those [chuckles]. There is a Discord. PL/Rust is developed principally by the folks uh responsible for the pgrx project, the folks at ZomboDB, and they were super helpful and debugging all of my really poor Rust code.

David Wheeler (he/him): While while people are thinking about the questions I’ll just jump in here. You mentioned using crates with PL/Rust. How does that work with pg_le since they have to be loaded from somewhere?

Jonathan Katz: That’s a good question. I kind of call it shifting the problem. TLE solves one problem in that you don’t need to necessarily have everything installed on your on your local file system outside of pg_tle itself. If you’re using PL/Rust and you need crates, you do need those crates available either within your file system or within whatever package management tools you’re using. So it shifts the problem. I think it’s going to be a good discussion, about what we can do to help ensure that there is a trusted way of loading those.

David Wheeler (he/him): Yeah I wonder if they could be vendored and then just included in the upload through the function call.

Anup Sharma asked asked if pg_tle s a requirement any extension or is it dependent.

Jonathan Katz: It’s not requirement. This is a project that is making it possible to write Postgres extensions in trusted languages. There ar plenty of extension authors on this call who have written very, very, very good extensions in C that do not use TLE.

David Wheeler (he/him): You can use trusted languages to write extensions without TLE as well. It’s just a way of getting it into the database without access to the file system, right?

Jonathan Katz: Correct. I think I saw Keith here. pg_partman is PL/pgSQL.

Anup Sharma in chat: Understood. Thanks

Tobias Bussmann in chat: I think it is important not to confuse Trusted Language Extensions TLE with “trusted extensions” which is a feature of Postgres 13

Keith Fiske in chat: Pretty much all of it is. Just the background worker isn’t

Jonathan Katz in chat: hat’s what I thought but didn’t want to misspeak 🙂

David Wheeler (he/him): Right Any other questions or comments or any implications that you’re thinking about through for extension distribution, extension packaging, extension development?

Steven Miller in chat: Is background worker the main thing that a TLE could not do in comparison to traditional extensions?

Jason Petersen: The crates thing kind of raised my interest. I don’t know if Python has this ability to bring in libraries, or if JavaScript has those dependencies as well. But has there been any thought within pg_tle for first classing the idea of having a local subdirectory or a local file system layout for the “native” dependencies? I’m using “native” in quotes here because it could be JavaScript, it could be Python, whatever of those languages, so they could be installed in a way that’s not operating system independent.

I know this is kind of a complex setup, but what I’m getting at is that a lot of times you’ll see someone say “you need to install this package which is called this and Red Hat or this on Mac or this on Debian — and then you can install my extension. Has there been any push towards solving that problem by having your TLE extensions load things from like a a sort of Walled Garden that you set up or something? So it’s specific to the database instead of the OS?

Jonathan Katz: That’s a good question. There has been thought around this. I think this is going to be probably something that requires a thorough discussion in Vancouver. Because if you look at the trusted languages that exist in Postgres today, the definition of trusted language is: thou shall not access the file system. But if you look at all these different languages, they all have external dependencies in some in some way shape or form. Through Perl there’s everything in CPAN; through JavaScript there’s everything in npm. Let’s say installed the appropriate CPAN libs and npm libs within uh your database for everything I recall from playing with trusted PL/v8 and PL/Perl is that you still can’t access those libraries. You can’t make the include or the require call to get them.

Where PL/Rust is unique is that first off we just said, “yes, you can use your Cargo crates here.” But I think that also requires some more thinking in terms of like how we make that available, if it’s OS specific, vendor specific, or if there’s something universal that we can build that helps to make that more of a trusted piece. Because I think at the end of the day, we still want to give the administrative discretion in terms of what they ultimately install.

With the trusted language extensions themselves, we’re able to say, “here’s the post security boundary, we’re operating within that security boundary.” As soon as we start introducing additional dependencies, effectively that becomes a judgment call: are those dependencies going to operate within that security boundary or not. We need to be make sure that administrators still have the ability to to make that choice.

I think there are some very good discussion topics around this, not just for something like PL/Rust but extension distribution in general I think that is you know one of the I think that’ll be one of the key discussions at the Extension Summit.

David Wheeler (he/him) in chat: What if the required modules/packages/whatever were in a table. e.g. in Perl I do use Foo::Bar and it has a hook to load a record with the ID Foo::Bar from a table

David G. Johnson: Has there been any thought to having the default version of an extension tied to the version of PostgreSQL? Instead of it just being 1.3 and, whether I’m on version 12 or 15, because 1.3 might not even work on version 12 but it would work on version 15. The versioning of the an extension and the versioning of PostgreSQL seem like they’re almost too independent.

Jonathan Katz: So David, I think what you need to do is chastise the extension developers to let them know they should be versioning appropriately to to the the version of Postgres that they’re using. [Chuckles]

There is a good point in there, though. There is a lot of freedom in terms of how folks can build extensions. For example, just top of mind, pgvector supports all the supported versions of Postgres. Version 0.7.0 is going to be coming out soon so it’s able to say, “pgvector 0.7.0 works with these versions.” Dumb. PG plan meanwhile maintains several back releases; I think 1.6.0 is the latest release and it only supports Postgres 16. I don’t believe it supports the earlier versions (I have to double check), but there’s effectively things of that nature.

And then there aer all sorts of different things out there, like PostGIS has its own life cycles. So there’s something good in that and maybe the answer is that becomes part of the control file, saying what versions ov Postgres an extension is compatible with. That way we’re not necessarily doing something to break some environment. I’m just brainstorming on on live TV.

David G. Johnson: The other day I open a but report on this. but PostgreSQL dump and restore will dump it without the version that’s in the source database, and when yoq restore it, it’s going to restore to whatever the current version for the control file is even if you’re upgrading to a different database. versus restoring it to whatever the original version was. That dynamic just seemed problematic.

David Wheeler (he/him): I think it’s less problematic for trusted language extensions or extensions that have no C code in them, because pg_dump does dump the extension, so you should be able to load it up. I assume base backup and the others do the same thing.

David G. Johnson: I haven’t checked into that. It dumps CREATE EXTENSION and then it dump any user tables that are marked by the extension. So these code tables are marked as being user tables for TLE?

David Wheeler (he/him): What do you mean by code tables?

Regina Obe: That’s a good point. For example my Tiger geocoder is all PL/pgSQL, but it’s only the CREATE EXTENSION thing that’s named. So for your TLE table, it would try to reload it from the original source, wouldn’t it? In which case it would be the wrong version.

Jonathan Katz: We had to add some things into TLE to make sure it worked appropriately with pg_dump. Like I know for a fact that if you dump and load the extension it works it works fine. Of it doesn’t then there’s a bug and we need to fix it.

David G. Johnson: Okay yeah I haven’t played with this. Literally this is new to me for the most part. I found the whole fact that the control file is not updated when you do ALTER EXTENSION to be, at least in my mind, buggy.

Jonathan Katz: In the case of TLE, because it’s in theory major version-agnostic. When I say “in theory,” it’s because we need to make sure the TLE code in library itself is able to work with every major version. But once that’s abstracted away the TLEs themselves can just be dumped and reloaded into different versions of Postgres. I think we I we have a TAP test for that, I have to double check. But major version upgrades was something we 100% tested for

David Wheeler (he/him): I assume it’d be easier with pg_tle since there’s no need to make sure the extension is is installed on the file system of the new server.

Jonathan Katz: Yep. if you look at the internals for pg_tle, effectively the TLEs themselves are in a table. When you do a CREATE EXTENSION it gets loaded from that particular table.

David G. Johnson: Right, and when you do a pg_dump you make suer that table was dumped to the dump file.

Jonathan Katz: Yes. But this is a key thing that we we had to make sure would does work: When loading in a pg_dump, a lot of the CREATE EXTENSIONS get called before the table. So we need to make sure that we created the appropriate dependency so that we load the TLE data before the CREATE EXTENSION. Or the CREATE EXTENSION for the TLE itself.

Jeremy S in chat, replying to “Is background worker the main…”: doing a background worker today, I think requires working in C, and I don’t think core PG exposes this yet. Maybe it could be possible to create a way to register with a hook to a rust procedure or something, but maybe a better way in many cases is using pg_cron

Jonathan Katz in chat: We can add support for BGWs via the TLE API; it’s just not present currently.

nils in chat: Creative thinking, if a background worker doesn’t work in TLE, how about create your UDF in tle and schedule with pg_cron 🤡

David Wheeler (he/him): You mentioned in the comments that you think that background workers could be added. How would that work?

Jonathan Katz: It would be similar to the the other things that we’ve added, the data types and the hooks. It’s effectively creating the interface between the C API and what we’d expose as part of the TLE API. It’s similar to things like pgrx, where it’s binding to Postgres C API but it’s exposing it through a Rust API. We do something similar with the TLE API.

Steven Miller in chat: Thank you Jeremy. I like this idea to use TLE, then depend on cron for a recurring function call

Steven Miller in chat: Ah yes Nils same idea 😄

Jason Petersen in chat: Thumbs up to nils about pgcron. If you need a recurring BGW just write it in plpgsql and schedule it

nils in chat: Great hackers think alike

Jason Petersen in chat: (I know I do this)

David Wheeler (he/him): That that makes sense. I just thought the background workers were literally applications that are started when the postmaster starts up shut down when the postmaster shuts down.

Jonathan Katz: But there’s dynamic background workers.

David Wheeler (he/him): Oh, okay.

Jonathan Katz: That’s how a parallel query works.

Jeremy S in chat: Threading? 😰

David Wheeler (he/him): Gotcha, okay. Sorry my information’s out of date. [chuckles]

Jonathan Katz: Well maybe one day we’ll have you know some some form of threading, too. I don’t think like we’ll get a wholesale replacement with threads, but I think there are certain areas where threads would help and certain areas workers are the way to go/

David Wheeler (he/him): Yeah, yeah that makes sense.

Jonathan Katz: Hot take!

David Wheeler (he/him): What other questions do you have for about TLEs or extensions more broadly and packaging in relation to TLEs?

David G. Johnson: Just a random thought: Have you thought about incorporating foreign servers and pointing the TLE, instead of a local database, point it to a master, company-wide foreign table?

David Wheeler (he/him): Like a TLE registry?

David G. Johnson: Right, yeah something global would be nice. like okay we hosted on PGXN at there’s a TLE registry. But because for a company who wants maintain code internally between projects, and they want a shared library, they can publish it on one server, send up a link to it over foreign server, and then just point at that.

Jonathan Katz: Could be!

David Wheeler (he/him): I mean you could just use foreign foreign tables for that for the tables that TLE uses for its its registry, right?

David G. Johnson: That’s I’m thinking.

David Wheeler (he/him): Yeah that’s a cute idea.

Jonathan Katz: I think that just to to go back a few more minutes. I think you I was asked to talk about the vision. One one way to view extensions is trying things out before they’re in core, or before they’re in Postgres. The aspect that I would ultimately like to see in core someday is the ability to do that’s called “inline modules.” There is a SQL standard syntax, CREATE MODULE, that for this purpose. Some folks were trying to see see if we could get it into, I believe, Postgres 15. There was some push back on the design and it died on the vine for the time being.

But I do think it’s something to consider because when I talk to folks, whether it’s random Postgres users RDS customers, etc., and I go through TLE, one of the things that really stands out is one of the things that we had discussed here and I saw in the chat, which is this aspect: being able to version your stored procedures. This is in part what modules aims to solve. One is just having a SQL interface to load all these things and group it together. But then once you have that grouping you have the ability to version it. This is the part that’s very powerful. As soon as I saw this I was like, “man I could have used that that would have saved me like hours of debugging code in production.” Mot saying that I was ever sloppy and you know in random store procedures in my production database!

David Wheeler (he/him) in chat: I see CREATE MODULE in the db2 docs.

Jonathan Katz: But that’s kind of the vision. The fact that Postgres is extensible has led to this very widely adopted database. But I think there are things that we can also learn in our extensions and bring back upstream. There are certainly reasons why they we developing things in extensions! Like pgvector is an example of that, where we talked about it at PGCon last year. And part of the thought of not trying to add a vector data type to Postgres was, first, to make sure we could settle on what the the binary format would be; and once that’s solidified, then we could add it.

But I had an aside with Tom [Lane] where we talked about the fact that this is something we need to move fast on, the vector space is moving very quickly, extensions are a way to be able to move quickly when something like Postgres moves more deliberately.

This is in some ways where TLE is, our way to be able to see what kind of interface makes sense for being able to do inline extension loading and ultimately how we want that to look in core.

David Wheeler (he/him): Can you create data types with a binary representation in TLE?

Jonathan Katz: Yes as of (I want to say) the the 1.3 release. I have to double check the version. The way we’re able to do it safely is that it actually leverages the BYTEA type. When you create that representation it stores it as a BYTEA. What you get for free today is that, if you create your equality/inequality operators, you can use a b-tree look up on these data types.

So there’s a “dot dot dot” there. If we wanted to be able to use like GIST in GIN and build data types for our other index interfaces, there’s more work to be done. That would require a TLE interface. I spent a lot of time playing with GIST and GIN, and the interface calls themselves involve pointers. So that will require some thought yeah.

David Wheeler (he/him): I assume it’s a similar issue for Rust data types that are basically just serde-serialized.

Jonathan Katz: Yeah we can at least like store things in BYTEA, and that’s half the battle. It allows us to do a safe representation on disk as opposed just “here’s some random binary; good luck and don’t crash the database!”

Jason Petersen in chat: I also wondered about the function interface for things like storage features (table access methods).

I assume they’re similarly hairy

David Wheeler (he/him): Any other last minute questions?

Jonathan Katz: Table access methods. Yes table access methods are very hairy as are index access methods. I spent a lot of time the past 14 months looking at the index access method interface, which has a lot of brilliance in it, and certainly some more areas to develop. But it’s amazing! The fact that we can implement vector indexes and get all the best parts of Postgres is a phenomenal advantage.

Jeremy S: One last question. We’re leading up to Vancouver and we’re going to be starting to think about some of the topics that we want to make sure to talk about at the Summit. I think you mentioned one earlier (I should have written it down), but any final thoughts about topics that we should make sure to discuss?

Jonathan Katz: Just in general or TLE specific?

Jeremy S: Both. I mean for sure TLE-specific, but also just generally related to extensions

Jonathan Katz: My TLE-specific one dovetails into the general one. The first one is: is there ultimately a path forward to having some kind of inline extension management mechanism in core Postgres. That’s the top, part one, I spent the past five minutes talking about that.

But I think the big thing, and why we’re all here today, is how do we make it easier for developers to install extensions, manage extensions, etc. I think the notion of package management thanks to the work of Andres finding the backdoor to xz also shines a new light, because there’s a huge security component to this. I remember, David, some of our earlier chats around this. I think you know —- again, being ap-developer sympathetic — I definitely want to see ways to make it easier to be able to load extensions.

Having spend spent a lot of time on the other side, the first thing that comes to mind is security. How do we create a protocol for managing the extension ecosystem that also allows folks to opt into it and apply their own security or operational or whatever the requirements are on top of it. That’s the thing that’s most top of mind. I don’t expect to have like a full resolution from the Extension Summit on it, but at least the start of it. What is ultimately that universal packaging distribution protocol for Postgres extensions that we can all agree on?

David Wheeler (he/him): Thank you so much! Before we go I just wanted to tee up that in two weeks Yuri Rashkovskii is going to talk about his idea for universally buildable extensions: dev to prod. That’ll be on May 1st at noon Eastern and 4pm UTC. Thank you everybody for coming.

More about… Postgres Jonathan Katz TLEs PGXN Extensions PGConf Summit

Thursday, 25. April 2024

The Pragmatic Engineer

The Pulse #91: The end of US non-competes within sight?

Also: the Humane AI pin flop and how it relates to the “AI goldrush,” and a look at whether developers will move from Redis to Valkey after a license change, or stay.

The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.

Today, we cover:

Industry pulse. IBM buys HashiCorp; Google’s new operating reality; Section 174 still not repealed; Meta’s unexpected AI play, and more.

End of non-competes within sight in the US? The US Federal Trade Commission issued a ruling that would ban almost all non-compete agreements nationwide. If this passes, NDAs could become a lot more important for tech companies. However, the rule passing is far from a done deal: whether or not the rule lives will be decided in the courtroom. A closer look at the proposed changes.

The Humane AI pin flop, and “AI goldrush.” After years of developing hardware for the “new iPhone,” startup Humane suddenly pivoted to an AI use case. This bet was made soon after ChatGPT was released. Did Humane expect ChatGPT to improve much faster than it has?

Redis or Valkey? It’s only been a month since Valkey – a permissively licensed Redis ”continuation” announced its formation. I asked developers if they are planning to switch: and a quarter said they do.

1. Industry pulse IBM buys HashiCorp

Read more

Wednesday, 24. April 2024

Just a Theory

Test and Release pgrx Extensions with pgxn-tools

The v1.6.0 release of the pgxn-tools Docker image adds a new command to efficiently build and test pgrx extensions on a wide variety of Postgres versions.

Yesterday I released v1.6.0 of the pgxn/pgxn-tools Docker image with a new command: pgrx-build-test works much like the existing pg-build-test utility for PGXS extensions, but for pgrx extensions. Here’s an example from pg-jsonschema-boon, a pgrx extension I’ve been working on:

name: 🧪 Test on: push: jobs: test: runs-on: ubuntu-latest container: pgxn/pgxn-tools strategy: matrix: pg: [11, 12, 13, 14, 15, 16] name: 🐘 Postgres ${{ matrix.pg }} steps: - name: Checkout uses: actions/checkout@v4 - name: Start PostgreSQL ${{ matrix.pg }} run: pg-start ${{ matrix.pg }} - name: Setup Rust Cache uses: Swatinem/rust-cache@v2 - name: Test on PostgreSQL ${{ matrix.pg }} run: pgrx-build-test

The format is the same as for pg-build-test, starting with installing a specific version of Postgres from the Postgres Apt repository (supporting versions 8.2 – 17). It then adds the Swatinem/rust-cache action to speed up Rust builds by caching dependencies, and then simply calls pgrx-build-test instead of pg-build-test. Here’s what it does:

Extracts the pgrx version from the Cargo.toml file and installs it (requires v0.11.4 or higher) Initializes pgrx to use the Postgres installed by pg-start Builds the extension with cargo pgrx package Tests the extension with cargo pgrx test Installs the extension with cargo pgrx install Checks for a Makefile with installcheck configured and, if it exists, runs make installcheck

This last step allows one to include PGXS-style pg_regress tests in addition to Rust/pgrx tests, as pg-jsonschema-boon does. Here’s a successful run.

Special thanks to Eric Ridge and @Jubilee for all the help and improvements in pgrx v0.11.4 that enable this to work transparently.

pgrx Release Pattern

The pattern for releasing a prgx extension on PGXN is the same as before, although you may want to generate the META.json file from a template. For example, the pg-jsonschema-boon Makefile creates META.json from META.json.in by reading the version from Cargo.toml and replacing @CARGO_VERSION@, like so:

DISTVERSION = $(shell perl -nE '/^version\s*=\s*"([^"]+)/ && do { say $$1; exit }' Cargo.toml) META.json: META.json.in Cargo.toml @sed "s/@CARGO_VERSION@/$(DISTVERSION)/g" $< > $@

The release workflow uses it like so:

name: 🚀 Release on PGXN on: push: # Release on semantic version tag. tags: ['v[0-9]+.[0-9]+.[0-9]+'] jobs: release: name: 🚀 Release on PGXN runs-on: ubuntu-latest container: pgxn/pgxn-tools env: PGXN_USERNAME: ${{ secrets.PGXN_USERNAME }} PGXN_PASSWORD: ${{ secrets.PGXN_PASSWORD }} steps: - name: Check out the repo uses: actions/checkout@v4 - name: Bundle the Release env: { GIT_BUNDLE_OPTS: --add-file META.json } run: make META.json && pgxn-bundle - name: Release on PGXN run: pgxn-release

Note the “Bundle the Release” step, which first calls make META.json to generate the dated file, and tells pgxn-bundle to add the META.json via the GIT_BUNDLE_OPTS environment variable. The project also excludes the META.json.in file from the bundle in its .gitattributes file, and excludes META.json from the project repository in its .gigignore file.

Looking forward to seeing all your pgrx projects on PGXN!

More about… Postgres PGXN pgrx pgxn-tools

Patrick Breyer

Piraten: Mit EU-Barzahlungsobergrenze und Verbot anonymer Kryptowallets droht schleichende finanzielle Entmündigung

Heute am späten Nachmittag werden die Abgeordneten des EU-Parlaments im Plenum mehrheitlich einschneidenden neuen Anti-Geldwäschegesetzen zustimmen: Anonyme Barzahlungen über 3.000 € werden im geschäftlichen Verkehr verboten. Barzahlungen über 10.000 € …

Heute am späten Nachmittag werden die Abgeordneten des EU-Parlaments im Plenum mehrheitlich einschneidenden neuen Anti-Geldwäschegesetzen zustimmen: Anonyme Barzahlungen über 3.000 € werden im geschäftlichen Verkehr verboten. Barzahlungen über 10.000 € werden im geschäftlichen Verkehr sogar komplett verboten. Anonyme Bezahlkarten wie Paysafecard werden beschränkt. Und anonyme von Anbietern bereitgestellte Geldbörsen für Kryptowährungen (sog. hosted wallets) werden ohne Schwellenwert schon bei Minimalguthaben untersagt. Der Europaabgeordnete und digitale Freiheitskämpfer Dr. Patrick Breyer (Piratenpartei) erklärt seine Gegenstimme wie folgt:

“Wir Piraten verurteilen diesen EU-Krieg gegen das Bargeld, das seit Menschengedenken unsere finanzielle Freiheit und Privatsphäre sichert. Mit der schleichenden Abschaffung des Bargelds drohen Negativzinsen und das jederzeitige Abdrehen der Geldversorgung über Kartensperrungen. Die Abhängigkeit von Banken nimmt bedrohlich zu. Das gefährdet auch die Arbeit von Regierungskritikern, wie das Abschneiden von Wikileaks von Kreditkartenspenden vor einigen Jahren zeigt. Diese finanzielle Entmündigung gilt es zu stoppen!

Wir müssen stattdessen Wege finden, die besten Eigenschaften von Bargeld in unsere digitale Zukunft mitzunehmen. Auch im Netz haben wir ein Recht darauf, in Kryptowährungen bezahlen und spenden zu können, ohne dass unser Zahlungsverhalten anlasslos und personenbezogen aufgezeichnet wird. Dass die EU anonyme Geldbörsen für Kryptowährungen bei Anbietern (hosted wallets) ohne jeden Schwellenwert verbietet, während selbstbetriebene Wallets anonym bleiben, zeigt die Sinnlosigkeit dieser Repressivgesetze. Wenn die EU glaubt, virtuelle Währungen im Alleingang regulieren zu können, hat sie das weltweite Internet ohnehin nicht verstanden.

Anonyme Zahlungen in Bargeld zu begrenzen und anonyme Kryptowallets zu verbieten, hat bestenfalls minimal senkende Effekte auf die Kriminalität, nimmt aber unbescholtenen Bürgern die finanzielle Freiheit. Zum Einsammeln von Spenden sind Dissidenten wie der verstorbene Alexej Navalny und seine Ehefrau weltweit heute zunehmend auf anonyme Spenden in virtuellen Währungen angewiesen. Wo jede Zahlung erfasst und für immer gespeichert wird, drohen Hackerangriffe, unberechtigte Ermittlungen und eine abschreckende Staatsaufsicht über jeden Kauf und jede Spende.”

Die Spitzenkandidatin der Piratenpartei zur Europawahl Anja Hirschel unterstreicht: „Bargeldzahlungen werden immer weiter eingeschränkt und gleichzeitig anonyme Kryptowallets im Netz verboten. Dies führt zu einer immer detaillierteren Nachverfolgbarkeit unseres Konsums und Privatlebens. Wie viel ich wofür wann und wo ausgebe, lässt immer genauere Rückschlüsse auf mich als Person zu. In Zusammenhang mit den nationalen Diskussionen um Bezahlkarten für verschiedene Menschengruppen kann dies ein Baustein hin zu einer gesetzliche Grundlage zur späteren Profilbildung und Kontrolle selbst von legalen Ausgaben darstellen.“

Hintergrund zur Barzahlungsobergrenze:

2017 fragte die EU-Kommission schon einmal die Öffentlichkeit nach ihrer Meinung zur Begrenzung von Barzahlungen. Mehr als 90 % der antwortenden Bürger sprachen sich gegen einen solchen Schritt aus. Die Befragten hielten das anonyme Bezahlen mit Bargeld für eine “essentielle persönliche Freiheit” und waren der Meinung, dass “Beschränkungen für Bargeldzahlungen ineffektiv sind, um die möglichen Ziele (Bekämpfung von kriminellen Aktivitäten, Terrorismus, Steuerhinterziehung) zu erreichen”. Laut einer EZB-Umfrage verwenden bis zu 10% der Bürger Bargeld auch für Beträge über 10.000 € (z.B. beim Autokauf). Nach Berechnungen von Schattenwirtschaftsfachmann Friedrich Schneider von der Universität Linz hat das Verbot großer Barzahlungen „nur minimale senkende Effekte auf die Schwarzarbeit oder die Kriminalität“.

Hintergrund zu Kryptowährungen:

Im Gegensatz zu Bargeld, das völlig anonym ist, können Transaktionen mit Kryptowährungen gerade im Fall von Bitcoin über die “Blockchain” nachvollzogen werden. Wo virtuelle Vermögenswerte in der Vergangenheit für kriminelle Aktivitäten verwendet wurden, war eine strafrechtliche Verfolgung möglich, etwa indem ungewöhnliche Muster erkannt und Verdächtige identifiziert wurden. Einige Kriminelle haben sich im Laufe der Zeit selbst de-anonymisiert, und jeder Kriminelle wird irgendwann seine digitalen Gelder gegen echtes Geld eintauschen müssen. Virtuelle Guthaben sind von geringer Relevanz für das globale Finanzsystem. Es gibt keine belastbaren Nachweise dafür, dass virtuelle Währungen nennenswert zur Geldwäsche genutzt würden. EU-Vorschriften können durch Verwendung von Nicht-EU-Wallet-Diensten leicht umgangen werden. Virtuelle Vermögenswerte können auch direkt von einer Person zu einer anderen übertragen werden, ohne Zwischenhändler einzuschalten (unhosted wallets), so dass Identifizierungspflichten leerlaufen.

Die Verordnung im Wortlaut


EU-Gesundheitsdatenraum: Kniefall vor Big Tech und Big Pharma

Die Abgeordneten des EU-Parlaments werden heute Mittag in Plenum die Verordnung zur Schaffung eines „Europäischen Gesundheitsdatenraums“ (EHDS) abgesegnen. Danach sollen zu allen Patienten Informationen über jede medizinische Behandlung einschließlich Impfstatus, Medikamente …

Die Abgeordneten des EU-Parlaments werden heute Mittag in Plenum die Verordnung zur Schaffung eines „Europäischen Gesundheitsdatenraums“ (EHDS) abgesegnen. Danach sollen zu allen Patienten Informationen über jede medizinische Behandlung einschließlich Impfstatus, Medikamente und Schwangerschaften, Labor- und Entlassberichte digital gespeichert und europaweit abrufbar gemacht werden – auch zu den in Deutschland bisher nicht von der elektronischen Patientenakte erfassten Privatpatienten. Zugang sollen europaweit eine Vielzahl von Stellen erhalten können, das von der Bundesregierung versprochene Widerspruchsrecht gegen Datenzugriffe aus dem Ausland ist nicht vorgesehen. Als Schattenberichterstatter und Mitverhandler der Verordnung für den Innenausschuss (LIBE) warnt der Europaabgeordnete der Piratenpartei Dr. Patrick Breyer vor einem Kontrollverlust der Patienten über sensibelste Gesundheitsdaten und einer Aufgabe des Arztgeheimnisses. Die Piraten werden deshalb heute gegen die Verordnung in ihrer geplanten Ausgestaltung stimmen.

“Wir Piraten unterstützen die Idee eines EU-Gesundheitsdatenraums, aber nicht um den Preis der Aufgabe des Selbstbestimmungsrechts der Patienten und des Arztgeheimnisses zugunsten von Gesundheitsdatenabgriffen durch Regierungen, Big Pharma und Big Tech. Es gibt nichts intimeres als Informationen über unsere körperliche und geistige Gesundheit, einschließlich unserer Suchtkrankheiten, psychischer Störungen, Schwangerschaftsabbrüche bis hin zu Geschlechtskrankheiten und Reproduktionsstörungen. Wenn wir uns nicht darauf verlassen können, dass diese Informationen bei unseren behandelnden Ärzten sicher sind, lassen wir uns vielleicht nicht mehr behandeln mit schlimmen Folgen für unsere Gesundheit bis hin zu Suiziden. Von einem EU-Gesundheitsdatenraum können grenzüberschreitende Behandlungen und Forschung profitieren, aber diese Vorteile hätte man auch auf der Grundlage einer Einwilligung der Patienten und mit vollständiger Datenanonymisierung haben können.

Die schlussendlich vereinbarte Ausgestaltung der EU-Verordnung widerspricht dem aus Umfragen bekannten Willen der Patienten. Patienten werden vor der identifizierbaren Weitergabe ihrer Gesundheitsdaten nicht gefragt und haben je nach EU-Land nicht einmal ein Widerspruchsrecht, Daten können außerhalb der EU gespeichert werden auf Systemen ohne unabhängige Sicherheitsprüfung – all das zeigt, dass die Verordnung im Sinne maximaler Verwertung unserer persönlicher Gesundheitsdaten und nicht im Interesse der Patienten ausgestaltet worden ist. Diese Verordnung verrät im Profitinteresse der Industrie die Interessen und den Willen der Patienten, um mit ihren identifizierbaren Daten Produkte entwickeln und KI-Algorithmen trainieren zu können. Einer solchen Entmündigung der Patienten erteilen wir Piraten eine klare Absage!”

Anja Hirschel, medizinische Informatikerin und Spitzenkandidatin der Piratenpartei für die Europawahl 2024, ergänzt: “Eine zentrale Datenspeicherung weckt Begehrlichkeiten in verschiedenste Richtungen. Wir sprechen dabei allerdings nicht nur von Hackerangriffen, sondern von der sogenannten Sekundärnutzung. Diese bezeichnet Zugriffe, die zu Forschungszwecke vollumfänglich gewährt werden sollen. Die Patientendaten sollen dann an Dritte weitergegeben werden. Aus Datenschutzsicht ist bereits das zentrale Ansammeln problematisch, bei Weitergabe wenigstens ein Opt-In Verfahren (aktive Einwilligung) richtig. Dies würde eine gewisse Entscheidungshoheit jedes Menschen über die persönlichen Daten ermöglichen. Wird allerdings nicht einmal ein Opt-Out Verfahren (aktiver Widerspruch) etabliert, so bedeutet dies letztlich die Abschaffung der Vertraulichkeit jeglicher medizinischer Information. Und das obwohl Ärzte in Deutschland gemäß § 203 StGB berufsständisch zurecht der Schweigepflicht unterliegen, wie u.a. auch Rechtsanwälte. Dieser Schutz unserer privatesten Informationen und das Recht auf vertrauliche Versorgung und Beratung stehen jetzt auf dem Spiel.”

Auf Initiative von Breyer und anderen Abgeordneten konnte immerhin ein europaweiter Zwang zur elektronischen Patientenakte verhindert werden. Laut Artikel 8h und Erwägungsgrund 13a der Verordnung ist das deutsche und österreichische Widerspruchsrecht gegen die Einrichtung einer elektronischen Patientenakte gerettet. Im Fall eines Widerspruchs werden die Pflichtinformationen nur beim behandelnden Arzt gespeichert. “Ich selbst werde dieser elektronischen Patientenakte widersprechen, um nicht die Kontrolle über meine Gesundsheitsdaten zu verlieren”, erklärt Breyer. “Wir wissen aber, dass die wenigsten Patienten, die Fremdzugriffe auf ihre Daten ablehnen, tatsächlich das komplizierte Widerspruchsverfahren durchlaufen.”

Auf Anfrage Breyers hat die EU-Kommission kurz vor der Abstimmung bestätigt, dass das von der Bundesregierung versprochene Widerspruchsrecht gegen ausländische Zugriffe auf Gesundheitsdaten in der endgültigen Fassung der Verordnung “nicht vorgesehen” ist. “Wer der elektronischen Patientenakte oder ihrer Auswertung nicht insgesamt widerspricht, ermöglicht damit zwangsweise auch einen grenzüberschreitenden Zugriff darauf durch ausländische Behandler, Forscher und Regierungen. Das von der Bundesregierung geplante Recht speziell grenzüberschreitenden Datenzugriffen widersprechen zu können, ist in der Verordnung nicht rechtssicher vorgesehen. Das widerspricht dem Interesse und Willen der Patienten, von denen laut Meinungsumfrage nur eine Minderheit einen grenzüberschreitenden europaweiten Zugriff auf ihre Patientenakte wünscht”, so Breyer.

Hintergrund: Laut Umfragen wollen mehr als 80% der EU-Bürger selbst über die Weitergabe ihrer Patientenakten entscheiden. Mehrheitlich wollen sie um Einwilligung gebeten werden. Dies sieht die Verordnung aber nicht vor. Ohne Einwilligung des Patienten erhalten künftig europaweit auch Gesundheitsministerien und Gesundheitsbehörden, Universitäten, zur Forschung, Produktentwicklung und zum Trainieren ‚künstlicher Intelligenz‘ auch Technologieunternehmen und Pharmaindustrie Zugang zu anonymisierten und personenbezogenen identifizierbaren (nur pseudonymisierten) Patientenakten – es sei denn der Patient widerspricht ausdrücklich. Kein Widerspruchsrecht gibt es in Deutschland gegen die Weitergabe medizinischer Registerdatensätze und von Abrechnungsdatensätzen. Im EU-Ausland müssen die nationalen Gesundheitssysteme auch weiterhin kein Widerspruchsrecht vorsehen.

Die Bundesregierung unterstützt die EU-Pläne. Kritik äußerten dagegen der europäische Verbraucherverband BEUC und das Datenschutznetzwerk EDRi.


Jon Udell

The future of SQL: conversational hands-on problem solving

Here’s the latest installment in the series on working with LLMS: https://thenewstack.io/the-future-of-sql-conversational-hands-on-problem-solving/ I keep returning to the theme of choral explanations (#4 on my list of best practices), and it’s especially relevant in the SQL domain where there are just so many ways to write a query. Exploring the range of possibilities used to be … Continue reading

Here’s the latest installment in the series on working with LLMS: https://thenewstack.io/the-future-of-sql-conversational-hands-on-problem-solving/

I keep returning to the theme of choral explanations (#4 on my list of best practices), and it’s especially relevant in the SQL domain where there are just so many ways to write a query.

Exploring the range of possibilities used to be arduous, time-consuming and hard to justify. Now it’s becoming hard to justify not doing that; optimizations (sometimes major ones) can and do emerge.

The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

11 How to Use LLMs for Dynamic Documentation

12 Let’s talk: conversational software development

13 Using LLMs to Improve SQL Queries

14 Puzzling over the Postgres Query Planner with LLMs

15 7 Guiding Principles for Working with LLMs

16 Learn by Doing: How LLMs Should Reshape Education

17 How to Learn Unfamiliar Software Tools with ChatGPT

18 Using AI to Improve Bad Business Writing

19 Code in Context: How AI Can Help Improve Our Documentation

Tuesday, 23. April 2024

Just a Theory

PGXN V2 Architecture

I’ve written and published a document outlining the vision and strategy for the next iteration of PGXN. It includes diagrams. Everybody loves diagrams.
PGXN Future Architecture

High-level diagram of the six logical services making up the proposed future extension distribution architecture. The Root Registry sits at the center, providing APIs for the other services to consume for their own use cases. Trusted instances of those services submit additional data about extensions via the Interactions service to enhance and enrich the service to better inform and delight users.

Over on the Postgres Wiki I’ve published a new document for the PGXN v2 project: PGXN v2 Architecture. It has diagrams, such as the one above! From the introduction:

This document outlines the project to build extension distribution, discovery, and packaging tools and services to power the growth, accessability, and utility of the Postgres extension ecosystem. Taking the overall Postgres community as its audience, it defines the services to be provided and the architecture to run them, as well as the strategic vision to guide project planning and decision-making.

With the goal to think strategically and plan pragmatically, this document describes the former to enable the latter. As such, it is necessarily high-level; details, scoping, and planning will be surfaced in more project-focused documents.

Bear in mind that this document outlines an ambitious, long-term strategy. If you’re thinking that there’s too much here, that we’er over-thinking and over-designing the system, rest assured that project execution will be fundamentally incremental and pragmatic. This document is the guiding light for the project, and subject to change as development proceeds and new wrinkles arise.

For those of you interested in the future of Postgres extension distribution, please give it a read! I expect it to guide the planning and implementation of the the new services and tools in the coming year. Please do consider it a living document, however; it’s likely to need updates as new issues and patterns emerge. Log in and hit the “watch” tab to stay in the loop for those changes or the “discussion” tab to leave feedback.

I’ve also moved the previously-mentioned document Extension Ecosystem: Jobs and Tools to the wiki, and created a top-level PGXN v2 and PGXN category for all PGXN-related content. It also includes another new document, Service Disposition, which describes itself as:

A summary of the ambitiously-envisioned future PGXN services and architecture, followed by an examination of existing services and how they will gradually be refactored or replaced for the updated platform.

Check it out for how I expect existing services to evolve into or be replaced by the updated platform.

More about… Postgres PGXN Architecture

The Pragmatic Engineer

Building Bluesky: a Distributed Social Network (Real-World Engineering Challenges)

Bluesky is built by around 10 engineers, and has amassed 5 million users since publicly launching in February this year. A deep dive into novel design decisions, moving off AWS, and more.

Before we start: AI tooling for software development feels like it has hit "peak hype" across mainstream media. We would like to do a "reality check" and find out how engineers and teams are using these tools (and which tools/use cases are genuinely efficient). Please help us by filling out this survey.

Fill out the survey on AI tools

We will share the full report with all of you who share detailed insights. Thanks for your help!

‘Real-world engineering challenges’ is a series in which we interpret interesting software engineering or engineering management case studies from tech companies.

Bluesky is known as a Twitter-alternative. It launched two years ago, with an invite-only beta launch last year. It’s already grown to an impressive 5.5 million registered users. Interestingly for software engineers, Bluesky is also a fascinating engineering project unlike any other mainstream social network. Martin Kleppman, author of the Designing Data Intensive Applications book, is involved as a technical advisor, and has published a paper outlining the novel approaches Bluesky has taken.

The biggest differences between Bluesky and other large social networks:

Decentralized. Bluesky is a “decentralized social network,” meaning anyone can run their own servers. If Bluesky’s core team turned off all services today, the network would keep functioning. As such, Bluesky offers a way for users to truly own their data and services.

Open source. Nearly everything about Bluesky builds is open source, and hosted on GitHub.

Rapid growth. The product went from zero to 5 million users in around 12 months after announcing an invite-only beta.

Small team. Bluesky was built with a small team of 3 engineers during the first year, and with 12 software engineers at the time of publication.

Other social networks have achieved some of these things; such as Mastodon allowing users to own their data and identity, and Meta achieving eye-catching growth by getting 100 million users in just a week. Still, only Bluesky has pulled off them all.

Today, we dive into how Bluesky is built, sitting down with its two founding engineers: Daniel Holmgren and Paul Frazee. They take us through:

Development timeline. How Bluesky went from a vague idea with few specific details, to a decentralized social network with millions of users.

Experimentation phase. A team of 2-3 engineers prototyped for 9 months, established the development principles, and laid the groundwork for the protocol and app.

v1 architecture. An overview of Bluesky’s architecture at the launch of its public beta offering. This was a Postgres database built on top of AWS, and used Pulumi.

v2 architecture. Extending Bluesky to support “federation,” allowing users to run their own Bluesky instances.

Scaling the database layer. PostgreSQL didn’t scale with the site’s growth, so it was time to migrate. The team chose ScyllaDB and SQLite.

Infra stack: from AWS to on-prem. AWS was becoming too costly, so Bluesky moved over to dedicated data centers and bare-metal machines.

Reality of building a social network. Typical firefighting issues, Elon Musk, and outages not being “life-or-death” crises.

1. Development timeline

Bluesky has been in development for just over 2 years, and has been publicly available for around 12 months. Here’s the timeline:

Bluesky’s development timeline and major milestones

Adding in the three phases we’ll discuss below:

Bluesky’s development, split into three phases Phase 1: Experimentation

The first 10 months of the project between January and October 2022 were all about exploration, and the team started to work fully in the open after 4 months. The first project the team open sourced was Authenticated Data Experiment (ADX), an experimental personal data server and a command-line client, accompanied by a network architecture overview.

In April 2022, heavy Twitter user, Elon Musk, raised the prospect of potentially acquiring the site, which created interest in alternatives to the bird app, as any major change in a market-leading social network does.

The first commit for the Bluesky mobile app was made in June 2022, and Paul Frazee worked on it. It started as a proof-of-concept to validate that the protocol worked correctly, and to aid protocol development via real-world use. Conventional wisdom says that prototypes are thrown away after serving their purpose. 

However, in this case this mobile app that a single person had built, became the production app, following the unforeseen spike of interest in it caused by takeover news at Twitter. This is a good reminder that real world events can push conventional wisdom out of the window!

In October 2022, the team announced the Authenticated Transfer Protocol (AT Protocol) and the app’s waitlist, just a few days after news that Elon Musk was to acquire Twitter. This led many tweeters to seek alternative social networks, and drove a major signup spike for Bluesky’s private beta. This development put pressure on the Bluesky team to seize the unexpected opportunity by getting the protocol and app ready for beta users. See details on the AT Protocol.

Phase 2: invite-only launch and the first 1M users

In October 2022, Bluesky consisted solely of Jay Graber CEO, and two software engineers; Daniel and Paul. Engineer #3, Devin, joined the same month. Announcing the AT Protocol and waitlist generated some media buzz and Bluesky attracted more interest during this period.

In March 2023, the company was confident that the protocol and mobile app were stable enough to invite more users by sending invites.

“Blocking” was implemented in a single night. After the app opened up to more users, there was an influx of offensive posts and of users verbally harassing other accounts. This made it clear that implementing blocks to restrict individual accounts from viewing and commenting on a user’s posts, was urgently-needed functionality.

The three earliest developers – Paul, Devin and Daniel – jumped on a call, then got to work. In the community, developers saw the pull requests (PRs) on this feature appear on GitHub, and started to point out bugs, and cheer on the rapid implementation. They wrapped it up and launched the feature by the end of the same day. To date, this is the most rapidly-built feature, and is still used across the protocol and the app!

In June 2023, Bluesky passed the 100,000-users milestone when the team numbered 6 developers, who’d shipped features like custom feeds, blocking and muting, moderation controls, and custom domains. A web application built on React Native was also in production.

In September 2023, Bluesky passed 1 million users – a 900,000 increase in just 3 months!

Phase 3: Preparing for public launch

In the 6 months following the 1 million-user milestone, the focus was on preparing to open up Bluesky to the public with no waitlist or throttling of invites.

Federation (internal.) To prepare for “proper” federation, the team made architecture changes to enable internal federation of Bluesky servers. 

Federation is a key concept in distributed networks. It means a group of nodes can send messages to one another. For Bluesky, it meant that – eventually – users should be able to run their own PDS instances that host their own user information (and user information of users on that server.) And the Bluesky network operates seamlessly with this distributed backend.

A new logo and a reference to Twitter. The team prepared a new logo for launch, and announced it in December 2023:

The old and new logo

The butterfly logo is intended as a symbol of freedom and change. Existing centralized social media platforms – like X (formerly Twitter,) Instagram, TikTok, and Youtube – are platforms that want to lock users into their website and apps. Bluesky, on the other hand, offers its protocol, but doesn’t dictate which apps or websites people use. It doesn’t even want to dictate the hosting of content:

The final slide of Jay Graber’s presentation for Twitter about the vision for Bluesky. It was Twitter that granted Bluesky its initial $13M funding in 2021 – partially based on this vision. The image visualizes the blue bird freed from a closed platform into Bluesky’s open ecosystem. Source: Bluesky

Let’s dive into each phase of the building process.

2. Experimentation phase

During Bluesky’s first 9 months (January-September 2022) two software engineers built the protocol and apps – Daniel Holmgren and Paul Frazee – and Jay the CEO signed off design decisions. The first couple of months were about experimenting and tech “spiking,” which means timeboxing the time and effort spent building and trying out ideas. Here’s Paul:

“We would greenfield for a period, then attack what we had just created to see if it holds up. We gave the existing technologies a really close look; if we didn’t see meaningful improvements from the existing protocols, then we decided we’d use what was already out there.”

When the direction wasn’t clear, the team kept trying out new approaches, says Daniel:

“We set out to use as many existing specs as we could. We spent a lot of time early on investigating things like Activity Pub and seriously trying to figure out how we could make it work, and realizing that it didn't really work for our use case.”

Development principles

The still-small team set up principles to ensure continuous progress:

No backward steps. Ease of use, scale, and feature developer experience, can not be worse than existing social networks’.

Unify app development with protocol development. Never make tech decisions in isolation from practical use cases.

Don’t be precious! If an idea or design doesn’t work, just throw it out!

Approach to building a new, novel decentralized protocol

The team prioritized flexible design choices in order to not lock themselves into a technology, until they knew exactly what they were building. Not coupling the data layer too closely with Postgres is an example of this. See below.

Building for flexibility, not scalability, was deliberate. The idea was to swap this approach to prioritize scale once everyone knew exactly what to build. The knowledge that decisions are hard to undo made the team’s own decision-making more thorough, Daniel reflects:

“The most difficult part of building Bluesky has been the constant awareness that small decisions you make may be locked in for years and have ripple effects. In a decentralized environment, these can be difficult to unwind. It puts a lot of weight on every decision, and we have to double and triple check choices that we make so that we hopefully don’t regret them.” 

Inventing new approaches was never a goal. The original idea was to take a protocol or technology off the shelf, and push it as far as possible to reveal a requirement that didn’t quite fit. For example, Lexicon – the schema used to define remote procedure call (RPC) methods and record types – started out as JSON schemas. The team tried hard to keep it lightweight, and stuck to JSON schemas. But they ended up bending over backwards to make it work. In the end, the team decided to fork off from JSON schemas and added features to it, which is how Lexicon was born.

Bluesky gets criticism for inventing new approaches which are non-standard across decentralized networks. Paul explains it like this:

“We never set out to live the ‘not invented here’ (NIH) syndrome. I don’t think anyone building something new has this goal. In the end, it just naturally evolved in this direction.

No one had done a high-scale decentralized social network before this! If someone had, we probably wouldn’t have needed to invent as many things.”

Bluesky takes inspiration from existing web technologies. As Daniel puts it:

“The AT Protocol is a pretty typical JSON API collection over HTTP. The architecture of Bluesky looks very similar to a traditional social media data center turned inside out. The firehose API looks a lot like Kafka – and we’re probably going to shard it in a similar way.”

3. v1 architecture: not really scalable and not federated – yet Infrastructure choices

PostgreSQL was the team’s database of choice when starting development. Postgres is often called the “Swiss Army knife of databases” because it’s speedy for development, great for prototyping, with a vast number of extensions. One drawback is that Postgres is a single bottleneck in the system, which can cause issues when scaling to handle massive loads that never materialize for most projects.

For the team, using Postgres worked really well while they were unsure exactly what they were building, or how they would query things. Paul’s summary of the choice to use Postgres:

“You start with a giant Postgres database and see how far that can take you, so that you can move quickly early on.”

AWS infrastructure was what the team started with because it’s quick to set up and easy to use, says Daniel:

“We were running everything out of AWS, and that is great because you can just spin up new VMs very easily, and spin up new stacks and services easily.”

The first infra hire at Bluesky, Jake Gold, iterated on the AWS setup:

“The basic idea we have right now is we’re using AWS, we have auto-scaling groups, and those auto-scaling groups are just EC2 instances running Docker Community Edition (CE) for the runtime and for containers. And then we have a load balancer in front and a Postgres multi-availability zone instance in the back on Relational Database Service (RDS). It’s a really simple setup.”

To facilitate deployments on AWS, the team used infrastructure-as-code service, Pulumi.

Modularizing the architecture for an open network was an effort the team kicked off early. The goal of modularization was to spin out parts of the network which users could host themselves. Daniel says:

“Our early insight was that we should give developers building on top of Bluesky the ability to focus on the parts of the network that they want to focus on. This is the microservices part.

An external developer building a feed should not need to index every “like” in the network. Someone self-hosting their own account should not need to consume thousands of posts to create a timeline. You can split the network into specific roles and have them work in concert.”

Personal Data Server

At first, the architecture of Bluesky consisted of one centralized server, the PDS (Personal Data Server.)

The Bluesky backend in March 2023

The strategy was to split this centralized service into smaller parts and allow for federation, eventually.

Bluesky being a federated network means individual users can run their own “Bluesky instance” and curate their own network.

The feed generator The backend in May 2023, after the feed generator was moved into its own component

In May 2023, the Bluesky team moved the feed generator to its own role. This service allows any developer to create a custom algorithm, and choose one to use. Developers can spin up a new Feed Generator service and make it discoverable to the Bluesky network, to add a new algorithm. Bluesky also allows users to choose from several predefined algorithms.

The Feed Generator interface was the first case of Bluesky as a decentralized network. From then, the Bluesky network was not solely the services which the Bluesky team operated, it was also third-party services like Feed Generator instances that plugged into the Bluesky network.

Dedicated “Appview” service

For the next step, the view logic was moved from the PDS, to an “Appview” service. This is a pretty standard approach for backend systems, to move everything view-related to its own service, and not to trouble other systems with presenting data to web and mobile applications.

Splitting out application view logic into its own service Relays to crawl the network

In the future, there could be hundreds or thousands of PDSs in the Bluesky network. So, how will all the data be synchronized with them? The answer is that a “crawler” will go through all these PDSs. In preparation for this crawl the team introduced a Relay service:

Preparing for federation and multiple PDSs by adding a Relay service to “crawl” later 4. v2 architecture: scaleable and federated

The v1 architecture needed to evolve in order to support full federation, and the team always planned to move on from it. But they expected v1 to last longer than only 6 months.

Federation

Federation sandbox. Before shipping a first version of federation, the team built a Federation Sandbox to test the architecture, as a safe space to try new features like modulation and curation tooling.

Internal federation. To prepare for federation proper, the next refactoring was to add support for multiple Personal Data Servers. As a first step, the Bluesky team did this internally. Users noticed nothing of this transition, which was intentional, and Bluesky was then federated! Proving that federation worked was a large milestone.

As a reminder, federation was critical to Bluesky because it made the network truly distributed. With federation, any user can run their own Bluesky server.

Prior to federation, Bluesky created 10 PDS services, wrapped into an Entryway interface

The “internally federated” PDS servers worked exactly like a self-hosted PDS. Bluesky made one addition, to wrap the internal PDS servers into a new service called “Entryway,” which provides the “bsky.social” identity to the PDSes. Entryway will become the “official” Bluesky OAuth authorization server for users who choose bsky.social servers, and one operated as a self-hosted server.

Later, Bluesky increased the number of internal PDS servers from 10 to 20 for capacity reasons, and to test that adding PDS servers worked as expected.

External federation. With everything ready to support self-hosted Personal Data Servers, Bluesky flipped to switch, and started to “crawl” those servers in February 2024:

Adding support for “proper” federation. Anyone can self-host a “Bluesky instance” in PDS form

To date, Bluesky has more than 300 self-hosted PDSs. This change has made the network properly distributed, anyone wanting to own their data on Bluesky can self-host an instance. Over time, we could also see services launch which self-host instances and allow for full data ownership in exchange for a fee.

Appview: further refactoring

Recently, Bluesky further refactored its Appview service, and pulled out the moderation functionality into its own service, called Ozone:

Moving moderation and post labeling functionality from Appview to Ozone

Users can run their own Ozone service – meaning to be a moderator in the Bluesky system. Here are details on how to self-host this service, and more about Ozone.

An architectural overview, with Martin Kleppman

Martin is the author of the popular software engineering book, Designing Data Intensive Applications, and he also advises the Bluesky team in weekly calls.

Martin and the Bluesky team published a paper describing the Bluesky system, Bluesky and the AT Protocol: Usable decentralized social media. In it, they offer a detailed overview of the architecture:

The architecture of Bluesky. Image source: Bluesky and the AT Protocol

The diagram above shows how data flows occur in the application:

Personal data server (PDS): these can be Bluesky-hosted (around 20 today) or self-hosted (around 300)

Relays: these collect events from the PDSs. Bluesky has its “official” relay hosted in its own infrastructure, but other developers can set up alternative relays that listen to all PDSs.

Firehose: the output of the relays.

Labelers and feed generators: these digest firehose events. They can be Bluesky-hosted, or be hosted independently of Bluesky.

App View: The Bluesky-hosted “official” app view, or alternate app views

Data flowing back to PDSs: feed generators hosted by Bluesky or externally, feed events data back to the PDSs.

5. Scaling the database layer Scaling issues with Postgres

Scaling issues emerged 2-3 months after the public beta launch in mid-2023.

Connection pool issues and lock contention. The Postgres connection pool backup and Node’s event loop got into a bad feedback loop. The team observed Postgres lock contention issues. This refers to multiple processes trying to access the same data simultaneously, but the data is locked to all except one process. For example, when multiple processes attempt to update the same row.

Small Postgres outages. Postgres doesn’t give the developer much control over which query plan it will take. Bluesky had a few smaller outages due to a query plan randomly flipping to something that ran about 1,000x times slower.

The need for horizontal scaling. Horizontal scaling is adding more machines to a service, so that the throughput of this system improves linearly with each new machine. But Postgres does not support horizontal scaling because it runs as a single database with transactional guarantees, meaning it becomes a bottleneck – if a necessary one – for the entire network.

As a reminder, the team was still tiny when all these scaling challenges emerged. There were only 6 developers (Daniel, Devin, Bryan and Jake on the backend, and Paul and Ansh on the frontend). Then in summer 2023, Daniel had a dream:

“After one stressful day, I dreamt that me, Jay [Bluesky’s CEO], and Devin were in my backyard. There were snakes everywhere you looked. We were going to wrangle and round up the snakes in a panic. But that that point, Devin stops and says to all of us: ‘wait, wait, guys, I think there’s a Postgres extension for this!’”

ScyllaDB replacing Postgres

The team knew they needed a horizontally scalable data storage solution, with fine-grained control of how data is indexed and queried.

ScyllaDB was an obvious choice because it supports horizontal scalability due to being a wide-column database (a NoSQL type.) Wide-column databases store data in flexible columns that can be spread across multiple servers or database rows. They can also support two rows having different columns, which gives a lot more flexibility for data storage!

Wide-column databases store data in columns so it’s highly scalable and flexible. Two rows in one table can have different types or numbers of columns. Source: AWS

The biggest tradeoffs:

Data must be denormalized, meaning it isn’t stored as efficiently as in a relational database. Basically, you’ll store more data and require more storage space.

Data needs to be indexed on write. Writing to a wide column database is more expensive than to a relational database. For each row and column changed, the relevant indexes need to be updated, which typically makes these databases more write-intensive than relational ones.

The team was satisfied with their early choice of Postgres, says Daniel:

“Postgres was great early on because we didn’t quite know exactly what questions we’d be asking of the data. It let us toss data into the database and figure it out from there. Now we understand the data and the types of queries we need to run, it frees us up to index it in Scylla in exactly the manner we need and provide APIs for the exact queries we’ll be asking.”

SQLite

ScyllaDB is used for the Appview, which is Bluesky’s most read-heavy service. However, the Personal Data Servers use something else entirely: SQLite. This is a database written in the C language which stores the whole database in a single file on the host machine. SQLite is considered “zero configuration,” unlike most other databases that require service management – like startup scripts – or access control management. SQLite requires none of this and can be started up from a single process with no system administrative privileges. It “just works.”

Daniel explains why SQLite was ideal for the PDSs:

“We took a somewhat novel approach of giving every user their own SQLite database. By removing the Postgres dependency, we made it possible to run a ‘PDS in a box’ without having to worry about managing a database. We didn’t have to worry about things like replicas or failover. For those thinking this is irresponsible: don’t worry, we are backing up all the data on our PDSs!”

SQLite worked really well because the PDS – in its ideal form – is a single-tenant system. We owned up to that by having these single tenant SQLite databases.

We also leaned into the fact that we’re building a federated network. We federated our data hosting in the exact same manner that it works for non-Bluesky PDSs.” 

Migrating the PDSs from Postgre to SQLite created fantastic improvement in operations, Daniel adds:

“PDSs have been a dream to run since this refactor. They are cheap to operate (no Postgres service!) and require virtually no operational overhead!”

6. Infra stack: from AWS to on-prem

Bluesky’s infrastructure was initially hosted on Amazon Web Services (AWS) and the team used infrastructure-as-a-code service, Pulumi. This approach let them move quickly early on, and also to scale their infra as the network grew. Of course, as the network grew so did the infrastructure bill. 

Move to on-prem

Cost and performance were the main drivers in moving on-prem. The team got hardware that was more than 10x as powerful as before, for a fraction of the price. How was this decision made? A key hire played a big role.

Bluesky’s first hire with large-scale experience was Jake Gold, who joined in January 2023, and began a cost analysis of AWS versus on-prem. He eventually convinced the team to make this big change.

But how did the team forecast future load, and calculate the hardware footprint they’d need? Daniel recalls:

“We looked at the trends and tried to make a safe bet. We were thinking: ‘okay, today we're over-provisioned. We want to stay over-provisioned, so we have room to grow without upgrading the hardware, but also just so we have stability if something happens in the world, and everyone decides to post about it.’

We built our architecture to be horizontally scalable so that we can add more capacity just by throwing more machines at it. There is some lead time to buying new machines, but we have space in the rack. We have room in the network connections. The switches are good for it.

If we need to scale, it’s really just about ‘get some more servers and hook them up!’ We can get to twice the capacity after doubling the machines we’re running in our data center. This is sweet!”

Becoming cloud-agnostic was the first step in moving off AWS. By June 2023, six months after Jake joined, Bluesky’s infrastructure was cloud agnostic. 

Bluesky always has the option of using AWS to scale if needed, and is designed in a way that it would not be overly difficult to stand up additional virtual machines on AWS, if the existing infrastructure has capacity or scaling issues.

Today, the Personal Data Servers are bare-metal servers hosted by cloud infrastructure vendor, Vultr. Bluesky currently operates 20 and shards them so that each PDS supports about 300,000 users.

Bluesky’s load by the numbers

Currently, Bluesky’s system sees this sort of load:

60-100 events/second received by the firehose service, which is the “main” service that emits messages sent on the network in real time. During the public launch of Bluesky in February, the peak was 400 events/second.

400 timeline loads/second. A timeline load is when a user (or client) makes a request to fetch their current timeline.

3,500 requests/second across the network.

7. Reality of building a social network

To close, we (Gergely and Elin) asked the teams some questions on what it’s like to build a high-growth social network.

What is a typical firefighting issue you often encounter?

“Every influx of users brought new problems, and we found ourselves doing quite a bit of firefighting. One day, after a particularly notable incident, growth showed no signs of stopping, and we had to temporarily disable signups in order to keep the service running.” – Daniel

What were the events referred to as “Elon Musk?”

“We never quite knew when a user bump was going to come, and invites were out in the wild waiting to be used. Then something would happen, and thousands of users suddenly joined. We started referring to these days as EMEs (Elon Musk Events) because they were normally precipitated by some change on Twitter.” – Daniel

“It was a bit like throwing a party and everybody showing up 2 hours early, while you’re still setting up the chairs and telling people to get drinks from the fridge. And then about ten times more people show up than expected.” – Paul

How are outages different for a social network?

“Disabling signups or pausing the service is never fun to do, but it actually created a bunch of excitement and a strange sense of pride in the user base.” – Daniel

“Outages are not fun, but they’re not life and death, generally. And if you look at the traffic, usually what happens is after an outage, traffic tends to go up. And a lot of people who joined, they’re just talking about the fun outage that they missed because they weren’t on the network.” – Jake

The whole developer team is on Bluesky, and actively responding to user feedback. How do you do this, and why?

“People just pinging us in the app and explaining their problem, is so good. We can just respond, "Hey, can you give me a screenshot? What platform are you on?" It's such a fast support turnaround. The big benefit of building a social app is that your customers are right there, and will tell you if something's not working.

Real time user feedback was how mute words got prioritized, recently. In terms of a signal about how important something is, when you start getting PRs to add the feature, and you get a ton of people plus-oneing the issue – not to mention people asking for it in the app – that tells you a lot.” – Paul

Takeaways

Gergely here. Many thanks to Daniel and Paul for part one of this deep dive into how Bluesky works! You can try out Bluesky for yourself, learn more about Bluesky’s AT Protocol, or about its architecture. And I’m also on Bluesky.

Decentralized architectures require a different way of thinking. I’ll be honest, I’m so used to building and designing “centralized” architecture, that the thought of servers being operated outside of the company is very alien. My immediate thoughts were:

Is it secure enough? Malicious actors could run anything on those servers and attempt to overload the network or exploit vulnerabilities in the system. The Bluesky team also stressed how the security model is something you thoroughly need to consider as you design APIs for such a system.

What about external nodes that don’t ever update the version of the software? How do they get bug fixes? And what about versioning? How to ensure “outdated clients” are cut off from the network?

Finally, I thought; “wow, this kind of reminds me of the confusion I initially felt about Skype’s peer-to-peer network

I’m delighted we did a deep dive about Bluesky because it has forced me to think more broadly. A server drawing on a diagram no longer just means “a group of our servers,” it can also mean “plus, a group of external servers.” Once this is understood, it’s easy. And this skill of designing distributed and federated systems may be useful in the future, as I expect the concept of distributed architecture to become more popular.

It’s impressive what a tiny team of experienced engineers can build. I had to triple-check that Bluesky’s core team was only two engineers for almost nine months, during which time they built the basics of the protocol, and made progress with the iOS and Android apps. Even now, Bluesky is a very lean team of around 12 engineers for the complexity they build with and the company’s growth. 

In the next part of this deep dive into Bluesky, we cover more on how the team works. 

Owning your own infrastructure instead of using the cloud seems a rational choice. Bluesky found large savings by moving off AWS once they could forecast the type of load they needed. Jake Gold, the engineer driving this transition, has been vocal about how cloud providers have become more expensive than many people realize. Speaking on the podcast, Last Week in AWS, he said:

“With the original vision of AWS I first started using in 2006, or whenever launched, they said they would lower your bill every so often, as Moore’s law makes their bill lower. And that kind of happened a little bit here and there, but it hasn’t happened to the same degree as I think we all hoped it would.”

Don’t forget, it’s not only Bluesky which rejects cloud providers for efficiency. We previously did a deep dive into travel booking platform Agoda, and why it isn’t on the cloud.

I’m slowly changing my mind about decentralized and federated social networks. I also tried out Mastodon, which is another federated social network, when it launched. At the time, Mastodon felt a lot more clunky in onboarding than Bluesky. You had to choose a server to use, but different servers have different rules, whereas Bluesky was much smoother. Still, as a user, I was blissfully unaware of how different these social networks are from the dominant platforms.

It was only by learning about Bluesky’s architecture that I appreciated the design goals of a decentralized social network. Currently, mainstream social networks are operated exclusively by the company that owns them. But a decentralized network allows servers to be operated by other teams/organizations/individuals. This might not seem like a big deal, but it means a social network is no longer dependent on the moderation policies of a parent company.

Decentralized social networks also allows users to use custom algorithms, websites and mobile apps, which creates opportunities for developers to build innovative experiences. In contrast, you cannot build a custom third-party client for X, Threads, or LinkedIn.

I’m still unsure how much mainstream appeal decentralized social networks hold for non-technical people, but I’m rooting for Bluesky, Mastodon, and the other decentralized social apps. Perhaps they can challenge Big Tech’s dominance of social media, or at least change people’s understanding of what a social network can be.

In a follow-up issue, we’ll look deeper into the engineering culture at Bluesky: the company culture, a deeper look at the tech stack, and how they are building seemingly so much with a surprisingly small team and company. I suspect we can all learn a lot in how a dozen engineers help a startup scale to more than 5 million users.

Enjoyed this issue? Subscribe to get this newsletter every week 👇

Subscribe now


@_Nat Zone

NIST SP 800-63Bへの補遺が出ました〜パスキーの追加です

近年、多要素認証(MFA)の普及が進む中、ユーザーの利便性とセキュリティのバランスを取ることが課題となっています。MFAとは、パスワードに加え、生体認証やワンタイムパスワードなど複数の認証方式を組み合わせることで、アカウントへの不正アクセスを防ぐセキュリティ手法です。しかし、MF…

近年、多要素認証(MFA)の普及が進む中、ユーザーの利便性とセキュリティのバランスを取ることが課題となっています。MFAとは、パスワードに加え、生体認証やワンタイムパスワードなど複数の認証方式を組み合わせることで、アカウントへの不正アクセスを防ぐセキュリティ手法です。しかし、MFAの導入はユーザーにとって面倒な手順が増えることを意味します。この課題を解決するために登場したのが、Syncable Authenticator(同期可能認証器)、別名Passkey(パスキー)と呼ばれる新しい認証技術です。

同期可能認証器は、認証に用いる秘密鍵を複数のデバイス間で同期できるようにすることで、ユーザーはどのデバイスでも同じ認証情報を使えるようになります。これにより利便性が大幅に向上します。一方で、秘密鍵を複数の端末で共有することはセキュリティ上のリスクを伴います。そこで、米国国立標準技術研究所(NIST)は、シンカブル認証器の安全な利用に関するガイドライン(SP 800-63Bの補遺)を公開しました。

NIST: Incorporating Syncable Authenticators Into NIST SP 800-63B

この文書は、NIST Special Publication 800-63Bへの補足であり、認証キーを複製しデバイス間で同期できる同期可能認証器(パスキーとしても知られている)の使用に関するガイダンスを提供するものです。主なポイントは以下の通りです。

適切に構成された同期可能認証器は、中間者攻撃、検証者のなりすまし、リプレイ攻撃などの脅威を軽減し、認証の意図を提供することで、認証保証レベル2(AAL2)を達成できます。AAL2は、フィッシング耐性など高度なセキュリティを求める一方で、ある程度の使いやすさも確保するレベルです。 本文書では、キーの生成、保存、アクセス制御に関する一定の要件を満たすことを条件に、同期可能認証器の認証キーの複製を許可するようSP 800-63Bを更新しています。これによって、政府機関でもパスキーを使うことができるようになります。 WebAuthn仕様のフラグを使用して、認証器がAAL2の要件を満たしているかどうかを判断するなど、実装上の考慮事項について説明しています。エンタープライズでのユースケースでは、Attestationを利用してオーセンティケータの機能を検証することができます。 同期可能認証器の潜在的な脅威と課題を概説し、不正なキーの使用、同期ファブリックの侵害、失効の難しさなどを示し、それらを軽減するための方法を提案しています。 本文書は、一部の実装におけるユーザー間のキー共有のリスクを認識し、エンタープライズおよび公開向けのユースケースについてガイダンスを提供しています。

全体として、この補足資料は、適切に導入されれば、便利でフィッシング耐性のある認証を提供できる同期可能認証器の統合について、機関がリスクに基づいた informed な意思決定を行うことを目的としています。

本補遺は、AAL2への同期可能認証器の適合性を検証する上で重要な基準を提供しています。また、鍵の不正使用やクラウドストレージの侵害といった新たな脅威についても言及し、対策を提言しています。同期可能認証器の潜在的なリスクを認識しつつ、適切に導入することで、利便性を損なわずにセキュリティを向上できることを示しています。

本補遺は、セキュリティ担当者だけでなく、同期可能認証器の導入を検討する全ての関係者にとって有益な情報を提供しています。NISTによる公的なガイドラインの存在は、同期可能認証器の安全性に対する信頼を高め、普及を後押しするでしょう。ユーザー数の増加は、さらなる技術革新を促します。本補遺は、セキュリティと利便性を両立する新認証技術の健全な発展に寄与する重要な一里塚になると考えられます。

なお、背景等については @phr_eidentity さんのこの記事1が詳しいです。


アイデンティティをやっているなら読むべき本〜『メタバース進化論――仮想現実の荒野に芽吹く「解放」と「創造」の新世界』

まずはじめにアイデンティティ業界の人へ とりあえず、アイデンティティに関心のある人は、本書『メタバース進化論』を読んだほうが良い。メタバース・バ美肉として現実の物理的制限から存在を解き放ち、抽象空間で取り扱うことができるようになることによって、アイデンティティとプライバシー管理の…
まずはじめにアイデンティティ業界の人へ

とりあえず、アイデンティティに関心のある人は、本書『メタバース進化論』を読んだほうが良い。メタバース・バ美肉として現実の物理的制限から存在を解き放ち、抽象空間で取り扱うことができるようになることによって、アイデンティティとプライバシー管理の課題や要件がこんなにも明確になるのか!曰く

そもそも「アイデンティティ(自己同一性)とは、私たちが私たち自身をどのようなものと捉えるかという「認識」であり、他者や社会からそれが認められているという「感覚」のことです。 物理現実では、基本的に生まれたままの名前・姿・声を受け入れるしかありませんでした。つまり、アイデンティティとは「与えられる」ものでした。 基本的には与えられた固定のものを「受け入れる」しかなかった物理現実時代のそれとは違い、メタバース時代のアイデンティティは自由に「デザインする」ものになり、「なりたい自分」としての人生を送ることが可能になる 人間を分割可能な「分人(Dividual)」として捉える 複数のアイデンティティを「切り替える」ことで人生を自由にデザインできる プラトンの「イデア論」〜見ている現実世界は、実体ではなく、イデアの影にすぎないのだ アイデンティティを自在にデザインして「なりたい自分」になれるメタバース

(出所)『メタバース進化論』

わたしの本1やブログ、講演などに接しているかたは、「はいはいアレね」と思うだろう。次の図は、2011年6月に書いた「非技術者のためのデジタル・アイデンティティ入門」の頃から使いまわしている図だが、ほとんど同様のことを言っている。分人 v.s. Identity (Partial Identity) など使っている言葉は違えど殆どマッピングできる。

ただ、多分、本書のほうが100倍わかりやすい。これは、この図では自己像と言って抽象的に扱っているものにアバターという実体を与えて手触り感を持たせているのと同時に、何でもかんでも「属性」という分類でくくってしまわずに、「言霊世界」「視覚世界」「音響世界」のようにカテゴライズして説明しているからだろう。もちろん単純な筆力の違いもあるが。

【図表1】自己像(アイデンティティ)と幸福追求

とりわけ重要なのが、魂の自由性とか、アイデンティティは他者から与えられるものではなく自分で確立するものだというところだ。このあたり、「アイデンティティの専門家」でも忘れている人達が多い。神だとか政府だとかに与えてもらいそれを受容するものだという中世的な考え方の人たちが多いのだ。

中世的アイデンティティ v.s. 近代的アイデンティティ

近代的な個人観と中世的な個人観の主な違いは以下のようにまとめらるだろう。

中世的個人観

個人は共同体の一部として存在し、共同体からの役割や地位によって定義されていた。2 キリスト教の影響で、個人は神との関係性において「世俗外個人」として捉えられていた。3 4 個人の内面性よりも、共同体への所属と役割が重視されていた。5

近代的個人観

個人は自律的な存在として捉えられ、個人の内面性や自我意識が重視されるようになった。67 啓蒙主義の影響で、個人は中世的な共同体からの拘束から解放され、「内面的孤独」を経験するようになった。8 個人の権利や自由が強調され、個人主義的価値観が台頭した。910 プライバシーの概念が生まれ、私的領域が神聖視されるようになった。11

つまり、中世では個人は共同体の一部として捉えられていたのに対し、近代では個人の自律性と内面性が強調され、個人主義的価値観が確立されたということができる。魂の自由性と言っても良い。でも残念ながら、この個人主義的価値観・近代的個人観に行き着いていない人が結構多い。だから、アバターは現実世界の自分に似ていなければならないだとか、アイデンティティウォレットには政府発行の個人属性証明(PID)を入れて、それを中心に回していくとかの考えになってくる。

これは別にウォレットに限ったことではなくて、Web 2.0的なSNSにおける本人確認でも似たようなことが起きてくる。先ほど経験したことだと、LinkedInで本人確認済みにするには、氏名はパスポート表記と同一でなければならないという。パスポート表記などというのは、国境を超えるときの識別子として政府に登録されたものであって、自分を表出するものではない。わたしは英語では Nat Sakimura として活動しているが、それが「本人確認」をすると、Natsuhiko Sakimuraであることを強制される。なんだそれは。そんなことしたら過去の業績とのリンクができなくなるではないか。いやそんな些細なことではない。名乗りを奪うということは、人格を否定することに等しい。ちょうど千と千尋の神隠しの湯婆婆が千尋の名前を奪うように。

まさに、本書がいうように、名前は「言霊世界のアイデンティティ」なのだ。この「名乗りの自由性」はとても大切だ。なぜならば、本書が言うように『周りの方々からこの「言霊」で毎日繰り返し呼びかけられ続けることで、わたしは◯◯としての自己認識を獲得することができる』からだ。

また、言うまでもなく「アバター:視覚世界のアイデンティティ」(本書)の自由性も重要だ。『自分自身の「アイデンティティ表現」としてみた時は現実のような「人間」のかたちに縛られたくない』(本書)からだ。それともあれだろうか。四肢が不自由だったら、メタバースでも四肢が不自由でなければいけないとでも言うのか。

その自由性の象徴がバ美肉である。日本では、人形浄瑠璃や歌舞伎の女形などの文化的背景がありだいぶ受容が進んできたが、西洋ではヘタをしたら56されかねない。バ美肉を受容できるかどうかは、潜在的な差別意識のあるなしのリトマス試験紙になる。

本書ではMetaが規制を入れてくる可能性に触れているが、AppleもVision Pro でやってきても驚かない。場合によっては、VRのアバタを簡単に作らさせるためにUXとしてそういうことにするといってくるかもしれない。だがこうした動きには徹底抗戦しなければなるまい。日本では、自己イメージコントロール権説や、「身だしなみなどに関する自己決定権」から、「なりたい自分になれる権利」は認められ、間接適用説からmetaなど事業者もこの権利の侵害は許されないはずだ12

こうしたアイデンティティに関する話は第4章にまとめられているので、時間がなかったらまずそこだけでも読んでほしいところだ。

プライバシーに関しても参考になる

また、アイデンティティと違って一つの章になっているわけではないが、プライバシーに関しても参考になる記述がいろいろとある。曰く、

同じく日本の伝統芸能「人形浄瑠璃」でも、優美に踊る美しい女性の人形の背後には黒子のおじさんが見えてしまっています。しかし、それを指摘するのは野暮

これは、「データが見えてしまっても、意図された目的以外では使ってはいけない」というデータ保護の基本につながる。わたしが昔から「大人のプライバシー」と呼んでいるものだ。もちろん「なりたい自分になれる」というのが、「自己イメージコントロール権」13と表裏一体であるのは言うまでもなく。

ISOでは現在、メタバースのプライバシーについての検討が始まっているが、関係者にぜひ本書を読ませたいところだ。英語版がほしい。

もちろんメタバースの解説も

さて、自分がアイデンティティ&プライバシー業界の人間だから、いきなりそのあたりの各論に入ってしまったが、そもそも本書はメタバースについての本だ。「メタバースの定義:実現に必要な七要件」であるとか、4大ソーシャルVRの紹介であるとか、それらを支える技術であるとか、メタバースに住む住民たちの特性、なぜ人々はバ美肉になるのか、などなど、データの裏付けをとりながらきっちりと解説している。しかも、マーカーで線を引きたくなるようなキャッチーなフレーズが満載で。たとえばこんな感じ:

声:音響世界のアイデンティティ 自分の精神に作用するプロテウス効果 バ美肉は枯山水。見立ての文化の延長線。 (現在は)「永遠に新しいことを学び続けていく」ことが当然(の社会)。すべての人が永遠に「未熟」であることを受け入れざるを得ない社会において、「大人としての自尊心」に変わる新たな価値観こそが、お互いの「未熟さ」を愛おしいと思う価値観「かわいい」

(出所)『メタバース進化論』

わたしは、Kindleで大量に線を引きながら読んでしまった。上記はそのほんの一部だ。ぜひ本書を買って読んいただきたい。

動画による解説も一緒にみるとなお良し

なお、本書は文章による紹介としては大変優れているが、やはりメタバースのイメージを得るには映像のほうが適している。そういう意味で、筆者である「バーチャル美少女ねむ」さんが、国連のIGF 2023で行ったプレゼン「バーチャル美少女だけど国連登壇してみた」も並行して見ることをおすすめする。非常にわかりやすいプレゼンだ。英語ですが…。

これは、国連が主催するインターネット・ガバナンス・フォーラム(IGF14 )2023 京都のDay 0

「Event No.134 Talk with Metaverse residents – a new identity and diversity | Internet Governance Forum」(2023年10月8日 18:00〜19:00)

の模様を録画したものだ。わたしもこの日だけは、自分のセッション15があったのでIGFの会場にいたのに、生参加できなくて残念だった。多分この時間はOECDの人とTrusted Data Intermeidaries (TDI: 信頼できるデータ仲介者) について話していたと思う。

というわけで

本書、本当におすすめだ。サンフランシスコからの機内で読んでいて、第4章を読んでいて涙が自然に流れ出てきたくらい。ぜひ買って読みましょう!

【変更履歴】

2024-04-23 初版 2024-04-24 中世的アイデンティティ v.s. 近代的アイデンティティを追加、敬体を常態に変更。

Monday, 22. April 2024

Just a Theory

🎙️ Hacking Postgres s02e03

I quite enjoyed the chat with Ry Walker on the “Hacking Postgres” podcast, posted last week.

Hacking Postgres S2E3: David Wheeler, Principal Architect, Tembo

Last week I appeared on s02e03 of the Hacking Postgres podcast.

The experience I had after my independent Consulting gig for 10 years working in companies was, like, bringing up other people and being supportive of other people and hearing from a diversity of voices and perspectives makes everything better.

That’s part of why I want to get so much input on and feedback on the stuff that I’m hoping do with PGXN v2 — or whatever we ultimately call it. But people matter, more than the technology, more than any of the rest of it.

I quite enjoyed this wide-ranging discussion. We covered my history with the Postgres community, a bunch of the projects I’ve worked on over the years, plans and hopes for the PGXN v2 project, perspectives on people and technology, and exciting new and anticipated features of Postgres. Find it wherever fine podcasts are streamed, including:

YouTube Apple Podcasts Overcast Twitter More about… Postgres Podcast Hacking Postgres Sqitch pgTAP PGXN

Phil Windleys Technometria

Using X.509 Certs for DID Provenance

Summary: The abundance of X.509 certificate authorities who already perform identity proofing for businesses provides a rich resouce that can be leveraged to boot the verifiable data ecosystem. Here’s how it could work. When you used a verifiable credential to prove something about yourself, the verifier can know cryptographically: (1) the identifiers for the issuer, (2) the credential hasn't been

Summary: The abundance of X.509 certificate authorities who already perform identity proofing for businesses provides a rich resouce that can be leveraged to boot the verifiable data ecosystem. Here’s how it could work.

When you used a verifiable credential to prove something about yourself, the verifier can know cryptographically: (1) the identifiers for the issuer, (2) the credential hasn't been tampered with, (3) the credential was issued to you, and (4) the credential hasn't been revoked. These four checks are important because their establish the fidelity of the data being transferred. They don't, however, tell them whether they can trust the issuer. For that, they need to take the issuer's decentralized identifier (DID) that they got from credential presentation and determine who it belongs to.

At Internet Identity Workshop last week, Drummond Reed gave a session on how X.509 certificates could help with this. The first step, like always, is to resolve the DID and retrieve the DIDDoc that associates keys and endpoints with the DID. The endpoint can be an HTTP server and, of course, should have an X.509 certificate providing TLS security. That certificate, at the very least, has a a domain name to bind that to the certificate's public key. It can, if you pay for the feature, also include information about the entity that applied for the certificate. The certificate authority proofs that information and is vouching for it when they sign the certificate.

The key to making the X.509 certificate useful for checking the provenance of a DID lies in one key change. X.509 certificates can contain and extended field called a Subject Alternative Name. This following figure shows how it can help.

Using X.509 Certificates to establish the owner of a DID

The issuer (Attestor) creates the DID they will use to issue the certificate along with its associated DIDDoc, including an HTTP endpoint for DID verification.

Attestor applies for a X.509 certificate for that endpoint, including in the application the DID they created in (1).

The certificate authority does it's usual proofing of the application and issues a certificate that includes the DID in the Subject Alternative Name field.

The issuer creates a credential definition in the usual way that includes their DID and writes it to whatever Verifiable Data Registry their DID method dictates.

Attestor issues a credential to a holder (Alice) using that credential definition.

At some later time, Alice presents the credential to the verifier (Certiphi).

Certiphi resolves the DID to get the DIDDoc and retrieves the verfication endpoint from the DIDDoc

Certiphi retrieves the certificate for that endpoint1.

Certiphi verifies the certificate by checking it's signature and ensures that the DID in the DIDDoc for the credential matches the one in certificate.2

The issuer's DID has now been tied in a verifiable way to whatever information is in the certificate. Provided the certificate includes information about the entity beyond the domain name, the verifier can use that information to determine whether or not the credential is authentic (i.e., issued by who the credential definition purports issued it). That might be all the evidence they need to determine whether to trust the entity. Certificate authorities could also issue verifiable credentials to the customer attesting the same verified claims—after all, it's one more product they can offer.

The benefit of doing issuer validation using X.509 certificates is that there are already many trusted X.509 certificate authorities in business who already do proofing of attributes about businesses. That's a huge chunk of the verifiable data ecosystem that doesn't need to be built because it can be leveraged. To make this work, digital certificate authorities would need to start offering to validate DIDs and include them in a certificate as a Subject Alternative Name. I don't discount that this will take some bureaucratic maneuvering. Certificate authorities will need to see a business opportunity. I'd love to see Digitcert or someone do a pilot on this.

Notes

Note that this step might be combined with the previous step if the Verifiable Data Registry is the same server as the endpoint, but that's not necessarily going to be the case for a number of reasons.

Note that this does not create a call back wherein Attestor can determine which credential was used, preserving the privacy of the presentation. Attestor does know one of its credentials has been presented to Certiphi.

Subscribe now

Share


Doc Searls Weblog

Archives as Commons

The Santa Barbara News-Press was born in 1868 and died in 2023 at age 155. Its glory years ran from 1932 until 2000, when the New York Times sold it to Wendy McCaw, who rode it to hell. That ride began with the Santa Barbara News Press Controversy in 2006 and ended when Ampersand, the company McCaw created […]
The Santa Barbara library, viewed from the county courthouse. Is this where the dead local paper’s archives will go? How about future archives of all the local news organs?

The Santa Barbara News-Press was born in 1868 and died in 2023 at age 155. Its glory years ran from 1932 until 2000, when the New York Times sold it to Wendy McCaw, who rode it to hell.

That ride began with the Santa Barbara News Press Controversy in 2006 and ended when Ampersand, the company McCaw created to hold the paper’s bag of assets (which did not include its landmark building downtown, which McCaw kept), filed for Chapter 7 bankruptcy in late July of last year. Here are stories about the death of the paper in three local news journals that have done a great job of taking up the slack left when the News-Press began to collapse, plus one in the LA Times:

‘Santa Barbara News-Press’ Files for Bankruptcy: Publisher Ampersand Claims Few Assets and Many Creditors, by Jean Yamamura in the Santa Barbara Independent (July 23, 2023) Santa Barbara News-Press Declares Bankruptcy, Staff Told All Jobs ‘Eliminated’, by Joshua Molina, in Noozhawk (July 23, 2023) Santa Barbara News-Press Files for Bankruptcy, Staff Fired, by Edhat Staff (July 24, 2023) Santa Barbara News-Press bankruptcy brings uneasy end to an owner’s bitter tenure, by James Rainey, in the Los Angeles Times. (July 29, 2023)

I followed those with this in We Need Deep News:

From what I’ve read so far (and I’d love to be wrong) none of those news reports touch on the subject of the News-Press‘ archives, which conceivably reach back across the century and a half it was published. There can’t be a better first draft of history for Santa Barbara than that one. If it’s gone, the loss is incalculable. (August 18 2023)

Last month brought bad news about that:

‘Santa Barbara News-Press’ Online Assets to Be Sold: Bankruptcy Sale for $250,000 Subject to Bids in April, by Jean Yamamura in the Santa Barbara Independent March 8, 2024 A local paper went bankrupt. Now a faraway buyer wants its assets: The Santa Barbara News-Press’s digital assets are up for sale. Locals worry they could become a farm for AI-generated SEO bait. By Ernie Smith in Fast Company. (March 5, 2024) Santa Barbara’s Collective Memory, Sold for Kindling: Will ‘NewsPress.Com’ Become a Zombie Website? by William Belfiore in the Independent. (March 29, 2024)

But then, thanks to William Belfiore’s appeal in that last piece, we learned this:

‘Santa Barbara News-Press’ Website Goes to ‘Local Kids’ Group Fronted by Ben Romo Makes Winning Auction Bid of $285,000, by Jean Yamamura in the Independent (Apr 09, 2024)

The only mention of archives was in the closing sentences of that piece:

The purchase of the website included the Santa Barbara News-Press trademark, which would be important to the groups looking at the physical archive of back issues, photographs, and clippings by topic. Romo, who was once a paper boy for the daily, acknowledged that his group was supportive of the archive remaining local, too.

I don’t know what that means, and I haven’t checked. But I am sure that the archives ought to be managed by the community as a common pool resource.

As it happens, my wife and I are visiting scholars at the Ostrom Workshop at Indiana University, which is concerned with this kind of thing, because its namesake, Elinor Ostrom, won a Nobel Prize in Economics for her work on how commons are self-governed. In her landmark book, Governing the Commons: The Evolution of Institutions for Collective Action, she lists eight principles for managing a commons, which are summarized here:

 Define clear group boundaries. Match rules governing use of common goods to local needs and conditions. Ensure that those affected by the rules can participate in modifying the rules. Make sure the rule-making rights of community members are respected by outside authorities. Develop a system, carried out by community members, for monitoring members’ behavior. Use graduated sanctions for rule violators. Provide accessible, low-cost means for dispute resolution. Build responsibility for governing the common resource in nested tiers from the lowest level up to the entire interconnected system.

Journalists, especially those who report news, are not herding animals. They tend to be competitive and territorial by both nature and purpose. So the collection of news entities I wrote about in We Need Wide News and We Need Whole News will almost certainly not cohere into a commons such as Lin (her nickname) Ostrom addresses in that list.

But they should cohere around archives—not only because that’s the right thing to do, but because they need those archives. We all do.

So I hope Santa Barbara’s many journals, journalists, friends, supporters, and interested local institutions get together around this challenge. Build a commons around those archives, whatever and wherever they happen to be.

Meanwhile here in Bloomington, my other hometown, we are pushing forward with The Online Local Chronicle that Dave Askins wrote about in the previous installment in this series. We might call that a commons interest here.

 

 


Wrench in the Gears

On The Road Again – Vermont Trip Wrap Up

I’m Arkansas bound in the morning, so I don’t have time for a long post. All I have time for is to share a video with musings about the second part of my trip north and the photos that go along with it. Keep me in your prayers these next few weeks. I’m going to [...]

I’m Arkansas bound in the morning, so I don’t have time for a long post. All I have time for is to share a video with musings about the second part of my trip north and the photos that go along with it. Keep me in your prayers these next few weeks. I’m going to be putting a lot of miles behind me.

 

 

 

 

 

 

 

 

 

Sunday, 21. April 2024

Doc Searls Weblog

This Thing is Bigger Than Journalism

Journalism as we knew it is washing away. But the story is bigger than journalism alone, and bigger than a story alone can tell. (Image borrowed from the brilliant Despair.com.) We who care about journalism are asked to join the Save Journalism Project, and its fight against Big Tech. Their pitch begins, and adds, On the first point, we […]
Journalism as we knew it is washing away. But the story is bigger than journalism alone, and bigger than a story alone can tell. (Image borrowed from the brilliant Despair.com.)

We who care about journalism are asked to join the Save Journalism Project, and its fight against Big Tech. Their pitch begins,

and adds,

On the first point, we should note that journalists have been working for magazines, broadcasters, newsletters and themselves for many dozens of years. So journalism isn’t just about newspapers. Also, because so many journalists have long made livings in those other media, the loss of work is far greater than the 2,400 gone from newspapers. It’s truly massive. I don’t know any field where the loss of paying jobs is larger on a percentage basis. Not taxi driving, not hospitality, not retail, not manufacturing… not anything I can think of. (Well, maybe nuns. I don’t see many of those these days.)

We should also respect the simple fact that now there is more journalism than ever: in blogs, social media, podcasting, and other places. Most of those kinds of journalism don’t pay, but that doesn’t disqualify the work from the label. Hell, I’m committing journalism here and this doesn’t pay.

“The story of big tech’s threat to journalism” (what the Project wants us all to tell) is also something of a red herring because it distracts our attention from causes much bigger than Big Tech.

Every new technology “works us over completely,” Marshall McLuhan says (in The Medium is the Massage). And no new medium, no new technologies, have ever worked us more than the digital kind. The change began with digital tech and integrated circuits and then went absolute with the Internet. Together, digital technologies and the Internet have radiacally changed our species, our civilization, and our planet.

Not long ago, in a conversation about this with Joi Ito, I asked him how big he thought the digital transformation was. Bigger than broadcast? Print? Writing? Speech? Stone tools?

“No,” he replied. “It’s the biggest thing since oxygenation.” In case you don’t remember, that happened between about two and a half billion years ago. (Joi also writes about it here.)

So, while journalism matters enormously, it’s just one casualty of digitalization. And, let’s face it, a beneficiary as well. Either way, we need to understand the whole picture, which is about a lot more than what journalism sees happening in the mirror.

Here’s one outfit working on that bigger picture. I‘m involved with it.

I also don’t expect most journalists to take much interest in the subject, because it’s too big, and it doesn’t make full sense as a story, which is journalism’s stock in trade. (I explain a bit about journalism’s “story problem” in this TEDx talk.)

Still, some journalists are on the case, including me. Love to have others join in. But please don’t bother if you think Big Tech is alone to blame. Because the story is bigger than that, and far more than a story.

I just copied and pasted this post from here in Medium, where I posted it in July 2019. It expands on a post now archived here. It’s kinda sad that not much has changed over all that time.

Thursday, 18. April 2024

Mike Jones: self-issued

OpenID Federation Session at April 2024 IIW

John Bradley and I convened a session on Trust Establishment with OpenID Federation at the Internet Identity Workshop (IIW) on Thursday, April 18, 2024. The material used to drive the discussion was: Trust Establishment with OpenID Federation (PowerPoint) (PDF) The session was well attended and the discussion lively. Numerous people with trust establishment problems to […]

John Bradley and I convened a session on Trust Establishment with OpenID Federation at the Internet Identity Workshop (IIW) on Thursday, April 18, 2024. The material used to drive the discussion was:

Trust Establishment with OpenID Federation (PowerPoint) (PDF)

The session was well attended and the discussion lively. Numerous people with trust establishment problems to solve contributed, including experts from the SAML federation world, people involved in digital wallet projects, and several people already using or considering using OpenID Federation. Thanks to all who participated!


The Pragmatic Engineer

The Pulse #90: Devin reversing ambitious claims

The “world’s first AI developer” tones down expectations and has been outperformed by an open source tool. Also: hiring upticks at Big Tech; a very realistic AI video generator by Microsoft, and more.

The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.

Today, we cover:

Industry pulse. Fintech valuations rising again; pre-earnings layoffs at Tesla and Google; Google fires staff trying to interfere with business; Rippling offering a secondary to its employees, and more.

Devin: Reversing ambitious claims. A month ago, Devin launched with fanfare as “the world’s first AI developer,” claiming that it “even completed real jobs at Upwork.” Upon closer inspection, this claim did not hold up. The company behind Devin had since toned down expectations. Also: open source solution AutoCodeRover is offering even better performance than Devin’s closed-source and not-yet-publicly available tool. This space is commoditizing rapidly.

Microsoft’s disturbingly realistic AI video generator. Microsoft Research showcased a tool that generated very realistic videos from a single image. The #1 use case will surely be fraudulent deepfakes generation. This development could well speed up AI regulation in several countries.

Hiring upticks at Meta, Netflix and Amazon? Data from interview preparation website interviewing.io suggests hiring is back at full speed at Meta, Netflix and – possibly – Amazon.

1. Industry pulse Fintech valuations rising again

Read more

Tuesday, 16. April 2024

Mike Jones: self-issued

OpenID Presentations at April 2024 OpenID Workshop and IIW

As has become traditional, I gave the following presentation at the Monday, April 15, 2024 OpenID Workshop at Google: OpenID Connect Working Group Update (PowerPoint) (PDF) I also gave this invited “101” session presentation at the Internet Identity Workshop (IIW) on Tuesday, April 16, 2024: Introduction to OpenID Connect (PowerPoint) (PDF)

As has become traditional, I gave the following presentation at the Monday, April 15, 2024 OpenID Workshop at Google:

OpenID Connect Working Group Update (PowerPoint) (PDF)

I also gave this invited “101” session presentation at the Internet Identity Workshop (IIW) on Tuesday, April 16, 2024:

Introduction to OpenID Connect (PowerPoint) (PDF)

The Pragmatic Engineer

What is Security Engineering? Part 1.

A deep dive into the ever-changing field of security engineering; a domain that can feel intimidating to some software engineers. With Nielet D'Mello, security engineer at Datadog.

👋 Hi, this is Gergely with a subscriber-only issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. To get articles like this in your inbox, every week, subscribe:

Subscribe now

Q: “As a software engineer, I’d like to learn more about security engineering. What’s a good way to understand this vast field?”

Security is so important in our industry. There’s frequently news stories about security incidents, like the authentication provider Okta which was breached, then responded poorly and got schooled on “Security 101” practices by its own customer, Cloudflare. The criticism that followed for Okta was inevitable and also deserved, as it essentially sells security. But what about engineers who want to build things securely, where do they start?

I figured there’s no better place to find out than by asking a security engineer, so I reached out to Nielet D'Mello. She’s a security engineer at Datadog, whose job is incorporating security into products from the very start of the development process. Nielet has been working in the security domain for nearly a decade, and before that she was at Intel, where she worked closely with the security team, as well as at McAfee, in consumer and enterprise security products. Nielet’s also speaks at security conferences – here’s her 2023 talk on security design and guidance at scale.

In today’s issue, Nielet takes us through:

Myths and misconceptions about security engineering. Common misconceptions, like that security is only security engineers' responsibility, or that security through obscurity is sufficient, and other myths.

History of security engineering. Security engineering’s evolution since the 1990s; especially network and perimeter defense up to today.

The present. A transformation to a proactive approach, and a shift to “decentralized security.”

A mental model. Seven core dimensions for thinking about application security, with a close look at each one.

Towards a Secure SDLC. An approach to make all steps of the software development lifecycle (SDLC,) “security-first.”

In next week’s issue, we round up this topic with tactical advice on how to define a service or system’s criticality, preparing for threat modeling exercises, and an overview of popular security strategies and principles like “defense in depth” and “zero trust.”

As a note, throughout this article we cover application security engineering (aka, “AppSec.”) This is the most common type of security engineering at tech companies building software products. Other specializations within security engineering include cloud security (focusing on cloud infrastructure security,) infrastructure security (securing hardware, operating systems, middleware,) and even physical security (physical access controls and surveillance.) These topics are out of scope for this series.

With that, it’s over to Nielet.

Common security engineering terms

Hi! We use three terms frequently in this article, so let’s start by defining them:

Vulnerability: An exploitable flaw or weakness in a system’s design, implementation or deployment

Threat: The potential for a threat actor to exploit a vulnerability

Risk: Loss or damage that could occur when a threat actualizes

Security engineering vs software engineering

How intertwined are security engineering and software engineering?

When it comes to software engineering, there’s nothing too special about security. Yet, its extensive depth, breadth, and nuance, mean the security domain has long felt intimidating to engineers. But it has existed for as long as software engineering; so, why does security engineering still feel like an “emerging” field? 

It’s due to software engineering’s ever-increasing complexity: distributed systems, microservices, cloud computing, Artificial Intelligence (AI,) and more. Security engineering aims to stay ahead in this dynamic, ever-evolving threat landscape, and businesses are starting to prioritize it more.

Some statistics reveal why investing in security is increasingly important:

$4.45M: global average cost of a single data breach in 2023, a 15% rise over 3 years

16% more application security attack surfaces. In 2023 alone, this meant 29,000 new vulnerabilities were identified, which organizations need to defend against.

A security engineering organization is usually tasked with:

Risk prevention and detection: Aim to defend an organization's assets: its data, applications, code, infrastructure, etc.

Response and recovery: react to threats and remediate attacks.

1. Myths and misconceptions about security engineering

I’ve observed several common misconceptions, and this article seems like a good place to debunk them.

Myth #1: Security is only the responsibility of security engineers

This is surprisingly common, but not exactly true. Security engineers are stewards of the organization's overall security posture, but realistically, they can never keep up with all developments in the product and platform space – just within their organizations! 

Security teams also tend to be lean, meaning there aren’t many engineers. If they focus too much on the weeds; like constantly triaging incidents or security findings, this will take away from high-value work that brings company-wide impact. Examples of high-value work include:

Security design reviews done product-wide

Building and running programs and services for a secure software development lifecycle

Relying solely on a security team to make all security design decisions is a common anti-pattern. Amazon Web Services, in its “AWS Well-Architected” guide, recommends against this practice, and instead suggests:

“Build a program or mechanism that empowers builder teams to make security decisions about the software that they create. Your security team still needs to validate these decisions during a review, but embedding security ownership in builder teams allows for faster, more secure workloads to be built. This mechanism also promotes a culture of ownership that positively impacts the operation of the systems you build. (...)

Common anti-patterns:

Leaving all security design decisions to the security team

Not addressing security requirements early enough in the development process.

Not obtaining feedback from builders and security people on the operation of the program.”

Myth #2: Security through obscurity is sufficient

Security through obscurity is the assumption that safeguarding certain details or functions of a system's operations can guarantee security. The principle is, “if only we know how this thing works, then it will be secure enough because others won’t be able to figure it out.”

This approach leads to a false sense of security! It can also lead to exploits. For example:

You have a web application with an admin panel, and this panel has features like managing users, managing content, and configuring the system. The admin panel has URL endpoints like /admin/user-management, /admin/content-management, /admin/system-configuration. How do you make these endpoints secure? The obvious way is to add authentication. However, this is a lot of effort. A simpler idea is to use obfuscation, remapping URLs to something hard to guess:

Security through obscurity. The newly mapped URLs at the bottom are obscurer, but no more secure than those at the top

In this case, the developer relies on the obscurity of the URLs to prevent unauthorized access. However, all it takes is for the URL endpoints information to leak, or an attacker to brute-force the URLs, and the website can be exploited.

Myth #3: More security measures makes software more secure

It’s tempting to believe, right? Unfortunately, in my experience, it’s simply untrue.

Implementing multiple security measures can enhance the overall security posture of software, but it’s essential to strike a balance between security and usability. For each security measure, carefully consider these things:

Effectiveness

Complexity

Performance impact

Management overhead

Your goal should be that collectively, the security measures provide meaningful protection against threats to the product or platform.

Myth #4: Once secure, always secure

So, your system passed all its security reviews and penetration tests, and you have evidence it is secure. Can you now step away, and assume it will continue to be secure? No!

The threat landscape is constantly changing. Over the past year, there’s been a surge in attacks aimed at businesses and organizations around the world. These attacks intended to damage brands’ reputations, steal sensitive data, seek financial gain, and more. They are often done by ransomware groups, such as BlackCat’s attack on Change Healthcare and Reddit, or mass account hacking through credential-stuffing.

New vulnerabilities and attack vectors emerge regularly. For example, applications built on top of large language models (LLMs) are now susceptible to prompt injection, which is a class of attack against applications built on top of LLMs. They work by concatenating untrusted user input with a trusted prompt constructed by the application’s developer. So, security mechanisms built against existing injection attacks must factor this in, as security measures that used to be effective become obsolete or insufficient against new and advanced threats, rendering software vulnerable.

Things that can introduce vulnerabilities and weaken your overall security posture:

Accumulation of technical debt

Using deprecated components and libraries

Outdated dependencies

Security vulnerabilities in a dependency, framework, library, or service

Zero-day exploits are disclosed vulnerabilities for which no patch is available. These are a special kind of vulnerability, unknown to all consumers of the software. Finding such exploits is very challenging, but organizations with large security teams can do it. For example, Google discovered 93 zero-days in 2023.

Regulatory requirements and industry standards often mandate regular security assessments, audits and updates, to ensure compliance with data protection laws and regulations. Adhering to these requirements may necessitate ongoing security improvements, regardless of the software's initial security status.

Myth #5: Penetration testing by itself ensures security

Penetration testing, aka pen testing, involves simulating real-world security attacks against a system, network, application, or organization's infrastructure. The main goal is to identify and exploit vulnerabilities in a system's security defenses, mimicking the tactics, techniques, and procedures of attacks. Pen testing allows organizations to understand their security posture and to prioritize remediation efforts accordingly.

Downsides of pen testing:

It’s a snapshot of the security posture at a single, specific moment

Costly and labor-intensive

A system deemed secure by a penetration test one day, may become vulnerable the next day to new exploits or changes in the environment. Plus, scoping plays a huge role in the impact of penetration test results. Scoping refers to applications, users, networks, devices, accounts, and other assets that should be tested to achieve the organization's objectives.

When pen tests are incorrectly scoped, broader security issues or systemic weaknesses may be missed, which attackers can exploit. Scoping pen tests correctly means providing enough information for the pen testing team upfront, so they can be productive. Here’s a summary from Jessica La Bouve, Solutions Architect at penetration testing vendor, BishopFox, on the importance of scoping:

“If a criminal has decided to target you, they have infinite time to find your weaknesses. (...) The assessment team has a finite amount of time to identify critical weaknesses. What they’re able to accomplish in that time depends on the amount of information you give them during scoping. (...)

Keeping your pen tester in the dark only makes it harder for them to find weaknesses that you need to fix. Even if an attacker starts from zero, they have plenty of time to conduct reconnaissance and learn a lot about your organization, giving your pen tester a head start means they can get right down to the business of finding the real threats to your systems. Attackers also don’t have any limitations on what they can try. They don’t usually worry about knocking your systems offline, but a pen tester would. To maximize a pen tester’s limited time and balance out the technical limitations placed on them, provide as much information as you can.”

2. History of security engineering

Security engineering teams tend to be lean by design and also by constraints, like the specialized skill sets needed, and budget limitations. This lean approach applies at whatever the scale of a company. 

Security teams are much smaller than product/platform engineering teams, and tend to be “two-pizza teams” of between 5-10 application security engineers. As the security org is small, it focuses on projects and initiatives offering high return on investment in value, risk reduction, and impact terms.

If we look at the evolution of security engineering, there’s been significant shifts over the decades due to technological advancement, changes in threat landscapes, and systems’ increasing interconnectedness. Below are some examples.

The 1990s 

The widespread adoption of the internet led to the development of various secure protocols (SSL, HTTPS,) and measures like firewalls and antivirus software to protect networks and data. The primary focus of security activities was network and perimeter defense, largely due to the dominance of client server architectures. 

The 2000s

Web applications gained popularity and security engineering shifted focus towards securing web applications and the network. As web vulnerabilities like SQL injection, cross-site scripting and buffer overflows became common, so did awareness of and focus on secure coding practices. 

Around the same time, compliance and regulatory frameworks like SOX, HIPAA, and PCI DSS came into effect, and led organizations to boost efforts to comply with security requirements and guidelines.

Early 2010s

Cloud computing created new security challenges, like data privacy, data encryption, secure authentication, access control, and secure infrastructure configurations. The vulnerability landscape evolved in tandem with rapid technological shifts, and security shifted to efforts to automate security testing and remediation.

Mid-2010s, early 2020s

The rise of containerization and microservices architecture, the emerging field of AI and machine learning, and a shift to zero-trust architectures. This means security engineering must deal with increased complexity and more attack vectors.

3. Security engineering today

Read more


Phil Windleys Technometria

Relationships are Entangled

People aren't digitally embodied. As a result their online relationships are anemic. As we think about the kinds of digital identity systems that will help people live authentic online lives, we must build identity system that give people tools to operationalize rich digital relationships.

People aren't digitally embodied. As a result their online relationships are anemic. As we think about the kinds of digital identity systems that will help people live authentic online lives, we must build identity system that give people tools to operationalize rich digital relationships.

Identity is the ability to recognize, remember, and react to people, organizations, systems, and things. In the current web, companies employ many ponderous technological systems to perform those functions. In these systems, we are like ghosts in the machines. We have "accounts" in companies' systems, but no good way to recognize, remember, and react to them or anyone else. We are not digital embodied.

One of great benefits of embodiment is the ability to form and operationalize rich digital relationships. I've written a lot about the nature of digital relationships.

Relationships and Identity

Authentic Digital Relationships

Ephemeral Relationships

Operationalizing Digital Relationships

Relationships in the Self-Sovereign Internet of Things

The Architecture of Identity Systems

Are Transactional Relationships Enough?

Fluid Multi-Pseudonymity

One of the discussions at VRM Day caused me to think about a feature of digital relationships I hadn't considered before. Someone said that if you think about a graph with people (or things, organizations, and so on) as the nodes, the relationships are the edges, like so:

Bi-directional relationship

In this figure Alice and Bob have a bi-directional relationship. This is how I've normally thought about it and how I'd have drawn it. But in today's discussion, someone said that the relationship is shared and that Alice and Bob both control it. But I realized that viewpoint is too simple. Specifically, Alice and Bob each have a different perspective of that relationship and will use it separately.

For example, imagine that Alice is the cashier at a grocery store and Bob is a customer. Alice gives great service, so Bob seeks her out when he shops. Alice on the other hand has no particular recollection of Bob from encounter to encounter. For Alice, the relationship is ephemeral, but for Bob, it's longer term. The nature of each relationship is different. So, we might look at it like this[1]:

Two, uni-directional relationships

But after discussing it some more, I realized that these relationships aren't independent. They're entangled like this:

Entangled relationships

In the example I gave above, as Bob seeks out Alice more and more, Alice might come to recognize him and call him by name, changing the nature of her relationship with Bob. And that may influence the nature of Bob's relationship with Alice. Over time, these interactions influence both relationships. So, while Alice and Bob both have control over their relationship with the other, actions by one influence the other.

I frequently say that we don't build identity systems to manage identities, but rather to manage relationships. The problem with contemporary identity systems is that they are all one sided, controlled by one party—almost always a company. As I've said before, people are not digital embodied and thus have no good way to manage their online relationships. As we strive to build better digital identity systems, I think it's paramount that we build systems that provide people with tools that embody them and provide them with the ability to operationalize their online relationships. These are more than decentralized; they are self-sovereign.

Notes

Peer decentralized identifiers (DIDs) are a great technology for creating bi-directional relationships.

Share

Subscribe now

Monday, 15. April 2024

Just a Theory

Mini Summit: Jonathan Katz on TLEs

The topic of the Postgres Extension Mini-Summit this week is Trusted Language Extensions, a.k.a. TLEs. Jonathan Katz of AWS will be giving a brief presentation on the “TLE Vision and Specifics” at noon Eastern / 16:00 UTC on Wednesday April 17, followed by community discussion.

Coming up this week: the fourth installment of the Postgres extension ecosystem mini-summit. The topic: Trusted Language Extensions, a.k.a. TLEs. Jonathan Katz of AWS will give a brief presentation on “TLE Vision and Specifics” followed by community discussion of the issues TLEs aim to address, what works and what doesn’t, and the role of TLEs in the future of the extension ecosystem.

Join us! Note! that if you reserved a spot at a prior mini-summit, sadly you will need to do so again for each subsequent summit or miss out on reminders from Eventbrite. And if Eventbrite says the event is sold out, rest assured we have plenty more virtual seats! Just send at david@ this domain, ping me on Mastodon or via the #extensions channel on the Postgres Slack or the Postgres Discord.

More about… Postgres TLE Extensions PGConf Summit

Damien Bod

Implement a secure Blazor Web application using OpenID Connect and security headers

This article shows how to implement a secure .NET 8 Blazor Web application using OpenID Connect and security headers with CSP nonces. The NetEscapades.AspNetCore.SecurityHeaders nuget package is used to implement the security headers and OpenIddict is used to implement the OIDC server. Code: https://github.com/damienbod/BlazorWebOidc OpenIddict is used as the identity provider and an OpenID connect

This article shows how to implement a secure .NET 8 Blazor Web application using OpenID Connect and security headers with CSP nonces. The NetEscapades.AspNetCore.SecurityHeaders nuget package is used to implement the security headers and OpenIddict is used to implement the OIDC server.

Code: https://github.com/damienbod/BlazorWebOidc

OpenIddict is used as the identity provider and an OpenID connect client is setup to allow an OpenID Connect confidential code flow PKCE client. The Web application is a server rendered application using Blazor server components implemented using Blazor Web, ASP.NET Core and .NET 8.

Step 1: Init solution from the .NET Blazor samples

The solution was created using the Blazor samples from Microsoft. The .NET 8 BlazorWebAppOidc project was used to setup the solution.

https://github.com/dotnet/blazor-samples/tree/main/8.0/BlazorWebAppOidc

The code sample implements the client profile parts and the CSRF protection. Login and Logout plumbing is also implemented.

Step 2: Switch the OpenID Connect server

OpenIddict is used as the identity provider and so the OIDC client set up needs to be changed. The program file was updated and the OpenID Connect Microsoft Entra ID client was replaced with the OpenIddict client. The client on the server is setup directly in the worker class in the Openiddict server. Both of the setups must match. The client uses an OpenID Connect confidential client with code flow and PKCE.

builder.Services.AddAuthentication(OIDC_SCHEME) .AddOpenIdConnect(OIDC_SCHEME, options => { // From appsettings.json, keyvault, user-secrets // "OpenIDConnectSettings": { // "Authority": "https://localhost:44318", // "ClientId": "oidc-pkce-confidential", // "ClientSecret": "--secret-in-key-vault-user-secrets--" // }, builder.Configuration.GetSection("OpenIDConnectSettings").Bind(options); options.SignInScheme = CookieAuthenticationDefaults.AuthenticationScheme; options.ResponseType = OpenIdConnectResponseType.Code; options.SaveTokens = true; options.GetClaimsFromUserInfoEndpoint = true; options.MapInboundClaims = false; // Remove Microsoft mappings options.TokenValidationParameters = new TokenValidationParameters { NameClaimType = "name" }; }) .AddCookie();

Note: You could also use the OpenIddict client packages to implement the client. I like to use the defaults.

Step 3: Disable WASM mode

Any web application should protect the session, not just implement authentication using an OIDC server. One of the most important browser protection is the CSP header and a good CSP uses a nonce. Blazor Web using WASM does not support this and so this must be disabled. Remove the WASM part from the middleware.

In the program.cs, update Blazor Web to:

builder.Services.AddRazorComponents() .AddInteractiveServerComponents();

and

app.MapRazorComponents<App>() .AddInteractiveServerRenderMode() .AddAdditionalAssemblies( typeof(BlazorWebAppOidc.Client._Imports).Assembly);

Remove the WASM usage in the UI components. Switch to InteractiveServer mode.

<HeadOutlet @rendermode="InteractiveServer" /> </head> <body> <Routes @rendermode="InteractiveServer" /> Step 4: Add CSP nonce middleware

The CSP nonce can be used in Blazor (Server) components with some extra effort because the Blazor components cannot read the HTTP headers from the responses. The CircuitHandler class can be used for this. A BlazorNonceService class can be created to add the nonce. This class inherits the CircuitHandler implementation.

using Microsoft.AspNetCore.Components; using Microsoft.AspNetCore.Components.Server.Circuits; namespace BlazorWebAppOidc.CspServices; /// <summary> /// Original src: https://github.com/javiercn/BlazorWebNonceService /// </summary> public class BlazorNonceService : CircuitHandler, IDisposable { private readonly PersistentComponentState _state; private readonly PersistingComponentStateSubscription _subscription; public BlazorNonceService(PersistentComponentState state) { if (state.TryTakeFromJson("nonce", out string? nonce)) { if (nonce is not null) { Nonce = nonce; } else { throw new InvalidOperationException( "Nonce can't be null when provided"); } } else { _subscription = state.RegisterOnPersisting(PersistNonce); } _state = state; } public string? Nonce { get; set; } private Task PersistNonce() { _state.PersistAsJson("nonce", Nonce); return Task.CompletedTask; } public void SetNonce(string nonce) { ArgumentException.ThrowIfNullOrWhiteSpace(nonce); if (Nonce != null) { throw new InvalidOperationException("Nonce already defined"); } Nonce = nonce; } public void Dispose() => ((IDisposable)_subscription)?.Dispose(); }

A NonceMiddleware ASP.NET Core middleware service can now be used to read the nonce from the headers and set this in the BlazorNonceService CircuitHandler implementation. NetEscapades.AspNetCore.SecurityHeaders is used to implement the security headers and if a CSP nonce is created, the NETESCAPADES_NONCE http header is set.

namespace BlazorWebAppOidc.CspServices; public class NonceMiddleware { private readonly RequestDelegate _next; public NonceMiddleware(RequestDelegate next) { _next = next; } public async Task Invoke(HttpContext context, BlazorNonceService blazorNonceService) { var success = context.Items .TryGetValue("NETESCAPADES_NONCE", out var nonce); if (success && nonce != null) { blazorNonceService.SetNonce(nonce.ToString()!); } await _next.Invoke(context); } }

The middleware for the nonce is added to the ASP.NET Core services.

builder.Services.TryAddEnumerable( ServiceDescriptor.Scoped<CircuitHandler, BlazorNonceService>(sp => sp.GetRequiredService<BlazorNonceService>())); builder.Services.AddScoped<BlazorNonceService>();

Use the middleware is in the ASP.NET Core pipelines.

app.UseMiddleware<NonceMiddleware>(); Step 5: Add HTTP browser security headers

The NetEscapades.AspNetCore.SecurityHeaders nuget package is used to implement the security headers as best possible for this type of application. The SecurityHeadersDefinitions class implements this. CSP nonces are configuration as well as other security headers.

namespace BlazorWebAppOidc; public static class SecurityHeadersDefinitions { public static HeaderPolicyCollection GetHeaderPolicyCollection( bool isDev, string? idpHost) { ArgumentNullException.ThrowIfNull(idpHost); var policy = new HeaderPolicyCollection() .AddFrameOptionsDeny() .AddContentTypeOptionsNoSniff() .AddReferrerPolicyStrictOriginWhenCrossOrigin() .AddCrossOriginOpenerPolicy(builder => builder.SameOrigin()) .AddCrossOriginResourcePolicy(builder => builder.SameOrigin()) .AddCrossOriginEmbedderPolicy(builder => builder.RequireCorp()) .AddContentSecurityPolicy(builder => { builder.AddObjectSrc().None(); builder.AddBlockAllMixedContent(); builder.AddImgSrc().Self().From("data:"); builder.AddFormAction().Self().From(idpHost); builder.AddFontSrc().Self(); builder.AddBaseUri().Self(); builder.AddFrameAncestors().None(); builder.AddStyleSrc() .UnsafeInline() .Self(); // due to Blazor builder.AddScriptSrc() .WithNonce() .UnsafeEval() // due to Blazor WASM .StrictDynamic() .OverHttps() .UnsafeInline(); // only a fallback for older browsers }) .RemoveServerHeader() .AddPermissionsPolicy(builder => { builder.AddAccelerometer().None(); builder.AddAutoplay().None(); builder.AddCamera().None(); builder.AddEncryptedMedia().None(); builder.AddFullscreen().All(); builder.AddGeolocation().None(); builder.AddGyroscope().None(); builder.AddMagnetometer().None(); builder.AddMicrophone().None(); builder.AddMidi().None(); builder.AddPayment().None(); builder.AddPictureInPicture().None(); builder.AddSyncXHR().None(); builder.AddUsb().None(); }); if (!isDev) { // maxage = one year in seconds policy.AddStrictTransportSecurityMaxAgeIncludeSubDomains(); } policy.ApplyDocumentHeadersToAllResponses(); return policy; } }

The security headers are added using middleware as early as possible in the pipeline. I add the headers for all requests.

app.UseSecurityHeaders( SecurityHeadersDefinitions.GetHeaderPolicyCollection( app.Environment.IsDevelopment(), app.Configuration["OpenIDConnectSettings:Authority"]));

The CSP nonce can now be used in the Blazor components and scripts can only be read using the nonce. Unsecure scripts or unsecure inline scripts should never be read anywhere in a browser application.

<pre class="wp-block-syntaxhighlighter-code"> <a href="http://_framework/blazor.web.js">http://_framework/blazor.web.js</a> </body> </html> @code { /// <summary> /// Original src: https://github.com/javiercn/BlazorWebNonceService /// </summary> [CascadingParameter] HttpContext Context { get; set; } = default!; protected override void OnInitialized() { var nonce = GetNonce(); if (nonce != null) { BlazorNonceService.SetNonce(nonce); } } public string? GetNonce() { if (Context.Items.TryGetValue("nonce", out var item) && item is string nonce and not null) { return nonce; } return null; } }</pre>

The applications can be started and the authentication and the session protection can be validated. Using the WASM mode in Blazor Web requires a weaker security setup and you need to disable the CSP nonces. This is not a good idea.

Links

https://github.com/dotnet/blazor-samples/tree/main/8.0/BlazorWebAppOidc

Using a CSP nonce in Blazor Web

https://github.com/damienbod/BlazorServerOidc


@_Nat Zone

NHK「最深日本研究~外国人博士の目~」からメタバースとアイデンティティとプライバシーについて考えてみる

4/13(土) 午前8:10-午前8:40にNHKで「最深日本研究~外国人博士の目~」という番組が放映されました。4/20(土) 午前8:39 までNHK+で視聴可能ですでは触れられているようです。また、日本における「理想」の実体化としての「カワイイ」の考察などもされています。昨…

4/13(土) 午前8:10-午前8:40にNHKで「最深日本研究~外国人博士の目~」という番組が放映されました。4/20(土) 午前8:39 までNHK+で視聴可能です2では触れられているようです。また、日本における「理想」の実体化としての「カワイイ」の考察などもされています。昨今、アイデンティティとプライバシーの国際標準化の分野でもメタバースが関心の対象になってきているところなので、とても参考になりました。

以下に番組概要のメモと、その後にアイデンティティやプライバシーの観点での考察を記載します。

番組概要

番組の概要はこんな感じです。なぐり書きメモなんで整っていませんが、まぁなんとなくは分かるでしょう。NHK+での番組公開終了までは、ぜひ番組をご覧になることをおすすめします。

(出所)NHK+ 「最深日本研究~外国人博士の目~」https://www.nhk.jp/p/ts/RJ5G2XZ4N3/episode/te/J1K96JLJ9K/

導入部 2000年代以降ネット上に広がったもう一つの世界、メタバース。人々はアバターと呼ばれる自らの分身を使い見知らぬ人との出会いを楽しむ。 その中で、日本人コミュニティではバ美肉が目立つ。 スイスの文化人類学者リュドミラ・ブレディキナ氏(31)3はこのことに興味を持つ。 なぜ日本人男性は美少女キャラクターになりたがるのか、その答えを求めて研究 アニメ・和食・J-POP日本文化に注がれる世界からの熱い視線。そして、日本を愛しわれわれには無い目線でより深く日本を知ろうとする外国人研究者 彼らに日本はどう写っているか。バーチャル人類学者がわれわれの知らない日本の真の姿を解き明かす〜最深日本研究 第1章:最新メタバース体験施設を調査 スイスにはこのような施設はない メタバースを体験するためのすべてが揃っている 初心者には案内人もついている。やはり美少女キャラクター 一般視聴者に向けたメタバースの紹介 アバターの選択〜好きなアバターになる〜アイデンティティの表出 バーチャル空間では一瞬で何十万もある世界を移動できる そこで景色を楽しんだり、人々とお酒を飲んで交流したりなど、現実世界と同じように楽しめる メタバースの利用者は世界中でおよそ4億人 花火をましたから見上げるなどここでしかできない体験 第2章:日本のバーチャル世界を研究したい ミラさんがバーチャル世界の研究を始めたのは4年前 研究調査で驚きを持って知った言葉「バ美肉」(バーチャル美少女受肉)男性が美少女キャラを持つときに使われることば 「受肉」には西洋では宗教的な意味があるが、バ美肉たちは宗教に全く関心がない。そこにミラさんは興味を惹かれた。 調査をすると驚くべきことがわかった。男性のおよそ8割が美少女キャラクターを使っている 女性 78.3% 男性 15.2% その他 6.5% その見た目には大きな特徴:アニメに出るような可愛い女の子。これに対して西洋では、クールでセクシー。カワイイとは違って強い女性という印象を受ける。

(出所)NHK「最深日本研究~外国人博士の目~」https://www.nhk.jp/p/ts/RJ5G2XZ4N3/episode/te/J1K96JLJ9K/

ミラさん:わたしは西洋で育ち、女性として美しくエレガントでありたいと思っていた。なぜ日本の男性はカワイイと言われたいのか。西洋人としてこの異なる視点をもっと理解したいと思います。 文化人類学の紹介 20世紀初頭に一つの学問領域として確立。 クロード・レヴィ・ストロース ルース・ベネディクト 梅棹忠夫 机に向かうだけではなく、現地に赴き直接人と触れあるフィールドワークという手法で人間とは何かを探求する学問 人類学者は世界各地の民族を訪ね、彼らとともに暮らし、彼らの生活をつぶさに観察、自分の属する社会と比較することで、その共通点や違い、多様性を明らかにしてきた バーチャル世界という新たな地平 最初SNSを使いバ美肉たちに話しかけたが、部外者であるわたしは信用されなかった。わたしが西洋人であり女性だったから。そこで、ミラさんは人類学の手法であるフィールドワークを使ってバ美肉たちに接近。自分のアバターを作ることは重要だった。バーチャル世界で彼らの信用を得ることができたから。 アバターを纏うことでバ美肉たちと同じ目線に立ち直接ふれあい始めたリュドミラ ときには一緒にライブ配信を行い、オリジナル曲の英訳を買って出た そして1年ほどかけて、徐々にコミュニティの中に入っていった 第3章 声までカワイイってどういうこと? 両声類4の紹介 声は男性のままか音声変換器を使うのが一般的 しかし、あまちじょんこ(Youtuber @johnko_amachi X: @johnko20100711) などは自らの超絶技巧で声を変える。 中の人は、じょんアニキ。 めざすもの:天真爛漫で無邪気で、こういう小さい子っているよねという女の子をしたい。 高い声にするだけだと、男の人の高い声になってしまう。声は楽器と一緒なので楽器の形を変えてあげる。そうやって音質を変えて、その上で音の高さを変える。両方が合わさってじょんこの声になる 2年かけ、理想の可愛い声を獲得した 「別の自分になれるってめちゃくちゃ楽しくて。で、なんかそれで、ただ楽しいということを突き詰めていく。天才じゃないから頑張るしかない。」 カワイイ動きのトレーニング。いざやっていむと難しい。ちいさい子の気持ちになって動かないといけない。たとえば、歩くときに右の足を出すと右に重心がよってしまう。これを繰り返すとふらふら見える。このふらふらした動きが、見ている人からしたら守ってあげたいというので可愛く見える。 ミラさん 歩く時にかわいいを意識したことがなかったのでとてもおもしろいと思った 前は日本の男性といえばサラリーマンかオタクという表層的なステレオタイプなイメージしかなかった。しかし、研究をしてみると、日本人男性の多様性や日本文化の寛容性を知りとても面白いということがわかってきた ミラさんの生い立ちの説明 ロシア生れ。12歳でスイスに移住。フランス語の勉強に日本の漫画。「天使禁猟区」5。今までこのような美しい絵を見たことがなかったので新しい世界が開けた。 27際のときに日本文化を学ぶためにジュネーブ大学の修士課程に。 論文を発表「Babiniku」6 日本の伝統芸能の一つである歌舞伎の女形と同様に男性が美少女キャラクターを演じる類似点 人形浄瑠璃とアバターをあやつるバ美肉の共通性など日本の伝統文化を通してバ美肉を考察した論文は大学から学術賞を受けた 日本の文化はとても興味深く、趣味と実益を兼ねた研究をしたいと考えた ブレンダ・マーフィー教授 「彼女の研究の魅力は西洋で主流ではない”日本文化”に着目している点です」 バーチャル人類学はとても新しい分野であり、彼女はグローバルな文化への挑戦者といえる リュドミラが来日するのはこれで6度目。街に行けば目につくのは美少女キャラクター。看板や飲食店など町のいたるところで見かける。美少女キャラクターは秋葉原だけでなく日本文化で不可欠な存在 アニメや漫画の専門店に立ち寄り、次々に新しいものが生まれる日本のサブカルチャーもミラさんにとって研究の一環 第4章 美少女キャラに会いたい アイドルVTuber のらきゃっと (@NoraCat0415 ch. 登録者86000人以上)。ファンネーム「ねずみさん」 これを生み出したのが「のらねこP」(@PIMR, X: @VR_Produce_Nora)インタビュー のらきゃっとにガチ恋だと言っているファンとの関係についておしえて のらちゃんがちょっとドジなところもあってでも強くて可愛くてというのはファンがそう思っているから作られたという部分が多くて、ネズミさんたちがのらきゃっとという存在にそうあってほしいという願いのもとに作られたのでファンからの影響はとても強い のらきゃっとがファンから可愛いと言われるとどう感じるか 娘が公園で仲の良い友達と遊んでいて幸せそうだと親が感じるのに似ている のらきゃっとが生まれて人生は変わったか? 今までの仕事に行って帰ってきて家で寝るだけという生活でそんなに友達もいないというような人生から、本当に楽しく話せる友達が10倍以上に増えた。誰かと話したいときに話せるので寂しいと思う瞬間がなくなった かわいい美少女になれるって最高ですね「みんななれる〜」 ファンから求められることで美少女キャラはよりかわいくくなり、その美少女を生み出した彼の生活も豊かにしている。今日いちばん重要な発見だった 第5章 美少女になる理由を知りたい バーチャル美少女とファンとの交流はさらに進化。現実世界のファンが集まる店。友人のバ美肉が出演すると聞き訪ねた。「新宿ねむバー」 バーチャル美少女ねむさん(Youtube @nemchan_nel, X @nemchan_nel )=『メタバース進化論』(技術評論社)の著者で、ミラさんの研究の頼もしい協力者 ミラさん:「直接あって美少女やメタバースとか私達が情熱を注いでいることについてもっと話そう」 ねむさん:「楽しそう。たくさん話すのを楽しみにしてるよ」 ミラさん:「ではリアル世界で」 ねむはプライベート時間の殆どをメタバースで過ごし、専門書7を出版するほどこの世界について考え続けてきた人物。 ミラさん:直接美少女キャラクターになる理由を訊いてみたい 匿名で活動したいというねむさんの希望により映像と音声を加工しています→モザイクの大きさはかなり調整したらしい SNSを通じ知り合ったのは4年前。ねむさんとはメタバースユーザの生活実態を一緒に調査したことも。 ミラさんによるねむさんへのインタビュー 初めてねむになったときどう思ったの? すごく恥ずかしかった。 罪悪感〜違う自分になるというのが悪いことをしているような感覚だった でもだんだんいつもの自分では出てこないような言葉が口から出てきた 美少女になると自己表現も変わるということか? 「現実の自分だとカッコつけちゃうのかな」 「そこまで心揺れ動かない結構冷静な人間だと自分のことを思っていたけど、ねむちゃんのときはそういうのから開放されてもっと自由に表現できる」 あなたにとっての美少女ってどういう存在なのか? “カワイイ”という概念を具現化した存在が美少女キャラクターだと思っている 肩書とか現実世界のしがらみを剥ぎ取ると人間の本質ってすごくカワイイ。魂って実はカワイイと思っていて、カワイイはその人の本質を褒めている気がする。 自信になるというか、ありのままでちゃんと意味があるんだと思える気がする なぜ日本人男性が美少女になりたいと思うのか。その答えを求めてフィールドワークを続けてきたミラさんの今考える結論 日本人男性は美少女になりカワイイを獲得することでストレスから開放される。 なぜならカワイイは失敗しても許されるし責任を負わなくていいからです。 私は日本文化から生まれたカワイイに可能性を感じました しかし私は人類学者としてコミュニティを一般化するのではなく、これからも多くの人たちの声に耳を傾けたいと思っています

エンド:若き人類学者が日本のバーチャル世界に分け入っていく

考察 アイデンティティとバ美肉

アイデンティティという言葉は番組の中に出て来ませんが、多くの発言は「アイデンティティ」関連だとわたしの中では整理されました。たとえば、じょんこさんの「別の自分になれるってめちゃくちゃ楽しくて。で、なんかそれで、ただ楽しいということを突き詰めていく。」という発言は、現実社会の仕事だとか交友関係という既存のコンテキストを超えた新たなコンテキストでの自我の確立とその表出(自観するアイデンティティの表出)およびそれがコミュニティ内で他の人に観測されるもののフィードバック(他観)という関係性の享受から幸福感を得ているように見えます。(自観と他観のあたりは、「非技術者のためのデジタルアイデンティティ入門」をご参照ください)

類似のことは、ねむさんの「(最初ねむになったとき)すごく恥ずかしかった。罪悪感〜違う自分になるというのが悪いことをしているような感覚だった。でもだんだんいつもの自分では出てこないような言葉が口から出てきた。」「ねむちゃんのときはそういうのから開放されてもっと自由に表現できる。」という言葉にも出てきます。

一方、のらねこPの「のらちゃんがちょっとドジなところもあってでも強くて可愛くてというのはファンがそう思っているから作られたという部分が多くて、ネズミさんたちがのらきゃっとという存在にそうあってほしいという願いのもとに作られたのでファンからの影響はとても強い。」という発言は、自観と他観のズレの最小化を、表出する属性の調整によって他観の方を変えて行うという「通常」のやり方とは逆で、他観の方に自観をあわせてしまうというアプローチに見えます。根源的自我に他観を合わせに行くのではなく、自我の方を調整してしまうわけですね。ただ、これが本当に自分の自我だとかなり心理的負荷が大きくなってくるのではないかと思われます。心理的負荷を減らすには、自分の中からその人格を外だししてしまって客観化してしまうことが必要であるようにも思えます。

このあたりが、のらねこPの「娘が公園で仲の良い友達と遊んでいて幸せそうだと親が感じるのに似ている」という発言からも表出されているように感じられました。

ここの部分で、ミラさんは「ファンから求められることで美少女キャラはよりかわいくくなり、その美少女を生み出した彼の生活も豊かにしている。今日いちばん重要な発見だった」と述べています。

免罪符としては機能しない美少女キャラクターとプライバシー

次に検討してみたいのが、ミラさんの「日本人男性は美少女になりカワイイを獲得することでストレスから開放される。なぜならカワイイは失敗しても許されるし責任を負わなくていいから。」という発言です。わたしの解釈違いかもしれませんが、カワイイが免罪符的に機能するので、日本人男性はカワイイを獲得したいと言っているように聞こえます。「日本人男性は」としているので、この「カワイイ」の概念とその通用する範囲が日本にまず限定されているのだろうということは想像できます。そのうえで「カワイイ」が免罪符的に機能するのかと言うと、これはちょっと難しいところだと思っています。

「カワイイ」を体現するものとしては、いわゆるアイドルVTuberなどがいると思います。ところが、その免罪符を持っているはずの彼女たちは、時として誹謗中傷されたりコミュニティから叩かれ、「卒業」を余儀なくされたりしているのは、このあたりをちょっと追っている人ならよくご存知のことと思います。もし「カワイイ」が免罪符として機能するならばこういうことは起きないと思うのです。

おそらく理想としての「カワイイ」には人々の期待がつまっていて、その期待を裏切ると叩かれるという構図があるのではないかとも思えます。つまり「カワイイ」が免罪符として機能するわけではない。ある意味、「”カワイイ”という概念を具現化した存在としての美少女キャラクター」は現代の巫女であり、古代と同じく、期待を裏切ったときにはスケープゴートにされる存在だと言えるかもしれません。で、わたしはスケープゴートって作ってはいけないと思うんですよね。

現代の巫女がスケープゴート化してしまう原因の多くは、コンテキスト外の情報の当該コンテキストへの混入です。たとえば、誰かからのチャットが画面に映り込んでしまったとか、他の人の声が混入してしまったとか、自分の顔が反射で映り込んでしまったとかですね。今後は他のワールドでの行動が当該ワールドにレポーティングされてしまうというような形の情報混入も起きてくることでしょう。場合によっては悪意の人が監視用のキャラクターを各所に配置して意図的にこれをやってくることも考えられます。中の人と、その人が運用する異なるアバターの名寄せとかもこの中に入ってきます。

番組の中でねむさんは「匿名で活動したい」としてモザイクをかけて声も変えて登場していますが、これも望まない名寄せであるとか情報の混入を防ぐ取り組みの一環と整理することもできます。

こうしたことをどうやって防ぐかとか取り締まっていくかなどが、メタバースにおけるプライバシーの尊重という意味では重要性を増してくると思われます。メタバースプライバシーの国際標準作成作業が始まっている背景でもあります。

しがらみを削ぎ落とした存在としての「ありのままの自分=美少女キャラクター」

さて、ではなぜ日本人男性はバ美肉化するのか?上記のことを勘案すると、ミラさんのいう「日本人男性は美少女になりカワイイを獲得することでストレスから開放される。なぜならカワイイは失敗しても許されるし責任を負わなくていいから。」というのはちょっと違いそうです。

番組内でねむさんは「”カワイイ”という概念を具現化した存在が美少女キャラクターだと思っている。肩書とか現実世界のしがらみを剥ぎ取ると人間の本質ってすごくカワイイ。魂って実はカワイイと思っていて、カワイイはその人の本質を褒めている気がする。自信になるというか、ありのままでちゃんと意味があるんだと思える気がする。」とおっしゃっておられます。現実世界のしがらみを剥ぎ取った人の本質は「カワイイ」であり、そのありのままの自分の自我「自観」を表出すると、日本文化においては美少女になるということでしょうか。そうかもしれません。なにせ、8割近くのひとがそうあろうとするのですから。

番組では、各被取材者ごとに数時間のインタビューがあったのがそれぞれ5分くらいにまとめられてしまっているようです。削ぎ落とされたものの中に、このあたりに光を当ててくれる発言も多くあったことでしょう。その意味で、アフタートークも準備されているようなので、それに期待したいと思います。

【ガチ勢各位】
実際にはみんな2 ~3時間喋った内容をそれぞれ一般人向けに5分くらいにまとめられてるので、ガチ勢のみんなとしたはぶっちゃけ物足りない面もあると思う。という訳で、アフタートーク企画中です!!! #最深日本研究 https://t.co/db9ToxqQpl

— バーチャル美少女ねむ/Nem4/13 NHK「最深日本研究」出演! (@nemchan_nel) April 14, 2024

(追記)アフタートークの日時が決まったようです。4/20(土)22:00〜。わたしはサンフランシスコからの飛行機の時間と微妙なのですが…。アーカイブは公開されるのだろうか…。

【緊急ライブ告知】4/20 (土) 22:00~
ゴメン全然喋り足りない!!! #最深日本研究 出演者4名が、番組には収まりきらなかった”バ美肉”や活動への想いなど、時間無制限で喋り倒します!

NHK「最深日本研究」非公式アフタートーク【ねむ✕ミラ✕のらねこP✕じょんこ】https://t.co/2zG4EbuVGO pic.twitter.com/0G2R0zHGEo

— バーチャル美少女ねむ/Nem4/13 NHK「最深日本研究」出演! (@nemchan_nel) April 15, 2024

<変更履歴>

ミラさんのことを博士と書いていましたが、まだ博士課程在学中のようですので「ミラさん」の表記に変えました。 アフタートークの日時を追加しました。 サムネがFacebookだとはねられてしまうので差し替えました。 「ブレディキナ」を「プレディキナ」と表記してしまっていたのを修正

Sunday, 14. April 2024

Kent Bull

KERI Update: Major release and Branch Strategy Change

The WebOfTrust community recently released a major update to the Key Event Receipt Infrastructure (KERI) and Authentic Chained Data Containers (ACDC) implementation as a coordinated release across the suite of WebOfTrust repositories. Coordinated Release This resulted in the following release versions: Branch Strategy Change Recent Blog Posts See Nuttawut Kongsuwan’s […]

The WebOfTrust community recently released a major update to the Key Event Receipt Infrastructure (KERI) and Authentic Chained Data Containers (ACDC) implementation as a coordinated release across the suite of WebOfTrust repositories.

Coordinated Release

This resulted in the following release versions:

Repository: KERIpy (used for witnesses, command line-managed decentralized identities, and as the core library for the Python KERY space)
Version: 1.1.12 (PyPi) – Apr 9, 2024 Repository: KERIA (agent server used for individual decentralized identity agents – your data sits here)
Version: 0.2.0.dev0 Pre-Release (PyPi) – Apr 11, 2024 Repository: SignifyPy (edge-signing client implementation – your keys sit here)
Version: 0.1.0 (PyPi) – Feb 13, 2024 Branch Strategy Change “development” branch merged to “main” as the old GitFlow style branching strategy was dropped in favor of trunk-based development (single main branch strategy). See the keripy#726 GitHub discussion for the rationale. This occurred for the following repositories: Python KERIpy KERIA SignifyPy Typescript SignifyTS

Recent Blog Posts

See Nuttawut Kongsuwan’s explanation of how to use KERI in his “The Hitchhiker’s Guide to KERI. Part 3: How do you use KERI?

Saturday, 13. April 2024

Michael Ruminer

Refocusing a Career

Now for something entirely different. I write only occasionally on a topic of interest to me, verifiable credentials. Today it is something entirely different, my career. Most specifically, on how it may be time for a refocus. A movement from software development life-cycle process back toward a new more technical focus, AI. I have, in some ways, refocused my career a couple of times. It was

Now for something entirely different. I write only occasionally on a topic of interest to me, verifiable credentials. Today it is something entirely different, my career. Most specifically, on how it may be time for a refocus. A movement from software development life-cycle process back toward a new more technical focus, AI.

I have, in some ways, refocused my career a couple of times. It wasn’t a full restart or anything; just moving into new directions within software development. Some were more a natural progressions than a refocus. This time it would certainly be a refocus.

What I do and what I have done?

For the last several years I have been a software technical release manager focused on not only getting software out the door but on process improvement. That is not where I started my career. It all started as a system admin, then a programmer, then getting involved in migrations and software develpment for mitigating Y2K, on to a software team lead. Then came the first refocus, I got deeply invested in a new Microsoft product Team Foundation Server. This took me down a fun rabbit hole where I spent some years consulting on implementations and use of the product. I spent a few years of my consulting as a Micrsoft MVP on the product, which was very rewarding. Ultimately, with a focus on version control, automated builds and deploys, and release processes with the product and in general, I made the next refocus. I moved into release management. Decidely less technical, but something I have now enjoyed for a number of years. I find the cross-functional nature rewarding and plenty of opportunity to still practice process improvement.

Why the consideration of a refocus?

I see it as a dying domain. I have found that the role is late to join organizations and more easily released from those same organizations. In my experience, companies find unexpected value in the role and can really enjoy having the “glue” bring higher quality and improved flow to production and beyond, but “enjoy” is too much of an operative word here. When belts are tightened it is one of the first roles to go — seen as a luxury not a requirement. I also see diminishing new opportunities in the marketplace. All this places downward pressure on compensation. Additionally, upward progression from the role is often difficult. So it may be time for a full refocus, again.

Too technical for the role?

It wasn’t too long ago I was told as a final feedback for a failed pursuit of an open release management position:

You are too technical.

Hmmm… I knew what they meant, but I had never had my quals work against me- not that I know of. Most find it to be a boon. It made me think. With all the considerations of why there might need to be a refocus, is it time to shift my momentum?

Where to next?

I really enjoy what I do and would hate to give it up but the writing might be on the wall. So, as I survey the landscape, I say to myself “what might be next”. Where is the best opportunity to delve into new things and make an impact for organizations? Is it the AI domain? Can I go from zero to hero? What does that look like? How long would it take and what path is there? I see a lot of potential in Microsoft’s Azure AI offerings and the ones that will inevtiably come down the line. Plus, I have a long history in the Microsoft stack. Let’s see where the next few years take me. Regardless of anything, I’ll certainly be digging deeper into the practical applications and implementations of the technology all why continuing my enjoyable current role in software life cycle.

Back to a common topic

Now, if I can come up with an AI realm play that ties in verifiable, portable, credentials. What will be bigger in the future than identity and AI? Nothing.

Friday, 12. April 2024

Mike Jones: self-issued

Fully-Specified Algorithms Presentation at 2024 OAuth Security Workshop

I gave a presentation on Fully-Specified Algorithms for JOSE and COSE at the 2024 OAuth Security Workshop in Rome. The slides used to update participants on the progress of the work are available as PowerPoint and PDF. Thanks to the organizers for another great OAuth Security Workshop! And special thanks to the colleagues from Fondazione […]

I gave a presentation on Fully-Specified Algorithms for JOSE and COSE at the 2024 OAuth Security Workshop in Rome. The slides used to update participants on the progress of the work are available as PowerPoint and PDF.

Thanks to the organizers for another great OAuth Security Workshop! And special thanks to the colleagues from Fondazione Bruno Kessler who did a great job with local arrangements in Rome!


COSE “typ” (type) Header Parameter Specification in RFC Editor Queue

I’m pleased to report that the COSE “typ” (type) Header Parameter Specification has been approved by the IESG and is now in the RFC Editor queue. The version approved by the IESG and sent to the RFC Editor is: https://www.ietf.org/archive/id/draft-ietf-cose-typ-header-parameter-05.html It joins CBOR Web Token (CWT) Claims in COSE Headers in the RFC Editor queue. […]

I’m pleased to report that the COSE “typ” (type) Header Parameter Specification has been approved by the IESG and is now in the RFC Editor queue.

The version approved by the IESG and sent to the RFC Editor is:

https://www.ietf.org/archive/id/draft-ietf-cose-typ-header-parameter-05.html

It joins CBOR Web Token (CWT) Claims in COSE Headers in the RFC Editor queue. Because of the reference to this spec by CWT Claims in Headers, they form a cluster, and therefore will become RFCs at the same time.

Thursday, 11. April 2024

The Pragmatic Engineer

The Pulse #89: The end of Hopin

In only 5 years, Hopin went from zero to a $7.7B valuation, and back to zero again. Also: Bending Spoons’ startup acquisition model, hiring upticks at Meta, Netflix and Amazon, and more

The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.

Today, we cover:

Industry pulse. The first mass layoffs at Apple since 1997 (or not?); amateurish URL rewrite at X (formerly Twitter); never-ending job interviews for engineering executives, and more.

The end of Hopin. It took Hopin just two years to become the fastest-ever growing European startup by valuation. Four years later, the company is no more. The final valuable parts of Hopin are being sold, and all staff are expected to be let go. Exclusive details on the StreamYard sale.

Weekend maintenance kicks an Italian bank offline for days. It is now day five that Italian bank Sella has its apps and internetbank down, after a weekend systems update went south. The problem seems to be database-related: “something, something Oracle.”

Adyen, the only major Fintech with zero mass layoffs? All major Fintech startups have let go of some staff over the past two years, except Adyen. Meanwhile, the business has quietly become one of Stripe’s biggest competitors. A close look at this curious phenomenon.

1. Industry pulse First layoffs at Apple since 1997, or not…?

Read more

Wednesday, 10. April 2024

David Kelts on ID

“Ripping IDs” Pt 2: The Unwinnable Cost Problem of Identity Verification

In a story on “Ripping IDs Should Go the Way of CDs”, I wrote about the questionable accuracy, usability, and privacy of “ripping” analog IDs into the digital space — a/k/a Mobile Know Your Customer (mKYC). Now I‘ll analyze the “ $1 cost problem” and you’ll see why important steps in mKYC are being skipped. Analog to Digital. Cost, accuracy, usability, and privacy problems of Mobile ID Verifi
In a story on “Ripping IDs Should Go the Way of CDs”, I wrote about the questionable accuracy, usability, and privacy of “ripping” analog IDs into the digital space — a/k/a Mobile Know Your Customer (mKYC). Now I‘ll analyze the “ $1 cost problem” and you’ll see why important steps in mKYC are being skipped. Analog to Digital. Cost, accuracy, usability, and privacy problems of Mobile ID Verification (image cidaas.com)

Services that require identity verification have become increasingly dependent on “scan the front, scan the back, take a selfie, match” for everything from renting a sidewalk scooter to opening bank accounts to provisioning your mobile driver’s license. The cost of inaccuracy in each situation varies greatly. The practical amount of money to spend within these use cases doesn’t — from 0 to 25 cents. Yet most services cost $1. Startups eat the difference hoping the future cost will go down (sorry, it won’t). Many businesses eliminate steps to keep costs down but still roll out a high-demand feature. Avoid offering mobile services that depend on ID and you fall behind. Do it, and accept the risk of fraudulent inputs.

There are both financial risk mitigation use cases and government identity use cases that are currently utilizing mKYC. I’d like to separate these concepts (read to the end to see why).

Note: I do not use the overloaded term document verification. I specifically split document verification into two distinct steps: document authentication (is it from a trusted issuer) and document validation (is the data on it accurate and the account behind it still valid). (See also definitions NIST 800–63). Both of these would be resolved with digital government-signed identity documents.
Cost Analysis of mKYC
Note: I invite those with more insider knowledge of the costs of IDV to connect with me to correct my educated summary of costs below.

There are at least 10 mKYC issues that negatively impact user experience, accuracy, and privacy. mKYC is a technology where we haven’t yet been able to achieve Privacy by Design Principle #4 of full functionality without zero-sum tradeoffs. It’s still cost vs. accuracy vs. user experience vs. privacy (spoiler alert: cost wins). Doing analog-to-digital, this won’t ever change.

The Document Authentication Cost Problem
Is the physical document real or fake? How much does it cost to accurately know?

The cost problem here is simply explained. In physical documents, the goal of winning the card security feature race against fraudsters was to make it expensive, in dollars and knowledge, to produce fake IDs. So cards have proprietary visual, tool-based (8x magnify, multi-spectral lighting), and forensic (destroy the card to detect) security features built into their structure, printing, and lamination. Digitally created for analog use... At great expense, including admirable IP, layered over time… but the fraudsters still caught up, able to replicate lower-cost visual features (and therefore make them acceptable by every bar bouncer on earth).

This gave rise to the supremacy of tool-based authentication for high-risk use cases — use an 8x loupe magnifier; combine white, UV, and IR lights (multi-spectral) to detect secrets built or printed into the card; purposeful design flaws, raised surfaces… amazing technology! Until you try to scan and authenticate it with a white-light phone camera held by a human hand.

Now We Expect to Go Back from Analog to Digital?

Try to ingest a secure physical card back into the digital realm and you must try to detect all three levels of physical card security features with a hand-held camera designed for action photos and portraits taken at focal lengths much further away than human arm length. There’s no UV or IR light, no optical magnification, very few steady cams, and no foolproof UX coaching for a person who isn’t aware of what’s being detected.

Do you see where this is taking us? Massive cost, diminishing return, user friction, bad experience… dropoff. What do mKYC purchasers do? Turn features off to avoid paying for document authentication and user friction.

The Document Validation Cost Problem

Once you’ve scanned an ID document, extracted the data from the front with OCR, decoded the barcode from the back, and skipped sending it for document authentication (lol), you’re still left with a few questions…

Is the document still valid? Is the identity data accurate? Are the attributes fresh or have they changed? Can I ask the authoritative source to validate them? How much does it cost me to perform that document data validation?

If you want to establish the provenance, accuracy, and freshness of the data, you need to connect to the source that mitigates your risk level.

At the first level of reliability are “credible sources”. For example, data aggregators charge dimes and quarters for the service of validating data you collected against what they algorithmically believe to be accurate. It’s due diligence on your part. Whether you consider that credible depends on your level of risk versus the cost of service.

At the higher level of reliability are “authoritative sources” — the official issuers of identity documents. Who can connect? Mostly government and law enforcement. You can pay a quarter or two per record to validate data in privacy-preserving manners — yes/no on the accuracy of what you have versus looking up someone’s identity data using an identifier.

See the problem? It costs more just to validate than businesses should logically spend on the whole identity problem. And most don’t have access.

Then there’s the “use at your own risk” level of reliability of unofficial sources (not NIST-defined, surprisingly). The problem with using unofficial sources is that all our data was breached. Assume that to be true — zero trust. Check your mailbox for legal breach notices — confirmed. It’s cheaper to buy that data from the dark web than to use either a credible or authoritative source (if available to you). Advantage fraudsters… and the reason fraudsters can pass Knowledge-Based Authentication more easily than our memories. Skip KBA as an alternative (please).

What’s the point? Eh, given the cost, you’ll skip the document validation step. And you skipped document authentication. So you’re left with having snapped an off-angle photo of that ID document so that you can crop the low-resolution face out of it (resplendent with holograms, security lines, watermarking, and reflective lamination) for biometric matching. Better set that False Accept Rate high enough to get a decent Match Score!

Or you can just produce digital documents in the first place. (See #3 below)
Cost Consolidation

From what I see, there are some alternatives for us to start to win-win the zero-sum problem we are facing.

1. Do this Once at High Cost/Accuracy and Reuse it at Lower Cost

If one analog-to-digital event can be run at a high level of identity assurance(IAL2 minimum) by an authoritative entity, and the event can be re-used, then we start to lower the repeated costs to all future entities. Perhaps then the cost-benefit ratio will make sense. Note that I said an “authoritative” entity and not a credible or trusted entity. I like banks. Banks act in trustworthy ways, perform due diligence when required, and have been willing to step to the forefront in many regions. But if you require a “government-issued ID” for your use case, bank KYC isn’t that. Bank KYC is an Identity Verification not a government signed ID.

A reusable identity solution should be a government-issued, privacy-protecting credential with user authentication and selective disclosure. You might look at mDL as this, however, there’s no standardized mechanism for user authentication yet in ISO/IEC 18013–5 or -7. Even without built-in user authentication, ingesting a digital mDL will short-circuit dozens of my list of 10 problems (pun intended), increase accuracy, and lower cost while allowing fewer fraudulent documents through. That’s why NIST wants to solve this first through the NCCoE.

2. Separate Financial Risk Mitigation from Official Identity

Why this hasn’t happened yet confuses me. Yet we still call it “identity fraud” when criminals are using freely available identity data to perform “financial fraud”.

But seriously, scooter rentals don’t need my ID document, they need to know I can or will pay. Keep going and we will realize the ACLU’s dystopian over-identification state (link will download PDF). None of us want that.

The solution Financial Services needs to look at is reusable user authentication against the KYC they or their partners already performed… without sharing identity attributes that won’t mitigate financial risk.

3. Issue Digitally-Signed Originals

Better yet, just skip the analog to digital steps. Issue and accept digital birth certificates. Issue and accept Mobile or digital ID documents. If you’re not yet accepting digital mDLs, start.

People in the (self-labeled) Self-Sovereign or Decentralized Identity spaces have created the nearly ideal architecture for issuing authoritatively signed, lightweight, digital birth certificates. So far, to my knowledge, only Singapore has jumped on this technology and is realizing the cost savings and accuracy. There’s still one privacy hole to close… whoever controls the distributed ledger can potentially see all authentication and validation requests against the ledger if they control the API layer. This is a solvable technical problem with the architectural concepts of blinding, brokerage, or anonymizing API access (rotating relying party identifiers). This is solvable with policy (link will open video) if not law.

4. Put the User/Holder in Control of their Credentials and Data

When you provide the user (aka citizen, resident, digital subject, holder) with their document in digital form, they gain control over how they share it. EU Digital Identity Wallets are making progress. More importantly, the ease of sharing means that the recipient (relying party) doesn’t need to store identity attributes to mitigate their risk. Ask, verify, dispose. This is a concept of Interactive Information Privacy(TM) that I’ll write about more in the future and that I presented at Cloud Identity Summit in 2017.

Don’t forget to put your reusable identity-proofing credential in that wallet.
Call to Action

We want to stop “ripping IDs” just like we stopped ripping CDs when digital content services became ubiquitous. Digital Identity Services are the next frontier, but everybody is trying to solve everything everywhere all at once. I laid out 4 pathways above, each with existing possible solutions and a legitimate business model to make them self-sustaining. Let’s create actionable, cost-effective solutions to our “identity fraud” problems. The architects have done decent jobs… now the product, market, and business sides should capitalize in better ways than we are currently employing.


Just a Theory

Mini Summit Three

A rough transcript of my Devrim Gündüz’s talk, “Overview of {yum,zypp}.postgresql.org, and the challenges around RPMifying extensions”, along with interspersed chat comments.

Terrific presentation and discussion at last week’s Postgres Extension Ecosystem Mini-Summit. In fact, I later learned that some people couldn’t make it because the Eventbrite event page showed it as sold out!

This is a limitation of the free level at Eventbrite, which caps tickets at 25. But this is a fully-remote event and we can host more people than that. We had over 30 last week! So if you’d like to attend one of the remaining three events and Eventbrite shows it as sold out, have a look at the bookmarks for the #extensions channel on the Postgres Slack, or email the username david at this domain and I will send you the link!

Okay, back to business. Last week, Devrim Gündüz kindly gave a super thorough presentation in his capacity as the maintainer of the Postgres community Yum and ZYpp repositories. This topic sparked lots of interesting discussion and commentary, which I detail below. But why read when you can look?

Video PDF Slides

Still reading? Okay then, here we go!

Introduction I opened the meeting and introduced Devrim Gündüz. Presentation

Thank you for the introduction. Going to talk about Yum and ZYpp dot postgresql.org, these are the challenge for us, and the challenge of making RPMs from extensions.

Work for EDB as Postgres expert, Postgres Major contributor responsible for the Postgres Yum repo. If you’re using RPMs, I’m building them. I also contribute to Rocky, SUSE, and Fedora Linux, proud Postgres community member, live in London, and I’m also doing some DJing, so if I get fired I have an extra job to do.

Agenda: Last year at PGConf.eu in Prague, I had 60 slides for 5 minutes, so expect huge number of slides for today. I want to leave some time for discussion discussion and questions.

I’m going to start with how it started back in the 2000s, talk about the contents of the repos, which distributions we t because that’s another challenge um how do we do the extension packaging how to RPM how to build RPMs of an extension and how to update an extension RPM.

Then the horror story um when what happens when there’s a new Postgres beta is out, which is due in the next next month or so for Postgres 17.

Then we are going to speak about the extension maintainers, problem for us, and then the problems about relying on the external repos.

So if you have any questions please ask as soon as you can. I may not be able to follow the chat during the presentation, but I’ll try as much as I can.

So let’s talk about how it started. When I started using Linux in 1996 um and Postgres in 1998, we always had Tom Lane and we had we had Lamar for for who build built RPMs for RedHta Upstream. So they were just building the distribution packages, nothing community. It was only specific to RedHat — not RedHat Enterprise Linux but RedHat 3-4-5 — not the non-enterprise version of RedHat back then, but it was called it Fedora core back then, the first version was released in November 2003, which was another challenge for packaging

One of the problems with the distribution packaging was that it was slightly behind the minor Postgres releases, sometimes major post releases

So that was one single Postgres version available for a given distro, say Postgres 6.4 or something, 7.0, and multiple versions were not available back then, and the minor version was slightly behind.

I started building packages for my laptop because I wanted to use Postgres and not all versions wer available back then. So I started building packages for my laptop and my server. They were based on the packaging of Tom Lane and Lamar.

Then I uploaded them to my personal server and emailed the PG mailing lists lists and said, “I’m running on own packages, use at your own risk. This is just a rebuild of the upstream packages on the RedHat version that I’m using or the Fedora version that I’m using. Up to you! This is a personal project, nothing serious.”

So then then people started downloading them, and using them. There was no signature, nothing official back then. Then Lamar said he didn’t have enough time for the RPMs. He sent an email to the mailing lists and I said Devrim is stepping up to the plate, and I did it. So that was I think in 2004, about which is about 20 years ago.

So 19 years ago we had the first domain, postgresql.rpm.org, and then we had more packages. In 2007 we had the first repository RPM and then we had yum.postgresql.org. This means that, starting 2007, this began to be the official RPM repo of the community, which which was a good thing because we could control everything under the Community.

I was living in Canada back then. We had the first RPM build farm — instead of using my laptop and my VMs — we had the actual machine back then.

In 2010 we had what was then called multiple postmaster support, which means the parallel installation of the Postgres RPMs. That was a that was a revolution because even still, the current Fedora or RedHat RPMs cannot be installed in parallel. So if you want to install post 13, 14, 15, 16, and 17 or whatever, it is available in the community repo. This is a great feature because you may want to test or do something else. This is how we how we started, 14 years ago we had this feature in the community repo.

Then next year we moved the repo to the community servers and unified the spec files. Our builds are extremely simple — like you can start your own builds in less than 10 minutes: five minutes for pulling the git repo and then a few minutes for for a package. Extremely simple builds, and now we are in 2024

Let’s talk about the contents, because we have four different contents for the repo. The first one is the “common” repo. We call it “common” in the repository config file. It has the RPMs that work across all Postgres major releases. Let’s say pg_badger, or the ODBC driver, JDBC driver, or GDAL for PostGIS. There’s lots of stuff: Python packages, which are not extensions but we like Patroni, which actually works for all Postgres releases.

This was an effort to get rid of duplicates in the Postgres repo. I think we shaved lots of gigabytes back then, and we still save a lot

Then, to address one of the topics of today’s talks, we have the “non-common” RPMs. (These are the name of the directories, by the way.) They are the RPMs which are Postgres version-specific, basically they are extensions. Say plpgsql_check 15 or whatever. Lots of extensions are there.

Then we have extras. They are not actually Postgres packages, they shouldn’t be included in our repo by default, but many people actually look for these packages because they they want to use Patroni and they don’t have the supporting RPMs or supporting RTM RPMs, or they’re not up-to-date.

I’m building a console, console-template, ETCD, HAProxy keepalived and vip-manager. They are all open source, some of them are Go packages — which, actually, I don’t build them, I just distribute the precompiled binaries via the repo. So that makes easier for people to deploy the packages.

And then we have the “non-free” repo. These are the packages that depend on closed-source software like Oracle libraries, or that have license restrictions. For example, ora2pg depends on Perl DBD::Oracle, oracle_fdw depends on Oracle, pg_storm depends on Cuda Nvidia stuff, timescaledb-tsl actually is Timescale DB with the TSL license, informix_fdw and db2_ftw.

So we have some non-free packages which actually depend on non-free stuff as well. All of them are well-maintained: I’m trying to keep everything up to date — like real up-to-date! That brings some problems but we will get there.

We support RedHat Enterprise Linux and of course Rocky Linux. This year we started supporting Alma Linux as well. Of course they are more or less identical, but we test them, install, and support to verify the packages in these three distributions.

We have x86_64 aarchm64, ppc64le, and RedHat 9, 8, and 7. We have also RedHat 6 for Postgres 12, but it’s going to be killed by the end of this year. We have Fedora, but only two major releases, which matches the Fedora lifecycle. And SUSE — my pain in the neck — um I’ll get there.

Since you all are here for extension packaging, let’s get there: what happens for extension packaging.

First of all, we have the first extension, which is the in-core extensions. They are delivered with the contrib sub-package, which matches the directory name in The Postgres tarball. There are separate packages for each major version, so postgres15-contrib, postgres13-contrib, etc. These are the installation directories for each extension. We are going to use those directories for the other [non-cre] extensions as well.

When we add a new extension to the repo, it’s going to use these directories if they have a binary or if they an extension config file, if the library or the mem files — all are going are all installed under these directories. This magic is done by PGXS, which has been there forever. We just provide initial stuff and then the rest is done by the the PGXS magic. This is the base for a lot of the core extensions.

So what happens when we do non-common and non-free package? First of all, they are distributed separately for each Postgres major version. Let’s go back to the one of the extensions, say plpgsql_check. We have a separate package for Postgres 14, a separate package for Postgres 15, 16, 13, and 12. If they build against all the supported Postgres versions, we have separate packages for each.

Of course from time to time — as far as I remember Timescale DB does this —- Timescale DB only supports Postgres 15 and 16 nowadays. So we drop support for the older versions in the Git repo. But they are separate packages; they are all installed in these directories along with the main contrib package.

This is the naming convention that I use: $extensionName_PGMajorVersion. Four or six years ago, some of the packages didn’t have an underscore before the PG major version. It was a problem, so someone complained in the hackers mailing list, and then I made the changes.

Currently all the previous and all the new packages have this naming convention except a few packages like PostGIS — because in PostGIS we have multiple versions of the same extension. let’s say we have PostGIS 3.3, 3.2, 3.4, 3.1, 3.0. We have combinations of each — I think we have PostGIS 3.0 in some of the distributions but mostly PostGIS 3.1, 3.2, 3.3, and 3.4, and then all the Postgres versions they support — A lot of builds! So there are some exceptions where we have the extension name and extension major version before the Postgres major version.

Jeremy S in chat: question: curious if you have thoughts about relocatable rpm packages. ever tried? just best to avoid?

I have a question from Jeremy. This is a very basic question to answer. This is actually forbidden by the packaging rules. The RPM packaging rules forbid you to distribute or create relocatable RPM packages. We we stick to the packaging guidelines, so this this cannot be done.

Jeremy S in chat: Thanks! (Didn’t realize packaging guidelines forbid this)

Let’s talk about how we build extensions. Often our develop package is enough: many of our packages just just rely on on Postgres itself. But of course packages like PostGIS may depend on some other packages, like GDAL, which requires lots of lots of extra dependencies as well. The most problematic one is the GIS Stack: they need EPEL on RHEL and RHEL and its derivatives.

There there has been a discussion in the past about whether should require EPEL by default. The answer is “no,” because not all not all of our users are installing, for example, the GIS stack or other packages. Most of our users — not the majority of our users —- rely on the um rely on just our repo.

On the other hand, in order to provide completeness for our users, I added lots of python packages in the past to support Patroni — because the upstream packages (I’m sorry not maybe upstream packages) were not enough. The version wasn’t enough or maybe too low. So From some time to time I add non Postgres-related packages to the repo just to support the Postgres package. In the past it was PGAdmin, but now it’s not in our repo so it’s not a problem: their upstream is building their own RPMs, which is a good thing for us. We are building extra packages mostly for Patroni.

However, this is a potential problem for some enterprise users because large companies don’t even want to use the EPEL repo because they feel like it’s like it’s not a community repo, but a community repo controlled by Fedora and RedHat. That’s why from time to time I try to add some of the packages to our repo.

If it’s a problem for enterprise users, does it mean we should we maintain tons of extra packages in the EPEL repo for the GIS stack? The answer is “no, definitely no”. Not just because of the human power that we need to maintain those those packages — I mean rebuilding them is easy: I just get the source RPM, commit the spec files into our repo, and rebuild them. But maintaining them is something else.

We will have a similar problem when we release our own ICU package in the next few years. Because, now that we have the in core collation — but just for C Locale —- and people are trying to get rid of glibc, maybe we should have an ICU package. But maintaining an ICU Library across a single Postgres major version is a real big challenge that I don’t know how to solve for now, at least.

And then SLES — my pain in the neck — the GIS stack requires lots of repos on SLES 15. They are they are well documented on on our website.

Fedora is safe because Fedora is Fedora, everything is there, it’s easy to get a package there.

Jeremy S in chat: “Building them is easy. Maintaining them is something else.”

Yeah that’s that’s the problem, Jeremy.

So how do you RPMify an extension?

The first thing is to know that the extension exists. This is one of the big problems between developers and users. The developer creates a useful extension and then they don’t create a tarball, they don’t release anything. They expect people to install Git on their production databases and git pull, install make, gcc, all the development libraries and build a binary, blah, blah blah.

I’m sorry that’ss not going to work. It also doesn’t work for pip — pip is not a package manager, it just destroys your server. It downloads things to random places and then everything’s gone. That’s why I added lots of Python packages to support Patroni, because most of the users use the packaging package manager to install Postgres and other packages to their servers. It’s the same for Debian, Ubuntu, for RPMs, for Windows, for macOS.

So first of all we know have to know that the extension exists and we have to have a tallbal. If I see an extension that seems good enough I’ll get there. PGXN is a good place, because when I go to pgxn.org a few times per day and see if there is a new version of an extension or if there’s new extension, it’s a good piece. But there’s a problem: we have hundreds of extensions — maybe thousands — but not all of them are on PGXN. They should be!

David Wheeler (he/him) in chat: You should follow @pgxn@botsin.space for new releases :-)

nils in chat: pg_xz incoming

I don’t know how to solve this problem, but we should expect every extension to announce their extensions on PGXN. I’m not just talking about installing everything through PGXN, but at least have an entry that there’s a new extension, this is repo, the website, the readme and the is a tarball. It doesn’t have to be on PGXN, as long as we have something.

And then I check the version. If there is an extension that will kill your database and the version is 0.001, that’s not going to be added to the repo, because we don’t want to distribute an experimental feature.

David Wheeler (he/him) in chat: LOL, all my extensions start at 0.0.1

David G. Johnston in chat: If it isn’t on PGXN it doesn’t exist.

Another problem is that lots of people write extensions but some of them are just garbage. I’m sorry but that’s the truth. I mean they just release a version and then do nothing.

David Wheeler (he/him) in chat: LOLOLOL

From the chat, “pgxn_xz is coming”: that’s right! We have blackhole_fdw, which was written by Andrew Dunstan. When you create blackhole_fdw, it throws all of your data into black hole, so everything is gone.

Yeah, “if it’s not on PGXN it doesn’t exist,” that’s what I hope we achieve achieve in the next year or so.

Jimmy Angelakos in chat, replying to “If it isn’t on PGXN …”: I would correct that to “If it isn’t on PGXN it isn’t maintained.” Sometimes even ON PGXN…

Yeah Jimmy, that’s one of the big problems that we have: maintenance.

We create the spec file, just copy one of the existing ones and start editing. It’s easy but sometimes we have to add patches. We used to carry patches for each Postgres major version to change the Makefiles for the specific Postgres major version. But I realized that it was [not a great pattern]. Now we just export the path, which fixes the problem.

David G. Johnston in chat: As a policy though, someone who cares and wants responsibility needs to apply to be part of the RPM community.

Then I initiate a scratch build for any missing requirements. If there are any missing build requirements it fails to build. I only do it on Fedora latest, not for every package because it doesn’t always help because some distros may not have the missing dependency

Alvaro Hernandez in chat: Hi everybody!

David G. Johnston in chat: Delegated to PGXN for that directory.

Let’s say we rely on some really good feature that comes with a latest version of something, but that latest version may not appear in RedHat 7 or 8. So some dist dros may have it, but the version may be lower than required. Or some distros may have the dependency under different name. Now in the spec file we have “if SUSE then this” and “if RedHat then this” “if Fedora then”, “if RedHat nine then this”, etc. That’s okay, it’s expected. As long as we have the package, I don’t care.

Then I push it to the Git repo, which I use not just for the spec files and patches, but also for carrying the spec files and pitches to the build instances.

Jorge in chat: How to handle extension versioning properly? I mean, in the control file the version could be anything, like for ex. citus default_version = ‘12.2-1’ where the “published version” is v12.1.2, then the “default version” could remain there forever.

Also have seen in the wild extensions that the control file have a version 0.1 (forever) and the “released” version is 1.5

If something fails I go back to the drawing board. GCC may fail (gcc 14 has been released on Fedora 40 and is causing lots of issues for for packaging nowadays), it could be cmake — too recent or too old. It could be LLVM — LLVM18 is a problem for Postgres nowadays. I either try to fix it ping upstream. I often ping upstream because the issue must be fixed anyway

If everything is okay, just push the packages to the repo.

Ruohang Feng (Vonng) in chat: question: how about adding some good extensions written in Rust/pgrx to the repo? like pgml, pg_bm25, pg_analytics, pg_graphql….

One issues is that there is no proper announcement. Maybe I have an awesome extension available in the Postgres repo that people crave and, we build the extensions, it took a lot of time (thank you Jimmy, he helped me a lot) and then I didn’t actually announce it that much. On the other hand, people just can use PG stat base [?] to install and start using it in a few seconds. This is something that we should improve.

Steven Miller in chat: How to handle system dependencies like libc versions being updated on the target system? Do extensions need to be complied against exactly a specific libc version?

From Nevzat in chat: how can we make sure bugfix or extension is safe before installing it

vrmiguel in chat: Interesting approach to identify build/runtime requirements

Over at Tembo we have a project called trunk-packager which attempts to automatically create .deb packages given the extension’s shared object.

We try to identify the dynamic libs the extension requires by parsing its ELF and then trying to map the required .so to the Debian package that provides it, saving this info in the .deb’s control file

From the chat: How to handle extension versions properly? That’s a good thing but, extension version and the release version don’t have to match. Thr extension version isn’t the same thing as the release version. It’s the version of the SQL file or the functions or the tables, the views, sort procedures, or whatever. If it’s 0.1 it means it’s 0.1 it means nothing nothing has changed in this specific regarding the control file. They they may bump up the package version because they may add new features, but if they don’t add new features to the SQL file, then they don’t update the extensions. I hope that answers your question George

I have another question from Ruohang. Yaaaaay! I was afraid that someone would ask that one. We have no extensions written in Rust in repo so far. It’s not like Go; there is a ban against Go because we don’t want to download the world, all the internet just to build an extension. If I recall correctly they’re rewriting pg_anonymizer in Rust. They will let me know when they release it or they’re ready to release it, and then I’ll build it. It’s not something I don’t like, it just hasn’t happened.

Keith Fiske in chat: I still update the control file with my extensions even if it’s only a library change. Makes it easier to know what version is installed from within the database, not just looking at the package version (which may not be accessible)

Ruohang Feng (Vonng) inchat: question: How to handle RPM and extension name conflicts, e.g., Hydra’s columnar and Citus’s columnar.

*David Wheeler (he/him) in chat, replying to “I still update the c…” Yeah I’ve been shifting to this pattern, it’s too confusing otherwise

If you think there are good extensions like these, just create a ticket on redmine.postgresql.org. I’m happy to take a look as long as I know them. That’s one of the problems: I have never heard about pg_analytics or pgml, because they’re not on PGXN. Or maybe they are. This is something that we should improve in the next few months.

Jimmy Angelakos in chat: Go is a pretty terrible ecosystem. It has its own package manager in the language, so it’s kinda incompatible with distro packages

Jason Petersen in chat: (but that means a build is safe within a single release, it doesn’t mean you can move a built extension from one Fedora version to another, right?)

David Wheeler (he/him) in chat, replying to “How to handle system…”: Libc is stable in each major version of the OS, and there are separate RPMs for each.

Another question from Steven: how to handle system dependencies like libc version updates. The answer is no. It’s mostly because they don’t update the libc major version across the across across the lifetime of the of the release. So we don’t need to rebuild the extension against libc.

Steven Miller in chat, replying to “How to handle system…”: Ok I see, TY

Jason Petersen in chat, replying to “How to handle system…”: Is that how we deploy in trunk, though?

David Wheeler (he/him) in chat, replying to “Go is a pretty terri…”: Huh? You can build binaries in a sandbox and then you just need the binary in your package.

[Addressing Nevzat’s question]: That’s a great question. It’s up to you! It’s no different than installing Postges or any other thing. I just build RPMs. If you’re reading the hackers mailing list nowadays, people rely on me an Christoph and others, so that we don’t inject any code into the RPMs or Debian packages. You just need to trust us not to add extra code to the packages. But if there’s a feature problem or any bug then you should complain upstream, not to us. so you should just test.

Jimmy Angelakos in chat, replying to “Go is a pretty terri…”: Find me one person that does this.

Steven Miller in chat, replying to “How to handle system…”: We don’t have OS version as one of the dimensions of version packaging but should

[Addressing vrmiguel’s comment]: Yeah, that could be done but like I don’t like complex things, that’s why I’m an RPM packager.

Jason Petersen in chat, replying to “Go is a pretty terri…”: (doesn’t go statically link binaries, or did they drop that philosophy?)

vrmiguel in chat: I think citus has changed it to citus_columnar

David Wheeler (he/him) in chat, replying to “Go is a pretty terri…”: Hugo: https://github.com/hugomods/docker/tree/main/docker/hugo

David Wheeler (he/him) in chat, replying to “Go is a pretty terri…”: Jason: Static except for libc, yes

Another question from Ruohang: uh how to handle RPM and extension name conflicts. I think Citus came first, so you should complain to Hydra and ask them to change the name. They shouldn’t be identical. We have something similar with Pgpool: they they are conflicting with the PCP Library ,which has been in the Linux for the last 25 years. I think Pgpool has to change their name.

Jeremy S in chat, replying to “I still update the c…”: So you think people will run the “alter extension upgrade” eh?

[Addressing Keith Fiske’s comment]: I’m not saying I don’t agree with you, but it means every time I have to update my extension version in my running database — it’s some extra work but that’s okay. It’s the user problem, not my problem.

Question from Jason [on moving an extension from one Fedora to another]: Right, it may not be safe because the GCC version may be different and other stuff may be different. One distro to another is not safe, Jason; sorry about that.

[Back to Steven’s question]: Yes, David’s answer is right.

[Addressing vrmiguel’s comment about citus_columnar]: You are right.

Jimmy I’m not going to read your comment about go because I don’t think think you can swear enough here.

vrmiguel in chat, replying to “Go is a pretty terri…”: Are there known Postgres extensions written in Go? Not sure how Go is relevant here

Jason Petersen in chat: you said “gcc” and not “libc” there, are you implying that things like compiler versions and flags need to be identical between postgres and extensions

Keith Fiske in chat, replying to “I still update the c…”: I think they should …

David Wheeler (he/him) in chat, replying to “Go is a pretty terri…”: Were some experiments a few years ago. https://pkg.go.dev/github.com/microo8/plgo

Let me continue now. First you have to know the extension exists, and then the you also need to know that the extension has an update. Unfortunately the same problem: the extension exists or has an update and they just don’t let us know.

Jimmy Angelakos in chat, replying to “Go is a pretty terri…”: @vrmiguel now you know why :D

This is a big challenge Fedora has in house solution.When you add a new package to Fedora, I think they crawl their repo once a day and if there’s new release they create a ticket in their bug tracker automatically, so that the maintainer knows there’s a new version. This can be done, but would need a volunteer to do it. Orr maybe the easiest thing is just add everything to the to PGXN,

When we update an extension we, have to make sure it doesn’t break anything. It requires some testing. As I said earlier, building is one thing, maintaining the extension is a bigger thing. If you want to raise a baby, you are responsible until until the end of your life. Consider this like your baby: either just let us know if you can’t maintain an extension anymore or please respond to the tickets that I open.

Steven Miller in chat: One other detail about compatibility dimensions. We have noticed some extensions can be complied with chip-specific instructions like AVX512, for example vector does this which optimizes performance in some cases

Alvaro Hernandez in chat, replying to “you said “gcc” and n…”: I’d reverse the question: do we have strong guarantees that there are no risks if versions and/or flags may differ?

I believe extensions are already risky in several ways, and we should diminish any other risks, like packaging ones.

So I’d say absolutely yes, compile extensions and Postgres in exactly the same way, versions and environments.

Sometimes a new minor version of an extension breaks a previous Postgres release. For example, an extension drops support for Postgres 12 even though Postgres 12 is still supported. Or they didn’t do the upgrade path work. I have to make sure everything is safe.

nils in chat, rReplying to “I think citus has ch…”: It was never changed, the extension has always either been embedded in Citus or later moved to a separate extension called citus_columner.

I think the name conflict comes from the access method being called columnar, which Citus claimed first. (Hydra’s started actually as a fork from Citus’ codebase).

(disclaimer; I work on Citus and its ecosystem)

Jason Petersen in chat, replying to “I think citus has ch…”: hi nils

Next month a new beta comes out. Everyone is happy, let’s start testing new features. For the packagers that means it’s time to start building extensions against beta-1. So a build might fail, we fix it, and then it may fail against beta-2. I understand if extension authors may want to wait until rc-1. That’s acceptable as long as they let us know. Many of them fail, and then Christoph and I create tickets against them and display them on wiki.postgresql.org. It’s a Hall of Shame!

Eric in chat: When you decide to package a new extension do you coordinate with upstream to make that decision?

David Wheeler (he/him) in chat, replying to “When you decide to p…”: I learned I had extensions in the yum repo only after the fact

Eric in chat, replying to “When you decide to p…”: I see

vrmiguel in chat: @Devrim Gündüz I’m curious about how RPM deals with extensions that depend on other Pg extensions

David Wheeler (he/him) in chat: You can test Postgres 17 in the pgxn-tools docker image today. Example: https://github.com/theory/pgtap/actions/runs/8502825052

This list pisses off the extension authors because they don’t respond to ticket. So what do we do next? It happens again and again and again, because they just don’t respond to us. On Monday uh I got a response from an extension maintainer. He said “you are talking like you are my boss!” I said, “I’m talking like I’m your user, I’m sorry. I just asked for a very specific thing.”

nils in chat: I love Devrim’s issues against our repo’s! They are like clockwork, every year 😄

David Wheeler (he/him) in chat, replying to “You can test Postgre…”: It relies on the community apt repo

Eric in chat, replying to “When you decide to p…”: Related: ever had upstream request you stop packaging an extension?

Steven Miller* in chat, replying to “One other detail abo…”: Even if compiled inside a container, on a specific chip it can get chip-specific instructions inside the binary. For example building vector on linux/amd64 inside docker on a chip with AVX512, the container image will not work on another linux/amd64 system that does not have AVX512

David Wheeler (he/him) in chat: :boss:

Unresponsive maintainers are a challenge: they don’t respond to tickets, or emails, or they don’t update the extensions for recent Postgres versions.

Don’t get me wrong even the big companies also do this, or they don’t update the extensions for the new GCC versions. I don’t expect them to test everything against all all the GCC versions; that’s that’s my problem. But just respond please.

What’s the responsibility of the packager in this case? Should we fork if they don’t respond at all? No we are not forking it! VBut going to conferences helps, because if the extension author is there I can talk to them in person in a quiet place, in a good way, just “please update the package tomorrow or you’re going to die”. Of course not this but you see what I mean.

[Looking at chat]: I’m going to skip any word about containers; sorry about that.

[Addressing Eric’s question]: That’s a good so so the question! No, actually they support us a lot, because that’s the way that people use their extensions. And do we coordinate with upstream? No, I coordinate with myself and try to build it. Of course upstream just can just create a ticket, send me email, or find me at a conference. They can say, “hey, we have an extension, could you package an RPM?” Sure, why not." I don’t coordinate with Upstream as long as uh there is no problem with the builds.

Eric in chat, replying to “When you decide to p…”: So you haven’t run into a situation where upstream did not want/appreciate you packaging for them?

[Respondinding to nils’s comment]: Thank you, thanks for responding!

[Responding to vrmiguel’s question about depending on other extensions]: We actually add dependency to that one. That’s bit of uh work, like PG rotting depends on PostGIS. In order to provide a seamless installation the PostGIS package, in the PostGIS spec file, I add an extra line that says it provides PostGiS without the version as part of the name. Then when we install pg rotting, it looks for any PostGIS package — which is fine because it can run against any PostGIS version. So I add the dependency to other extensions if we need them.

David G. Johnston in chat: The tooling ideally would report, say to PGXN or whatever the directory location for the initial application is, the issues and remind them that if the build system cannot build their extension it will not be included in the final RPM. You are an unpaid service provider for them and if they don’t meet their obligations their don’t get the benefit of the service.

[Responding to Eric’s upstream follow-up question]: I haven’t seen anything in any upstream where a person didn’t want me to package. But I haven’t seen many appreciations, either; I mean they don’t appreciate you. I’m being appreciated by EDB — money, money, money, must be funny — thanks EDB! But I haven’t had any rejections so far. Good question!

Eric in chat, replying to “When you decide to p…”: Fair. Cool. Thanks

Relying on external repos is a big problem for SUSE. Some of the maintainers just discontinue their repo. One problem with SUSE is they don’t have an EPEL-like repo. EPEL is a great thing. The barrier to add a package to EPEL is not low but not high, either. If you if you’re an advanced packager you can add a package quick enough. Of course it requires review from others. But this a big problem for SUSE.

Lack of maintenance is a problem. We have a repo but they don’t update it; so I have to go find another repo from build.opensuse.org, change it, update the website, change our build instance, etc. That’s a big problem.

David Wheeler (he/him) in chat, replying to “The tooling ideally …”: I want to add build and test success/fail matrices to extension pages on PGXN

Florents Tselai in chat: How do you handle Pl/Python-based extensions + pip dependencies? Especially with virtualenv-based installations. i.e. Pl/Python usually relies on a /usr/bin/python3, but people shouldn’t install dependencies there.

And then there’s costs! What’s the cost of RPMifying an extension? Hosting a build server? We have a very beefy bare metal build server hosted by Enterprise DB, just because I’m working for them and they have a spare machine. Hosting a build server is a cost.

I have to use some external resources for architecture reasons, like some of our build instances, like PPC 64 ,is hosted somewhere else. There are some admin tasks to keep everything and running, like EDB’s IT team actually helped me to fix an issue today in both of our PPC instances.

Jason Petersen in chat, replying to “How do you handle Pl…”: I think early on he said he makes the extensions rely on RPMs that provide those Python dependencies

David Wheeler (he/him) in chat, replying to “How do you handle Pl…”: I have used this pattern for RPMifying Perl tools

Then, maintaining build instances requires keeping them up-to-date, and also that each update doesn’t break anything. It’s not like “dnf update and build a package”. No. It may be a problem with Fedora because Fedora may can update anything any time they want. But it’s a less problem for SUSE and RedHat, but we have to take care that the updates don’t break anything.

Redhat, the company, actually follows our release schedule. We release every three months. Unless something bad happens, we know the next release is in May, on a Thursday. So every Wednesday, one day before our minor release, RedHat releases their new maintenance releases. RedHat is going to release 9.4 on Wednesday before our minor release. What does that mean for us as an RPM packager for RedHat?

*RedHat releases a new version with a new LLVM, for example, and then it means we have to rebuild the packages against the new LLVM so that people can use it. That means I have to work until Thursday morning to build the packages. That’s fine but another problem is for Rocky and Alma Linux users, because they’re are not going to have the updated LLVM package, or any any updated package, like GCC. It’s not like the old RedHat days; they change everything uh in minor versions.

So I have to rebuild GCC and LLVM on our instances, add them to our special repo “sysupdates”, which is in the config file, and this takes many hours because building GCC and LLVM is a big thing.

In the last two years I have not been able to build the from GCC Source RPM. I had to edit everything and not edit the spec files blah blah to be able to build it. I have no idea how how they can break in Source RPM.

So that’s another cost: in May I’m going to spend lots of cycles to keep up with the latest RedHat release, and also make the make the Rocky Linux and Alma Linux users happier. Maintaining build systems is not as easy as running Yup or Zypper update. It requires employing the packager — because I have the bills pay I have the beers to drink.

[Addressing Florents’s PL/Python question]: I don’t know what the PL/Python based extensions are, but I tried to get rid of everything related to pip. I’m not a developer, a DBA isn’t a developer, a Sysadmin isn’t a developer. They’re not suposed to use pip; they are supposed to use the package manager to keep up with everything. My point is if someone needs pip then I should fix it. That’s what I did for Patroni. I added lots of packages to our Git repo just to be able to support Patroni.

Ian Stanton in chat: Need to drop, thank you Devrim!

Jeremy S in chat, replying to “How do you handle Pl…”: A lot of larger companies have inventory management and risk control processes that heavily leverage package management

Alvaro Hernandez in chat: Need to go, ttyl!

vrmiguel in chat, replying to “you said “gcc” and n…”: Do you think there are no guarantees at all? For instance, Postgres loads up the extension with dlopen, which could fail with version mismatch. If that doesn’t occur and the extension loads ‘fine’, how likely do you think an issue could be?

Also I’m curious how often you’ve seen problems arise from libc itself (rather than any of the many things that could cause UB in a C program) and how these problems have manifested

Ahmet Melih Başbuğ in chat: Thank you

Conclusion

I thanked Devrim and all the discussion, and pitched the next mini-summit, where I think Jonathan Katz will talk about the TLE vision and specifics.

Thank you all for coming!

More about… Postgres Devrim Gündüz PGXN Extensions PGConf Summit Yum ZYpp

Webistemology - John Wunderlich

Eclipse Photos

xkcd nails it again.

xkcd nails it again.

Tuesday, 09. April 2024

The Pragmatic Engineer

What Changed in 50 Years of Computing: Part 2

How has the classic book on software engineering, ‘The Mythical Man-Month,’ aged with time, and how have architecture approaches and tech org structures changed in half a century?

‘The Mythical Man-Month’ by Frederick P. Brooks, Jr. is a classic in the software industry. First published in 1975, it’s almost 50 years old. Brooks was the project manager of one of the most complex software projects in the world at the time: the IBM System/360 operating system. He published this book based on his personal experience of spending several years building it, and leading several hundred programmers.

I’ve been making my way through this book from the dawn of software, to see which predictions the book gets right or wrong, and what’s different about engineering today - and which things just never change. In Part 1 of this series, we covered chapters 1-3. In this article, we cover chapters 4-7:

Evolution of architecture approaches: The outdated separation of “architect” and implementer, the dated “technical manual” concept, software design and the “second-system effect,” & telephone logs. 

Architecture approaches: what’s the same. A well-architected system balances simplicity and functionality, and it’s still hard to do. Processes are still needed for proposing and discussing architecture changes for mature products and larger teams.

Communication challenges on large projects: then and now. Surprisingly, communication best practices have changed little in 50 years; mixing informal, formal, and written forms for efficient projects.

Ideal structure of tech orgs. The tree structure was the most favored for tech organizations in the 1970s, and it still is the most popular choice today. Are simplicity and straightforwardness the reason why?

1. Evolution of architecture approaches

When it comes to how we design software, a lot has changed in 50 years:

Read more


Jon Udell

Code in Context: How AI Can Help Improve Our Documentation

Here’s the latest installment in the series on working with LLMS: https://thenewstack.io/code-in-context-how-ai-can-help-improve-our-documentation/. Writing documentation from scratch is as uncommon as writing code from scratch. More typically, you’re updating or expanding or refactoring existing docs. My expectation was that an LLM-powered tool primed with both code and documentation could provide

Here’s the latest installment in the series on working with LLMS: https://thenewstack.io/code-in-context-how-ai-can-help-improve-our-documentation/.

Writing documentation from scratch is as uncommon as writing code from scratch. More typically, you’re updating or expanding or refactoring existing docs. My expectation was that an LLM-powered tool primed with both code and documentation could provide a powerful assist, and Unblocked did.

I don’t know how to measure the boost it gave me. But I do know that I’ll never again want to undertake this kind of project without a tool that can help me assemble the necessary context.

The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

11 How to Use LLMs for Dynamic Documentation

12 Let’s talk: conversational software development

13 Using LLMs to Improve SQL Queries

14 Puzzling over the Postgres Query Planner with LLMs

15 7 Guiding Principles for Working with LLMs

16 Learn by Doing: How LLMs Should Reshape Education

17 How to Learn Unfamiliar Software Tools with ChatGPT

18 Using AI to Improve Bad Business Writing

Monday, 08. April 2024

Doc Searls Weblog

Aviation vs. Eclipse

Contrails form behind jet aircraft flying through the stratosphere. Since high-altitude aviation is happening all around the earth more or less constantly, planes are painting the sky everywhere. (Here is one time-lapse. And another. And one of my own.) Many contrails don’t last, of course, but many do, and together they account for much of […]
Contrails in the stratosphere, smearing sideways into broad cloud cover.  This view is toward the place in the sky where a full solar eclipse will happen a few hours later.

Contrails form behind jet aircraft flying through the stratosphere. Since high-altitude aviation is happening all around the earth more or less constantly, planes are painting the sky everywhere. (Here is one time-lapse. And another. And one of my own.)

Many contrails don’t last, of course, but many do, and together they account for much of the cloud cover we see every day. The altocumulus, altostratus, and cirrus clouds that contrails produce are now officially recognized as homogenitus and homomutatus, which are anthropogenic: owing to human activity.

And today, Eclipse Day, Delta is offering to fly you along the path of totality. Others too? I don’t know. I’m taking a few moments to write this before we walk up to our hilltop cemetery to watch the eclipse for over four minutes, thanks to our lucky location near the very center of Totality.

I’m curious to see and hear contrail reports from others now awaiting their few minutes out of the sun.

1:14pm—The moon’s shadow made landfall in Mexico a short time ago. Here in Bloomington, the sky is well-painted by contrails. Mostly it looks like high-altitude haze, but believe me: if it weren’t for commercial aviation, the sky would be solid blue. Because the contrails today are quickly smeared sideways, losing their form but not their color.

5:00pm—Contrails were aplenty, and a spread-out contrail did slide in front of the sun and the moon…

but it was still a spectacular sight:


Damien Bod

BFF secured ASP.NET Core application using downstream API and an OAuth client credentials JWT

This article shows how to implement a web application using backend for frontend security architecture for authentication and consumes data from a downstream API protected using a JWT access token which can only be accessed using an app-to-app access token. The access token is acquired using the OAuth2 client credentials flow and the API does […]

This article shows how to implement a web application using backend for frontend security architecture for authentication and consumes data from a downstream API protected using a JWT access token which can only be accessed using an app-to-app access token. The access token is acquired using the OAuth2 client credentials flow and the API does not accept user access tokens from the UI application. OpenIddict is used as the OpenID Connect server. The STS provides both the OAuth2 client and the OpenID Connect client as well as the scope definitions.

Code: https://github.com/damienbod/bff-aspnetcore-angular-downstream-api

The BFF web application is implemented using ASP.NET Core and Angular as the UI tech stack. The Angular part of the web application can only use the ASP.NET Core APIs and secure same site cookies are used to protect the access. The whole application is authenticated using an OpenID Connect confidential code flow client (PKCE). If the web application requires data from the downstream API, a second OAuth client credentials flow is used to acquire the access token. The downstream API does not accept the user delegated access tokens from the UI application.

BFF OIDC code flow client

Implementing the OpenID Connect confidential client is really simple in ASP.NET Core. The AddAuthentication method is used with cookies and OpenID Connect. The cookies are used to store the session and the OpenID Connect is used for the challenge. All server rendered applications are setup like this with small changes required for the OIDC challenge. Due to these small differences, the different OIDC implementations provide specific implementations of the client. These are normally focused and optimized for the specific OIDC servers and do not work good with other OIDC servers. Once you use more than one OIDC server or require multiple clients from the same OIDC server, the client wrappers cause problems and you should revert back to the standards.

var stsServer = configuration["OpenIDConnectSettings:Authority"]; services.AddAuthentication(options => { options.DefaultScheme = CookieAuthenticationDefaults.AuthenticationScheme; options.DefaultChallengeScheme = OpenIdConnectDefaults.AuthenticationScheme; }) .AddCookie() .AddOpenIdConnect(options => { configuration.GetSection("OpenIDConnectSettings").Bind(options); options.SignInScheme = CookieAuthenticationDefaults.AuthenticationScheme; options.ResponseType = OpenIdConnectResponseType.Code; options.SaveTokens = true; options.GetClaimsFromUserInfoEndpoint = true; options.TokenValidationParameters = new TokenValidationParameters { NameClaimType = "name" }; });

Yarp Proxy

The Angular UI can only request data from the ASP.NET Core backend using secure http only cookies. The Angular UI is deployed as part of the ASP.NET Core application in production builds. When creating applications, software developers need to use their preferred tools and YARP is used to support this in the development setup. As a further downstream API is used, YARP can also be used to support this. The proxy takes the API request, validates the cookie, uses another access token and forwards the request to the downstream API. YARP has an ITransformProvider interface which is used to implement this. This also means we have two different YARP configuration setups for development and deployments. (test, integration, production).

using System.Net.Http.Headers; using Yarp.ReverseProxy.Transforms; using Yarp.ReverseProxy.Transforms.Builder; namespace BffOpenIddict.Server.ApiClient; public class JwtTransformProvider : ITransformProvider { private readonly ApiTokenCacheClient _apiTokenClient; public JwtTransformProvider(ApiTokenCacheClient apiTokenClient) { _apiTokenClient = apiTokenClient; } public void Apply(TransformBuilderContext context) { if (context.Route.RouteId == "downstreamapiroute") { context.AddRequestTransform(async transformContext => { var access_token = await _apiTokenClient.GetApiToken( "CC", "dataEventRecords", "cc_secret"); transformContext.ProxyRequest.Headers.Authorization = new AuthenticationHeaderValue("Bearer", access_token); }); } } public void ValidateCluster(TransformClusterValidationContext context) { } public void ValidateRoute(TransformRouteValidationContext context) { } }

The AddReverseProxy is used to add the YARP services.

builder.Services.AddReverseProxy() .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy")) .AddTransforms<JwtTransformProvider>();

And the middleware:

app.MapReverseProxy(); API client credentials client

The YARP proxy uses the OAuth client credentials client to get an access token to access the downstream API. The token is stored in a cache and only rotated when it expires or is missing. The app-to-app security has nothing to do with the delegated client from the web application.

using IdentityModel.Client; using Microsoft.Extensions.Caching.Distributed; namespace BffOpenIddict.Server.ApiClient; public class ApiTokenCacheClient { private readonly ILogger<ApiTokenCacheClient> _logger; private readonly HttpClient _httpClient; private static readonly object _lock = new(); private readonly IDistributedCache _cache; private readonly IConfiguration _configuration; private const int cacheExpirationInDays = 1; private class AccessTokenItem { public string AccessToken { get; set; } = string.Empty; public DateTime ExpiresIn { get; set; } } public ApiTokenCacheClient( IHttpClientFactory httpClientFactory, ILoggerFactory loggerFactory, IConfiguration configuration, IDistributedCache cache) { _httpClient = httpClientFactory.CreateClient(); _logger = loggerFactory.CreateLogger<ApiTokenCacheClient>(); _cache = cache; _configuration = configuration; } public async Task<string> GetApiToken(string api_name, string api_scope, string secret) { var accessToken = GetFromCache(api_name); if (accessToken != null) { if (accessToken.ExpiresIn > DateTime.UtcNow) { return accessToken.AccessToken; } else { // remove => NOT Needed for this cache type } } _logger.LogDebug("GetApiToken new from STS for {api_name}", api_name); // add var newAccessToken = await GetApiTokenInternal(api_name, api_scope, secret); AddToCache(api_name, newAccessToken); return newAccessToken.AccessToken; } private async Task<AccessTokenItem> GetApiTokenInternal(string api_name, string api_scope, string secret) { try { var disco = await HttpClientDiscoveryExtensions.GetDiscoveryDocumentAsync( _httpClient, _configuration["OpenIDConnectSettings:Authority"]); if (disco.IsError) { _logger.LogError("disco error Status code: {discoIsError}, Error: {discoError}", disco.IsError, disco.IsError); throw new ApplicationException($"Status code: {disco.IsError}, Error: {disco.Error}"); } var tokenResponse = await HttpClientTokenRequestExtensions.RequestClientCredentialsTokenAsync(_httpClient, new ClientCredentialsTokenRequest { Scope = api_scope, ClientSecret = secret, Address = disco.TokenEndpoint, ClientId = api_name }); if (tokenResponse.IsError || tokenResponse.AccessToken == null) { _logger.LogError("tokenResponse.IsError Status code: {tokenResponseIsError}, Error: {tokenResponseError}", tokenResponse.IsError, tokenResponse.Error); throw new ApplicationException($"Status code: {tokenResponse.IsError}, Error: {tokenResponse.Error}"); } return new AccessTokenItem { ExpiresIn = DateTime.UtcNow.AddSeconds(tokenResponse.ExpiresIn), AccessToken = tokenResponse.AccessToken }; } catch (Exception e) { _logger.LogError("Exception {e}", e); throw new ApplicationException($"Exception {e}"); } } private void AddToCache(string key, AccessTokenItem accessTokenItem) { var options = new DistributedCacheEntryOptions() .SetSlidingExpiration(TimeSpan.FromDays(cacheExpirationInDays)); lock (_lock) { _cache.SetString(key, System.Text.Json.JsonSerializer.Serialize(accessTokenItem), options); } } private AccessTokenItem? GetFromCache(string key) { var item = _cache.GetString(key); if (item != null) { return System.Text.Json.JsonSerializer.Deserialize<AccessTokenItem>(item); } return null; } }

Downstream API

The downstream API is protected using JWT access tokens. This is setup using the AddJwtBearer method. The scope and other claims should also be validated.

services.AddAuthentication() .AddJwtBearer("Bearer", options => { options.Audience = "rs_dataEventRecordsApi"; options.Authority = "https://localhost:44318/"; options.TokenValidationParameters = new TokenValidationParameters { ValidateIssuer = true, ValidateAudience = true, ValidateIssuerSigningKey = true, ValidAudiences = ["rs_dataEventRecordsApi"], ValidIssuers = ["https://localhost:44318/"], }; }); Notes

This setup can be used for all server rendered applications. You should always use an external identity provider in enterprise setups and never roll out your own identity system as this is expensive to maintain and many enterprise environments no longer accept this due to the extra operation costs. Replacing Angular with react, Vue.js, Svelte or Blazor WASM does not require changes to the authentication. The different UI have differences on how the scripts are loaded or used and some require weaker session security setups.

You should also avoid downstream APIs if not required. Modular monoliths have performance advantages.

Links

https://github.com/damienbod/bff-aspnetcore-angular

https://learn.microsoft.com/en-us/aspnet/core/introduction-to-aspnet-core

https://nx.dev/getting-started/intro

https://github.com/isolutionsag/aspnet-react-bff-proxy-example

https://github.com/openiddict

https://github.com/damienbod/bff-auth0-aspnetcore-angular

https://github.com/damienbod/bff-azureadb2c-aspnetcore-angular

https://github.com/damienbod/bff-aspnetcore-vuejs

https://github.com/damienbod/bff-MicrosoftEntraExternalID-aspnetcore-angular

https://microsoft.github.io/reverse-proxy/articles/transforms.html

https://github.com/microsoft/reverse-proxy/tree/main/samples/ReverseProxy.Transforms.Sample

Sunday, 07. April 2024

Wrench in the Gears

Arkansas Bound – New York Trip Quick Summary With Photos

Well, our house sold, and it looks like my next journey is about to begin. I decided to leave town and let the realtor do his thing. On April Fool’s Day we had multiple offers. Who knew that vintage charm held such sway in a world where the incessant drum beat of emerging tech tries [...]

Well, our house sold, and it looks like my next journey is about to begin. I decided to leave town and let the realtor do his thing. On April Fool’s Day we had multiple offers. Who knew that vintage charm held such sway in a world where the incessant drum beat of emerging tech tries to drown out everything else? The process was about as quick and painless as putting one’s life on display with money attached can be. Please know how much I appreciate all of you who have been sending good thoughts and prayers my way. It’s not an easy transition, but the long goodbye is coming to a close. I’ll be taking many boxes of books down to Arkansas by U-Haul next week – priorities, right? Even though the house sale went faster than I ever expected, I continued my planned trip through Syracuse, Oneida, Palmyra, Rochester, Burlington, and Montpelier. It was cold and rainy interspersed with sleet and snow, which made the stretch through the Adirondacks even more picturesque. I’m going to imagine it as a prolonged cleanse of my soul in preparation for the road ahead.

I did a brief overview video tonight, because I was afraid of not having time to do a more polished presentation. For the next month my focus has to be on clearing out the house and moving everything into storage. I did forget to mention my first stop at SUNY Binghamton on the ride up, the stomping grounds for Jeffrey Yass where he started his gambling habit. I’m going to drop some photos in this post, so you can see highlights from the trip so far. These aren’t my best hearts, because materials were limited, the weather was challenging, and the wind wanted to blow everything all over the place. Still, my good intentions were there even if my cold fingers and water drops on my glasses made the presentation less than polished. I do plan to stop at Ben and Jerry’s tomorrow – one of the first benefit corporations.

 

 

SUNY Binghamton, where Jeffrey Yass of Susquehanna International Group refined his gambling technique.

Oneida Community Mansion, Bible Communism, free love, divination, and eugenics in the context of corporate morality, with some amazing trees!

 

Cumorah and Joseph Smith’s Sacred Grove in Palmyra, NY –

Strong Memorial Hospital was the first “Flexner” model hospital, underwritten George Eastman of Eastman Kodak. The hospital was used by the University of Rochester for nonconsensual radiation exposure experiments during the Manhattan Project. It also had one of the first integrated psychiatric units. The Terrence Building, now abandoned, was one of the largest state mental hospitals in the United states from the 1950s through the 1990s, and had over 1,000 beds.

Rochester Institute of Technology’s Magic Spell Lab instructing in immersive video game design, with sculpture installations outside.

Obelisk to the Fox Sisters, founders of the American spiritualist movement. Paid for by Sir Arthur Conan Doyle (of Sherlock Holmes fame) who was a member of the Society for Psychical Research. When he came to Rochester he attended the Plymouth Street Spiritualist Church, which has since been relocated to the neighborhood of the Eastman mansion.

The relocated church.

The remains of the foundation of the cottage where the Fox sisters encountered the spirit of the murdered peddler.

The George Eastman, of Kodak, mansion, garden courtyard. These are wisteria pods.

Rainbow-themed Peace Park along Lake Ontario on land donated by George Eastman.

The very dark and threatening Xerox building in downtown Rochester.

Bausch and Lomb world headquarters located opposite the Xerox building.

Strong Museum of Play with International Video Game Hall of Fame and Digital Worlds “Level-Up” Experience

 

Portal?

Triskelion with water and sphere outside the entrance to the museum.

Time bending reference in the divination toy display.

 

 

MLK Park Opposite Strong Museum in Rochester

 

Derelict Kodak buiding with short dumping in the parking lot – boat and tires.

Seneca Park Zoo that had a partnership with IXO Foundation on conservation impact data. Naked mole rats are mammals that live in a colony like eusocial insects.

On the way to Vermont I drove through Rome, NY (archetypal field?) and spent some time driving around the former Griffiss Air Force Base, which specialized in electronic warfare. It is being redeveloped as a public-private partnership defense-tech mini-city with stack and pack housing, a private international airport, and huge airplane hangers that have been converted to incubator space for quantum computing and nano-tech R&D. Oh, and there’s a sculpture garden with disc golf featuring many mythological characters. I had to leave a few hearts there, too. Note the dramatic chrome Pegasus in the traffic circle. We are living in surreal times folks. Every day is like the Percy Jackson series…

Innovare, supported by the Griffiss Institute represented by an origami griffin.

https://www.innovare.org/

https://www.griffissinstitute.org/

Notice the vintage-vibe Alcoa sticker that was lying in the grass next to the statue. Freya is the Norse goddess of destiny. Her symbol is the distaff.

I made a wish.

 

Farrell Hall at the TRINITY campus of the University of Vermont in Burlington. The building houses the office of Joshua Bongard who assisted in the creation of Xenobots, “living” robots made from frog cells. The building is also home to the Gund Institute for the Environment. You can see the Gund Foundation’s involvement in early childhood impact finance, equity and justice finance, climate solutions, etc. etc. in the organization’s LittleSis entry.

 

 

 

 

Friday, 05. April 2024

Doc Searls Weblog

Talking Artificial Intelligence with the Real Don Norman

Artificial is AI’s frst name. And Intelligence is a quality, not a quantity. You can’t measure it with a dipstick, a ruler, or an IQ test. If you could, you’d get the same result every time.* But being artificial doesn’t mean AI isn’t dangerous, fun or both. It is, and will be, what we make […]

Artificial is AI’s frst name. And Intelligence is a quality, not a quantity. You can’t measure it with a dipstick, a ruler, or an IQ test. If you could, you’d get the same result every time.*

But being artificial doesn’t mean AI isn’t dangerous, fun or both. It is, and will be, what we make of it.

That’s what Don Norman says, and he’s been publishing in AI journals since 1973. His laboratory produced the first multi-layer neural nets in the 1980s. He wrote Things that Make us Smart in 1993.

In the opinion of myself and countless others, Don is also the foremost authority on design—of anything and everything. For more on that, check out Don’s Web page, his Wikipedia page, and his books. Or, if you just want to sample some of his thoughts on AI, watch this.

Or you can skip all that and come to the good stuff: joining us in a talk with Don in the final salon of this semester on the topic of Artificial +/vs. Human Intelligence. It’s next Tuesday, April 9, at Noon Eastern time. (That’s less than 24 hours after the shadow of the Moon passes over the Indiana University campus. Yes, totality will be local here.)

Also, this won’t be a lecture or a presentation. It will be a lively discussion because Don is especially good at that.

It’s also free and online, but you have to register first. Do that here.

*For what it’s worth, my own known IQ test scores have an 80-point range. I’ve written about that, and the myth of “IQ” here, here, here, here, and I suppose in too many other places.


Fishing For Free TV Signals

I think I will be the last person in Bloomington to try getting free over-the-air TV from what’s left of all the major networks. But that’s just my style, so roll with me while I explain how I’m hoping to do it, with the antenna above, which I’ll need because here is what the Search […]
By expert acclaim, this is the best antenna for receiving hard-to-get over-the-air (OTA) TV signals

I think I will be the last person in Bloomington to try getting free over-the-air TV from what’s left of all the major networks. But that’s just my style, so roll with me while I explain how I’m hoping to do it, with the antenna above, which I’ll need because here is what the Search Map at RabbitEars.info says we might get here:

We live next door right now, and the top station above, WTIU from Indiana University (our PBS affiliate), comes from a tower you can walk to from here. We can get that signal by using a straightened paper clip for an antenna. (You jam the clip into the center hole of the coaxial connector in the back of the TV.) Even a real indoor antenna connected to the same jack gets nothing else, not even the two stations above with “Fair” signal strength.

But this Televes antenna might do the job because we’re on the slope of a hill that faces the Indianapolis stations that carry CBS (WTTV/4 on 27), ABC (WRTV/6 on 25), NBC (WTHR/13 on 13), and Fox (WRDB/41 on 32)*. These range from 27 to 54 miles away, in roughly the same direction. VHF and UHF signals always gain strength when they hit the faces of hills, similar to how surf builds as it approaches a sand bar or a shore. Also, the Televes DAT BOSS antenna gets great reviews:

TechHive: Televes Dat Boss Mix LR review: This is a great outdoor antenna Tyler the Antenna Guy: Televes DATBOSS LR Mix Outdoor Antenna Review 149883 Solid Signal: ALL NEW Televes DATBOSS Mix LR Antenna TESTED (w/assembly instructions) Amazon: Televes DAT Series BOSS Mix LR Outdoor High-VHF/UHF HDTV Antenna (see the reviews)

I was going to put it in our new attic before the drywall goes up. However, the attic space is low and full of close cross-braces. Worse, the antenna is not small and kinda complicated to fit in a space that’s a web of short 2x4s. Dig:

So it will go on a pole in the backyard and feed a coaxial line that will tunnel through conduit under the yard and inside to the new living room.

But I would like to test it first, preferably with a tuner gizmo I can plug into my laptop. I had one of those for years: the Elgato EyeTV Hybrid TV Tuner stick, which looked like a fat thumb drive,with USB-A at one end and a coax connector for an antenna at the other. It was sold in the ’00s and picked up both analog and digital TV (the Digital Transition was happening then), on every North American channel, and came with good software that ran on Macs and operating systems that have long been abandoned. Far as I can tell there are no replacements that run on current hardware or operating system, other than this one sold in Europe. Far as I can tell, it only works on TV bands over there. But I could be wrong. If anybody knows of a gizmo/softward combo I can use, please tell me. My only other option is to buy or find a cheap TV and try that out. Any advice is welcome. Thanks!

*After the digital transition in 2008, and again with the “repack” after 2016, most TV stations moved onto channels other than their original ones, using less spectrum overall. All the TV channels above 36 were auctioned off, first in 2008 and again in 2018. Most buyers were cellular and other short-range wireless carriers, which have been repurposing the old TV spectrum for 5G and other modern uses. The only station in Indianapolis that didn’t move its channel position was WTHR/13. That one is listed in the chart above as one of the “bad” signals for this location. The Televes antenna is designed specifically for “high band” VHF (channels 7-13) and the remaining UHF (14 to 36) TV channels. It also filters out any 5G signals that the antenna might pick up on what used to be the higher UHF channels. By the way, the old “low band” VHF channels (2 to 6) are still in use in some places, but by very few TV stations.  So it’s not worth it for Televes to design an antenna to pick those channels up. Such an antenna would also be a lot bigger and longer because the low-band elements of the antenna would be much longer.

Thursday, 04. April 2024

Doc Searls Weblog

Feed Time

Two things worth blogging about that happened this morning. One was getting down and dirty trying to make DALL-E 3 work. That turned into giving up trying to find DALL-E (in any version) on the open Web and biting the $20/month bullet for a Pro account with ChatGPT, which for some reason maintains its DALL-E […]
I asked ChatGPT to give me “people eating blogs” and got this after it suggested some details.

Two things worth blogging about that happened this morning.

One was getting down and dirty trying to make DALL-E 3 work. That turned into giving up trying to find DALL-E (in any version) on the open Web and biting the $20/month bullet for a Pro account with ChatGPT, which for some reason maintains its DALL-E 3 Web page while having “Try in ChatGPT︎” on that page link to the ChatGPT home page rather than a DALL-E one. I gather that the free version of DALL-E is now the one you get at Microsoft’s Copilot | Designer, while the direct form of DALL-E is what you get when you prompt ChatGPT (now 4.0 for Pro customers… or so I gather) to give you an image that credits nothing to DALL-E.

The other thing was getting some great help from Dave Winer in putting the new Feedroll category of my Feedland feeds placed on this blog, in a way similar stylistically to old-fashioned blogrolls (such as the one here). You’ll find it in the right column of this blog now. One cool difference from blogrolls is that the feedroll is live. Very cool. I’m gradually expanding it.

Meanwhile, after failing to get ChatGPT or Copilot | Designer to give me the image I needed on another topic (which I’ll visit here later) I prompted them to give me an image that might speak to a feedroll of blogs. ChatGPT gave me the one above, not in response to “people eating blogs” (my first attempt), but instead to “People eating phone, mobile and computer screens of type.” Microsoft | Designer gave me these:

Redraw your own inconclusions.


The Pragmatic Engineer

The Pulse #88: are we at peak AI hype?

Several signs are pointing that we’ve hit the peak of this AI hype cycle: that things could cool down soon enough. Also: the sudden license change at Redis; HashiCorp looking for a buyer, and more.

The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.

Today, we cover:

Industry pulse. A global software hack stopped in an improbable way; security company Rubrik prepares to go public; Mark Zuckerberg personally convincing AI engineers to join Meta, and more.

Peak AI hype? There are several signs pointing that we have hit the peak of the AI hype – and perhaps already passed it by a little. Companies spending 10x on GPUs than revenue generated, AI startups unable to raise the next round of funding, and irrationally high valuations for AI startups whose technology is about to be commoditized.

Commercial open source companies in trouble? Redis Labs changed the formerly permissive open source Redis license to a restrictive one, with the goal to have cloud providers pay when they host Redis. As a response, cloud providers started the Valkey project, which could become the “new and still permissive Redis.” HashiCorp is facing similar challenges with Terraform / OpenTofu. We could be seeing the end of billion-dollar companies built on permissive open source licenses.

Section 174: still hurting US software companies. The US remains the most hostile country in the world to start a software business: because developer salaries cannot be deducted as an expense. The US Senate has a bill that would fix this: but this bill shows no signs of being passed. The US is likely to see fewer software engineers hired until these Section 174 changes remain in place.

1. Industry pulse

Read more


Doc Searls Weblog

Death is a Feature

When Parisians got tired of cemeteries during the French Revolution, they conscripted priests to relocate bones of more than six million deceased forebears to empty limestone quarries below the city: a hundred miles of rooms and corridors now called The Catacombes. It was from those quarries that much of the city’s famous structures above—Notre Dame, et. al.—were built […]
When Parisians got tired of cemeteries during the French Revolution, they conscripted priests to relocate bones of more than six million deceased forebears to empty limestone quarries below the city: a hundred miles of rooms and corridors now called The Catacombes. It was from those quarries that much of the city’s famous structures above—Notre Dame, et. al.—were built in prior centuries, using a volume of extracted rock rivaling that of Egypt’s Great Pyramids. That rock, like the bones of those who extracted it, was once alive. In the shot above, shadows of future fossils (including moi) shoot the dead with their cell phones.

Elon Musk wants to colonize Mars.

This is a very human thing to want. But before we start following his lead, we might want to ask whether death awaits us there.

Not our deaths. Anything’s. What died there to make life possible for what succeeds it?

From what we can tell so far, the answer is nothing.

To explain why life needs death, answer this: what do plastic, wood, limestone, paint, travertine, marble, asphalt, oil, coal, stalactites, peat, stalagmites, cotton, wool, chert, cement, nearly all food, all gas, and most electric services have in common?

They are all products of death. They are remains of living things or made from them.

Consider this fact: about a quarter of all the world’s sedimentary rock is limestone, dolomite and other carbonates: remains of beings that were once alive. The Dolomites of Italy, the Rock of Gibraltar, the summit of Mt. Everest, all products of death.

Even the iron we mine has a biological source. Here’s how John McPhee explains it in his Pulitzer-winning Annals of the Former World:

Although life had begun in the form of anaerobic bacteria early in the Archean Eon, photosynthetic bacteria did not appear until the middle Archean and were not abundant until the start of the Proterozoic. The bacteria emitted oxygen. The atmosphere changed. The oceans changed. The oceans had been rich in dissolved ferrous iron, in large part put into the seas by extruding lavas of two billion years. Now with the added oxygen the iron became ferric, insoluble, and dense. Precipitating out, it sank to the bottom as ferric sludge, where it joined the lime muds and silica muds and other seafloor sediments to form, worldwide, the banded-iron formations that were destined to become rivets, motorcars and cannons. The is the iron of the Mesabi Range, the Australian iron of the Hammerslee Basin, the iron of Michigan, Wisconsin, Brazil. More than ninety percent of the iron ever mined in the world has come from Precambrian banded-iron formations. Their ages date broadly from twenty-five hundred to two thousand million years before the present. The transition that produced them — from a reducing to an oxidizing atmosphere and the associated radical change in the chemistry of the oceans — would be unique. It would never repeat itself. The earth would not go through that experience twice.

Death produces building and burning materials in an abundance that seems limitless, at least from standpoint of humans in the here and now. But every here and now ends. Realizing that is a vestigial feature of human sensibility.

Take for example, The World Has Plenty of Oil, which appeared in The Wall Street Journal ten years ago. In it, Nansen G. Saleri writes, “As a matter of context, the globe has consumed only one out of a grand total of 12 to 16 trillion barrels underground.” He concludes,

The world is not running out of oil any time soon. A gradual transitioning on the global scale away from a fossil-based energy system may in fact happen during the 21st century. The root causes, however, will most likely have less to do with lack of supplies and far more with superior alternatives. The overused observation that “the Stone Age did not end due to a lack of stones” may in fact find its match.

The solutions to global energy needs require an intelligent integration of environmental, geopolitical and technical perspectives each with its own subsets of complexity. On one of these — the oil supply component — the news is positive. Sufficient liquid crude supplies do exist to sustain production rates at or near 100 million barrels per day almost to the end of this century.

Technology matters. The benefits of scientific advancement observable in the production of better mobile phones, TVs and life-extending pharmaceuticals will not, somehow, bypass the extraction of usable oil resources. To argue otherwise distracts from a focused debate on what the correct energy-policy priorities should be, both for the United States and the world community at large.

In the long view of a planet that can’t replace any of that shit, this is the rationalization of a parasite. That this parasite can move on to consume other irreplaceable substances it calls “resources” does not make its actions any less parasitic.

Or, correctly, saprophytic; since a saprophyte is “an organism which gets its energy from dead and decaying organic matter.”

Moving on to coal, the .8 trillion tons of it in Wyoming’s Powder River Basin now contributes 40% of the fuel used in coal-fired power plants in the U.S. Here’s the biggest coal mine in the basin, called Black Thunder, as it looked to my camera in 2009:

About half the nation’s electricity is produced by coal-fired plants, the largest of which can eat the length of a 1.5-mile long coal train in just 8 hours. In Uncommon Carriers, McPhee says Powder River coal at current rates will last about 200 years.

Then what? Nansen Saleri thinks we’re resourceful enough to get along with other energy sources after we’re done with the irreplaceable kind.

I doubt it.

Wind, tide, and solar are unlikely to fuel aviation, though I suppose fresh biofuel might. Still, at some point, we must take a long view, or join our evolutionary ancestors in the fossil record faster than we might otherwise like.

As I fly in my window seat from place to place, especially on routes that take me over arctic, near-arctic, and formerly arctic locations, I see more and more of what geologists call “the picture”: a four-dimensional portfolio of scenes in current and former worlds. Thus, when I look at the seashores that arc eastward from New York City— Long Island, Block Island, Martha’s Vineyard, Nantucket, Cape Cod—I see a ridge of half-drowned debris scraped off a continent and deposited at the terminus of an ice cap that began melting back toward the North Pole only 18,000 years ago—a few moments before the geologic present. Back then, the Great Lakes were still in the future, their basins covered by ice that did not depart from the lakes’ northern edges until about 7,000 years ago or 5,000 B.C.

Most of Canada was still under ice while civilization began in the Middle East and the first calendars got carved. Fly over Canada often enough and the lakes appear to be exactly what they are: puddles of a recently melted cap of ice. Same goes for most of the ponds around Boston. Every inland swamp in New England and upstate New York was a pond only a few dozen years ago, and was ice only a dozen or so centuries before that. Go forward a few thousand years and all of today’s ponds will be packed with accumulated humus and haired over by woods or farmland. In the present, we are halfway between those two conditions. Here and now, the last ice age is still ending.

As Canada continues to thaw, one can see human activity spark and spread across barren lands, extracting “resources” from ground made free of permafrost only in the last few years. Doing that is both the economic and the pestilential thing to do.

On the economic side, we spend down the planet’s principal, and fail to invest toward interest that pays off for the planet’s species. That the principal we spend has been in the planet’s vaults for millions or billions of years, and in some cases cannot be replaced, is of little concern to those spending it, which is roughly all of us.

Perhaps the planet looks at our species the same way and cares little that every species is a project that ends. Still, in the meantime, from the planet’s own one-eyed perspective, our species takes far more than it gives, and with little regard for consequences. We may know, as Whitman put it, the amplitude of time. We also tend to assume in time’s fullness all will work out.

But it won’t.

Manhattan schist, the bedrock anchoring New York City’s tallest buildings, is a little over half a billion years old. In about the same amount of time, our aging Sun, growing hotter, will turn off photosynthesis. A few billion years later, the Sun will swell into a red giant with a diameter wider than Earth’s orbit, roasting the remains of our sweet blue planet and scattering its material out into the cosmos, perhaps for eventual recycling by stars and planets not yet formed.

In a much shorter run, many catastrophes will happen. One clearly is what our species is already doing to the planet during what geologists correctly call the Anthropocene. I suppose that’s a good reason for Elon and crew to “save” a few members of our vain little species. But why fuck up Mars before we’re done fucking up Earth, when there’s still some leverage with the death we have at home and that Mars won’t begin to have until stuff dies on it?

I’ve always been both an optimist and a realist. Specifically, I’m an optimist for at least the short run, by which I mean the next few dozen years. But I’m a pessimist for our civilization — and our species. Death is always a winning bet.

But hey, maybe nature knows better what to do with us than we do.

The first ancestor of this piece appeared in blogs.harvard.edu on 4 March 2008. The second is here on Medium.

Tuesday, 02. April 2024

Moxy Tongue

America Works For People, Building For Happiness..

America comes with a proposition ratified (LLPoH); Sovereignty is processed as a runtime event at the edges. Middleization is an attack vector. Inadequate system designs are a feature flaw of inadequate engineering design processes. Governance by administration of law is absolutely evolutionary. Failures happen. Edits happen. Time continues throughout, and personal views are coherent in context. C

America comes with a proposition ratified (LLPoH); Sovereignty is processed as a runtime event at the edges. Middleization is an attack vector. Inadequate system designs are a feature flaw of inadequate engineering design processes. Governance by administration of law is absolutely evolutionary. Failures happen. Edits happen. Time continues throughout, and personal views are coherent in context. Context is an edge driven event requiring insight by Governance under scrutiny. Scrutiny is a cost of Sovereignty. Accurate scrutiny processes are a function of engineering design process. Participation by people in American civil society is an edge-driven experience by people with blood in their veins. Administration of law by Sovereign process requires accurate context for accurate processing. 

Current inadequate understandings and administration by poorly/ wrongly trained managers of Sovereign law is a time-based experience. Yes, lives are affected. "We hold these truths to be self-evident..."; turns out, with digital context, Sovereignty is not so self-evident to some... tcp/ip engineers under administrative influence released a tool of military-use into civil society without understanding scale of impact "breaking silos" would have, and what Constitutes a silo where data is transported meaningfully.
Structure yields results.. welcome to 2024.
Reminder: Communists have no property Rights. 
Sovereignty; recursive accuracy for people.

The Pragmatic Engineer

GenZ software engineers, according to older colleagues

Responses to a survey about GenZ suggest this new generation possesses standout differences. We explore what makes GenZ distinctive, and check out ideas for ways to work fruitfully together.

👋 Hi, this is Gergely with a subscriber-only issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. To get articles like this in your inbox, every week, subscribe:

Subscribe now

Before we start: I’ve published details on how indicate interest in writing a guest article The Pragmatic Engineer. My goal is to work with hands-on engineering professionals to occasionally bring details, learnings and approaches from "working in the weeds.” A good example was last week’s article on Thriving as a founding engineer. If you’re interested in potentially writing a guest post, see the details.

A few months ago, I met up with my former engineering manager colleagues for dinner. As we caught up on each other’s news, one topic of chat was the new generation of tech workers. A common thing mentioned was that GenZ feels notably different to work with. Positive in many ways – and a bit confusing in others!

So, what is the new generation like at work, according to their colleagues? In a survey, we asked this of readers who identify as Millennial, GenX, or Boomer, and have dug into the replies. 

Today, we cover:

Who are GenZ? A recap.

Survey respondents. Mostly managers and seniors who are Millennial or GenX, based in the US and Europe.

Values. Higher salary and benefits expectations, challenging to retain, more distrustful of senior leadership.

Knowledge. GenZ are on the “bleeding edge” of frameworks, higher up on the tech stack, and use different learning resources.

Mentoring GenZ: what works? Display purpose, focus on the individual, and connect!

Working with GenZ: successful approaches. Give feedback, consider extra onboarding and training, and accept they’ll probably leave.

Pandemic impact? Enforced remote work began just as many people started their first jobs. It was a hard time for graduates and young professionals, for whom staying indoors – and on endless Zoom calls – was likely the last thing they wanted to do.

More observations and advice for GenZ. A generation of “mages,” and a possible bimodal distribution for ambition. Also, sharing the unfiltered survey responses.

This is part 1 of a two-part high-level overview on the new generation of talent in tech workplaces. This article is based on a small survey of people’s personal opinions, beginning with the views of more “seasoned” folks. But it won’t be a one-sided series! In part 2 to come soon, we hand the mic to tech professionals from GenZ, to learn what it’s like being young and in tech today – and what older colleagues are like to work with.

Some of the findings we’ll explore in this article 1. Who are GenZ? A recap.

Generation Z (GenZ) is the group name for people born between 1997 and 2012, and it’s commonly used in media and popular culture. In workplaces, the most senior members of this new generation have joined colleagues from the cohorts “Millennial,” “GenX,” and “Baby Boomer.” Each of these names refers to a span of time, and birth date determines which one you fit into: 

An overview of demographic cohorts. Image source: Wikipedia

The oldest members of GenZ are 27 years old, and the youngest are 12, meaning that graduate developers hired over the past few years are almost all GenZ’ers.

They’re the first generation to grow up “mobile native,” with consumer tech like smartphones, often from a very young age. The oldest members of this generation were aged 10 when the Apple iPhone launched in 2007 and kicked off the smartphone revolution.

There are countless studies claiming GenZ is different from earlier ones. A summary from Wikipedia:

“Compared to previous generations, members of Generation Z tend to live more slowly than their predecessors when they were their age; have lower rates of teenage pregnancies; and consume alcohol (but not necessarily other psychoactive drugs) less often. Generation Z teenagers are more concerned than older generations with academic performance and job prospects, and are better at delaying gratification than their counterparts from the 1960s, despite concerns to the contrary. (...) Nostalgia is a major theme of youth culture in the 2010s and 2020s.”

For reference, here is how Millennials are described, also by Wikipedia:

“As the first generation to grow up with the Internet, Millennials have also been described as the first global generation. The generation is generally marked by elevated usage of and familiarity with the Internet, mobile devices, and social media. The term "digital natives", which is now also applied to successive generations, was originally coined to describe this generation.

Millennials have also been called the ‘Unluckiest Generation’ because the average Millennial has experienced slower economic growth since entering the workforce than any other generation in U.S. history. The generation has also been weighed down by student debt and child-care costs.

Millennials across the world have suffered significant economic disruption since starting their working lives; many faced high levels of youth unemployment during their early years in the job market in the wake of the Great Recession, and suffered another recession in 2020 due to the COVID-19 pandemic.”

And let’s not forget GenX:

“As adolescents and young adults in the 1980s and 1990s, Xers were dubbed the ‘MTV Generation’ (a reference to the music video channel), sometimes being characterized as slackers, cynical, and disaffected. Some of the many cultural influences on Gen X youth included a proliferation of musical genres with strong social-tribal identity such as alternative rock, hip hop, punk, post-punk, rave, and heavy metal. (...) Video games, both in amusement parlors and in devices in Western homes, were also a major part of juvenile entertainment for the first time. Politically, in many Eastern Bloc countries, Generation X experienced the last days of communism and the transition to capitalism as part of its youth. In much of the Western world, a similar time period was defined by a dominance of conservatism and free market economics.

In their midlife during the early 21st century, research describes Gen Xers as active, happy, and achieving a work–life balance. The cohort has also been more broadly credited as entrepreneurial, and productive in the workplace.”

2. Survey Respondents

Sixty three readers took part in the survey. Here’s a breakdown of their current job titles:

Most are in management roles (manager, director-level, VP or C-level) or software engineers (senior-and-above.) “Other” covers product and data roles

Most respondents are based in the US and Europe:

Respondents by location

Most respondents belong to the Millennial or GenX generations.

Respondents, by generation. The lack of Boomers is unsurprising, since most are near retirement

What I don’t know is the extent to which participants in the survey reflect the newsletter’s readership. But there’s a way to find out. My newsletter platform (Substack) launched surveys that let you vote with a click. So, I have a request; would you please tap below on the generation you belong to, based on your birth date? Voting is anonymous, and your vote isn’t linked to your email.

Many thanks to all who share this detail! With that, let’s dive into the survey’s findings.

3. Values

A note of caution: of course, it’s impossible to sum up a whole generation based on a mix of opinions from a few workplaces! The details in this article are anecdotal, based on individuals’ experiences. 

We know labels like “GenZ” are loose and fail to account for the inherent diversity of millions of people, which simply cannot be summed up in a collection of observations, no matter how detailed. There are GenZ folks to whom none of the respondents’ views apply, as well as people with traits which no respondent captured. 

Our goal with this survey is just to try and identify any distinctive characteristics which could be worth paying attention to. How can we work together with this younger generation and build great products and a thriving business? It’s undeniable that differences exist between successive generations, and it’s worth attempting to verbalize these. The new generation of software engineers is no different!

Higher expectations for salary, promotions and benefits

Common themes from the responses:

GenZ expects bigger raises and faster promotions

Benefits like flexible or remote working are expected

Some respondents say younger colleagues don’t appear to offer as much in return, such as shouldering higher expectations, or taking on extra responsibilities. To be fair, this sounds like fairly typical “early-stage career” behavior; I remember Millennials being described like this 10-15 years ago!

Hard to retain, high expectations, & little “loyalty”

It’s clear that survey respondents think GenZ expects promotions faster, and aren’t shy to jump ship for better pay. Respondents say there is little to no “corporate loyalty” either. This is understandable; can we entirely blame people for putting their own career first, especially after waves of ruthless job cuts in software engineering by companies following their own interests? Also, switching jobs for better pay occurs across the generations.

A lack of corporate “loyalty” among young colleagues is noted positively by some respondents:

“Retaining them is hard, there is no loyalty to the company. But this is a good thing, as we have seen that companies, even great ones like Google, aren't loyal to their people! So, you need to be creative, move GenZ colleagues around a lot within the team (keep the work fresh), use retention strategies (options, bonuses, etc.,) and find empathetic managers.”

– Director of Engineering at a startup acquired by Big Tech (GenX, US)

“From my old-school perspective, they are somewhat entitled, expecting flexibility, yearly raises and promotions – even when the company has no need for more managers, for example. They also expect significant benefits on top of our higher-than-market pay. 

As many before me have said, they have cast off outdated expectations of ‘loyalty’ to jobs that aren't loyal to them, in return. It's admirable for the most part, though I do not think everyone should get raises/promotions every year in a small company.”

– Head of Product at a startup (Millennial, US)

Retaining GenZ workers is harder, according to some responses:

“Motivating and retaining seems to be increasingly difficult, as they have all been advised to job hop around the two-year mark in order to have a chance at earning more. Which I don’t blame them for – it’s just that it is tough for managers who have little say/influence on their reports’ pay packages.”

– Senior Software Engineer at a Fintech (GenX, US)

“I see a challenge around motivation and retention. There is somehow a baked-in assumption that just turning up to work means you should get a pay rise. Until very recently (due to the downturn in tech jobs,) it was extremely difficult to retain without using a lever, such as compensation.” 

– VP of SRE at startup (GenX, US)

“I find retaining them hard when it comes to money: sometimes they get crazy offers for positions! They will apply to positions that they are only starting to be ready for. Older peers tend to take a different approach: they sometimes need to feel more confident about their capabilities in the target role before changing jobs.”

– Platform Director at a Fintech (Millennial, France)

Could social media influencers be fuelling career impatience among young professionals? One respondent reckons so:

“Almost impossible to retain; expected salaries and titles more often than not surpass their actual ability about 6-12 months into tenure. From conversations with fellow managers and university lecturers, it seems YouTube and LinkedIn influencers are fuelling unrealistic expectations. 

Specifically: over-stating market-rate salaries; while understating levels of responsibility that accompany a given salary range”

– Head of Engineering at a Series A startup (Millennial, UK)

It’s tempting to blame social media, but I reckon what we’re seeing is that GenZ have far more (good and bad) career advice from peers than previous generations did. Sure, this advice isn’t always perfect, even when it’s well-meant.

Senior leadership distrusted, demand for transparency

Survey respondents think GenZ colleagues have less trust in management, and are skeptical of business decisions:

“I find GenZ to be strong believers in radical transparency. They want to know the ‘why’ behind many decisions. The speed with which they want to consume this information is also much higher than other generations. 

I have observed that GenZ have a stronger distrust of senior leadership than other generations. My hypothesis is that with the tech retraction in 2022-2023, this generation was going through its first downturn and thus questioned why there were no visible consequences for company leadership. 

Looking back to the sub-prime crisis of 2008 – my first downturn after graduating college – I was also concerned. However, I saw that this was more than just a specific company issue, and was a much larger economic problem. One could correctly say that the tech retraction of 2022 and the sub-prime crisis of 2008 are not apples-to-apples, but I think there are some similarities.”

– Engineering Manager at a startup (Millennial, Boston, US)

“They are very aware and cynical of the impact of capitalism on business decisions.”

– Frontend Engineer at Amazon (GenX, Canada)

Workplaces with low transparency may have a hard time retaining GenZ workers, some respondents believe:

“My hunch would be that GenZ won’t stay long in cultures where transparency is not the norm, and where it’s not okay to question upwards, and in a direct way.”

– Senior DevOps Engineer in the public sector (Millennial, Singapore)

A fitting summary comes from an engineering manager:

“The don’t take management bullsh*t.”

– Group Engineering Manager at a scaleup (GenX, Australia)

Outspoken, open communication style

Blunt communication is something older colleagues notice:

“They have no trouble interacting with their peers, older folks. There weren’t any occasions where communication was a problem, I’ve not received or heard of any negative feedback. 

This being said, they were more likely to speak out against and voice their concerns directly. Whether it was work related (disagreeing with an analysis, decision, etc,) or non-work related (good movies and tv shows.) This happens at varying levels
between individuals, but it is common among the GenZ’ers I’ve interacted with.

Some of my colleagues were disarmed by their frankness. Some found that the GenZ were too open, insensitive, and should be more sensitive of their environment, and of people hearing what they said. Others embraced the directness and it built strong relationships that lasted beyond the internship. “

– Senior DevOps Engineer in the public sector (Millennial, Singapore)

A few other respondents mention that the communication style ruffles the feathers of some senior colleagues:

“It can be hard for older peers, as younger new joiners often feel free to share strong opinions about how things are done, that could be harmful for others.”

– Platform Director at a scaleup (Millennial, Spain)

The reasons for such directness and bluntness are well-intended, says one respondent:

“They are clearly intent on improving the status quo.”

– VP of SRE at a scaleup (GenX, UK)

An important trait pointed out by a VP:

“They are usually willing to speak truth to power.”

– VP of Engineering (Boomer, US)

Does the age-old age gap explain older professionals’ reactions to young colleagues, or is there more to it? Certainly, many survey responses mention GenZ’s “in your face” communication style. Cultural factors may be at play, and perhaps GenZ is a generation that speaks its mind, regardless of whether to a peer, or the business’s Boomer CEO!

Personally, I think this fits into existing workplace trends; better teams already spend time ensuring graduate colleagues feel unafraid to speak up when senior peers are in the room. Indeed, companies like Meta got rid of titles of seniority in order to encourage precisely this behavior. But it’s always good to bear in mind the thin line between directness and rudeness. 

Modern values?

Survey respondents say GenZ colleagues are more vocal about work-life balance, flexibility of work, mental health, and diversity. They are more protective of free time, but friendlier and more personal in the office than older colleagues. Survey responses from GenZ tech workers confirm these observations.

Protecting free time:

“They are much less willing to do the work to grow on their own time. They are also much more willing to take a break and protect their own time.”

– Head of Design at a scaleup (GenX, US)

“They have a dim view of working outside the 9-5.“

– Staff Engineer at a startup (Boomer, US)

Informality and openness:

“GenZ is much more open about sharing personal situations: mental health, life events, etc. As a manager, I like this as it lets me know what is going on and how to help them.

– Director of Software development, large tech company (GenX, US)

“Much more informal in the workplace; they tend to have friendlier, closer relationships with colleagues. They also excel at workplace culture and work-life balance.”

– Frontend engineer at Amazon (Millennial, US)

Diversity and inclusivity:

“They are very much pushing for a great, healthy culture and a lot of diversity – this is good! Mental health is something front and center for them, which is great to see in the workplace.”

– Staff Software Engineer at a scaleup (GenX, New York, US)

“I’ve found that GenZ engineers are more inclusive of different backgrounds and people at the company/on teams.”

– Senior Software Engineer, scaleup (Millennial, US)

“They are more aware of diversity issues.”

– CTO at a mid-sized company (Boomer, Europe)

The importance of flexibility:

“Flexible hours are very important for them, as is independence, and working in a ‘flat’ hierarchy.”

– CTO at a mid-sized company (Boomer, Europe)

Workplace politics:

“The mere existence of people who don't agree with them politically seems deeply upsetting oftentimes, but maybe that's a symptom of a larger issue of polarization in the U.S. When I entered the workforce, I thought it was understood that religion and politics were completely off the table AT WORK because it should go without saying that not everybody is in complete lockstep with the same ideology, and yet we must still work together and see one another every day.“

– Senior Software Engineer at a startup (Millennial, US)

Caring about things earlier generations do not:

“From my point of view as a member of GenX, GenZ can be critical of things that I don't even notice. For example, they might use Firefox over Chrome because Firefox is part of the truly open internet.”

– Head of Design at a dev tools company (GenX, US)

Purpose and meaning

GenZ seems more inspired by a strong sense of purpose and meaning, according to their older colleagues. Maybe this is why they’re seen as questioning everything: they want to know the why of tasks and decisions, and to do things for the right reasons!

“I see the GenZ people need a strong purpose. When the company can provide it to them, it's easier to keep them motivated.”

– Team Lead at a startup (Millennial, Brazil)

“GenZ engineers can be extremely motivated to do right by the user, particularly if the problem they are solving has a deep impact on something meaningful.“

– Head of Design at a dev tools company (GenX, US)

4. Knowledge

Read more


Damien Bod

Create conditional access base policies for a Microsoft Entra ID tenant

This article shows some of the base conditional access policies which can be implemented for all Microsoft Entra ID tenants. Phishing resistant authentication should be required for all administration flows and some other user policies like sign-in risk MFA or terms of conditions. I recommend these base policies when implementing an Microsoft Entra ID tenant […]

This article shows some of the base conditional access policies which can be implemented for all Microsoft Entra ID tenants. Phishing resistant authentication should be required for all administration flows and some other user policies like sign-in risk MFA or terms of conditions. I recommend these base policies when implementing an Microsoft Entra ID tenant using a P2 license.

Disable security defaults

The security defaults are a good basic setup, but when a P2 license is used, conditional access policies can be applied and the tenant can be setup to force things like phishing resistant authentication.

Disable on the tenant in the “your-tenant” | Overview | Properties

All the security defaults are disabled and good conditional access policies are now required.

Activate conditional access policies

There are many conditional access policies. These are applied and different depending on the tenant requirements. The following base policies make sense in all tenants:

Force MFA conditional access policy (All users) Require Terms of Use policy Block legacy authentication (All users) Enable Sign-in risk policy (All users) Require phishing resistant authentication for admins Enable User risk policy (All users)

A single break glass account is excluded from these policies and this account should never be used except in an emergency. Alerts are required on this account.

1. Force MFA conditional access policy

Multi-factor authentication can be forced for all users except the break glass account. This uses the “Require authentication strength” policy and the tenant can set the default strength as required.

Add the following policy ( Force MFA All users except break glass account )

2. Require Terms of Use policy

Add a Require Terms of Use for app ( App Require Terms of Use ) policy. You can use Microsoft Entra ID to force the users of the tenant and all the client apps to except the terms of conditions required by the tenant and the hosted applications.

The terms of use needs to be added to the Azure tenant:

https://learn.microsoft.com/en-us/entra/identity/conditional-access/terms-of-use

The policy can be created for the terms of use. See the Microsoft docs for details.

https://learn.microsoft.com/en-us/entra/identity/conditional-access/require-tou

3. Block legacy authentication

Block the legacy authentication in the tenant. The Client apps should select only the Exchange ActiveSync clients and Other clients and the access must be blocked.

https://learn.microsoft.com/en-us/entra/identity/conditional-access/howto-conditional-access-policy-block-legacy

4. Enable Sign-in risk policy

You can activate the sign-in risk and choose how strict. If a risky sign-in is detected, the user is required to do a multi-factor authentication. This requires a P2 license for user accounts. See the Microsoft docs for details:

https://learn.microsoft.com/en-us/entra/id-protection/howto-identity-protection-configure-risk-policies

5. Require phishing resistant authentication for admins

Phishing resistant MFA should be applied to app administrator workloads. This can be created from the Azure provided template.

https://learn.microsoft.com/en-us/entra/identity/conditional-access/how-to-policy-phish-resistant-admin-mfa

The policy is applied to the Azure roles:

Global Administrator Application Administrator Authentication Administrator Billing Administrator Cloud Application Administrator Conditional Access Administrator Exchange Administrator Helpdesk Administrator Password Administrator Privileged Authentication Administrator Privileged Role Administrator Security Administrator SharePoint Administrator User Administrator

When a user account has one of the Azure admin roles, phishing resistant authentication is required for access to the tenant.

6. Enable User risk policy (All users)

If a user account has a high or medium level possibility that it has been compromised, the user is required to do a multi-factor authentication. Why not Self-service password reset (SSPR)? I don’t really see the point of this if you are using passwordless sign-ins. Without a SSPR for a user with a password, the user-risk is not reset and the user will be forced to MFA again. I am not sure how this policy works with passwordless or phishing resistant authentication flows. This policy only makes sense with the high threat category and the block user. This requires a P2 license for users accounts.

Summary

These are the base policies and further policies can be added depending on the tenant requirements. Some session based controls would normally make sense as well.

Notes

The examples of the continuous access policies are shown and set up using the Azure portal. This would be way better as a terraform script and a fully automated set up using something like Azure DevOps or Github actions.

Links

https://learn.microsoft.com/en-us/entra/identity/conditional-access/howto-conditional-access-policy-block-legacy

https://learn.microsoft.com/en-us/entra/identity/conditional-access/require-tou

https://learn.microsoft.com/en-us/entra/identity/conditional-access/howto-conditional-access-policy-block-legacy

https://learn.microsoft.com/en-us/entra/id-protection/howto-identity-protection-configure-risk-policies

https://learn.microsoft.com/en-us/entra/identity/conditional-access/how-to-policy-phish-resistant-admin-mfa

https://learn.microsoft.com/en-us/entra/identity/conditional-access/concept-continuous-access-evaluation

Monday, 01. April 2024

Just a Theory

Talk: Postgres Yum Extensions

On Wednesday at noon US Eastern / 16:00 UTC., Devrim Gündüz will be giving a talk: “Overview of the yum.postgresql.org architecture, how new RPMs are added, and issues and challenges with distributing RPMed extensions.”

Quick reminder that Devrim Gündüz of EnterpriseDB will be giving a talk at this week’s extension ecosystem mini-summit, an overview of the PostgreSQL Yum Repository architecture, how new RPMs are added, and issues and challenges with distributing RPMed extensions.

The community Yum and Apt repositories are the canonical distributors of PostgreSQL community builds on Linux platforms, with a long history of robust support for multiple OSes, OS versions, and PostgreSQL versions. Devrim will help us understand how the Yum repository works, the inclusion of extensions from the broader community, and the challenges for automatic binary packaging for all publicly-available extensions.

Join us! Note! that if you reserved a spot at a prior mini-summit, sadly you will need to do so again for each subsequent summit or miss out on reminders from Eventbrite.

Need other information or just want an invitation without using Eventbrite, hit me up at david@ this domain, on Mastodon, or via the #extensions channel on the Postgres Slack.

More about… Postgres Yum RPM Extensions PGConf Summit

Sunday, 31. March 2024

Timothy Ruff

Five Failed Blockchains: Why Trade Needs Protocols, Not Platforms

The trade industry needs a “universal translator” First we.trade failed in mid-2022. Then TradeLens at the end of 2022, then Marco Polo in early 2023, then Contour in late 2023. Of the five major trade-related blockchains only Komgo has survived, but only after ditching blockchain. The Trade Blockchain Graveyard Each failed blockchain had enormous support and resources at the outset: We.tra
The trade industry needs a “universal translator” First we.trade failed in mid-2022. Then TradeLens at the end of 2022, then Marco Polo in early 2023, then Contour in late 2023. Of the five major trade-related blockchains only Komgo has survived, but only after ditching blockchain. The Trade Blockchain Graveyard

Each failed blockchain had enormous support and resources at the outset:

We.trade, launched in 2017 and built in collaboration with IBM on Hyperledger Fabric, boasted Deutsche Bank, HSBC, Santander, Societe Generale and UBS.

TradeLens, launched in 2018 and also built on Fabric, had industry giants IBM and Maersk.

Marco Polo, launched 2019 and built on R3’s Corda, had Commerzbank, BNY Mellon and SMBC with financial backing from ING Ventures and BNP Paribas.

Contour, launched in 2020 and also built on Corda, had ANZ, BNP Paribas, HSBC, and Standard Chartered banks along with a strong contingent of trade integration and documentation partners including Finastra, CargoX, Bolero and Surecomp.

Komgo, launched in 2018 on Quorum blockchain infrastructure and dedicated to trade finance, still boasts Citi, ING, and a dozen others partners and has over 100 customers apparently still using the platform, but has since ditched blockchain.

Despite all the money and power behind them, all major trade blockchains have failed in a surprisingly short period of time. There are several other blockchains dedicated to trade, but these five are the big ones.

Why Business Blockchains Fail

So why all the embarrassing failures? Ledger Insights explains the troubling trend this way:

“In most cases, the issue was a failure to achieve market fit and scale before the money ran out rather than any particular blockchain technology.”

I agree, blockchain’s technology didn’t cause these failures. Blockchains fail because they are platforms — an assertion I defend in detail below — and as we’ve seen in industry after industry, it’s nearly impossible to get entire industries to join the same platform. Blockchains succeed when they out-compete other blockchains for adoption, no different than centralized platforms.

What’s needed to enable secure global digital interoperability for trade are protocols, not platforms, specifically protocols for the secure peer-to-peer exchange of verifiable trade instruments. (<< Read the linked essay, it’s important.) While such protocols aren’t the only thing needed to achieve this elusive objective —standard schema also comes to mind , and is discussed below— protocols are necessary to escape the fatal limitations inherent to blockchains and other platforms.

How Blockchains Are Platforms

Blockchains inherently follow the same “EUM” pattern of the centralized, proprietary platforms they claim to replace, screaming “Everybody Use Me”, or more accurately, “Everybody use my service”. And though permissionless blockchains like Bitcoin and Ethereum may have decentralized governance — which is more than cool — business blockchains have their governance centralized around a single decision-making body, typically a consortium, that makes all the rules.

But regardless of whether a blockchain has decentralized or centralized governance, one common, inescapable fatal flaw remains: a blockchain is a single logical database that all counterparties must agree to use; a singular, rent-seeking source of truth that sits in the middle of all interactions just like proprietary platforms do.

Blockchains are not truly peer-to-peer, despite their claims. The fact that all participants depend upon the same master set of data — along with its bespoke technology, economics, security, and governance — is why there are so many blockchains competing for adoption, and with near-zero interoperability between them.

The problem isn’t with blockchain technology, it’s the blockchain model itself.

Platforms vs. Protocols

All blockchains are platforms, and are not protocols. This is not a subjective distinction. There is a simple way to determine whether something is a platform or a protocol: all platforms have one telltale sign in common: a rent-seeking entity or network that sits in the middle of all interactions. A protocol, such as TCP/IP, HTML, SMTP, 802.11, etc., is a language used entirely peer-to-peer, with no rent-seeking entity or network between peers and no master dataset generated or depended upon.

Being a platform doesn’t guarantee failure, however, as we see with Komgo. The world is full of successful platforms and Komgo is still around because it out-competed other trade blockchains for adoption, but then ditched their blockchain when they realized that blockchain was slowing them down and a traditional centralized database was better. The fact that they could make such a smooth transition away from blockchain illustrates my point: whether using a blockchain or a database under the hood, the platform-style relationship between Komgo and its users was identical, with Komgo sitting in the middle of every interaction and all participating counterparties paying rent. That’s a fine way to make money but an impossible way to make global trade digitally interoperable, because for twenty years we’ve learned and re-learned the same lesson, in industry after industry: it’s nearly impossible to get all necessary counterparties to use the same platform, no matter how powerful its partners are and no matter how much money it has.

Like all industries that struggle with digital interoperability, trade needs protocols, not platforms. For two decades now the trade industry has endeavored to create an electronic bill of lading to replace the paper one, without success. In my view this lack of success has been because competitive platforms lack the incentive to directly interoperate, seeing it as a competitive sacrifice, and they have no protocol available as an indirect alternative. Traditional proprietary platforms started this EUM, winner-takes-all pattern, then the blockchain consortia came along and promised something different: an exciting new way to break through the paper-to-digital logjam. But these trade blockchains acted no differently and did no better than the traditional platforms, ultimately leaving the industry still dependent on paper and paper facsimiles (PDFs).

In contrast, an open, non-proprietary protocol designed for trade could bring a new alternative: a common language that even competitive trade counterparties can speak to each other that brings interoperability without sacrificing competitive standing.

A “Universal Translator” For Trade

This brings us to the image and its caption at the beginning of this article: The trade industry needs a “universal translator”.

In Star Trek, when wearing a universal translator you can speak to any creature in the galaxy using your preferred language and they’ll hear you in theirs, and when they speak their preferred language you hear it in yours. Now imagine if all creatures in the universe were required to use a single, proprietary rent-seeking platform, network, or dataset in the middle to accomplish this, instead of adopting a common protocol/language… it would never work. The only way to solve such a galactic problem: a common, open protocol that’s adopted everywhere that enables devices to use a common, agreed-upon language with each other while speaking bespoke languages to each wearer.

That’s what protocols are: languages. In the case of Star Trek’s universal translator, the protocol is the language spoken between the devices; it must be free, open, and separate from the language spoken between the device and its wearer. Protocols don’t generate shared datasets like blockchains do and they don’t have rent-seeking actors in the middle. Protocols are like SMTP that enabled email senders and recipients to each build or buy their own favored email clients, rather than having to use a common platform like AOL, Prodigy, or CompuServe to exchange messages. Languages can be spoken peer-to-peer between any two parties without a third-party between them, so they are inherently decentralized.

Digital Containers & Payloads

It’s tempting to think of a data schema as a protocol, and in a sense it can be: for two parties to have semantic interoperability — to understand each other and be able to ingest and utilize received data from each other — there must be some degree of agreement about data schema. But the same is true with paper, which depends on common data elements if not exact schema. For digital environments, standardized schema for trade is being worked out by the ICC DSI and others, but… even ubiquitously agreed-up on data schema does not address the need for securely exchanging the schematized data; that part is still missing, it’s like agreeing on the format of physical mail without considering the need for a mail carrier. In the digital realm that ‘carrier’ must have verifiable security from origination to delivery. Without security, common schema is pointless; without common schema, security is useless.

To be crystal clear on this critical point: even when a data schema has been widely agreed to, without a protocol — a common language — for exchanging data securely, there’s no way to verify the source of the data and that it hasn’t been tampered with, revoked, or expired. This security element was supposed to be supplied by blockchains — which are quite secure, though they don’t address critical identity or key management issues — but their platform-like attributes prevented broad adoption for the reasons listed above, leaving trade back where it started: paper and paper-like facsimiles (PDFs).

What’s needed are protocols for secure data exchange that act like digital ships and containers, with schematized data as the payload. Imagine a trading world where all data is received in the form of a standardized schema that arrived in a secure data container that is instantly verifiable in three critical ways:

The identity of the originators of the data, and of all who contributed to or endorsed it; It hasn’t been tampered with, revoked, or expired since origination; It doesn’t require subscribing to any particular blockchain, network, or proprietary platform.

Having secure, globally verifiable data containers with schematized data as the payload is when the global trading game really changes. And it requires protocols, not platforms, to eliminate competitive sacrifices, maximize adoption, and have limitless scale.

In Conclusion

This is something we are working on at Digital Trust Ventures: free, open protocols for trade that work like digital ships and containers and result in a “universal translator”-like effect for counterparties, enabling each to speak and hear their chosen “language”. And like physical containers, these digital containers will have no opinion about the payload inside; the job of these protocols is to ensure that exchanged data is verifiably not tampered with, revoked, or expired, without need of platforms, networks, other entities in the middle.

More to come, later this year. Stay tuned.

Saturday, 30. March 2024

Wrench in the Gears

Collective Seekers and Degrees of Freedom

For about a week every spring the sugar ants and I have a stand off. With the house going on the market next week, my heart sank to see a few of them wandering around on the counter. While a nuisance, I can see we are cut from the same cloth – foragers, lone seekers. [...]

For about a week every spring the sugar ants and I have a stand off. With the house going on the market next week, my heart sank to see a few of them wandering around on the counter. While a nuisance, I can see we are cut from the same cloth – foragers, lone seekers. I pulled out the Terro ant bait and laid a couple of packs with gooey gel out on the counter where they would be sure to find it. This season their preferred gateway seems to be around the back of the stove. Most years it’s around the sink.

I finished putting together this video around 2am. The house was chilly. As I trundled in my navy polar fleece robe to put my insulated tea mug in the dishwasher, I saw a thick line of teeny, tiny black bodies huddled around the bait. Stigmeric pheromones were at work. A few scout ants had blazed the trail, and now a miniature highway of busy bodies were doing what they do best. Thanks to Stephers’ work I’ve been talking about pheromones and swarm behavior for several years. Still, it’s fascinating to see it in action. Each ant an agent, participating in a complex dance to achieve a group goal in the most efficient way.

The ants don’t realize in their bustling collectivity, they are laying the groundwork for the end of their colony, one of many hidden civilizations that exist among the Lenten Rose and Hostas and under the brick pavers in the back garden. Their behavior is innate and reinforced by the culture into which they were born. The pest control industry knows these ants well and uses their instinct against them. I keep saying we don’t understand the weapons. Heck, I don’t think we even understand the nature of the engagement.

I am a humanities person who chafes at the imposition of the primacy of STEM that has ramped up over the past few decades. I love stories. I love culture. I love the amazing ways humans mold themselves to their environment and one another. When I was in middle school I had a big map on the wall of my bedroom with marks on the many places I wanted to see around the world. I imagined I was going to be an international correspondent. I chuckle now looking back. I have a few cross-border trips under my belt, and with the way passports are going, I expect to spend this final phase of my life not far from a new home. I’ll trade busy airports for quiet mountain streams and a garden and a pile of books. I’ll travel in my mind, perhaps get acquainted with my astral body equipment… It actually seems like a pretty good outcome all told.

In this video I am struggling with ideas of free will, conformity to groups, civilization, faith, and complex systems. It’s certainly not an original question, how many degrees of freedom do we actually have? Ants have an instinct to seek out sources of food for the colony, and yes there are times when after a very rainy week they will be washed out of their regular stomping grounds and end up in my kitchen. And, I am equipped to use their instinct against them. We see it playing out time and time again on the feed. People seeking camaraderie among the organized bits, but how often are they excitedly bringing back poison?

I spent much of my adult life researching civilization as a historic preservationist, a cultural geographer. Was I studying an insidious domestication of the spirit? Or an intricate unfolding of our divine purpose within a collective framework crystallized over countless soul journeys? I’m going to have to get comfortable with the not knowing; at least until I get to the other side.

 

 

 

 

 

 

Friday, 29. March 2024

Aaron Parecki

OAuth: "grant" vs "flow" vs "grant type"

Is it called an OAuth "grant" or a "flow"? What about "grant type"?
Is it called an OAuth "grant" or a "flow"? What about "grant type"?

These are common questions when writing documentation for OAuth-related things. While these terms are all used in RFC 6749 and many extensions, the differences between the terminology is never actually explained.

I wanted to finally write down a definition of the terms, along with examples of when each is appropriate.

flow - use "flow" when referring to the end-to-end process, for example: "the client initiates the flow by..." "the flow ends with the successful issuance of an access token" This can also be combined with the type of flow, for example: "The Authorization Code flow starts by..." grant - use "grant" when referring to the specific POST request to the token endpoint, for example: "The authorization code grant includes the PKCE code verifier..." "The refresh token grant can be used with or without client authentication..." "Grant" also refers to the abstract concept of the user having granted authorization, which is expressed as the authorization code, or implicitly with the client credentials grant. This is a bit of an academic definition of the term, and is used much less frequently in normal conversation around OAuth. grant type - use "grant type" when referring to the definition of the flow in the spec itself, for example: "there are several drawbacks to the Implicit grant type" "the Authorization Code grant type enables the use of..."

Let me know if you have any suggestions for clarifying any of this, or any other helpful examples to add! I'm planning on adding this summary to OAuth 2.1 so that we have a formal reference for it in the future!

Thursday, 28. March 2024

Just a Theory

PGXN v2: Go or Rust?

What programming language(s) should we use to build new and revamp existing PGXN services and tools: Rust or Go? Vote your preference!

TL;DR: I’d like Postgres community input on a decision: Should we build PGXN v2 services and tools in Go or Rust? Context for the question and some weighing of options constitutes the rest of this post, but to skip to the end, 🗳️ Vote your choice! Poll closes April 12 at the end of the day (midnight) New York time.

The PGXN v2 project now under way requires developing or updating several services and tools, including:

A root registry for source distribution A package registry for binary distribution A command line client for developing, building, and distributing extension packages An interactions service for notifications and stats aggregation

And more. Historically, the PGXN tools were written in Perl, which was an ideal choice for me back in 2011, and quite familiar to many members of the core team, but also deeply foreign to most everyone else. Furthermore, its dynamic nature and need for dozens of dependencies in most apps makes installation and packaging a challenge, to say the least.1

In the last ten years I’ve become quite proficient in Go. I appreciate its near system-level performance, memory safety, robust standard library, concurrency design, and short learning curve — especially for web services. But perhaps most eye-opening to me, as a long-time user of dynamic languages, is that, like C, Go compiles an application into a single static binary. Not only that, but Go provides cross compilation natively. This makes distribution incredibly simple.

Distribution Digression

Compare, for example, the Dockerfile for Sqitch, the database change management system I wrote and maintain in Perl. It’s…a lot. Sure there are a ton of system dependencies, but what’s invisible in this file is the weeks of work that went into Module::Build::Sqitch, which performs a bunch of tricks to build the Sqitch “app” as a single directory with all of its Perl dependencies. Don’t get me wrong, the work was worth it for Sqitch, and powers the Homebrew formula, as well. But even there, I’ve not been able to get Sqitch into the Homebrew core because every single dependency requires a checksum, and I’ve not had the time (or energy) to figure out how to generate them.

Contrast with this Dockerfile for a Go service compiled into a binary named thinko:

FROM gcr.io/distroless/base-debian12:latest # TARGETOS and TARGETARCH: https://docs.docker.com/build/guide/multi-platform/ ARG TARGETOS ARG TARGETARCH COPY "_build/${TARGETOS}-${TARGETARCH}/thinko" /thinko/bin/ USER nonroot:nonroot ENTRYPOINT [ "/thinko/bin/thinko" ]

That’s the whole thing. There are no dependencies at all, aside from a few included in distroless image. And where does that image come from? This is the relevant from the project Makefile:

.PHONY: all # Build all binaries all: local linux darwin windows freebsd linux: thinko-linux darwin: thinko-darwin windows: thinko-windows freebsd: thinko-freebsd thinko-linux: _build/linux-amd64/thinko _build/linux-arm64/thinko thinko-darwin: _build/darwin-amd64/thinko _build/darwin-arm64/thinko thinko-windows: _build/windows-amd64/thinko _build/windows-arm64/thinko thinko-freebsd: _build/freebsd-amd64/thinko _build/freebsd-arm64/thinko # Build Thinko for specific platform _build/%/thinko: cmd/thinko GOOS=$(word 1,$(subst -, ,$*)) GOARCH=$(word 2,$(subst -, ,$*)) $(GO) build -o $@ ./$<

This configuration allows me to build thinko for every OS and architecture at once:

$ make thinko go build -o _build/local/thinko ./cmd/thinko GOOS=linux GOARCH=amd64 go build -o _build/linux-amd64/thinko ./cmd/thinko GOOS=linux GOARCH=arm64 go build -o _build/linux-arm64/thinko ./cmd/thinko GOOS=darwin GOARCH=amd64 go build -o _build/darwin-amd64/thinko ./cmd/thinko GOOS=darwin GOARCH=arm64 go build -o _build/darwin-arm64/thinko ./cmd/thinko GOOS=windows GOARCH=amd64 go build -o _build/windows-amd64/thinko ./cmd/thinko GOOS=windows GOARCH=arm64 go build -o _build/windows-arm64/thinko ./cmd/thinko GOOS=freebsd GOARCH=amd64 go build -o _build/freebsd-amd64/thinko ./cmd/thinko GOOS=freebsd GOARCH=arm64 go build -o _build/freebsd-arm64/thinko ./cmd/thinko

Those first two commands build thinko for Linux on amd64 and arm64, right where the Dockerfile expects them. Building then is easy; a separate make target runs the equivalent of:

$ docker buildx build --platform linux/arm64 -f dist/Dockerfile . $ docker buildx build --platform linux/amd64 -f dist/Dockerfile .

The --platform flag sets the TARGETOS and TARGETARCH arguments in the Dockerfile, and because the directories into which each binary were compiled have these same terms, the binary compiled for the right OS and architecture can be copied right in.

And that’s it, it’s ready to ship! No mucking with dependencies, tweaking system issues, removing unneeded stuff from the image. It’s just the bare minimum.

This pattern works not just for Docker images, of course. See, for example, how [Hugo], the Go blog generator, releases tarballs for a bunch of OSes and architectures, each containing nothing more than a README.md, LICENSE.md, and the hugo binary itself. This pattern allows both the Hugo Homebrew formula and its Dockerfile to be incredibly simple.

Back to PGXN

I very much want these advantages for the next generation of PGXN tools. Not only the services, but also the command-line client, which would become very easy to distribute to a wide variety of platforms with minimal effort.

But there are other variables to weigh in the choice of language for the PGXN servers and tools, including:

Familiarity to other developers: Ideally someone can quickly contribute to a project because they’re familiar with the language, or there’s a short learning curve.

Safety from common issues and vulnerabilities such as buffer overflows, and dangling pointers.

Tooling for robust and integrated development, including dependency management, testing, distribution, and of course cross-compilation.

Decisions, Decisions

In my experience, there are two language that fulfill these requirements very well:

🐿️ Go2 🦀 Rust

Which should we use? Some relevant notes:

I expect to do the bulk of the initial development on PGXN v2, as the only person currently dedicated full time to the project, and I’m most familiar with Go — indeed I enjoy writing web services and CLIs in Go!. I’d therefore be able go ship Go tools more quickly.

But I’ve played around with Rust a number of times over the years, and very much would like to learn more. Its syntax and long feature list steepen the learning curve, but given my background in Perl — another language with unique syntax and context-sensitive features — I’m certain I could become incredibly proficient in Rust after a few months.

My employer, Tembo, is a Rust shop, and we’ll likely borrow heavily from the trunk project, especially for the CLI and binary registry. It would also be easier for my coworkers to contribute.

pgrx, the tooling to build Postgres extensions in Rust, has taken the community by storm, rapidly building familiarity with the language among extensions developers. Perhaps some of those developers would also be willing to turn their expertise to PGXN Rust contributions, as well. It’s likely some features could be borrowed, as well.

Sadly, the plgo project appears to have stalled, so has not built up the same community momentum.

This leaves me torn! But it’s time to start coding, so it’s also time to make some decisions. Should PGXN v2 services and tool be:

🐿️ Written in Go 🦀 Written in Rust 🐿️ + 🦀 Some of each (e.g., Go for web services and Rust for CLIs)

What do you think? If you were to contribute to PGXN, what language would you like to work in? Do you think one language or the other would be more compatible with community direction or core development?3

Got an opinion? 🗳️ Vote! Poll closes April 12 at the end of the day (midnight) New York time.

And if those choices aren’t enough for you, please come yell at me on Mastodon, or via the #extensions channel on the Postgres Slack. Thanks!

Ever wonder why PGXN isn’t hosted by community servers? It’s because I screwed up the installation trying to balance all the dependencies without wiping out Perl modules the systems depend on. 🤦🏻‍♂️ ↩︎

Pity there’s no gopher emoji yet. ↩︎

I can imagine a future where an extension CLI was included in core. ↩︎

More about… Postgres PGXN Go Rust Perl

Werdmüller on Medium

An intrusive thought about Trump 2024

What if the worst happens? Continue reading on Medium »

What if the worst happens?

Continue reading on Medium »


The Pragmatic Engineer

The Pulse #87: Stripe’s investment in reliability, by the numbers

The Fintech giant spends more on running test suites than Agoda does for all its infra. Plus, why taking out a loan for equity can backfire, and why did Donald Trump’s social media company use a SPAC?

The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.

Today, we cover:

Industry pulse. The EU’s AI regulation; Reddit’s successful IPO; No raises at Amazon, and more.

Stripe’s investment in reliability, by the numbers. Stripe uses more hardware just to run their test suite (500,000 CPU cores) than travel booking platform Agoda’s complete infrastructure footprint. And more interesting details straight from the Fintech giant.

SPACs and Truth Social. 2020-2021 saw a boom in tech companies sidestepping the IPO process to get listed on public markets by merging with a “shell company.” Donald Trump’s company followed this exact same path: and now the company is worth almost as much as Reddit, despite having less than 1% of Reddit’s revenue.

Loans to exercise Bolt options early was a terrible idea. The cofounder and former CEO of the one-click-checkout offered loans to employees for the early exercising of stock options in the company. He touted it as the “most employee-friendly options program possible,” and convinced more than half of staff to sign up. He ignored warnings about the risks of this from peers – who were right.

1. Industry pulse

Read more


@_Nat Zone

vLEI関連企画である KERI、ACDC、CESRの3つ共に4月20日までのパブリックレビューにかかっています

Trust Over IP Foundation (ToIP) で、KERI, ACDC, CESRの3つの仕様がパブリックレビューにかかっています。GLEIFが主導している vLEI (Verifiable Legal Entity Identifier) 関連の規格です。締め…

Trust Over IP Foundation (ToIP) で、KERI, ACDC, CESRの3つの仕様がパブリックレビューにかかっています。GLEIFが主導している vLEI (Verifiable Legal Entity Identifier) 関連の規格です。締め切りは4月20日です。

AUTHENTIC CHAINED DATA CONTAINERS (ACDC) TASK FORCE ANNOUNCES PUBLIC REVIEW

Key Event Receipt Infrastructure (KERI) specification Authentic Chained Data Containers specification (ACDC) Composable Event Streaming Representation specification (CESR)

ToIP曰く

この仕様一式は、識別子、「クレデンシャル」(脚注参照)、および認証の真に分散化された、真正かつ検証可能なエコシステムを構築するための青写真を提供します。

この仕様には、一連のユニークで革新的な機能が記述されています:

鍵の事前ローテーションによる、真に無制限な期間識別子の実現; 暗号化ルート・オブ・トラスト; 連鎖した「クレデンシャル」[脚注参照]と、完全に検証可能な所有者証明および作者証明; 妥協のないスケーラビリティのためにルックアヘッド・ストリーミングをサポートする独自の特性を備えた、テキストとバイナリの両方の表現に等しく最適化されたシリアライゼーション・フォーマット。

この仕様スイートには、Out-Of-Band Introduction、自己アドレス識別子、および組織アイデンティティの包括的なソリューションを提供するために必要な署名付きコンテナの画期的な「パス署名」アプローチなどのサブ仕様が追加されています。

(出所)ToIP

GLEIFの主導しているvLEIではこの仕様が使われているということで、影響が大きくなることが予想される仕様です 。一時はIETFで規格化するという話でしたが、諸般の事情があり、ToIPで規格化することになったようです。ToIPにとっては初めての規格化になるようです。

残念ながらわたしは読む時間はなさそうですが、暗号関係者の方々にはぜひ読んで頂いてコメントをしていただければと思います。

Wednesday, 27. March 2024

Phil Windleys Technometria

Web 2.0 is Collapsing Under its Own Weight

The overhead of performing even simple tasks online is getting larger and larger. I question the security of almost all these supposedly "secure" messaging systems. And I'm tired of the 'Utopia of Rules' mindset pervasive in every organization. It's exhausting how they expect customers to constantly adapt to their needs.

The overhead of performing even simple tasks online is getting larger and larger. I question the security of almost all these supposedly "secure" messaging systems. And I'm tired of the 'Utopia of Rules' mindset pervasive in every organization. It's exhausting how they expect customers to constantly adapt to their needs.

I don't know if you recall the game Kerplunk. It's a classic children's game that has been around for decades. I remember playing it with my sister. The basic setup involves a transparent plastic tube, a number of sticks, and marbles. The sticks are threaded through the tube to form a web or nest at the bottom on which the marbles rest. We'd take turns removing a stick at a time, trying not to let any marbles fall through the web and out of the tube. At some point, the remaining sticks can't hold the marbles and everything falls down.

Share

The modern web reminds me more and more of a big Kerplunk game and I think the marbles are about to fall. What started out as an easier way to do things like shop, bank, and get health care information has become increasingly complex over time. More and more of the email I receive seems to be simply directing me to log into some bespoke system to retrieve a message or engage in some workflow. And even with a password manager, the act of logging in is often a chore with different user interfaces, custom MFA requirements, and weird rules for passwords. Once you're on the system, session time-outs induce their own form of anxiety since stepping away for a few minutes to attend to something else might require going through the whole Kafkaesque process all over again. The modern web has turned into a dystopian theater of the absurd where even reading a simple appointment reminder from your doctor requires several minutes of stress-inducing interaction with baroque systems and processes.

And it's not just doctors, of course, banks, government agencies, hospitals, ecommerce sites, and customer service systems all adopt these special purpose messaging systems. If you ask these organizations why they use bespoke messaging systems, they'll list things like "timely and improved communication," "convenience," and "privacy and security." But the real reason is that it's more convenient for them because these systems are integrated with their backends and make their processes more manageable. There's certainly nothing about them that's more convenient, timely, or better than email for their customers1.

I also question the privacy and security premise. Email can be insecure. And your email provider can see the contents of your emails. But the messaging system run by your doctor or bank is likely less secure than the email systems run by Apple, Google, and the others. And achieving privacy by making everything incompatible so that you have to use a different system for each correspondent is like chopping off your finger to prevent hangnails.

How did we get here? Bureaucracy. Not just government bureaucracy, but bureaucracy of all kinds. In Utopia of Rules2, David Graeber talks about how power imbalances force the less powerful group to perform what he calls interpretive labor, the work of understanding and implementing what's better or more convenient for the more powerful partner. People are not equal participants in online interactions. We don't have the tools to be fully embodied online3. Because of this we are forced to play by the rules organizations online who are digitally embodied with servers, identity systems, customer management systems, and so on. And part of that is being forced to use their inconvenient and anemic messaging systems.

What's the answer? People need tools. I think digital wallets (a bad name for an important tool), autonomic (peer) identifiers with strong cryptography, and verifiable credentials are a huge step forward. These tools provide the means for people to be peers online rather that mere ghosts in someone else's machine. That's why I insist on using the term self-sovereign rather than decentralized to describe these systems. Cogito Ergo Sum.

Subscribe now

Notes

For a deeper dive into why one-off messaging systems are never as good as email, see Rich Sharing and Personal Channels. Email and other useful messaging systems exhibit a property called rich sharing that makes them much more robust that the simple idea of "sharing a message" would bring to mind.

If you're interested in power imbalances and how they come about, I can't recommend Graeber's book highly enough. He had such a keen understanding of this problem and wrote about it in a way that's both informative and entertaining.

I talk about this in more detail in Chapter 17 of Learning Digital Identity when I discuss authentic digital relationships.

Photo Credit: Playing Kerplunk from DALL-E (public domain) Prompt: Draw a picture of a boy and girl playing kerplunk that's 1200x500 pixels


@_Nat Zone

Googleマイアドセンターを見て自分がどう認識されているか見てみよう〜ちなみにわたしは18〜34歳独身女性になってたw

Googleマイアドセンターは、Googleがユーザーのオンライン行動から収集した情報を基に、ユーザーのプロフィールを推定し、それを本人に開示するためのプラットフォームです。主な特徴は以下の通りです。 マイアドセンターは、Googleがユーザーをどのように理解しているかを本人に開…

Googleマイアドセンターは、Googleがユーザーのオンライン行動から収集した情報を基に、ユーザーのプロフィールを推定し、それを本人に開示するためのプラットフォームです。主な特徴は以下の通りです。

興味関心カテゴリ: Googleがユーザーのウェブ閲覧履歴などから推定した、ユーザーの興味関心のあるカテゴリを表示します。 人口統計情報: 年齢層や性別など、Googleが推定したユーザーの人口統計情報を確認できます。 広告設定: ユーザーは自分のプロフィール情報に基づいたパーソナライズド広告の表示を管理・オプトアウトできます。 情報の透明性: Googleがユーザーをどのように認識しているかを本人が確認できるため、情報収集の透明性が高まります。 プライバシー管理: ユーザーはマイアドセンターから、Googleによる情報収集・利用に関するプライバシー設定を変更できます。

マイアドセンターは、Googleがユーザーをどのように理解しているかを本人に開示することで、情報の透明性を高め、ユーザーのプライバシー管理を促進するためのツールです。ユーザーは自分のプロフィールを確認し、必要に応じて設定を変更することで、パーソナライズド広告などのサービスを自分の好みに合わせて管理できます。

というわけで、早速行って、Googleが膨大な情報をつかってわたしをどのように認識しているかを見てみましょう!みなさんもご一緒に

(図表1)Googleはわたしを女性、18〜34歳、独身 etc とみなしているようだ(出所)Google

どうやら、Googleさんは私のことを18〜34歳女性、独身、大企業所属と認識しているようですw。このくらいの女性って、プライバシーを守るために画面の右上に出ているような老人男性のアバターを使う傾向があるのでしょうか…。なお、正解はアラ還暦既婚男性中小企業所属です。1

みなさんはどう出ましたか?面白いのが出たら、ぜひコメントに書き込んでください。

(図表2)推測に使われているアクティビティ(出所)Google 追伸

ちなみに、Googleがユーザーの興味関心カテゴリを推定する際には、主に以下の情報源が使われているとのことです。

検索履歴: ユーザーがGoogleで検索したキーワードやフレーズから、関心のあるトピックを推定します。 ウェブ閲覧履歴: Googleアナリティクスを導入しているサイトを訪問した際の閲覧履歴から、ユーザーの興味関心を推測します。 YouTubeの視聴履歴: ユーザーがYouTubeで視聴した動画の内容から、関心のあるカテゴリを割り出します。 Googleアカウントの情報: ユーザーがGoogleアカウントに登録した情報(年齢、性別、職業など)も、興味関心の推定に活用されます。 Androidデバイスの使用履歴: Androidスマートフォンやタブレットを使用している場合、インストールしているアプリやアプリの利用状況から興味関心を推測します。 位置情報: ユーザーの位置情報から、よく訪れる場所やその場所に関連するカテゴリを推定します。

これらの情報を機械学習アルゴリズムで分析することで、Googleはユーザーの興味関心カテゴリを自動的に推定しています。ただし、ユーザーがプライバシー設定でこれらの情報の収集を制限している場合は、推定の精度が下がる可能性があります。

また、興味関心カテゴリは定期的に更新されるため、ユーザーの関心の変化に合わせてカテゴリも変化していきます。

しかし、行動からのプロファイリングってうまくいかないものなんですね。わたしの場合はまぁ良いですが、こうした間違ったプロファイリングはプライバシー上の被害を産むこともありますから、きちんとチェックしたいものです。 

Tuesday, 26. March 2024

The Pragmatic Engineer

Thriving as a Founding Engineer: Lessons from the Trenches

Being a founding engineer at an early-stage startup is a vastly different, broader role than many people think. Lessons from “serial” early-stage and founding engineer Apurva Chitnis.

👋 Hi, this is Gergely with a subscriber-only issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. To get articles like this in your inbox, every week, subscribe:

Subscribe now

Q: “I’m a software engineer joining an early-stage startup as one of the first engineers. What’s my job, how should I build the product, and how can I help my startup succeed?”

It’s common for early-stage startups to hire “founding engineers,” who are so called because they’re among the very first recruits. A couple of distinguishing features of the role are generous equity packages (between 0.1 and 1% of company stock), and the fact that expectations are frequently broader than typical developer roles. Apurva Chitnis is the ideal person to tell us what makes being a founding engineer unique, and to answer questions in the kind of detail that only comes from being on the ground, doing it. And Apurva certainly has done it, almost on repeat:

An early engineering hire at Improbable, a metaverse virtual worlds startup. Among the first 50 hires; stayed for 4 years, until the company reached 650 headcount.

Founded Sidetrack, assuming the founding engineer and CTO roles. Sidetrack was an early-stage company focused on reducing cloud costs.

Joined as a founding engineer at Koodos (and later became their CTO), building an app called Shelf. Shelf connects users with the media which they and their friends love (check out Apurva’s shelf here). His team currently consists of three founding engineers, and he remains hands-on.

In this issue, Apurva covers:

The goal of a founding engineer. Find product-market-fit (PMF) before the money runs out. It’s a race against time.

Approaches for success. Choose engineering approaches based on how much uncertainty exists, balance scope/quality/timeline, and more.

Tech debt. Startups can use tech debt to achieve a PMF quicker, in ways which large companies don’t. Tech debt can be an advantage early on, so use it!

Talking to customers and end-users. Spend some time away from the code, interacting with customers and end-users directly.

Process problems are features at startups, not bugs! Expect plenty of things to break, and fixing them to not be a priority. This may be purposeful, and you may encounter extreme cases, like a successful startup with no version control.

Why join an early-stage startup. Impact, learning, seeing all parts of the business, and more.

Why avoid early-stage startups. Long hours, compensation, and uncertainty are challenges.

Finally, in the takeaways we’ll go over how Uber was almost shut down a few months after hiring its first mobile engineer, Jordan Bonnet. He and I worked together for 2 years, including during the ride-hailing app’s epic “YOLO” rewrite. Jordan happens to be available for hire, so if you're looking to build a world-class mobile experience, then reach out to him!

Relatedly, also check out From the trenches: working at a startup vs in a Big Tech. To keep up with Apurva’s writing, subscribe to his newsletter

With that, it’s over to Apurva.

1. The goal of a founding engineer

Before discussing how to work at a startup as a founding engineer, let’s first define what a startup is. So, a short story.

During my career, I’ve attended hundreds of all-hands meetings, when the whole team comes together for important company events, like introducing new recruits, strategy changes, and to share updates and successes. Among them all, one at Improbable has stuck with me.

The chief of staff sketched out our financials in the all-hands; plotting costs, revenue, and predicted revenue growth. He combined them to calculate how our bank balance would change over the next year. This led to a painful realization: our startup was going to run out of money and go bankrupt – unless things drastically changed.

A diagram like Improbable’s chief of staff drew up, showing profit was only possible if revenue shot up

This was a bold call to action for us all, to focus all our energy on projects that increased product usage and therefore revenue. Everything else had to be ignored!

Here is a different way to look at the same idea, from the point of view of how much investment and eventually profitable company needs to “burn” at first:

The race to product-market-fit

All startups begin as “default dead,” meaning their starting trajectory inevitably leads to bankruptcy. According to a 2022 research by Startup Genome, 90% of startups fail and shut down. This is a truth you need to acknowledge and lean into: that if things don’t change, the business will fail.

However, there is a proven path to avoid this outcome:

Build something people want to use; aka finding product-market-fit (PMF)

Grow user base, also called finding product-channel fit

Monetize usage and create a profitable business by following the steps above

A software engineer’s responsibilities evolve with the company; from founding, to an exit via an IPO or acquisition, hopefully.

A founding engineer’s goal: find product-market fit!

As an engineer, your goal is to help your team achieve its goals. As a founding engineer, your team is the company, so the first priority is to build something people want. This is the product-market fit. Without PMF, companies struggle to grow, retain users, raise funding, and become profitable. If no PMF is identified, startups end badly.

Your goal is to find this all-important PMF, and don’t let anyone tell you otherwise. This constraint affects all aspects of a founding engineer’s work:

Technologies used. Be pragmatic. This generally means using standard, ‘boring’ technologies you're familiar and proficient with, instead of shiny, unproven technologies.

Managing technical risks. Do this while balancing short and long-term priorities. Using your knowledge of the product and tech to pragmatically balance risk, short term features, and longer term enablers.

Getting user feedback. Talk to users early and often. Go deep to understand their motivations, behaviors, and what they're trying to achieve.

Attitude towards bugs, technical debt, and product quality. That phrase again: be pragmatic. Use technical debt tactically to accelerate development, and use opportunities to pay it off as you build.

The software development life cycle. Startups have many superpowers to use to their advantage. The ability to iterate fast is one, so use it! The development life cycle is generally much shorter and simpler than at larger companies – see more in Quality Assurance Across the Tech Industry.

Collaborating with others. Working with design, user research, sales, and other teams. These are your partners; you're collectively trying to solve the problem of finding the PMF.

Relationship with end users and customers. See this as a collaboration in jointly building a delightful product.

Using data for product decisions. At an early-stage company with few users, qualitative user research (talking to users to gauge their needs and motivations,) is essential because it's the only way to understand them deeply enough to build what they want. Conversely, quantitative research like data and statistics about aggregate or longitudinal user behaviors, is valuable later for measuring retention and growth.

If you’re working in a leadership role, the need to find PMF also affects how you build and organize your team:

How you manage

How you prioritize features and set product strategy

How you set technical strategy

Who you hire

It’s necessary to think beyond writing code, building features, and solving user needs. A founding engineer’s aim is to build out a company-wide system which enables the business to find its PMF.

Remember, you’re a founding employee. Success means achieving PMF for the company, and then taking it to the next level. Make this your focus when taking decisions!

The race to product-market fit. Without finding this: the company would eventually run out of money! 2. Approaches for success

As a founding engineer, there are countless decisions to make which impact the product and tech stack, such as:

How scrappy or polished is the minimum viable product (MVP)

Whether to outsource functionality to a vendor, or build and maintain it in house

Whether to build fast and dirty, or with scale and future requirements in mind

So, how do you make these choices, and how do decisions change as a product evolves? A founding engineer must make them pragmatically, using knowledge of the business, product, and users, for guidance.

Business uncertainty is a big driver in engineering decisions

Begin by taking these factors into account during design decisions:

Users’ needs and product’s value to them. How well do you understand your users and their needs, and how confident are you that the product meets them? If your answer is “not very well” – which is likely in the early days – then it’s best to avoid technical decisions which are expensive or limit flexibility.

Product roadmap and vision. Use your knowledge of the roadmap and the uncertainty it contains, to guide how you build enablers which support upcoming features.

Your team’s expertise. The more knowledge and expertise your team has in a technology, the more confidently you can make bold, complex design choices using it. Stick to what you know well!

Tech debt. Is tech debt a continuous source of bugs which limits engineering velocity? If so, building a new feature might be an opportune time to pay down tech debt (we cover this in the next section.)

You’ll notice most of these constraints are external to engineering; they’re about the business itself!

Prioritize flexibility during uncertainty. The younger a startup is, the more “cloudy” are the user, product, value proposition, channels, and business model. Early decisions are based on high-level vision and intuition about what should work. As you test ideas in the real world, you’ll need to iterate fast. So, as an engineer, prioritize flexibility in your decisions so the product can change rapidly and evolve.

As you iterate and the product (hopefully) moves towards PMF – not usually a straightforward process! – you develop more understanding of its requirements, meaning you can justify investing time in building an architecture that targets users’ needs and the product’s aims.

Evolve your technical decision making as the product evolves

Your decisions must evolve with the product. The right decision on day 1 is almost certainly not the right one on day 365. This was our experience of building our app, Shelf, over the past year. As a reminder, Shelf is an app that connects users with the media they and friends love.

Start simple. We started with a basic data model. New songs a user listened to, TV shows and movies they watched, and books they read, were stored as new rows in an append-only table called activity:

The initial data model was an SQL table with three columns and an append-only structure

This spartan data model allowed us to build a highly-functional minimum viable product (MVP.) It was even flexible enough for us to implement new features, including ones we didn’t think of building our MVP, such as weekly recap, which allows users to see a summary of the media they had consumed in the past week; and history, which allows users to see other songs, books, tv shows and movies that had previously been on their Shelf.

Notice the cracks. However, as our product became more functional and complex, cracks started to show:

Implementing new features took ever longer, dragging down engineering velocity.

Fetching data resulted in complex queries to the database, which were increasingly sluggish and challenging to write. Getting the top songs, artists, and so on for weekly recap often required aggregating 1000s of rows of data and performing complex joins and logic. The user experience became noticeably laggy.

Bugs were hard or even impossible to solve, resulting in poor user experience. For example, users wanted to see how the book that they’re currently reading changed over time. Our data model simply didn’t store this data in an easy-to-retrieve way, and so we couldn’t display it to them!

Migrate when it’s really necessary. Too many of these issues piled up for us, so we decided to migrate to a data model better suited to the product’s and the engineering team’s needs. By this point, we had a better understanding of our app, its constraints and opinions, and the abstractions we wanted to work with as engineers. Given this context, we were able to design a more effective data model.

There were two key insights. The first was to store history as a first-class concept in the data model. In particular, we stored how a user’s top song, top album, recent TV show or book changed over time:

The data model was then extended with the history table

Understanding how a user’s top artist changed became as simple as digging into two rows of data in this table. Previously, we’d had to aggregate thousands of rows in the activity table! 

The second key insight was to store progress, such as play counts, pages read, and current episodes, as a first-class object in our data model:

Introducing a table to store progress

Understanding a user’s progress through a TV show, movie, book, etc, was much easier, as we simply needed to look up the relevant row!

Building this new data model required much greater investment in time to design and implement it, compared to data model v1. Our work paid off, though. The new data model allowed us to build new features faster and at higher quality.

When we started building Shelf, many things were uncertain. We didn’t really know which features would resonate with users, and our vision of the app was blurry. But after a year, we had gained more certainty about the product’s direction and we were able to make more “expensive” decisions with greater confidence, such as investing in a better data model.

Your job is more than building features

Building features is one way to help a business achieve its goals, but it’s not the only thing in your toolbox. Founding engineers use tools which have nothing to do with code, at all!

At Sidetrack – a company I cofounded to reduce cloud hosting costs – my most productive period was when I wrote very little code. This was early in the company's lifecycle, when we believed we could reduce cloud costs by shutting down virtual machines that had low utilization, thereby saving their expense. But how could we test this hypothesis?

Building an MVP of a working product would have taken months. We were in the business-to-business (B2B) market, and our target customers were large enterprises with several dedicated infrastructure teams. These orgs would obviously not let us run a service that wasn’t properly tested and vetted. There was also the problem of sales; even if we did pull off a rock-solid MVP, we didn’t have anyone to sell it to!

So, step one was to find companies potentially interested in working with us, as we built the real version of the product. Here’s how we went about locating them:

Find customers. Pinpoint businesses spending too much on infrastructure.

Get their attention. Demonstrate how much they could save with our product.

Collaborate with them. Work alongside them as we built an MVP and a production-ready version.

Step 2 was key for progressing to PMF. We had no problem finding customers, but we needed to get their attention. Otherwise, there was no point building anything.

Instead of starting to build the “real” product, I wrote a scrappy Python script. This script demonstrated to users how much money they were wasting. The script only took a few hours to write and was very simple. It had two inputs: the workloads our users wanted to run, and their CPU and memory demands, and also the infrastructure they were able to run on, and their CPU and memory supply. The script calculated the minimal infrastructure needed to run these workloads, solving the bin packing problem using an off-the-shelf optimisation library.

A suboptimal solution to the bin packing problem that uses more machines than necessary An optimal solution to the bin packing problem, using the fewest-possible number of machines

We then ran this script to calculate how much potential customers were wasting on unused compute. The results were surprising: we found teams were paying up to 4-5x too much, even when accounting for surge capacity! Simply having this information got people excited enough about a real version of our product to sign prospective contracts to continue working together.

These companies were then happy to spend time talking with a young startup because they saw value in our approach. This scrappy script also validated the existence of demand for our solution, not just for securing early customers. This commitment was gold dust when it came to raising investment from VCs, who wanted to see early signs of traction and market validation.

As a founding engineer, the job is to do what it takes to help the company reach its next milestone. This lesson was clear, even as later, we pivoted from cost optimization to something better suited to our backgrounds in product management and engineering. In this example, this meant being creative to get people’s attention.

It also involves other things, such as:

Developing hacky scripts and prototypes, over insisting on shipping production-ready code

Proactively working with sales and marketing, automating tedious and manual workflows, or creating powerful new ones. 

Not limiting interactions to fellow engineers.

Talking to users to better understand their needs and to build community; not assuming founders or product folks do this task.

Building dashboards to analyze data and provide visibility into key metrics, instead of accepting limited information.

Improving internal processes, instead of adapting to broken ones.

Writing documentation, FAQs, and guidelines, which help users use your product, instead of assuming they know how to use it correctly.

Think broadly about your role and contribution as a founding engineer. Don’t limit yourself – mentally or otherwise – to working on product features!

Scope, quality, timeline: pick two

Often, you want to a build a product feature that is:

Complex (broad scope)

Bug-free (is of high quality)

Quickly built

My concrete experience is that you cannot do all three with limited resources, so the pragmatic approach is to choose two. But which ones? It depends on your company and the market. For startups, “quickly” and “high quality” are often the preferred solution because they force you to reduce scope.

You should choose high quality over broad scope at a startup because an unreleased feature is a gamble. You don’t know for sure it will have the impact you expect until it’s released and users’ reactions are in. Implementing functionality takes time, obviously. Reducing scope allows you to learn from users and to iterate fast, giving you time to quickly test another hypothesis and approach if the first doesn’t work.

Reducing scope is also a great forcing function to focus on the core of your idea, pushing you to cut everything that isn’t core. Being able to ruthlessly cut unneeded functionality is a hallmark of an effective founding team.

When deciding the scope, a founding engineer can – and should! – provide context on what gives the best “bang for your buck.” It’s rare for a founding engineer to own the decision of what’s in or out of scope: the founder or product manager owns this decision. However, founding engineers can – and should! – provide context to the product person about which features are the most time-consuming to build, and why. The best founding engineers help their team understand feasibility and complexity, even before the feature is designed, and especially before engineers start building.

This kind of conversation is a regular one at my current startup. Just last week, we decided to de-scope a new feature called “stories.” We cut the scope to limit the feature to current users. The “stories” feature won’t be shown to new users we onboard to the initial version.

A tech stack is merely a means to an end

At Improbable, my first project was to check the correctness of the distributed game engine we were building. I did this by performing an integration test. I outlined my thinking in a design document, listing the taks:

Run different instances of the game engine, simulating real-world users playing a game with tanks, where each player controls a tank and aims to destroy other tanks.

To mimic the chaotic environment of a real-world game, I would add non-player character (NPC) tanks.

Use artificial intelligence (AI) for NPC agents to interact with one another like real-world players would. 

Human players are smart, so I decided I need “smarter” AI. I used reinforcement learning to guide NPC tanks to choose targets, using the learnings from runs to continuously improve strategy. I’d spent the previous year building theoretical machine learning models like this, so figured it was the perfect experience to utilize.

Satisfied with my thorough approach, I shared my proposal with teammates in the expectation of receiving congratulations. Instead, I got immediate pushback! Someone said building would take too long, someone else said it would be near-impossible to maintain the custom reinforcement model-based AI, and another didn’t even think I could make it work, and that it was probably unneeded, anyway.

It turns out I didn’t actually need all this complexity; I didn’t need a complicated, reinforcement model-based AI, even though it would be fun to implement. The goal of the project was to answer the question:

“Is our distributed game engine working or not?”

A very simple simulation would do, wherein tanks interacted with each other using a very simple, rule-based logic (if-then-else). I built this in a fraction of the time that reinforcement model-based AI takes, and finished the project in a couple of weeks, not several months.

It’s tempting to reuse old technologies and patterns – but avoid this temptation. Reusing what I already knew was exactly what I was doing: I had just finished my engineering degree, and spent the year prior building theoretical machine learning models – including reinforcement learning. It was instinctive to use that tool for a design challenge, and I’ve seen this so many times at early-stage startups:

Using an approach that worked at a large or mid-sized company

Copying an approach or using a new product from Hacker News

Taking an approach from university or bootcamp that was taught as the ‘right way’

I don’t recommend copying old approaches, as they often come from very different environments:

Typically post-PMF

In a different market or geography

At a company with a different technical history

A different product

A different problem from the one you’re solving

Don’t forget, all startups and companies are different. If you know which technologies are appropriate for your context, then great – use them! If not, carefully reflect on why technologies you used in previous experiences worked, and whether they’re actually appropriate now.

A final note on the tech stack; as an engineer, you likely take pride in things like writing elegant code using TypeScript, and managing your infrastructure using Kubernetes. However, users and the business don’t care about this! All they care about is impact: the usefulness of the product, and whether it solves their problems.

3. Tech debt

Read more


@_Nat Zone

ナショナルオーストラリア銀行の円卓会議で開会の辞を行いました

3月26日日本時間正午ナショナルオーストラリア銀行のコーポレートデジタルIDラウンドテーブルでオープニングアドレス(開会の辞)を行いました。National Australia Bank (ナショナルオーストラリア銀行)、通称NABはオーストラリア国内最大の資産を持つ都市銀行です…

3月26日日本時間正午ナショナルオーストラリア銀行のコーポレートデジタルIDラウンドテーブルでオープニングアドレス(開会の辞)を行いました。National Australia Bank (ナショナルオーストラリア銀行)、通称NABはオーストラリア国内最大の資産を持つ都市銀行です。

プログラム

コーポレートデジタルIDラウンドテーブルのプログラムは以下のとおりです。(和訳は以下)

Corporate ID in a nutshell

What is the purpose and benefits of Corporate Digital ID? What are the current pain points for ID&V for corporate entities? What are the similar and unique challenges for Corporate Digital ID vis-a-vis Individual Digital ID?   Is a common framework linking Individual Digital ID and Corporate Digital ID necessary/desirable? Why/Why not?

Spotlight on the Global LEI systems (as a potential global and interoperable framework for Corporate Digital ID)

What are the origins of the GLEI, how does it operate and what is its purpose? What are the key adoption challenges and opportunities? A role for global standards? What’s needed for success?

Industry / Government / Cross-border collaboration and partnerships

What is the role of Government/Industry in Corporate Digital ID ecosystems (how does this differ from Individual Digital ID)? What are the opportunities for industry collaboration, public-private partnerships? What are the opportunities for cross border co-operation to promote adoption and interoperability of corporate digital ID systems like the GLEI system? How do we ensure interoperability, accommodate within the broader framework of Digital ID in Australia where this is warranted/desirable

Conclusion and next steps

企業IDの概要

企業デジタルIDの目的とメリットとは? 法人向けID&Vの現在のペインポイントとは? 個人デジタルIDに対する法人デジタルIDの類似点、独自課題とは? 個人デジタルIDと企業デジタルIDをつなぐ共通のフレームワークは必要/望ましいか?なぜ/なぜそうしないのか?

グローバル LEI システムのスポットライト(企業デジタル ID のための潜在的なグローバルで相互 運用可能な枠組みとして)

GLEI の起源、運用方法及び目的は何か? 主要な導入の課題と機会は何か? グローバルスタンダードの役割とは? 成功のために何が必要か?

産業界/政府/国境を越えた協力とパートナーシップ

企業デジタル ID エコシステムにおける政府/産業の役割(個人デジタル ID との違い)とは? 産業界の協力、官民パートナーシップの機会とは? GLEI システムのような企業デジタル ID システムの採用と相互運用性を促進するための 国境を越えた協力の機会とは? 相互運用性を確保し、オーストラリアにおけるデジタル ID の広範な枠組みの中で相互 運用性が保証される/望ましい場合には、どのように対応するか? 背景文献

このラウンドテーブルの背景文献としてはDouglas Arner 他による国際決済銀行(BIS)の論文 「 Corporate digital identity: no silver bullet, but a silver lining」が指定されていました。なお、Douglas Arner氏も本ラウンドテーブルに参加していました。

スピーチ

これを受けて、オープニングアドレスとしてわたしは以下のようなスピーチをしています(和訳はこの後にあります)。

Introduction:

Thank you for the introduction and thank you for inviting me to this roundtable. It is my honour to be with you all. Let me provide a few remarks on Corporate Digital Identity. 

Corporate digital identity, or corporate ID, has the potential to dramatically simplify the identification and verification of companies, reducing the risks and costs of doing business. It can act as an admission ticket for companies to access financial services more efficiently. But developing effective corporate ID systems requires addressing several key challenges. I have enumerated seven of them for today. 

Key points: Corporate ID is more complex than individual ID. A company’s attributes like directors and ownership structure can change frequently and span multiple jurisdictions. Identifying ultimate beneficial owners is a particular challenge.  Company registries play a foundational role as the authoritative source of core company data. But many registries need to enhance data openness, quality, depth and connectivity to better support corporate ID. Often, data do not get updated in a timely manner and we would be looking at stale data.  With Bank-related initiatives: Banks have an opportunity to monetise their KYC investments by providing corporate ID services, thus turning a cost centre into a profit centre. The Open Digital Trust Initiative, jointly launched by the Institute of International Finance and the OpenID Foundation is such an initiative, but face obstacles around cost, data sharing, liability and competition. KYC utilities offer potential but have seen mixed results so far. The Legal Entity Identifier (LEI) provides a global, unique, and interoperable identifier as a starting point for corporate ID. Enhancements like the verifiable LEI and LEI embedded in digital certificates could expand benefits and adoption. However, coverage of the registered companies is yet to be improved. Also, vLEI being built on a completely different technical stack than other verifiable credentials systems may pose adoption challenges.  OpenID, an open standard and decentralized authentication protocol, and its extension “OpenID for Identity Assurance”, which expresses the provenance and quality of the attributes, have significant potential for enabling secure and trusted identity assurance and data sharing between companies and service providers. The Global Assured Identity Network (GAIN) initiative aims to build on banks’ existing KYC processes and the OpenID standard to create a global, interoperable corporate ID and authentication system. It is like building bridges among islands of ecosystems. Interoperability among different ecosystems has been demonstrated through a technical proof of concept, which was led by Dima Postonikov in the Sydney room, but business and operational reality must catch up before it becomes ready to take off.  Decentralized identifiers and verifiable credential models, sometimes built on blockchain, aim to give companies more control over their data while enabling trusted data sharing. However, significant infrastructure investment and maturity are still needed.  Some governments are proactively developing corporate ID infrastructure as a public good, such as the account aggregator framework in India which empowers SMEs to digitally share their data for better access to finance. While it has found tractions in these economies, whether the pattern will propagate to other parts of the world is yet to be determined.  Conclusion:

In summary, corporate ID is progressing but remains fragmented today. There is no silver bullet – a range of stakeholders including registries, banks, service providers and policymakers have important roles to play. Enhancements to the LEI system, OpenID-based initiatives like GAIN, decentralized identity standards, and public infrastructure could help accelerate the development of a trusted and inclusive corporate ID ecosystem supporting improved financial stability, integrity, and access. Political will and multi-stakeholder coordination is essential to realizing the full potential.

I am hopeful that I will find hints to some of the key challenges that I have cited in today’s roundtable.  

以下、自動翻訳です。

はじめに:

この円卓会議にお招きいただき、ありがとうございます。皆さんとご一緒できて光栄です。コーポレート・デジタル・アイデンティティについて少し述べさせていただきます。

コーポレート・デジタル・アイデンティティ、すなわち企業IDは、企業の識別と確認を劇的に簡素化し、ビジネスを行う上でのリスクとコストを削減する可能性を秘めています。企業が金融サービスをより効率的に利用するための入場券として機能します。しかし、効果的な企業IDシステムを開発するには、いくつかの重要な課題に取り組む必要があります。今日はそのうちの7つを列挙してみました。

キーポイント 法人IDは個人IDよりも複雑です。役員や所有形態など企業の属性は頻繁に変更され、複数の法域にまたがる可能性があります。最終的な受益者の特定は特に難しい課題です。 企業レジストリは、企業データの権威ある情報源として中核的な基礎的な役割を果たしています。しかし、多くのレジストリは、企業IDをよりよくサポートするために、データの公開性、質、深さ、および接続性を強化する必要があります。多くの場合、データはタイムリーに更新されず、私たちは古いデータを見ることになります。 銀行関連のイニシアティブ 銀行は企業IDサービスを提供することで、KYCへの投資を収益化し、コスト・センターをプロフィット・センターに変えるチャンスがあります。国際金融研究所(Institute of International Finance)とOpenIDファウンデーション(OpenID Foundation)が共同で立ち上げたオープン・デジタル・トラスト・イニシアチブ(Open Digital Trust Initiative)はそのようなイニシアチブですが、コスト、データ共有、責任、競争などの面で障害に直面しています。KYCユーティリティは可能性を秘めていますが、これまでのところ結果はまちまちです。 取引主体識別子(LEI)は、企業 ID の出発点として、グローバルで一意かつ相互運用可能な 識別子を提供。検証可能な LEI や電子証明書に組み込まれた LEI のような機能強化は、便益と採用を拡大する可能性がある。しかし、登録企業のカバレッジはまだ改善されていない。また、vLEI は他の検証可能なクレデンシャル・システムとは全く異なる技術スタック上に構築され ていることから、採用の課題が生じる可能性がある。 オープンスタンダードで分散型の認証プロトコルである OpenID と、その拡張機能である「OpenID for Identity Assurance」は、属性の実証性と品質を表現するものであり、企業とサービスプロバイダ間で安全で信頼できる ID 保証とデータ共有を可能にする大きな可能性を秘めています。グローバル・アシュアード・アイデンティティ・ネットワーク(GAIN)イニシアチブは、銀行の既存のKYCプロセスとOpenID標準を基盤として、グローバルで相互運用可能な企業IDおよび認証システムを構築することを目指しています。これは、エコシステムの島々の間に橋を架けるようなものです。異なるエコシステム間の相互運用性は、シドニーの部屋で参加しているディマ・ポストニコフが主導した技術的な概念実証を通じて実証されていますが、ビジネスと運用の現実は、それが離陸する準備が整う前に追いつかなければなりません。 非中央集権的な識別子と検証可能なクレデンシャル・モデルは、時にはブロックチェーン上に構築され、信頼できるデータ共有を可能にしながら、企業がデータをよりコントロールできるようにすることを目指しています。しかし、インフラへの多大な投資と成熟がまだ必要です。 インドのアカウント・アグリゲーター・フレームワークのように、公共財としての企業IDインフラを積極的に開発している政府もあります。このような経済圏では牽引役となっていますが、このパターンが世界の他の地域に伝播するかどうかはまだわかりません。 結論:

要約すると、企業IDは進歩していますが、現在も断片的なままです。レジストリ、銀行、サービス・プロバイダー及び政策立案者を含む様々な利害関係者 が果たすべき重要な役割。LEI システムの強化、GAIN のようなオープン ID ベースのイニシ アティブ、分散化された ID 標準、及び公共インフラは、金融の安定性、完全性、及びアクセスの改 善を支援する信頼され包括的な企業 ID エコシステムの開発を加速するのに役立つ可能性があ ります。政治的な意志とマルチステークホルダーの協調が、その可能性を完全に実現するために不可欠です。

本日の円卓会議で私が挙げた重要な課題のいくつかにヒントが見つかることを期待しています。

Monday, 25. March 2024

Just a Theory

Mini Summit Two

A rough transcript my Ian Stanton’s talk, “Building Trunk: A Postgres Extension Registry and CLI”, along with interspersed chat comments.

We had such thoughtful and engaged discussion at this week’s Postgres Extension Ecosystem Mini-Summit! I did learn that one has to reserve a spot for each mini-summit individually, however. Eventbrite sends reminders for each one you sign up for, not all of them.

To reserve a spot and be reminded for forthcoming meetings, hit the Eventbrite page and select a date and hit “Reserve a Spot” for each date you’d like to attend.

Back to this week’s meetup. My colleague Ian Stanton of Tembo gave a great talk, “Building Trunk: A Postgres Extension Registry and CLI”, that provided background on the motivations and problems that inspired the creation of trunk, a binary packaging system for Postgres extensions.

The presentation was followed by 35+ minutes of questions, discussion, and brainstorming, which I’ve summarized below. But first, links!

Video PDF Slides

Now down to business.

Introduction I opened the meeting and introduced Ian Stanton. Presentation

Ian introduced himself and trunk, “an extension registry and CLI supporting Tembo Cloud. Wants to tell a story, starting with backstory.

Tembo founded November 2022, provide managed Postgres solution called Tembo Cloud. Idea is Postgres can be used for so many different things through the power of extensions, so built use-case optimized “stacks” flavors of Postgres powered by extensions and configurations. Super proud of them, including Message Queue, for which we build an open-source extension.

Envisioned ability to install any extension, including user-provided extensions. Knew we’d need an extension management solution. So we built it.

It’s called trunk, an extension registry and CLI, an open-source app for the community that hosts binary packages for extensions, and powers Tembo Cloud’s extension management.

Q1 2023 had build Tembo CLoud v1 with all extensions bundled in containers. But wanted way to install them on the fly, ideally with installable packages. Explored the ecosystem for tool we could use.

PGXN first we found. Love it, backed by the community, been around since 2011, but hosted source code, not binaries. Also little development since 2012.

Apt and Yum repositories are community-backed and are binaries, just what we wanted, but smaller subset of extensions relative to the 1000s available. Thought it would be too time-consuming to add them all through the community process.

*Steven Miller: in chat: “Also with apt packaging, it requires to install to a specific path, but we needed to customize the install path based on what pg_config shows for share lib and package lib dir. That way we could persist extension installations on tembo cloud”

Weighed pros and cons of building one. Pros:

Full control over integration with Tembo Cloud Binary distribution We could build new features quickly We could publish new extensions quickly

Cons:

How will the community react? Recreating the wheel?

Expected to publish 2–3 extension a day, only do-able with a solution we built.

Want to build something meaningful for Tembo Cloud and the community.

Astronomer Registry for Airflow: Built by Astronomer to find modules for Airflow, very well received by the community.

PGXN, Apt, and Yum repos: Wanted to take the best of them and build on it.

crates.io: Wanted a similar great experience for Postgres extensions.

Vision boiled down to discoverability, categories, ratings system, certification, and indexing of cloud provider support.

Want to package any extension, whether SQL, C/SQL, or pgrx.

Simple experience, like cargo publish and cargo install cargo-pgrx.

Eric in chat: “❤️”

Hopes and Dreams: had idea people would magically show up, contribute to the code, and publish their extensions. Wanted to support multiple platforms, architectures, and Postgres versions, and for it to be a one-stop shop for Postgres extensions.

How it works.

CLI and Registry, written in Rust, uses Docker to build extensions. Packages named <trunk-project-name>-<version>-<pg-version>.tar.gz. Published with trunk publish and installed with trunk install, putting all the files in the right places.

Steven Miller in chat: “The approach to use docker for building has been nice. It allows for cross-compile, for example, building for any platform docker supports with the —platform flag”

Registry stores metadata and service web site and API, and uses S3 bucket for the tar-gzip files.

Example building semver extension:

Create Trunk bundle: bitcode/src/semver/src/semver.bc bitcode/src/semver.index.bc semver.so licenses/LICENSE extension/semver--0.10.0--0.11.0.sql extension/semver--0.11.0--0.12.0.sql extension/semver--0.12.0--0.13.0.sql extension/semver--0.13.0--0.15.0.sql extension/semver--0.15.0--0.16.0.sql extension/semver--0.16.0--0.17.0.sql extension/semver--0.17.0--0.20.0.sql extension/semver--0.2.1--0.2.4.sql extension/semver--0.2.4--0.3.0.sql extension/semver--0.20.0--0.21.0.sql extension/semver--0.21.0--0.22.0.sql extension/semver--0.22.0--0.30.0.sql extension/semver--0.3.0--0.4.0.sql extension/semver--0.30.0--0.31.0.sql extension/semver--0.31.0--0.31.1.sql extension/semver--0.31.1--0.31.2.sql extension/semver--0.31.2--0.32.0.sql extension/semver--0.32.1.sql extension/semver--0.5.0--0.10.0.sql extension/semver--unpackaged--0.2.1. sql extension/semver.control extension/semver.sql manifest. json Packaged to •/. trunk/pg_semver-0.32.1-pg15.tar.gz

Package up SQL files, control file, SO files, bitcode files into gzip file.

Once it’s published, API surfaces all this information:

[ { "name": "pg_semver", "description": "A semantic version data type for PostgreSQL.", "documentation_link": "https://github.com/theory/pg-semver", "repository_link": "https://github.com/theory/pg-semver", "version": "0.32.0", "postgres_versions": [ 15 ], "extensions": [ { "extension_name": "semver", "version": "0.32.0", "trunk_project_name": "pg_semver", "dependencies_extension_names": null, "loadable_libraries": null, "configurations": null, "control_file": { "absent": false, "content": "" } } ], "downloads": [ { "link": "https://cdb-plat-use1-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg15-0.32.0.tar.gz", "pg_version": 15, "platform": "linux/amd64", "sha256": "016249a3aeec1dc431fe14b2cb3c252b76f07133ea5954e2372f1a9f2178091b" } ] }, { "name": "pg_semver", "description": "A semantic version data type for PostgreSQL.", "documentation_link": "https://github.com/theory/pg-semver", "repository_link": "https://github.com/theory/pg-semver", "version": "0.32.1", "postgres_versions": [ 15, 14, 16 ], "extensions": [ { "extension_name": "semver", "version": "0.32.1", "trunk_project_name": "pg_semver", "dependencies_extension_names": null, "loadable_libraries": null, "configurations": null, "control_file": { "absent": false, "content": "# semver extension\ncomment = 'Semantic version data type'\ndefault_version = '0.32.1'\nmodule_pathname = '$libdir/semver'\nrelocatable = true\n" } } ], "downloads": [ { "link": "https://cdb-plat-use1-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg14-0.32.1.tar.gz", "pg_version": 14, "platform": "linux/amd64", "sha256": "f412cfb4722eac32a38dbcc7cd4201d95f07fd88b7abc623cd84c77aecc8d4bb" }, { "link": "https://cdb-plat-use1-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg15-0.32.1.tar.gz", "pg_version": 15, "platform": "linux/amd64", "sha256": "9213771ffc44fb5a88726770f88fd13e62118b0f861e23271c3eeee427a23be9" }, { "link": "https://cdb-plat-use1-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg16-0.32.1.tar.gz", "pg_version": 16, "platform": "linux/amd64", "sha256": "8ffe4fa491f13a1764580d274e9f9909af4461aacbeb15857ab2fa235b152117" } ] } ]

Includes different tar-gzip files for different versions of Postgres, the contents of the control file, dependencies; loadable libraries and configurations; and the one extension in this package — some can have many like PostGIS. Then Postgres version support and some other metadata.

What it looks like on the web site, includes README contents, data from the last slide, install command, etc.

This is what installation looks like:

$ trunk install pg_semver Using pkglibdir: "/usr/lib/postgresql/16/lib" Using sharedir: "/usr/share/postgresql/16" Using Postgres version: 16 info: Downloading from: https://cdb-plat-usel-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg16-0.32.1.tar.gz info: Dependent extensions to be installed: [] info: Installing pg_semver 0.32.1 [+] bitcode/src/semver/src/semver.bc => /usr/lib/postgresql/16/lib [+] bitcode/src/semver. index.bc => /usr/lib/postgresql/16/lib [+] semver.so => /usr/lib/postgresql/16/lib info: Skipping license file licenses/LICENSE [+] extension/semver--0.10.0--0.11.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.11.0--0.12.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.12.0--0.13.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.13.0--0.15.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.15.0--0.16.0.sql = /usr/share/postgresql/16 [+] extension/semver--0.16.0--0.17.0.sql => /us/share/postgresql/16 [+] extension/semver--0.17.0--0.20.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.2.1--0.2.4.sql => /usr/share/postgresql/16 [+] extension/semver--0.2.4--0.3.0.sql > /us/share/postgresql/16 [+] extension/semver--0.20.0--0.21.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.21.0--0.22.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.22.0--0.30.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.3.0--0.4.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.30.0--0.31.0.sql = /usr/share/postgresql/16 [+] extension/semver--0.31.0--0.31.1.sql => /usr/share/postgresql/16 [+] extension/semver--0.31.1--0.31.2.sql => /usr/share/postgresql/16 [+] extension/semver--0.31.2--0.32.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.32.1.sql => /usr/share/postgresql/16 [+] extension/semver--0.5.0--0.10.0.sql => /usr/share/postgresql/16 [+] extension/semver--unpackaged--0.2.1.sql => /usr/share/postgresql/16 [+] extension/semver.control => /usr/share/postgresql/16 [+] extension/semver.sql => /usr/share/postgresql/16 *************************** * POST INSTALLATION STEPS * *************************** Install the following system-level dependencies: On systems using apt: libc6 Enable the extension with: CREATE EXTENSION IF NOT EXISTS semver CASCADE;

CLI pulls down the tar-gzip, unpacks it, and puts the files in the right places and tells the users what other commands are needed to enable the extension.

Pause to take a sip of water.

David Wheeler (he/him) in chat: “STAY HYDRATED PEOPLE!”

State of the project. Trunk powers extension management for Tembo Cloud, 200 extensions on the platform, install and enable on the fly. Tembo Cloud likely trunk’s #1 user.

Get lots of site traffic, especially around categorization, addresses the discoverability problem set set out to solve.

Jeremy S in chat: “Interested in insights from site traffic - you mentioned that ‘categorization’ was popular - any other things that traffic patterns seem to suggest you might have done really well, or clearly is needed?”

But pretty minimal community involvement, out fault for not involving the community early on.

Did we solve the problem?

For Tembo Cloud: yes! Trunk is core component of the Tembo Cloud platform that lest us offer high number of extensions. For the community: no! But helped bring more awareness to the opportunities to improve the ecosystem as a community. Saw other solutions arise around the same time, including dbdev and pgxman, and Yurri at Omnigres is working on something as well. Huge opportunity to solve this together. Steven Miller in chat: “I think it is very nice way to install other extensions via an extension how dbdev works” David Wheeler (he/him) in chat: “GRANT!” Grant Holly in chat: “Oh hi”

Lessons Learned

It’s a really hard problem to solve! As you add more layers of complexity, like different architectures, versions of Postgres, it gets harder and harder.

Steven Miller in chat, Replying to “The approach to use …”: “The downside for this approach includes missing chip-specific instruction support, for example AVX512, which optimizes performance on some extensions. However if you are building with docker on the same architecture as the host, then it still includes these instructions.”

David Wheeler (he/him) in chat, Replying to “The approach to use …” “Also presumably no support for building for non-Linux platforms, yes?”

The extension ecosystem truly is the wild west, not really best practices around building, versioning, and releasing, and when you’re collecting and housing them, it makes things difficult. A huge opportunity for us to come up with those standards and share them with the community.

Community involvement is crucial, wish we’d done it better early on, that’s why we’re all here today! Solution to build together doesn’t happen if we don’t tackle it as a community.

Similarly, wish we’d reached out to folks like David and Devrim early on, to get more insight from them and bring the community into the project from the beginning

The future of trunk

Registry and CLI will continue to serve Tembo Cloud

Has paved the way for binary packaging and distribution in PGXN v2 that David is spearheading, will at least will inform and perhaps be the basis for that part of the project.

That’s all, thank you, back to you, David!

Discussion

David Wheeler (he/him): Thanks for history an context, Ian! Questions or discussion topics? Some comments in the thread from Steven and Tobias.

Tobias Bussmann: in chat: speaking of paths: it would be super helpful if postgresql would support loading extensions from additional paths beside the $SHAREDIR/extension and $PKGLIBDIR directories. At least following directory symlinks within…

Steven Miller in chat, Replying to “The approach to use …”: I tried to make it work for Mac, for example, but the docker support didn’t work basically. I think it might work for Windows, since they have better container support. However I didn’t try that yet.

David Wheeler (he/him): Reads Tobias’s comment. You can specify a subdirectory in the sharedir and maybe the moduledir? But it’s a little hinky right now. Steve, do you want to talk about the us of Docker to build images?

Steven Miller: Yeah, I’d love to. To Tobias’s point, agree, on Tembo Cloud, we have a persistent directory where wer’re sintalling extensions, but because there is no way for an extra sharedir or package dir, we’re persisting all of the library files, including Postgres core. Not ideal, especially for upgrades.

Approach for building ind Docker: been nice, do the build ina Dockerfile, start the container, then install and compare the difference between layers and zip up all the new files. Great for cross-compile but, not working for mac or other systems. Will need a fallback option to do a local build.

Jeremy S: in chat, Replying to “speaking of paths: i…”: Exactly same point was also mentioned just this morning on slack by Matthias

David Wheeler (he/him): Makes sense, thanks. What other bits do you feel like could be useful for packaging binaries at a community level?

Steven Miller: Sometimes we install binaries with trunk, but then difficult to know what has been installed. Nothing like apt where there is a history of what is installed or uninstall. Would be nice to do something like trunk list and see everything that has been installed. Also, future should be not just install but management, including turning extensions on, and there are a lot of ways to turn them on.

Ian Stanton: uninstall would be useful, too.

David Wheeler (he/him): Other questions about trunk or challenges to binary distribution it brings?

*Tobias Bussmann in chat, Replying to “speaking of paths: i…”: this would allow an immutable PostgreSQL base package and still allow to install extensions on top. This is esp. important if you need to have singned packages like on macOS

*nils in chat, Replying to “speaking of paths: i…”: Guess there is some prior art in how search_path in pg work, or the PATH in unix’s.

Should be doable to allow to specify some kind of ordered search path, where Postgres will look for extensions. That way, Postgres can protect it’s own libs to no be overwritten by external libs, but allow for loading them from extra paths.

Yurri: There is CREATE EXTENSION and other extensions like logical decoding plugins. Does trunk handle them?

Steven Miller: We think of it as four types extensions into 2x2 matrix: 1. Does it require CREATE EXTENSION true or false; and 2. Does it have a loadable library true or false. The false/false category is output plugins; The true/true category, e.g. pg_partman, pg_cron; CREATE EXTENSION false and loadable library true, e.g., autoexplain, just a library, no upgrade concerns; and then CREATE EXTENSION true and loadable library false is the default case.

Ian Stanton: Steven wrote a blog on this.

Eric in chat: Does pgrx make the process of building easier or harder and is there anything we can do today to make rust extension building better?

Jason Petersen in chat: Yeah, it sounds like we need some sort of system database like apt has; would enable management, uninstall, version list, whether upgrades are available, etc

Yurri: That would be great. What other modules are there without extensions, like autoexplain?

Ian Stanton: auth delay is another, base backup to shell, considered parts of postgres, but we have trouble categorizing them. There are 10-15 I’ve come across.

Yurri: ARe these categories on Tembo, can you click a button?

Ian Stanton: Not a category, but would be a good one to add.

Steven Miller in chat: This one! https://tembo.io/blog/four-types-of-extensions

It’s in the API metadata

Sorry if I mispronounced your name Tobias

David Wheeler (he/him) in chat: SAME

Yurri: Did you say output plugins are handled with Tembo:

Steven Miller: YOu can install them with trunk, yes.

Yurri: And you have the build pipeline that will work without plugins too, yeah, cool.

David Wheeler (he/him): Tobias, did you want to say more about the path issues?

Tobias Bussmann: Sure! We are building the Postgres.app, distribution for macOS, working different from Linux systems. We distribute some extensions directly, but also allow building and installing extensions on it. Works nicely, even with pgxn client, but it’s built within the application, which breaks the code signature.

We always have to fight against a breaking system to allow that. Possible, but would be much cleaner to specify an extra directory where extensions could be loaded, and we could distribute packages with binary extensions that the user could download and install separately from the Postgres.app.

David Wheeler (he/him): You’re not suggesting a different directory for every extension with a module, but just another path in the search path that’s not subject to the signature verification.

Tobias Bussmann: Yes, that would be an option, but with a flexible system could use one per extension or just specify a second directory. Contrib extensions sometimes seen as part of Postgres, and they’re all stuffed in the same directory with third party extensions, which gets confusing and hard to manage.

Steven Miller in chat: In the previous extensions mini summit, Yuri mentioned that he was working on a patch to postgres for extra libdir, extra share dir, but I have not been tracking this one

nils: That’s what I was saying in chat, there is prior art in Postgres and Unix systems where you can specify a search path in postgres for a list of schemas, and in Unix the path is to find binaries. Give me a delimited list of directories on my system. Could be super user only, where they can specify where they’re installed, and we can go through the list ot find an extension.

David Wheeler (he/him): I might be imagining this, but I seem to recall there was a proposal to have extensions in their own directories, which would be nice for packaging, but then every time you add one you have to add another directory to the list and there is some fear the lookup time could be too long.

Jeremy S in chat, replying to “speaking of paths: i…”: (Or like LD_LIBRARY_PATH )

David Wheeler (he/him) in chat, replying to “speaking of paths: i…”: LD_LIBRARY_PATH is all but dead on macOS

Jason Petersen: If it happens at startup I can’t imagine that being a concern. If the list changes you reboot. It’s not gonna be a performance problem, I hope.

*David Wheeler (he/him): Or HUP it if you don’t want downtime.

Jason Petersen: Sure, but it doesn’t need to be on every command.

*David Wheeler (he/him): Eric, do you want to pose your question about pgrx?

Eric: Sure. Wanted to know, were there stumbling blocks to get pgrx support built into trunk, and does it make things easy or difficult? Different from C path, are there things we could do to make things easier today?

Ian Stanton: Yeah, I think the issue is mostly on our end. We have a separate image for each version of pgrx, and keeping up with the releases is challenging. We need to rethink our image building strategy. Shouldn’t be one image for each version of pgrx. That’s the biggest thing I’ve noticed, mostly on our side.

*David Wheeler (he/him): Because you need the install the version of pgrx that the extension requires before you do the build, and that’s just too slow?

Ian Stanton: Could be too slow. We’ve known about this problem for some time, just hasn’t been addressed yet.

Eric: Okay, maybe we can talk about it offline one day, be happy to chat. I think we’re close to being able to have the CLI, cargo-pgrx, be a different version than whatever version the extension uses.

Ian Stanton: That would be super useful!

Eric: Yeah, I think we’re close to being at that point, if not there already. We can talk about that offline.

Ian Stanton: Nice! We’ll reach out in Discord.

*David Wheeler (he/him): Other comments or questions, or people who have worked on other kinds of binary registry things, would love to hear more from other perspectives. Devrim is going to talk about the Yum repository next week [ed. correction: in two weeks].

Steven Miller in chat: Daniele last time mentioned Pip is good example of mixing source and binary distributions

Eric: I have a random question related to this. In the past and recent history, has hackers talked about some way of storing extension in the database rather than relying on the file system?

*David Wheeler (he/him): Yes! In this long thread from 2011 [ed. Correction: 2013] Dimitri was proposing a “unit”, a placeholder name, where the object would be stored in the database. Very long thread, I didn’t read the whole thing, lot of security challenges with it. If it needs a shared object library loading having to be written to the file system it’s just not going to happen. I don’t know whether that’d be required or not.

Dimitri also worked on a project called pginstall where you could install extensions from the database like dbdev, but not just TLEs, but anything. The idea is a build farm would build binaries and the function in the database would go to the registry and pull down the binaries and put them in the right places on the file system.

There were a lot of interesting ideas floating around, but because of the legacy of the PGXS stuff, it has always been a bit of a struggle to decide not to use it, to support something not just on the machine, but do something over libpq or in SQL. Lot of talk, not a lot of action.

*Tobias Bussmann in chat in response to “In the previous ex…”: still searching on hacker for it. Meanwhile I found: https://commitfest.postgresql.org/5/170/

Steven Miller in chat: That approach is very awesome (install via extension)

Eric: I can see why it would take some time to sort it all out. One thing to require super user privileges to create an extension, but also having root on the box itself? Yeah.

Yurri: TLE plugs into that a little bit for a non-shared object. Not exactly storing it in the database, but does provide a SQL based/function method of installing from inside the database, but only for trusted languages, not shared objects.

*David Wheeler (he/him): dbdev install does download it from database.dev and stores it in the database, and has hooks into the CREATE EXTENSION command and pulls it out of its own catalog. Was a similar model with pginstall, but with binary support, too.

Yurri: Back to trunk. When you start building, and have to deal with binaries, pgxn you can put the source up there, but I want to get to the whole matrix of all the different versions. Every extension author does it a little different. Some extensions have versions for Postgres 15, another for 14, some have the same version across all the majors, sometimes an extension works for some majors and others. Has trunk expanded to other Postgres versions to support the whole exploding matrix of stuff that does and doesn’t work, 5-6 majors, gets to be a large matrix, a lot to keep track of. How’s that working out for the builds and managing that matrix.

Steven Miller in chat: Dimensions I think are: pg version architecture chip-specific instructions (edge case for native builds?)

Steven Miller in chat: We just announced support for 14 and 16

David Wheeler (he/him) in chat, replying to “Dimensions I think a…”: OS, OS version

Steven Miller in chat,: Replying to “Dimensions I think a…”: Ah right

Ian Stanton: Steven do you want to take that one?

Steven Miller: Oh yeah. We’ve started toe-dipping on this one. Started with Tembo Cloud’s platform, but have no released Postgres 14 and 16, and also trunk has built-in support for other architectures, such as arm, or whatever the Docker --platform flag supports. We looked at mac builds, not working yet, might work for Windows, which ahs better container support, but I don’t know, and also there is an edge case for pg_vector especially, which compiles to include ship-specific instructions for AVX512, which helps with vector. So that’s another dimension to consider.

Yurri: Part of the idea behind this forum is to see if we can chart a path forward, maybe not solve everything. What can we solve, how can we make something a little better for Postgres at large?

Eric in chat: Even as a Mac user I don’t know the answer to this… what’s the common Postgres package there? Postgres dot app, homebrew, something else?

David Wheeler (he/him) in chat: pgenv! (self-promotion)

Eric in chat: I assume folks don’t use macOS in prod but developers are important too

nils in chat, Replying to “Even as a Mac user I…”:

$ git clone .. $ ./configure $ make $ make install

At least that is what I do 😄

Steven Miller: In my opinion, the way to approach it is to know all the dimensions you need, and in the metadata API say which binaries are available. Then get through it with testing and badging If we let things get built, to what extent is it tested and used? That can help. Daniele was in the previous call, said we could look to Pip and Wheel files for inspiration, and Adam on our team has said the same. This is something that has some binary and some source, and falls back on doing the build when it needs to.

*David Wheeler (he/him): I’ve been thinking about this quite a bit lately. Can see needing to take advantage of multiple platforms available through GitHub workflow nodes or the community’s build farm, which has a vast array of different architectures and platforms to build stuff. There are precedents!

I imagine a system where, when something is published on PGXN, another system is notified and queues it up to all its build farm members to build binaries, ideally without full paths like trunk, and making them available for those platforms. Building out that infrastructure will take a fair bit of effort, I think. With cross-compiling is available it might be…doable? But most modules and for SQL and maybe Rust or Go extensions, but a challenge for C extensions.

This is a problem I’d like us to solve in the next year or two.

Steven Miller in chat, replying to “I assume folks don’t…”: Yeah exactly, like trunk install after brew install postgres

Tobias Bussmann in chat, replying to “Even as a Mac user…”: this seems to be quite spread. There are also people that prefer docker based installs

Eric in chat: pgrx supports cross compilation

With a caveat or two!

Eric in chat, replying to “Even as a Mac user I…” @nils same. For v9.3 though 16!

*David Wheeler (he/him): What else? Reading the comments.

Yurri: I think maybe that PGXN JSON file, I know you’ve been spending time on it, David, including the proposal on namespacing a few days ago. That feels like it could be helpful to be part of this. IF it could be something we could center around… The first time I wanted to put an extension on PGXN, it took me a long time to figure out that JSON file. I didn’t find the blog post that goes through it in nice detail till like two weeks after. If I’d found it sooner I could have skipped so many things I tried to figure out on my own.

If we can center around that file, it’ll draw more attention to it, more links back to it, more examples people blog about here and there, it helps going forward. The trick is getting it right not being this massive thing no one can figure out, or has too many options, but hits all the points we need.

nils in chat, replying to “Even as a Mac user I…”: Well, mostly for extension, for Postgres I rely on David’s pgenv

*Eric * in chat, replying to “Even as a Mac user I…”: @Tobias Bussmann hmm. Makes it difficult to get an extension installed.

*David Wheeler (he/him): I’ve been thinking about this a lot, drafted a doc some of my colleagues at Tembo have read over and I hope to publish soon [ed. Note: now published], thinking through what a v2 of the PGXN Meta Spec might include. I think we should extend with list of external libraries required, or the architectures it supports, or it’s a loadable library or an app that doesn’t even go into the database.

I would like soon to draft an actual revision of the spec, and document it well but also turn it into a JSON Schema document so we can automate publishing it and verification in the same place. I also imagine building an eventual replacement or evolution of the PGXN client or trunk client or some client that you can use to manage that thing. I think pgrx does that, adding metadata via the client rather than parse and understand the whole file.

I’m with you it could get really complicated, but I’m not sure I see an alternative other than building good tooling to minimize the pain.

Ian Stanton: I think automatically pulling that information when it’s readily available would be super helpful. We use it as an app to just take care of things for people.

*David Wheeler (he/him): Right, and then if we’re successful in getting it done it’s getting people to take up the tools and start using them. There’s only so much we can infer. I can tell how to do a build if there’s a Makefile or a configure file or a cargo.toml, but that doesn’t reveal what libraries are required. This is why there’s a lot of hand-tuning of RPM and Apt spec files.

Steven Miller in chat: We are calling this “system dependencies”

Ssl and glibc the main ones 🙂

Jason Petersen in chat: And sometimes the package names aren’t even 1—1 mappings

Eric in chat: Ha! Try relying on elasticsearch as a runtime dependency! 😞

Yurri: That’s another thing to touch on. A lot of extensions are just a thin layer of glue between Postgres and some OSS library that someone else maintains. But the trick, when you want to build a Yum package, the dependency has a different name than the rest of the RedHat ecosystem vs. the Debian ecosystem. So part of what Devrim has to do to maintain the RPM packages is manually sort all that out, because you can’t automatically… libc! It’s called glibc in RedHat and just libc in Debian, and every package has slightly different names. Do how do you manage that in trunk? Do you pull the source for any dependencies? Does your Docker image…I don’t know how this is working.

David Wheeler (he/him) in chat: I want to build a json schema validation extension in Rust using https://github.com/Stranger6667/jsonschema-rs or something

Tobias Bussmann in chat, replying to “Ha! Try relying o…”: or V8 🤯

Ian Stanton: Two sides to that one is build time dependencies, and there there are runtime dependencies. I just dropped an example for some random extension. Tthe way we’ve been building this is to write out a Dockerfile that can include build time dependencies. [hunts for link…]

Ian Stanton in chat: https://github.com/tembo-io/trunk/blob/main/contrib/age/Dockerfile

Ian Stanton: We specify them all there. But for runtime, we don’t know what’s required until we test the thing. We have stuff in our CI pipelines to install and enable the extension to see if it works. If it doesn’t, it will report a missing dependency. Then we know we need to add it to our Postgres images. Not the best flow for finding these dependencies. Steven, want to add anything more to the build time dependency piece?

David Wheeler (he/him) in chat, replying to “Ha! Try relying on …”: Next version of plv8 released on PGXN will have v8 bundled

Steven Miller: A lot share the same ones, SSL and glibc, so we just build with the same versions we run on Tembo Cloud. In the metadata we list all system dependencies, that’s what we build towards, and include them in the Docker image. If you pick a different stack, like the Machine Learning stack, it has all the Python stuff in the base image. We don’t really love this, but this is something where Python wheel might inspire us, becaus it has packaging and system dependencies.

Eric in chat, replying to “I want to build a js…”: I feel like I’ve seen one already?

David Wheeler (he/him) in chat, replying to “I want to build a js…”: GIMME

Yurri: If you really want to od this right, just like in the RPM repositories, you have to know what the dependencies are. David, I’m curious, what your thoughts are, if this is to be done right, there has to be a way to indicate dependencies in the META.json file, but then I’m talking about Debian and RedHat, but what about Mac? Windows doesn’t really have a packaging system. There are BSDs, other places Postgres can run, probably have to narrow the scope a bit to solve something.

Tobias Bussmann in chat, responding to “Ha! Try relying o…” Sounds promising, but for which architectures? I have good hope for pljs as replacement for plv8

Ian Stanton in chat: https://github.com/tembo-io/trunk/blob/d199346/contrib/fuzzystrmatch/Trunk.toml#L13

David Wheeler (he/him): Fortunately there are only around 1100 extensions in the world, a relatively low barrier at this point. Some of these other things have thousands or millions of extensions.

Yurri: I guess when you put it that way! But I wasn’t going to go through all 1000 of them one-at-a-time.

David Wheeler (he/him): No. I posted about this on Ivory a few weeks ago [ed. correction: he means on Mastodon]: how does one do this in a platform-neutral way. There are some emerging standards where people are trying to figure this stuff out. One is called purl, where you specify dependencies by packing URLs, or “purls”, and then it’s up to the installing client to resolve them vai whatever the packaging system it depends on.

I would assume on Windows we’d have to say “it works great as long as you use Chocolatey” or something like that. But it’s certainly a difficult problem. I’m looking forward to your talk about your unique approach to solving it, Yurrii [ed. note: that’s the May 1 mini-summit], that’s going to be super interesting.

David G. Johnston: Ultimately you just crowd sourcing. If we just say “this is what we call this thing in PostgreSQL world”, then if people need to compile it on Chocolatey on Windows, they figure it out and contribute it. Or on Debian or RedHat. Just facilitate crowd-sourcing, metadata in a database.

David Wheeler (he/him): My initial idea was a global registry that people contribute to just by editing files in a GitHub repository.

David G. Johnston: HashiCorp has to have something like that already, there’s stuff out there, no need to reinvent the wheel. This is a global problem if we open-source it we can solve it.

David Wheeler (he/him): Right. Really appreciate everyone coming. Great discussion, I appreciate it. In two weeks, Devrim Gündüz is going to talk about the Yum Community Repository and the challenges of RPMifying extensions. I had this idea of automating adding extensions to the Yum and Apt repositories, an Devrim is a little skeptical. So super look forward to his perspective on this stuff. Two weeks from today at noon [ed.: America/New_York]. Thanks for coming!

Eric in chat: Thanks a ton! This is exciting stuff.

Tobias Bussmann in chat: Thanks all!

Grant Holly in chat: Thanks everyone. Great discussion

Jeremy S: in chat: Thanks david

Steven Miller in chat: Thanks all! Cya next time

Jeremy S in chat: Isn’t bagel supposed to come for the end

Ian Stanton in chat: Thanks all :)

More about… Postgres Ian Stanton PGXN Extensions PGConf Summit trunk

Sunday, 24. March 2024

Werdmüller on Medium

How to build a culture that supports engineers

And is productive for workers in every creative discipline Continue reading on Medium »

And is productive for workers in every creative discipline

Continue reading on Medium »

Friday, 22. March 2024

Identity Woman

Real Names & Glassdoor & Fishbowl

There is a Social Contract with sites and their name policies. Some how companies still don’t really understand this (See what WOW did in 2010) . There is a massive story breaking this week about how GlassDoor a site where people who can create profiles and anonymously rate their employers over their entire careers has […] The post Real Names & Glassdoor & Fishbowl appeared first on Iden

There is a Social Contract with sites and their name policies. Some how companies still don’t really understand this (See what WOW did in 2010) . There is a massive story breaking this week about how GlassDoor a site where people who can create profiles and anonymously rate their employers over their entire careers has […]

The post Real Names & Glassdoor & Fishbowl appeared first on Identity Woman.

Thursday, 21. March 2024

Just a Theory

RFC: PGXN Metadata Sketch

Request for comments on a sketch of a new metadata standard for Postgres extension packaging, distribution, and delivery, building on the PGXN Meta Spec to address its shortcomings and emerging use cases 12 years on.

Update 2024-03-22: To compliment high-level discussions on the #extensions channel on the Postgres Slack, I also created a pull request to make it easer directly comment to this post point by point. Keep the feedback coming, and thank you!

Update 2024-03-25: Clarified the definition of “Release” and made “Maintainers” plural. Thanks to Matthias van de Meent for the suggestions!

This post proposes a new metadata standard for extension packaging, distribution, and delivery, building on the PGXN Meta Spec to address its shortcomings and emerging use cases 12 years on. The goals include:

Don’t break the existing standard, but add to it Add explicit support for different types of Postgres extensions, such as background workers and loadable modules Add additional metadata for curation and automated binary compilation Add hooks for registry and third-party-contributed metadata, including binary distribution options and stats & reports.

There are two high level categories of metadata, split into two separate files:

Package Metadata provided by extension package authors Registry Metadata aggregated by the root registry from various sources, including data derived from the extension source code or package metadata, but also trusted third parties such as packagers, smoke testers, security scanners, and more.

Following community discussion of this proposal, the Package Metadata will lead to a draft for PGXN Meta Spec version 2.0.0, while the Registry Metadata will guide the design and implementation of the Root Registry APIs required to provide it.

Terms

Definition of terms as used in this document, because every language and distribution system uses terms differently. These are ours.

Extension: a software component that extends the capabilities of a PostgreSQL database or cluster. Extensions may be CREATE EXTENSION extensions, background workers, command-line apps, loadable modules, shared libraries, and more.

Package: A collection of extensions that are released, versioned, and distributed together. Packages may be downloaded directly from version control repositories or in archive files generated by a release tag.

Package Path: Identifies a package, declared in the package metadata file. A package path should describe both what the package does and where to find it. Typically, a package path consists of a repository root path — the directory that contains the metadata file — and a directory within the repository.

Repository: The version control repository where the package is developed, referenced as package path or URL.

Repository root path: The portion of the package path that corresponds to the root directory of the version control repository where the package is developed, and which contains the metadata file. Most package are defined in their repository’s root directory, so this is usually the entire path. For example, github.com/example/pgtap is the repository root path for the package of the same name.

Source Distribution: The contents of a single package bundled together with package metadata into distributable archive file, usually named with the last part of the package path or the main extension, a dash, and the version, e.g., pgtap-1.14.3.zip.

Binary Distribution: A binary software distribution of a package, compiled and packaged for a particular OS version, platform, and hardware architecture. Examples include deb, RPM, and wheel.

Release: A single version of the package made available to the public on PGXN, expressed as the package path, an at sign, and the semver. Example: github.com/theory/pgtap@v1.14.3.

Package Metadata

The extension author provides this data as part of a release submitted to the root registry. The registry itself will not change this data in any way, but provide it as-is.

Essentials

Essential information about the package itself, including its path, name, version, list of authors, license, etc. Ecosystem applications use this data for indexing, naming conventions, and display information.

Fields (all required):

Package: A package path uniquely identifying the package (usually the source repo host name and path; see this RFC) Name: The name of the extension package, usually the same as the main extension, not unique but conflicts are resolved by escalation to package Version: The version of the package, expressed as a SemVer Abstract: A short description of the purpose of the package Description: A longer description of the package, answering the question “what is this thing and what value is it?” Maintainers: List of maintainers, each an object with name and either email or url (or both) License: An SPDX 2.3 license expression comprised of one or more licenses from the SPDX License List Spec: Information about the metadata spec itself, a version and URL

Example:

{ "package": "github.com/example/pg-pair", "name": "pair", "version": "1.1.0", "abstract": "A key/value pair data type", "description": "Adds a key/value pair data type to Postgres, useful for specifying arbitrary key/value function parameters.", "maintainers": [ { "name": "Naomi Nagata", "email": "naomi@example.com", "url": "https://example.com/roci/naomi" } ], "license": "PostgreSQL OR MIT", "spec": { "version": "2.0.0", "url": "https://pgxn.org/meta/spec.txt" } }

Other possibilities:

License File: An alternative to license for nonstandard licenses (from cargo) Contributors: List of people who have contributed to the package Emeriti: List of former maintainers and contributors

PGXN Compatibility:

Overloads the string representation of the spec v1 field, but it should be able to parse either one, especially with the spec version to differentiate. Uniqueness of package and relationship to name PGXN has maintainer; is it really worth pluralizing it? Maybe we don’t need it at all, given repository and issue links. Artifacts

A list of links and checksums for downloading the extension package in one or more formats, including source code, binaries, system packages, and more. Apps use this information (along with registry metadata) to determine the best option for installing an extension on a particular system.

This information is optional, and provided in addition to the source code available from the root registry itself. The idea is that, on release, the release manager creates artifacts in one or more locations (such as GitHub releases), and it’s useful to link to them from the registry. Binary builds may be unlikely today, but leave the option open for now by allowing arbitrary text in the type field.

Each item must have:

Type: The type of artifact: Source, binary, RPM, etc. URL: Link the the artifact SHA: A SHA checksum

Each URL must properly resolve and the checksum must match.

[ { "type": "source", "url": "https://github.com/theory/pg-pair/releases/download/v1.1.0/pair-1.1.0.zip", "sha256": "2b9d2416096d2930be51e5332b70bcd97846947777a93e4a3d65fe1b5fd7b004" }, { "type": "binary", "url": "https://github.com/theory/pg-pair/releases/download/v1.1.0/pair-1.1.0-linux-amd64.tar.gz", "sha1": "12d9bc5cfb6bc3c453627eac69511f48be63cfc0" }, { "type": "binary", "url": "https://github.com/theory/pg-pair/releases/download/v1.1.0/pair-1.1.0-linux-arm64.tar.gz", "sha1": "787dc39137f7d1510a33ab0a1b8905cd5f3f72d1" } ]

PGXN Compatibility:

New object, no conflicts.

Resources

External information about the package, mostly links, including source code repository, bug reporting, documentation, badges, funding, etc. Apps use this data for links, of course, but also full text indexing, documentation rendering, and displaying useful information about the extension.

Fields:

Homepage: URL for the package’s home page Issues: URL to issues Documentation: URL to documentation Support: URL to support Repository: Source code repository URL Badges: Array of badge link objects Funding: Array of funding objects { "homepage": "https://pair.example.com", "issues": "https://github.com/example/pair/issues", "documentation": "https://pair.example.com/docs", "support": "https://github.com/example/pair/discussions", "repository": "https://github.com/example/pair", "badges": [ { "alt": "Test Status", "src": "https://test.packages.postgresql.org/github.com/example/pair.svg" } ], "funding": [ { "type" : "individual", "url" : "http://example.com/donate" }, { "type" : "patreon", "url" : "https://www.patreon.com/example" } ] }

PGXN Compatibility:

issues replaces bugtracker and is just a URL. Can support either repository is just a URL instead of an object. Can support either but with the package key referencing the repository path perhaps we don’t need to change it. Contents

A description of what’s included in the package. Often a package consists of multiple extensions, such as PostGIS, which includes postgis, postgis_tiger_geocoder, address_standardizer, and more.

Furthermore, some extensions are not CREATE EXTENSION-type extensions at all, such as background workers, command-line apps, loadable modules, shared libraries, and more. Each should be listed along with documentation links where they differ from the package overall (or are simply more specific).

Fields:

Extensions: Object describing CREATE EXTENSION extensions. Keys are extension names, fields are objects with the fields sql, doc, abstract, control (same as control file, but can use the file itself), tle (meaning it can be used as a trusted language extension), and preload (to indicate whether its libraries need to be loaded in advance via shared_preload_libraries, session_preload_libraries, or local_preload_libraries). Workers: Object describing background workers. Keys are worker names, fields are objects with the fields bin, doc, and abstract. Apps: Object describing applications, command-line or otherwise. Keys are app names, fields are objects with the fields bin, doc, and abstract. Modules: Object describing loadable modules that can be loaded into Postgres (not necessary for extensions that include libraries). Keys are module names, fields are objects with the fields lib (without file extension), doc, abstract, and preload. Libraries: Other libraries that may ship in the package and need to be installed but are not loadable modules, such as a dynamic library used by an app. Keys are library names, fields are objects with the fields lib (without file extension) or dir (for a directory of files), doc, and abstract. { "extensions": { "pair": { "sql": "sql/pair.sql", "doc": "doc/pair.md", "abstract": "A key/value pair data type", "preload": "session", "tle": true, "control": { "directory": "pair", "module_pathname": "$libdir/pair", "relocatable": true } } }, "workers": { "pair_pruner": { "bin": "bin/pair_pruner", "doc": "doc/pair_pruner.md", "abstract": "A worker to periodically prune pairs" } }, "apps": { "pair_rand": { "bin": "bin/pair_rand", "doc": "doc/pair_rand.md", "abstract": "Command to generate random pairs of strings" } }, "modules": { "lib_pair": { "lib": "lib/lib_pair", "doc": "doc/lib_pair.md", "abstract": "A library hooking function calls to convert pairs to named parameters", "load": "shared_preload_libraries" } }, "libraries": { "ruby_pair": { "dir": "lib/gems", "abstract": "Ruby libraries required to run the extension" } } }

Other options:

Allow or require an array of docs? Specify support for specific doc formats (markdown, manpage, plain text, MkDocs, etc.)? Support glob patterns? Add libraries as sub-key for workers and apps, e.g. for a Ruby app that requires a directory of gems?

PGXN Compatibility:

Significantly different from the existing provides object, so key this one under contents or manifest and continue to support both.

Dependencies

A list of external dependencies required to configure, build, test, install, and run the extensions in the package. These include not only other extension packages, but also external libraries and system dependencies. And let’s not forget the versions of Postgres required, as well as any OS and version dependencies (e.g, does it work on Windows? FreeBSD? What versions?) and architectures (arm64, amd64, etc.)

Fields:

platforms: Platforms supported by the package, defined as OS and architecture. Assume all platforms if none are listed.

postgres: Versions of PostgreSQL required by the package, specified as a version range, as well as compiled features (--with-* configure options)

pipeline: The build pipeline required to build the extension. Options: pgxs, meson (?) pgrx, gnu (./configure), gem, cpan, pip, go, etc.

prereqs: Object defining external dependencies required for different phases:

configure: Dependencies to configure the package (e.g., items required for ./configure or make to work) build: Dependencies to build the package (in addition to run dependencies) test: Dependencies to test the package (in addition to build and run dependencies) run: Dependencies to run the package develop: Dependencies to develop the package (in addition to all other phase dependencies)

Each points to an object with at least one of these keys:

requires: Required to use the package recommends: Not required, but recommended as a best practice suggests: Not required, but nice to have conflicts: Package will not work with these items

These, in turn, contain at least one of these keys:

package: Other packages, expressed by their package paths, pointing to a version expression external: External dependencies not included with the package contrib: List of Postgres contrib or development packages

variations: A list of dependency variations, each with two fields:

where: Specifies a configuration that, when matched, includes the dependencies. For example, to specify alternate dependencies for Linux arm64, it would be "platforms": { "linux": "arm64" }. For a specific version of Postgres, it would be something like "postgres": { "version": ">= 16, < 17" }. [This likely needs some deep thinking through.] dependencies: An object containing of the other key structures in this object, defining the additional dependencies for environments that match the where field.

The packages and external keys point to objects in which the keys are dependency names and the values are version range expressions (empty string means any version). The postgres.version field is also a version range expression.

The contrib key points to an array of Postgres-supplied apps and extensions. Versions are not necessary because they effectively have the same version as a supported Postgres version.

Example:

{ "postgres": { "version": ">= 12, < 17", "with": [ "xml", "uuid", "perl" ] }, "pipeline": "pgrx", "platforms": { "linux": [ "amd64", "arm64" ], "darwin": [ "amd64", "arm64" ], "windows": [ "amd64" ], "freebsd": [ "amd64" ] }, "dependencies": { "configure": { "requires": { "external": { "cargo-pgrx": "" } } }, "build": { "requires": { "external": { "curl": "", "awk": "", "sed": "", "perl": "5.20" } }, "recommends": { "external": { "jq": "", "perl": "5.38" } } }, "test": { "requires": { "contrib": [ "pg_regress", "plpgsql", "plperl" ], "packages": { "github.com/theory/pgtap": "1.1.0" } } }, "run": { "requires": { "contrib": [ "plperl" ], "packages": { "github.com/theory/hostname": "" } } } } }

Notes:

The external field is the tricky one. How does one provide a canonical dependency for a third-party library or app that may have loads of different names on different platforms, different packaging systems (apt vs rpm), and even different versions of them, or split apart for development and runtime use? Some possibilities:

Use purl and let downstream services handle the conversion Create a rule set that allows one to predict dependency name changes across ecosystems, similar to repology-rules Just use Repology?

Other notes:

Decide on a version range standards to adopt. Start with the PGXN version range, pep-0508, npm ranges, and Gem::Version. Is pipeline really necessary, given configure requirements? I think so, because it tells the client the preferred build system to use, in case it can’t detect it for some reason.

PGXN Compatibility:

Significantly differs from the existing prereqs object, so this one is keyed under dependencies and we can support either.

Classification

Classification metadata lets the extension developer associate additional information to improve discovery, such as key words. It might also allow selections from a curated list of extension classifications, such as the category slugs supported for the cargo categories field.

Ecosystem apps use this data to organize extensions under key words or categories, making it easier for users to find extensions that are frequently used together or for specific workloads or tasks.

Example:

{ "tags": [ "testing", "pair", "parameter" ], "categories": [ "Machine Learning" ] }

PGXN Compatibility:

Conflicts with the existing root-level tags array, but we can support either one.

Example: PGML Extension

A hand-coded example that attempts to capture all the complexity of the pgml extension:

{ "package": "github.com/postgresml/postgresml/pgml-extension", "name": "pgml", "version": "2.8.2", "abstract": "pgml: Created by the PostgresML team", "maintainers": [ { "name": "the PostgresML team", "url": "https://github.com/postgresml/postgresml/" } ], "license": "MIT", "spec": { "version": "2.0.0", "url": "https://pgxn.org/meta/spec.txt" }, "artifacts": [ { "type": "source zip", "url": "https://github.com/postgresml/postgresml/archive/refs/tags/v2.8.2.zip", "sha256": "2b9d2416096d2930be51e5332b70bcd97846947777a93e4a3d65fe1b5fd7b004" }, { "type": "source tgz", "url": "https://github.com/postgresml/postgresml/archive/refs/tags/v2.8.2.tar.gz", "sha256": "845f28339c6159ac32daccea1cd17b386ea083c3e60bb8d58fb737725afe7eb5" } ], "resources": { "homepage": "https://postgresml.org/", "issues": "https://github.com/postgresml/postgresml/issues", "documentation": "https://postgresml.org/docs/", "support": "https://discord.com/invite/DmyJP3qJ7U", "repository": "https://github.com/postgresml/postgresml", "badges": [ { "alt": "Tests Passing", "src": "https://github.com/postgresml/postgresml/actions/workflows/ci.yml/badge.svg" }, { "alt": "Chat Activity", "src": "https://camo.githubusercontent.com/1988e7aaaa6a0c8ddcf880faacf1ec3263e23fac4aeff4710922082fde84442a/68747470733a2f2f696d672e736869656c64732e696f2f646973636f72642f31303133383638323433303336393330303939", "link": "https://discord.gg/DmyJP3qJ7U" } ] }, "manifest": { "extensions": { "pgml": { "sql": "sql/pgml--2.8.2.sql", "preload": "shared", "tle": false, "control": { "comment": "pgml: Created by the PostgresML team", "default_version": "2.8.2", "module_pathname": "$libdir/pgml", "relocatable": false, "superuser": true, "schema": "pgml", "trusted": true } } } }, "dependencies": { "postgres": { "version": ">= 15, < 16" }, "pipeline": "pgxs", "platforms": { "linux": [ "amd64", "arm64" ], "darwin": [ "amd64", "arm64" ] }, "dependencies": { "configure": { "requires": { "external": { "cargo-pgrx": "", "bison": "", "cmake": "", "flex": "", "libclang-dev": "", "libopenblas-dev": "", "libpython3-dev": "", "libreadline-dev": "", "libssl-dev": "", "pkg-config": "" } } }, "run": { "requires": { "external": { "libopenblas": "", "libpython3": "", "libreadline": "", "libssl": "", "python3": "" } }, "recommends": { "external": { "python(pyarrow)": "=11.0.0", "python(catboost)": "", "python(lightgbm)": "", "python(torch)": "", "python(torchaudio)": "", "python(torchvision)": "", "python(xgboost)": "", "python(accelerate)": "", "python(bitsandbytes)": "", "python(ctransformers)": "", "python(huggingface-hub)": "", "python(deepspeed)": "", "python(einops)": "", "python(optimum)": "", "python(peft)": "", "python(tokenizers)": "", "python(transformers)": "", "python(transformers-stream-generator)": "", "python(InstructorEmbedding)": "", "python(sentence-transformers)": "", "python(rouge)": "", "python(sacrebleu)": "", "python(sacremoses)": "", "python(datasets)": "", "python(orjson)": "", "python(langchain)": "" } } } }, "variations": [ { "where": { "platforms": { "linux": [] } }, "dependencies": { "prereqs": { "run": { "recommends": { "external": { "python(auto-gptq)": "", "python(xformers)": "" } } } } } } ] }, "classification": { "tags": [ "machine learning", "ml", "transformers" ], "categories": [ "Machine Learning" ] } }

Note that PostgresML also provides a dashboard app, but it does not ship with the extension, so it’s not listed here. It could have its own package metadata and be separately distributed.

For the Python dependencies, in the absence of a neutral standard for specifying package, here they’re specified using a syntax borrowed from RPM to make clear that they’re Python dependencies. A couple of those dependencies are Linux-only, so listed in variations where the OS is Linux.

Registry Metadata

The second metadata format is registry metadata. It starts by with a copy the package metadata, but builds from there, adding information (such as the artifact link, SHA, and cryptographic signature from the registry itself) and sections (defined below).

Providing a separate metadata format enables broader community collaboration to augment the metadata for extension packages.

Ideally an ecosystem of trusted services will be developed by various parties, who will report back to the root registry as the source of record for all metadata about extension packages. For example, a binary distribution service like apt.postgresql.org or trunk will submit links and checksums for binary packages and perhaps periodic download stats.

Broadly speaking, there are three typologies for registry metadata: Distributions, Reports, and Classifications.

Distributions

A list of links and checksums for downloading binary distributions for the extension package. The root registry page for the package can therefore list all the ways to install an extension and expand the list as new binary registries add it.

Along with the Artifacts package metadata, the distribution metadata allows installer tools to select the best option to install based on the OS, Postgres version, and cluster configuration (e.g., install the community Apt package if the cluster was installed from the community Apt registry).

Each item must have:

Registry: The name of the distribution registry, unique Type: The type of distribution: Apt, RPM, TLE, etc. URL: Link to the the registry’s page for the distribution Command: In lieu of a URL and SHA it can provide instructions to install the distribution, e.g., apt-get install postgresql16-pair.

Each distribution registry must be trusted by the root registry to submit this information to add to the metadata, and have its own page with additional contextual information: its home page, docs, basic instructions, whatever. A README of sorts.

Example:

[ { "registry": "trunk.pgxn.org", "type": "trunk", "command": "pgxn install github.com/example/pair" }, { "registry": "apt.postgresql.org", "type": "apt", "command": "apt-get install postgresql16-pair" }, { "registry": "yum.postgresql.org", "type": "rpm", "command": "yum install postgresql16-pair" }, { "registry": "pgt.dev", "type": "apt", "url": "https://pgt.dev/extensions/pair", "command": "trunk install pair" }, { "registry": "pgxman.com", "type": "apt", "url": "https://pgxman.com/x/pair", "command": "pgxman install pair" }, { "registry": "database.dev", "type": "sql", "url": "https://database.dev/example/pair", "command": "SELECT dbdev.install('example-pair');" }, ]

Other Options:

Information about installing from a registry that doesn’t offer URLs, like Homebrew, where one just needs to know what the formula is called. Maybe the same should apply to Apt? Metrics and Reports

This object includes data such as number of downloads, build and test status on various Postgres/OS/version combinations, binary packaging distributions, test coverage, security certification, vulnerability detection, quality metrics and user ratings, and more. Some of these bits of data might include badges, e.g., for ratings or security assessments.

In the broader ecosystem, it would be the responsibility of the root registry to ensure that data comes only from trusted sources. However this data will be publicly readable, allowing any downstream applications to extend and publish metrics and reports with their own information.

Example:

{ "aggregates": { "downloads": 32 }, "sources": { "smoke-tester.example.com": { "stats": { "passes": 32, "fails": 8, "unknown": 0 }, "link": "https://smoke-tester.example.com/extension/pair", "badge": "https://smoke-tester.example.com/extension/pair.svg" }, "pgt.dev": { "downloads": 12, "tags": ["two", "kv"], "categories": ["Analytics"] }, "pgxman.com": { "downloads": 20 }, "ratings.example.com": { "stats": { "ratings": 324, "average": 3.2, "comments": 13 }, "link": "https://ratings.example.com/extension/pair", "badge": "https://ratings.example.com/extension/pair.svg" }, "stats.example.com": { "contributors": 2, "stars": 14, "last_active": "2024-01-12", }, "security-scan.example.com": { "advisories": { "high": 2, "moderate": 11, "low": 1, "cves": 0 }, "link": "https://security-scan.example.com/extension/pair", "badge": "https://security-scan.example.com/extension/pair.svg" } } }

Notes:

The aggregates section aggregates results from multiple sources, for example summing all downloads or averaging ratings. The list of items to aggregate could evolve regularly.

Each key in sources identifies a trusted downstream source of information. Each would have its own schema describing its objects and their meaning, along with URI templates to link to. For example, stats.example.com might have these templates:

{ "contributors": { "alt": "List of people who have contributed to the {name} project", "badge": "https://stats.example.com/{ package }/contributors.svg", "uri": "https://stats.example.com/{ package }/contributors" }, "stars": { "alt": "Star ratings for the {name} project", "badge": "https://stats.example.com/{ package }/stars.svg", "uri": "https://stats.example.com/{ package }/stars" } }

Might be worth adopting more of an API of multiple files, rather than one big file. Example: Homebrew analytics. Note also its support for date ranges for analytics (30/90/365 days, stats when directly installed vs. installed as a dependency).

More about… Postgres PGXN Extensions Packaging Metadata

Thursday, 21. March 2024

The Pragmatic Engineer

The Pulse #86: Is Shopify’s new “mastery” framework a response to higher interest rates?

The e-commerce giant is taking a new and different approach to career growth and promotions. Also: more evidence the tech job market is tougher, and AI consolidation is already underway

The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.

Today, we cover:

Industry pulse. NVIDIA launches an even more powerful AI chip, The US sues Apple for abusing its smartphone market position, Dell says you can’t work remotely AND get a promotion, and more.

Shopify’s Mastery framework: a response to higher interest rates? The e-commerce giant is taking a new approach to career growth and promotions. I’ve talked with insiders about how it’s been received by them, after publishing a look into the levels of the Mastery framework, last year.

More evidence the tech job market is tougher. Carta’s report on startup hiring and compensation shows in black-and-white what we’ve long suspected: 2023 really was bad for job seekers. It was the first time in many years that startups shrunk in size by hiring fewer people than who left. Equity compensation also dropped, although salaries increased.

Are we at an AI consolidation “inflection” point, already? The AI industry seems to be speedrunning the creation of categories: rapid fundraising, fast product launches fast, swift regulation, and then… consolidation? Inflection AI raised $1.3B just 9 months ago, and now Microsoft has basically acquired the company in an unusual way which doesn’t trigger antitrust rules.

1. Industry pulse NVIDIA launches an even more powerful AI chip

Producing the world’s most sought-after AI supercomputer – the H100 – isn’t not enough, evidently: NVIDIA has unveiled an even more powerful one, called Blackwell (a chip called GB200). Blackwell offers better performance: around 5 times the computational power of the H100 (20 petaflops – floating point operations per second – versus 4 petaflops)

The interesting part of the announcement is that NVIDIA also announced an “inference microservice.” This will allow companies to run inference on their own models. This is much cheaper than training a model, and would allow companies buying NVIDIA GPUs to have a cheaper option than paying services like OpenAI, Atrophic, or others, to use their models.

NVIDIA is already the de facto hardware provider for AI applications, and is now moving up the stack to offer more software services, and take an even bigger cut of revenue. At this rate, I wouldn’t be surprised if NVIDIA becomes a competitor to AI model companies like OpenAI!

Apple refuses to let Spotify’s app update go live

Two weeks ago, the European Commission fined Apple $2B for anti-competitive practices because of restrictions it placed on music streaming companies like Spotify. The EU also mandated that Apple needs to allow Spotify to display pricing information for EU users.

Apple still has not approved an app update which Spotify submitted on 5 March, whose approval is mandated by the EU. Apple doesn’t offer exact timelines on how long it takes to approve an app, and approvals typically happen within a day. Apple has used the tactic of delaying app updates or rejecting them as a way to send a message to companies whose actions it doesn’t like. Spotify knows this all too well: in 2016-17 Apple made “more frequent, unexpected and unjustified rejections” to their app updates.

Apple is doing it again by delaying the latest update for obvious reasons. It’s yet another reminder to Spotify and other app developers that in the App Store, Apple is king. The EU might fine them $2B, it might mandate that Spotify needs to be allowed to show pricing information; but the EU cannot mandate how fast Apple grants approvals, right?

This is how Apple might slowly but surely become the most-hated company by its own developers; by having an overlord mentality and bullying its own customers. In this way, it feels that Apple is turning into the Microsoft of the 1990s, in terms of rock-bottom developer sentiment.

Apple is making clear why it needs to be regulated – and strictly. A good regulator wants to regulate as lightly as possible. However, what happens when one of the biggest players acts in bad faith, in not following the spirit of regulation?

Then, there’s no choice but to create more granular regulation, even to the point of “micromanaging.” This regulation will be more invasive and have more side effects. Apple, at this point, is dragging regulators towards making stricter rules. I think this will happen, and the tech giant will have only itself to blame.

The US sues Apple for abusing smartphone market position

Apple might have chosen a bad time to anger regulators, as it’s no longer just the EU that is stepping up against the Coupertino giant. The US Department of Justice has filed a lawsuit arguing that “Apple exercises its monopoly power to extract more money from consumers, developers, content creators, artists, publishers, small businesses, and merchants, among others.”

The goal of the lawsuit is stated as follows:

“This case is about freeing smartphone markets from Apple’s anticompetitive and exclusionary conduct and restoring competition to lower smartphone prices for consumers, reducing fees for developers, and preserving innovation for the future.”

Basically, the US has a similar issue with Apple as the EU does; Apple became the dominant smartphone platform in the US through innovation, but has since resorted to throttling competition in a way that the regulator says is unlawful.

This US lawsuit will teach us a lot more about Apple’s alleged anti-competitive practices, and how this secretive company really operates on the inside. If Apple follows a similar strategy as in the EU of ignoring the regulator and provoking a response, then it will backfire.

Dell: “Want a promotion? Come to the office”

Read more


Mike Jones: self-issued

Eight Specifications Published in Preparation for IETF 119

My co-authors and I published updated versions of eight specifications in preparation for IETF 119 in Brisbane. The specifications span three working groups: JOSE, COSE, and OAuth. The updated specifications and outcomes when discussed at IETF 119 are as follows. 1, 2, & 3: JSON Web Proof, JSON Proof Algorithms, and JSON Proof Token. Updates […]

My co-authors and I published updated versions of eight specifications in preparation for IETF 119 in Brisbane. The specifications span three working groups: JOSE, COSE, and OAuth. The updated specifications and outcomes when discussed at IETF 119 are as follows.

1, 2, & 3: JSON Web Proof, JSON Proof Algorithms, and JSON Proof Token. Updates were:

Normatively defined header parameters used Populated IANA Considerations sections Allowed proof representations to contain multiple base64url-encoded parts Specified representation of zero-length disclosed payloads Added Terminology sections Updated to use draft-irtf-cfrg-bbs-signatures-05 Updated to use draft-ietf-cose-bls-key-representations-04 More and better examples Improvements resulting from a full proofreading

Continued reviews and feedback from implementations are requested.

4: Fully-Specified Algorithms for JOSE and COSE. Updates were:

Published initial working group document following adoption Added text on fully-specified computations using multiple algorithms Added text on KEMs and encapsulated keys Updated instructions to the designated experts

It was agreed during the JOSE meeting to describe what fully-specified algorithms for ECDH would look like, for consideration by the working group.

5: OAuth 2.0 Protected Resource Metadata. Updates were:

Switched from concatenating .well-known to the end of the resource identifier to inserting it between the host and path components of it Have WWW-Authenticate return resource_metadata URL rather than resource identifier

It was decided to start working group last call during the OAuth meeting.

6: COSE “typ” (type) Header Parameter. Updates were:

Added language about media type parameters Addressed working group last call comments Changed requested assignment from 14 to 16 due to conflict with a new assignment Addressed GENART, OPSDIR, and SECDIR review comments

This document is scheduled for the April 4, 2024 IESG telechat.

7: Barreto-Lynn-Scott Elliptic Curve Key Representations for JOSE and COSE. Updates were:

Changed to use key type EC for JOSE and equivalent EC2 for COSE for uncompressed key representations Changed identifier spellings from “Bls” to “BLS”, since these letters are people’s initials

We received feedback to not add compressed key representations to the draft.

8: Use of Hybrid Public-Key Encryption (HPKE) with JavaScript Object Signing and Encryption (JOSE). Updates were:

Use existing "alg": "dir" value for HPKE Direct Encryption mode Aligned choices more closely with those of Use of Hybrid Public-Key Encryption (HPKE) with CBOR Object Signing and Encryption (COSE) Defined both Integrated Encryption mode and Key Encryption mode Added IANA Considerations section Removed Post-Quantum Considerations

It was decided to start a working group call for adoption during the JOSE meeting.

Thanks to all who contributed to the progress made on these specifications, both before and during IETF 119!

Tuesday, 19. March 2024

Just a Theory

Extension Registry Namespacing RFC

A proposal for an additional level of name uniqueness for Postgres extension packaging and distribution, based on URIs.

A few weeks ago I brainstormed about decentralized Postgres extension publishing, inspired in part by an examination of Go decentralized publishing. It was…a lot. I’ve been deeply pondering the future of PGXN and the broader extension ecosystem, and want to start to nail down some decisions. To that end, I’d like to propose an update to extension namespacing.

Status Quo

There are currently three ways in which an extension is considered unique:

Only one extension can have a given name within a single Postgres cluster. Names are defined by the name of the control file. It is therefore not possible to have two extensions with the same name in the same Postgres cluster. PGXN follows this pattern: Only one extension can have a given name in the PGXN registry. The first person to release an extension then “owns” its name, and no one else can release an extension with the same name.1 I think dbdev follows the same pattern. Other registries like trunk and pgxman define an extension by the distribution name, at least for the purposes of selecting a binary to install. Thus when you trunk install postgis, you get all of the extensions included, as you’d expect, while trunk install address_standardizer wouldn’t work at all. In the few places that trunk supports installation by extension name, it prompts the user to use the appropriate package name if there’s a conflict. A Modest Proposal

I’d like to propose the following changes to the PGXN Meta Spec to start to move away from extension uniqueness in the broader extension ecosystem and more toward package name.

Add a new field, call it module_path, project_path, project_uri, that, if present, uniquely identifies an extension project and all of its parts. It should be to a Go-style module path (or URI) that identifies the project repository path where a META.json file lives. Retain the provides object where keys identify extensions, but those keys will no longer be globally unique to the registry. In other words, the combination of module_path and extension name uniquely identifies an extension, including an empty module_path. How it Works

Some examples. Let’s say there is an existing extension named pair, included in the distribution named pg_pair:

{ "name": "pg_pair", "version": "1.2.3", "provides": { "pair": { "file": "pair.sql", "version": "1.2.0" } } }

The extension name pair is unique, and pgxn install pair will download the pg_pair v1.2.3 bundle and compile and install pair v1.2.0.

Now someone else comes along and wants to make their own pair with this metadata:

{ "name": "my_pair", "version": "0.2.3", "provides": { "pair": { "file": "pair.sql", "version": "0.2.3" } } }

Just like today, this upload would be rejected, because there is already a registered pair extension. Under my proposal, they can disambiguate by providing a module_path:

{ "name": "my_pair", "module_path": "github/example/pair", "version": "0.2.3", "provides": { "pair": { "file": "pair.sql", "version": "0.2.3" } } }

This upload would be allowed. With these two releases, someone attempting to install pair would see something like this:

$ pgxn install pair ERROR: Duplicate extension name “pair”. Install one of these instead: * pgxn.org/dist/pair * github/example/pair

Note the the module path pgxn.org/dist/pair in the the first option. This is the default module path for distributions without a module path.2 But now the user can select the proper one to install:

$ pgxn install pgxn.org/dist/pair INFO: latest version: pgxn.org/dist/pair@1.2.3 INFO: building extension INFO: installing extension INFO: done!

Furthermore, the PGXN client will prevent the user from later installing a conflicting extension. The failure would look something like:

$ pgxn install github/example/pair INFO: latest version: pgxn.org/dist/pair@0.2.3 ERROR: Cannot install extension “pair” from pgxn.org/dist/pair: ERROR: A conflicting extension named “pair” is already installed ERROR: from pgxn.org/dist/pair Features with Benefits

I see a number of benefits to this change:

Compatibility with the v1 metadata spec, so that no data migration or distribution indexing is required. It loosens up extension namespacing (or name registration, if you prefer) while adding additional metadata to help users evaluate the quality of an extension. For example, does it come from a well-known developer? You can see it right in the module path. It creates a pattern to eventually allow auto-indexing of extensions. For example, if you run pgxn install github.com/example/pew, and PGXN doesn’t have it, it can look for a META.json file in that repository and, if it exists, and there’s a semver release tag, it could try to index it and let the user install it. There are ownership issues to be worked out, but it has possibilities. It preserves the Postgres core concept of extension identity while putting in place a well-established (by Go modules and widespread use of URIs in general) that the Postgres core could eventually adopt to allow more flexible extension namespacing. Request for Comments

What do you think? Good idea? Terrible idea? Please hit me with your thoughts on Mastodon, or via the #extensions channel on the Postgres Slack. I’d like to get this decision (and a few others, stay tuned!) nailed down soon and start development, so don’t hesitate? I need your help to prevent me from making a huge mistake.

Unless the owner would like to share ownership with someone else, in which case they can email me to request that another user be granted “co-ownership”. They can also request to transfer ownership to another user, after which the original owner will no longer be able to release the extension. ↩︎

Or, if the META.json file has a repository resource with a URL, PGXN could index it as the implied module path. Or, failing that, maybe it should fall back on the distribution name instead of a pgxn.org path, and prompt with pg_pair/pair↩︎

More about… Postgres PGXN PGXN v2 Extensions Go Namespacing Registry

@_Nat Zone

XNS 技術仕様書1.0版(2002-07-09)が出てきたので貼っておく

ファイルサーバをほっていたら、XNS Tecnical Specs version 1.0(2002年7月9日版) が出てきたのでここに貼っておきます。 XDI.orgのサイトが生きていればそこからダウンロードできればよかったのですが、サイトが放棄されてしまっているので、わたしが…

ファイルサーバをほっていたら、XNS Tecnical Specs version 1.0(2002年7月9日版) が出てきたのでここに貼っておきます。

XDI.orgのサイトが生きていればそこからダウンロードできればよかったのですが、サイトが放棄されてしまっているので、わたしが代わりに置き場所を提供します。

XNSは後にXRI/XDIになり、OpenID 2.0のコア仕様となった技術です。現在のDIDもその延長線上にあります。わりと参考になることが色々あると思います。昨今分散IDとかいう人はこれは読んでおいても良い仕様かとおもいます。

XNS_Technical_Specsダウンロード

The Pragmatic Engineer

Senior-and-Above Compensation in Tech

How well does tech pay, really? A deep look into almost 1,000 data points sent in by engineering managers, senior+ engineers, VP, and C-level folks in tech, mostly working in software engineering

In April 2023, I invited senior-and-above readers to share their compensation numbers, to research an accurate account of pay at startups, scaleups, and publicly traded companies. The survey was a collaboration with VC firm Creandum, which created an interesting report about early-stage compensation in Europe. Creandum shared a report, based on this data last year: Compensation Decoded: Tech leader salaries at early-stage startups.

I’m delighted to share what – I hope – is the most detailed-possible tech compensation report based on latest-available figures. I’ve taken time to slice and dice the data; so strap in because today we’re talking numbers – lots of them! We cover:

Overview of sources. 951 data points, mainly from engineering managers and senior+ engineers in the US, EU and UK.

Base salary, cash compensation, total compensation. A summary of how these compare at the 75th and 90th median percentiles.

Regional pay comparison. Unsurprisingly, the US pays best by quite a distance. However, other regions pay $200K+ in total compensation at the 75th percentile.

Compensation by seniority. How seniors with 5-10 years of experience, those with 10+ years, and executives, are paid across the globe: US, UK, EU, India, Asia, Australia and New Zealand, Latin America, Africa.

Compensation by funding stage. Pay at bootstrapped, early-stage, mid-stage, late-stage and public companies in the US, EU and UK.

A closer look at the 90th percentile, by funding stage. Something odd is happening to executive packages at late-stage companies: it’s extremely high!

Outliers. The top 3 data points from each category of founders, non-founder executives, and publicly traded companies execs. It’s interesting a principal engineer takes home more in cash compensation than most executives filling out the survey, in illiquid stock.

Interesting findings. Bootstrapped companies pay surprisingly well in cash terms, and neither sector nor engineering team size make a difference to how much you earn.

See also these articles related to this topic:

Compensation at publicly traded tech companies

A CTO’s guide to remote compensation strategies

1. Overview of sources

After cleaning the data, the report has 951 data sources. Around two-thirds of respondents work in engineering management or engineering leadership, and nearly a third in software engineering:

Responses by field. Engineering leads product, design, data engineering, and others Of 951 data points, 550 are engineering managers, senior+ engineers & directors-and-above

Region-wise, most of the data is from US, EU, and UK readers, but responses came in from every continent on earth – except Antarctica!

Data from around the world

Top 20 countries by response:

Data submitted per country 2. Base salary, cash compensation, total compensation

Let’s get into the numbers, first by comparing base salary data across all regions. Base salary is the part of your compensation that’s guaranteed, no matter performance or other factors. In the US, Canada, Australia and most of Latin America this is usually paid fortnightly, while in most of the EU and the UK it’s often once per month.

Here’s a cross-region comparison, with the median, 75th and 90th percentiles in US dollars:

Base salaries across regions in US dollars. Browse it in a Google Doc

Total cash compensation is base salary plus cash bonus (usually at the end of the year) and any sign-on bonus. You can’t guarantee to definitely get your total cash compensation, but knowing some numbers might help in negotiations:

Total cash compensation across regions. All numbers are in US dollars

Total compensation is cash total compensation, plus equity grant. This equity can be illiquid at privately traded startups and scaleups, or liquid at public companies. In context, ‘liquidity’ refers to an asset’s ability to become cash. For liquid equity, there’s the option of selling equity on the stock market for cash after it vests – assuming there’s no “blackout” period. Illiquid equity can be sellable at a liquidation event like an acquisition, going public, or a secondary equity sale.

Breakdown by region:

Total cash compensation across regions in US dollars. Note, the larger differences in the 90th percentile: that’s stock-heavy compensation 3. Regional pay comparison

Let’s see all these numbers in a single chart for easy comparison:

Read more

Monday, 18. March 2024

@_Nat Zone

NHKクロニクルが便利〜過去の番組の検索

NHKの放送した番組というのは国民の財産なわけですが、過去に放映した番組がわからないと、その財産のありかもわからないことになります。そのためには、番組についてのメタデータが検索できるようになっているのが望ましいですが、その第一歩として、過去の番組表が検索できるだけでもだいぶ違いま…

NHKの放送した番組というのは国民の財産なわけですが、過去に放映した番組がわからないと、その財産のありかもわからないことになります。そのためには、番組についてのメタデータが検索できるようになっているのが望ましいですが、その第一歩として、過去の番組表が検索できるだけでもだいぶ違います。

現在のNHKのメインサイト「nhk.jp」ではサイトトップから番組表検索へのリンクがありますが、これでは直近分しか検索することができません。これに対してNHKクロニクルでは過去分を検索することができます。

【図表】NHKクロニクルでは過去の番組表を検索できる(出所)https://www.nhk.or.jp/archives/chronicle/

ためしに、サイトウ・キネンと入れて検索してみましょう。すると、どさっと大量の検索結果が逆時系列で出てきます。わたしが知りたかったのは2007年に放映されたことを覚えている番組なので、そのあたりまで時系列を遡っていきます。

【図表】14ページ目で2007年に到達

今回は右上のページ表示をクリックして徐々に遡っていきましたが、URLに「&page=14」のように記入することによってページを飛ぶこともできるようです。こうやっていって、わたしが知りたかった番組は2007年12月29日に放映されていたことがわかりました。

【図表】知りたかった番組は2007年12月29日のっぽい

といっても、上記の画像からわかるように、このページだけではどんな番組だったかわかりません。なので、「番組詳細」を順番にクリックしていってこの番組だとわかったのでした。

【図表】詳細をクリックしていって番組を発見 (出所)https://www.nhk.or.jp/archives/chronicle/detail/?crnid=A200712291800001301000

これ、実はサイトウ・キネン・フェスティバルでデュティユー自身も来日して「Le Temps l’Horloge」が初演された録画でとても貴重なものです。作曲者自身が拍手しているところが写っています。また、「瞬間の神秘」も録画されているのはなかなか珍しいのではないかと思います。

この詳細ページのURLは「https://www.nhk.or.jp/archives/chronicle/detail/?crnid=A200712291800001301000」となっています。このcrnid というのがこのコンテンツの識別子のようです。(←ちゃんとIDネタを振った!)

放送時期がよくわからなかったらどうするか:=>&検索もできる!

上記では、放送時期をわたしが覚えていたので遡っていって検索するという技がつかえました。これだと放送時期がわからないと検索がすごく大変になります。ここで「デュティユー サイトウ・キネン」のように&検索ができればぐっと対象を絞れます。なので、やってみましたらできました。半角スペースでキーワードを区切って検索すればよいようです。

【図表】&検索も可能 (出所)https://www.nhk.or.jp/archives/chronicle/search/?keyword=デュティユー%E3%80%80サイトウ・キネン

この年2回放映されていることがわかります。わたしが実際に見たのは9月8日の生中継の方ですね。実際に演奏された直後なので。番組詳細を見てみます。

【図表】こちらのほうが番組詳細が詳しい (出所)https://www.nhk.or.jp/archives/chronicle/detail/?crnid=A200709081900001301000

なんと、こちらのほうが番組詳細が詳しい!「瞬間の神秘」がそれだけでなく、正式タイトル「瞬間の神秘(24の弦楽器、ツィンバロムと打楽器のための)1」としてのっています。12月のは再放送なので内容は同じはずですが、番組情報には差があるのですね。一つ一つ見に行ったほうが良さそうです。

Google検索ではひっかからない

このように便利に検索できるNHKクロニクルですが、残念なことがひとつあります。それはGoogle検索でひっかからないということです。試しに「site:www.nhk.or.jp 瞬間の神秘(24の弦楽器、ツィンバロムと打楽器のための)」と検索してみて下さい。2024年3月18日現在、何も返ってきません。これは、robots.txt でクローリングが禁止されているからですね。

User-agent: *
Disallow: /*.cgi$
Disallow: /*.cgi?
Disallow: /*/api/
(中略)
Disallow: /chronicle/

【図表】chronicleがrobots.txtで禁止されている (出所)https://www.nhk.or.jp/robots.txt

国民の利便性という観点ではちょっと残念な気がします。ここはallowしてほしいところです。2

というわけで、NHKクロニクルが便利。でもGoogleからは検索できないので直接NHKクロニクルに行って検索しないとだめ、というお話でした。あと、こうして検索したものにどうやったらアクセスできるようになるかですね。デジタル化の進展だけでなく権利処理の話も関わってきますが、このあたり、アクセスしやすくなるとよいですね。


Just a Theory

Talk: Building Trunk

Ian Stanton will be giving a talk, “Building Trunk: A Postgres Extension Registry and CLI” on Wednesday at noon US Eastern / 16:00 UTC.

Update: 2024-03-22: Slides and video linked below.

This week, my fellow Tembonaut Ian Stanton will present the extension ecosystem mini-summit talk, “Building Trunk: A Postgres Extension Registry and CLI”. We felt it important to get some insight from a couple of the recently-developed Postgres extension registries: what problems they set out to solve, how the were built and operate, successes addressing their issues, and what issues remain, both for the projects and the ecosystem overall. Ian plans to give us the low-down on trunk.

Join us! Need other information or just want an invitation without using Eventbrite, hit me up at david@ this domain, on Mastodon, or via the #extensions channel on the Postgres Slack.

Update: 2024-03-22: Such a great discussion, was great! Missed it? I got you. Links:

Video PDF Slides Notes More about… Postgres Trunk Extensions PGConf Summit

@_Nat Zone

デジタル社会の信頼を構築する「デジタルID」トラスト、日本のデジタルIDのこれから

わたしの顧問先のPwC Japanグループ主催のDigital Identity and Privacy Tech Forum 2024【配信期間:2024年3月11日(月)~2024年4月30日(火)】に出演しています。同フォーラムは、「デジタル化に伴うプライバシー保護領域の重…

わたしの顧問先のPwC Japanグループ主催のDigital Identity and Privacy Tech Forum 2024【配信期間:2024年3月11日(月)~2024年4月30日(火)】に出演しています。同フォーラムは、「デジタル化に伴うプライバシー保護領域の重要テーマであるデジタルアイデンティティとプライバシーテックの最新動向について詳しく解説」するオンライン・オンデマンドのフォーラムで、いくつかのセッションが提供されています。

デジタル社会の信頼を構築する「デジタルID」トラスト、日本のデジタルIDのこれから デジタルアイデンティティを支える技術―ソリューション導入の失敗事例と成功事例― プライバシーガバナンス最前線―データ利活用を促進していくために、いま企業に求められていること― プライバシーテックの活用と今後の展望

わたしが出演しているのは、このセッションを企画したPwCコンサルティング合同会社 ディレクター 柴田健久率いる、【1. デジタル社会の信頼を構築する「デジタルID」トラスト、日本のデジタルIDのこれから】です。

セッションの参加者

ご一緒させていただいたのは、

デジタル庁セキュリティアーキテクト 満塩尚史 OIX チーフ・アイデンティティ・ストラジテスト ニック・マザショウ PwCコンサルティング合同会社 ディレクター 柴田健久

の3氏に、パーソナリティとしてフォーブスジャパンの藤吉氏が出演しておられます。

内容は

「デジタルアイデンティティの概念」 「グローバルレベルでの、デジタルIDトラストの標準化の動き」 「日本の政府関係者のIDエコシステムへの期待」 「日本企業の実態と今後の活用に向けた議論」 「未来への展望」

といったことについて各人の見解と意見交換を行っています

収録はLEDで壁や床が作ってあるスタジオ1でで、なかなか良い経験2をさせていただきました。最近の視力低下でカンペが全く見えない〜それどころか次のトピックも見えない状況でアドリブ力全開でやっていますが、いかがでしょうか。ぜひご覧になってご感想をお聞かせ願えればと思います。

なお、他の3セッションもタイトルからして非常に興味を惹かれるものです。まだわたしも見ていませんが、ぜひ時間を作って視聴したいと思っています。

視聴申込はこちらのページよりお願いいたします。

https://www.pwc.com/jp/ja/seminars/digital-identity-forum2024.html

Friday, 15. March 2024

Just a Theory

Mini Summit One

A rough transcript of my talk “State of the Extension Ecosystem”, along with interspersed chat comments and appearances by Bagel.

Great turnout and discussion for the first in a series of community talks and discussions on the postgres extension ecosystem leading up to the Extension Ecosystem Summit at pgconf.dev on May 28. Thank you!

The talk, “State of the Extension Ecosystem”, was followed by 15 minutes or so of super interesting discussion. Here are the relevant links:

Video Keynote PDF Slides

For posterity, I listened through my droning and tried to capture the general outline, posted here along with interspersed chat history and some relevant links. Apologies in advance for any inaccuracies or missed nuance; i’m happy to update these notes with your corrections.

And now, to the notes!

Introduction

Introduced myself, first Mini Summit, six leading up to the in-person summit on May 28 at PGConf.dev in Vancouver, Canada.

Thought I would get it things started, provide a bit of history of extensions and context for what’s next.

Presentation

Postgres has a long history of extensibility, originally using pure SQL or shared preload libraries. Used by a few early adopters, perhaps a couple dozen, including …

Explicit extension support added in Postgres 9.1 by Dimitri Fontaine, with PGXS, CREATE EXTENSION, and pg_dump & pg_restore support.

Example pair--1.0.0.sql:

-- complain if script is sourced in psql and not CREATE EXTENSION \echo Use "CREATE EXTENSION pair" to load this file. \quit CREATE TYPE pair AS ( k text, v text ); CREATE FUNCTION pair(text, text) RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair;'; CREATE OPERATOR ~> (LEFTARG = text, RIGHTARG = text, FUNCTION = pair);

Bagel makes an appearance.

Example pair.control:

# pair extension comment = 'A key/value pair data type' default_version = '1.0' module_pathname = '$libdir/pair' relocatable = true

Example Makefile:

EXTENSION = pair MODULEDIR = $(EXTENSION) DOCS = README.md DATA = sql/pair--1.0.sql TESTS = test/sql/base.sql REGRESS = base REGRESS_OPTS = --inputdir=test MODULES = src/pair PG_CONFIG ?= pg_config PGXS := $(shell $(PG_CONFIG) --pgxs) include $(PGXS)

Build and Install:

$ make make: Nothing to be done for `all'. $ make install mkdir -p '/pgsql/share/extension' mkdir -p '/pgsql/share/pair' mkdir -p '/pgsql/share/doc/pair' install -c -m 644 pair.control '/pgsql/share/extension/' install -c -m 644 sql/pair--1.0.sql '/pgsql/share/pair/' install -c -m 644 README.md '/pgsql/share/doc/pair/' $ make installcheck # +++ regress install-check in +++ # using postmaster on Unix socket, default port ok 1 - base 15 ms 1..1 # All 1 tests passed.

CREATE EXTENSION:

$ psql -d try -c 'CREATE EXTENSION pair' CREATE EXTENSION $ pg_dump -d try -- -- Name: pair; Type: EXTENSION; Schema: -; Owner: - -- CREATE EXTENSION IF NOT EXISTS pair WITH SCHEMA public; -- -- Name: EXTENSION pair; Type: COMMENT; Schema: -; Owner: -- COMMENT ON EXTENSION pair IS 'A key/value pair data type';

Many of us saw opportunity in this new feature.

PostgreSQL today is not merely a database, it’s an application development platform.

— Me, 2010

Proposed to build PGXN. Raised funds to build it in late 2010. Launched site April 2011; Daniele Varrazzo released CLI, and Dickson Guedes released the dev CLI.

Problems PGXN set out to solve:

Source code distribution with user registration and namespacing Discovery: Search, read docs, brows tags Installation: CLI to compile and install using PGXS or Configure

PGXN Components:

Meta Spec Manager Root Mirror API Server Site Client Utils CLI CI/CD Image

Problems out of scope for PGXN:

Binary packaging and distribution Defer to apt/yum Developer tooling (though dev utils helped) Build tooling Defer to core (PGXS)

PGXN Shortcomings:

Little development since 2012 Search limitations Docs preferred Most distributions have few docs Some issues addressed in last few weeks Source of Record Minority of available extensions on PGXN Releases uneven or neglected

In classic SDLC fashion, PGXN POC shipped as an MVP and was neglected.

— Me, Just Now

Been peripheral to Postgres extensions for the last 10-12 years, but some things have happened.

Non-Core extension counts:

PGXN: 382 joelonsql/PostgreSQL-EXTENSIONs.md: 1,186

Daniele asks about that last source, which is just a list in a gist.

Joe Nelson links to the gist in Zoom chat. It is not his list, contrary to my off-the-cuff guess

Why haven’t extensions taken off?

Lost Opportunities

No one canonical source to discover and install extensions Difficult to find and discover extensions without canonical registry Most extensions are under-documented and difficult to understand They are also hard to configure and install; most people don’t want or need a compiler The maturity of extensions can be difficult to gauge, not systematized, must each be independently researched David Christensen in Chat “attention economy/awareness, NIH, etc” Jeremy S in chat: “Maybe some people don’t know they are using extensions (I think that’s possible to some degree)” There is no comprehensive binary packaging Centralized source distribution is insufficient (even if it were complete) jubilee in chat: Trust aspect? David Johnson in chat: To seem legit you need to touch the repo at least annually to ensure it works on the newest major release. Even if you just do your compile and update the readme. I mention using pgxn-utils and GitHub workflows to ensure my extensions continue working There is insufficient developer tooling; pgxn-utils not well-maintained, don’t build on recent Rubies, but pgrx has a lot of Rust-oriented tooling Eric in chat: ❤️ jubilee in chat: 🦀 mentioned!

Filling the Gaps

dbdev: “The Database Package Manager for Trusted Language Extensions”: Includes only TLEs, no binary extensions

trunk: “A Postgres Extension Registry”: Binary distribution of curated extensions, desires to be comprehensive and cross-platform

pgxman: “npm for PostgreSQL”: Binary Apt package distribution of curated extensions packaged with, desires to be comprehensive and cross-platform

Emphases: Ease of Use. Screenshot from pgxman:

$ curl -sfL https://install.pgx.sh | sh - 👏🎉 pgxman successfully installed $ pgxman install pgvector The following Debian packages will be installed: postgresql-14-pgxman-pgvector=0.5.1 Do you want to continue? [Y/n] y pgvector has been successfully installed.
Daniele in chat: “Missing a “curl | sudo sh” there…. 👀” Greg Mullane (CrunchyData) [he/him] in chat: “Really not a fan of that “pipe curl stuff from internet into sh” system.” Jeremy S in chat: “Someone recently reprimanded me for putting curl | psql in an extension README. From a security perspective it probably sets a better example to do curl >file.sql … psql file.sql (encourage users not to run from Internet but read/review first)” * jubilee in chat: “apt/yum install is just a better UI over curl | sh :^)” Jeremy S in chat: “Yes and once you’re to that point there’s already more supply chain verification happening” Jeremy S in chat: “It’s usually just the initial bootstrap into any system, if the setup wasn’t already in your distro”

Emphases: Platform neutrality. Screenshot from trunk:

Architecture x86-64 Operating system Debian/Ubuntu

Emphases: Stats. Screenshot from dbdev:

Downloads

20 all time downloads 0 downloads in last 30 days 1 download in last 90 days 0 downloads in last 180 days

Emphases: Curation. Screenshot from trunk:

Featured 7 Analytics 13 Auditing / Logging 7 Data Change Capture 6 Connectors 27 Data / Transformations 49
Damien Clochard in chat: gtg, see you later guys !

MVPs

trunk: Manual integration, Currently Debian-only

pgxman: Form-based submission, Currently Apt-only

dbdev: TLEs only, CLI publishing

David Christensen in chat: “go has a pretty good extensions infra, imho, wrt discovery/docs, etc. also has the benefit of the package names being the URL to access it, which is a nice convention.”

New Opportunities Today

What are the community opportunities for the extension ecosystem?

Some ideas:

Improved dev tools: More than pgxn-utils and pgrx Canonical registry: All publicly-available extensions in one pac3 Easy publishing: auto-discovery or CI/CD pipeline publishing Continuous Delivery: CI/CD pipeline publishing File-free installation: TLEs Documentation: Something like Go docs or Rust docs File management: Put all the files for an extension in one directory Improved metadata Library Dependencies: utilities used by extensions Platform Dependencies: system packages Build pipelines: PGXS, pgrx, make, cpan, pypi, etc. Artifacts: Binaries build on release Classification: Curated in addition to tags Extension Types: Extensions, apps, background workers, loadable libraries Derived Services Binary Packaging: Distributed binaries for many platforms Ratings & Reviews: Stars, thumbs, comments Aggregated Stats: Repository stats, etc. Smoke Testing: Matrix of multi-platform test results Security Scanning: Reporting vulnerabilities Badging & Curation: Third-party classification, badging various statuses

Extension Ecosystem Summit

Collaborate to examine the ongoing work on PostgreSQL extension distribution, examine its challenges, identify questions, propose solutions, and agree on directions for execution.

🏔️ Your Summit Organizers

David Wheeler, Tembo, PGXN Jeremy Schneider, AWS, dsef David Christensen, Crunchy Data, pg_kaboom Keith Fiske, Crunchy Data, pg_partman Devrim Gündüz, EnterpriseDB, yum.postgresql.org

Devrim Gunduz in chat: Thanks David!

Schedule:

March 6: David Wheeler, PGXN: “State of the Extension Ecosystem” March 20: Ian Stanton, Tembo: “Building Trunk: A Postgres Extension Registry and CLI” April 3: Devrim Gündüz: “yum.postgresql.org and the challenges RPMifying extensions” April 17: Jonathan Katz: “TLE Vision and Specifics” May 1: Yurii Rashkovskii, Omnigres: “Universally buildable extensions: dev to prod” May 15: David Wheeler, PGXN: “Community Organizing Summit Topics”

Ultimately want to talk about what’s important to you, the members of the community to make extensions successful.

Discussion

Eric: I’m Eric Ridge, one of the developers behind pgrx, as you’re going through this process of building a modern extension ecosystem, let us know what we can do on the Rust side to help make your lives easier, we’re happy to help any way we can.

Steven Miller in chat:

These are some areas of interest we noticed building Tembo

Binary packaging / distribution:

Variable installation location System dependencies / uncommon system dependencies or versions Chip specific instructions (e.g. vector compiled with avx512) Extension-specific file types / extra data files (e.g. anonymizer .csv data)

Turning on extensions automatically

Different ways to enable extensions does it need load (shared_preload_libraries, session_… etc)? Does it use create extension framework? Does it require a specific schema? What about turning on in multiple DBs at the same time in the same cluster, with background worker? Disabling, what data will be lost? Validating safety / user feedback on upgrade?

In cloud / SaaS:

Installing + enabling extensions quickly, without restart Persisting extension files Extension-specific files (e.g. libraries) versus postgres’ libraries Updating Troubleshooting crashes / core dumps

Anyone else have similar problems / tips?

Steven Miller: These were just things I noted during the presentation. Curious if these are interesting to others on the call.

Daniele in chat: “Regards binary distributions, python wheels might be a useful reference.”

Steven Miller: That’s good point! What do people think of idea to just install extensions onto servers, not packages, persisted on the disk, next to PGDATA so they go into a single persistent volume, and the rest is managed by an immutable container.

Daniele: Had experience on Crunchy where we had to replace an image to get an extension. Looked for feature to have a sidecar or a volume with the extension.

Steven Miller: Didn’t have a separate directory just for extensions, it’s just pg_config --libdir fore everything. Had to persist entire directory, including those files form the base build, their internal files. Would have been nice to have a separate directory, extra-libdr or extra-sharedir, something like that.

Yurii Rashkovskii: I was working on a patch to do exactly that, but haven’t completed it. Was going to introduce additional directories to search for this stuff.

Steven Miller: That would be really awesome.

Jeremy S in chat: “Advantage of that is that a single image can be shared among systems with different needs”

Eric in chat: “Thoughts around “enterprise repositories” that could be self-hosted and disconnected from the internet?”

Ian Stanton in chat: “I’ll touch on this in the next talk, it’s crossed our minds when building the Trunk registry”

Steven Miller: I think that’s a great idea.

Bagel reappears.

David Wheeler: PGXN originally designed so anyone could run Manager and their own root mirror, and maybe rsync from the community one. Don’t know that anyone ever did, it’s a little complicated and most people don’t want to work with Perl. [Chuckles]. Definitely think there’s space for that. If you work with Java or Go or maybe Rust, lots of orgs like Artifactory that provide internal registries. Could be cool use case for Postgres extensions.

David Christensen in chat: “something that could support extension batches; like groups of related extensions that could be installed in bulk or loaded in bulk (so could accommodate the shared systems with different individual extension needs, but could be shared_preload_library configured)”

“Steven Miller” in chat: “Sounds familiar”

Greg Mullane (CrunchyData) [he/him] in chat: “All these items remind me of CPAN. We should see what things it (and other similar systems) get right and wrong. I’ve learned from CPAN that ratings, reviews, and badging are going to be very difficult.”

David Christensen in chat: “I’d assumed at the time that it was largely lifted (at least in philosophy/design) from CPAN. 🙂” David Wheeler (he/him) in chat: “yes”

Jeremy S: I think this is mostly focused on developers, but I had recent experience where multiple people in the past few months, new to Postgres, are trying to understand extensions. They install a version and then see there are like 15 versions installed, so confused. Goes back to the install file. Bit of UX angle where there are sharp edges where people trying to make sense of extensions, the flexibility makes it hard to understand. Some might be some nice guides, some architectural things explaining PGXS, or improvements to make to the design. Related, not main topic, but good to keep end user UX and devs building on Postgres but not Postgres developers, who run their businesses.

David Wheeler: Yeah all the files can be confusing, which is why I think trunk and pgxman trying to simplify: Just run this command and then you have it.

Steven Miller in chat: “I really agree with what Jeremy is saying. Right now PGXN and Trunk are taking and approach like “whatever an extension could do / how it may work, it should work on this registry”. But I think more standards / “what is a normal extension” would make the UX much easier.”

Jeremy S: Even with that the available extensions view is still there. Some of that is just schema management, and that’s how core is doing schema management.

Steven Miller in chat: I exactly agree about the concern about multiple extensions within a package. Also version and name mismatches

David Wheeler: And not everything is an extension, you just want to work, or your extension is just utility like pg_top you just want to use. Extensions I think were a tremendous contribution to Postgres itself, but a lot of it was wrangling the existing system for building Postgres itself to make it work for that. Could be very interesting, though quite long term — and I know Dimitri has tried this multiple times — to build a proper package management system within Postgres itself, to eas a lot of that pain and burden.

Tobias Bussmann in chat: “Thank you for the great overview and for taking this topic further! Unfortunately, I’ll not be able to join at pgConf.dev but will follow whatever results this will lead to. As a package maintainer, I am constantly looking in a easy way to allow users to use extensions without having to package everything ;)”

Steven Miller in chat: “Like auto explain for example right. i.e. a LOAD only “extension””

*Yurii Rashkovskii: An interesting topic, what extensions are capable of doing and how they can be more self-contained. Like Steven was saying in chat: how easy is it to load and unload extensions. Example: want an extension to hook into a part of Postgres: executor, planner, etc. How do you go about enabling them? How you unload them, introspect, list of hooks.

Omni extension provides a list of all hooks, and when you remove an extension it removes the hooks that provide the extension, but still not part of the core. Hooks one of the greatest ways to expand the functionality of Postgres, allows us to experiment with Postgres before committing to the full cycle of getting a patch into Postgres. Lets us get it to users today to try. if it makes a lot of sense and people want it, time to commit to the process of getting a patch committed. But if we don’t hve this venue, how to get extensions in, our ability to try things is limited.

jubilee in chat: Hmm. It seems my audio is not working.

David Wheeler: The next session is two weeks from today: Ian Stanton is going to talk about “Building Trunk: A Postgres Extension Registry and CLI”. Will be interesting because a number of people have decided to build a binary packaging system for extensions, just to air out what the challenges were, what problems they wanted to solve, what problems remain, and where they want to take it in the future.

Jeremy S* in chat: “Bagel clearly has something to say”

*David Wheeler: jubileee I see your audio issues, do you just want to type your question into chat? We can also discuss things in the #extensions channel on the Postgres Slack

David Wheeler: Thank you all for coming!

jubilee in chat: “I had a question which is about: Does Postgres actually support docs for extensions? Like, a lot of people don’t really WANT to read a README. Can you get docstrings for a function in psql?”

Ian Stanton in chat: “Thank you David!”

jubilee in chat: And if not, why not?

Post Presentation Discussion

From Slack:

David Wheeler: I see now that “jubilee” left their question in the Zoom chat. David Wheeler: The closest attempt at this I’ve seen is pg_readme, which will generate Markdown for an extension from comments in the catalog and write it out to a file. David G. Johnson: The comment on command adds in database comments that psql describe commands should display.

Also on Slack, Greg Sabino Mullane started a longish thread on the things we want to do and build.

More about… Postgres PGXN Extensions PGConf Summit Bagel

Thursday, 14. March 2024

The Pragmatic Engineer

The Pulse #85: The Pulse #85: is the “AI developer”a threat to jobs – or a marketing stunt?

One startup released “the first AI software engineer,” while another aims to build a “superhuman software engineer.” As intimidating as these sound: what if it’s more marketing than reality?

👋 Hi, this is Gergely with a subscriber-only issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. To get articles like this in your inbox, every week, subscribe:

Subscribe now

The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.

Today, we cover:

Industry pulse. Apple unbans Epic’s developer account in just one day; The two, distinct types of layoffs happening across the industry; OpenAI’s CTO unwilling to answer what videos their Sora video product was trained on; Reddit prepares to go public; and more.

Is the “AI developer” a threat to jobs – or a marketing stunt? Magic.dev raised $100M to build “a superhuman software engineer,” and Cognition Labs released what they refer to as “the first AI software engineer.” Looking closer, Cognition Labs’ product looks more similar to an AI coding assistant. However, Microsoft has already captured the space of “AI copilots,” – and new startups like Cognition Labs are forced to make bold claims to capture attention.

A trend of lower equity grants, with forward-heavy vesting? DoorDash is offering 40% lower equity grants – but by changing equity vesting to be forward-heavy, the first year’s compensation remains unchanged. We will probably see more companies reaching for creative solutions to both reduce equity compensation but to also reward top performers more efficiently. And to not have to worry about the “four-year-cliff” anymore.

1. Industry pulse Apple unbans Epic’s developer account, for now

A day after we discussed the news of Apple blocking Epic from launching an alternative App Store in Europe, the Coupertion tech giant restored Epic’s developer account access. This came after the European Commission requested more information from Apple. As I originally wrote, I did not understand what Apple hoped to gain with this ban:

“I expect Epic Games will either sue Apple with the EU commission, then win, and be allowed to run an alternative App Store. Alternatively, the EU commission will order Apple to restore Epic’s account.”

Indeed, Apple had no reasonable choice but to restore this account, or invite further fines.

All correspondence between Epic and Apple is now public, and it’s not pretty for Apple. It reveals that It asked Epic to confirm the games company will act in good faith, and comply with Apple’s regulations, back at the start of this episode. Epic CEO, Tim Sweeney, replied in an email, writing:

“Epic and its subsidiaries are acting in good faith and will comply with all terms of current and future agreements with Apple, and we’ll be glad to provide Apple with any specific further assurances on the topic that you’d like.”

Apple responded by letter, banning Epic and referencing a tweet from Epic’s CEO that lightly criticized Apple. Apple wrote:

“Mr Sweeney’s response to [the request asking for why Apple should trust Epic] was wholly insufficient and not credible. It boiled down to an unsupported, “trust us.” History shows, however, that Epic is verifiably untrustworthy, hence the request for meaningful commitments. And the minimal assurances in Mr Sweeney’s curt response were swiftly undercut by a litany of public attacks on Apple’s policies, compliance plan and business model. As just one example [link to this tweet by Epic’s CEO]”

This ban was indefensible to the EU Commission. It also reinforced that Apple sees it as their right to ban any developer, and that social media posts are grounds for doing so. In the EU, developers like Epic can turn to the EU Commission. Outside of this jurisdiction, it’s still tough luck if you criticize Apple and get removed from their platform as a result.

I am appalled that Apple makes no attempt to hide that they act as overlords to developers on the iOS platform. But, at the end of the day, it’s their platform until regulators impose rules on it. The worse Apple treats developers, the more of them will join Epic and Spotify in lobbying lawmakers to force Apple’s hand in their favor. I think Apple would do itself a favor by treating app developers less condescendingly.

Cuts are still happening, here and there

Read more

Tuesday, 12. March 2024

The Pragmatic Engineer

What Changed in 50 Years of Computing: Part 1

How has the classic book on software engineering, ‘The Mythical Man Month,’ aged with time, and is it still relevant half a century on – or does it belong in a museum, alongside floppy discs?

‘The Mythical Man Month’ by Frederick P. Brooks, Jr. is a bona fide classic in the software industry. First published in 1975, and with an updated edition in 1995, the text is almost 50 years old. Lots of time has passed, yet the book is still relevant.

The title of the book takes aim at the “myth” that software development can be measured in “man months,” which Brooks disproves in the pages that follow:

“Cost [of the software project] does indeed vary as the product of the number of men and the number of months. Progress does not. Hence the man-month as a unit for measuring the size of a job is a dangerous and deceptive myth.”

I’ve been making my way through this book from software’s earliest days and taking notes of what’s remained the same in the 50 years since Mythical Man Month came out, which predictions the book got right or wrong, and what’s different about engineering today.

The author worked at IBM as a project manager of the OS/360 operating system, one of the most complex software projects in the world, at the time. Possibly the book’s most famous insight is that adding manpower to a software project that’s already behind schedule will delay it even more – which likely comes from Brooks’ personal experience at a big company. This insight seems obvious to many of us today, but it was Brooks who saw it first, which speaks to why the book still matters.

In today’s issue, we cover:

A criticism of the book: gendered language

Joys and woes of programming, then and now

Why do we ship faster now than in 1975?

Does “Brooks’s law” still apply? 

Do we spend more time coding than 50 years ago?

The “10x engineer,” then and now

The Mythical Man Month: the 1995 edition 1. A criticism of the book: gendered language 

One thing modern readers will notice is that only the male pronoun is used throughout. Brooks was writing in the early-mid 1970s, when there were plenty of women programmers in the field. Indeed, it’s estimated that in the 1960s between 30-50% of all programmers were women, by the book Broad Band: The Untold Story of the Women Who Made the Internet by Claire L. Evans. 

Despite the reality, the Mythical Man Month exists in a workplace populated entirely by other men. Every engineer is a “he,” and “he” does the programming, testing, and documenting. The invisibility of women in the text seems a puzzling omission today; second-wave feminists might say it illustrates why their struggle against engrained discrimination and inequality had to happen. 

Women in software at the time included Jean E. Samme, who published the book Programming Languages: History and Fundamentals in 1969 – one of the most popular books in the field. Meanwhile, Margaret Hamilton was responsible for programming the onboard flight software for the Apollo space mission. Not only that, she’s also credited with coining the term, “software engineering.” 

The absence of women was partially rectified in the later edition, in which four new chapters were added. The 1995 volume uses “people” instead of the male form, and drops “he” as the default gender.

Now, let’s reflect on the rest of the work.

2. Joys and woes of programming, then and now

In Chapter 1, “The Tar Pit,” Brooks asks why programming is fun, and gives five reasons:

Read more


Timothy Ruff

The American General Data Protection Regulations (AGDPR)

AGDPR doesn’t exist, but it should. In 2012 the Europeans released the first proposal for their General Data Protection Regulations (GDPR), which ran a four-year political gauntlet and evolved into its final, official version in 2016. Because of the ghastly surveillance situation we now find ourselves in America and elsewhere, perpetrated by both private industry and government, I think Amer
AGDPR doesn’t exist, but it should.

In 2012 the Europeans released the first proposal for their General Data Protection Regulations (GDPR), which ran a four-year political gauntlet and evolved into its final, official version in 2016. Because of the ghastly surveillance situation we now find ourselves in America and elsewhere, perpetrated by both private industry and government, I think America also desperately needs comprehensive privacy rules, rules informed by now eight years of hindsight and infused with an American value not found in Europe’s GDPR: privacy from government.

If I could wave a magic wand, an American GDPR would improve on Europe’s GDPR in two fundamental ways:

1. Eliminate the idea that data about people can be “anonymized” or “de-identified”.

Data about people can no longer be effectively de-identified by removing personally identifying information (PII) from the dataset. Maybe this was generally still possible in 2016, but not any more.

GDPR set off a wave of innovation among data scientists who, like the old game “Name That Tune”, took the new regulation as a challenge and developed numerous ways to re-identify people using only a few data points that aren’t considered PII, techniques they’ve since made widely available. And that was before the recent explosion of AI tools which make re-identification easy without the direct use of such tools. Unfortunately, fully GDPR-compliant “de-identified” data sets are now trivially easy to re-identify for anyone with access to a computer.

(For more details on this topic, see this Harvard paper about “simple” attacks that defeat GDPR’s PSO Secure / “singling out” privacy protection provisions.)

If almost any data about a person is usable to re-identify them, then the whole concept PII is no longer relevant; any data about a person is PII. That is precisely how an AGDPR definition of “personal data” can improve upon the European one: any data about an person. It’s both simpler and more accurate.

2. Include protections for privacy *from government*, in addition to privacy from the private sector.

Europe’s GDPR specifically exempts government, AGDPR should do the opposite: it should bind government’s activities in relation to obtaining, using and retaining personal data.

There are obviously necessary and important reasons for government to obtain, use, and retain personal data, and those reasons should be explicitly enumerated and protected. But the U.S. federal government has gone absolutely bananas when it comes to how much of our personal data they are obtaining about us and our activities, through legal but largely unchallenged means. The degree to which it’s happening is way beyond what’s reasonably needed in a free country that espouses limited government, and that has a Fourth Amendment that secures:

“the right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures”.

How is data about us not our “effects”? How is government gathering our effects en masse without our explicit consent not “unreasonable”?

Journalist Byron Tau’s new book about U.S. government surveillance of law-abiding citizens, Means of Control, is outrageous. (Summary article here.) You cannot read it and not become deeply alarmed at the enormous, secret, ubiquitous surveillance state that already exists in the U.S. No doubt European governments are doing the same with their citizens’ data, but — and I’m generalizing here — they don’t have the same foundational distrust of government that the U.S. was born with and so may not be bothered enough to do anything about it, as evidenced by government’s total exemption from their GDPR. I hope, and believe, that once more Americans become aware of the extent of what’s happening, an American GDPR that clearly defines government’s role in privacy can become a real, politically viable possibility.

There is a model to follow. Christopher Bramwell, the Chief Privacy Officer of Utah, was the driving force behind a groundbreaking new piece of privacy legislation just enacted in Utah that constrains how personal data must be treated by government, without saying anything about private industry. As far as I know it’s the first of its kind, anywhere, and hopefully the first of many to follow in other states and countries. This bill gained a fortunate political boon as it advanced: it became a ‘vote against privacy’ to not support it; it passed unanimously.

Of course, privacy from big tech, surveillance capitalism, and private industry in general is a big deal, but private industry doesn’t have a monopoly on violence and taxes like government does; they mostly just want to sell me stuff. Still, the European GDPR shows an example of how to constrain private industry, and an American GDPR should learn from and incorporate those lessons.

The ideal outcome of an AGDPR would provide privacy both from private industry and from government, and finally put a halt to all federal activities of indiscriminate obtaining, using, and retaining of our personal digital “effects”, whether or not they’re considered PII.

Monday, 11. March 2024

Phil Windleys Technometria

Decentralizing Energy

The end of the Age of Oil

The Accidental Superpower discusses a post-Bretton Woods world where easy, safe shipping cannot be assumed. I think in this "scared new world," renewables change the dynamics of energy flows by moving production closer to energy use.

My wife, Lynne, recently gave me a copy of Peter Zeihan's book, The Accidental Superpower: Ten Years On. The book was originally published in 2014, but Zeihan has updated it by inserting chapters talking about what he got right in 2014, what he got wrong, and why. The focus of the book is geopolitics—how geography and demographics shapes the world order—and how Bretton Woods changed that in significant ways. The book makes the case that so much of what made Bretton Woods useful to the US and why the US engaged with the rest of the world for the 70 years following World War II is changing. As it changes the free trade system enabled by Bretton Woods is also changing. This will have significant impact on every country in the world.

Much of what changes has to do with energy. One of the things1 Zeihan got right was his assertion that unlike much of the rest of the developed world, the US doesn't need to import energy—specifically oil—we are a net energy importer. This changes the dynamic wherein the US is willing to be the protector of shipping lanes for the entire world. As a result, the future could see a US that has the luxury of ignoring events in the Middle East, Ukraine, and elsewhere, whereas Europe (to take just one example) cannot. The book is full of other interesting predictions and conclusions just like this one. I encourage you to read it if you find this as fascinating as I do.

Zeihan makes a big deal of shale oil production, which accounted for 66% of US production in 2022. But as I read this, I was thinking about renewables. As I wrote in 2020, I've gone in big on solar power at my house, love my EV, and have replaced most things in the house (like the furnaces) with versions that run on electricity.  I did this because it made my life easier and saves me money. The fact that it's good for the environment is a bonus.

But, solar and wind are not just renewable, they also allow energy production to be decentralized in ways oil and natural gas can't. Oil and natural gas deposits are where they are. Some countries are blessed with them and the others have to buy from those countries. And they're often far away, requiring shipping through potentially hostile waters. But that's not true of renewables. They can usually be built and located where ever the need is2. This changes geopolitical equation in significant ways, but areas of the world that are not energy independent, like Europe, are moving toward renewables too slowly to prevent future energy shocks. The problem with renewables is that they're long-lead items—they take years to plan and bring online.

Petroleum and Bretton Woods enabled the modern world, providing portable, storable sources of energy that could easily and safely move to where ever it was needed.3 If we are indeed at the end of the Bretton Woods era, the world is in for significant changes as it adjusts to a life where free trade, and easy access to petroleum-based energy, cannot be assumed. Moving energy production closer to the places it's used is one strategy for dealing with this world-altering disruption. Buckle up.

Notes

There are other things that are important to the books overall conclusion besides energy. I'm just cherry picking that because I was thinking about it. For example, the US is largely self-sufficient from an overall import/export standpoint. We don't import nearly as much as many other countries and could replace what we do import relatively easily.

It's not just renewables. Nuclear power can also be located closer to demand than an oil deposit. I started my career as a nuclear metallurgist, so I'm a fan. I think many countries are going to be sorry they've closed nuclear plants and made them too hard to construct profitably.

The feats of engineering that have enabled these energy flows is truly astounding.

Photo Credit: Oil Tanker at Sunset from Terski (Pixabay)

Sunday, 10. March 2024

Werdmüller on Medium

The internet, addiction, and me

Sometimes it’s not okay to look down from the world. Continue reading on Medium »

Sometimes it’s not okay to look down from the world.

Continue reading on Medium »

Friday, 08. March 2024

Werdmüller on Medium

Startup pitch: Social VIP

A way for organizations to more easily join the fediverse Continue reading on Medium »

A way for organizations to more easily join the fediverse

Continue reading on Medium »

Thursday, 07. March 2024

The Pragmatic Engineer

The Pulse #84: Why is Apple bullying its own developers?

In the 1990s, Microsoft was the company most developers hated with a passion. Today, Apple is working harder than any other organization to earn a similar reputation.

The Pulse is a series covering insights, patterns, and trends within Big Tech and at startups. Notice an interesting event or trend? Send me a message.

Todays topics are:

Industry pulse: OpenAI never intended to stay “open;” Planetscale’s focus shifts to profitability; Hopin enters liquidation but doesn’t go bankrupt; and more.

An engineering leader's job search algorithm. Kevin Conroy shared a 28-page guide for landing an engineering manager job at Meta.

Why is Apple bullying app developers? Apple has tried to make a public example of two companies that voiced criticism of its business practices, and the iPhone maker is acting in ways likely to anger the European regulator. So, Is Apple becoming the ‘90s Microsoft in how it treats developers?

1. Industry pulse OpenAI never intended to stay open?

Read more

Monday, 04. March 2024

Just a Theory

Talk: State of the Extension Ecosystem

A quick reminder that I’ll be giving a brief talk on the “State of the Extension Ecosystem” on Wednesday at noon US Eastern / 17:00 UTC.

Update: 2024-03-06: Slides and video linked below.

A quick reminder that I’ll be giving a brief talk on the “State of the Extension Ecosystem” on Wednesday at noon US Eastern / 17:00 UTC. This talk is the first in a series of community talks and discussions on the postgres extension ecosystem leading up to the Extension Ecosystem Summit at pgconf.dev on May 28.

I plan to give a brief history of Postgres extension tools and distribution, the challenges encountered, recent developments, and opportunities for the future. It should take about 30 minutes, followed by discussion. Following this pattern for all the talks in the series, I hope to set up some engaging discussions and to surface significant topics ahead of the summit.

Join us! Need other information or just want an invitation without using Eventbrite, hit me up at david@ this domain, on Mastodon, or via the #extensions channel on the Postgres Slack.

Update: 2024-03-06: Great turnout and discussion, thank you! Links:

Video Keynote PDF Slides Notes More about… Postgres PGXN Extensions PGConf Summit

Monday, 04. March 2024

Damien Bod

Multi client blob storage access using ASP.NET Core with Entra ID authentication and RBAC

This article shows how to onboard different clients or organizations in an ASP.NET Core application to use separated Azure blob containers with controlled access using security groups and RBAC applied roles. Each user in a client group can only access a single blob storage and has no access to blob containers belonging to different clients. […]

This article shows how to onboard different clients or organizations in an ASP.NET Core application to use separated Azure blob containers with controlled access using security groups and RBAC applied roles. Each user in a client group can only access a single blob storage and has no access to blob containers belonging to different clients. Microsoft Entra ID is used to implement the blob storage access.

Code: https://github.com/damienbod/AspNetCoreEntraIdBlobStorage

Blogs in this series Using Blob storage from ASP.NET Core with Entra ID authentication Delegated read and application write access to blob storage using ASP.NET Core with Entra ID authentication Multi client blob storage access using ASP.NET Core with Entra ID authentication and RBAC

Security context diagram

The web application uses different Enterprise applications to access the different Azure APIs. One Enterprise application is used to implement the blob application contributor access which can only be used from the application. This is used when a user in the application needs to write a file to the blob through the application. A second Enterprise application is used to create the RBAC access for the blob container reader access to the files. This is used when creating a new client or new organization. A third Enterprise application is used to implement the web application OpenID Connect web client. This is created using an Azure app registration and only allows delegated permissions. The user App roles are defined in this application. The Microsoft Graph APIs can be implemented using delegated permissions or application permissions. If using Graph application permissions to create or remove the groups, a separate Enterprise application is used to create the groups. You can also used the app service managed identity and use the service principal instead of the 3 enterprise applications to assign the required permissions.

Multi client blob storage setup

The Blob account uses Microsoft Entra ID to access the blob containers. The application can write to all containers and a security group is given RBAC blob container reader access, one per security group. Users are added to the security groups per client or per organization. The setup for the client blob container is implemented in three steps:

Create a Microsoft Entra ID security group Create an Azure storage blob container Create an RBAC to give the security group Blob storage reader permissions

Create a Microsoft Entra ID security group

The CreateSecurityGroupAsync method creates a new security group in Microsoft Entra ID. This is created using Microsoft Graph and returns the group with the group ID. The service uses application permissions and is implemented in a separate Enterprise application.

using System.Text; using Microsoft.Graph.Models; namespace MultiClientBlobStorage.Providers.GroupUserServices; public class ApplicationMsGraphService { private readonly GraphApplicationClientService _graphApplicationClientService; public ApplicationMsGraphService( GraphApplicationClientService graphApplicationClientService) { _graphApplicationClientService = graphApplicationClientService; } public async Task<Group?> CreateSecurityGroupAsync(string group) { var graphServiceClient = _graphApplicationClientService .GetGraphClientWithClientSecretCredential(); var formatted = RemoveSpecialCharacters(group); var groupName = $"blob-{formatted.Trim()}-{Guid.NewGuid()}".ToLower(); var requestBody = new Group { DisplayName = groupName, Description = $"Security group for all users from {groupName}", MailEnabled = false, MailNickname = formatted, SecurityEnabled = true }; var result = await graphServiceClient.Groups.PostAsync(requestBody); return result; } private string RemoveSpecialCharacters(string str) { var sb = new StringBuilder(); foreach (var c in str) { if (c is >= '0' and <= '9' || c is >= 'A' and <= 'Z' || c is >= 'a' and <= 'z' || c == '.' || c == '_') { sb.Append(c); } } return sb.ToString(); } }

Create an Azure storage blob container

The CreateContainer method creates a new Azure blob container using the BlobServiceClient from the Azure.Storage.Blobs nuget package.

private async Task<BlobContainerClient> CreateContainer(string name) { try { var formatted = RemoveSpecialCharacters(name); string containerName = $"blob-{formatted.Trim()}-{Guid.NewGuid()}" .ToLower(); var storage = _configuration.GetValue<string>("AzureStorage:Storage"); var credential = _clientSecretCredentialProvider .GetClientSecretCredential(); if (storage != null && credential != null) { var blobServiceClient = new BlobServiceClient(new Uri(storage), credential); var metadata = new Dictionary<string, string?> { { "name", name }, }; // Create the root container var blobContainerClient = await blobServiceClient .CreateBlobContainerAsync( containerName, PublicAccessType.None, metadata); if (blobContainerClient.Value.Exists()) { Console.WriteLine( $"Created container: {name} {blobContainerClient.Value.Name}"); } return blobContainerClient.Value; } throw new Exception($"Could not create container: {name}"); } catch (RequestFailedException e) { Console.WriteLine("HTTP error code {0}: {1}", e.Status, e.ErrorCode); Console.WriteLine(e.Message); throw; } }

Create an RBAC to give the security group Blob storage reader permissions

The ApplyReaderGroupToBlobContainer method creates an RBAC for the security group on the blob container itself. The group and the container were created in the previous steps and this takes an unknown length of time. Polly is used to repeat until the group and the container are ready and it creates the assignment.

public async Task ApplyReaderGroupToBlobContainer( BlobContainerClient blobContainer, string groupId) { var maxRetryAttempts = 20; var pauseBetweenFailures = TimeSpan.FromSeconds(3); var retryPolicy = Policy .Handle<Exception>() .WaitAndRetryAsync(maxRetryAttempts, i => pauseBetweenFailures); await retryPolicy.ExecuteAsync(async () => { // RBAC security group Blob data read await _azureMgmtClientService .StorageBlobDataReaderRoleAssignment(groupId, blobContainer.AccountName, blobContainer.Name); // NOTE service principal blob write is configured on root }); }

Azure management REST API is used to create the RBAC. This rest API is implemented using a HttpClient and uses an Enterprise application to define the required permissions. This requires an administration Azure role and with this, you have full control of the Azure tenant.

using System.Net.Http.Headers; using System.Text.Json.Serialization; namespace MultiClientBlobStorage.Providers.Rbac; public class AzureMgmtClientService { private readonly AzureMgmtClientCredentialService _azureMgmtClientCredentialService; private readonly IHttpClientFactory _clientFactory; private readonly IConfiguration _configuration; private readonly ILogger<AzureMgmtClientService> _logger; public AzureMgmtClientService(AzureMgmtClientCredentialService azureMgmtClientCredentialService, IHttpClientFactory clientFactory, IConfiguration configuration, ILogger<AzureMgmtClientService> logger) { _azureMgmtClientCredentialService = azureMgmtClientCredentialService; _clientFactory = clientFactory; _configuration = configuration; _logger = logger; } /// <summary> /// Storage Blob Data Reader: ID: 2a2b9908-6ea1-4ae2-8e65-a410df84e7d1 /// Role assignment required for application in Azure on resource group /// https://learn.microsoft.com/en-us/rest/api/authorization/role-assignments/create-by-id?view=rest-authorization-2022-04-01&tabs=HTTP /// https://learn.microsoft.com/en-us/azure/role-based-access-control/role-assignments-rest /// </summary> public async Task StorageBlobDataReaderRoleAssignment(string groupId, string storageAccountName, string blobContainerName) { // The role ID: Storage Blob Data Reader var roleId = "2a2b9908-6ea1-4ae2-8e65-a410df84e7d1"; var roleNameUnique = $"{Guid.NewGuid()}"; // Must be a guid var subscriptionId = _configuration["AzureMgmt:SubscriptionId"]; // the service principal ID var servicePrincipalId = groupId; // the resource group name var resourceGroupName = _configuration["AzureMgmt:ResourceGroupName"]; var objectId = $"/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Storage/storageAccounts/{storageAccountName}/blobServices/default/containers/{blobContainerName}"; var url = $"https://management.azure.com{objectId}/providers/Microsoft.Authorization/roleAssignments/{roleNameUnique}?api-version=2022-04-01"; var client = _clientFactory.CreateClient(); var accessToken = await _azureMgmtClientCredentialService.GetAccessToken(); client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", accessToken); client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json")); var roleDefinitionId = $"{objectId}/providers/Microsoft.Authorization/roleDefinitions/{roleId}"; var PayloadRoleAssignment = new PayloadRoleAssignment { Properties = new Properties { RoleDefinitionId = roleDefinitionId, PrincipalId = servicePrincipalId, PrincipalType = "Group" } }; // view containers //var getRe = $"https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Storage/storageAccounts/{storageAccountName}/blobServices/default/containers?api-version=2023-01-01"; //var response = await client.GetAsync(getRe); //var test = await response.Content.ReadAsStringAsync(); var response = await client.PutAsJsonAsync(url, PayloadRoleAssignment); if (response.IsSuccessStatusCode) { var responseContent = await response.Content.ReadAsStringAsync(); _logger.LogInformation("Created RBAC for read group {blobContainerName} {responseContent}", blobContainerName, responseContent); return; } var responseError = await response.Content.ReadAsStringAsync(); _logger.LogCritical("Created RBAC for read group {blobContainerName} {responseError}", blobContainerName, responseError); throw new ApplicationException($"Status code: {response.StatusCode}, Error: {response.ReasonPhrase}, {responseError}"); } private class PayloadRoleAssignment { [JsonPropertyName("properties")] public Properties Properties { get; set; } = new(); } /// <summary> /// "properties": { /// "roleDefinitionId": /// "subscriptions/SUBSCRIPTION_ID/resourcegroups/RESOURCE_GROUP_NAME/providers/Microsoft.Storage/storageAccounts/STORAGE_ACCOUNT_NAME/providers/Microsoft.Authorization/roleDefinitions/ROLE_ID", /// "principalId": "SP_ID" /// } /// </summary> private class Properties { [JsonPropertyName("roleDefinitionId")] public string RoleDefinitionId { get; set; } = string.Empty; [JsonPropertyName("principalId")] public string PrincipalId { get; set; } = string.Empty; [JsonPropertyName("principalType")] public string PrincipalType { get; set; } = "Group"; } }

Putting it together

A Razor page can be used to create the new clients. This method takes an unknown length of time to run and the RBAC also take an unknown length of time to get applied.

[Authorize(Policy = "blob-admin-policy")] public class CreateClientModel : PageModel { private readonly ClientBlobContainerProvider _clientBlobContainerProvider; private readonly ApplicationMsGraphService _applicationMsGraphService; [BindProperty] public string ClientName { get; set; } = string.Empty; public CreateClientModel( ClientBlobContainerProvider clientBlobContainerProvider, ApplicationMsGraphService applicationMsGraphService) { _clientBlobContainerProvider = clientBlobContainerProvider; _applicationMsGraphService = applicationMsGraphService; } public void OnGet() { } public async Task<IActionResult> OnPostAsync() { if (ModelState.IsValid) { var group = await _applicationMsGraphService .CreateSecurityGroupAsync(ClientName); var blobContainer = await _clientBlobContainerProvider .CreateBlobContainerClient(ClientName); if(blobContainer != null && group != null && group.Id != null) { await _clientBlobContainerProvider .ApplyReaderGroupToBlobContainer(blobContainer, group.Id); } } return Page(); } }

Notes

This works well but requires that the application has high privileged access permissions. Most IT departments will not allow this and the creation of blob containers would have to use the IT preferred tools. This type of automation requires 2 different Azure APIs and is not well documented.

Links

Using Blob storage from ASP.NET Core with Entra ID authentication

Implementing secure Microsoft Graph application clients in ASP.NET Core

https://learn.microsoft.com/en-us/azure/storage/blobs/authorize-access-azure-active-directory

https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction

https://github.com/AzureAD/microsoft-identity-web

Sunday, 03. March 2024

Foss & Crafts

62: Blender

Blender, the absolute powerhouse of FOSS 3d (and increasingly 2d) graphics! We give an overview of the software's history, some personal history of our relationships to the software, what it can do, and where we're excited to see it go! Links: Blender Blender history Grease pencil Some historical Blender videos from the NeoGeo and Not a Number days: Did It, Done It, Not a Number comme

Blender, the absolute powerhouse of FOSS 3d (and increasingly 2d) graphics! We give an overview of the software's history, some personal history of our relationships to the software, what it can do, and where we're excited to see it go!

Links:

Blender

Blender history

Grease pencil

Some historical Blender videos from the NeoGeo and Not a Number days: Did It, Done It, Not a Number commercial, Come and See

Elephants Dream, aka Project Orange

Big Buck Bunny

Previous episodes on blender:

Blender for open movie productions and education Sophie Jantak on pet portraits and Blender's Grease Pencil

Blender Conference videos mentioned:

Inklines Across The Spiderverse My Journey Across the Spider-Verse: from Hobbyist to Hollywood Forensic Architecture - spatial analysis for human rights cases

The MediaGoblin campaign video (well, the second one)

14th anniversary animation gift to Morgan

In Unexpected Places

Seams to Sewing Pattern (a Blender plugin for making clothes and stuffed animals!) (could we make Free Soft Wear patterns with it?)

Wing It!

Wing It! Production Logs and Blenderheads

Episodes about lisp, because obviously Blender needs more lisp (who's going to do it):

What is Lisp?

Lisp but Beautiful, Lisp for Everyone


Heres Tom with the Weather

Internationalizing the Fediverse

Yesterday, a request was made to like a post from an account with unicode in the username. As Terence Eden notes in Internationalise The Fediverse Mastodon (the largest ActivityPub service) doesn’t allow Unicode usernames and has resisted efforts to change. So, I tried with Irwin on otisburg.social and the code threw an exception when I tried to interact with the account (URI::Inva

Yesterday, a request was made to like a post from an account with unicode in the username. As Terence Eden notes in Internationalise The Fediverse

Mastodon (the largest ActivityPub service) doesn’t allow Unicode usernames and has resisted efforts to change.

So, I tried with Irwin on otisburg.social and the code threw an exception when I tried to interact with the account

(URI::InvalidURIError) "URI must be ascii only \"https://i18n.viii.fi/@\\u4F60\\u597D\""

This is not an acceptable uri to pass to URI.parse(). In this case, I found some help from a stackoverflow post and fixed the code so I could interact with the account and like the post.

From the comments in Terence’s blog post, I saw Allowed characters in preferredUsername has also been identified as an issue with webfinger.


Wrench in the Gears

Leap Year At Camp Hero – Lessons From Inside The Big Duck

Arrive at the appointed place and time. Swim the waters. Intuit the rules even as you recognize their fluidity. We are agents, of which Conway would approve. Choose and choose again. Those around you choose. From choices trees unfurl. Expansive arbors lick at the horizon. Pixels, neighbors, nodes blink off and on. Life as pattern [...]

Arrive at the appointed place and time.

Swim the waters.

Intuit the rules even as you recognize their fluidity.

We are agents, of which Conway would approve.

Choose and choose again.

Those around you choose.

From choices trees unfurl.

Expansive arbors lick at the horizon.

Pixels, neighbors, nodes blink off and on.

Life as pattern in motion.

Ulam and von Neumann’s cellular automata.

Signals synchronize.

Stigmergic swarms generate exquisite gliders and spaceships to speed across crystalline lattice.

A note in ballpoint on the bottom of a pencil drawer in the Hans Bethe house on Bathtub Row: “A mathematician does something on a piece of paper, and then lo and behold a big explosion can occur. S Ulam – Phila. 1967.”

And that “explosion”

Hydrogen bomb?

Or superposition?

The grail…

Atoms broken?

Or linear time?

Consider consciousness harnessed.

Put to bed Georgia Guidestone fairytales.

“Useless eaters” are not so “useless” when networked towards the Omega point.

Biodigital convergent infrastructure maximizes return on humans tethered to sacred fields.

Liquid crystal fascial computing, soulful frequencies dance through entrained torus fields.

Electroceuticals, biofeedback, “white hat” nano join to decrypt secrets beyond human knowing.

Hypercubes architected for group-mind memory storage.

Manifestation protocols where pictographic tokens masquerade as community currency.

Sophisticated storylines weave consent for Santa Fe’s B game.

Human potential harvested from the cacophony of the wise guy commons.

Is the terrain dangerous or didactic?

The conducting programmer…

Benevolent? Malevolent? Indifferent? Curious? Bored?

Can you count the degrees of freedom?

Find your way to the edge?

Flip the board?

Become the governor?

An empire of one?

One man is an island…

Chaotic cycles churn through gap junctions.

Emotional distance offers fleeting grace, a respite of unexpected order.

Thoughtlessly step into the void.

Life’s obvious turning points are outnumbered by countless split paths traversed on autopilot.

Doors are opened, opportunities are taken off the table, outcomes obscured.

Five senses cannot perceive beyond the bend,

But probiotics on standby may orchestrate the holobiont stochastic symphony.

Even as a gut-led leap year duck visitation arose en-route to Montauk.

There, textured yarns were spun of nuclear estuary rituals entangled with digital totem poles, tree spirits, astral projection, and advice on finding your way home.

Where lighthouse and radar tower flank hoodoo bluffs.

Cobbled, glacial beaches kiss Atlantic waves, a naval domain of photonics and frequency.

Microwave ether claimed sodden oak, velvet moss, lichen and quartz.

Inner knowing dispells history’s iron curtain.

Are you the hero? 

Is a hero what’s needed? 

 

 

Interactive Map: https://embed.kumu.io/595f0be99ffa93a388addc2e1b262158#untitled-map?s=bm9kZS1Id1NxMmg4Zw%3D%3D

 

 

Tuesday, 27. February 2024

Just a Theory

Extension Ecosystem Summit 2024

Some pals and I organized a summit at PGConf.dev on May 28 to work together as a community toward comprehensive indexing, discovery, and binary distribution, as well as a speaker series “mini summit” leading up to the event.

I’m pleased to announce that some pals and I have organized and will host the (first annual?) Extension Ecosystem Summit at PGConf.dev in Vancouver (and more, see below) on May 28:

Enabling comprehensive indexing, discovery, and binary distribution.

Participants will collaborate to examine the ongoing work on PostgreSQL extension distribution, examine its challenges, identify questions, propose solutions, and agree on directions for execution.

Going to PGConf? Select it as an “Additional Option” when you register, or update your registration if you’ve already registered. Hope to see you there!

Extension Ecosystem Mini-Summit

But if you can’t make it, that’s okay, because in the lead up to the Summit, to we’re hosting a series of six virtual gatherings, the Postgres Extension Ecosystem Mini-Summit.

Join us for an hour or so every other Wednesday starting March 6 to hear contributors to a variety of community and commercial extension initiatives outline the problems they want to solve, their attempts to so, challenges discovered along the way, and dreams for an ideal extension ecosystem in the future. Tentative speaker lineup:

March 6: David Wheeler, PGXN: “State of the Extension Ecosystem” March 20: Ian Stanton, Tembo: “Building Trunk: A Postgres Extension Registry and CLI” April 3: Devrim Gündüz: “yum.postgresql.org and the challenges RPMifying extensions” April 17: Jonathan Katz: “TLE Vision and Specifics” May 1: Yurii Rashkovskii, Omnigres: “Universally buildable extensions: dev to prod” May 15: David Wheeler, PGXN: “Community Organizing Summit Topics”

Hit the event page for details. Many thanks to my co-organizers Jeremy Schneider, David Christensen, Keith Fiske, and Devrim Gündüz, as well as the PGConf.dev organizers for making this all happen!

Updates:

2024-03-06: Updated the talk schedule. 2024-04-24: Updated the last talk title. More about… Postgres PGXN Extensions PGConf Vancouver Summit

Tuesday, 27. February 2024

Just a Theory

RFC: Extension Metadata Typology

Thinking through the PostgreSQL extension metadata use cases and recognizing the types of information they need.

Lately I’ve been thinking a lot about metadata for Postgres extensions. Traditional use cases include control file metadata, which lives in .control files used by CREATE EXTENSION and friends, and PGXN metadata, which lives in META.json files used by PGXN to index and publish extensions. But these two narrow use cases for SQL behavior and source code distribution don’t provide the information necessary to enable other use cases, including building, installing, configuration, and more.

So I have also been exploring other metadata formats, including:

Go go.mod Cargo Manifest File npm package.json trunk.toml pgxman buildkit

These standards from neighboring communities reveal a great deal of overlap, as one might expect (everything has a name, a version, an author, license, and so on), but also types of metadata that had not occurred to me. As I took notes and gathered suggestions from colleagues and coworkers, I began to recognize natural groupings of metadata. This lead to the realization that it might be easier — and more productive — to think about these groupings rather than individual fields.

I therefore propose a typology for Postgres extension metadata.

Extension Metadata Typology Essentials

Essential information about the extension itself, including its name (or unique package name), version, list of authors, license, etc. Pretty much every metadata format encompasses this data. Ecosystem applications use it for indexing, installation locations, naming conventions, and display information.

Artifacts

A list of links and checksums for downloading the extension in one or more formats, including source code, binaries, system packages, and more. Apps use this information to determine the best option for installing an extension on a particular system.

Resources

External information about the extension, mostly links, including source code repository, bug reporting, documentation, badges, funding, etc. Apps use this data for links, of course, but also full text indexing, documentation rendering, and displaying useful information about the extension.

Contents

A description of what’s included in the extension package. Often an “extension” consists of multiple extensions, such as PostGIS, which includes postgis, postgis_tiger_geocoder, address_standardizer, and more. Furthermore, some extensions are not CREATE EXTENSION-type extension at all, such as background workers, command-line apps, libraries, etc. Each should be listed along with documentation links where they differ from the package overall (or are simply more specific).

Prerequisites

A list of external dependencies required to configure, build, test, install, and run the extension. These include not only other extensions, but also external libraries and OS-specific lists of binary package dependencies. And let’s not forget the versions of Postgres required, as well as any OS and version dependencies (e.g, does it work on Windows? FreeBSD? What versions?) and architectures (arm64, amd64, etc.)

How to Build It

Metadata that apps use to determine how to build the extension. Does it use the PostgreSQL PGXS build pipeline? Or perhaps it needs the cargo-based pgrx toolchain. Maybe a traditional ./configure && make pattern? Perl, Ruby, Python, Go, Rust, or NPM tooling? Whatever the pattern, this metadata needs to be sufficient for an ecosystem app to programmatically determine how to build an extension.

How to Install It

Usually an extension of the build metadata, the install metadata describes how to install the extension. That could be PGXS or pgrx again, but could also use other patterns — or multiple patterns! For example, perhaps an extension can be built and installed with PGXS, but it might also be TLE-safe, and therefore provide details for handing the SQL files off to a TLE installer.

This typology might include additional data, such as documentation files to install (man pages anyone?), or directories of dependent files or libraries, and the like — whatever needs to be installed for the extension.

How to Run It

Not all Postgres extensions are CREATE EXTENSION extensions. Some provide background workers to perform various tasks; others simply provide Utility applications like pg_top and pg_repack. In fact pg_repack provides both a command-line application and a CREATE EXTENSION extension in one package!

This metadata also provides configuration information, both control file parameters like trusted, superuser, and schema, but also load configuration information, like whether an extension needs its libraries included in shared_preload_libraries to enable LOAD or requires a cluster restart. (Arguably this information should be in the “install” typology rather than “run”.)

Classification

Classification metadata lets the extension developer associate additional information to improve discovery, such as key words. It might also allow selections from a curated list of extension classifications, such as the category slugs supported for the cargo categories field. Ecosystem apps use this data to organize extensions under key words or categories, making it easier for users to find extensions often used together or for various workloads or tasks.

Metrics and Reports

This final typology differs from the others in that its metadata derives from third party sources rather than the extension developer. It includes data such as number of downloads, build and test status on various Postgres/OS/version combinations, binary packaging distributions, test coverage, security scan results, vulnerability detection, quality metrics and user ratings, and more.

In the broader ecosystem, it would be the responsibility of the root registry to ensure such data in the canonical data for each extension comes only from trusted sources, although applications downstream of the root registry might extend metrics and reports metadata with their own information.

What More?

Reading through various metadata standards, I suspect this typology is fairly comprehensive, but I’m usually mistaken about such things. What other types of metadata do you find essential for the use cases you’re familiar with? Do they fit one of the types here, or do they require some other typology I’ve failed to imagine? Hit the #extensions channel on the Postgres Slack to contribute to the discussion, or give me a holler on Mastodon.

Meanwhile, I’ll be refining this typology and assigning all the metadata fields to them in the coming weeks, with an eye to proposing a community-wide metadata standard. I hope it will benefit us all; your input will ensure it does.

More about… Postgres Extensions Metadata Classification Categories RFC

Monday, 26. February 2024

Mike Jones: self-issued

COSE “typ” (type) Header Parameter Specification Addressing IETF Last Call Feedback

Orie Steele and I have updated the COSE “typ” (type) Header Parameter Specification to address feedback received during IETF Last Call. No normative changes were made. Thanks to those that reviewed the specification! The specification is available at: https://www.ietf.org/archive/id/draft-ietf-cose-typ-header-parameter-03.html Besides the spec being useful on its own, it’s worth noting that the CBO

Orie Steele and I have updated the COSE “typ” (type) Header Parameter Specification to address feedback received during IETF Last Call. No normative changes were made.

Thanks to those that reviewed the specification!

The specification is available at:

https://www.ietf.org/archive/id/draft-ietf-cose-typ-header-parameter-03.html

Besides the spec being useful on its own, it’s worth noting that the CBOR Web Token (CWT) Claims in COSE Headers specification references this spec, and so won’t exit the RFC Editor queue as an RFC until this one also does.


Damien Bod

Delegated read and application write access to blob storage using ASP.NET Core with Entra ID authentication

This article shows how an ASP.NET Core application can control the write access to an Azure blob storage container using an application app registration. Microsoft Entra ID is used to control the user access and to implement the authentication of the web application. Code: https://github.com/damienbod/AspNetCoreEntraIdBlobStorage Blogs in this series The solution provides a secure upload […]

This article shows how an ASP.NET Core application can control the write access to an Azure blob storage container using an application app registration. Microsoft Entra ID is used to control the user access and to implement the authentication of the web application.

Code: https://github.com/damienbod/AspNetCoreEntraIdBlobStorage

Blogs in this series Using Blob storage from ASP.NET Core with Entra ID authentication Delegated read and application write access to blob storage using ASP.NET Core with Entra ID authentication Multi client blob storage access using ASP.NET Core with Entra ID authentication and RBAC

The solution provides a secure upload and secure download using Microsoft Entra ID. Users are authenticated using Microsoft Entra ID. The blob storage containers use Microsoft Entra ID security groups to control the read access. The upload access uses the an Enterprise application for the contributor role. This is used from the ASP.NET Core application for file upload.

In Azure, only the application can upload files and the users of the groups can only read the files.

The RBAC are setup as shown:

Blob storage application write access

The application can upload files to Azure blob storage. Client secrets are used to acquire the access token using the client credentials OAuth flow. This can be further improved by using a certificate and using the client assertions or even a managed identity from the host service. For development, I normally use a secret for simplicity.

using Azure.Identity; namespace DelegatedReadAppWriteBlobStorage.FilesProvider.AzureStorageAccess; public class ClientSecretCredentialProvider { private readonly IConfiguration _configuration; public ClientSecretCredentialProvider(IConfiguration configuration) { _configuration = configuration; } public ClientSecretCredential GetClientSecretCredential() { // Values from app registration var tenantId = _configuration.GetValue<string>("AzureAd:TenantId"); var clientId = _configuration.GetValue<string>("ApplicationClient:ClientId"); var clientSecret = _configuration.GetValue<string>("ApplicationClient:ClientSecret"); var options = new ClientSecretCredentialOptions { AuthorityHost = AzureAuthorityHosts.AzurePublicCloud }; // https://docs.microsoft.com/dotnet/api/azure.identity.clientsecretcredential var clientSecretCredential = new ClientSecretCredential( tenantId, clientId, clientSecret, options); return clientSecretCredential; } }

The BlobApplicationUploadProvider uses the token and makes it possible to upload files to the Azure Blob storage. The provider uses the IFormFile as a parameter for the file data. Meta data is persisted in the local SQL database.

using Azure.Storage.Blobs; using Azure.Storage.Blobs.Models; namespace DelegatedReadAppWriteBlobStorage.FilesProvider.AzureStorageAccess; public class BlobApplicationUploadProvider { private readonly IConfiguration _configuration; private readonly ClientSecretCredentialProvider _clientSecretCredentialProvider; public BlobApplicationUploadProvider(ClientSecretCredentialProvider clientSecretCredentialProvider, IConfiguration configuration) { _configuration = configuration; _clientSecretCredentialProvider = clientSecretCredentialProvider; } public async Task<string> AddNewFile(BlobFileUploadModel blobFileUpload, IFormFile file) { try { return await PersistFileToAzureStorage(blobFileUpload, file); } catch (Exception e) { throw new ApplicationException($"Exception {e}"); } } private async Task<string> PersistFileToAzureStorage( BlobFileUploadModel blobFileUpload, IFormFile formFile, CancellationToken cancellationToken = default) { var storage = _configuration.GetValue<string>("AzureStorage:StorageAndContainerName"); var fileFullName = $"{storage}/{blobFileUpload.Name}"; var blobUri = new Uri(fileFullName); var blobUploadOptions = new BlobUploadOptions { Metadata = new Dictionary<string, string?> { { "uploadedBy", blobFileUpload.UploadedBy }, { "description", blobFileUpload.Description } } }; var blobClient = new BlobClient(blobUri, _clientSecretCredentialProvider.GetClientSecretCredential()); var inputStream = formFile.OpenReadStream(); await blobClient.UploadAsync(inputStream, blobUploadOptions, cancellationToken); return $"{blobFileUpload.Name} successfully saved to Azure Blob Storage Container"; } }

Blob storage delegated read access

The blob storage reader access is given to all users in the security group. This security group uses the app-role from the Azure app registration and the group can access the blob container, not the blob account. No SAS are used .

using Azure.Core; using Microsoft.Identity.Client; using Microsoft.Identity.Web; namespace DelegatedReadAppWriteBlobStorage.FilesProvider.AzureStorageAccess; public class DelegatedTokenAcquisitionTokenCredential : TokenCredential { private readonly ITokenAcquisition _tokenAcquisition; private readonly IConfiguration _configuration; public DelegatedTokenAcquisitionTokenCredential(ITokenAcquisition tokenAcquisition, IConfiguration configuration) { _tokenAcquisition = tokenAcquisition; _configuration = configuration; } public override AccessToken GetToken(TokenRequestContext requestContext, CancellationToken cancellationToken) { throw new NotImplementedException(); } public override async ValueTask<AccessToken> GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken) { string[]? scopes = _configuration["AzureStorage:ScopeForAccessToken"]?.Split(' '); if (scopes == null) { throw new Exception("AzureStorage:ScopeForAccessToken configuration missing"); } AuthenticationResult result = await _tokenAcquisition .GetAuthenticationResultForUserAsync(scopes); return new AccessToken(result.AccessToken, result.ExpiresOn); } } Notes

Using an application for write access to the blob storage has both advantages and disadvantages in context of security. Removing the write access to all users is positive, but when using applications or if the application has weak security, this approach can be bad. Using a managed identity for the deployed service would improve the security and would remove the need for secrets or certificates to authenticate the application. This prepares the access for the next solution which would be to implement a multi-tenant or multi-client application using the Azure blob.

Links

Using Blob storage from ASP.NET Core with Entra ID authentication

Implementing secure Microsoft Graph application clients in ASP.NET Core

https://learn.microsoft.com/en-us/azure/storage/blobs/authorize-access-azure-active-directory

https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction

https://github.com/AzureAD/microsoft-identity-web

Saturday, 24. February 2024

Heres Tom with the Weather

Morris Lukowich at Saint Arnold Brewing

Today I was fortunate to have my picture taken with Morris Lukowich during Hockey Day in Houston at Saint Arnold Brewing. The best hockey games I have ever watched in-person were the 1977-78 Houston Aeros. The WHA All-Star game was in Quebec City on January 17, 1978 and for the picture I believe it is Lukowich’s All-Star jersey from that game that I have borrowed.

Today I was fortunate to have my picture taken with Morris Lukowich during Hockey Day in Houston at Saint Arnold Brewing. The best hockey games I have ever watched in-person were the 1977-78 Houston Aeros. The WHA All-Star game was in Quebec City on January 17, 1978 and for the picture I believe it is Lukowich’s All-Star jersey from that game that I have borrowed.

Friday, 23. February 2024

Mike Jones: self-issued

Continued refinement: OpenID Federation draft 33 published

OpenID Federation draft 33 has been published at https://openid.net/specs/openid-federation-1_0-33.html and https://openid.net/specs/openid-federation-1_0.html. The working group continues refining the specification to make it more consistent and easier to read and implement. We published draft 33 now to get these improvements out to implementers. Per the history entries at https://openid.net/specs

OpenID Federation draft 33 has been published at https://openid.net/specs/openid-federation-1_0-33.html and https://openid.net/specs/openid-federation-1_0.html. The working group continues refining the specification to make it more consistent and easier to read and implement.

We published draft 33 now to get these improvements out to implementers. Per the history entries at https://openid.net/specs/openid-federation-1_0-33.html#name-document-history, a summary of changes made in -32 and -33 is:

-33:

Addressed #2111: The metadata_policy_crit claim MAY only appear in Subordinate Statements and its values apply to all metadata_policies found in the Trust Chain. Fixed #2096: Authorization Signed Request Object may contain trust_chain in its payload and should not in its JWS header parameters. Strengthen language requiring client verification with automatic registration. Fixed #2076: Promoted Trust Marks to be a top-level section. Added General-Purpose JWT Claims section. Moved Federation Endpoints section before Obtaining Federation Entity Configuration Information section. Fixed #2110: Explanation text when multiple entity_type parameters are provided in the Subordinate Listing endpoint. Fixed #2112, #2113, and #2114: Defined that client authentication is not used by default and that the default client authentication method, when used, is private_key_jwt. Specified that requests using client authentication use HTTP POST. Fixed #2104: Allow trust marks in Subordinate Statements for implementation profiles that might want this. Fixed #2103: Addressed ambiguities in the definition of constraints.

-32:

Tightened OpenID Connect Client Registration section. Tightened appendix examples. Fixed #2075: Trust Mark endpoint for the provisioning of the Trust Marks. Fixed #2085: Trust Marked Entities Listing, added sub URL query parameter. Made fetch issuer unambiguous by making the iss parameter REQUIRED. Introduced the term “Subordinate Statement” and applied it throughout the specification. Also consistently use the term “registration Entity Statement” for Explicit Client Registration results. Clarified where Entity Statement claims can and cannot occur. Renamed policy_language_crit to metadata_policy_crit. Fixed #2093: Numbered the list defining the order policy operators are applied in.

Special thanks to Stefan Santesson for his thorough review of the specification in the context of the Swedish Federation deployment!


Phil Windleys Technometria

Identity Metasystems and Lessons from Building the Sovrin Foundation

I recently spoke with Riley Hughes of Trinsic on his Future of Identity podcast about the birth of Sovrin Foundation, it's inevitable growing pains, self-sovereign identity, identity metasystems, and adoption. Give it a listen. I'm grateful to Riley for having me on as a guest.

I recently spoke with Riley Hughes of Trinsic on his Future of Identity podcast about the birth of Sovrin Foundation, it's inevitable growing pains, self-sovereign identity, identity metasystems, and adoption. Give it a listen.

I'm grateful to Riley for having me on as a guest.

Some of the ideas in this podcast are explored more deeply in my latest book, Learning Digital Identity from O’Reilly Media. You can support my writing by buying a copy!

If you enjoyed this, please consider sharing it with a friend or twenty.


Heres Tom with the Weather

Doctors Without Borders urgent update on #Gaza to UN Security Council

MSF to UN Security Council: The people of Gaza need an immediate and sustained ceasefire now

Thursday, 22. February 2024

Justin Richer

Applying RAR in OAuth 2 (and GNAP)

The Rich Authorization Request extension to OAuth 2, or RAR, is a way to talk about access in the OAuth space beyond what scopes allow, and it is defined in RFC9396. One of the key motivations behind RAR was admitting, as a community, that a lot of times you need more than a set of scope values to properly describe access to an API. A Larger Scope Scopes work pretty well in OAuth, but

The Rich Authorization Request extension to OAuth 2, or RAR, is a way to talk about access in the OAuth space beyond what scopes allow, and it is defined in RFC9396. One of the key motivations behind RAR was admitting, as a community, that a lot of times you need more than a set of scope values to properly describe access to an API.

A Larger Scope

Scopes work pretty well in OAuth, but they’re ultimately just a set of strings. It’s worlds better than the all-or-nothing access that OAuth 1 or API keys give you, and if you have an API, they allow you to easily separate read and write access. But what if you want to be more specific? What if you want to, say, let someone access a specific account for certain features for a certain amount of time? Or read in one place but write to another? And what if you want to compose that set of features differently, such as any account for an unlimited time but only if the user’s online? The ways of describing API access are as varied as the APIs being accessed.

When faced with this problem, the first thing that many people realize is that because scopes are just strings, their own API can impose semantics and syntax on those strings. So people add parameters, or define composition rules, or even define a whole query language embedded into the scope string. But those approaches require a lot of complex processing and lack interoperable structure that would let you easily combine protection of multiple kinds of APIs.

What RAR gives you, instead, is an array of JSON objects to describe your API access. So instead of trying to cram everything into a single string, you can now put together an object that defines exactly what you want.

[
{
"type": "payment_initiation",
"actions": [
"initiate",
"status",
"cancel"
],
"locations": [
"https://example.com/payments"
],
"instructedAmount": {
"currency": "EUR",
"amount": "123.50"
},
"creditorName": "Merchant A",
"creditorAccount": {
"iban": "DE02100100109307118603"
},
"remittanceInformationUnstructured": "Ref Number Merchant"
}
]

This object is specific to the API in question and carries with it all the detail that is needed for successful processing. Each kind of API can define its own type value, which in turn defines what’s allowed to go into the rest of the object. And if you need more than one view of things, like read access to A but write access to B, then you can pass in multiple objects in the same structure.

One important question arises out of this: who needs to know this level of detail?

Who Cares About RAR

In this regard, RAR really is built on top of the concept of a scope. In an OAuth delegation, there are four parties. The client, resource owner, authorization server (AS), and resource server (RS).

The four parties in OAuth delegation, illustration from OAuth 2 In Action

These parties have particular relationships, and each of them might care about a RAR object or a scope in a slightly different way. However, the more important question is about which relationship is in play.

Client -> AS: Requesting Access

When requesting an access token, the client needs to be able to describe to the AS what it wants. RAR allows the client to get VERY specific, if the client knows what details it wants ahead of time. Maybe the client has prompted the resource owner for an account identifier, or has learned through some other protocol where the target system is located, or it’s just been configured to know that it needs to ask for specific objects in order to do specific things. In all of these cases, the client can send RAR objects to the AS just like it would a scope, in the hopes of getting an access token that can do what it asks for.

AS -> Client: Granting Access

When the access token is granted, the AS can tell the client which RAR objects have been applied to the token. While this information is no substitute for an API discovery protocol, this approach can let the client differentiate what an access token is good for in different dimensions. For example, a client can ask for a token for an available signing service, and then be granted a token for use at a specific signing service, indicated through the locations field in the RAR object.

AS -> Resource owner: Gathering Authorization

During the delegation process, the AS often needs to prompt the resource owner to see if they’re OK with what’s being delegated. While this starts as the Client->AS request, RAR gives the AS an opportunity to fine-tune the access by asking the resource owner to be specific, or even filling in values that get put into the resulting RAR object. Maybe the client is asking for account access but the resource owner stipulates that it’s only good for the next five minutes. This does come at a usability cost, since it’s much easier to display a list of scope strings with checkboxes. But experience has shown that this list is not a great security measure anyway, since most users won’t change the checkboxes, and often don’t understand the differentiated access being granted.

AS -> RS: Describing Access

The access token itself represents a certain set of rights that have been granted. These can be described in the metadata of the token, available through either a structured token field or an introspection response. In this way, the RS can learn what an access token is good for, and apply its policies appropriately. Does the token grant access for the HTTP GET command on the resource at /photo/123-fda1d? Is this token even good at this specific RS, or is it meant for somewhere else? The RAR object can be used to describe all of this.

Not Everything Has To Match

Finally, it’s important to note that the all of these different branches need not match each other in a single transaction. In one of the applications where I’ve personally deployed RAR, the client never sees the RAR objects. The client knows to ask for a specific scope, and the AS knows that when it sees that scope, the resulting token needs to apply to a whole set of things represented by the current user’s access within the system. The downstream APIs know nothing about users or accounts, but they do know the resources they protect.

As a consequence, the AS translates the client’s incoming scope request to a set of RAR objects that the APIs understand. The APIs never see or care about the scope, and the client never sees or cares about the RAR. In this way, internal API details stay internal and do not leak unnecessarily into the wider system.

However, a different client in this same ecosystem does have insight into the details of the API structure, and therefore its requests do specify RAR objects that target the APIs. These objects are processed in exactly the same way by the API servers, which gives us a powerful parallelism and profound code reuse in production.

GNAP Native

In GNAP, one of our main goals was to see what an OAuth-style system would look like without the constraints and history of OAuth, and one such constraint includes scopes. Consequently, GNAP’s native access rights is an array of objects that look suspiciously like RAR objects. This design is, of course, intentional, and in many ways RAR is the backport of GNAP’s access rights system to work on top of OAuth 2. While GNAP doesn’t have scopes in the same way, GNAP’s reference-based approach to its API design does allow for the use of a simple string to stand in for the objects in question, allowing a request to have both shortcut and fully specified items in the same request.

"access": [
{
"type": "photo-api",
"actions": [
"read",
"write"
],
"locations": [
"https://server.example.net/",
"https://resource.local/other"
],
"datatypes": [
"metadata",
"images"
],
"geolocation": [
{ lat: -32.364, lng: 153.207 },
{ lat: -35.364, lng: 158.207 }
]
},
{
"type": "financial-transaction",
"actions": [
"withdraw"
],
"identifier": "account-14-32-32-3",
"currency": "USD"
},
"dolphin-metadata",
"some other thing"
] How Can I Use RAR?

RAR support is starting to show up across different vendors, though it’s not universal yet. One of the companies I work for, Authlete, supports RAR natively. Other products can often have RAR grafted on top, since it takes the form of an extra parameter to be processed by an extension or module.

The real value is that we are starting to see API access defined in terms of RAR objects, replacing the awkward and error-prone string composition practices of the past. RAR may seem complex, but when you look at how APIs are defined and scopes are used, the power of that complexity really starts to show its value.


Just a Theory

The History and Future of Extension Versioning

What versioning standard should be used for Postgres extension distribution? Some context from PostgreSQL and PGXN, a survey of the version standard landscape today, and a recommendation.

Every software distribution system deals with versioning. Early in the design of PGXN, I decided to require semantic versions (SemVer), a clearly-defined and widely-adopted version standard, even in its pre-1.0 specification. I implemented the semver data type that would properly sort semantic versions, later ported to C by Sam Vilain and eventually updated to semver 2.0.0.

As I’ve been thinking through the jobs and tools for the Postgres extension ecosystem, I wanted to revisit this decision, the context in which it was made, and survey the field for other options. Maybe a “PGXN v2” should do something different?

But first that context, starting with Postgres itself.

PostgreSQL Extension Version Standard

From the introduction extensions in PostgreSQL 9.1, the project side-stepped the need for version standardization and enforcement by requiring extension authors to adopt a file naming convention, instead. For example, an extension named “pair” must have a file with its name, two dashes, then the version as listed in its control file, like so:

pair--1.1.sql

As long as the file name is correct and the version part byte-compatible with the control file entry, CREATE EXTENSION will find it. To upgrade an extension the author must provide a second file with the extension name, the old version, and the new version, all delimited by double dashes. For example, to upgrade our “pair” extension to version 1.2, the author supply all the SQL commands necessary to upgrade it in this file:

pair--1.1--1.2.sql

This pattern avoids the whole question of version standards, ordering for upgrades or downgrades, and all the rest: extension authors have full responsibility to name their files correctly.

PGXN Versions

SemVer simplified a number of issues for PGXN in ways that the PostgreSQL extension versioning did not (without having to re-implement the core’s file naming code). PGXN wants all metadata for an extension in its META.json file, and not to derive it from other sources that could change over time.

Following the CPAN model, PGXN also required that extension releases never decrease the version.1 The well-defined sortability of semantic versions made this validation trivial. PGXN later relaxed enforcement to allow updates to previously-released versions. SemVer’s clearly specified sorting made this change possible, as the major.minor.patch precedence intuitively compare from left to right.

In other words, if one had previously released version 1.2.2, then released 1.3.0, a follow-up 1.2.3 is allowed, increasing the 1.2.x branch version, but not, say, 1.2.1, which decreases the 1.2.x branch version.

Overall, semantic versions have been great for clarity of versioning of PGXN extensions. The one bit of conflict comes from extensions that use some other other version standard in the control file, usually a two-part x.y version not allowed by SemVer, which requires x.y.z (or, more specifically, major.minor.patch).

But such versions are usually compatible with SemVer, and because PGXN cares only about the contents of the META.json, they’re free to use their own versions in the control file, just as long as the META.json file uses SemVers.

For example, the recent nominatim_fdw v1.0.0 release, which of course lists "version": "1.0.0" in its META.json file, sticks to its preferred default_version = '1.0' in its control file. The extension author simply appends .0 to create a valid SemVer from their preferred version, and as long as they never use any other patch number, it remains compatible.

Versioning Alternatives

Surveying the versioning landscape in 2024 yields a number of approaches. Might we prefer an alternative for future extensions distribution? Let’s look at the possibilities.

Ad Hoc Versions

As described above, the Postgres file naming convention allows ad hoc versions. As far as I can tell, so does the R Project’s CRAN. This approach seems fine for systems that don’t need to follow version changes themselves, but much trickier for systems that do. If I want to install the latest version of an extension, how does the installer know what that latest version is?

The answer is that the extension author must always release them in the proper order. But if someone releases 1.3.1 of an extension, and then 1.2.1, well then 1.2.1 is the latest, isn’t it? It could get confusing pretty quickly.

Seems better to require some system, so that download and install clients can get the latest version — or the latest maintenance version of an earlier release if they need it.

User Choice

Quite a few registries allow users to choose their own versioning standards, but generally with some very specific recommendations to prevent confusion for users.

Python Packaging is fairly liberal in the versions it allows, but strongly recommends semantic versioning or calendar versioning (more on that below). CPAN (Perl) is also fairly liberal, due to its long history of module distribution, but currently requires “Decimal versions”, which are evaluated as floating-point numbers, or dotted integer versions, which require three dot-separated positive integers and must begin with the letter v. RubyGems does not enforce a versioning policy, but warns that “using an ‘irrational’ policy will only be a disservice to those in the community who use your gems.” The project therefore urges developers to follow SemVer.

These three venerable projects date from an earlier period of registration and distribution, and have made concessions to times when no policies existed. Their solutions either try to cover as many legacy examples as possible while recommending better patterns going forward (Python, Perl), or simply make recommendations and punt responsibility to developers.

SemVer

More recently-designed registries avoid this problem by requiring some level of versioning standard from their inception. Nearly all use SemVer, including:

Go Modules, where “Each version starts with the letter v, followed by a semantic version.” Cargo (Rust), which “uses SemVer for specifying version numbers. This establishes a common convention for what is compatible between different versions of a package.” npm, where the “version must be parseable by node-semver, which is bundled with npm as a dependency.” CalVer

CalVer eschews context-free incrementing integers in favor of semantically-meaningful versions, at least for some subset of a version string. In other words: make the version date-based. CalVer-versioned projects usually include the year and sometimes the month. Some examples:

Ubuntu uses YY.0M.MICRO, e.g., 23.04, released in April 2023, and 23.10.1, released in October 2023 Twisted uses YY.MM.MICRO, e.g., 22.4.0, released in April 2022

Ultimately, adoption of a CalVer format is a more choice about embedding calendar-based meaning into a version more than standardizing a specific format. One can of course use CalVer semantics in a semantic version, as in the Twisted example, which is fully-SemVer compliant.

In other words, adoption of CalVer need not necessitate rejection of SemVer.

Package Managers

What about package managers, like RPM and Apt? Some canonical examples:

RPM packages use the format:

<name>-<version>-<release>.<architecture>

Here <version> is the upstream version, but RPM practices a reasonable (if baroque) version comparison of all its parts. But it does not impose a standard on upstream packages, since they of course vary tremendously between communities and projects.

Apt packages use a similar format:

[epoch:]upstream_version[-debian_revision]

Again, upstream_version is the version of the upstream package, and not enforced by Apt.

APK (Alpine Linux) packages use the format

{digit}{.digit}...{letter}{_suf{#}}...{-r#}

I believe that {digit}{.digit}...{letter} is the upstream package version.

This pattern makes perfect sense for registries that repackage software from dozens of upstream sources that may or may not have their own policies. But a system that defines the standard for a specific ecosystem, like Rust or PostgreSQL, need not maintain that flexibility.

Recommendation

Given this survey, I’m inclined to recommend that the PostgreSQL community follow the PGXN (and Go, and Rust, and npm) precedent and continue to rely on and require semantic versions for extension distribution. It’s not perfect, given the contrast with the core’s lax version requirements. CalVer partisans can still use it, though with fewer formatting options (SemVer forbids leading zeros, as in the Ubuntu 23.04 example).

But with its continuing adoption, and especially its requirement by more recent, widely-used registries, and capacity to support date-based semantics for those who desire it, I think it continues to make the most sense.

Wrong!

I’m probably wrong. I’m often mistaken in one way or another, on the details or the conclusion. Please tell me how I’ve messed up! Find me on the #extensions channel on the Postgres Slack or ping me on Mastodon.

Why? Because every module on CPAN has one and only one entry in the index file. Ricardo Signes explains↩︎

More about… Postgres Extensions PGXN Versioning SemVer CalVer

Wednesday, 21. February 2024

David Kelts on ID

“Ripping” IDs Should Go the Way of CDs…

The Solvable Problems of Mobile KYC Remember endless hours ripping your CD collection to digital…why are we spending endless hours scanning analog ID documents over and over again? Alternatives exist. Let’s think differently about how Mobile KYC fails customers A recent thread on a work-oriented social platform caught my attention. Some people I know to be the people they are could not veri
The Solvable Problems of Mobile KYC

Remember endless hours ripping your CD collection to digital…why are we spending endless hours scanning analog ID documents over and over again? Alternatives exist. Let’s think differently about how Mobile KYC fails customers

A recent thread on a work-oriented social platform caught my attention. Some people I know to be the people they are could not verify their profiles using a mobile KYC solution. This article isn’t to bash those solutions, or the myriad improvements made, but to urge our industry to understand the complexity we’ve foisted on our users and incite action to do different.

Analog to Digital Conversion, anyone? “Ripping IDs” to confirm identity should go the way of Ripping CDs… How do I know the problems that exist?

In December 2014, before a presentation to the Identity Ecosystem Steering Group about how to “Bring the trust of the Driver’s license online”, MorphoTrust USA filed a provisional application that became US Patent 10,678,939. The front-back-selfie wave was born, admittedly without the system of record connection described in that patent text. I spent several years leading a smart team improving an identity verification product line.

I’ve watched thousands of people try to run these selfie-to-DL matching steps in various forms and helped debug their issues with smiles, vision acuity, phone models, and lighting. I’ve spent weekends helping hundreds of great Utahns use mobile registration live in a credit union lobby.

Visually impaired people cannot get over these hurdles. It takes too many hands for these solutions to be inclusive. Back-facing phone camera quality varies greatly with how much you can spend. These solutions are not inclusive by any stretch of the imagination, and the fallback is to go to a physical location.

The Places Within the Selfie-ID Verification Process that Fail Manually taking a quality, flat, well-lit, high-res picture of a card is hard Taking an ICAO-quality, well-lit, machine-matchable portrait is difficult. Everybody wants to smile (or make duck lips …idk…) in a selfie Webcams are very low resolution compared to phones and vary widely PDF417 Barcodes from the back of DL/ID cards are easy to generate and nearly every Fake ID has the same information on the back and front Connections to databases that would verify the authenticity and validity of a DL/ID are expensive or restricted to law enforcement (makes sense) Most hard-to-forge security features of an ID card require multi-spectral light, are tactile, or require magnification. Phones capture visible light Cropping the card portrait results in a small, low-res photo for machine-matching that has security lines, holograms, and indicia through it Cropping a portrait from a poorly captured, low-res ID card will not give sufficient resolution to measure the facial features for matching You need cloud/phone processing power to determine that a human is live and real for the probe image — called Presentation Attack Detection I have to ID myself repeatedly with each new service that wants (notice I didn’t say “needs”) ID and each KYC system differs, creating doubt

Can I stop here? We haven’t even gotten to the accuracy of biometric matching algorithms (new entries, if used, have solved any racial/gender bias, so lighting is the major hurdle to accurate matching). People think biometrics are 100% when a human visual verificdation is only ~97% (per a 2017 Google study that I cannot find).

Oh, did I mention that people end up with images of their IDs in photo rolls and synchronized to every app they grant photo library permission?

Humans are humans and will do what humans do using technology.

You simply cannot account for the variations people and machines will cause to happen (credit Jurassic Park)

If there are this many variables to a process that seems conceptually simple, one or more of those variables will go wrong.

We are turning away good potential customers

There are tremendous capture, scan, photo, and biometric matching systems out there from excellent, reputable vendors. The improvements in the decade since selfie matching started are incredible and laudable. This process is significantly easier and more accurate than it ever was.

Still, people sitting at their desks get turned away from having “verified profiles”. Now try taking a picture of your ID with one hand, holding your phone in the other, on a sidewalk balancing your potential rental scooter against your hip in bright, vacation sunlight. Ya, no!

Let’s go Digital

Mobile Driver’s Licenses (mDL) are now available to greater than 17% of the US cardholding population. Adoption isn’t awesome yet because people don’t know they have existed in production for nearly 2 years and in concept for four-plus years. ISO/IEC 18013–5 is solid and functional. The data is digitally signed by your Issuer but you control it. It provides a level of control over data and transparency into data-storage that users haven’t experienced for decades. There are verifier apps and equipment from multiple vendors. Places do accept mDL.

In the coming months of 2024, ISO/IEC 18013–7 will add consented, selective-disclosure, digitally-signed mDL sharing to apps on the same device AND to websites over the Internet. (This will be the topic of a forthcoming post of mine.) So imagine sharing just your age to be granted entry to that website you don’t want your name associated with… or accurately filing for benefits without typing into a form. During the final approval stages, the ISO/IEC 18013–7 Technical Specification will be available for implementation.

Let’s Go Digital to Digital

“The first bank to replace their Mobile KYC with a connection to a Mobile Driver’s License is going to garner attention from the press and obtain new customers.”

It’s time.

The mDL provides digitally signed, matchable portraits upon consent as well as only the digitally signed data relevant for the approval. ISO/IEC 18013–7 will allow app-to-app consent and retrieval of mDL data similar to the way your email launches your airline application. It will also present a subset of mDL data to a website after consent for uses like age verification.

The challenge is made: Let’s short-circuit problems 1, 3, 4, 5, 6, 7, 8, and 10 above. Let’s make better self-registration and selfie-KYC solutions. Let’s work together on mDL acceptance. Who will sign up:

Relying Party (bank): _____________________________________

Biometric Match and Liveness Vendor: ______________________

mDL Solution Provider: ____________________________________

Test Subject mDL Holders: _________________________________

David can be reached for comments and feedback at david at decipher dot id

Wednesday, 21. February 2024

Just a Theory

Extension Ecosystem Jobs to be Done

The challenges of the current Postgres extension ecosystem and the interest and energy put into exploring new solutions make clear that the time has come to revisit the whole idea. We begin with a survey of the jobs to be done by extensions packaging and distribution.

Over on the Tembo blog I’ve published a thinking-through of what all the jobs to be done of the ideal Postgres extension ecosystem might be:

These challenges and the interest and energy put into exploring new solutions make clear that the time has come to revisit the whole idea of the PostgreSQL extension ecosystem: to work though the jobs to be done, specify the tools to do those jobs, and outline a plan for the broader Postgres community to design and build them.

Future posts will dream up the tools and make the plan; today we begin with the jobs.

🎬 Let’s get started.

More about… PGXN Extensions Metadata Jobs to be Done

Monday, 19. February 2024

Heres Tom with the Weather


Damien Bod

Using a CSP nonce in Blazor Web

This article shows how to use a CSP nonce in a Blazor Web application using the InteractiveServer server render mode. Using a CSP nonce is a great way to protect web applications against XSS attacks and other such Javascript vulnerabilities. Code: https://github.com/damienbod/BlazorServerOidc Notes The code in this example was built using the example provided by […]

This article shows how to use a CSP nonce in a Blazor Web application using the InteractiveServer server render mode. Using a CSP nonce is a great way to protect web applications against XSS attacks and other such Javascript vulnerabilities.

Code: https://github.com/damienbod/BlazorServerOidc

Notes

The code in this example was built using the example provided by Javier Calvarro Nelson.

https://github.com/javiercn/BlazorWebNonceService

Services and middleware

The Blazor Web application is implemented using the AddInteractiveServerComponents for the InteractiveServer server render mode. The nonce can be used by implementing a nonce service using the CircuitHandler. The nonce service is a scoped service.

builder.Services.AddRazorComponents() .AddInteractiveServerComponents(); builder.Services.TryAddEnumerable(ServiceDescriptor .Scoped<CircuitHandler, BlazorNonceService>(sp => sp.GetRequiredService<BlazorNonceService>())); builder.Services.AddScoped<BlazorNonceService>();

The headers are implemented using the NetEscapades.AspNetCore.SecurityHeaders package. The headers are added to the Blazor nonce service using the NonceMiddleware middleware.

app.UseSecurityHeaders(SecurityHeadersDefinitions.GetHeaderPolicyCollection( app.Environment.IsDevelopment(), app.Configuration["OpenIDConnectSettings:Authority"])); app.UseMiddleware<NonceMiddleware>(); Setup Security headers

The security headers CSP script tag is setup as best possible for a Blazor Web application. A CSP nonce is used as well as the fallback definitions for older browsers.

.AddContentSecurityPolicy(builder => { builder.AddObjectSrc().None(); builder.AddBlockAllMixedContent(); builder.AddImgSrc().Self().From("data:"); builder.AddFormAction().Self().From(idpHost); builder.AddFontSrc().Self(); builder.AddBaseUri().Self(); builder.AddFrameAncestors().None(); builder.AddStyleSrc() .UnsafeInline() .Self(); // due to Blazor builder.AddScriptSrc() .WithNonce() .UnsafeEval() // due to Blazor WASM .StrictDynamic() .OverHttps() .UnsafeInline(); // fallback for older browsers when the nonce is used }) Setup Middleware to add the nonce to the state

The NonceMiddleware uses the nonce header created by the security headers package and sets the Blazor nonce service with the value. This is updated on every request.

namespace BlazorWebFromBlazorServerOidc; public class NonceMiddleware { private readonly RequestDelegate _next; public NonceMiddleware(RequestDelegate next) { _next = next; } public async Task Invoke(HttpContext context, BlazorNonceService blazorNonceService) { var success = context.Items.TryGetValue( "NETESCAPADES_NONCE", out var nonce); if (success && nonce != null) { blazorNonceService.SetNonce(nonce.ToString()!); } await _next.Invoke(context); } } Using the nonce in the UI

The BlazorNonceService can be used from the Blazor components in the InteractiveServer render mode. The nonce is applied to all script tags. If the script does not have the correct nonce, it will not be loaded. The GetNonce method reads the nonce value from the BlazorNonceService service.

@inject IHostEnvironment Env @inject BlazorNonceService BlazorNonceService @using System.Security.Cryptography; <!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <base href="/" /> <link rel="stylesheet" href="css/bootstrap/bootstrap.min.css" /> <link href="css/site.css" rel="stylesheet" /> <link href="BlazorWebFromBlazorServerOidc.styles.css" rel="stylesheet" /> <HeadOutlet @rendermode="InteractiveServer" /> </head> <body> <Routes @rendermode="InteractiveServer" /> http://_framework/blazor.web.js </body> </html> @code { /// <summary> /// Original src: https://github.com/javiercn/BlazorWebNonceService /// </summary> [CascadingParameter] HttpContext Context { get; set; } = default!; protected override void OnInitialized() { var nonce = GetNonce(); if (nonce != null) { BlazorNonceService.SetNonce(nonce); } } public string? GetNonce() { if (Context.Items.TryGetValue("nonce", out var item) && item is string nonce and not null) { return nonce; } return null; } } Notes

Nonces can be applied to Blazor Web using the server rendered mode and the BlazorNonceService which implements the CircuitHandler. Thanks the Javier Calvarro Nelson for providing a solution to this. Next would be to find a solution for the AddInteractiveWebAssemblyComponents setup. You should always use a CSP nonce on a server rendered application and only load scripts with the CSP nonce applied to it.

Links

https://github.com/javiercn/BlazorWebNonceService

https://github.com/andrewlock/NetEscapades.AspNetCore.SecurityHeaders

Friday, 16. February 2024

IdM Thoughtplace

Regarding the recent SAP IDM Announcement

 “Life begins like a dream, becomes a little real, and ends like a dream.” ― Michael Bassey Johnson, The Oneironaut’s Diary As many of you already know, SAP has made public its plans on how SAP IDM will be retired as a supported offering. I’ve been stewing on this for a bit as I try to figure out exactly how I feel about this and what needs to happen next. To be fair, I haven’t

 “Life begins like a dream, becomes a little real, and ends like a dream.” ― Michael Bassey Johnson, The Oneironaut’s Diary

As many of you already know, SAP has made public its plans on how SAP IDM will be retired as a supported offering. I’ve been stewing on this for a bit as I try to figure out exactly how I feel about this and what needs to happen next.

To be fair, I haven’t worked with the product much for just over four years, and even then, I was working more with Version 7 than with Version 8. My opinions are completely my own and do not represent my current employer, any previous employer, or SAP.

While IDM is certainly showing its age, there are some very good things about it that I would love to see as an open-source offering. First is the Batch Processing capabilities of IDM, based on the old MaXware Data Synchronization Engine/MetaCenter solutions. It features some powerful functionality to synchronize and cleanse data. It sets up fairly easily and is quite easy to configure. I’m sure the open-source community could do well with maintaining the UI (It definitely should be JAVA-based rather than the old Windows MMC) that will fit better in today’s Enterprise setting. Also, easy integration with SaaS services is a needed upgrade.

The other thing that should be released into the wild is the Virtual Directory. It also provides powerful functionality for several use cases, from pass-through authentication to assisting in M&A use cases. It’s the perfect example of a “Black Box” offering that just works. It also makes it much easier to synchronize and cleanse data by representing many different back ends via the easy-to-consume LDAP standard.

It saddens me that SAP is choosing to move away from IDM, as one of the key selling points of SAP IDM is its ability to integrate seamlessly with the SAP ecosystem. I hope SAP will help all LCM/IGA vendors connect more easily with systems. SaaS integration should be easy or standards-based, but we still need to be concerned for organizations still using on-premises SAP tools.

SAP has indicated that Microsoft’s Entra ID will be the main partner in the future, but I hope they make this information open to all vendors and that there will be continuing support of standard protocols. This article gives me some hope, but actions speak louder than words. I do have some concerns that SAP, known as a vast software ecosystem that supports itself and tends to ignore the enterprise, is handing off to another large software provider whose management tools tend to support their software ecosystem first and consider the enterprise second. Let’s face it: most of Microsoft’s Identity and Access Management efforts have been about supporting the Office 365 family of products. Don’t get me wrong; it’s better than SAP in this regard, but it’s not that high of a level to meet. For what it’s worth, I am guardedly optimistic, but I always try to remain hopeful.

Finally, I think it’s important to thank the IDM team in Sofia for all their hard work over the years, which, of course, would not have been possible without the vision and effort of the original MaXware team based in Trondheim, Norway, and associated teams in the UK, Australia, and the US. The production from these small teams helped define what Identity Management is to this day.

Will this be my last blog entry on the topic of SAP IDM? I don’t know. Part of it will depend on if there are any moves towards the Open Source world. There have been at least three times in my life when I thought I was done with this tool, and deep down, I’m pretty sure there is a little more in my future. 

In the meantime, I hope to resume blogging more regarding the Identity and Access Management field in the near future. Time will tell.




Kent Bull

CESR enters provisional status in IANA Media Type Registry

Registration of the composable event streaming representation (CESR) format in the IANA Media Type Registry shows a recent development of the key event receipt infrastructure (KERI) and authentic chained data containers (ACDC) space and how the space is growing. See the following link for the official entry: IANA Media Type […]

Registration of the composable event streaming representation (CESR) format in the IANA Media Type Registry shows a recent development of the key event receipt infrastructure (KERI) and authentic chained data containers (ACDC) space and how the space is growing.

See the following link for the official entry: IANA Media Type Registry entry for CESR (application/cesr)

Thursday, 15. February 2024

Jon Udell

Creating a GPT Assistant That Writes Pipeline Tests

Here’s the latest installment in the series on working with LLMS: Creating a GPT Assistant That Writes Pipeline Tests. Once you get the hang of writing these tests, it’s mostly boilerplate, so I figured my team of assistants could help. I recruited Cody, GitHub Copilot, and Unblocked — with varying degrees of success. Then I … Continue reading Creating a GPT Assistant That Writes Pipeline Tests

Here’s the latest installment in the series on working with LLMS: Creating a GPT Assistant That Writes Pipeline Tests.

Once you get the hang of writing these tests, it’s mostly boilerplate, so I figured my team of assistants could help. I recruited Cody, GitHub Copilot, and Unblocked — with varying degrees of success. Then I realized I hadn’t yet tried creating a GPT. As OpenAI describes them, “GPTs are custom versions of ChatGPT that users can tailor for specific tasks or topics by combining instructions, knowledge, and capabilities.”

The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

11 How to Use LLMs for Dynamic Documentation

12 Let’s talk: conversational software development

13 Using LLMs to Improve SQL Queries

14 Puzzling over the Postgres Query Planner with LLMs

15 7 Guiding Principles for Working with LLMs

16 Learn by Doing: How LLMs Should Reshape Education

17 How to Learn Unfamiliar Software Tools with ChatGPT

Tuesday, 13. February 2024

Moxy Tongue

Own Your Own AI

Working on it.. OYO AI by kidOYO® Learning Software, Education Services

Working on it.. OYO AI by kidOYO® Learning Software, Education Services

Monday, 12. February 2024

Moxy Tongue

Sovereign AI

In 2024, the utility of words yielding philosophical clarity that becomes embedded into the design of systems being deployed globally, and Nationally, yields methods that must be structured accurately in order to abide by the Sovereign systems they serve. In America, people own root authority, or the Sovereign infrastructure does not confer accuracy for, of, by human use. Data is the life blood o

In 2024, the utility of words yielding philosophical clarity that becomes embedded into the design of systems being deployed globally, and Nationally, yields methods that must be structured accurately in order to abide by the Sovereign systems they serve.

In America, people own root authority, or the Sovereign infrastructure does not confer accuracy for, of, by human use. Data is the life blood of AI systems. Data structure yields Sovereign results, and across our fast advancing world, inaccuracy deconstructs faster than accuracy builds accurately. The time has come for open transparent accuracy in the data structure of Soveriegnty itself to be laid bare, enabling the development of "Sovereign AI" upon foundations that serve people.

Many moons ago, this structural conversation began in the world of identity management. Professionally-deployed systems were challenged to confront inaccuracies in their modeling of human identity. Ubiquitously, people were no longer being conveyed structural constraints ensuring the structural accuracy of their root administrative authority over data systems of ultimate importance to their Sovereign participation and administration under well-founded laws that were crafted pre-tcp/ip, pre-digital data.

Identity systems have been challenged now for over 20 years to align their practices to the people they service. The work is not done. Self-Sovereign ID principles that emerged here on this blog, led to decentralized identity methods and practices advancing for developer use, and into general awareness by a population that is intensely interested in digital frontiers where their lives meet opportunity, security, and civil system integrity. The fire walls of Sovereign integrity, having been breached many times in consequential ways, started exposing their own structural deficiencies.

Enter AI: human identity that primarily exists in a database-driven system, and is founded on an old-era of physical presence, is now the domain of AI. Human beings can not compete structurally here, as AI derives utility that people provide, and far too often, provide ignorantly, without much personal insight or accountability for the structural choices conveyed upon them. Laws, as dependencies function, evolve at a much slower pace, and seem to lack insight into the structural underpinnings of identity silos that tcp/ip was advanced to break down and add utility too. Unfortunately, protections were not advanced with the same insight, and civil society is finding itself in a reactive mode, reacting to change like a surfer riding a wave, rather than a break wall securing civil participation in an AI-enabled society.

This is the moment. AI Sovereignty has a basic and tremendously important dependency in American civil society: people own root. 

If the data structure of human participation in America does not convey this basic structural reality, then people do not exist in a civil society, as defined by founding documents, intent, and Constitutional reach. Work is underway on this vector, and as always, the resulting choices and structures advanced will yield the results being pursued. The question on the table being asked is simple: do innovators understand what it means in structural Terms to ensure that people own root authority? 

"Own Your Own AI"


Damien Bod

Using Blob storage from ASP.NET Core with Entra ID authentication

This article shows how to implement a secure upload and a secure download in ASP.NET Core using Azure blob storage. The application uses Microsoft Entra ID for authentication and also for access to the Azure Blob storage container. Code: https://github.com/damienbod/AspNetCoreEntraIdBlobStorage Blogs in this series Security architecture The application is setup to store the file uploads […]

This article shows how to implement a secure upload and a secure download in ASP.NET Core using Azure blob storage. The application uses Microsoft Entra ID for authentication and also for access to the Azure Blob storage container.

Code: https://github.com/damienbod/AspNetCoreEntraIdBlobStorage

Blogs in this series Using Blob storage from ASP.NET Core with Entra ID authentication Delegated read and application write access to blob storage using ASP.NET Core with Entra ID authentication Multi client blob storage access using ASP.NET Core with Entra ID authentication and RBAC Security architecture

The application is setup to store the file uploads to an Azure Blob storage container. The authentication uses delegated only flows. A user can authenticate into the application using Microsoft Entra ID. The Azure App registration defines App roles to use for access authorization. The roles are used in the enterprise application. Security groups link the users to the roles. The security groups are used in the Azure Blob container where the RBAC is applied using the groups. A SQL database is used to persist the meta data and integrate into the other parts of the application.

Setting up Azure Blob storage

Two roles were created in the Azure App registration. The roles are assigned to groups in the Enterprise application. The users allowed to used to Azure Blob storage are assigned to the groups.

The groups are then used to apply the RBAC roles in the Azure Blob container. The Storage Blob Data Contributor and the Storage Blob Data Reader roles are used.

Authentication

Microsoft Entra ID is used for authentication and implemented using the Microsoft.Identity.Web Nuget packages. The is a standard implementation. Two policies were created to validate the two different roles used in this solution.

string[]? initialScopes = configuration.GetValue<string> ("AzureStorage:ScopeForAccessToken")?.Split(' '); services.AddMicrosoftIdentityWebAppAuthentication(configuration) .EnableTokenAcquisitionToCallDownstreamApi(initialScopes) .AddInMemoryTokenCaches(); services.AddAuthorization(options => { options.AddPolicy("blob-one-read-policy", policyBlobOneRead => { policyBlobOneRead.RequireClaim("roles", ["blobonereadrole", "blobonewriterole"]); }); options.AddPolicy("blob-one-write-policy", policyBlobOneRead => { policyBlobOneRead.RequireClaim("roles", ["blobonewriterole"]); }); }); services.AddRazorPages().AddMvcOptions(options => { var policy = new AuthorizationPolicyBuilder() .RequireAuthenticatedUser() .Build(); options.Filters.Add(new AuthorizeFilter(policy)); }).AddMicrosoftIdentityUI();

Upload

The application uses the IFormFile interface with the file payload and uploads the file to Azure Blob storage. The BlobClient is setup to use Microsoft Entra ID and the meta data is added to the blob.

public BlobDelegatedUploadProvider(DelegatedTokenAcquisitionTokenCredential tokenAcquisitionTokenCredential, IConfiguration configuration) { _tokenAcquisitionTokenCredential = tokenAcquisitionTokenCredential; _configuration = configuration; } [AuthorizeForScopes(Scopes = ["https://storage.azure.com/user_impersonation"])] public async Task<string> AddNewFile(BlobFileUploadModel blobFileUpload, IFormFile file) { try { return await PersistFileToAzureStorage(blobFileUpload, file); } catch (Exception e) { throw new ApplicationException($"Exception {e}"); } } private async Task<string> PersistFileToAzureStorage( BlobFileUploadModel blobFileUpload, IFormFile formFile, CancellationToken cancellationToken = default) { var storage = _configuration.GetValue<string>("AzureStorage:StorageAndContainerName"); var fileFullName = $"{storage}/{blobFileUpload.Name}"; var blobUri = new Uri(fileFullName); var blobUploadOptions = new BlobUploadOptions { Metadata = new Dictionary<string, string?> { { "uploadedBy", blobFileUpload.UploadedBy }, { "description", blobFileUpload.Description } } }; var blobClient = new BlobClient(blobUri, _tokenAcquisitionTokenCredential); var inputStream = formFile.OpenReadStream(); await blobClient.UploadAsync(inputStream, blobUploadOptions, cancellationToken); return $"{blobFileUpload.Name} successfully saved to Azure Blob Storage Container"; }

The DelegatedTokenAcquisitionTokenCredential class is used to get access tokens for the blob upload or download. This uses the existing user delegated session and creates a new access token for the blob storage access.

using Azure.Core; using Microsoft.Identity.Client; using Microsoft.Identity.Web; namespace DelegatedEntraIDBlobStorage.FilesProvider.AzureStorageAccess; public class DelegatedTokenAcquisitionTokenCredential : TokenCredential { private readonly ITokenAcquisition _tokenAcquisition; private readonly IConfiguration _configuration; public DelegatedTokenAcquisitionTokenCredential(ITokenAcquisition tokenAcquisition, IConfiguration configuration) { _tokenAcquisition = tokenAcquisition; _configuration = configuration; } public override AccessToken GetToken(TokenRequestContext requestContext, CancellationToken cancellationToken) { throw new NotImplementedException(); } public override async ValueTask<AccessToken> GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken) { string[]? scopes = _configuration["AzureStorage:ScopeForAccessToken"]?.Split(' '); if (scopes == null) { throw new Exception("AzureStorage:ScopeForAccessToken configuration missing"); } AuthenticationResult result = await _tokenAcquisition .GetAuthenticationResultForUserAsync(scopes); return new AccessToken(result.AccessToken, result.ExpiresOn); } }

Download

The download creates a BlobClient using the user delegated existing session. The file is downloaded directly.

using Azure.Storage.Blobs; using Azure.Storage.Blobs.Models; using Microsoft.Identity.Web; namespace DelegatedEntraIDBlobStorage.FilesProvider.AzureStorageAccess; public class BlobDelegatedDownloadProvider { private readonly DelegatedTokenAcquisitionTokenCredential _tokenAcquisitionTokenCredential; private readonly IConfiguration _configuration; public BlobDelegatedDownloadProvider(DelegatedTokenAcquisitionTokenCredential tokenAcquisitionTokenCredential, IConfiguration configuration) { _tokenAcquisitionTokenCredential = tokenAcquisitionTokenCredential; _configuration = configuration; } [AuthorizeForScopes(Scopes = ["https://storage.azure.com/user_impersonation"])] public async Task<Azure.Response<BlobDownloadInfo>> DownloadFile(string fileName) { var storage = _configuration.GetValue<string>("AzureStorage:StorageAndContainerName"); var fileFullName = $"{storage}/{fileName}"; var blobUri = new Uri(fileFullName); var blobClient = new BlobClient(blobUri, _tokenAcquisitionTokenCredential); return await blobClient.DownloadAsync(); } } Notes

The architecture is simple and has the base features required for a secure solution. Data protection and virus scanning needs to be applied to the files and this can be configured in the Azure Blob storage. The access is controlled to the users in the group. If this needs to be controlled more, the write access can be removed from the users and switched to a service principal. This can have both security advantages and disadvantages. Multiple clients might also need access to files in this solution and the security needs to be enforced. This requires further architecture changes.

Links

https://learn.microsoft.com/en-us/azure/storage/blobs/authorize-access-azure-active-directory

https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction

https://github.com/AzureAD/microsoft-identity-web

Sunday, 11. February 2024

Werdmüller on Medium

A creative process

No apps; no frameworks; just space. Continue reading on Medium »

No apps; no frameworks; just space.

Continue reading on Medium »

Saturday, 10. February 2024

Heres Tom with the Weather

Phishing Mitigation for Mastodon.social

When a person is already logged into a mastodon instance, if they visit some pages on their instance associated with a user from another server, they are not redirected to the remote server because it is easier to interact with the remote user with their existing local session. However, if a person without an account is just visiting or they have an account but are logged out, mastodon redirects

When a person is already logged into a mastodon instance, if they visit some pages on their instance associated with a user from another server, they are not redirected to the remote server because it is easier to interact with the remote user with their existing local session. However, if a person without an account is just visiting or they have an account but are logged out, mastodon redirects them to the remote server presumably because mastodon doesn’t know whether they have a local account and visiting the remote server will have the complete and authoritative data for that remote user.

A welcome update to mastodon.social (included in 4.3.0-nightly) is a warning presented to visitors or logged out users before mastodon redirects them to a remote server for the original page. The code for Add confirmation when redirecting logged-out requests to permalink is particularly relevant to mastodon.social compared to other fediverse instances as mastodon.social has become a relatively big target for phishing. It’s a good bet that if someone is navigating the fediverse that their account is on mastodon.social. So, if an arbitrary victim is logged out of their mastodon.social account and visits a mastodon.social page belonging to the attacker, prior to this mitigation, mastodon.social would automatically redirect the victim to the attacker’s page which might be a fake login form to trick the victim into submitting their login credentials to the attacker’s site. Unfortunately, a significant percentage of people will submit the form.

One could imagine mastodon.social maintaining a list of trusted servers for automatic redirects but that would be an undesirable hornet’s nest and it’s not a bad thing when web surfers are conscious of the trust boundaries on the web.

Friday, 09. February 2024

Phil Windleys Technometria

Zero Trust with Zero Data

The physical world is full of zero trust examples, but they gather attributes for the access control decisions in a very different way than we're used to online. Presenting your ID to buy beer is used so often as an example of how verifiable credentials work that it's cliche. Cliche or not, there's another aspect of using an ID to buy beer that I want to focus on: it's an excellent example of

The physical world is full of zero trust examples, but they gather attributes for the access control decisions in a very different way than we're used to online.

Presenting your ID to buy beer is used so often as an example of how verifiable credentials work that it's cliche. Cliche or not, there's another aspect of using an ID to buy beer that I want to focus on: it's an excellent example of zero trust

Zero Trust operates on a simple, yet powerful principle: “assume breach.” In a world where network boundaries are increasingly porous and cyber threats are more evasive than ever, the Zero Trust model centers around the notion that no one, whether internal or external, should be inherently trusted. This approach mandates continuous verification, strict access controls, and micro-segmentation, ensuring that every user and device proves their legitimacy before gaining access to sensitive resources. If we assume breach, then the only strategy that can protect the corporate network, infrastructure, applications, and people is to authorize every access.

From Zero Trust
Referenced 2024-02-09T08:25:55-0500

The real world is full of zero trust examples. When we're controlling access to something in the physical world—beer, a movie, a boarding gate, points in a loyalty program, prescriptions, and so on—we almost invariably use a zero trust model. We authorize every access. This isn't surprising, the physical world is remarkably decentralized and there aren't many natural boundaries to exploit and artificial boundaries are expensive and inconvenient.

The other thing that's interesting about zero trust in the physical world is that authorization is also usually done using Zero Data. Zero data is a name StJohn Deakin gave to the concept of using data gathered just in time to make authorization and other decisions rather than relying on great stores of data. There are obvious security benefits from storing less data, but zero data also offers significantly greater convenience for people and organizations alike. To top all that off, it can save money by reducing the number of partner integrations (i.e., far fewer federations) and enable applications that have far greater scale.

Let's examine these benefits in the scenario I opened with. Imagine that instead of using a credential (e.g., driver's license) to prove your age when buying beer, we ran convenience stores like a web site. Before you could shop, you'd have to register an account. And if you wanted to buy beer, the company would have to proof the identity of the person to ensure they're over 21. Now when you buy beer at the store, you'd log in so the system could use your stored attributes to ensure you were allowed to buy beer.

This scenario is still zero trust, but not zero data. And it's ludicrous to imagine anyone would put up with it, but we do it everyday online. I don't know about you, but I'm comforted to know that every convenience store I visit doesn't have a store of all kinds of information about me in an account somewhere. Zero data stores less data that can be exploited by hackers (or the companies we trust with it).

The benefit of scale is obvious as well. In a zero data, zero trust scenario we don't have to have long-term transactional relationships with every store, movie, restaurant, and barber shop we visit. They don't have to maintain federation relationships with numerous identity providers. There are places where the ability to scale zero trust really matters. For example, it's impossible for every hospital to have a relationship with every other hospital for purposes of authorizing access for medical personal who move or need temporary access. Similarly, airline personal move between numerous airports and need access to various facilities at airports.

How do we build zero data, zero trust systems? By using verifiable credentials to transfer attributes about their subject in a way that is decentralized and yet trustworthy. Zero data aligns our online existence more closely with our real-world interactions, fostering new methods of communication while decreasing the challenges and risks associated with amassing, storing, and utilizing vast amounts of data.

Just-in-time, zero data, attribute transfer can make many zero trust scenarios more realizable because it's more flexible. Zero trust with zero data, facilitated by verifiable credentials, represents a pivotal transition in how digital identity is used in authorization decisions. By minimizing centralized data storage and emphasizing cryptographic verifiability, this approach aims to address the prevalent challenges in data management, security, and user trust. By allowing online interactions to more faithfully follow established patterns of transferring trust from the physical world, zero trust with zero data promotes better security with increased convenience and lower cost. What's not to like?

You can get more detail on many of the concepts in this post like verifiable credentials in my new book Learning Digital Identity from O'Reilly Media.

Photo Credit: We ID Everyone from DALL-E (Public Domain) DALL-E apparently thinks a six-pack has 8 bottles but this was the best of several attempts.

Monday, 05. February 2024

Damien Bod

Secure an ASP.NET Core Blazor Web app using Microsoft Entra ID

This article shows how to implement an ASP.NET Core Blazor Web application using Microsoft Entra ID for authentication. Microsoft.Identity.Web is used to implement the Microsoft Entra ID OpenID Connect client. Code: https://github.com/damienbod/Hostedblazor8MeID Note: I based this implementation on the example provided by Tomás López Rodríguez and adapted it. Setup The Blazor Web application is an

This article shows how to implement an ASP.NET Core Blazor Web application using Microsoft Entra ID for authentication. Microsoft.Identity.Web is used to implement the Microsoft Entra ID OpenID Connect client.

Code: https://github.com/damienbod/Hostedblazor8MeID

Note: I based this implementation on the example provided by Tomás López Rodríguez and adapted it.

Setup

The Blazor Web application is an OpenID Connect confidential client (code flow, PKCE) which uses Microsoft Entra ID for authentication. An Azure App registration (Web configuration) is used to create the client and only delegated scopes are used. A secret is used to authenticate the application in development. Client assertions can be used in production deployments. NetEscapades.AspNetCore.SecurityHeaders is used to implement the security headers as best possible for Blazor Web. No identity management or user passwords are handled in the application.

The client part of the Blazor Web application can use the PersistentAuthenticationStateProvider class to read the user profile data.

This uses data from the server part implemented in the PersistingRevalidatingAuthenticationStateProvider class. See the code in the github repo.

OpenID Connect confidential client

The AddMicrosoftIdentityWebAppAuthentication method is used to implement the client authentication using the Microsoft.Identity.Web packages. I use a downstream API to force that the client uses code flow with PKCE instead of the implicit flow. Microsoft Graph is only requesting delegated user profile data.

// Add authentication services var scopes = builder.Configuration.GetValue<string>("DownstreamApi:Scopes"); string[] initialScopes = scopes!.Split(' '); builder.Services.AddMicrosoftIdentityWebAppAuthentication(builder.Configuration) .EnableTokenAcquisitionToCallDownstreamApi(initialScopes) .AddMicrosoftGraph("https://graph.microsoft.com/v1.0", scopes) .AddInMemoryTokenCaches();

The client automatically reads from the AzureAd configuration. This can be changed if you would like to update the product name. The client uses the standard Microsoft Entra ID setup. You need to add the permissions in the Azure App registration created for this application.

"AzureAd": { "Instance": "https://login.microsoftonline.com/", "Domain": "[Enter the domain of your tenant, e.g. contoso.onmicrosoft.com]", "TenantId": "[Enter 'common', or 'organizations' or the Tenant Id (Obtained from the Azure portal. Select 'Endpoints' from the 'App registrations' blade and use the GUID in any of the URLs), e.g. da41245a5-11b3-996c-00a8-4d99re19f292]", "ClientId": "[Enter the Client Id (Application ID obtained from the Azure portal), e.g. ba74781c2-53c2-442a-97c2-3d60re42f403]", "ClientSecret": "[Copy the client secret added to the app from the Azure portal]", "ClientCertificates": [ ], // the following is required to handle Continuous Access Evaluation challenges "ClientCapabilities": [ "cp1" ], "CallbackPath": "/signin-oidc" }, "DownstreamApi": { "Scopes": "User.ReadBasic.All user.read" },

Login and Logout

An AuthenticationExtensions class was used to implement the login and the logout for the application. The Login method is an HTTP GET request which redirects to the OpenID Connect server. The Logout method is an authentication HTTP POST request which requires CSRF protection and accepts no parameters. The return URL to the unauthenticated signed out page is fixed and so no open redirect attacks are possible. The login cleans up the local cookies as well as a redirect to the identity provider to logout on Microsoft Entra ID.

using Microsoft.AspNetCore.Authentication.Cookies; using Microsoft.AspNetCore.Authentication.OpenIdConnect; using Microsoft.AspNetCore.Authentication; namespace BlazorWebMeID; public static class AuthenticationExtensions { public static WebApplication SetupEndpoints(this WebApplication app) { app.MapGet("/Account/Login", async (HttpContext httpContext, string returnUrl = "/") => { await httpContext.ChallengeAsync(OpenIdConnectDefaults.AuthenticationScheme, new AuthenticationProperties { RedirectUri = !string.IsNullOrEmpty(returnUrl) ? returnUrl : "/" }); }); app.MapPost("/Account/Logout", async (HttpContext httpContext) => { var authenticationProperties = new AuthenticationProperties { RedirectUri = "/SignedOut" }; await httpContext.SignOutAsync(OpenIdConnectDefaults.AuthenticationScheme, authenticationProperties); await httpContext.SignOutAsync(CookieAuthenticationDefaults.AuthenticationScheme); }).RequireAuthorization(); return app; } }

Security headers

The security headers are used to protect the session. When using AddInteractiveWebAssemblyComponents mode, the script CSP header is really weak and adds little protection leaving the application open to numerous XSS, Javascript attacks. It is not possible to use CSP nonces with Blazor Web using the InteractiveWebAssemblyComponents mode, or I have not found a way to do this, as the Blazor Web components cannot read the HTTP headers in the response. A Blazor WASM hosted in an ASP.NET Core application can use CSP nonces and is a more secure application.

namespace HostedBlazorMeID.Server; public static class SecurityHeadersDefinitions { public static HeaderPolicyCollection GetHeaderPolicyCollection(bool isDev, string? idpHost) { ArgumentNullException.ThrowIfNull(idpHost); var policy = new HeaderPolicyCollection() .AddFrameOptionsDeny() .AddContentTypeOptionsNoSniff() .AddReferrerPolicyStrictOriginWhenCrossOrigin() .AddCrossOriginOpenerPolicy(builder => builder.SameOrigin()) .AddCrossOriginResourcePolicy(builder => builder.SameOrigin()) .AddCrossOriginEmbedderPolicy(builder => builder.RequireCorp()) .AddContentSecurityPolicy(builder => { builder.AddObjectSrc().None(); builder.AddBlockAllMixedContent(); builder.AddImgSrc().Self().From("data:"); builder.AddFormAction().Self().From(idpHost); builder.AddFontSrc().Self(); builder.AddStyleSrc().Self(); builder.AddBaseUri().Self(); builder.AddFrameAncestors().None(); // due to Blazor Web, nonces cannot be used with AddInteractiveWebAssemblyComponents mode. // weak script CSP.... builder.AddScriptSrc() .Self() // self required .UnsafeEval() // due to Blazor WASM .UnsafeInline(); // only a fallback for older browsers when the nonce is used }) .RemoveServerHeader() .AddPermissionsPolicy(builder => { builder.AddAccelerometer().None(); builder.AddAutoplay().None(); builder.AddCamera().None(); builder.AddEncryptedMedia().None(); builder.AddFullscreen().All(); builder.AddGeolocation().None(); builder.AddGyroscope().None(); builder.AddMagnetometer().None(); builder.AddMicrophone().None(); builder.AddMidi().None(); builder.AddPayment().None(); builder.AddPictureInPicture().None(); builder.AddSyncXHR().None(); builder.AddUsb().None(); }); if (!isDev) { // maxage = one year in seconds policy.AddStrictTransportSecurityMaxAgeIncludeSubDomains(); } policy.ApplyDocumentHeadersToAllResponses(); return policy; } }

Notes

I am starting to understand how Blazor Web works and have difficultly with the session state and sharing this between different components. Some basic browser security cannot be used, i.e. CSP nonces. The mixed mode has strange UI effects which I could not clean up.

There are now four types of Blazor applications.

Blazor WASM hosted in an ASP.NET Core application Blazor Server Blazor Web Blazor WASM standalone

Blazor WASM hosted in an ASP.NET Core application and Blazor Server can be secured in a good way using the recommended security best practices (OpenID Connect confidential client). Blazor Web can implement a confidential client but is missing the recommend script session protection. Blazor WASM standalone cannot implement the recommended authentication as it is a public application and should no longer be used in secure environments.

Links

https://github.com/CrahunGit/Auth0BlazorWebAppSample/tree/master/BlazorApp4

https://github.com/dotnet/blazor-samples/tree/main/8.0/BlazorWebAppOidc

https://github.com/AzureAD/microsoft-identity-web

https://github.com/andrewlock/NetEscapades.AspNetCore.SecurityHeaders

Friday, 02. February 2024

Just a Theory

Presentation: Introduction to the PGXN Architecture

I made a presentation on the PGXN architecture for the Tembo team.

As I started digging into the jobs and tools for the Postgres extension ecosystem as part of my new gig, I realized that most people have little knowledge of the PGXN architecture. I learned a lot designing PGXN and its services, and am quite pleased with where it ended up, warts and all. So I thought it worthwhile to put together a brief presentation on the fundamental design principals (static REST file API), inter-related services (root mirror, manager, API, site) and tools (CLI, CI/CD).

Yesterday, the Tembo blog published the presentation, including the video and slides, along with a high-level architecture diagram. I hope it’s a useful point of reference for the Postgres community as we look to better distribute extensions in the future.

More about… PGXN Software Architecture REST JSON Tembo

Thursday, 01. February 2024

Just a Theory

Contemplating Decentralized Extension Publishing

The Go package ecosystem uses distributed publishing to release modules without authentication or uploads. Could we do something similar for Postgres extensions?
TL;DR

As I think through the future of the Postgres extension ecosystem as a key part of the new job, I wanted to understand how Go decentralized publishing works. In this post I work it out, and think through how we might do something similar for Postgres extension publishing. It covers the Go architecture, namespacing challenges, and PGXS abuse; then experiments with URL-based namespacing and ponders reorganizing installed extension files; and closes with a high-level design for making it work now and in the future.

It is, admittedly, a lot, mainly written for my own edification and for the information of my fellow extension-releasing travelers.

I find it fascinating and learned a ton. Maybe you will too! But feel free to skip this post if you’re less interested in the details of the journey and want to wait for more decisive posts once I’ve reached the destination.

Introduction

Most language registries require developers to take some step to make releases. Many automate the process in CI/CD pipelines, but it requires some amount of effort on the developer’s part:

Register for an account Learn how to format things to publish a release Remember to publish again for every new version Create a pipeline to automate publishing (e.g., a GitHub workflow) Decentralized Publishing

Go decentralized publishing has revised this pattern: it does not require user registration or authentication to to publish a module to pkg.go.dev. Rather, Go developers simply tag the source repository, and the first time someone refers to the tag in Go tools, the Go module index will include it.

For example, publishing v1.2.1 of a module in the github.com/golang/example repository takes just three commands:

git tag v1.2.1 -sm 'Tag v1.2.1' git push --tags go list -m github.com/golang/example@v1.2.1

After a few minutes, the module will show up in the index and then on pkg.go.dev. Anyone can run go get -u github.com/golang/example to get the latest version. Go developers rest easy in the knowledge that they’re getting the exact module they need thanks to the global checksum database, which Go uses “in many situations to detect misbehavior by proxies or origin servers”.

This design requires go get to understand multiple source code management systems: it supports Git, Subversion, Mercurial, Bazaar, and Fossil.1 It also needs the go.mod metadata file to live in the project defining the package.

But that’s really it. From the developer’s perspective it could not be easier to publish a module, because it’s a natural extension of the module development tooling and workflow of committing, tagging, and fetching code.

Decentralized Extension Publishing

Could we publish Postgres extensions in such a decentralized pattern? It might look something like this:

The developer places a metadata file in the proper location (control file, META.json, Cargo.toml, whatever — standard TBD) To publish a release, the developer tags the repository and calls some sort of indexing service hook (perhaps from a tag-triggered release workflow) The indexing service validates the extension and adds it to the index

Note that there is no registration required. It simply trusts the source code repository. It also avoids name collision: github.com/bob/hash is distinct from github.com/carol/hash.

This design does raise challenges for clients, whether they’re compiling extensions on a production system or building binary packages for distribution: they have to support various version control systems to pull the code (though starting with Git is a decent 90% solution).

Namespacing

Then there’s name conflicts. Perhaps github.com/bob/hash and github.com/carol/hash both create an extension named hash. By the current control file format, the script directory and module path can use any name, but in all likelihood the use these defaults:

directory = 'extension' module_pathname = '$libdir/hash'

Meaning .sql files will be installed in the Postgres share/extension subdirectory — along with all the other installed extensions — and library files will be installed in the library directory along with all other libraries. Something like this:

pgsql ├── lib │ └── hash.so └── share └── extension │ └── hash.control │   ├── hash--1.0.0.sql └── doc └── hash.md

If both projects include, say, hash.control, hash--1.0.0.sql, and hash.so, the files from one will stomp all over the files of the other.

Installer Abuse

Go avoids this issue by using the domain and path from each package’s repository in its directory structure. For example, here’s a list of modules from google.golang.org repositories:

$ ls -1 ~/go/pkg/mod/google.golang.org api@v0.134.0 api@v0.152.0 appengine@v1.6.7 genproto genproto@v0.0.0-20230731193218-e0aa005b6bdf grpc@v1.57.0 grpc@v1.59.0 protobuf@v1.30.0 protobuf@v1.31.0 protobuf@v1.32.0

The ~/go/pkg/mod directory has subdirectories for each VCS host name, and each then subdirectories for package paths. For the github.com/bob/hash example, the files would all live in ~/go/pkg/mod/github.com/bob/hash.

Could a Postgres extension build tool follow a similar distributed pattern by renaming the control file and installation files and directories to something specific for each, say github.com+bob+hash and github.com+carol+hash? That is, using the repository host name and path, but replacing the slashes in the path with some other character that wouldn’t create subdirectories — because PostgreSQL won’t find control files in subdirectories. The control file entries for github.com/carol/hash would look like this:

directory = 'github.com+carol+hash' module_pathname = '$libdir/github.com+carol+hash'

Since PostgreSQL expects the control file to have the same name as the extension, and for SQL scripts to start with that name, the files would have to be named like so:

hash ├── Makefile ├── github.com+carol+hash.control └── sql └── github.com+carol+hash--1.0.0.sql

And the Makefile contents:

EXTENSION = github.com+carol+hash MODULEDIR = $(EXTENSION) DATA = sql/$(EXTENSION)--1.0.0.sql PG_CONFIG ?= pg_config PGXS := $(shell $(PG_CONFIG) --pgxs) include $(PGXS)

In other words, the extension name is the full repository host name and path and the Makefile MODULEDIR variable tells pg_config to put all the SQL and documentation files into a directories named github.com+carol+hash — preventing them from conflicting with any other extension.

Finally, the github.com+carol+hash.control file — so named becaus it must have the same name as the extension — contains:

default_version = '1.0.0' relocatable = true directory = 'github.com+carol+hash' module_pathname = '$libdir/github.com+carol+hash'

Note the directory parameter, which must match MODULEDIR from the Makefile, so that CREATE EXTENSION can find the SQL files. Meanwhile, module_pathname ensures that the library file has a unique name — the same as the long extension name — again to avoid conflicts with other projects.

That unsightly naming extends to SQL: using the URL format could get to be a mouthful:

CREATE EXTENSION "github.com+carol+hash";

Which is do-able, but some new SQL syntax might be useful, perhaps something like:

CREATE EXTENSION hash FROM "github.com+carol+hash";

Or, if we’re gonna really go for it, use slashes after all!

CREATE EXTENSION hash FROM "github.com/carol/hash";

Want to use both extensions but they have conflicting objects (e.g., both create a “hash” data type)? Put them into separatre schemas (assuming relocatable = true in the control file):

CREATE EXTENSION hash FROM "github.com/carol/hash" WITH SCHEMA carol; CREATE EXTENSION hash FROM "github.com/bob/hash" WITH SCHEMA bob; CREATE TABLE try ( h1 carol.hash, h2 bob.hash );

Of course it would be nice if PostgreSQL added support for something like Oracle packages, but using schemas in the meantime may be sufficient.

Clearly we’re getting into changes to the PostgreSQL core, so put that aside and we can just use long names for creating, modifying, and dropping extensions, but not necessarily otherwise:

CREATE EXTENSION "github.com+carol+hash" WITH SCHEMA carol; CREATE EXTENSION "github.com+bob+hash" WITH SCHEMA bob; CREATE EXTENSION "gitlab.com+barack+kicker_type"; CREATE TABLE try ( h1 carol.hash, h2 bob.hash kt kicker ); Namespacing Experiment

To confirm that this approach might work, I committed 24134fd and pushed it in the namespace-experiment branch of the semver extension. This commit changes the extension name from semver to github.com+theory+pg-semver, and follows the above steps to ensure that its files are installed with that name.

Abusing the Postgres extension installation infrastructure like this does work, but suffers from a number of drawbacks, including:

The extension name is super long, as before, but now so too are the files in the repository (as opposed to the installer renaming them on install). The shared library file has to have the long name, so therefore does the .c source file. The SQL files must all start with github.com+theory+pg-semver, although I skipped that bit in this commit; instead the Makefile generates just one from sql/semver.sql. Any previous installation of the semver type would remain unchanged, with no upgrade path. Changing an extension’s name isn’t a great idea.

I could probably script renaming and modifying file contents like this and make it part of the build process, but it starts to get complicated. We could also modify installers to make the changes, but there are a bunch of moving parts they would have to compensate for, and given how dynamic this can be (e.g., the semver Makefile reads the extension name from META.json), we would rapidly enter the territory of edge case whac-a-mole. I suspect it’s simply too error-prone.

Proposal: Update Postgres Extension Packaging

Perhaps the Go directory pattern could inspire a similar model in Postgres, eliminating the namespace issue by teaching the Postgres extension infrastructure to include all but one of the files for an extension in a single directory. In other words, rather than files distributed like so for semver:

pgsql ├── lib │ └── semver.so └── share └── extension │ └── semver.control │   ├── semver--0.32.1.sql │   ├── semver--0.32.0--0.32.1.sql └── doc └── semver.md

Make it more like this:

pgsql └── share └── extension └── github.com └── theory └── pg-semver └── extension.control └── lib │ └── semver.so └── sql │ └── semver--0.32.1.sql │ └── semver--0.32.0--0.32.1.sql └── doc └── semver.md

Or perhaps:

pgsql └── share └── extension └── github.com └── theory └── pg-semver └── extension.control └── semver.so └── semver--0.32.1.sql └── semver--0.32.0--0.32.1.sql └── semver.md

The idea is to copy the files exactly as they’re stored in or compiled in the repository. Meanwhile, the new semver.name file — the only relevant file stored outside the extension module directory — simply points to that path:

github.com/theory/pg-semver

Then for CREATE EXTENSION semver, Postgres reads semver.name and knows where to find all the files to load the extension.

This configuration would require updates to the control file, now named extension.control, to record the full package name and appropriate locations. Add:

name = 'semver' package = 'github.com/theory/pg-semver'

This pattern could also allow aliasing. Say we try to install a different semver extension from github.com/example/semver. This is in its extension.control file:

name = 'semver' package = 'github.com/example/pg-semver'

The installer detects that semver.name already exists for a different package and raises an error. The user could then give it a different name by running something like:

make install ALIAS_EXTENSION_NAME=semver2

This would add semver2.name right next to semver.name, and its contents would contain github.com/example/semver, where all of its files are installed. This would allow CREATE EXTENSION semver2 to load the it without issue (assuming no object conflicts, hopefully resolved by relocate-ability).

I realize a lot of extensions with libraries could wreak some havoc on the library resolver having to search so many library directories, but perhaps there’s some way around that as well? Curious what techniques experienced C developers might have adopted.

Back to Decentralized Publishing

An updated installed extension file structure would be nice, and is surely worth a discussion, but even if it shipped in Postgres 20, we need an updated extension ecosystem today, to work well with all supported versions of Postgres. So let’s return to the idea of decentralized publishing without such changes.

I can think of two pieces that’d be required to get Go-style decentralized extension publishing to work with the current infrastructure.

Module Uniqueness

The first is to specify a new metadata field to be unique for the entire index, and which would contain the repository path. Call it module, after Go (a single Git repository can have multiple modules). In PGXN Meta Spec-style JSON it’d look something like this:

{ "module": "github.com/theory/pg-semver", "version": "0.32.1", "provides": { "semver": { "abstract": "A semantic version data type", } } }

Switch from the PGXN-style uniqueness on the distribution name (usually the name of the extension) and let the module be globally unique. This would allow another party to release an extension with the same name. Even a fork where only the module is changed:

{ "module": "github.com/example/pg-semver", "version": "0.32.1", "provides": { "semver": { "abstract": "A semantic version data type", } } }

Both would be indexed and appear under the module name, and both would be find-able by the provided extension name, semver.

Where that name must still be unique is in a given install. In other words, while github.com/theory/pg-semver and github.com/example/pg-semver both exist in the index, the semver extension can be installed from only one of them in a given Postgres system, where the extension name semver defines its uniqueness.

This pattern would allow for much more duplication of ideas while preserving the existing per-cluster namespacing. It also allows for a future Postgres release that supports something like the flexible per-cluster packaging as described above.2

Extension Toolchain App

The second piece is an extension management application that understands all this stuff and makes it possible. It would empower both extension development workflows — including testing, metadata management, and releasing — and extension user workflows — finding, downloading, building, and installing.

Stealing from Go, imagine a developer making a release with something like this:

git tag v1.2.1 -sm 'Tag v1.2.1' git push --tags pgmod list -m github.com/theory/pg-semver@v1.2.1

The creatively named pgmod tells the registry to index the new version directly from its Git repository. Thereafter anyone can find it and install it with:

pgmod get github.com/theory/pg-semver@v1.2.1 — installs the specified version pgmod get github.com/theory/pg-semver — installs the latest version pgmod get semver — installs the latest version or shows a list of matching modules to select from

Any of these would fail if the cluster already has an extension named semver with a different module name. But with something like the updated extension installation locations in a future version of Postgres, that limitation could be loosened.

Challenges

Every new idea comes with challenges, and this little thought experiment is no exception. Some that immediately occur to me:

Not every extension can be installed directly from its repository. Perhaps the metadata could include a download link for a tarball with the results of any pre-release execution? Adoption of a new CLI could be tricky. It would be useful to include the functionality in existing tools people already use, like pgrx. Updating the uniqueness constraint in existing systems like PGXN might be a challenge. Most record the repository info in the resources META.json object, so it would be do-able to adapt into a new META format, either on PGXN itself or in a new registry, should we choose to build one. Getting everyone to standardize on standardized versioning tags might take some effort. Go had the benefit of controlling its entire toolchain, while Postgres extension versioning and release management has been all over the place. However PGXN long ago standardized on semantic versioning and those who have released extensions on PGXN have had few issues (one can still use other version formats in the control file, for better or worse). Some PGXN distributions have shipped different versions of extensions in a single release, or the same version as in other releases. The release version of the overall package (repository, really) would have to become canonical.

I’m sure there are more, I just thought of these offhand. What have you thought of? Post ’em if you got ’em in the #extensions channel on the Postgres Slack, or give me a holler on Mastodon or via email.

Or does it? Yes, it does. Although the Go CLI downloads most public modules from a module proxy server like proxy.golang.org, it still must know how to download modules from a version control system when a proxy is not available. ↩︎

Assuming, of course, that if and when the Postgres core adopts more bundled packaging that they’d use the same naming convention as we have in the broader ecosystem. Not a perfectly safe assumption, but given the Go precedent and wide adoption of host/path-based projects, it seems sound. ↩︎

More about… Postgres PGXN Extensions Go Packaging Distributed Publishing

Wednesday, 31. January 2024

Just a Theory

PGXN Tools v1.4

The pgxn-tools Docker image has seen some recent bug fixes and improvements.

Over on the PGXN Blog I’ve posted a brief update on recent bug fixes and improvements to the pgxn-tools Docker image, which is used fairly widely these days to test, bundle, and release Postgres extensions to PGXN. This fix is especially important for Git repositories:

v1.4.1 fixes an issue where git archive was never actually used to build a release zip archive. This changed at some point without noticing due to the introduction of the safe.directory configuration in recent versions of Git. Inside the container the directory was never trusted, and the pgxn-bundle command caught the error, decided it wasn’t working with a Git repository, and used the zip command, instead.

I also posted a gist listing PGXN distributions with a .git directory.

More about… Postgres PGXN Docker GitHub Workflow

Mike Jones: self-issued

Invited OpenID Federation Presentation at 2024 FIM4R Workshop

The OpenID Federation editors were invited to give a presentation on OpenID Federation at the 18th FIM4R Workshop, which was held at the 2024 TIIME Unconference. Giuseppe De Marco, Roland Hedberg, John Bradley, and I tag-teamed the presentation, with Vladimir Dzhuvinov also participating in the Q&A. Topics covered included motivations, architecture, design decisions, capabilities, use […]

The OpenID Federation editors were invited to give a presentation on OpenID Federation at the 18th FIM4R Workshop, which was held at the 2024 TIIME Unconference. Giuseppe De Marco, Roland Hedberg, John Bradley, and I tag-teamed the presentation, with Vladimir Dzhuvinov also participating in the Q&A. Topics covered included motivations, architecture, design decisions, capabilities, use cases, history, status, implementations, and people.

Here’s the material we used:

OpenID Federation 1.0: Shaping The Advanced Infrastructure of Trust

It was the perfect audience – chock full of people with practical federation deployment experience!


Fully-Specified Algorithms adopted by JOSE working group

The “Fully-Specified Algorithms for JOSE and COSE” specification has been adopted by the JOSE working group. See my original post about the spec for why fully-specified algorithms matter. Thanks to all who supported adoption and also thanks to those who provided useful detailed feedback that we can address in future working group drafts. The specification […]

The “Fully-Specified Algorithms for JOSE and COSE” specification has been adopted by the JOSE working group. See my original post about the spec for why fully-specified algorithms matter. Thanks to all who supported adoption and also thanks to those who provided useful detailed feedback that we can address in future working group drafts.

The specification is available at:

https://www.ietf.org/archive/id/draft-ietf-jose-fully-specified-algorithms-00.html

Tuesday, 30. January 2024

Jon Udell

How to Learn Unfamiliar Software Tools with ChatGPT

Here’s the latest installment in the series on working with LLMS: How to Learn Unfamiliar Software Tools with ChatGPT. Ideally, tools like GeoGebra and Metabase provide interfaces so intuitive that you rarely need to read the docs, and you can learn the software just by poking around in it. In reality, of course, we need … Continue reading How to Learn Unfamiliar Software Tools with ChatGPT

Here’s the latest installment in the series on working with LLMS: How to Learn Unfamiliar Software Tools with ChatGPT.

Ideally, tools like GeoGebra and Metabase provide interfaces so intuitive that you rarely need to read the docs, and you can learn the software just by poking around in it. In reality, of course, we need those docs — and they still need to be excellent. But now, we’ll extract a new benefit from them. When we can partner with machines that have read the docs, and can look over our shoulders as we try to do the things described in the docs, we’ll turbocharge our ability to dive into unfamiliar software tools and quickly learn how to use them.

The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

11 How to Use LLMs for Dynamic Documentation

12 Let’s talk: conversational software development

13 Using LLMs to Improve SQL Queries

14 Puzzling over the Postgres Query Planner with LLMs

15 7 Guiding Principles for Working with LLMs

16 Learn by Doing: How LLMs Should Reshape Education


Michael Ruminer

Thoughts on Self-Sovereign Identity: A Systematic Review, Mapping and Taxonomy

Thoughts on Self-Sovereign Identity: A Systematic Review, Mapping and Taxonomy I ran across a paper named Self-Sovereign Identity: A Systematic Review, Mapping and Taxonomy. It was published mid-2022 so it is not the most up to date for the topic but it is very interesting and still of high value. It’s a meta-study of four research questions about self-sovereign identity. RQ-1: What Practi

Thoughts on Self-Sovereign Identity: A Systematic Review, Mapping and Taxonomy

I ran across a paper named Self-Sovereign Identity: A Systematic Review, Mapping and Taxonomy. It was published mid-2022 so it is not the most up to date for the topic but it is very interesting and still of high value. It’s a meta-study of four research questions about self-sovereign identity.

RQ-1: What Practical Problems Have Been Introduced and Solved? RQ-2: What Properties, Formal Definitions and Cryptographic Tools Have Been Used? RQ-3: What Conceptual Ideas Have Been Introduced or Refuted? RQ-4: When, Where, and by Whom Were SSI Studies Published?

It spends a lot of text before the research questions on how it built the study objectively, criteria for the data, and criteria for the inclusion of the other research papers. Though interesting, it was not what compelled me in the paper. As you might imagine, it was the red meat of the paper, the research questions, that I really found most interesting.

You’ll find in the research question sections that it does a nice inventory of various papers and a description of what they cover. I found RQ-1 to be the most interesting as it covers a lot of verifiable credentials and verifiable presentation topics.

Of RQ-1 I found section 6.2 to be of special interest. It covers:

The operational facet is divided into two facets: VC and VP.
They are a collection of concepts related to the functional aspects
of verifiable credentials and verifiable presentations.

And includes:

Revocation Decentralized Identifiers Issuer Authorization Delegation Backup and Recovery

RQ-3 is a short section and starts with an interesting statement that is probably less true today than when written but still holds a lot of truth.

...that there is currently no agreement on a definition of SSI...
Our third research question is answered by an examination of
the literature’s debates on the SSI definition.

Though I appreciate RQ-4 and it makes sense in the context of the paper, I found the least value in its presentation. It did remind me of a relationship graph I created a number of years back except that my graph was on the relationship of the specifications at the time. The header image of this post is a small rendering of that graph. You can find the useful version at Verifiable Credentials Specification Map. Reader beware that the specification map I list was last updated late May of 2021 so it is not an accurate source of the state of specifications for today though many of the relationships it does show are likely valid. This is really a topic for a different day.

All in all, despite the relative age of the paper, the other papers it refers to are often still valid today in their intent and basic questions, agreements, and refutations. I think it is well worth your time to look at the research questions portions if interested in self-sovereign identity (a phrase that seems to be moving more so out of popular use) and verifiable credentials.


Jon Udell

You say feature, I say bug: the enshittification of Microsoft Paint

I’ve happily used MS Paint as my basic bitmap editor since Windows 3, almost 25 years ago. Mostly I’ve used it to create images from screenshots, but that has suddenly become way harder. Formerly, when I’d cut a region, the now-empty region would display using the default white background. Now it displays a checkered background … Continue reading You say feature, I say bug: the enshittification of

I’ve happily used MS Paint as my basic bitmap editor since Windows 3, almost 25 years ago. Mostly I’ve used it to create images from screenshots, but that has suddenly become way harder. Formerly, when I’d cut a region, the now-empty region would display using the default white background. Now it displays a checkered background like so.

Here is the procedure to refill the white background:

Switch the foreground color to white Use the Fill tool to fill the checkered region Then switch the foreground back to black.

ARE YOU KIDDING ME?

Nope. It’s evidently an unintended consequence of a pair of new feature: layers and transparency.

To get started, click on the new Layers button in the toolbar, which will open a panel on the side of the canvas.”

Microsoft also revealed today that an upcoming Paint feature is support for image transparency, which will add the ability to open and save transparent PNG files.

During editing, users will notice a prominent checkerboard pattern displayed on the canvas, serving as a visual indicator and highlighting the transparent regions within the image.

This ensures that when content is erased from the canvas, it is completely removed, eliminating the need to cover unwanted regions of an image with white fill.

bleepingcomputer.com

I never asked for these “long-awaited” new features, Paint is (or was) useful to me precisely because it only does the kind of basic bitmap editing I need when compositing screenshots. But I can opt out, right?

Nope.

ARE YOU KIDDING ME?

Nope.

This feature (layers and image transparency) seems to be introduced in September 2023 and doesn’t actually allow to be turned off.

Doing what vengy proposes for each and every image being edited is a natural madness and will drive even the most sane person crazy.

What worked for me was to uninstall Paint and replace it with a classic version:

Uninstalling can be done by simply right-clicking Paint icon in Start Menu and selecting Uninstall from context menu. Classic Paint can be get from here or here.

Download and install it.

Go to Settings → Apps → Apps & Features → More settings → App execution aliases.

Toggle the switch to Off for mspaint.exe and pbrush.exe items.

superuser.com

Evidently people are willing to hack their systems in order to revert to a now-unsupported version that they prefer. As insane as it would be, I’m considering whether to become one of those people. Sigh. I guess 25 years was a pretty good run.


Just a Theory

PGXN Challenges

Some thoughts on the challenges for PGXN’s role in the ideal PostgreSQL extension ecosystem of the future.

Last week, I informally shared Extension Ecosystem: Jobs and Tools with colleagues in the #extensions channel on the Postgres Slack. The document surveys the jobs to be done by the ideal Postgres extension ecosystem and the suggests the tools and services required to do those jobs — without reference to existing extension registries and packaging systems.

The last section enumerates some questions we need to ponder and answer. The first one on the list is:

What will PGXN’s role be in this ideal extension ecosystem?

The PostgreSQL Extension Network, or PGXN, is the original extension distribution system, created 2010–11. It has been a moderate success, but as we in the Postgres community imagine the ideal extension distribution future, it’s worthwhile to also critically examine existing tools like PGXN, both to inform the project and to realistically determine their roles in that future.

With that in mind, I here jot down some thoughts on the challenges with PGXN.

PGXN Challenges

PGXN sets a lot of precedents, particularly in its decoupling of the registry from the APIs and services that depend on it. It’s not an all-in-one thing, and designed for maximum distributed dissemination via rsync and static JSON files.

But there are a number of challenges with PGXN as it currently stands; a sampling:

PGXN has not comprehensively indexed all public PostgreSQL extensions. While it indexes more extensions than any other registry, it falls far short of all known extensions. To be a truly canonical registry, we need to make it as simple as possible for developers to register their extensions. (More thoughts on that topic in a forthcoming post.)

In that vein, releasing extensions is largely a manual process. The pgxn-tools Docker image has improved the situation, allowing developers to create relatively simple GitHub workflows to automatically test and release extensions. Still, it requires intention and work by extension developers. The more seamless we can make publishing extensions the better. (More thoughts on that topic in a forthcoming post.)

It’s written in Perl, and therefore doesn’t feel modern or easily accessible to other developers. It’s also a challenge to build and distribute the Perl services, though Docker images could mitigate this issue. Adopting a modern compiled language like Go or Rust might increase community credibility and attract more contributions.

Similarly, pgxnclient is written in Python and the pgxn-utils developer tools in Ruby, increasing the universe of knowledge and skill required for developers to maintain all the tools. They’re also more difficult to distribute than compiled tools would be. Modern cross-compilable languages like Go and Rust once again simplify distribution and are well-suited to building both web services and CLIs (but not, perhaps native UX applications — but then neither are dynamic languages like Ruby and Python).

The PGXN Search API uses the Apache Lucy search engine library, a project that retired in 2018. Moreover, the feature never worked very well, thanks to the decision to expose separate search indexes for different objects — and requiring the user to select which to search. People often can’t find what they need because the selected index doesn’t contain it. Worse, the default index on the site is “Documentation”, on the surface a good choice. But most extensions include no documentation other than the README, which appears in the “Distribution” index, not “Documentation”. Fundamentally the search API and UX needs to be completely re-architected and -implemented.

PGXN uses its own very simple identity management and basic authentication. It would be better to have tighter community identity, perhaps through the PostgreSQL community account.

Given these issues, should we continue building on PGXN, rewrite some or all of its components, or abandon it for new services. The answer may come as a natural result of designing the overall extension ecosystem architecture or from the motivations of community consensus. But perhaps not. In the end, we’ll need a clear answer to the question.

What are your thoughts? Hit us up in the #extensions channel on the Postgres Slack, or give me a holler on Mastodon or via email. We expect to start building in earnest in February, so now’s the time!

More about… Postgres PGXN Extensions

Monday, 29. January 2024

Identity Woman

Event Reflection: Children’s Digital Privacy Summit 2024

Last week, I flew to LA To attend the Children’s Digital Privacy Summit hosted by Denise Tayloe and her team at Privo. I’ve known Denise since the early days of IIW, and it was great to meet her team for the first time.    They put on a great show what began with a talk […] The post Event Reflection: Children’s Digital Privacy Summit 2024 appeared first on Identity Woman.

Last week, I flew to LA To attend the Children’s Digital Privacy Summit hosted by Denise Tayloe and her team at Privo. I’ve known Denise since the early days of IIW, and it was great to meet her team for the first time.    They put on a great show what began with a talk […]

The post Event Reflection: Children’s Digital Privacy Summit 2024 appeared first on Identity Woman.


Phil Windleys Technometria

Acceptance Networks for Self-Sovereign Identity

We can't have broad adoption of verifiable credentials until we find a way to scale their presentation by providing tooling that credential verifiers can use to reduce their risk and gain confidence in the facts presented to them. When I hand a merchant in London a piece of plastic that I got from a bank in Utah to make a purchase, a tiny miracle happens. Despite the fact that the merchant has neve

We can't have broad adoption of verifiable credentials until we find a way to scale their presentation by providing tooling that credential verifiers can use to reduce their risk and gain confidence in the facts presented to them.

When I hand a merchant in London a piece of plastic that I got from a bank in Utah to make a purchase, a tiny miracle happens. Despite the fact that the merchant has never met me before and has no knowledge of my bank, she blithely allows me to walk out of the store with hundreds of dollars of merchandise, confident that she will receive payment. I emphasized the word confident in the last sentence because it's core to understanding what's happened. In the past, these kinds of transactions required that the merchant trust me or my bank. But in the modern world, trust has been replaced by confidence.

We often mix these concepts up and I'm as guilty as anyone. But trust always involves an element of risk, whereas confidence does not. These are not binary, but rather represent a spectrum. In the scenario I paint above, the merchant is still taking some risk, but it's very small. Technology, processes, and legal agreements have come together to squeeze out risk. The result is a financial system where the risk is so small that banks, merchants, and consumers alike have confidence that they will not be cheated. There's a name in the financial services industry for the network that reduces risk so that trust can be replaced with confidence: an acceptance network.

Acceptance Networks

An acceptance network is the network of merchants or service providers that accept a particular form of payment, usually credit or debit cards, from a particular issuer or payment network. The term refers to a broad ecosystem that facilitates these transactions, including point-of-sale terminals, online payment gateways, and other infrastructure. Each component of the acceptance network plays a crucial role in ensuring that transactions are processed efficiently, securely, and accurately. This drives out risk and increases confidence. Acceptance networks are foundational components of modern payment ecosystems and are essential to the seamless functioning of digital financial transactions. Visa, Mastercard, American Express, and Discover are all examples of acceptance networks.

Before the advent of acceptance networks, credit was a spotty thing with each large merchant issuing it's own proprietary credit card—good only at that merchant. My mom and dad had wallets full of cards for JC Penney, Sears, Chevron, Texaco, and so on. Sears trusted its card. Chevron trusted its card. But it was impossible to use a Chevron card at Sears. They had limited means to verify if it was real and no way to clear the funds so that Chevron could pay Sears for the transaction.

That scenario is similar to the state of digital identity today. We have identity providers (IdPs) like Google and Apple who control a closed ecosystem of relying parties (with a lot of overlap). These relying parties trust these large IdPs to authenticate the people who use their services. They limit their risk by only using IdPs they're familiar with and only accepting the (usually) self-asserted attributes from the IdP that don't involve much risk. Beyond that they must verify everything themselves.

Fixing this requires the equivalent of an acceptance network for digital identity. When we launched Sovrin Foundation and the Sovrin network1 in 2016, we were building an acceptance network for digital identity, even though we didn't use that term to describe it. Our goal was to create a system of protocols, processes, technology and governance that would reduce the risk of self-sovereign identity and increase confidence in an identity system that let the subjects present verifiable credentials that carried reliable attributes from many sources.

I've written previously about identity metasystems that provide a framework for how identity transactions happen. Individual identity systems are built according to the architecture and protocols of the metasystem. Acceptance networks are an instantiation of the metasystem for a particular set of users and types of transactions. A metasystem for self-sovereign identity might have several acceptance networks operating in it to facilitate the operation of specific identity systems.

Problems an Acceptance Network Can Solve

To understand why an acceptance network is necessary to reduce risk and increase confidence in identity transactions, let's explore the gaps that exist without it. The following diagram shows the now familiar triangle of verifiable credential exchange. In this figure, issuers issue credentials to holders who may or may not be the subject of the credentials. The holder presents cryptographic proofs that assert the value of relevant attributes using one of more of the credentials that they hold. The verifier verifies the proof and uses the attributes.

Verifiable Credential Exchange

Let's explore what it means for the verifier to verify the proof. The verifier wants to know a number of things about the credential presentation:

Were the credentials issued to the entity making the presentation?

Have any of the credentials been tampered with?

Have any of the credentials been revoked?

What are the schema for the credentials (to understand the data in them)?

Who issued the credentials in the proof?

The first four of these can be done cryptographically to provide confidence in the attestation. The technology behind the credential presentation is all that's necessary. They can be automated as part of the exchange. For example, the proof can contain pointers (e.g., DIDs) to the credential definitions. These could contain public keys for the credential and references to schema.

The last one—who issued the credential—is not a technical matter. To see why, imagine that Alice (as holder and subject) has been issued a credential from her university (the issuer) giving information about her educational experiences there. She's applying for a job and wants to present the credential to a prospective employer (the verifier). How does the employer know that Alice didn't just make the credential herself or buy it from a diploma mill?

Knowing who issued the credential is not something that can be done solely with technology (although it can help). The employer in this scenario wants more than an identifier for the issuer. And they want to know that the public key really does belong to the university. In short, the employer wants to resolve the identifier to other information that tells them something about the university and the credential. There are lots of ways to do that—people have been doing this sort of thing for centuries: states keep registries of businesses (universities are businesses), accreditation organizations keep registries of schools they've accredited, the Department of Education has registries of various institutions of higher education in the US, and so on.

The employer could make use of these by building its own database of university identifiers it trusts. And every time a new one shows up, they could investigate and add it to their registry (or not)2. But going back to the magic of the credit card scenario that I opened this article with, if every merchant had to keep their own registry of banks, the experience wouldn't be magical for me or the merchant. The financial acceptance network makes it easy for the merchant to have confidence that they'll be paid because they have not only technology, but processes, protocols, governance, and legal agreements that make the verification process automatable.

Acceptance Networks for Digital Identity

For some use cases, keeping your own registry of the issuers you trust works. But for many, it's just too much work and makes it difficult to make use of a variety of credentials. This kind of "localized trust" is unwieldy in an identity system that might involve millions of issuers and identifiers and credentials for billions or even trillions of subjects. I've written extensively about identity metasystems and what they provide to help bridge the gap. This one, on how metasystems help provide life-like identity for digital systems is perhaps the most comprehensive. Acceptance networks implement metasystems.

An acceptance network for digital identity must have a number of important properties, including the following:

Credentials are decentralized and contextual—There is no central authority for all credentials. Every party can be an issuer, a holder (identity owner), or a verifier. Verifiable credentials can be adapted to any country, any industry, any community, or any set of trust relationships.

Credential issuers decide on what data is contained in their credentials—Anyone can create a credential schema for their use case. Anyone can create a credential definition based on any of these schemas.

Verifiers make their own trust decisions about which credentials to accept—There's no central authority who determines what credentials are important or which are used for what purpose. The acceptance network supplies the technical underpinnings for credential exchange and support protocols for automating the verification of credential issuers.

Credential verifiers don't need to have any specific technical, contractual, or commercial relationship with credential issuers—Verifiers do not need to contact issuers to perform verification.

Credential holders are free to choose which credentials to carry and what information to disclose—People and organizations are in control of the credentials they hold (just as they are with physical credentials) and determine what to share with whom.

You may be thinking "but these are mostly about decentralized decision making." While it would be easier to imagine the acceptance network as a big directory, that solution can't possible support all the different ways people and organizations might want to use credentials. That doesn't mean an acceptance network couldn't be run by a single organization, like some financial services networks. Just that it has to support a variety of credential ecosystems running common protocols. I also think that there will be more than one and most issuers and verifiers will be part of several (again, like in financial services).

Structure of an Acceptance Network

One of the things we can take away from the architecture of financial services acceptance networks is that they are built in layers. No one has thought more about how this can work than Drummond Reed and the Trust Over IP Foundation (ToIP).3 This figure, from ToIP, shows how such a stack works.

Trust Over IP Stack

The layers build on each other to provide something the lower level didn't. Layer 1 is the foundational functionality, like DID methods. Layer 2 builds on that to support creating digital relationships with anyone. Layer 3 uses those relationships to effect credential exchange. Layer 4 is the ecosystems that say things about the issuers for different use cases. The dual stack emphasizes the need for governance at every layer.

The acceptance network specifies the accepted protocols and technologies. The acceptance network also supports ecosystems, providing governance models and technology. The acceptance network is involved at each layer. Here are some examples of things an acceptance network might do at each layer:

Layer 1—limit the allowed DID methods and certify them.

Layer 2—require that wallets and agents using the network support specific versions of the DIDComm protocol. Provide a certification framework for wallet and agent vendors for security and interoperability.

Layer 3—require specific versions of the exchange protocols. Participate in protocol development. Provide a certification framework for specific implementations to aid with security and interoperability.

Layer 4—support the formation, certification, and discovery of credential ecosystem providers. Govern what is required to be a certified ecosystem provider and provide models for acceptable ecosystem governance.

As part of it's overall governance of the ecosystem, the acceptance network also provides model legal agreements for and between the various participants, trust mark rights (think of the Visa logo), and drives a uniform user experience.

The following diagram shows the credential exchange from the preceding figure with an acceptance network providing support to the verifier so that it can have confidence in the data the issuer has supplied through the holder.

Acceptance Network in Operation

Credential issuers who know their credential might be widely used would join one or more acceptance networks. They agree to follow the rules and regulations in the governance framework of the acceptance network. The acceptance network issues a credential to them that they can use to prove they are a member.4 The acceptance network maintains a registry—likely a registry of registries—that verifiers can use to discover information about the issuer of a credential that has been presented to them.

Using an Acceptance Network

Returning to our previous scenario, Alice holds a credential issued by her university. She presents it to a prospective employer who wants to know that the credential is from an accredited university. Alice's university has been accredited by an accreditation organization5. They have followed their process for accrediting Alice's university and issued it a credential. They have also added the university to their registry. The university and the accrediting organization are members of an acceptance network. The employer's systems know to automatically query the acceptance network when it received a credential proof from a issuer it does not know. Doing so provides the assurance that the issuer is legitimate. It could also provide information about the accreditation status of the university. This information reduces the risk that the employer would otherwise bear.

In this scenario, the employer is trusting the processes and structure of the acceptance network. The employer must decide which acceptance networks to use. This is much more scalable than having to make these determinations for every credential issuer. The acceptance network has allowed the verification process to scale and made the overall use of verifiable credentials easier and less risky.

A Note on Implementation

This discussion of acceptance networks has undoubtedly brought images to your mind about how it is structured or how to build one. The comparison to financial services acceptance networks points to a network run by an organization. And the term registrybrings to mind a database of some kind. Why these are certainly possibilities, I think it's also possible to imagine more decentralized solutions. For example, the registry could be a distributed ledger or blockchain. The governance is likely most easily done by an organization, but there are other options like a decentralized autonomous organization (DAO). The scenario I described above illustrates a federated system where certifying authorities for specific ecosystems determine their own methods, processes, and requirements, but link their registry to that of the acceptance network.

Conclusion

As I mentioned above, we've been solving the problem of how to know which institutions to trust for centuries. We have ways of knowing whether a university is accredited, whether a bank is real, whether a company is actually registered and what its reputation is. What is missing is an easy way to make use of this information digitally so that processes for reducing risk can be automated. Acceptance networks rationalize the process and provide the needed tooling to automate these checks. They reduce the many-to-many problem that exists when each verifier has to determine whether to trust each issuer with a more scalable many-to-several system. Acceptance networks allow credential presentation to scale by providing the needed infrastructure for giving verifiers confidence in the facts that holders present to them.

Notes

You can see in the linked post how we used trust to describe what we were building, even as we were reducing risk and inspiring confidence.

Note that this investigation could make use of technology. Knowing the universities name, they could look up a well known location on the universities web site to find the identifier. They could use PKI (digital certificates) to be sure they're talking to the right place. They could look up the university in an online registry of accredited universities.

Trust over IP isn't the only one working on this. Marie Wallace of Accenture and Stephen Wilson of Lockstep Partners have been writing about this idea.

Note that there could be different levels or types of members who perform different roles in the ecosystem and make different agreements.

An example is the Northwest Commission on Colleges and Universities.

Photo Credit: Data flowing over networks from DALL-e

Thursday, 25. January 2024

Mike Jones: self-issued

OAuth 2.0 Protected Resource Metadata draft addressing all known issues

Aaron Parecki and I have published a draft of the “OAuth 2.0 Protected Resource Metadata” specification that addresses all the issues that we’re aware of. In particular, the updates address the comments received during the discussions at IETF 118. As described in the History entry for -02, the changes were: Switched from concatenating .well-known to […]

Aaron Parecki and I have published a draft of the “OAuth 2.0 Protected Resource Metadata” specification that addresses all the issues that we’re aware of. In particular, the updates address the comments received during the discussions at IETF 118. As described in the History entry for -02, the changes were:

Switched from concatenating .well-known to the end of the resource identifier to inserting it between the host and path components of it. Have WWW-Authenticate return resource_metadata rather than resource.

The specification is available at:

https://www.ietf.org/archive/id/draft-ietf-oauth-resource-metadata-02.html

Wednesday, 24. January 2024

Jon Udell

Learn by Doing: How LLMs Should Reshape Education

Here’s the latest installment in the series on working with LLMS: Learn by Doing: How LLMs Should Reshape Education. If you’re teaching SQL, this article points to a pedagogical challenge/opportunity: How would I create a lesson that guides a student to an understanding of CROSS JOIN without ever mentioning or explicitly teaching anything about it? … Continue reading Learn by Doing: How LLMs Should

Here’s the latest installment in the series on working with LLMS: Learn by Doing: How LLMs Should Reshape Education.

If you’re teaching SQL, this article points to a pedagogical challenge/opportunity: How would I create a lesson that guides a student to an understanding of CROSS JOIN without ever mentioning or explicitly teaching anything about it?

If you’re teaching anything else, the same question could (I’ll argue should) apply. How to scaffold learning by doing?

The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

11 How to Use LLMs for Dynamic Documentation

12 Let’s talk: conversational software development

13 Using LLMs to Improve SQL Queries

14 Puzzling over the Postgres Query Planner with LLMs

15 7 Guiding Principles for Working with LLMs

Monday, 22. January 2024

Just a Theory

I’m a Postgres Extensions Tembonaut

Near year, new job. I accepted a new position at Tembo to work on improving the PostgreSQL extension ecosystem full time.

New year, new job.

I’m pleased to announce that I started a new job on January 2 at Tembo, a fully-managed PostgreSQL developer platform. Tembo blogged the news, too.

I first heard from Tembo CTO Samay Sharma last summer, when he inquired about the status of PGXN, the PostgreSQL Extension Network, which I built in 2010–11. Tembo bundles extensions into Postgres stacks, which let developers quickly spin up Postgres clusters with tools and features optimized for specific use cases and workloads. The company therefore needs to provide a wide variety of easy-to-install and well-documented extensions to power those use cases. Could PGXN play a role?

I’ve tended to PGXN’s maintenance for the last fourteen years, and thanks in no small part to hosting provided by depesz. As of today’s stats it distributes 376 extensions on behalf of 419 developers. PGXN has been a moderate success, but Samay asked how we could collaborate to build on its precedent to improve the extensions ecosystem overall.

It quickly became apparent that we share a vision for what that ecosystem could become, including:

Establishing the canonical Postgres community index of extensions, something PGXN has yet to achieve Improving metadata standards to enable new patterns, such as automated binary packaging Working with the Postgres community to establish documentation standards that encourage developers to provide comprehensive extension docs Designing and building developer tools that empower more developers to build, test, distribute, and maintain extensions

Over the the past decade I’ve have many ideas and discussion on these topics, but seldom had the bandwidth to work on them. In the last couple years I’ve enabled TLS and improved the site display, increased password security, and added a notification queue with hooks that post to both Twitter (RIP @pgxn) and Mastodon (@pgxn@botsin.space). Otherwise, aside from keeping the site going, periodically improving new accounts, and eyeing the latest releases, I’ve had little bandwidth for PGXN or the broader extension ecosystem.

Now, thanks to the vision and strategy of Samay and Tembo CEO Ry Walker, I will focus on these projects full time. The Tembo team have already helped me enumerate the extension ecosystem jobs to be done and the tools required to do them. This week I’ll submit it to collaborators from across the Postgres community1 to fill in the missing parts, make adjustments and improvements, and work up a project plan.

The work also entails determining the degree to which PGXN and other extension registries (e.g., dbdev, trunk, pgxman, pgpm (WIP), etc.) will play a role or provide inspiration, what bits should be adopted, rewritten, or discarded.2 Our goal is to build the foundations for a community-owned extensions ecosystem that people care about and will happily adopt and contribute to.

I’m thrilled to return to this problem space, re-up my participation in the PostgreSQL community, and work with great people to build out the extensions ecosystem for future.

Want to help out or just follow along? Join the #extensions channel on the Postgres Slack. See you there.

Tembo was not the only company whose representatives have reached out in the past year to talk about PGXN and improving extensions. I’ve also had conversations with Supabase, Omnigres, Hydra, and others. ↩︎

Never be afraid to kill your darlings↩︎

More about… Personal Work Tembo Postgres Extensions

Sunday, 21. January 2024

Mike Jones: self-issued

Celebrating Ten Years of OpenID Connect at the OpenID Summit Tokyo 2024

We held the first of three planned tenth anniversary celebrations for the completion of OpenID Connect at the OpenID Summit Tokyo 2024. The four panelists were Nov Matake, Ryo Ito, Nat Sakimura, and myself. We shared our perspectives on what led to OpenID Connect, why it succeeded, and what lessons we learned along the way. […]

We held the first of three planned tenth anniversary celebrations for the completion of OpenID Connect at the OpenID Summit Tokyo 2024. The four panelists were Nov Matake, Ryo Ito, Nat Sakimura, and myself. We shared our perspectives on what led to OpenID Connect, why it succeeded, and what lessons we learned along the way.

The most common refrain throughout our descriptions was the design philosophy to “Keep simple things simple”. I believe that three of the four of us cited it.

I recounted that we even had a thought experiment used to make the “Keep simple things simple” principle actionable in real time: the “Nov Matake Test”. As we considered new features, we’d ask ourselves “Would Nov want to add it to his implementation?” And “Is it simple enough that he could build it in a few hours?”

The other common thread was the criticality of interop testing and certification. We held five rounds of interop testing before finishing the specifications, with the specs being refined after each round based on the feedback received. The early developer feedback was priceless – much of it from Japan!

Our OpenID Connect 10th anniversary presentations were:

Remarks by Mike Jones Remarks by Nov Matake Remarks by Ryo Ito Remarks by Nat Sakimura

Thanks to the OpenID Foundation Japan for the thought-provoking and enjoyable OpenID Summit Tokyo 2024!

Friday, 19. January 2024

Mike Jones: self-issued

2024 OpenID Foundation Board Election Results

Thanks to those of you who elected me to a two-year term on the OpenID Foundation board of directors. This is an incredibly exciting time for the OpenID Foundation and for digital identity, and I’m thrilled to be able to contribute via the OpenID board. Thanks for placing your trust in me! I’d like to […]

Thanks to those of you who elected me to a two-year term on the OpenID Foundation board of directors. This is an incredibly exciting time for the OpenID Foundation and for digital identity, and I’m thrilled to be able to contribute via the OpenID board. Thanks for placing your trust in me!

I’d like to also take this opportunity to congratulate my fellow board members who were also elected: George Fletcher, Atul Tulshibagwale, and Mark Verstege. See the OpenID Foundation’s announcement of the 2024 election results.

My candidate statement was:

I am on a mission to build the Internet’s missing identity layer. OpenID specifications and initiatives are key to realizing that vision.

Widespread deployment of OpenID specifications has the potential to make people’s online interactions more seamless, secure, and valuable. I have been actively working since 2007 to make that an everyday reality.

2024 has huge potential for advances in digital identity. People are starting to have identity wallets holding digital credentials that they control. National and international federations are being established. Open Banking and Open Finance deployments are ongoing. Adoption of OpenID Connect (which we created a decade ago!) continues going strong. We’re on track to have OpenID Connect be published as ISO standards. OpenID specifications and programs are essential to all these outcomes.

While many of you know me and my work, here’s a few highlights of my contributions to the digital identity space and the OpenID community:

– I was primary editor of OpenID Connect, primary editor of the OAuth 2.0 bearer token specification [RFC 6750], and primary editor of the JSON Web Token (JWT) specification [RFC 7519] and the JSON Object Signing and Encryption (JOSE) specifications [RFCs 7515-7518], which are used by OpenID Connect. I was an editor of the Security Event Token specification [RFC 8417], which is used by Shared Signals and OpenID Connect. I’m an editor of the SIOPv2 specification and a contributor to the other OpenID for Verifiable Credentials specifications. I’m an editor of the OpenID Federation specification. The OAuth DPoP specification [RFC 9449] was my latest RFC. I’m an author of 32 RFCs and 17 final OpenID specifications, with more of each in the pipeline.

– I spearheaded creation of the successful OpenID Connect certification program and continue actively contributing to its success. Over 2,800 certifications have been performed and the pace keeps increasing! Certification furthers the Foundation’s goals of promoting interoperation and increasing the quality of implementations. It’s also become an important revenue stream for the Foundation.

– My contributions to the Foundation have included serving on the board since 2008, serving as board secretary during most of my tenure. I’ve helped organize numerous OpenID summits and working group meetings and regularly present there. I chaired the election committee that developed the Foundation’s election procedures and software. I co-chaired the local chapters committee that developed the policies governing the relationships with local OpenID chapters around the world. I serve on the liaison committee, facilitating our cooperation with other organizations. And way back in 2007, I worked with the community to create the legal framework for the OpenID Foundation, enabling both individuals and corporations to be full participants in developing OpenID specifications and ensuring that they can be freely used by all.

I’d like to continue serving on the OpenID board, because while the OpenID community is having notable successes, our work is far from done. Taking it to the next level will involve both additional specifications work and strategic initiatives by the Foundation. We need to continue building a broad base of supporters and deployers of OpenID specifications around the world. We need to continue fostering close working relationships with partner organizations. And we need to continue safeguarding OpenID’s intellectual property and trademarks, so they remain freely available for all to use.

I have a demonstrated track record of energetically serving the OpenID community and producing results that people actually use. I plan to continue taking an active role in making open identity solutions even more successful and ubiquitous. That’s why I’m running for a community board seat in 2024.

Mike Jones
michael_b_jones@hotmail.com
Blog: https://self-issued.info/
Professional Website: https://self-issued.consulting/


reb00ted

I can't remember any time when more spaces for innovation and entrepreneurship were wide open than now

Lenin supposedly said: There are decades where nothing happens, and there are weeks where decades happen. It’s the same in technology. I came to Silicon Valley in the mid-90’s, just in time to see the dot-com boom unfold. Lots happened very quickly in that time. There were a few more such periods of rapid change since, like when centralized social media got going, and when phones turned

Lenin supposedly said:

There are decades where nothing happens, and there are weeks where decades happen.

It’s the same in technology.

I came to Silicon Valley in the mid-90’s, just in time to see the dot-com boom unfold. Lots happened very quickly in that time. There were a few more such periods of rapid change since, like when centralized social media got going, and when phones turned into real computers. But for many years now, not much has happened: we got used to the idea that there’s a very small number of ever-larger tech giants, which largely release incremental products and that’s just that. Nothing much happens.

But over the last year or so, suddenly things are happening again. I think not only are the spaces for innovation and entrepreneurship now more open than they have been for at least a decade or more; it’s possible they have never been as open as they are now.

Consider:

Everybody’s favorite subject: machine learning and AI. I don’t believe in much of what most people seem to believe about AI these days. I’m not part of the hype train. However, I do believe that machine learning is a fundamental innovation that allows us to program computers in a radically different way than we have in the past 50 and more years: instead of telling the computer what to do, we let it observe how it’s done and have it copy what it saw. Most of what today’s AI companies use machine learning for, in my view, is likely not going to stand the test of time. However, I do believe that this fundamentally different way of programming a computer is going to find absolutely astounding and beneficial applications at some point. It could be today: the space for invention, innovation and entrepreneurship is wide open.

The end of ever-larger economies of scale and network effects in tech. The dominant tech companies are very close to having pretty much all humans on the planet as customers. The number of their users is not going to double again. So the cost structure of their businesses is not going to get reduced any more simply by selling the same product to more customers, nor is the benefit of their product going to grow through growing network effects as much as in the past. It’s like they are running into a physical limit to the size of many things they can do. This opens space for innovation and successful competition.

Most interesting, it allows the creation of bespoke products again; products that are optimized for particular markets, customer groups and use cases. Ever noticed that Facebook is the same product for everybody, whether you are rich or poor, whether you have lots of time, or none, whether you are CEO or a kid, whether you like in one place or another, whether you are interested in sports or not and so forth? It’s the same for products of the other big platform vendors. That is a side effect of the focus on economies of scale. All of a sudden, increased utility for the user will need to come from serving their specific needs, not insisting that all cars need to be black. For targeted products, large platforms have no competitive advantages over small organizations; in fact, they may be at a real disadvantage. Entrepreneurs, what are you waiting for?

The regulators suddenly have found their spine and aren’t kidding around, starting with the EU.

The Apple App Store got in the way of your business? They are about to force the App Store open and allow side loading and alternate app stores (although Apple is trying hard to impede this as much as possible; a fight is brewing; my money is on the regulators).

The big platforms hold all your data hostage? Well, in many jurisdictions around the world you now have the right to get all copy of all your data. Even better, the “continuous, real-time access” provision of the EU’s Digital Markets Act is about to come into force.

The platforms don’t let you interoperate or connect? Well, in the EU, a legal requirement for interoperability of messaging apps is already on the books, and more are probably coming. Meta’s embrace of ActivityPub as part of Threads is a sign of it.

Imagine what you can do, as an entrepreneur, if you can distribute outside of app stores, use the same data on the customer that the platforms have, and you can interoperate with them? The mind boggles … many product categories that previous were impossible to compete with suddenly are in play again.

Social networking is becoming an open network through the embrace of ActivityPub by Meta’s Threads. While nobody outside of Meta completely understands why they are doing this, they undoubtedly are progressing towards interoperability with the Fediverse. Whatever the reasons, chances are that they also apply to other social media products, by Meta and others. All of a sudden competing with compelling social media application is possible again because you have a fully built-out network with its network effects from day one.

Consumers know tech has a problem. They are more willing to listen to alternatives to what they know than they have in a long time.

And finally, 3D / Spatial Computing a la Apple. (I’m not listing Meta here because clearly, they don’t have a compelling vision for it. Tens of billions spent and I still don’t know what they are trying to do.)

Apple is creating an an entirely new interaction model for how humans can interact with technology. It used to be punch cards and line printers. Then we got interactive green-screen terminals. And then graphics displays, and mice. That was in the 1980’s. Over the next 40 years, basically nothing happened (except adding voice for some very narrow applications). By using the space around us as a canvas, Apple is making it possible to interact with computing in a radically different way. Admittedly, nobody knows so far how to really take advantage of the new medium, but once somebody does, I am certain amazing things will happen.

Again, an opportunity ripe for the taking. If it works, it will have the same effects on established vendors as the arrival of the web had on established vendors: some managed to migrate, or the arrival graphical user interfaces on the vendors of software for character terminals; most failed to make the switch. So this is another ideal entrepreneurial territory.

But here’s the kicker: what if you combined all of the above? What can you build if your primary interaction model is 3D overlayed over the real world, with bespoke experiences for your specific needs, assisted by (some) intelligence that goes beyond what computers typically do today, accomplished by some form of machine learning, all fed by personal data collected by the platforms, and distributed outside of the straightjacket and business strategies of app stores?

We have not seen as much opportunity as this in a long time; maybe ever.

Thursday, 18. January 2024

Heres Tom with the Weather

Winer's Law of the Internet

Something to keep in mind as big tech connects to the fediverse is Winer’s Law of the Internet which ends with The large companies always try to make the technology complicated to reduce competition to other organizations with large research and development budgets. This is 20 years old but it has stood the test of time.

Something to keep in mind as big tech connects to the fediverse is Winer’s Law of the Internet which ends with

The large companies always try to make the technology complicated to reduce competition to other organizations with large research and development budgets.

This is 20 years old but it has stood the test of time.

Tuesday, 16. January 2024

Phil Windleys Technometria

Exploring Digital Identity

I was recently on the Identity at the Center podcast speaking with hosts Jim McDonald and Jeff Steadman. We discussed my journey into the field of identity, Internet Identity Workshop, and my latest book "Learning Digital Identity." We also discussed the book writing process, key takeaways from the book, and the future of identity innovation. It was a fun conversation. I hope you enjoy it too.

I was recently on the Identity at the Center podcast speaking with hosts Jim McDonald and Jeff Steadman. We discussed my journey into the field of identity, Internet Identity Workshop, and my latest book "Learning Digital Identity." We also discussed the book writing process, key takeaways from the book, and the future of identity innovation. It was a fun conversation. I hope you enjoy it too.

Monday, 15. January 2024

Damien Bod

Migrate ASP.NET Core Blazor Server to Blazor Web

This article shows how to migrate a Blazor server application to a Blazor Web application. The migration used the ASP.NET Core migration documentation, but this was not complete and a few extra steps were required. The starting point was a Blazor Server application secured using OpenID Connect for authentication. The target system is a Blazor […]

This article shows how to migrate a Blazor server application to a Blazor Web application. The migration used the ASP.NET Core migration documentation, but this was not complete and a few extra steps were required. The starting point was a Blazor Server application secured using OpenID Connect for authentication. The target system is a Blazor Web application using the “InteractiveServer” rendermode.

History

2024-02-12 Updated to support CSP nonces

Code: https://github.com/damienbod/BlazorServerOidc

Migration

The following Blazor Server application was used as a starting point:

https://github.com/damienbod/BlazorServerOidc/tree/main/BlazorServerOidc

This is a simple application using .NET 8 and OpenID Connect to implement the authentication flow. Security headers are applied and the user can login or logout using OpenIddict as the identity provider.

As in the migration guide, steps 1-3, the Routes.razor was created and the imports were extended. Migrating the contents of the Pages/_Host.cshtml to the App.razor was more complicated. I have a Layout in the original application and this needed migration into the App file as well.

This completed Blazor Web App.razor file looked like this:

@inject IHostEnvironment Env <!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <base href="/" /> <link rel="stylesheet" href="css/bootstrap/bootstrap.min.css" /> <link href="css/site.css" rel="stylesheet" /> <link href="BlazorWebFromBlazorServerOidc.styles.css" rel="stylesheet" /> <HeadOutlet @rendermode="InteractiveServer" /> </head> <body> <Routes @rendermode="InteractiveServer" /> <script nonce="@BlazorNonceService.Nonce" src="_framework/blazor.web.js"></script> </body> </html>

The App.razor uses the routes component. Inside the routes component, the CascadingAuthenticationState is used and a new component for the layout called the MainLayout.

@inject NavigationManager NavigationManager <CascadingAuthenticationState> <Router AppAssembly="@typeof(Program).Assembly"> <Found Context="routeData"> <AuthorizeRouteView RouteData="@routeData" DefaultLayout="@typeof(Layout.MainLayout)"> <NotAuthorized> @{ var returnUrl = NavigationManager.ToBaseRelativePath(NavigationManager.Uri); NavigationManager.NavigateTo($"api/account/login?redirectUri={returnUrl}", forceLoad: true); } </NotAuthorized> <Authorizing> Wait... </Authorizing> </AuthorizeRouteView> </Found> <NotFound> <LayoutView Layout="@typeof(Layout.MainLayout)"> <p>Sorry, there's nothing at this address.</p> </LayoutView> </NotFound> </Router> </CascadingAuthenticationState>

The MainLayout component uses two more new razor components, one for the nav menu and one for the login, logout component.

@inherits LayoutComponentBase <div class="page"> <div class="sidebar"> <NavMenu /> </div> <main> <div class="top-row px-4"> <LogInOrOut /> </div> <article class="content px-4"> @Body </article> </main> </div> <div id="blazor-error-ui"> An unhandled error has occurred. <a href="" class="reload">Reload</a> <a class="dismiss">🗙</a> </div>

The login, logout component uses the original account controller and improved the logout.

@inject NavigationManager NavigationManager <AuthorizeView> <Authorized> <div class="nav-item"> <span>@context.User.Identity?.Name</span> </div> <div class="nav-item"> <form action="api/account/logout" method="post"> <AntiforgeryToken /> <button type="submit" class="nav-link btn btn-link text-dark"> Logout </button> </form> </div> </Authorized> <NotAuthorized> <div class="nav-item"> <a href="api/account/login?redirectUri=/">Log in</a> </div> </NotAuthorized> </AuthorizeView>

The program file was updated like in the migration docs. Blazor Web does not support reading the HTTP headers from inside a Blazor component and so the security headers were weakened which is a very bad idea. CSP nonces are not supported and so a super web security feature is lost if updating to Blazor Web. I believe moving forward, the application should be improved.

using BlazorWebFromBlazorServerOidc.Data; using Microsoft.AspNetCore.Authentication.Cookies; using Microsoft.AspNetCore.Authentication.OpenIdConnect; using Microsoft.AspNetCore.Authorization; using Microsoft.AspNetCore.Mvc; using Microsoft.AspNetCore.Mvc.Authorization; using Microsoft.IdentityModel.JsonWebTokens; using Microsoft.IdentityModel.Protocols.OpenIdConnect; using Microsoft.IdentityModel.Tokens; namespace BlazorWebFromBlazorServerOidc; public class Program { public static void Main(string[] args) { var builder = WebApplication.CreateBuilder(args); builder.Services.TryAddEnumerable(ServiceDescriptor.Scoped<CircuitHandler, BlazorNonceService> (sp => sp.GetRequiredService<BlazorNonceService>())); builder.Services.AddScoped<BlazorNonceService>(); builder.Services.AddAuthentication(options => { options.DefaultScheme = CookieAuthenticationDefaults.AuthenticationScheme; options.DefaultChallengeScheme = OpenIdConnectDefaults.AuthenticationScheme; }) .AddCookie() .AddOpenIdConnect(options => { builder.Configuration.GetSection("OpenIDConnectSettings").Bind(options); options.SignInScheme = CookieAuthenticationDefaults.AuthenticationScheme; options.ResponseType = OpenIdConnectResponseType.Code; options.SaveTokens = true; options.GetClaimsFromUserInfoEndpoint = true; options.TokenValidationParameters = new TokenValidationParameters { NameClaimType = "name" }; }); builder.Services.AddRazorPages().AddMvcOptions(options => { var policy = new AuthorizationPolicyBuilder() .RequireAuthenticatedUser() .Build(); options.Filters.Add(new AuthorizeFilter(policy)); }); builder.Services.AddRazorComponents() .AddInteractiveServerComponents(); builder.Services.AddSingleton<WeatherForecastService>(); builder.Services.AddControllersWithViews(options => options.Filters.Add(new AutoValidateAntiforgeryTokenAttribute())); var app = builder.Build(); JsonWebTokenHandler.DefaultInboundClaimTypeMap.Clear(); if (!app.Environment.IsDevelopment()) { app.UseExceptionHandler("/Error"); app.UseHsts(); } // Using an unsecure CSP as CSP nonce is not supported in Blazor Web ... app.UseSecurityHeaders( SecurityHeadersDefinitions.GetHeaderPolicyCollection(app.Environment.IsDevelopment(), app.Configuration["OpenIDConnectSettings:Authority"])); app.UseMiddleware<NonceMiddleware>(); app.UseHttpsRedirection(); app.UseStaticFiles(); app.UseRouting(); app.UseAuthentication(); app.UseAuthorization(); app.UseAntiforgery(); app.MapRazorPages(); app.MapControllers(); app.MapRazorComponents<App>() .AddInteractiveServerRenderMode().RequireAuthorization(); app.Run(); } }

With the weakened security headers the application works and the authentication flow works.

Conclusion

Blazor Web in the InteractiveServer mode can use CSP nonces and it is possible to implement a secure web application.

Links

https://learn.microsoft.com/en-us/aspnet/core/migration/70-80

Securing a Blazor Server application using OpenID Connect and security headers

https://github.com/dotnet/aspnetcore/issues/53192

https://github.com/dotnet/aspnetcore/issues/51374

https://github.com/javiercn/BlazorWebNonceService

Sunday, 14. January 2024

Jon Udell

7 Guiding Principles for Working with LLMs

Here’s the latest installment in the series on working with LLMS: 7 Guiding Principles for Working with LLMs. The rest of the series: 1 When the rubber duck talks back 2 Radical just-in-time learning 3 Why LLM-assisted table transformation is a big deal 4 Using LLM-Assisted Coding to Write a Custom Template Function 5 Elevating … Continue reading 7 Guiding Principles for Working with LLMs

Wednesday, 20. December 2023

Mike Jones: self-issued

Ten Years of OpenID Connect and Looking to the Future

Ten years ago today the drafts that would be approved as the final OpenID Connect specifications were published, as announced in my post Fourth and possibly last Release Candidates for final OpenID Connect specifications and Notice of 24 hour review period. The adoption of OpenID Connect has exceeded our wildest expectations. The vast majority of […]

Ten years ago today the drafts that would be approved as the final OpenID Connect specifications were published, as announced in my post Fourth and possibly last Release Candidates for final OpenID Connect specifications and Notice of 24 hour review period.

The adoption of OpenID Connect has exceeded our wildest expectations. The vast majority of federated signins to sites and applications today use OpenID Connect. Android, AOL, Apple, AT&T, Auth0, Deutsche Telekom, ForgeRock, Google, GrabTaxi, GSMA Mobile Connect, IBM, KDDI, Microsoft, NEC, NRI, NTT, Okta, Oracle, Orange, Ping Identity, Red Hat, Salesforce, Softbank, Symantec, T-Mobile, Telefónica, Verizon, Yahoo, and Yahoo! Japan, all use OpenID Connect, and that’s just the tip of the iceberg. While OpenID Connect is “plumbing” and not a consumer brand, it’s filling a need and doing it well.

It’s fitting that the second set of errata corrections to the OpenID Connect specifications were just approved, as described in the post Second Errata Set for OpenID Connect Specifications Approved. While we are proud of the quality of the final specifications, with 9 3/4 years of thousands of developers using and deploying the specifications, it’s unsurprising that issues would be found that needed clarification and correction.

The updated OpenID Connect specifications have just been submitted to the International Organization for Standardization (ISO) for Publicly Available Submission (PAS) status. Approved PAS submissions are published as ISO specifications. This will foster adoption in jurisdictions that require using standards that are published by organizations with international treaty status.

Celebrations of the tenth anniversary of the approval of OpenID Connect will occur worldwide in 2024. The first will be in Asia at the OpenID Summit Tokyo in January. The second will be in the Americas at Identiverse in May. The third will be in Europe at the European Identity and Cloud Conference in June. Join us at these events for the celebrations!

I can’t wait to see what the next decade brings for OpenID Connect!