kaberett: Trans symbol with Swiss Army knife tools at other positions around the central circle. (Default)
[personal profile] kaberett
In which the Internet is creepy: I spent Thursday & Friday nights at facesfriend's place. At no point did I connect to an internet via my laptop; at no point did I search the 'net for directions; and my phone Doesn't Internet and is in no wise associated with either Google or FB accounts. Most of our IMing is via gchat or IRC. How, then, is it that when I rocked up on facebook a little while ago from the Oxford Tube, it asked me if I lived in Cambridge, London, or facesfriend's area of town? BECAUSE IT HAS NEVER ASKED ME THAT BEFORE and Thursday was at most the second time I have been to that neck of the woods in my life.

It's all a bit meta

Date: 2014-11-30 12:08 pm (UTC)
hairyears: Spilosoma viginica caterpillar: luxuriant white hair and a 'Dougal' face with antennae. Small, hairy, and venomous (Default)
From: [personal profile] hairyears

Interesting phrase, "Lucky guess".

If I had a hundred little snippets of unrelated information, each of which allowed me to guess 'X' with a 1% probability of being right, I might get lucky at least once about 63% of the time.
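A quick check of that arithmetic, assuming each snippet is an independent guess with exactly a 1% chance of being right. Independence is the big assumption here, since real snippets overlap. The chance that at least one of a hundred guesses lands is 1 - 0.99^100:

```python
def chance_at_least_one_hit(n_snippets: int, p_each: float) -> float:
    """Probability that at least one of n independent guesses is right."""
    # Complement rule: P(at least one hit) = 1 - P(every single guess misses)
    return 1.0 - (1.0 - p_each) ** n_snippets


for n in (100, 1_000):
    print(n, round(chance_at_least_one_hit(n, 0.01), 3))
```

For 100 snippets this comes out at roughly 0.634. With 1,000 such snippets the same formula is already above 0.9999, which is why "thousands of little snippets" matter so much.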

If I had thousands of little snippets, and I went looking for patterns that show logical relationships, I might do a bit better.

If I had a complete list of the links between these snippets - and links-of-links, with a few snippets of hard data (A and B and C have followed links to adverts about Mothers' Day cards, D and E did so after an interaction in their immediate circle of links, F and G have identical patterns of communication but no adlink followup, assign probability X that they, too, purchased a card...)

...I might get lucky all the time. Did you buy your Mothers' Day card from John Lewis?
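The "assign probability X that they, too, purchased a card" step can be sketched very simply: group people by their pattern of communication, then give anyone without hard data the purchase rate seen among people with the same pattern who do have it. The people, pattern labels and purchase flags below are invented for illustration, and a real system would presumably use fuzzier similarity than exact pattern matches.

```python
def estimate_by_pattern(
    labelled: dict[str, tuple[str, bool]],
    unlabelled: dict[str, str],
) -> dict[str, float]:
    """Give each unlabelled person the purchase rate seen among labelled people sharing their pattern."""
    counts: dict[str, list[int]] = {}  # pattern -> [buyers, total people with hard data]
    for pattern, bought in labelled.values():
        buyers_total = counts.setdefault(pattern, [0, 0])
        buyers_total[0] += int(bought)
        buyers_total[1] += 1
    return {
        person: counts[pattern][0] / counts[pattern][1]
        for person, pattern in unlabelled.items()
        if pattern in counts  # no hard data for this pattern means no estimate
    }


# Hypothetical hard data: A, B and C bought a card, H (same pattern) did not.
labelled = {
    "A": ("pattern-1", True),
    "B": ("pattern-1", True),
    "C": ("pattern-1", True),
    "H": ("pattern-1", False),
}
# F and G share the pattern but have no ad-link follow-up.
unlabelled = {"F": "pattern-1", "G": "pattern-1"}

estimates = estimate_by_pattern(labelled, unlabelled)
print(estimates)
```

Here F and G each get 0.75, purely because three of the four people who look like them did buy a card.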

Imagine a world in which there are tens of thousands of little snippets about you. And metadata - links-of-links-of-links to people you communicate with - extending to hundreds of people you know, and thousands of people who interact with them.
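To get a feel for how quickly "links-of-links-of-links" fans out, here is a minimal breadth-first search over a hypothetical contact graph. The `contacts` dictionary and every name in it are made up. It groups everyone by how many hops away from you they are. This is a generic graph-traversal sketch, not anything known about how Facebook or Google actually walk their data.

```python
from collections import deque


def people_within_hops(
    contacts: dict[str, set[str]], start: str, max_hops: int
) -> dict[int, set[str]]:
    """Group everyone reachable from `start` by the number of hops needed to reach them."""
    seen = {start}
    rings: dict[int, set[str]] = {}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # don't expand past the requested depth
        for other in contacts.get(person, set()):
            if other not in seen:
                seen.add(other)
                rings.setdefault(hops + 1, set()).add(other)
                queue.append((other, hops + 1))
    return rings


# Hypothetical, tiny contact graph (undirected: each link is listed both ways).
contacts = {
    "you": {"alice", "bob"},
    "alice": {"you", "carol"},
    "bob": {"you", "dave"},
    "carol": {"alice", "erin"},
    "dave": {"bob"},
    "erin": {"carol"},
}

rings = people_within_hops(contacts, "you", 3)
for hops, people in sorted(rings.items()):
    print(hops, sorted(people))
```

In this toy graph each ring is tiny; with a few hundred contacts per person, the second and third rings are where "hundreds of people you know" turns into thousands of people who interact with them.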

A tiny fraction of these links land on hard data, like an online purchase, that identifies a mother-daughter relationship at 90% probability. More complex or nebulous relationships - patient-to-therapist, or political associations - don't have that.

However...

Imagine looking for patterns of links, times, places, actions, extending out to see the ripples of links and times and actions in all those people you know, every time you interact online.

And every time your friends act or interact online; and their friends, too: because this is actually about you. Or rather, this is about generating large volumes of low-quality metadata that's 'about you' in the sense that it is slightly relevant to you and usable if it's available in large volumes.

Patterns of possibilities and overlaps and correlations, none of them much over 1% likely to be right, but tens of thousands of them exist. Quite possibly, millions, just for you.

And, just once, this web of links and groups and overlaps and shapes and probabilities triggers a routine that says "Ooh, that's nearly 0.01% of a match to the relationship pattern 'patient to therapist', look for corroborating evidence in the following related patterns of communication between known therapy patients in their metalinks".
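One common way to turn lots of individually weak signals into a single score is to add up their log likelihood ratios, naive-Bayes style, and fire an alert when the result crosses a threshold. The sketch below does exactly that. The prior, the likelihood ratios and the 0.01% trigger are all invented numbers, and nothing here claims this is the method anyone actually uses.

```python
import math


def log_odds(p: float) -> float:
    """Convert a probability into log-odds."""
    return math.log(p / (1.0 - p))


def combine_weak_signals(prior: float, likelihood_ratios: list[float]) -> float:
    """Naive-Bayes-style update: each signal adds its log likelihood ratio to the prior log-odds.

    This treats the signals as independent given the hypothesis, which real metadata rarely is.
    """
    total = log_odds(prior) + sum(math.log(lr) for lr in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-total))  # back from log-odds to a probability


TRIGGER = 1e-4  # hypothetical alert threshold: a 0.01% match
prior = 1e-6    # hypothetical base rate for the relationship pattern being looked for
# Hypothetical signals, each only slightly more likely if the pattern is real (ratio > 1).
signals = [1.5, 2.0, 1.3, 3.0, 1.8, 2.2, 1.4, 2.5]

score = combine_weak_signals(prior, signals)
if score >= TRIGGER:
    print(f"score {score:.6f} >= {TRIGGER}: go and look for corroborating patterns")
```

Eight signals that each barely move the needle multiply the prior odds by about 160 here, which is enough to cross the invented threshold and kick off the search for corroboration.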

...And there's a reference data set of 30,000 known patient-to-therapist relationships, for patients *in your exact demographic* with a list of all patterns visible in their metadata, to trigger that alert and 'score' the corroborating searches.



It's a staggering amount of data. The metadata analysis algorithms are inhumanly complex, and the hashing functions that permit rapid searching for matches are relatively new - 'new', in the sense that it's economical to run them for everyone, every time, millions of times - but this is the world that we live in right now.
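The comment doesn't name the hashing functions involved, so what follows only illustrates the general idea with one well-known technique, MinHash: each set of features is boiled down to a short fixed-length signature, and comparing signatures estimates the Jaccard similarity of the original sets without comparing full records. The feature names and the 128-hash signature length are assumptions for the example.

```python
import hashlib

NUM_HASHES = 128  # hypothetical signature length; more hashes give a tighter estimate


def seeded_hash(item: str, seed: int) -> int:
    """One member of a family of hash functions, selected by `seed` (64-bit BLAKE2b)."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(features: set[str], num_hashes: int = NUM_HASHES) -> list[int]:
    """For each hash function, keep only the smallest hash value seen across the set."""
    return [min(seeded_hash(f, seed) for f in features) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Share of positions where two signatures agree: estimates shared items / all distinct items."""
    agree = sum(a == b for a, b in zip(sig_a, sig_b))
    return agree / len(sig_a)


# Hypothetical metadata "patterns" for one person and for one reference record.
yours = {"late-night-messages", "weekly-call", "same-area-contact", "gap-during-office-hours"}
reference = {"late-night-messages", "weekly-call", "gap-during-office-hours", "monthly-invoice"}

similarity = estimated_jaccard(minhash_signature(yours), minhash_signature(reference))
print(f"estimated similarity: {similarity:.2f} (exact Jaccard is 3/5 = 0.60)")
```

In practice, signatures like these are usually bucketed further (locality-sensitive hashing) so that only records landing in the same bucket get compared at all, which is what makes running it "for everyone, every time" affordable.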

It isn't even necessary to read your emails: arguably, it's less useful, because 'semantic' processing to extract meaning from natural language is much, much harder for a computer than pattern-matching in communications metadata.

Nevertheless, someone's reading your emails, even if it isn't Google. And Google don't throw anything away, ever.





*I'm guessing. 64 just happens to be a really convenient number to use for this kind of thing. And that's the least of the speculations and guesses in this reply.

Powered by Dreamwidth Studios