Spreekuur.ai

Kennisbank voor huisartsen

Knowledge Base for General Practitioners

spreekuur.ai  ·  heydoc.nl

BIG-register 19917603001  ·  NEN 7510:2024 gecertificeerd (Kiwa K-0228563-1)

Acht delen  ·  Eight parts  ·  Nederlands & English

Spreekuur.ai: wat het is en wat het niet is

Spreekuur.ai is een onderzoeks- en ontwikkelplatform voor AI-ondersteuning tijdens het huisartsenconsult. Gebouwd door een praktiserend huisarts en softwareontwikkelaar, op de NEN 7510:2024-gecertificeerde infrastructuur van HeyDoc. Het doel is niet om de huisartsenzorg te automatiseren maar om de administratieve last die bij elk consult meekleeft structureel te verlichten.

De centrale technologie is de AI-scribe: software die een consult beluistert, transcribeert en een gestructureerde SOEP-notitie als voorstel genereert. De arts leest, corrigeert en keurt goed. Niets wordt automatisch vastgelegd zonder expliciet akkoord van de behandelend arts. Dat is geen disclaimertje, het is het kernprincipe van het ontwerp.

Spreekuur.ai is in actieve ontwikkeling. De inhoud op deze kennisbank beschrijft de technische, klinische en juridische context waarbinnen dit platform wordt gebouwd, voor huisartsen die willen begrijpen wat er feitelijk achter deze technologie zit.

Kerngegevens

Website: spreekuur.ai

Ontwikkelaar: HeyDoc B.V., Lammenschansweg 15b, 2321 GV Leiden

BIG-registratie: 19917603001

Certificering: NEN 7510:2024 (Kiwa K-0228563-1)

Infrastructuur: Google Cloud Platform, europe-west4 (Nederland)

HIS-integratie: HeyDoc, FHIR R4-native

AVG: Compliant, subverwerkers gedocumenteerd

Deel I

Kennisartikelen voor huisartsen (Nederlands)

1. Wat kost documentatie eigenlijk? De administratielast in de huisartsenzorg

Elke huisarts weet hoe het voelt: de patiënt is weg, de volgende zit al in de wachtkamer, en het dossier is nog niet af. Of het is wél af, maar het heeft tien minuten gekost die u liever anders had besteed. Documentatie is geen bijzaak in de huisartsenzorg. Het is een structureel, dagelijks probleem dat meetbaar invloed heeft op werkdruk, welzijn en de kwaliteit van zorg.

Wat het onderzoek zegt

De meest geciteerde cijfers komen uit Noord-Amerika, maar het patroon is herkenbaar in elk westers stelsel. Een studie onder 1.774 artsen verbonden aan Massachusetts General Hospital mat gemiddeld 24 procent van de werktijd aan administratieve taken, met huisartsen aan de hogere kant van dat spectrum. De Ontario College of Family Physicians rapporteerde voor Canadese huisartsen 10 tot 19 uur per week aan administratieve taken, waarvan een aanzienlijk deel buiten directe patiëntcontacturen valt: formulieren, ziekteverklaringen, verslaglegging, correspondentie.

Nederlandse data zijn minder systematisch beschikbaar, maar het patroon (documentatie die buiten de consulttijd doorloopt) komt in onderzoek onder Nederlandse huisartsen consistent terug. NIVEL-peilingen en periodieke LHV-arbeidsmarktonderzoeken laten zien dat administratie één van de meest genoemde bronnen van werkdruk is.

Waarom documentatie zo tijdrovend is

Het is niet alleen de hoeveelheid tekst. Het is de combinatie van cognitieve belasting en systeembeperkingen. Terwijl een huisarts typt, schakelt die tegelijkertijd van het consultatieve naar het administratieve frame: van luisteren en oordelen naar registreren en coderen. Die omschakeling heeft een cognitieve kostprijs die over een werkdag flink oploopt.

Daar komt bij dat de meeste huisartsinformatiesystemen niet ontworpen zijn voor snelheid maar voor volledigheid. Verplichte velden, ICPC-codering, medicatiebewaking, probleemlijstbeheer: elk een legitieme eis, maar samen vormen ze een interface die elk contactmoment administratief belast.

De spreekuurstructuur maakt het erger. Tien minuten per consult is het Nederlandse gemiddelde. Wie dat opdeelt in anamnese, onderzoek, beleid, uitleg aan de patiënt én dossierregistratie, weet dat er bij complexere presentaties iets moet wijken. Meestal is dat de documentatie, die dan ná het spreekuur wordt afgemaakt, in eigen tijd.

Wat techniek hier kan veranderen, en wat niet

AI-scribes verleggen het knelpunt. In plaats van typen terwijl de patiënt wacht, of na afloop van het spreekuur bijwerken wat er overdag is blijven liggen, genereert de software een conceptnotitie die de arts beoordeelt en accordeert. De cognitieve switch van consulteren naar registreren verdwijnt niet, maar de tijdsprijs ervan daalt.

Hoe groot is die daling? De meest recente gerandomiseerde data zijn nuchter. Een UCLA-trial gepubliceerd in NEJM AI (2025) met de Nabla-scribe vond gemiddeld 41 seconden winst per notitie. Een Singaporese prospectieve studie (JMIR, 2026) rapporteerde 0,8 minuut per consult. De grootste studie tot nu toe, 1.800 clinici over vijf academische centra, gepubliceerd in JAMA in april 2026, vond zestien minuten gewonnen documentatietijd per acht uur patiëntenzorg. Dat is geen triviale uitkomst. Het is ook geen revolutie. Het is een consistent, bescheiden, gemeten effect.

Wat AI niet kan oplossen: de toegenomen regeldruk, het groeiende aantal administratieve verplichtingen dat buiten de directe consultverslaglegging valt, de complexiteit van multimorbide populaties, of het personeelstekort dat het volume per arts opdrijft. De scribe is een hulpmiddel voor één specifiek knelpunt. Niet meer, niet minder.

Spreekuur.ai onderzoekt hoe die tijdwinst in de Nederlandse huisartsenpraktijk gerealiseerd kan worden zonder concessies aan kwaliteit, privacy of klinische verantwoordelijkheid.

2. Hoe werkt een AI-scribe precies?

De term AI-scribe wordt gebruikt voor een brede categorie producten die van elkaar verschillen in architectuur, kwaliteit en risicoprofiel. Om er goed over na te kunnen denken, helpt het om de technische lagen te begrijpen.

Laag 1: automatische spraakherkenning

Alles begint met spraakherkenning, in het jargon Automatic Speech Recognition (ASR). Een microfoonsignaal wordt omgezet naar tekst. Dat klinkt eenvoudig, maar medische spraakherkenning is een apart vak. Generieke ASR-systemen zijn getraind op grote hoeveelheden algemeen taalgebruik. Ze presteren slecht op medisch vakjargon, Latijnse termen, medicatienamen, accenten en de specifieke pragmatiek van een consult: stiltes, onderbrekingen, code-switching tussen medische taal en alledaags Nederlands.

Modellen die specifiek zijn getraind of gefinetuned op medische data presteren aanzienlijk beter. Dat is een van de redenen dat generieke spraak-naar-tekstoplossingen, hoe indrukwekkend ook, niet direct bruikbaar zijn als fundering voor een klinisch systeem. De foutmarge op kritische termen (medicatienamen, doseringen, diagnoses) moet extreem laag zijn.

Laag 2: het taalmodel

De ASR-transcriptie is ruwe input: een woordstroom zonder structuur. Een taalmodel (Large Language Model, LLM) verwerkt die transcriptie en genereert een gestructureerde output. Voor een AI-scribe in de huisartsenpraktijk is dat doel een SOEP-notitie: Subjectief, Objectief, Evaluatie, Plan.

Het LLM doet meer dan formatteren. Het destilleert. Uit twintig minuten gesprek filtert het de klinisch relevante informatie: de hoofdklacht, de anamnese, de bevindingen bij het onderzoek, de differentiaaldiagnose die de arts impliciet of expliciet heeft geformuleerd, het beleid. Wat de patiënt over zijn vakantie vertelde, verdwijnt. Wat hij over zijn klachten zei, blijft.

Dat destilleren is precies waar het risico zit. Een taalmodel is geen klinicus. Het begrijpt niet inhoudelijk wat er gezegd wordt; het genereert op basis van statistische patronen de meest waarschijnlijke output. Bij veelvoorkomende presentaties kan dat verbazend goed zijn. Bij complexe, atypische of taalkundig ambigue consultaties kan het missen wat een arts als cruciaal herkent.

Laag 3: integratie met het HIS

Een AI-scribe die een mooie notitie produceert die u daarna handmatig kopieert naar uw dossier, lost het probleem maar half op. De werkelijke tijdwinst zit in de directe integratie: de gegenereerde SOEP-notitie landt als concept in het juiste dossier, in het juiste format, klaar voor review en accordering.

Dat vereist een koppeling met het huisartsinformatiesysteem. In de Nederlandse markt is die koppeling voor de meeste bestaande HIS-pakketten niet triviaal: propriëtaire interfaces, beperkte openheid, en in sommige gevallen actieve weerstand van leveranciers die hun marktpositie willen beschermen.

Spreekuur.ai integreert native met HeyDoc, een FHIR R4-native HIS. FHIR R4 is de internationale standaard voor klinische gegevensuitwisseling; een SOEP-notitie landt als gestructureerde FHIR-resources, niet als platte tekst. Dat maakt de data herbruikbaar en overdraagbaar, ook buiten de eigen praktijk.

De beslislaag: arts als poortwachter

Over al die lagen heen ligt één principiële architectuurkeuze: het systeem genereert altijd een voorstel. Nooit een definitief resultaat. De arts leest het voorstel, corrigeert waar nodig, en keurt goed. Pas na die expliciete goedkeuring wordt de notitie vastgelegd in het dossier. Dat is technisch afgedwongen, niet alleen beschreven in gebruiksvoorwaarden.

Meer over de technische architectuur van spreekuur.ai op spreekuur.ai/hoe-het-werkt en spreekuur.ai/ai-scribe.

3. Spraakherkenning in de medische context: wat maakt het moeilijk?

Medische spraakherkenning is een van de oudste toepassingen van AI in de zorg. Dragon Medical, het langst bestaande commerciële systeem, bestaat al meer dan twee decennia. Dat het probleem nog niet opgelost is, zegt iets over de complexiteit ervan.

Het vocabulaire

Medisch vocabulaire is groot, ongewoon en foutgevoelig. Medicatienamen lijken op elkaar: metoprolol en metformine, carbamazepine en carvedilol, cotrimoxazol en co-amoxiclav. Een fout van één karakter in de transcriptie van een medicatienaam kan klinisch relevant zijn. Generieke ASR-modellen hebben dit vocabulaire onvoldoende gezien; hun foutfrequentie op medicatienamen is aanmerkelijk hoger dan op gewone woorden.

ICPC-codes, diagnose-omschrijvingen, anatomische termen en procedurele taal vormen een tweede laag van domeinspecifiek vocabulaire dat een goed ASR-systeem moet beheersen.
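Hoe dicht medicatienamen bij elkaar liggen, laat zich illustreren met de Levenshtein-afstand: het aantal tekenbewerkingen dat nodig is om het ene woord in het andere te veranderen. De schets hieronder is een illustratie en geen onderdeel van een bestaand product. De functienamen `levenshtein` en `lijken_op_elkaar` en de drempelwaarde zijn aannames voor dit voorbeeld.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimaal aantal invoegingen, verwijderingen en vervangingen van a naar b."""
    vorige = list(range(len(b) + 1))
    for i, teken_a in enumerate(a, start=1):
        huidige = [i]
        for j, teken_b in enumerate(b, start=1):
            kosten = 0 if teken_a == teken_b else 1
            huidige.append(min(
                vorige[j] + 1,           # verwijdering
                huidige[j - 1] + 1,      # invoeging
                vorige[j - 1] + kosten,  # vervanging
            ))
        vorige = huidige
    return vorige[-1]


def lijken_op_elkaar(namen: list[str], max_ratio: float = 0.6) -> list[tuple[str, str, int]]:
    """Naamparen waarvan de afstand klein is ten opzichte van de langste naam.

    max_ratio is een willekeurige illustratieve drempel, geen gevalideerde waarde.
    """
    paren = []
    for a, b in combinations(namen, 2):
        afstand = levenshtein(a, b)
        if afstand / max(len(a), len(b)) <= max_ratio:
            paren.append((a, b, afstand))
    return paren


namen = ["metoprolol", "metformine", "carbamazepine", "carvedilol"]
for a, b, afstand in lijken_op_elkaar(namen):
    print(f"{a} / {b}: afstand {afstand}")
```

Een maat op tekenniveau zegt niets over akoestische gelijkenis. Voor spraakherkenning is juist die laatste bepalend, dus dit voorbeeld laat alleen zien waarom look-alike namen in een transcriptie aandacht verdienen.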

De omgevingsakoestiek

Een spreekkamer is geen studio. Er zijn achtergrondgeluiden: de hese ademhaling van een patiënt, een deur die dichtvalt, een buitenboordmotor op de Rijn. Er is overlappend spreken als de patiënt antwoord geeft voordat de vraag is afgemaakt. Er zijn momenten van stilte die geen semantische leegte zijn maar nadenken of emotionele verwerking.

Goede medische ASR-systemen zijn getraind op ruisige data en hebben akoestische modellen die met deze variatie omgaan. Systemen die op studiodata zijn getraind degraderen snel zodra de opnameomgeving niet ideaal is.

Accenten en dialecten

Nederland heeft een herkenbaar accent-spectrum: Limburgs, Fries, Gronings, Surinaams-Nederlands, Antilliaans-Nederlands, Turks-Nederlands, Marokkaans-Nederlands. Voor een systeem dat in een Leidse huisartsenpraktijk werkt, betekent dat een grote variatie in uitspraakpatronen, zowel van de arts als van de patiënt.

ASR-systemen die getraind zijn op homogeen taalgebruik presteren systematisch slechter op sprekers met een niet-standaardaccent. Dat is een bias-probleem met klinische implicaties: als het systeem de arts minder goed verstaat dan de patiënt, of andersom, ontstaan blinde vlekken in de transcriptie.

Disfluencies en gespreksstructuur

Gesproken taal is geen geschreven tekst. Mensen zeggen 'ehm', herhalen zinnen, corrigeren zichzelf halverwege, beginnen opnieuw. Een patiënt die zijn klacht beschrijft, doet dat zelden lineair. Een arts die een diagnose overdenkt, spreekt soms hardop in termen die bedoeld zijn als redenering, niet als conclusie. Een goed transcriptiesysteem filtert, structureert en interpreteert dit gepast. Het weet wanneer iets een voorlopige redenering is en wanneer een definitieve uitspraak.

Spreekuur.ai test ASR-systemen specifiek op Nederlandse medische data. Kwaliteitsvergelijkingen worden gedeeld op spreekuur.ai/onderzoek.

4. Van transcriptie naar SOEP: hoe structuur ontstaat uit gesprek

Een goede transcriptie is een noodzakelijke maar niet voldoende voorwaarde voor een bruikbare SOEP-notitie. De stap van ruwe transcriptie naar gestructureerde dossierregistratie is de kern van wat een AI-scribe onderscheidt van gewone spraak-naar-tekst software.

De SOEP-structuur

SOEP staat voor Subjectief, Objectief, Evaluatie, Plan. Het is de standaardstructuur voor contactverslaglegging in de Nederlandse huisartsenpraktijk, vastgelegd via NHG-richtlijnen en verankerd in vrijwel alle huisartsinformatiesystemen.

S – Subjectief: De klacht zoals de patiënt die ervaart en beschrijft. Duur, karakter, lokalisatie, bijkomende klachten, gerelateerde context. Dit is de anamnese, vastgelegd vanuit het perspectief van de patiënt.

O – Objectief: Bevindingen bij lichamelijk onderzoek en aanvullende diagnostiek. Wat de arts heeft waargenomen of gemeten: bloeddruk, percussie, auscultatoire bevindingen, laboratoriumuitslagen.

E – Evaluatie: De beoordeling van de arts. Een differentiaaldiagnose, een werkdiagnose, een ICPC-code. De schakel tussen wat de patiënt meldt en wat de arts erover denkt.

P – Plan: Beleid. Wat er verder gaat gebeuren: recept, verwijzing, controleafspraak, watchful waiting, aanvullende diagnostiek, verwijzing naar een paramedicus.
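De vier onderdelen kunnen in software als één gegevensstructuur worden voorgesteld. Onderstaande schets is illustratief. De klassenaam `SoepNotitie` en de veldnamen zijn aannames en geven niet weer hoe HeyDoc of spreekuur.ai de notitie intern opslaat.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SoepNotitie:
    subjectief: str
    objectief: str
    evaluatie: str
    plan: str
    icpc_code: Optional[str] = None  # hoort bij de E-regel, hier apart gehouden

    def lege_onderdelen(self) -> list[str]:
        """Namen van de SOEP-onderdelen die nog geen tekst bevatten."""
        return [
            veld.name
            for veld in fields(self)
            if veld.name != "icpc_code" and not getattr(self, veld.name).strip()
        ]


concept = SoepNotitie(
    subjectief="Sinds drie dagen keelpijn, geen koorts.",
    objectief="",
    evaluatie="Keelpijn",
    plan="Afwachtend beleid, pijnstilling zo nodig.",
    icpc_code="R21",
)
print(concept.lege_onderdelen())
```

Een controle op lege onderdelen helpt de arts bij de review zien welk deel van het consult het model niet heeft kunnen vullen.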

Wat het LLM moet doen

Het taalmodel ontvangt de volledige transcriptie en moet vier dingen tegelijk doen: segmenteren (welke uitspraken horen bij S, welke bij O, welke bij E, welke bij P), destilleren (wat is klinisch relevant, wat is conversationele ruis), parafraseren (van gesproken taal naar geschreven dossierproza) en structureren (het resultaat opmaken in het verwachte formaat).

Voor veelvoorkomende klachten met een duidelijk patroon (bijvoorbeeld keelaandoening, urinewegsymptomen, lumbago) presteert een goed getraind model consistent. De segmentatie is relatief voorspelbaar, de relevante informatie is helder, de structuur is standaard.

Voor complexe presentaties wordt het lastiger. Een consult waarbij een depressieve patiënt ook somatische klachten heeft, waarbij de patiënt aarzelt wat te benoemen, of waarbij de arts omgaat met informatie die meer impliciet dan expliciet is: dat zijn precies de gevallen waarin een taalmodel de meeste kans maakt iets te missen of verkeerd te interpreteren.

Hallucinatie: het meest onderschatte risico

Taalmodellen hallucineren: ze genereren soms plausibel klinkende informatie die niet aanwezig was in de input. In een creatieve context is dat een curiositeit. In een medisch dossier is het gevaarlijk.

Een hallucinerend model kan een bevinding noteren die de arts niet heeft vermeld, een medicatie opnemen die niet ter sprake is gekomen, of een beleid suggereren dat afwijkt van wat de arts heeft gezegd. Zonder zorgvuldige review door de arts is dat een bron van medische fouten.

Dat is de reden dat de reviewstap niet optioneel is en niet stilletjes kan worden weggeklikt. De arts moet de notitie lezen. Niet globaal scannen, maar lezen. Dat is de enige verdediging tegen hallucinatie die momenteel betrouwbaar werkt.
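Software kan de review niet vervangen, maar kan de arts wel wijzen op passages die extra aandacht verdienen. De schets hieronder markeert medicatienamen die in de conceptnotitie staan maar nergens in de transcriptie voorkomen. Het is een bewust naïeve controle op woordniveau. De woordenlijst `BEKENDE_MEDICATIE` en de functienamen zijn aannames voor dit voorbeeld, en parafrases, synoniemen of verzonnen bevindingen vangt deze controle niet.

```python
import re

# Hypothetische, sterk ingekorte woordenlijst; een echt systeem zou een
# volledige medicatiedatabase nodig hebben.
BEKENDE_MEDICATIE = {"metoprolol", "metformine", "carbamazepine", "carvedilol"}


def woorden(tekst: str) -> set[str]:
    """Alle woorden uit de tekst, in kleine letters."""
    return set(re.findall(r"\w+", tekst.lower()))


def niet_onderbouwde_medicatie(transcriptie: str, notitie: str) -> set[str]:
    """Medicatienamen die in de notitie staan maar niet in de transcriptie."""
    in_notitie = woorden(notitie) & BEKENDE_MEDICATIE
    return in_notitie - woorden(transcriptie)


transcriptie = "Ik schrijf u metoprolol voor, eenmaal daags."
notitie = "P: start metoprolol en metformine."
print(niet_onderbouwde_medicatie(transcriptie, notitie))
```

Een lege uitkomst betekent niet dat de notitie klopt. Een niet-lege uitkomst is alleen een aanwijzing dat de arts die regel extra zorgvuldig moet lezen.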

Spreekuur.ai monitort hallucinatiefrequentie als kernkwaliteitsmetric tijdens de ontwikkelfase. Resultaten worden gepubliceerd op spreekuur.ai/onderzoek.

5. ICPC-2 en automatische codering: mogelijkheden en grenzen

Elk contactmoment in de huisartsenpraktijk vraagt om een ICPC-code. Die codes zijn verplicht voor de probleemlijst, voor declaratie, en voor de praktijkstatistiek die steeds vaker de basis vormt voor contractering, kwaliteitsvisitatie en wetenschappelijk onderzoek. De vraag of een AI-scribe betrouwbaar de juiste ICPC-code kan suggereren is daarmee ook een praktische, financiële en juridische vraag.

Wat ICPC-2 is

ICPC-2 is de International Classification of Primary Care, tweede editie. Het is een classificatiesysteem dat specifiek is ontwikkeld voor de eerste lijn, als alternatief voor de ICD-10, die is ontworpen voor de tweede lijn. ICPC-2 werkt met een tweeassig systeem: een letter die het orgaansysteem of domein aanduidt (A = algemeen, K = cardiovasculair, R = respiratoir, P = psychisch, Z = sociaal) en een getal dat de specifieke klacht of diagnose beschrijft.

Het systeem heeft een episodisch karakter: elk contact wordt gekoppeld aan een bestaande of nieuwe episode van een gezondheidsprobleem. Dat past bij de longitudinale zorg die een huisarts levert: de arts houdt bij of een klacht van vandaag een nieuwe episode is of een vervolg van iets wat vorig jaar ook speelde.
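De opbouw van een ICPC-2-code (één letter plus twee cijfers) laat zich eenvoudig in code controleren. De schets hieronder gaat uit van de internationale ICPC-2-indeling, waarin de nummers 01 tot en met 29 klachten en symptomen zijn, 30 tot en met 69 verrichtingen en 70 tot en met 99 diagnoses. De functienaam `ontleed_icpc` en de korte hoofdstuklabels zijn aannames voor dit voorbeeld. Nederlandse subcodes achter een punt vallen buiten deze schets.

```python
import re

# Informele, ingekorte labels voor de zeventien ICPC-2-hoofdstukken.
HOOFDSTUKKEN = {
    "A": "algemeen", "B": "bloed", "D": "spijsvertering", "F": "oog",
    "H": "oor", "K": "cardiovasculair", "L": "bewegingsapparaat",
    "N": "zenuwstelsel", "P": "psychisch", "R": "respiratoir", "S": "huid",
    "T": "endocrien en metabool", "U": "urinewegen", "W": "zwangerschap",
    "X": "geslachtsorganen vrouw", "Y": "geslachtsorganen man", "Z": "sociaal",
}
PATROON = re.compile(r"([ABDFHKLNPRSTUWXYZ])(\d{2})")


def ontleed_icpc(code: str) -> tuple[str, int, str]:
    """Splits een code in hoofdstuk, nummer en soort rubriek."""
    match = PATROON.fullmatch(code.strip().upper())
    if match is None or match.group(2) == "00":
        raise ValueError(f"Geen geldige ICPC-2-code: {code!r}")
    letter, nummer = match.group(1), int(match.group(2))
    if nummer <= 29:
        soort = "klacht of symptoom"
    elif nummer <= 69:
        soort = "verrichting"
    else:
        soort = "diagnose"
    return HOOFDSTUKKEN[letter], nummer, soort


print(ontleed_icpc("R21"))
print(ontleed_icpc("K86"))
```

Een vormcontrole als deze zegt alleen dat een code syntactisch kan bestaan. Of het de klinisch juiste code is, blijft een oordeel van de arts.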

Waar automatische ICPC-codering werkt

Voor frequente, eenduidige klachten is automatische codering technisch haalbaar met hoge betrouwbaarheid. Keelpijn (R21), urineweginfectie (U71), lagerugpijn (L03), bovensteluchtweginfectie (R74), bloeddrukproblematiek (K86 of K87): de transcriptie bevat bij deze presentaties voldoende signaalwoorden om een model de juiste code met hoge zekerheid te laten suggereren.

Bij multimorbide patiënten met overlappende klachtpresentaties daalt de betrouwbaarheid. Een patiënt die binnenkomt met vermoeidheid (A04), mede in het kader van diabetes (T90) en depressie (P76), vraagt om klinische prioritering die een taalmodel niet autonoom kan maken.

De suggestiemodus als oplossing

De pragmatische oplossing is de suggestiemodus: het systeem genereert een ICPC-voorstel maar committeert dat nooit automatisch. De arts ziet de suggestie, kan hem accepteren of aanpassen. Voor veelvoorkomende episoden is dat een significante versnelling. Voor complexe presentaties is het een startpunt dat handmatige correctie vraagt maar toch efficiënter is dan codering helemaal zonder aanwijzing.

Dezelfde logica geldt voor suggesties van NHG-Standaarden: het systeem kan op basis van de gecodeerde klacht de relevante richtlijn presenteren als herinnering. Niet als instructie, maar als contextuele informatie.
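De suggestiemodus voor ICPC-codes kan als volgt worden geschetst: het systeem vult alleen een voorgestelde code in, en de definitieve code ontstaat pas wanneer een arts die vaststelt. Het voorbeeld gebruikt een speelgoedversie met signaalwoorden in plaats van een taalmodel. De namen `SIGNAALWOORDEN`, `IcpcSuggestie`, `stel_voor` en `stel_vast` zijn aannames en beschrijven niet de werkelijke implementatie.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetische signaalwoorden; codes overgenomen uit de voorbeelden in de tekst.
SIGNAALWOORDEN = {"keelpijn": "R21", "lage rugpijn": "L03", "vermoeid": "A04"}


@dataclass
class IcpcSuggestie:
    voorgestelde_code: Optional[str]
    definitieve_code: Optional[str] = None
    vastgesteld_door: Optional[str] = None


def stel_voor(transcriptie: str) -> IcpcSuggestie:
    """Geeft hooguit een voorstel terug; de definitieve code blijft leeg."""
    tekst = transcriptie.lower()
    for signaalwoord, code in SIGNAALWOORDEN.items():
        if signaalwoord in tekst:
            return IcpcSuggestie(voorgestelde_code=code)
    return IcpcSuggestie(voorgestelde_code=None)


def stel_vast(suggestie: IcpcSuggestie, arts_id: str,
              code: Optional[str] = None) -> IcpcSuggestie:
    """De arts accepteert het voorstel of overschrijft het met een eigen code."""
    definitief = code or suggestie.voorgestelde_code
    if definitief is None:
        raise ValueError("Er is geen voorstel; de arts moet zelf een code kiezen.")
    suggestie.definitieve_code = definitief
    suggestie.vastgesteld_door = arts_id
    return suggestie


suggestie = stel_voor("Ik heb al drie dagen keelpijn.")
print(suggestie.voorgestelde_code, suggestie.definitieve_code)
```

Door voorstel en definitieve code als aparte velden te bewaren, blijft achteraf zichtbaar waar de arts van de suggestie is afgeweken.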

ICPC-2-codering via automatische suggestie is onderdeel van de HeyDoc-integratie in spreekuur.ai. De arts behoudt te allen tijde de definitieve coderingsverantwoordelijkheid.

6. Privacy bij spraakopnames in de zorg: AVG, NEN 7510 en de praktijk

Spraakopnames van medische consulten zijn bijzondere persoonsgegevens in de zin van de AVG. Ze bevatten gezondheidsdata, zijn inherent herleidbaar tot een persoon en documenteren het meest vertrouwelijke gesprek dat mensen in hun leven voeren. De privacyvereisten voor een systeem dat die opnames verwerkt zijn daardoor hoger dan voor vrijwel elk ander type bedrijfssoftware.

De AVG-basis

De Algemene Verordening Gegevensbescherming (AVG) kwalificeert gezondheidsgegevens als bijzondere categorie persoonsgegevens, waarvoor extra strenge vereisten gelden voor grondslag, beveiliging en transparantie. Verwerking is alleen toegestaan als er een expliciete wettelijke grondslag is. Voor een AI-scribe die wordt ingezet door een arts in het kader van een behandelingsovereenkomst is de grondslag in principe aanwezig, maar de verwerkersrelatie met de AI-leverancier moet correct zijn ingericht via een verwerkersovereenkomst die voldoet aan AVG artikel 28.

Bijzonder relevant is het dataminimalisatieprincipe: niet meer data bewaren dan nodig voor het doel. Voor een AI-scribe betekent dat concreet: de ruwe audio-opname hoeft niet langer bewaard te worden dan nodig is om de transcriptie en notitie te genereren. Na verwerking zou de audio moeten worden verwijderd, tenzij er een expliciete grondslag is voor langere bewaring.
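Het principe dat audio niet langer bestaat dan de verwerking duurt, kan in code worden afgedwongen in plaats van alleen in beleid te worden vastgelegd. De schets hieronder verwijdert het audiobestand altijd, ook als de transcriptie mislukt. De functienaam `verwerk_en_verwijder` en de meegegeven transcriptiefunctie zijn aannames. Het voorbeeld dekt geen back-ups, caches of kopieën bij subverwerkers.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def verwerk_en_verwijder(audiopad: Path, transcribeer: Callable[[Path], str]) -> str:
    """Transcribeert de opname en verwijdert het bestand daarna in alle gevallen."""
    try:
        return transcribeer(audiopad)
    finally:
        # Wordt ook uitgevoerd wanneer transcribeer() een fout geeft.
        audiopad.unlink(missing_ok=True)


# Demonstratie met een tijdelijk bestand en een nagebootste transcriptiefunctie.
descriptor, naam = tempfile.mkstemp(suffix=".wav")
os.close(descriptor)
opname = Path(naam)

tekst = verwerk_en_verwijder(opname, lambda pad: "nagebootste transcriptie")
print(tekst, opname.exists())
```

Verwijderen bij een mislukte transcriptie is een ontwerpkeuze: het consult moet dan handmatig worden gedocumenteerd, maar er blijft geen opname achter zonder doel.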

NEN 7510:2024

NEN 7510 is de Nederlandse norm voor informatiebeveiliging in de zorg, gebaseerd op ISO/IEC 27001 maar uitgebreid met zorgspecifieke vereisten. De 2024-editie legt extra nadruk op supply chain security (leveranciersbeheer), clouddiensten en de beveiliging van systemen die bijzondere persoonsgegevens verwerken.

Spreekuur.ai wordt ontwikkeld op de infrastructuur van HeyDoc B.V., dat gecertificeerd is conform NEN 7510:2024 door Kiwa (certificaatnummer K-0228563-1). Dat certificaat dekt de informatiebeveiliging van de gehele infrastructuur, inclusief de AI-componenten die als onderdeel van HeyDoc worden ontwikkeld. Het is geen marketingkeurmerk maar een door een derde partij geauditeerde en gehandhaafde norm.

Verwerking in Nederland

Een principiële ontwerpkeuze in spreekuur.ai is dat audio en gezondheidsdata niet buiten Nederland worden verwerkt. De infrastructuur draait op Google Cloud Platform, europe-west4 (Eemshaven, Nederland). Dat sluit niet automatisch uit dat subverwerkers buiten Nederland of de EU worden ingeschakeld, maar elke subverwerker wordt gedocumenteerd in het verwerkingsregister en beoordeeld op AVG-compliance.

De markt voor AI-scribes bevat producten die audio verwerken op servers in de Verenigde Staten. Voor Europese gezondheidszorgdata is dat problematisch: het adequaatheidsbesluit voor de VS geldt alleen voor organisaties die onder het EU-US Data Privacy Framework zijn gecertificeerd, en de rechtsbescherming onder dat kader is beperkt en politiek kwetsbaar. Huisartsen die dergelijke systemen overwegen, doen er verstandig aan de verwerkerslocatie expliciet te bevragen.

Toestemming van de patiënt

Een opname van een consult maakt dat gesprek herleidbaar en bewaarbaar. De patiënt heeft het recht te weten dat er een opname wordt gemaakt, waarvoor die wordt gebruikt en hoe lang die wordt bewaard. Impliciete toestemming is voor bijzondere persoonsgegevens niet voldoende. Een goede implementatie begint met expliciete, geïnformeerde toestemming bij het eerste gebruik, en biedt de patiënt de mogelijkheid om opname te weigeren zonder dat dit invloed heeft op de zorgverlening.

Het volledige privacybeleid van spreekuur.ai, inclusief subverwerkers en gegevensstroomoverzicht, is gepubliceerd op spreekuur.ai/beveiliging.

7. Klinische verantwoordelijkheid: waarom AI altijd een voorstel blijft

AI-systemen in de klinische praktijk roepen terechte vragen op over aansprakelijkheid. Als een AI-scribe een notitie genereert die een fout bevat, en die fout leidt tot een klinische beslissing met schade voor de patiënt: wie is dan aansprakelijk? Het antwoord is juridisch en ethisch eenduidig: de arts.

De WGBO en de behandelingsovereenkomst

De Wet op de geneeskundige behandelingsovereenkomst (WGBO) legt de professionele verantwoordelijkheid voor kwaliteit van zorg bij de hulpverlener. Die verantwoordelijkheid is niet overdraagbaar aan een softwaresysteem. Als de arts een door AI gegenereerde notitie accordeert zonder die te lezen, en die notitie bevat een fout, dan heeft de arts een beroepsmatige fout gemaakt.

Dat is geen theoretische overweging. Tuchtcolleges in Nederland en vergelijkbare instanties in het Verenigd Koninkrijk en de VS hebben al uitspraken gedaan in zaken waarbij geautomatiseerde systemen een rol speelden. De consistente lijn: technologie is een hulpmiddel, de arts is verantwoordelijk.

De BIG-wet

De Wet BIG (Wet op de beroepen in de individuele gezondheidszorg) regelt de bevoegdheden en verantwoordelijkheden van zorgprofessionals. Een arts die zijn BIG-geregistreerde bevoegdheden uitoefent, doet dat onder zijn eigen professionele oordeel. De inzet van AI-tools ontheft hem niet van die verplichting.

Praktisch betekent dit dat elk AI-gegenereerd voorstel moet worden beoordeeld met de professionele aandacht die de arts ook zou besteden aan een collega-samenvatting of een co-assistent-notitie: kritisch, niet vertrouwend op vorm maar op inhoud.

Het ontwerp als juridische bescherming

Een goed ontworpen AI-scribe beschermt de arts door de verantwoordelijkheidsstructuur technisch af te dwingen. Als het systeem de arts niet kan laten accorderen zonder de notitie te lezen en expliciet te bevestigen, en als die bevestiging wordt gelogd met tijdstip en identiteit, ontstaat een audit trail die aantoont dat de arts zijn verantwoordelijkheid heeft genomen.

Als het systeem de arts laat klikken zonder lezen, of de accordering verbergt in een workflow die te snel doorlopen kan worden, creëert het niet meer bescherming maar meer risico. Het ontwerp van de gebruiksinterface is daarmee ook een juridisch document.
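Een audit trail zoals hierboven beschreven kan per accordering bestaan uit drie gegevens: wie, wanneer en welke tekst. De schets hieronder legt de tekst vast als SHA-256-hash, zodat achteraf aantoonbaar is welke versie is geaccordeerd. De namen `Accordering` en `accordeer` zijn aannames voor dit voorbeeld en geen beschrijving van de logging in spreekuur.ai.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # onveranderlijk: een logregel wordt niet achteraf aangepast
class Accordering:
    arts_id: str
    tijdstip: datetime
    notitie_sha256: str


def accordeer(notitietekst: str, arts_id: str, expliciet_bevestigd: bool) -> Accordering:
    """Maakt alleen een logregel aan als de arts expliciet heeft bevestigd."""
    if not expliciet_bevestigd:
        raise PermissionError("De notitie is niet expliciet bevestigd door de arts.")
    return Accordering(
        arts_id=arts_id,
        tijdstip=datetime.now(timezone.utc),
        notitie_sha256=hashlib.sha256(notitietekst.encode("utf-8")).hexdigest(),
    )


logregel = accordeer("S: keelpijn sinds drie dagen.", "arts-001", expliciet_bevestigd=True)
print(logregel.arts_id, logregel.tijdstip.isoformat(), logregel.notitie_sha256[:12])
```

De hash bewijst welke tekst is geaccordeerd, niet dat de arts die tekst ook werkelijk heeft gelezen. Dat laatste blijft een kwestie van interfaceontwerp en professionele discipline.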

De accorderingsstap in spreekuur.ai is technisch afgedwongen, gelogd en niet omzeilbaar. Elke notitie die het dossier ingaat, is expliciet geaccordeerd door de behandelend arts.

8. Wat zegt het onderzoek? Een nuchter overzicht van de evidence

De literatuur over AI-scribes in de klinische praktijk groeit snel. In 2025 en 2026 zijn een aantal gerandomiseerde trials en grote observationele studies gepubliceerd die voor het eerst betrouwbare data bieden. Hieronder een overzicht van de meest relevante bevindingen, zonder de hyperbool die perscommuniqués van AI-bedrijven kenmerkt.

Tijdwinst op documentatie

De meest consistente bevinding is tijdwinst op documentatie. De range loopt van onder een minuut tot twee minuten per consult, afhankelijk van de setting, het gebruikte systeem en de mate van integratie met het HIS. De UCLA-trial in NEJM AI (2025) vond gemiddeld 41 seconden per notitie; de grote JAMA-studie van april 2026 vond bij 1.800 clinici zestien minuten per acht uur patiëntenzorg. Huisartsen doen het gemiddeld beter dan specialisten in tijdwinst, vermoedelijk omdat hun consulten meer gestandaardiseerde structuur hebben.

Tevredenheid en burn-out

Meerdere studies meten ook tevredenheid en burn-outindicatoren. De resultaten zijn consistent maar bescheiden: artsen die AI-scribes gebruiken rapporteren minder administratieve belasting en hogere werktevredenheid. De effectgrootte is klein. Dat is geen reden om de bevinding te negeren; burn-out in de huisartsenzorg is een serieus probleem waarvoor elke consistente bescheiden verbetering telt.

Kwaliteit van de notities

Notitiekwaliteit is moeilijker te meten dan tijdwinst. Studies gebruiken peer review, beoordelingsrubrieken of vergelijking met de standaardnotitie van de arts. De resultaten zijn gemengd: bij gestandaardiseerde presentaties leveren AI-scribes complete, gestructureerde notities op. Bij complexe gevallen zijn notities soms incompleet of versimpelen ze de klinische redenering te sterk.

Hallucinatiefrequentie varieert sterk tussen systemen en wordt in de meeste studies onvoldoende gerapporteerd. Dat is een lacune in de literatuur. Het is niet duidelijk hoe vaak gehallucineerde informatie de reviewfase overleeft en daadwerkelijk in het dossier terechtkomt.

Patiënttevredenheid en vertrouwen

Patiënten zijn over het algemeen bereid akkoord te gaan met AI-transcriptie van hun consult, mits zij er expliciet over worden geïnformeerd. Studies tonen dat toestemmingspercentages boven de 80 procent liggen wanneer de werkwijze helder wordt uitgelegd. Wantrouwen concentreert zich op datagebruik voor commerciële doeleinden, niet op het gebruik voor directe patiëntenzorg.

Interessant is dat sommige patiënten de aanwezigheid van een scribe als positief ervaren: de arts kijkt hen aan in plaats van naar een scherm. Dat is een nevenbevinding die de primaire hypothese (tijdwinst) overstijgt maar klinisch relevant is.

Wat ontbreekt in de literatuur

De literatuur heeft blinde vlekken. Langetermijndata ontbreken vrijwel volledig: wat zijn de effecten na een jaar routinegebruik? Zijn er kwaliteitsproblemen die pas op termijn zichtbaar worden? Studies in de Nederlandse eerste lijn zijn schaars; de beschikbare data komen grotendeels uit Noord-Amerika en Oost-Azië, met andere zorgstructuren en taalkundige contexten.

Kosteneffectiviteit is nauwelijks onderzocht. De licentiekosten van commerciële AI-scribes zijn aanzienlijk; de netto winst in relatie tot die kosten is voor de gemiddelde Nederlandse huisartsenpraktijk nog niet goed onderbouwd.

Spreekuur.ai publiceert zijn eigen onderzoeksbevindingen op spreekuur.ai/onderzoek, inclusief vergelijkende data over ASR-kwaliteit op Nederlandse medische data.

Part II

Knowledge Articles for General Practitioners (English)

1. What Does Documentation Actually Cost? Administrative Burden in General Practice

Every GP knows the feeling: the patient has left, the next one is already in the waiting room, and the notes are not yet finished. Or they are finished, but it took ten minutes that could have been spent differently. Documentation is not a side issue in general practice. It is a structural, daily problem with measurable effects on workload, wellbeing, and the quality of care.

What the research shows

The most cited figures come from North America, but the pattern is recognisable in every Western healthcare system. A study of 1,774 physicians affiliated with Massachusetts General Hospital measured an average of 24 percent of working time spent on administrative tasks, with GPs at the higher end of that range. The Ontario College of Family Physicians reported 10 to 19 hours per week of administrative tasks for Canadian GPs, a significant portion of which falls outside direct patient contact hours: forms, sick notes, reporting, correspondence.

Dutch data are less systematically available, but the pattern, namely documentation spilling over beyond consultation time, recurs in Dutch GP research. NIVEL surveys and periodic LHV workforce studies consistently identify administration as one of the most commonly cited sources of work pressure.

Why documentation takes so long

It is not just the volume of text. It is the combination of cognitive load and system limitations. While a GP types, they simultaneously switch from the consultative to the administrative frame: from listening and reasoning to recording and coding. That switch has a cognitive cost that accumulates significantly over a working day.

Most GP information systems are designed for completeness rather than speed. Mandatory fields, ICPC coding, medication safety checks, problem list management: each a legitimate requirement, but together they create an interface that burdens every contact moment with administrative overhead.

What AI can and cannot change here

AI scribes shift the bottleneck. Instead of typing while the patient waits, or updating notes after the surgery session, the software generates a draft note for the physician to review and approve. The cognitive switch from consulting to recording does not disappear, but its time cost decreases.

How large is that decrease? The most recent randomised data are measured: a UCLA trial in NEJM AI (2025) found an average of 41 seconds saved per note. A Singaporean prospective study (JMIR, 2026) reported 0.8 minutes per consultation. The largest study to date (1,800 clinicians across five academic centres, published in JAMA in April 2026) found sixteen minutes of saved documentation time per eight hours of patient care. That is not a trivial finding. It is also not a revolution. It is a consistent, modest, measured effect.
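To put the three figures on a comparable footing, the arithmetic can be sketched as below. The number of notes per day is a hypothetical workload chosen for illustration, not a figure from any of the cited studies, and the studies measured different things in different settings, so the results are only a rough orientation.

```python
def minutes_saved_per_day(seconds_saved_per_note: float, notes_per_day: int) -> float:
    """Convert a per-note saving in seconds into minutes per working day."""
    return seconds_saved_per_note * notes_per_day / 60


# Hypothetical workload; not taken from the cited studies.
NOTES_PER_DAY = 30

ucla_minutes = minutes_saved_per_day(41, NOTES_PER_DAY)              # 41 s per note
singapore_minutes = minutes_saved_per_day(0.8 * 60, NOTES_PER_DAY)   # 0.8 min per consultation
jama_minutes_per_hour = 16 / 8                                       # 16 min per 8 h of care

print(f"41 s per note:      {ucla_minutes:.1f} min per day")
print(f"0.8 min per consult: {singapore_minutes:.1f} min per day")
print(f"16 min per 8 hours:  {jama_minutes_per_hour:.1f} min per hour of patient care")
```

Under that assumed workload, 41 seconds per note adds up to 20.5 minutes a day, which is in the same range as the sixteen minutes per eight hours reported in the largest study.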

What AI cannot address: increased regulatory burden, the growing number of administrative obligations that fall outside direct consultation notes, the complexity of multimorbid populations, or the staffing shortages that drive up volume per physician. The scribe is a tool for one specific bottleneck. Nothing more, nothing less.

Spreekuur.ai investigates how that time saving can be realised in the Dutch GP setting without compromising quality, privacy, or clinical accountability.

2. How Does an AI Scribe Actually Work?

The term AI scribe covers a broad category of products that differ substantially in architecture, quality, and risk profile. Understanding the technical layers helps in assessing what any specific product actually does.

Layer 1: automatic speech recognition

Everything begins with speech recognition, or Automatic Speech Recognition (ASR). A microphone signal is converted into text. Medical speech recognition is a specialised domain. Generic ASR systems are trained on large volumes of general language use. They perform poorly on medical terminology, Latin terms, medication names, accents, and the specific pragmatics of a consultation: silences, interruptions, switching between medical language and everyday speech.

Models specifically trained or fine-tuned on medical data perform substantially better. That is one reason why generic speech-to-text solutions, however impressive, are not directly suitable as a foundation for a clinical system. The error margin on critical terms (medication names, dosages, diagnoses) must be extremely low.
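A common way to quantify transcription quality is the word error rate (WER): the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The sketch below also checks critical terms separately, because a low overall WER can hide a clinically important miss. The function names and the critical-term set are assumptions for illustration, not part of any specific product.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("Reference transcript must not be empty.")
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # word missing from the hypothesis
                current[j - 1] + 1,      # extra word in the hypothesis
                previous[j - 1] + cost,  # substituted word
            ))
        previous = current
    return previous[-1] / len(ref)


def critical_term_misses(reference: str, hypothesis: str,
                         critical_terms: set[str]) -> set[str]:
    """Critical terms spoken in the reference but absent from the ASR output."""
    spoken = set(reference.lower().split()) & critical_terms
    return spoken - set(hypothesis.lower().split())


reference = "start metoprolol 50 mg"
hypothesis = "start metformine 50 mg"
print(word_error_rate(reference, hypothesis))
print(critical_term_misses(reference, hypothesis, {"metoprolol"}))
```

In this example one word in four is wrong, yet the single error swaps one drug for another, which is why an aggregate score alone is not enough for clinical use.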

Layer 2: the language model

The ASR transcript is raw input: a stream of words without structure. A large language model (LLM) processes that transcript and generates structured output. In Dutch first-line care, the target is a SOEP note (Subjectief, Objectief, Evaluatie, Plan). The same four-part structure is often called a SOAP note in English-language writing (Subjective, Objective, Assessment, Plan). Spreekuur.ai follows Dutch terminology in documentation and the HeyDoc EHR.

The LLM does more than format. It distils. From twenty minutes of conversation, it filters clinically relevant information: the chief complaint, the history, the examination findings, the diagnostic reasoning the GP articulated implicitly or explicitly, the management plan. What the patient mentioned about their holiday disappears. What they said about their symptoms remains.

That distillation is precisely where the risk lies. A language model is not a clinician. It does not understand content; it generates output based on statistical patterns. For common presentations, that can work remarkably well. For complex, atypical, or linguistically ambiguous consultations, it may miss what a physician would recognise as crucial.

Layer 3: integration with the GP system

An AI scribe that produces a note you then manually copy into your records only solves the problem halfway. The real time saving lies in direct integration: the generated SOEP note arrives as a draft in the correct patient record, in the correct format, ready for review and approval.

That requires a connection to the GP information system. Spreekuur.ai integrates natively with HeyDoc, a FHIR R4-native GP system. The note is stored as structured FHIR resources, not as plain text. That makes the data reusable and transferable, including outside the original practice.
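What a structured draft could look like is sketched below: the four SOEP sections wrapped in a FHIR R4 Composition with status "preliminary". The choice of the Composition resource, the LOINC code for a progress note, the title, and the identifiers are assumptions for illustration; the actual mapping used by HeyDoc is not described here.

```python
import json
from datetime import datetime, timezone
from html import escape


def soep_to_composition(patient_id: str, practitioner_id: str, soep: dict) -> dict:
    """Build a draft FHIR R4 Composition from a dict of SOEP section texts."""
    sections = [
        {
            "title": title,
            "text": {
                "status": "generated",
                "div": f'<div xmlns="http://www.w3.org/1999/xhtml">{escape(text)}</div>',
            },
        }
        for title, text in soep.items()
    ]
    return {
        "resourceType": "Composition",
        "status": "preliminary",  # a draft, not yet approved by the GP
        "type": {
            "coding": [
                # ASSUMPTION: generic LOINC "Progress note" code as document type.
                {"system": "http://loinc.org", "code": "11506-3", "display": "Progress note"}
            ]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "date": datetime.now(timezone.utc).isoformat(),
        "author": [{"reference": f"Practitioner/{practitioner_id}"}],
        "title": "SOEP consult note (draft)",
        "section": sections,
    }


draft = soep_to_composition(
    "example-patient",       # hypothetical identifiers
    "example-practitioner",
    {
        "Subjectief": "Sore throat for three days, no fever.",
        "Objectief": "Pharynx red, no exudate.",
        "Evaluatie": "Acute upper respiratory tract infection.",
        "Plan": "Watchful waiting, return if symptoms worsen.",
    },
)
print(json.dumps(draft, indent=2))
```

Because each section is an addressable element rather than a run of free text, another system that understands FHIR can read the evaluation or the plan without parsing prose.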

The decision layer: physician as gatekeeper

Across all those layers sits one architectural principle: the system always generates a proposal. Never a definitive result. The GP reads the proposal, corrects where necessary, and approves it. Only after that explicit approval does the note enter the patient record. That is technically enforced, not merely described in terms of service.

More on the technical architecture of spreekuur.ai at spreekuur.ai/hoe-het-werkt and spreekuur.ai/ai-scribe.

3. Speech Recognition in the Medical Context: What Makes It Hard?

Medical speech recognition is one of the oldest AI applications in healthcare. Dragon Medical, the longest-running commercial system, has existed for more than two decades. The fact that the problem is still not solved says something about its complexity.

The vocabulary

Medical vocabulary is large, unusual, and error-prone. Medication names resemble each other: metoprolol and metformin, carbamazepine and carvedilol, trimethoprim and co-amoxiclav. A single-character error in the transcription of a medication name can be clinically significant. Generic ASR models have seen too little of this vocabulary in training; their error rate on medication names is considerably higher than on common words.
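One way to find name pairs that deserve extra scrutiny in a test set is to score spelling similarity across a medication vocabulary. The sketch below uses Python's standard difflib for that. It measures character overlap, not acoustic similarity, and the threshold of 0.5 is an arbitrary assumption; a real evaluation would need phonetic comparison as well.

```python
from difflib import SequenceMatcher
from itertools import combinations


def confusable_pairs(names, threshold: float = 0.5):
    """Return (name_a, name_b, score) for name pairs whose spelling similarity
    is at or above the threshold, most similar first."""
    unique = sorted({name.lower() for name in names})
    pairs = []
    for a, b in combinations(unique, 2):
        score = SequenceMatcher(None, a, b).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


vocabulary = ["metoprolol", "metformin", "carbamazepine", "carvedilol"]
for name_a, name_b, score in confusable_pairs(vocabulary):
    print(f"{name_a} / {name_b}: {score}")
```

Pairs that score high are candidates for targeted test sentences, so that the error rate on exactly those names can be reported separately.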

Acoustic environment

A consulting room is not a recording studio. There are background sounds: a patient's laboured breathing, a door closing, a phone in the corridor. There is overlapping speech when a patient answers before the question is finished. There are pauses that are not semantically empty but signal thinking or emotional processing.

Good medical ASR systems are trained on noisy data and have acoustic models that handle this variability. Systems trained on studio data degrade rapidly once the recording environment is not ideal.

Accents and dialects

The Netherlands has a recognisable accent spectrum that includes regional varieties and the broad range of accents spoken by patients with diverse linguistic backgrounds. For a system operating in a Dutch GP practice, that means substantial variation in pronunciation patterns, both from the GP and from patients.

ASR systems trained on homogeneous language use systematically underperform for speakers with non-standard accents. That is a bias problem with clinical implications: if the system understands the patient less accurately than the GP, or vice versa, blind spots emerge in the transcript.

Disfluencies and conversational structure

Spoken language is not written text. People say 'um', repeat phrases, correct themselves mid-sentence, start over. A patient describing their complaint rarely does so linearly. A GP reasoning through a diagnosis sometimes thinks out loud in terms meant as hypothesis rather than conclusion. A good transcription system filters, structures and interprets appropriately, and knows the difference between tentative reasoning and definitive statement.

Spreekuur.ai tests ASR systems specifically on Dutch medical data. Quality comparisons are published at spreekuur.ai/onderzoek.

4. From Transcript to SOEP Note: How Structure Emerges from Conversation

A good transcript is necessary but not sufficient for a usable SOEP note. The step from raw transcript to structured clinical record is the core of what an AI scribe does beyond simple speech-to-text software.

The SOEP structure (Dutch) / international SOAP

In the Netherlands, contact documentation uses the acronym SOEP (Subjectief, Objectief, Evaluatie, Plan). The same model is often referred to as SOAP in English. It is the standard in Dutch first-line care, codified in NHG guidelines and embedded in virtually all GP information systems. The four sections below use Dutch labels; English parallel terms are in parentheses.

S – Subjectief (subjective / complaint): The complaint as experienced and described by the patient. Duration, character, location, associated symptoms, relevant context. The history, recorded from the patient's perspective.

O – Objectief (objective / findings): Findings from physical examination and supplementary diagnostics. What the GP has observed, measured, or tested: blood pressure, percussion, auscultatory findings, laboratory results.

E – Evaluatie (assessment / clinical assessment): The GP's clinical judgment. A differential diagnosis, working diagnosis, ICPC code. The link between what the patient reports and what the GP concludes.

P – Plan: Management. What will happen next: prescription, referral, follow-up appointment, watchful waiting, further investigation, referral to an allied health professional.

What the LLM must do

The language model receives the full transcript and must simultaneously segment (which statements belong to S, O, E, P), distil (what is clinically relevant, what is conversational noise), paraphrase (from spoken language to written record prose), and structure (format the result in the expected layout).
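The target of those four tasks can be pictured as a typed draft structure. In the minimal sketch below, every statement carries its SOEP section, its paraphrased text, and the indices of the transcript segments it is based on. All class and field names are hypothetical; the source does not describe the internal data model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Section(Enum):
    S = "Subjectief"
    O = "Objectief"
    E = "Evaluatie"
    P = "Plan"


@dataclass
class Statement:
    section: Section             # result of segmenting
    text: str                    # result of distilling and paraphrasing
    source_segments: List[int]   # transcript segments the statement rests on


@dataclass
class SoepDraft:
    statements: List[Statement] = field(default_factory=list)

    def render(self) -> str:
        """Structure: one line per SOEP section, in fixed order."""
        lines = []
        for section in Section:
            texts = [s.text for s in self.statements if s.section is section]
            body = " ".join(texts) if texts else "-"
            lines.append(f"{section.name}: {body}")
        return "\n".join(lines)


soep_draft = SoepDraft([
    Statement(Section.S, "Sore throat for three days.", [0, 1]),
    Statement(Section.P, "Watchful waiting.", [7]),
])
print(soep_draft.render())
```

Keeping the link to the source segments is a design choice that pays off at review time: the GP can check each statement against the part of the conversation it came from.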

Hallucination: the most underestimated risk

Language models hallucinate: they occasionally generate plausible-sounding information that was not present in the input. In a creative context, that is a curiosity. In a medical record, it is dangerous. A hallucinating model may note a finding the GP did not mention, include a medication that was not discussed, or suggest management that deviates from what was said. Without careful physician review, that is a source of medical error. The review step is not optional and cannot be silently clicked through.
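A very narrow form of automated support for that review is a grounding check: flag watched terms, such as medication names, that appear in the note but nowhere in the transcript. The sketch below does this by exact word matching against a hypothetical watch list. It catches only this one class of error and is an illustration, not a description of how spreekuur.ai measures hallucination.

```python
import re


def words(text: str) -> set:
    """Lower-cased word tokens of a text."""
    return set(re.findall(r"\w+", text.lower()))


def unsupported_terms(transcript: str, note: str, watch_terms) -> set:
    """Watched terms that occur in the note but not in the transcript."""
    watched = {term.lower() for term in watch_terms}
    in_note = words(note) & watched
    return in_note - words(transcript)


transcript = "I will prescribe paracetamol for the pain, up to four times a day."
note = "P: paracetamol 500 mg up to four times daily, ibuprofen 400 mg if needed."
flagged = unsupported_terms(transcript, note, {"paracetamol", "ibuprofen"})
print(flagged)
```

A flagged term is a prompt for the GP to look more closely, nothing more. Paraphrased or subtly altered content passes such a check unnoticed, which is why physician review remains the actual safeguard.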

Spreekuur.ai monitors hallucination frequency as a core quality metric during development. Results will be published at spreekuur.ai/onderzoek.

5. ICPC-2 and Automatic Coding: Possibilities and Limits

Every contact moment in Dutch general practice requires an ICPC code. These codes are mandatory for the problem list, for billing, and for the practice statistics that increasingly underpin contracting, quality accreditation, and research. Whether an AI scribe can reliably suggest the correct ICPC code is therefore a practical, financial, and legal question.

What ICPC-2 is

ICPC-2 is the International Classification of Primary Care, second edition. It uses a two-axis structure: a letter denoting the organ system or domain (A = general, K = cardiovascular, R = respiratory, P = psychological, Z = social) and a number identifying the specific complaint or diagnosis. The system has an episodic character suited to the longitudinal care that GPs provide.
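The letter-plus-number structure is simple enough to validate mechanically. The sketch below parses a base ICPC-2 code into its chapter and its component range (symptom or complaint, process, diagnosis). It handles only the letter-plus-two-digits form; any national extensions or sub-codes are outside its scope, and the function and type names are hypothetical.

```python
import re
from typing import NamedTuple

# ICPC-2 chapters by letter.
CHAPTERS = {
    "A": "General and unspecified", "B": "Blood and immune mechanism",
    "D": "Digestive", "F": "Eye", "H": "Ear", "K": "Cardiovascular",
    "L": "Musculoskeletal", "N": "Neurological", "P": "Psychological",
    "R": "Respiratory", "S": "Skin", "T": "Endocrine, metabolic and nutritional",
    "U": "Urological", "W": "Pregnancy, childbearing, family planning",
    "X": "Female genital", "Y": "Male genital", "Z": "Social problems",
}


class IcpcCode(NamedTuple):
    chapter: str
    number: int
    chapter_name: str
    component: str


def parse_icpc(code: str) -> IcpcCode:
    """Split a base ICPC-2 code such as 'R74' into chapter and component."""
    match = re.fullmatch(r"([A-Z])(\d{2})", code.strip().upper())
    if not match or match.group(1) not in CHAPTERS:
        raise ValueError(f"not a valid base ICPC-2 code: {code!r}")
    letter, number = match.group(1), int(match.group(2))
    if 1 <= number <= 29:
        component = "symptom or complaint"
    elif 30 <= number <= 69:
        component = "process"
    elif 70 <= number <= 99:
        component = "diagnosis"
    else:
        raise ValueError(f"number out of range in code: {code!r}")
    return IcpcCode(letter, number, CHAPTERS[letter], component)


print(parse_icpc("R74"))
```

The split between symptom codes and diagnosis codes matters for episodic registration: an episode can start with a complaint code and later be updated to a diagnosis code once the picture is clear.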

Where automatic ICPC coding works

For frequent, unambiguous presentations, automatic coding is technically feasible with high reliability. Throat pain (R21), urinary tract infection (U71), lower back pain (L03), upper respiratory tract infection (R74), hypertension (K86 or K87): the transcript contains sufficient signal words for a model to suggest the correct code with high confidence.

For multimorbid patients with overlapping symptom presentations, reliability decreases. Clinical prioritisation in complex cases requires judgment that a language model cannot make autonomously. The suggestion mode is the pragmatic solution: the system proposes but does not commit. The GP accepts or adjusts.
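Suggestion mode can be sketched as a function that ranks candidate codes and preselects one only when the model is confident, while the recorded code always comes from the GP. The confidence scores, the threshold of 0.8, and all names below are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CodeSuggestion:
    code: str
    confidence: float  # ASSUMPTION: a model score between 0 and 1


def present_suggestions(
    candidates: List[CodeSuggestion],
    min_confidence: float = 0.8,
    max_shown: int = 3,
) -> Tuple[List[CodeSuggestion], Optional[str]]:
    """Rank candidates and preselect the top one only above the threshold."""
    shown = sorted(candidates, key=lambda c: c.confidence, reverse=True)[:max_shown]
    preselected = shown[0].code if shown and shown[0].confidence >= min_confidence else None
    return shown, preselected


def recorded_code(gp_choice: Optional[str]) -> str:
    """Only a code chosen or confirmed by the GP is ever recorded."""
    if not gp_choice:
        raise ValueError("no ICPC code is recorded until the GP chooses one")
    return gp_choice


clear_case = [CodeSuggestion("R74", 0.93), CodeSuggestion("R21", 0.05)]
complex_case = [CodeSuggestion("K87", 0.40), CodeSuggestion("K86", 0.55)]

print(present_suggestions(clear_case))
print(present_suggestions(complex_case))
```

In the unambiguous case the top suggestion is preselected and the GP confirms with one action. In the ambiguous case nothing is preselected, so the GP has to make an active choice between the candidates.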

ICPC-2 coding via automatic suggestion is part of the HeyDoc integration in spreekuur.ai. The GP retains full coding responsibility at all times.

6. Privacy in Voice Recordings in Healthcare: GDPR, NEN 7510, and Practice

Voice recordings of medical consultations are special category personal data under the GDPR. They contain health data, are inherently identifiable, and document the most confidential conversations people have in their lives. The privacy requirements for a system processing those recordings are therefore higher than for almost any other business software.

The GDPR basis

The General Data Protection Regulation classifies health data as a special category requiring additional safeguards for legal basis, security, and transparency. For an AI scribe deployed by a physician in the context of a treatment relationship, the legal basis is in principle present, but the processor relationship with the AI vendor must be correctly structured through a data processing agreement meeting GDPR Article 28 requirements.

Particularly relevant is the data minimisation principle: retain no more data than necessary for the purpose. For an AI scribe, that means concretely: raw audio does not need to be retained longer than necessary to generate the transcript and note. After processing, audio should be deleted, unless there is an explicit basis for longer retention.
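The retention rule in the preceding paragraph can be expressed as a small decision function. This is a minimal sketch: the field names are hypothetical, and the 24-hour limit for audio that never produced a note is an assumed value, not a legal or product requirement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Recording:
    recording_id: str
    created_at: datetime
    note_generated_at: Optional[datetime] = None
    retention_basis: Optional[str] = None  # documented explicit basis for longer retention


def should_delete_audio(
    rec: Recording,
    now: datetime,
    max_unprocessed_age: timedelta = timedelta(hours=24),  # ASSUMPTION
) -> bool:
    """Decide whether the raw audio of a recording should be deleted."""
    if rec.retention_basis:
        return False  # explicit basis for longer retention exists
    if rec.note_generated_at is not None:
        return True   # purpose fulfilled: transcript and note exist
    # Never processed: do not keep audio around indefinitely.
    return now - rec.created_at > max_unprocessed_age


now = datetime(2026, 1, 10, 12, 0, tzinfo=timezone.utc)
processed = Recording("rec-1", now - timedelta(minutes=30), note_generated_at=now)
pending = Recording("rec-2", now - timedelta(minutes=30))
stale = Recording("rec-3", now - timedelta(days=3))
kept = Recording("rec-4", now - timedelta(days=3), note_generated_at=now,
                 retention_basis="explicit consent for quality research")

for rec in (processed, pending, stale, kept):
    print(rec.recording_id, should_delete_audio(rec, now))
```

The default outcome is deletion. Retention is the exception and requires a recorded reason, which mirrors the direction of the data minimisation principle.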

NEN 7510:2024

NEN 7510 is the Dutch standard for information security in healthcare, based on ISO/IEC 27001 with healthcare-specific extensions. The 2024 edition places additional emphasis on supply chain security, cloud services, and systems processing special category personal data.

Spreekuur.ai is developed on the infrastructure of HeyDoc B.V., certified under NEN 7510:2024 by Kiwa (certificate number K-0228563-1). That certificate is not a marketing badge but an audited and enforced standard. It covers the full infrastructure, including the AI components developed as part of HeyDoc.

Processing in the Netherlands

A core design choice in spreekuur.ai is that audio and health data are not processed outside the Netherlands. The infrastructure runs on Google Cloud Platform, europe-west4 (Eemshaven, the Netherlands). The market for AI scribes includes products that process audio on servers in the United States. For European health data, that is problematic: the US has no general adequacy decision, and the EU-US Data Privacy Framework covers only self-certified organisations, offers limited legal protection, and is politically fragile. GPs considering such systems are well advised to ask explicitly where audio is processed.

Patient consent

Recording a consultation makes that conversation identifiable and storable. The patient has the right to know that a recording is being made, its purpose, and how long it is retained. Implied consent is insufficient for special category personal data. A sound implementation begins with explicit, informed consent before first use, and gives the patient the option to refuse recording without affecting their care.

The complete privacy policy of spreekuur.ai, including subprocessors and data flow overview, is published at spreekuur.ai/beveiliging.

7. Clinical Accountability: Why AI Always Remains a Proposal

AI systems in clinical practice raise legitimate questions about liability. If an AI scribe generates a note containing an error, and that error leads to a clinical decision causing patient harm: who is liable? The answer is legally and ethically unambiguous: the physician.

The WGBO and the treatment relationship

The Dutch Medical Treatment Contracts Act (WGBO) places professional responsibility for quality of care with the healthcare provider. That responsibility cannot be transferred to a software system. If a GP approves an AI-generated note without reading it, and that note contains an error, the GP has made a professional error.

That is not theoretical. Disciplinary bodies in the Netherlands and comparable institutions in the UK and US have already issued rulings in cases where automated systems played a role. The consistent line: technology is a tool, the clinician is responsible.

Design as legal protection

A well-designed AI scribe protects the physician by technically enforcing the accountability structure. If the system prevents approval without the GP reading the note and explicitly confirming, and that confirmation is logged with timestamp and identity, an audit trail is created demonstrating that the GP fulfilled their responsibility.
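Technical enforcement of that structure can be sketched as a state transition that refuses to proceed without an identified physician and an explicit confirmation, and that writes an audit entry when it succeeds. The class, function, and field names are hypothetical; how a real interface establishes that a note was read is a separate design question.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


class ApprovalError(Exception):
    """Raised when a note cannot be approved."""


@dataclass
class ScribeNote:
    text: str
    status: str = "draft"
    audit_log: List[dict] = field(default_factory=list)


def approve(note: ScribeNote, physician_id: str, confirmed_read: bool) -> None:
    """Move a draft to 'approved' only with identity and explicit confirmation."""
    if note.status != "draft":
        raise ApprovalError("only a draft can be approved")
    if not physician_id:
        raise ApprovalError("approval requires an identified physician")
    if not confirmed_read:
        raise ApprovalError("approval requires explicit confirmation of review")
    note.status = "approved"
    note.audit_log.append({
        "event": "approved",
        "physician_id": physician_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })


note = ScribeNote(text="S: Sore throat for three days.")
try:
    approve(note, "gp-001", confirmed_read=False)  # hypothetical identifier
except ApprovalError as error:
    print(f"blocked: {error}")

approve(note, "gp-001", confirmed_read=True)
print(note.status, note.audit_log)
```

There is no code path from draft to approved that skips the checks, and every successful approval leaves a timestamped entry tied to an identity. That entry is the audit trail the paragraph above refers to.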

If the system lets a GP click through without reading, or buries the approval in a workflow that can be completed too quickly, it creates not more protection but more risk. The design of the user interface is therefore also a legal document.

The approval step in spreekuur.ai is technically enforced, logged, and non-bypassable. Every note that enters the patient record has been explicitly approved by the treating physician.

8. What Does the Research Say? A Measured Overview of the Evidence

The literature on AI scribes in clinical practice is growing rapidly. In 2025 and 2026, several randomised trials and large observational studies were published that provide reliable data for the first time. Below is an overview of the most relevant findings, without the hyperbole that characterises AI company press releases.

Time savings on documentation

The most consistent finding is time saved on documentation. The range runs from under one minute to two minutes per consultation, depending on the setting, the system used, and the degree of integration with the GP system. The UCLA trial in NEJM AI (2025) found an average of 41 seconds per note; the large JAMA study of April 2026 found sixteen minutes per eight hours of patient care across 1,800 clinicians. GPs on average benefit more than specialists, likely because their consultations have a more standardised structure.

Satisfaction and burnout

Multiple studies also measure satisfaction and burnout indicators. Results are consistent but modest: physicians using AI scribes report lower administrative burden and higher job satisfaction. The effect size is small. That is no reason to dismiss the finding; burnout in general practice is a serious problem for which every consistent modest improvement counts.

Note quality

Note quality is harder to measure than time savings. Studies use peer review, rubrics, or comparison with the GP's own standard notes. Results are mixed: for complete, structured notes in standardised presentations, AI scribes perform well. For complex cases, notes are sometimes incomplete or oversimplify clinical reasoning.

Hallucination frequency varies considerably between systems and is inadequately reported in most studies. It is not clear how often hallucinated information survives the review stage and actually enters the patient record.

Patient satisfaction and trust

Patients are generally willing to consent to AI transcription of their consultation, provided they are explicitly informed. Studies show consent rates above 80 percent when the approach is clearly explained. Mistrust concentrates on data use for commercial purposes, not on use for direct patient care.

Some patients experience the presence of a scribe as positive: the GP looks at them rather than at a screen. That is a secondary finding that falls outside the primary hypothesis but is clinically relevant.

What the literature lacks

Long-term data are almost entirely absent: what are the effects after a year of routine use? Are there quality problems that only become visible over time? Studies in the Dutch primary care setting are scarce; available data come largely from North America and East Asia, with different healthcare structures and linguistic contexts. Cost-effectiveness is barely examined.

Spreekuur.ai publishes its own research findings at spreekuur.ai/onderzoek, including comparative data on ASR quality for Dutch medical language.